Comprehensive Assessment of Supervised Machine Learning Models for Prediction of Oil Recovery Factor and NPV in Surfactant-Polymer Flooding: Bayesian Optimization and Stacking Ensembles

Document Type: Research Paper

Authors

Department of Petroleum and Geo-Energy Engineering, Amirkabir University of Technology (AUT), Tehran, Iran

10.22078/jpst.2025.5709.1979

Abstract

Surfactant-polymer (SP) flooding is an effective chemical enhanced oil recovery (EOR) method, and accurate prediction of the oil recovery factor (RF) and net present value (NPV) is vital for field development planning and economic analysis. This study systematically evaluates several supervised machine learning algorithms, namely CatBoost, artificial neural networks (ANN), XGBoost, LightGBM, and the gradient boosting regressor (GBR), for forecasting RF and NPV from experimental SP flooding data. Baseline results were first established with default hyperparameters. The models were then tuned in two stages (grid search and Bayesian optimization with Optuna), with five-fold cross-validation to ensure robustness. CatBoost and ANN consistently achieved the highest predictive accuracy. The top-performing models were then combined through ensemble stacking, which further improved prediction reliability and generalization. Post-processing with quantile adjustment (linear residual correction) addressed residual bias and improved calibration between predicted and observed values. Model performance was benchmarked using standard statistical metrics and comparative graphical analysis. The results demonstrate that integrating well-established supervised learning methods with systematic optimization, stacking, and output calibration offers a robust and practical framework for predicting SP flooding outcomes and supports data-driven decision-making in EOR project design and evaluation. The all-stacking ensemble with cross-validation achieved an R² of 0.978 and an average absolute percent relative error (AAPRE) of 2.71% for RF, and an R² of 0.944 and an AAPRE of 6.18% for NPV.
After quantile adjustment was applied to the all-stacking ensemble, performance remained competitive, with an R² of 0.964 and an AAPRE of 3.61% for RF, and an R² of 0.924 and an AAPRE of 7.94% for NPV, further demonstrating the robustness of the approach.
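The metrics and the calibration step named in the abstract can be made concrete with a short sketch. R² and AAPRE below use their standard definitions, with AAPRE = (100/N) Σ |(y_i - ŷ_i) / y_i|. The abstract does not spell out how the quantile adjustment is computed, so the linear residual correction here is an assumption: an ordinary least-squares fit of observed on predicted values, y ≈ a + b·ŷ. The function names are hypothetical, the demo numbers are toy values rather than data from the study, and this is not the authors' implementation.

```python
"""Sketch: R^2, AAPRE, and an assumed linear residual correction.

Assumption (not taken from the paper): the correction is an ordinary
least-squares fit of observed values on predicted values, y ~ a + b * y_hat.
"""
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def aapre(y_true, y_pred):
    """Average absolute percent relative error, in percent."""
    return 100.0 * mean(abs((t - p) / t) for t, p in zip(y_true, y_pred))


def fit_linear_correction(y_true, y_pred):
    """Least-squares fit of y_true ~ a + b * y_pred; returns (a, b)."""
    x_bar, y_bar = mean(y_pred), mean(y_true)
    sxx = sum((x - x_bar) ** 2 for x in y_pred)
    if sxx == 0:
        raise ValueError("predictions are constant; slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(y_pred, y_true))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def apply_correction(y_pred, a, b):
    """Map raw predictions through the fitted linear correction."""
    return [a + b * p for p in y_pred]


if __name__ == "__main__":
    # Toy values for illustration only; not data from the study.
    obs = [30.0, 42.0, 51.0, 60.0]
    raw = [33.0, 44.0, 52.0, 59.0]
    # In-sample fit for brevity; see the note below on out-of-fold fitting.
    a, b = fit_linear_correction(obs, raw)
    corrected = apply_correction(raw, a, b)
    print("raw:      R2=%.3f AAPRE=%.2f%%" % (r2_score(obs, raw), aapre(obs, raw)))
    print("corrected: R2=%.3f AAPRE=%.2f%%" % (r2_score(obs, corrected), aapre(obs, corrected)))
```

Because least squares can never increase the squared error relative to the uncorrected predictions on the data it is fitted to, an in-sample correction always looks at least as good as the raw model. Fitting a and b on out-of-fold predictions from the five-fold cross-validation, and only then applying them to new predictions, gives a fairer estimate of calibrated performance.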
