Reliable predictions of oil formation volume factor based on transparent and auditable machine learning approaches

David A. Wood, Abouzar Choubineh



Neural-network machine-learning algorithms are effective prediction tools, but in many applications they behave as black boxes: they do not readily reveal the exact calculations, or the relationships among the underlying input variables (which may or may not be independent of each other), involved in each of their predictions. The transparent open box (TOB) learning network algorithm overcomes this limitation by providing the exact calculations involved in all its predictions while achieving acceptable and auditable levels of prediction accuracy. The TOB network, based on an optimized data-matching algorithm, can be applied in spreadsheet or fully coded configurations. This algorithm offers significant benefits for the analysis and prediction of many complex and difficult-to-measure non-linear systems. To demonstrate its prediction performance, the algorithm is applied to the prediction of crude oil formation volume factor at bubble point (Bob) using published datasets of 166, 203 and 237 data records involving four variables (reservoir temperature, gas-oil ratio, oil gravity and gas specific gravity). Two of these datasets display uneven and irregular data coverage. The TOB network demonstrates high prediction accuracy for Bob (root mean square error (RMSE) ~ 0.03; R² > 0.95) for the more evenly distributed dataset. The performance of the TOB readily reveals the risk of overfitting such datasets. With its high level of transparency and its resistance to overfitting, the TOB learning network offers an insightful approach to machine learning applied to predicting complex non-linear systems. Its results complement and benchmark the prediction contributions of neural networks and empirical correlations. In doing so, it provides further insight into the underlying data.
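The idea of an auditable data-matching prediction, scored by RMSE and R², can be illustrated with a minimal sketch. This is not the authors' TOB implementation: it swaps in a generic inverse-distance-weighted nearest-neighbour match, and the function names (`match_predict`, `rmse`, `r_squared`), the number of matches `q` and the weighting scheme are all assumptions made for illustration. Input variables are assumed to be scaled to comparable ranges beforehand, and R² is computed here as the coefficient of determination, which may differ from the definition used in the paper.

```python
import math


def match_predict(train_X, train_y, query, q=3):
    """Predict a value by matching `query` to its q closest training records.

    Returns the prediction plus an audit trail of
    (record index, weight, target value) for every matched record.
    Hypothetical helper: a generic nearest-neighbour sketch, not TOB itself.
    """
    # Squared Euclidean distance from the query to every training record
    # (assumes the variables were already scaled to comparable ranges).
    dists = [sum((a - b) ** 2 for a, b in zip(row, query)) for row in train_X]
    # Indices of the q best-matching records.
    best = sorted(range(len(train_X)), key=lambda i: dists[i])[:q]
    # Inverse-distance weights; the small epsilon avoids division by zero
    # when the query coincides with a training record.
    raw = [1.0 / (dists[i] + 1e-12) for i in best]
    total = sum(raw)
    audit = [(i, w / total, train_y[i]) for i, w in zip(best, raw)]
    # The prediction is an explicit weighted sum that can be checked by hand.
    prediction = sum(w * y for _, w, y in audit)
    return prediction, audit


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def r_squared(actual, predicted):
    """Coefficient of determination (assumed definition of R²)."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot
```

Because every prediction is returned together with the matched records and their weights, each result can be traced back to the specific data records that produced it, which is the property the abstract describes as transparency and auditability.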

Cited as: Wood, D.A., Choubineh, A. Reliable predictions of oil formation volume factor based on transparent and auditable machine learning approaches. Advances in Geo-Energy Research, 2019, 3(3): 225-241, doi: 10.26804/ager.2019.03.01.


Keywords: Machine learning transparency; non-correlation-based machine learning; oil formation volume factor prediction; sparse data impacts; avoidance of overfitting






Copyright (c) 2019 The Author(s)

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
