research-article
Authors: Yifeng Xiao, Tongxi Wang, Hua Xiang
Volume 189, Issue C
Published: 08 August 2024
Abstract
Oil-source correlation identifies the origin of crude oil by linking it to its source rocks. Manual correlation methods, however, are limited by the number of samples and parameters they can handle and by imbalanced datasets, which makes their results uncertain. Existing multivariate statistical techniques alleviate this problem, but they still have difficulty processing imbalanced datasets and quantifying source-bed contributions. This paper therefore proposes a novel oil-source correlation model, SVM-SelectKBest, which combines a support vector machine (SVM) with a feature selection algorithm to mitigate the dataset imbalance that is common in oil-source correlation. Compared with a standard SVM, SVM-SelectKBest dynamically selects the most relevant features and fine-tunes the model parameters to achieve higher accuracy and better generalizability on complex datasets. The SVM compensates for class imbalance by heavily penalizing misclassification of the minority class, and SelectKBest streamlines the feature set so that the SVM concentrates on the critical variables. In addition, a shallow neural network (SensoryAttentionNet) is introduced into the proposed model to quantify the proportions of source beds in crude oil. The results show that SVM-SelectKBest performs better at identifying key geochemical parameters and discriminating oil-source correlations; its accuracy on unbalanced datasets is nearly 40% higher than that of SVM. The model obtains 25 key geochemical parameters, such as n-heptadecane (C17), pristane (Pr), and n-octadecane (C18), and achieves F1 scores of 1.0 on the training, validation, and test sets. SensoryAttentionNet also performs robustly, with a low variance of 0.05 between its predicted and actual values.
Together, these results indicate that the proposed method is effective both in dealing with the imbalance problem in oil-source correlation datasets and in determining the proportional contribution of source beds to crude oil.
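The two mechanisms named in the abstract, keeping only the top-scoring features and penalizing minority-class errors more heavily, can be illustrated with a minimal standard-library sketch. It is not the authors' implementation. It assumes a one-way ANOVA F-statistic as the feature score and the common "balanced" weighting heuristic (n_samples / (n_classes × class count)); the abstract specifies neither. The data, the labels "A" and "B", and all function names are hypothetical.

```python
from collections import Counter
from statistics import mean


def anova_f_scores(X, y):
    """One-way ANOVA F-statistic of each feature column against the labels."""
    classes = sorted(set(y))
    n, k = len(y), len(classes)
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        grand_mean = mean(column)
        groups = [[v for v, lab in zip(column, y) if lab == c] for c in classes]
        group_means = [mean(g) for g in groups]
        # Variation between class means vs. variation inside each class.
        between = sum(len(g) * (m - grand_mean) ** 2
                      for g, m in zip(groups, group_means))
        within = sum((v - m) ** 2
                     for g, m in zip(groups, group_means) for v in g)
        if within == 0:
            scores.append(float("inf") if between > 0 else 0.0)
        else:
            scores.append((between / (k - 1)) / (within / (n - k)))
    return scores


def select_k_best(X, y, k):
    """Indices of the k features with the highest F-statistic."""
    scores = anova_f_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def balanced_class_weights(y):
    """Per-class misclassification weights; rarer classes get larger weights."""
    counts = Counter(y)
    return {c: len(y) / (len(counts) * m) for c, m in counts.items()}


# Hypothetical toy data (not from the paper): 6 majority samples "A",
# 2 minority samples "B", and 3 made-up features per sample.
X = [
    [1.0, 5, 2.0], [1.1, 1, 2.5], [0.9, 4, 1.5],
    [1.0, 2, 2.2], [1.2, 3, 1.8], [0.8, 6, 2.0],
    [3.0, 3, 2.9], [3.2, 4, 3.1],
]
y = ["A", "A", "A", "A", "A", "A", "B", "B"]

selected = select_k_best(X, y, k=2)   # feature indices to keep
weights = balanced_class_weights(y)   # per-class error penalties
print(selected, weights)
```

In a full pipeline the selected columns would be passed to an SVM whose per-class penalty is scaled by these weights, so that an error on the rare class costs more than an error on the common one.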
Highlights
• A support vector machine model combined with feature selection is presented.
• The issue of dataset imbalance in oil-source correlations is mitigated.
• A neural network for quantifying the proportion of source beds is presented.
• The challenge of quantifying the proportion of source beds is resolved.
• The selection of geochemical parameters is optimized by feature selection.
Published In
Computers & Geosciences Volume 189, Issue C
Jul 2024
108 pages
ISSN:0098-3004
Elsevier Ltd.
Publisher
Pergamon Press, Inc.
United States
Author Tags
- Oil-source correlation
- Feature selection
- Dataset imbalance
- Proportion of source beds