DATA MINING IN ORGANIC GEOCHEMISTRY: CASE STUDY IN POTIGUAR BASIN

Mineração de dados na Geoquímica Orgânica: estudo de caso na Bacia Potiguar

Authors

  • Sarah BARRÓN TORRES Pontifical Catholic University (PUC-Rio)
  • Ítalo de Oliveira MATIAS Pontifical Catholic University (PUC-Rio)
  • Francisco Fábio de Araújo PONTE Pontifical Catholic University (PUC-Rio)
  • Erica Tavares de MORAIS Petrobras Research and Development Center (CENPES)
  • Ygor dos Santos ROCHA Petrobras Research and Development Center (CENPES)
  • Mario Duncan RANGEL Petrobras Research and Development Center (CENPES)
  • Fabiano Galdino LEAL Petrobras Research and Development Center (CENPES)

DOI:

https://doi.org/10.5016/geociencias.v41i1.16161

Abstract

The volume of data from geochemical analyses of samples collected in oil wells grows in step with investment in the exploration and production sector. However, the treatment and interpretation of these results still depend heavily on experts and are time-consuming. As extensive databases are generated, data mining offers a good way to explore them through statistical methods and computational algorithms, giving the process a technological edge and greater agility. These tools were applied experimentally to data from 200 oils from the Potiguar Basin, leading to a suggested workflow that ultimately returns reasonable accuracy in predicting their genetic classification. Using multidimensional scaling (MDS) and clustering (dendrogram and k-means), the 60 initial attributes were reduced to an optimal set of 26. With machine learning, median accuracies of 92.50% were obtained with the Decision Tree algorithm, 95.00% with Random Forest, and 87.50% with an Artificial Neural Network. Compared with an analysis previously presented in the relevant literature, efficiency gains can be realized by adopting the methodology proposed here.


Keywords: Organic geochemistry; Data Mining; Multivariate Statistics; Workflow.
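
The abstract reports a median accuracy per classifier. As a minimal sketch of how such a figure can be computed, the Python example below repeats a random train/test split several times and takes the median of the resulting accuracies. Everything in it is an assumption made for illustration: the data are synthetic (two made-up classes with 26 features, not the Potiguar Basin oils), the 80/20 split and the number of runs are hypothetical, and a simple nearest-centroid classifier stands in for the Decision Tree, Random Forest, and Artificial Neural Network so that only the standard library is needed.

```python
import random
import statistics


def make_synthetic(n_per_class=100, n_features=26, seed=0):
    """Build two synthetic classes (hypothetical stand-in for real attributes)."""
    rng = random.Random(seed)
    data = []
    for label, centre in ((0, 0.0), (1, 1.5)):
        for _ in range(n_per_class):
            features = [rng.gauss(centre, 1.0) for _ in range(n_features)]
            data.append((features, label))
    return data


def fit_centroids(train):
    """Return the mean feature vector of each class in the training set."""
    sums, counts = {}, {}
    for x, y in train:
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
    )


def median_accuracy(data, n_runs=15, test_fraction=0.2, seed=1):
    """Repeat a random holdout split and return (median accuracy, all accuracies)."""
    rng = random.Random(seed)
    n_test = int(len(data) * test_fraction)
    accuracies = []
    for _ in range(n_runs):
        shuffled = data[:]
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        centroids = fit_centroids(train)
        correct = sum(predict(centroids, x) == y for x, y in test)
        accuracies.append(correct / n_test)
    return statistics.median(accuracies), accuracies


if __name__ == "__main__":
    samples = make_synthetic()
    median, runs = median_accuracy(samples)
    print(f"median accuracy over {len(runs)} runs: {median:.2%}")
```

In this sketch, 200 samples and a 20% test fraction give 40 test samples per run, so each run's accuracy is a multiple of 2.5%. The median is less sensitive than the mean to a single unlucky split, which is one reason to report it.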

Author Biographies

Sarah BARRÓN TORRES, Pontifical Catholic University (PUC-Rio)

Pontifical Catholic University (PUC-Rio), Informatics Division, Software Engineering Laboratory (LES). Rua Marquês de São Vicente, 225 - Gávea, Rio de Janeiro - RJ, Brazil.

Ítalo de Oliveira MATIAS, Pontifical Catholic University (PUC-Rio)

Pontifical Catholic University (PUC-Rio), Informatics Division, Software Engineering Laboratory (LES). Rua Marquês de São Vicente, 225 - Gávea, Rio de Janeiro - RJ, Brazil.

Francisco Fábio de Araújo PONTE, Pontifical Catholic University (PUC-Rio)

Pontifical Catholic University (PUC-Rio), Informatics Division, Software Engineering Laboratory (LES). Rua Marquês de São Vicente, 225 - Gávea, Rio de Janeiro - RJ, Brazil.

Erica Tavares de MORAIS, Petrobras Research and Development Center (CENPES)

Petrobras Research and Development Center (CENPES). Avenida Horácio Macedo, 950 – Cidade Universitária, Ilha do Fundão, Rio de Janeiro – RJ, Brazil.

Ygor dos Santos ROCHA, Petrobras Research and Development Center (CENPES)

Petrobras Research and Development Center (CENPES). Avenida Horácio Macedo, 950 – Cidade Universitária, Ilha do Fundão, Rio de Janeiro – RJ, Brazil. 

Mario Duncan RANGEL, Petrobras Research and Development Center (CENPES)

Petrobras Research and Development Center (CENPES). Avenida Horácio Macedo, 950 – Cidade Universitária, Ilha do Fundão, Rio de Janeiro – RJ, Brazil. E-mail: mduncan@petrobras.com.br

Fabiano Galdino LEAL, Petrobras Research and Development Center (CENPES)

Petrobras Research and Development Center (CENPES). Avenida Horácio Macedo, 950 – Cidade Universitária, Ilha do Fundão, Rio de Janeiro – RJ, Brazil. E-mail: fabianoleal@petrobras.com.br

Published

2022-05-25

Section

Articles