OpenTox Virtual Conference 2021 Session 15
Building adverse outcome pathways with natural language processing
1Marie Corradi, MSc; 2Eefje S. Poppelaars, PhD; 1Jan-Willem Lankhaar, PhD; 1Marc A.T. Teunis, PhD*
1) University of Applied Sciences, Utrecht; Innovative Testing Group, Utrecht, The Netherlands 2) Vivaltes B.V., Utrecht, The Netherlands
*Presenting author/Corresponding author
Extracting structured information from unstructured text sources is a challenging task. As data volumes grow rapidly, established paradigms such as systematic reviews are becoming impractical. Advances in machine learning provide alternative solutions to this challenge (Hahn & Oleynik, 2020). In particular, neural network models for Natural Language Processing (NLP) are becoming increasingly popular and offer faster ways to gain insight from unstructured data sources.
Qualitative and quantitative information about effects (phenotypes), chemical properties, molecular pathways, key-initiating events, in vitro assays, and dosages is contained within a large volume of unstructured sources, such as scientific literature, in-house testing reports, and chemical and toxicological databases such as those of ECHA and ToxBank. Deep learning NLP techniques could be valuable for extracting and organizing relevant information from these sources. This approach has already shown its value in some areas of toxicology, such as nanotoxicology (Lewinski & McInnes, 2015), drug-induced liver injury (Choi et al., 2019), and more generally in developing innovative testing systems for hazard classification (Oki et al., 2016). To support data extraction for the ONTOX project, we propose the following approach:
Tier 1: Named Entity Recognition (NER)
In our previous research (Corradi et al., in prep), we developed a named entity recognition model using deep learning. This model can recognize several entity types relevant to the toxicological context (Efroni et al., 2020), such as chemical, phenotype, and organism.
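To make the expected output of this tier concrete, the sketch below tags the three entity types named above. It is not the deep learning model described here: a simple dictionary lookup stands in for the trained model, and the lexicon, the `Entity` type, the `tag_entities` function, and the example sentence are all hypothetical.

```python
import re
from typing import List, NamedTuple


class Entity(NamedTuple):
    text: str   # surface form as it appears in the sentence
    label: str  # entity type: CHEMICAL, PHENOTYPE or ORGANISM
    start: int  # character offset where the mention starts
    end: int    # character offset just past the mention


# Hypothetical toy lexicon; a trained NER model would replace this lookup.
LEXICON = {
    "acetaminophen": "CHEMICAL",
    "liver necrosis": "PHENOTYPE",
    "rat": "ORGANISM",
}


def tag_entities(text: str) -> List[Entity]:
    """Return lexicon matches in reading order, with character offsets."""
    entities = []
    for term, label in LEXICON.items():
        # Whole-word match, case-insensitive, allowing a plain plural "s".
        pattern = r"\b" + re.escape(term) + r"s?\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            entities.append(
                Entity(match.group(0), label, match.start(), match.end())
            )
    return sorted(entities, key=lambda entity: entity.start)


if __name__ == "__main__":
    for entity in tag_entities("Acetaminophen caused liver necrosis in rats."):
        print(entity)
```

Keeping character offsets with each entity means later tiers can return to the surrounding sentence for context.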
Tier 2: Semantic relationships (Dependencies)
Entities need to be extracted within their ‘language’ context, which allows us to attach qualitative (biological) and quantitative meaning to them. Building semantic networks and relational graph visualizations will help provide context to chemical-organism interactions.
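As a rough illustration of how a semantic network could be assembled from recognized entities, the sketch below counts sentence-level co-occurrences between entities of different types. Co-occurrence is a deliberately simple stand-in for the dependency-based relation extraction intended in this tier. The `cooccurrence_edges` function and the example data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple

# An entity is represented as a (text, label) pair.
Entity = Tuple[str, str]


def cooccurrence_edges(sentences: Iterable[List[Entity]]) -> Counter:
    """Count how often two entities of different types share a sentence."""
    edges: Counter = Counter()
    for entities in sentences:
        # Deduplicate and sort so each pair has one canonical orientation.
        unique = sorted(set(entities))
        for first, second in combinations(unique, 2):
            if first[1] != second[1]:  # link only entities of different types
                edges[(first, second)] += 1
    return edges


if __name__ == "__main__":
    # Hypothetical per-sentence NER output.
    sentences = [
        [("acetaminophen", "CHEMICAL"), ("liver necrosis", "PHENOTYPE")],
        [
            ("acetaminophen", "CHEMICAL"),
            ("liver necrosis", "PHENOTYPE"),
            ("rat", "ORGANISM"),
        ],
    ]
    for (first, second), weight in cooccurrence_edges(sentences).items():
        print(first[0], "--", second[0], "weight:", weight)
```

The resulting weighted edge list can be loaded into any graph library for visualization. Edges that carry a relation type or direction would require the dependency information this tier aims to extract.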
Tier 3: Connecting to phenotypic and molecular ontologies and underlying pathways
In tier 3, we aim to link recognized entities to existing ontologies, for example a phenotype entity to its identifier within the Mammalian Phenotype Ontology. This linkage between the ‘unstructured’ biological context of an entity and its structured ontological counterpart makes it possible to go from entities to molecular pathways and processes. Once the entities have been translated into molecular and physiological ontology terms, we can map the underlying mechanisms that lead to toxicological effects. From there we could build (quantitative) adverse outcome pathways.
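The linking step could be sketched as a normalized string lookup against ontology labels and synonyms. This is a minimal sketch under stated assumptions, not the project's method. The identifiers below are placeholders, not real Mammalian Phenotype Ontology IDs, and the labels, synonyms, and function names are hypothetical.

```python
import re
from typing import Dict, Optional

# Placeholder identifiers and labels; NOT real Mammalian Phenotype Ontology IDs.
ONTOLOGY_LABELS: Dict[str, str] = {
    "EX:0000001": "liver necrosis",
    "EX:0000002": "decreased body weight",
}

# Hypothetical synonym table mapping alternative wording to the same term.
SYNONYMS: Dict[str, str] = {
    "hepatic necrosis": "EX:0000001",
}


def normalize(mention: str) -> str:
    """Lowercase, trim, and collapse whitespace so near-identical strings match."""
    return re.sub(r"\s+", " ", mention.strip().lower())


# Build one lookup table from primary labels and synonyms.
INDEX: Dict[str, str] = {
    normalize(label): term_id for term_id, label in ONTOLOGY_LABELS.items()
}
INDEX.update(
    {normalize(synonym): term_id for synonym, term_id in SYNONYMS.items()}
)


def link_entity(mention: str) -> Optional[str]:
    """Return the ontology identifier for a mention, or None if it is unknown."""
    return INDEX.get(normalize(mention))


if __name__ == "__main__":
    print(link_entity("  Hepatic   Necrosis "))
    print(link_entity("unlisted phenotype"))
```

Exact matching after normalization misses spelling variants and paraphrases, so a real pipeline would likely need fuzzy or embedding-based matching on top of this lookup.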
This work is carried out within the ONTOX Horizon 2020 Project. See: https://ontox-project.eu/ and https://cordis.europa.eu/project/id/963845
Choi, Y. H., Han, C. Y., Kim, K. S., & Kim, S. G. (2019). Future Directions of Pharmacovigilance Studies Using Electronic Medical Recording and Human Genetic Databases. Toxicological Research, 35(4), 319–330. https://doi.org/10.5487/TR.2019.35.4.319
Efroni, S., Song, M., Labatut, V., Emmert-Streib, F., Perera, N., & Dehmer, M. (2020). Named Entity Recognition and Relation Detection for Biomedical Information Extraction. Frontiers in Cell and Developmental Biology, 8, 673. https://doi.org/10.3389/fcell.2020.00673
Hahn, U., & Oleynik, M. (2020). Medical Information Extraction in the Age of Deep Learning. Yearb Med Inform, 2020, 208–228. https://doi.org/10.1055/s-0040-1702001
Lewinski, N. A., & McInnes, B. T. (2015). Using natural language processing techniques to inform research on nanotechnology. Beilstein Journal of Nanotechnology, 6, 1439–1449. https://doi.org/10.3762/bjnano.6.149
Oki, N. O., Nelms, M. D., Bell, S. M., Mortensen, H. M., & Edwards, S. W. (2016). Accelerating Adverse Outcome Pathway Development Using Publicly Available Data Sources. Current Environmental Health Reports. https://doi.org/10.1007/s40572-016-0079-y