Contact Info
Marc Teunis
University of Life Sciences Utrecht

OpenTox Virtual Conference 2021 Session 15

Building adverse outcome pathways with natural language processing

Marie Corradi, MSc (1); Eefje S. Poppelaars, PhD (2); Jan-Willem Lankhaar, PhD (1); Marc A.T. Teunis, PhD (1)*

(1) University of Applied Sciences, Utrecht; Innovative Testing Group, Utrecht, The Netherlands
(2) Vivaltes B.V., Utrecht, The Netherlands

*Presenting author/Corresponding author 

Extracting information from unstructured text sources in a structured fashion is a challenging task. As data volumes grow rapidly, established paradigms such as systematic reviews are becoming impractical. Advances in machine learning offer alternative solutions to this challenge (Hahn & Oleynik, 2020). In particular, neural network models for Natural Language Processing (NLP) are increasingly popular and provide faster ways to gain insight from unstructured data sources.

Qualitative and quantitative information about effects (phenotypes), chemical properties, molecular pathways, key-initiating events, in vitro assays, and dosages is contained in a large volume of unstructured sources, such as scientific literature, in-house testing reports, and chemical and toxicological databases such as those of ECHA and ToxBank. Deep learning NLP techniques could be valuable for extracting and organizing the relevant information from these sources. This approach has already shown its value in some areas of toxicology, such as nanotoxicology (Lewinski & McInnes, 2015) and drug-induced liver injury (Choi et al., 2019), and more generally in developing innovative testing systems for hazard classification (Oki et al., 2016). To support data extraction for the ONTOX project, we propose the following three-tier approach:

Tier 1: Named Entity Recognition (NER) 

In our previous research (Corradi et al., in prep), we developed a named entity recognition model using deep learning. This model recognizes several entity types relevant to the toxicological context (Efroni et al., 2020), such as chemicals, phenotypes, and organisms.
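To make the input and output of this tier concrete, the sketch below tags entities with a small dictionary lookup. It is not the deep learning model described above, only a stand-in with the same kind of output: labeled text spans. The lexicon, the label names, and the example sentence are illustrative assumptions.

```python
import re
from typing import List, NamedTuple


class Entity(NamedTuple):
    text: str   # surface form as it appears in the sentence
    label: str  # entity type
    start: int  # character offset where the span begins
    end: int    # character offset where the span ends (exclusive)


# Hypothetical toy lexicon; a trained NER model would replace this lookup.
LEXICON = {
    "acetaminophen": "CHEMICAL",
    "liver necrosis": "PHENOTYPE",
    "mouse": "ORGANISM",
}


def tag_entities(text: str) -> List[Entity]:
    """Return all lexicon terms found in `text`, ordered by position."""
    entities = []
    for term, label in LEXICON.items():
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            entities.append(Entity(match.group(0), label, match.start(), match.end()))
    return sorted(entities, key=lambda entity: entity.start)


sentence = "Acetaminophen caused liver necrosis in the mouse."
entities = tag_entities(sentence)
for entity in entities:
    print(entity.label, repr(entity.text), entity.start, entity.end)
```

A dictionary lookup only finds terms it already knows, whereas a learned model can generalize to unseen chemical names or phenotype descriptions.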

Tier 2: Semantic relationships (Dependencies) 

Entities must be extracted together with their linguistic context, that is, the relationships that connect them within a sentence. This will allow us to add qualitative (biological) and quantitative meaning to the entities. Building semantic networks and relational graph visualizations will help provide context to chemical-organism interactions.
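As a rough illustration of what this tier produces, the sketch below turns tagged entities into (chemical, relation, phenotype) triples and collects them in a small graph. It replaces real dependency parsing with a much simpler heuristic: a trigger word must appear between a chemical and a phenotype in the same sentence. The entity table, the trigger words, and the relation name are assumptions made for this example.

```python
import re
from collections import defaultdict

# Hypothetical entities, as they might come out of the NER tier.
ENTITIES = {
    "acetaminophen": "CHEMICAL",
    "caffeine": "CHEMICAL",
    "liver necrosis": "PHENOTYPE",
    "mouse": "ORGANISM",
}

# Hypothetical trigger words mapped to a relation type.
TRIGGERS = {"caused": "CAUSES", "induced": "CAUSES"}


def extract_relations(text):
    """Return (chemical, relation, phenotype) triples found per sentence."""
    triples = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        lowered = sentence.lower()
        # First position of every known entity mentioned in this sentence.
        hits = [(lowered.find(term), term, label)
                for term, label in ENTITIES.items() if term in lowered]
        chemicals = [hit for hit in hits if hit[2] == "CHEMICAL"]
        phenotypes = [hit for hit in hits if hit[2] == "PHENOTYPE"]
        for chem_pos, chem, _ in chemicals:
            for pheno_pos, pheno, _ in phenotypes:
                # Text between the chemical and the phenotype (empty if reversed).
                between = lowered[chem_pos + len(chem):pheno_pos]
                for word, relation in TRIGGERS.items():
                    if re.search(r"\b" + re.escape(word) + r"\b", between):
                        triples.append((chem, relation, pheno))
    return triples


text = "Acetaminophen caused liver necrosis in the mouse. Caffeine was also measured."
triples = extract_relations(text)

# Adjacency-list graph: chemical -> list of (relation, phenotype) edges.
graph = defaultdict(list)
for chem, relation, pheno in triples:
    graph[chem].append((relation, pheno))
print(dict(graph))
```

A dependency parser would instead follow the grammatical links between the verb and its subject and object, which copes better with passive voice, negation, and long sentences.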

Tier 3: Connecting to phenotypic and molecular ontologies and underlying pathways 

In tier 3, we aim to link recognized entities to existing ontologies, for example a phenotype entity to its identifier within the Mammalian Phenotype Ontology. This linkage between the ‘unstructured’ biological context of an entity and its structured ontological counterpart makes it possible to move from entities to molecular pathways and processes. Once the entities have been translated to molecular and physiological ontology terms, we can map the underlying mechanisms that lead to toxicological effects. From there we could build (quantitative) adverse outcome pathways.
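The linking step can be pictured as a lookup from a free-text mention to an ontology identifier. The minimal sketch below normalizes the mention, tries an exact match against labels and synonyms, and falls back to approximate string matching with the standard library's `difflib`. The term table and the `EX:` identifiers are placeholders invented for this example; they are not real Mammalian Phenotype Ontology identifiers, and the similarity cutoff is an arbitrary assumption.

```python
import difflib
from typing import Optional

# Hypothetical mini-ontology: label or synonym -> placeholder identifier.
# The "EX:" identifiers are made up for illustration only.
ONTOLOGY_TERMS = {
    "liver necrosis": "EX:0000001",
    "hepatic necrosis": "EX:0000001",  # synonym of the same concept
    "increased body weight": "EX:0000002",
}


def link_entity(mention: str, cutoff: float = 0.85) -> Optional[str]:
    """Map a free-text mention to an identifier, or None if nothing is close."""
    # Normalize case and collapse repeated whitespace.
    key = " ".join(mention.lower().split())
    if key in ONTOLOGY_TERMS:
        return ONTOLOGY_TERMS[key]
    # Fall back to the single most similar known label above the cutoff.
    close = difflib.get_close_matches(key, list(ONTOLOGY_TERMS), n=1, cutoff=cutoff)
    return ONTOLOGY_TERMS[close[0]] if close else None


print(link_entity("Hepatic  necrosis"))  # exact match after normalization
print(link_entity("liver necroses"))     # approximate match
print(link_entity("tremor"))             # no sufficiently similar label
```

Mentions that return `None` could be queued for manual curation rather than forced onto the nearest term, since a wrong link would propagate into every pathway built on top of it.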

This work is carried out within the ONTOX Horizon 2020 Project. See: https://ontox-project.eu/ and https://cordis.europa.eu/project/id/963845

Choi, Y. H., Han, C. Y., Kim, K. S., & Kim, S. G. (2019). Future Directions of Pharmacovigilance Studies Using Electronic Medical Recording and Human Genetic Databases. Toxicological Research, 35(4), 319–330. https://doi.org/10.5487/TR.2019.35.4.319

Efroni, S., Song, M., Labatut, V., Emmert-Streib, F., Perera, N., & Dehmer, M. (2020). Named Entity Recognition and Relation Detection for Biomedical Information Extraction. Frontiers in Cell and Developmental Biology, 8, 673. https://doi.org/10.3389/fcell.2020.00673

Hahn, U., & Oleynik, M. (2020). Medical Information Extraction in the Age of Deep Learning. Yearb Med Inform, 2020, 208–228. https://doi.org/10.1055/s-0040-1702001

Lewinski, N. A., & McInnes, B. T. (2015). Using natural language processing techniques to inform research on nanotechnology. Beilstein J. Nanotechnol, 6, 1439–1449. https://doi.org/10.3762/bjnano.6.149

Oki, N. O., Nelms, M. D., Bell, S. M., Mortensen, H. M., & Edwards, S. W. (2016). Accelerating Adverse Outcome Pathway Development Using Publicly Available Data Sources. Current Environmental Health Reports, 3(1), 53–63. https://doi.org/10.1007/s40572-016-0079-y