OpenTox Euro 2019 talk: Data integration into GENEVESTIGATOR: overcoming challenges and creating opportunities
Although genome-wide RNA expression analysis has become a routine tool in biomedical research, extracting valuable biological insight from thousands of published studies and their underlying data remains a major challenge for two main reasons: heterogeneity in annotations and technologies, and inconsistent data quality.
GENEVESTIGATOR® is an analysis tool and database containing manually curated gene expression data from public studies. It uses controlled vocabularies for several biological dimensions, such as tissues, genotypes, diseases and treatments. Strict quality control ensures that only high-quality samples and experiments are integrated into the database. This, together with the use of global normalization procedures, adds further value to the gene expression data compendium and helps to avoid bias in the results. The powerful search engine and user-friendly interface empower biologists to perform on-the-fly calculations for single experiments or across the whole data compendium.
Over 120,000 samples representing more than 2,250 compounds, drawn from important data sources for toxicological and pharmacological applications such as the LINCS project, CMAP, TG-GATEs and DrugMatrix, can be analyzed simultaneously in order to:
• Identify gene lists regulated in response to a drug treatment
• Find out how a target is regulated across all pharmacology / toxicology datasets
• Detect clusters of compounds with similar biological effects
• Search for diseases with gene signatures opposite to a drug treatment
Curated data from GENEVESTIGATOR are also well suited for downstream AI applications.
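The last analysis in the list above, searching for diseases whose gene signatures are opposite to a drug treatment, can be illustrated with a minimal sketch. This is not GENEVESTIGATOR's implementation or API. It assumes each signature is available as a mapping from gene identifier to log fold change, and it uses a plain Pearson correlation over shared genes as a stand-in similarity measure. The function names, gene identifiers, disease names and values are all hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_opposite_signatures(drug_signature, disease_signatures, min_shared=3):
    """Rank diseases from most opposite (most negative correlation) upward.

    Both inputs map gene identifiers to log fold changes (an assumption of
    this sketch). Diseases sharing fewer than `min_shared` genes with the
    drug signature are skipped.
    """
    scores = []
    for name, signature in disease_signatures.items():
        shared = sorted(set(drug_signature) & set(signature))
        if len(shared) < min_shared:
            continue
        r = pearson(
            [drug_signature[g] for g in shared],
            [signature[g] for g in shared],
        )
        scores.append((name, r))
    return sorted(scores, key=lambda item: item[1])


# Made-up toy values, not real expression data.
drug = {"GENE_A": 2.0, "GENE_B": -1.5, "GENE_C": 0.5, "GENE_D": -0.8}
diseases = {
    "disease_x": {"GENE_A": -1.8, "GENE_B": 1.2, "GENE_C": -0.4, "GENE_D": 0.9},
    "disease_y": {"GENE_A": 1.9, "GENE_B": -1.4, "GENE_C": 0.6, "GENE_D": -0.7},
}
ranking = rank_opposite_signatures(drug, diseases)
for name, r in ranking:
    print(f"{name}: r = {r:.2f}")
```

In this toy example the disease whose signature mirrors the drug signature is ranked first. A real analysis would use many more genes and would likely prefer a rank-based or enrichment-style score over a simple correlation.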