Open TG-GATEs data access with OpenTox
Open TG-GATEs is a well-known large-scale toxicogenomics database, assembled by Japan’s Toxicogenomics Project (TGP) during 2002-2011 as a public-private partnership. The data is free for anyone to access and use for any purpose. In practice, however, the effort required for raw data processing and normalisation remains a barrier to use, so additional access methods can be very valuable.
We are happy to announce that Douglas Connect now provides, as part of our ongoing work to develop the OpenTox data platforms, two additional such access methods, which we expect to become valuable to researchers and professionals. The first is the OpenTox Open TG-GATEs API, which makes the dataset available as JSON data. The second is through OpenTox Garuda gadgets, developed in partnership with the Systems Biology Institute (SBI). Both of these access methods are freely available to the public. Below, we describe briefly how to access data by using these new methods.
1. The OpenTox Open TG-GATEs API
As part of the OpenTox API suite, the Open TG-GATEs (OTG) API makes the database available in JSON format. The API endpoints can be browsed interactively, together with other OpenTox APIs, in the OpenTox Data Explorer (pictured), available at https://opentox-data-explorer.cloud.douglasconnect.com. This is probably one of the simplest methods currently available anywhere to explore the content of Open TG-GATEs. In particular, samples and pathologies may be explored with ease, and searching is also supported.
To make use of OTG data, researchers will typically want to follow a two-step process: (1) identify samples of interest, and (2) download gene expression data. While the data explorer may be highly useful during step 1, it does not yet support gene expression data download; for this, direct API calls are necessary. As an example, the following call will retrieve averaged log-2 fold data for 6 samples, filtered by p-value with a maximum of 0.05:
To obtain all 6 samples individually, it is possible to set valueTypeFilter=log2fold. Finally, the absolute and absmean value types are also supported.
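As a rough illustration of how such a call could be assembled, the sketch below builds a request URL with Python's standard library. Only the base URL (inferred from the Swagger location given in this post) and the valueTypeFilter parameter with its log2fold, absolute and absmean values come from the text. The endpoint path `expression-example` and the `sampleIds` and `maxPValue` parameter names are hypothetical placeholders, so the real names should be taken from the Swagger specification.

```python
from urllib.parse import urlencode

# Base URL inferred from the Swagger location given in this post.
BASE_URL = "https://open-tggates-api.cloud.douglasconnect.com/v2"

# Value types named in the text; the API may support others.
KNOWN_VALUE_TYPES = {"log2fold", "absolute", "absmean"}


def build_query_url(endpoint, value_type, **params):
    """Return a GET URL for `endpoint` with the given query parameters."""
    if value_type not in KNOWN_VALUE_TYPES:
        raise ValueError(f"unexpected value type: {value_type!r}")
    query = {"valueTypeFilter": value_type, **params}
    return f"{BASE_URL}/{endpoint.strip('/')}?{urlencode(query)}"


# HYPOTHETICAL endpoint and parameter names, for illustration only.
url = build_query_url(
    "expression-example",
    "log2fold",
    sampleIds="SAMPLE_A,SAMPLE_B",
    maxPValue=0.05,
)
print(url)
```

Building the query with urlencode keeps special characters (such as the comma in a list of sample IDs) correctly escaped, whatever the final parameter names turn out to be.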
To discover samples using the API, assuming this has not been done with the data explorer, an API call such as the following may be used:
This finds rat samples of type in vivo, from liver, at dose level “High”, 24 hours after administration.
API calls and parameters are documented in a Swagger specification: https://open-tggates-api.cloud.douglasconnect.com/v2/swagger.json . A complete API reference will be published in the near future, and in the meantime, we urge users to contact us directly with any questions they might have.
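Until the full reference is available, the Swagger document itself can be used to discover endpoints and their parameters. The following is a minimal sketch, assuming the document follows the usual Swagger 2.0 layout with a top-level `paths` object; the function names `fetch_spec` and `list_operations` are our own illustration, not part of the API.

```python
import json
from urllib.request import urlopen

SWAGGER_URL = "https://open-tggates-api.cloud.douglasconnect.com/v2/swagger.json"

HTTP_METHODS = {"get", "post", "put", "delete", "patch", "head", "options"}


def fetch_spec(url=SWAGGER_URL, timeout=30):
    """Download and parse the Swagger document (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def list_operations(spec):
    """Return sorted (METHOD, path, [parameter names]) tuples from a Swagger 2.0 dict."""
    operations = []
    for path, item in spec.get("paths", {}).items():
        # Parameters declared at path level apply to every method on that path.
        shared = item.get("parameters", [])
        for method, operation in item.items():
            if method.lower() not in HTTP_METHODS:
                continue
            params = shared + operation.get("parameters", [])
            names = [p.get("name", "?") for p in params]
            operations.append((method.upper(), path, names))
    return sorted(operations)
```

Calling `list_operations(fetch_spec())` and printing each tuple gives a quick overview of the available calls. Parameters defined through references appear as "?" in this simplified listing.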
2. OpenTox Garuda gadgets
Garuda is a platform for scientific biology developed by the Systems Biology Institute, Tokyo. As part of an ongoing effort to bring OpenTox databases and APIs to the Garuda platform, Douglas Connect and SBI have jointly released the Sample Finder and the Data Fetcher gadgets for Open TG-GATEs. Under the hood, these gadgets function as clients for the API described above, with the added benefit that the Garuda context makes many kinds of downstream analysis available once gene expression data has been obtained. For users of the Garuda community edition (available free of charge), these gadgets are available on the Garuda Gateway.
In the Sample Finder gadget (pictured), samples may be discovered by choosing combinations of relevant parameters. For example, it is possible to define a filter such as all in vivo samples from liver (primary hepatocytes) at 24 hr exposure time.
Once sample information has been obtained, users will probably want to send the samples to the Data Fetcher gadget in order to look at their gene expression values. To do this, select the list of Sample IDs in the result table by clicking anywhere on that column. The column then becomes highlighted (it must be the only highlighted column). Next, click the Discover button to find compatible gadgets that can receive the sample list. One of these should be the Data Fetcher gadget. Double-click this gadget to send the samples there. In the pop-up dialog that appears, select sample list to indicate that what is being sent is a sample list.
The Data Fetcher gadget fetches a table of gene expression data from the Open TG-GATEs database. In this table, the columns correspond to samples discovered in step 1 (or groups of samples), and the rows correspond to genes. The values in the table are gene expression values, as log-2 fold or absolute value, individual samples or averages. It is possible to request a set of genes of interest, or to filter genes by p-value for significance. It is also possible to request all genes that the API knows about.
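To make the table layout concrete, here is a small sketch that arranges flat expression records into a genes-by-samples table and applies a p-value filter. The record shape (the `gene`, `sample`, `value` and `pValue` keys) is a hypothetical stand-in, not the actual API response format, and the rule of keeping a gene when at least one of its records passes the threshold is our assumption, not necessarily what the gadget does. The values are made up.

```python
def pivot_expression(records, max_p_value=None):
    """Arrange flat records into {gene: {sample: value}}.

    If max_p_value is set, keep only genes with at least one record whose
    p-value is at or below the threshold (an assumed filtering rule).
    """
    table = {}
    significant = set()
    for rec in records:
        table.setdefault(rec["gene"], {})[rec["sample"]] = rec["value"]
        p = rec.get("pValue")
        if max_p_value is not None and p is not None and p <= max_p_value:
            significant.add(rec["gene"])
    if max_p_value is not None:
        table = {gene: row for gene, row in table.items() if gene in significant}
    return table


def format_table(table, samples):
    """Render the table as tab-separated text, one gene per row."""
    lines = ["gene\t" + "\t".join(samples)]
    for gene in sorted(table):
        cells = [str(table[gene].get(s, "NA")) for s in samples]
        lines.append(gene + "\t" + "\t".join(cells))
    return "\n".join(lines)


# Made-up records in a HYPOTHETICAL shape; not real Open TG-GATEs values.
records = [
    {"gene": "GENE_1", "sample": "S1", "value": 1.5, "pValue": 0.01},
    {"gene": "GENE_1", "sample": "S2", "value": 1.2, "pValue": 0.20},
    {"gene": "GENE_2", "sample": "S1", "value": -0.1, "pValue": 0.70},
    {"gene": "GENE_2", "sample": "S2", "value": 0.0, "pValue": 0.90},
]
table = pivot_expression(records, max_p_value=0.05)
print(format_table(table, ["S1", "S2"]))
```

With the threshold set, only GENE_1 remains in this toy example; leaving `max_p_value` as None keeps every gene, which mirrors the option of requesting all genes.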
In the simplest case, it is possible to obtain all genes in the database by specifying a blank text file as the gene list. Genes may also be specified by probe IDs (Affymetrix), by Entrez gene IDs, or by gene symbols. To do so, supply a text file in which each line contains one gene identifier.
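The gene list file described above can be prepared by hand, but for longer lists a few lines of Python may be convenient. This is a minimal sketch: the helper name `write_gene_list` is ours, and dropping duplicates and blank entries is a convenience we assume to be harmless rather than a documented requirement of the gadget.

```python
from pathlib import Path


def write_gene_list(path, identifiers):
    """Write one identifier per line; an empty list yields a blank file."""
    cleaned = []
    seen = set()
    for ident in identifiers:
        ident = str(ident).strip()
        if ident and ident not in seen:  # skip blanks and duplicates
            seen.add(ident)
            cleaned.append(ident)
    text = "\n".join(cleaned) + ("\n" if cleaned else "")
    Path(path).write_text(text, encoding="utf-8")
    return cleaned
```

For example, `write_gene_list("genes.txt", ["GENE_A", "GENE_B"])` (placeholder identifiers) writes a two-line file, while `write_gene_list("all_genes.txt", [])` writes the blank file that requests all genes.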
Once both sample and gene lists have been defined, the Launch button may be clicked to obtain data. The data will be displayed in table form. At this point, the Discover button may be used to send the data downstream to other gadgets, for example to export the table or to analyse the gene set.
A full user manual for these gadgets is currently in preparation. The tutorial videos available on the Douglas Connect YouTube channel, https://www.youtube.com/douglasconnect, may also be helpful.
Use of Open TG-GATEs data must adhere to the database license, as described at http://toxico.nibiohn.go.jp/english/agreement.html.