Dataset

Provides access to chemical compounds and their features (e.g. structural, physicochemical, biological and toxicological properties).

Description

Get a list of available datasets
  • URI: /dataset
  • Method: GET
  • Parameters: [subjectid]; query parameters (optional, to be defined by service providers)
  • Result: list of URIs, or RDF for the metadata only
  • Status codes: 200, 404, 503

Get a dataset
  • URI: /dataset/{id}
  • Method: GET
  • Parameters: [subjectid]
  • Result: representation of the dataset in a supported MIME type
  • Status codes: 200, 404, 503

Query a dataset
  • URI: /dataset/{id}
  • Method: GET
  • Parameters: [subjectid]; compound_uris[] and/or feature_uris[] to select compounds and features; further query parameters may be defined by service providers
  • Result: representation of the query result in a supported MIME type
  • Status codes: 200, 404, 503

Get metadata for a dataset
  • URI: /dataset/{id}/metadata
  • Method: GET
  • Parameters: [subjectid]
  • Result: representation of the dataset metadata in a supported MIME type
  • Status codes: 200, 404, 503

Update metadata for a dataset
  • URI: /dataset/{id}/metadata
  • Method: PUT
  • Parameters: [subjectid]; metadata as RDF or application/x-www-form-urlencoded

Get a list of all compounds in a dataset
  • URI: /dataset/{id}/compounds
  • Method: GET
  • Parameters: [subjectid]
  • Result: list of compound URIs
  • Status codes: 200, 404, 503

Get a list of all features in a dataset
  • URI: /dataset/{id}/features
  • Method: GET
  • Parameters: [subjectid]
  • Result: RDF, or a list of feature URIs (pointing to feature definitions/ontologies)
  • Status codes: 200, 404, 503

Create a new dataset
  • URI: /dataset
  • Method: POST
  • Parameters:
    • a dataset representation in a supported MIME type, with the MIME type specified via the Content-type header;
    • Content-type:application/x-www-form-urlencoded: the dataset_uri, feature_uris[] and compound_uris[] parameters specify a subset of a dataset, as in the GET operation;
    • file upload via Content-type:multipart/form-data: file parameter;
    • file upload metadata: parameters as in opentox.owl;
    • [subjectid]
  • Result: new URI /dataset/{id}, or a redirect to a task URI (for large uploads)
  • Status codes: 200, 202, 400, 503

Create a new dataset using a reference featureURI list
  • URI: /dataset
  • Method: POST
  • Parameters:
    • file upload via Content-type:multipart/form-data: file parameter;
    • reference featureURI list (http://server/path/dataset/{dataset_id}/feature);
    • matcher (e.g. name of feature)
  • Result: new URI /dataset/{id}, or a redirect to a task URI (for large uploads)
  • Status codes: 200, 202, 400, 503

Update a dataset
  • URI: /dataset/{id}
  • Method: PUT
  • Parameters:
    • data representation in a supported MIME type; entries for existing compound/feature pairs are overwritten, entries for new compounds/features are added;
    • Content-type:application/x-www-form-urlencoded: the dataset_uri, feature_uris[] and compound_uris[] parameters specify a subset of a dataset, as in the GET operation;
    • file upload via Content-type:multipart/form-data: file parameter;
    • file upload metadata: Dublin Core annotation parameters, as in opentox.owl#Dataset;
    • [subjectid]

Remove a dataset
  • URI: /dataset/{id}
  • Method: DELETE
  • Parameters: [subjectid]
  • Status codes: 200, 404, 503

Remove a part of the dataset
  • URI: /dataset/{id}
  • Method: DELETE
  • Parameters: [subjectid]; compound_uris[] and/or feature_uris[]; further query parameters may be defined to select the data to be deleted
  • Status codes: 200, 404, 503

(subjectid: optional parameter containing the OpenSSO authentication and authorization (A&A) token needed to access protected services.)
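The operations above map onto a handful of (method, path) pairs. The sketch below builds, but does not send, a urllib Request for each one; the operation names (list_compounds, remove_dataset, ...) are hypothetical labels chosen for this sketch, and the subjectid token is deliberately left out because how it is transmitted depends on the service.

```python
from urllib.request import Request

# (HTTP method, path template) for the operations listed above.
# The dictionary keys are hypothetical labels, not part of the API.
OPERATIONS = {
    "list_datasets": ("GET", "/dataset"),
    "get_dataset": ("GET", "/dataset/{id}"),
    "get_metadata": ("GET", "/dataset/{id}/metadata"),
    "update_metadata": ("PUT", "/dataset/{id}/metadata"),
    "list_compounds": ("GET", "/dataset/{id}/compounds"),
    "list_features": ("GET", "/dataset/{id}/features"),
    "create_dataset": ("POST", "/dataset"),
    "update_dataset": ("PUT", "/dataset/{id}"),
    "remove_dataset": ("DELETE", "/dataset/{id}"),
}


def build_request(base_uri, operation, dataset_id=None, accept="application/rdf+xml"):
    """Build (without sending) the Request for one dataset operation."""
    method, template = OPERATIONS[operation]
    if "{id}" in template:
        if dataset_id is None:
            raise ValueError(f"{operation} needs a dataset id")
        path = template.format(id=dataset_id)
    else:
        path = template
    # The Accept header selects one of the supported MIME types
    return Request(base_uri.rstrip("/") + path, method=method, headers={"Accept": accept})
```

For example, build_request("http://host/ambit2", "list_compounds", 5, accept="text/uri-list") targets GET http://host/ambit2/dataset/5/compounds; passing the result to urllib.request.urlopen would actually send it.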


 

Queries

Subsets of a dataset (e.g. all data for a certain feature, or all data for a set of compounds) are accessed through query parameters. This allows full URIs to be passed as parameters and circumvents the problem of non-unique IDs (e.g. for /dataset/{id}/compound/{compound_id} URIs). Support for the query parameters compound_uris[] and feature_uris[] is mandatory; more advanced queries (e.g. similarity searches) may be implemented by individual services.


 

Examples

Get all features of two compounds
curl -g -X GET 'http://my_dataset_service/{dataset_id}?compound_uris[]=compound1_uri&compound_uris[]=compound2_uri'
Get a single feature of a single compound
curl -g -X GET 'http://my_dataset_service/{dataset_id}?compound_uris[]=compound_uri&feature_uris[]=feature_uri'
Remove a compound from a dataset
curl -g -X DELETE 'http://my_dataset_service/{dataset_id}?compound_uris[]=<compound_uri>'
Upload an SDF file to an AMBIT server
curl -X POST -H 'Content-Type:chemical/x-mdl-sdfile' --data-binary @filename.sdf http://ambit.uni-plovdiv.bg:8080/ambit2/dataset
Get compound URIs of a dataset
curl -X GET -H 'Accept:text/uri-list' http://ambit.uni-plovdiv.bg:8080/ambit2/dataset/{dataset_id}
Upload an SDF file that contains features of an already existing dataset (if a feature name in filename.sdf matches the name of a feature in associated_features, the existing feature URI is assigned)
curl -X POST -F 'file=@filename.sdf;type=chemical/x-mdl-sdfile' -F 'associated_features=http://ambit.uni-plovdiv.bg:8080/ambit2/dataset/{dataset_id}/feature' -F 'matcher=Name' http://ambit.uni-plovdiv.bg:8080/ambit2/dataset
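A text/uri-list response, such as the one returned by the "Get compound URIs of a dataset" example, is defined in RFC 2483 as one URI per line, with lines starting with '#' treated as comments. A minimal parser sketch (the function name is hypothetical):

```python
def parse_uri_list(body):
    """Return the URIs in a text/uri-list body, skipping comments and blank lines."""
    uris = []
    for line in body.splitlines():  # handles both CRLF and LF line endings
        line = line.strip()
        if line and not line.startswith("#"):  # '#' starts a comment line
            uris.append(line)
    return uris
```

The body would typically come from urllib.request.urlopen(...).read().decode() on a request sent with the header Accept: text/uri-list.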

With a little RDF processing, queries can also be used for set operations (e.g. subsets, split, merge, intersection).
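As a swapped-in alternative to RDF processing, the sketch below works on plain compound URI lists (for instance from text/uri-list responses): it intersects two lists and builds the application/x-www-form-urlencoded body for creating a new dataset from that subset via POST /dataset with dataset_uri and compound_uris[]. The helper names are hypothetical, and the sketch assumes the server decodes the percent-encoded brackets (%5B%5D) in parameter names, as standard form decoding does.

```python
from urllib.parse import urlencode


def intersect(a, b):
    """Order-preserving intersection of two URI lists."""
    in_b = set(b)
    return [uri for uri in a if uri in in_b]


def subset_form_body(dataset_uri, compound_uris):
    """Form body for POST /dataset selecting a subset of an existing dataset."""
    pairs = [("dataset_uri", dataset_uri)]
    pairs += [("compound_uris[]", uri) for uri in compound_uris]
    # urlencode percent-encodes both names and values (quote_plus by default)
    return urlencode(pairs)
```

Splitting or merging works the same way: compute the desired URI list first, then send one form body per resulting dataset.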

Note: URI-encode parameters that are sent via GET.
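A sketch of building such a GET URL with the standard library, assuming the brackets in the parameter names should stay literal as in the curl examples while the URI values are fully percent-encoded (the function name is hypothetical):

```python
from urllib.parse import quote, urlencode


def dataset_query_url(dataset_uri, compound_uris=(), feature_uris=()):
    """GET URL selecting compounds/features of a dataset via compound_uris[] / feature_uris[]."""
    pairs = [("compound_uris[]", uri) for uri in compound_uris]
    pairs += [("feature_uris[]", uri) for uri in feature_uris]
    if not pairs:
        return dataset_uri
    # quote() with safe="[]" keeps the brackets literal and
    # percent-encodes ':' and '/' inside the URI values.
    return dataset_uri + "?" + urlencode(pairs, safe="[]", quote_via=quote)
```

When such a URL is passed to curl, add -g (--globoff) so that curl does not interpret the brackets as a glob pattern.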


 

Dataset representation

RDF specification

Metadata

Features

RDF dataset representation

The feature URI points to a Feature object, which can be retrieved as RDF and provides information about the name, units, source and type of the feature. The feature type is denoted by a mandatory link to an ontology, either via owl:sameAs or by directly subclassing a class from an ontology.

This allows a feature URI to point either directly to an existing (fixed) ontology or to a web service that provides access to dynamically created Feature objects.
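A sketch that emits an N3 description of a Feature typed via owl:sameAs. It assumes the OpenTox 1.1 namespace for the ot: prefix and covers only dc:title, ot:hasSource and owl:sameAs (units are omitted); the function name and all URIs used with it are hypothetical.

```python
PREFIXES = (
    "@prefix ot: <http://www.opentox.org/api/1.1#> .\n"  # assumed OpenTox namespace
    "@prefix dc: <http://purl.org/dc/elements/1.1/> .\n"
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
)


def feature_n3(feature_uri, title, source_uri, same_as_uri):
    """Describe a Feature in N3, linking its type to an ontology term via owl:sameAs."""
    # Escape backslashes and double quotes so the title stays a valid literal
    safe_title = title.replace("\\", "\\\\").replace('"', '\\"')
    return (
        PREFIXES
        + f"\n<{feature_uri}>\n"
        + "      a       ot:Feature ;\n"
        + f'      dc:title "{safe_title}" ;\n'
        + f"      ot:hasSource <{source_uri}> ;\n"
        + f"      owl:sameAs <{same_as_uri}> .\n"
    )
```

Subclassing an ontology class instead of using owl:sameAs would replace "a ot:Feature" with the ontology class, as ot:Substructure does in the examples that follow.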

Conformers

Conformer URIs (see Compound API) can be used instead of compound URIs. The parent structure should be resolved via the compound web service.


 

Examples

Multi Cell Call prediction from J48 (N3 notation):

example:DatasetPredicted
      a       ot:Dataset ;
      dc:identifier "http://myservice/dataset/{datasetid}"^^xsd:string ;
      dc:title "Multi Cell Call prediction from J48"^^xsd:string ;
      ot:dataEntry
              [ a       ot:DataEntry ;
                ot:compound example:benzene ;
                ot:values
                        [ a       ot:FeatureValue ;
                          ot:feature example:MultiCellCallPredicted ;
                          ot:value "true"^^xsd:boolean
                        ];
                ot:values
                        [ a       ot:FeatureValue ;
                          ot:feature example:MultiCellCall ;
                          ot:value "true"^^xsd:boolean
                        ] ;
              ] .

example:benzene
      a       ot:Compound ;
      dc:identifier "http://myservice/compound/{compoundid1}"^^xsd:string .

example:MultiCellCallPredicted
      a       ot:Feature ;
      dc:identifier "http://myservice/feature/{featureid3}"^^xsd:string ;
      dc:title "MultiCellCall"^^xsd:string ;
      ot:hasSource example:WekaJ48 .
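The nesting in the example above (Dataset, then ot:dataEntry, then ot:values) can be generated from a plain mapping. This is a sketch, not a reference serializer: the function names are hypothetical, prefix declarations are omitted as in the example, and mapping numbers to xsd:double is an assumption.

```python
def literal(value):
    """Typed N3 literal for a feature value."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return f'"{str(value).lower()}"^^xsd:boolean'
    if isinstance(value, (int, float)):
        return f'"{float(value)}"^^xsd:double'
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^xsd:string'


def dataset_n3(dataset_uri, entries):
    """entries: {compound_uri: {feature_uri: value}} -> N3 text for an ot:Dataset."""
    parts = [f"<{dataset_uri}> a ot:Dataset"]
    for compound_uri, values in entries.items():
        entry = f"[ a ot:DataEntry ; ot:compound <{compound_uri}>"
        for feature_uri, value in values.items():
            # One ot:FeatureValue blank node per compound/feature pair
            entry += (f" ;\n        ot:values [ a ot:FeatureValue ; "
                      f"ot:feature <{feature_uri}> ; ot:value {literal(value)} ]")
        entry += " ]"
        parts.append(f"ot:dataEntry {entry}")
    return " ;\n    ".join(parts) + " ."
```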

Single compound with a single substructure:

<https://ambit.uni-plovdiv.bg:8443/ambit2/dataset/2407>
      a       ot:Dataset ;
      ot:dataEntry
              [ a       ot:DataEntry ;
                ot:compound <https://ambit.uni-plovdiv.bg:8443/ambit2/compound/17285> ;
                ot:values
                        [ a       ot:FeatureValue ;
                          ot:feature <https://ambit.uni-plovdiv.bg:8443/ambit2/feature/171898> ;
                          ot:value "true"^^xsd:boolean
                        ]
              ] .
 
<https://ambit.uni-plovdiv.bg:8443/ambit2/compound/17285>
      a       ot:Compound .
 
<https://ambit.uni-plovdiv.bg:8443/ambit2/feature/171898>
      a       ot:Substructure ;
      dc:title "FCF" ;
      ot:smarts "FCF" ;
      ot:hasSource <https://ambit.uni-plovdiv.bg:8443/ambit2/model/26469> .
 
ot:Substructure
      a       owl:Class ;
      rdfs:subClassOf ot:Feature .

Retrieving metadata:

curl -H "Accept:text/n3" http://apps.ideaconsult.net:8080/ambit2/dataset/112/metadata

<http://apps.ideaconsult.net:8080/ambit2/dataset/112>

      a       ot:Dataset ;
      dc:title "ToxCast_ToxRefDB_20091214.txt" ;
      dcterms:license <http://www.opendatacommons.org/licenses/pddl> .

Adding metadata to an existing dataset (using RDF)

curl -X PUT -H "Content-type:application/rdf+xml" --data-binary @mymetadata.rdf http://host/dataset/{id}/metadata

Adding metadata to an existing dataset (using web form)

curl -X PUT -d "license=http://www.opendatacommons.org/licenses/pddl" -d "title=blabla" http://host/dataset/{id}/metadata
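The same web-form update as a Python sketch: the metadata fields are form-encoded and sent with PUT to /dataset/{id}/metadata. The function name is hypothetical; the request is only built here, and urllib.request.urlopen(req) would send it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def metadata_put_request(dataset_uri, **metadata):
    """Build a PUT request that updates dataset metadata via a web form."""
    body = urlencode(metadata).encode("ascii")  # e.g. title=...&license=...
    return Request(
        dataset_uri.rstrip("/") + "/metadata",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```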

Specifying the metadata on upload (mimics sending a multipart web form)

curl -X POST -F "file=@alkanes.csv" \
-F "license=http://www.opendatacommons.org/licenses/pddl/" \
-F "title=Alkanes" http://host/dataset/

 

Supported MIME types

Mandatory

  • application/rdf+xml (default)
  • application/x-www-form-urlencoded
  • multipart/form-data

Optional

  • other RDF serialization formats
  • application/xml
  • text/xml
  • text/x-yaml
  • text/x-json
  • application/json
  • text/csv
  • text/arff
  • text/html
  • chemical/x-mdl-sdfile
  • multipart/form-data for file uploads

 

HTTP status codes

  • 200 OK: success
  • 202 Accepted: asynchronous task started
  • 400 Bad Request: incorrect MIME type
  • 404 Not Found: dataset not found
  • 503 Service Unavailable: service not available
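A client can turn these status codes into a next step. The policy below is a hypothetical sketch: it assumes the caller has already extracted the task URI from a 202 response and treats 503 as retryable.

```python
# Status codes from the list above: code -> (name, interpretation)
STATUS_CODES = {
    200: ("OK", "success"),
    202: ("Accepted", "asynchronous task started"),
    400: ("Bad Request", "incorrect MIME type"),
    404: ("Not Found", "dataset not found"),
    503: ("Service Unavailable", "service not available"),
}


def next_step(status, task_uri=None):
    """Hypothetical client policy for a dataset service response."""
    if status == 200:
        return "done"
    if status == 202:
        # Assumption: task_uri was taken from the response by the caller
        return f"poll {task_uri}"
    if status == 503:
        return "retry later"
    if status in STATUS_CODES:
        return "fail: " + STATUS_CODES[status][1]
    return f"fail: unexpected status {status}"
```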