Provenance service
The FOXDEN provenance service is responsible for keeping track of provenance information in the FOXDEN infrastructure.
As a dataset moves through its lifecycle, this service automatically documents its lineage, recording its transformations and its relationships to other data products. This provenance ensures the reproducibility of scientific results.
Provenance GET APIs
- /datasets: provides access to FOXDEN dataset provenance information
- /files: get list of files
- /parents: provides access to parent datasets
- /children: lists all children datasets
- /osinfo: provides access to OS information about datasets
- /environments: provides information about environments used in dataset creation
- /scripts: lists all scripts information
- /packages: provides list of (Python) packages used in a specific environment
- /provenance: get provenance information about a given did
Provenance POST/PUT/DELETE APIs
All Provenance POST/PUT/DELETE APIs require authorization with the
appropriate scope: the write scope is used for POST/PUT requests and the
delete scope for DELETE requests.
- /dataset: create, update and delete a FOXDEN dataset
- /file: create, update and delete FOXDEN dataset files
- /parent: create, update and delete a FOXDEN parent dataset
- /osinfo: create, update and delete OS related information
- /environment: create, update and delete environment information
- /script: create, update and delete script information
Example
Here are examples of HTTP GET requests:
# look-up all datasets
curl -v http://localhost:8310/datasets
# look-up concrete dataset=/x/y/z
dataset=/x/y/z
curl -v "http://localhost:8310/dataset$dataset"
# look-up files from a dataset
curl -v "http://localhost:8310/file?dataset=$dataset"
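The same look-ups can be scripted. A minimal Python sketch that builds the two URLs used above, assuming the service runs on localhost:8310 (the helper names `dataset_url` and `files_url` are ours, not part of the service):

```python
from urllib.parse import quote, urlencode

BASE = "http://localhost:8310"

def dataset_url(did):
    # /dataset/x/y/z style look-up; the did already starts with "/"
    return BASE + "/dataset" + quote(did)

def files_url(did):
    # /file?dataset=/x/y/z style look-up; the did goes into the query string
    return BASE + "/file?" + urlencode({"dataset": did})

print(dataset_url("/x/y/z"))  # http://localhost:8310/dataset/x/y/z
print(files_url("/x/y/z"))
```

Note that `urlencode` percent-encodes the slashes in the query string, which is the safe form of the unencoded `dataset=$dataset` shown in the curl example.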
Protected APIs
- HTTP POST requests
- /dataset: create new dataset data
- /file: create new file data
- HTTP PUT requests
- /dataset: update dataset data
- /file: update file data
- HTTP DELETE requests
- /dataset/*name: delete dataset
- /file/*name: delete file
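DELETE requests follow the same pattern as the other protected calls. A minimal Python sketch that builds (but does not send) an authorized DELETE request; the token value here is a placeholder and the helper name is ours:

```python
from urllib.request import Request

def delete_request(base, did, token):
    """Build an HTTP DELETE request for the /dataset/*name API
    with a bearer-token Authorization header."""
    req = Request(base + "/dataset" + did, method="DELETE")
    req.add_header("Authorization", "Bearer " + token)
    # to execute against a live service: urllib.request.urlopen(req)
    return req

token = "abc123"  # placeholder; obtain a real token with delete scope
req = delete_request("http://localhost:8310", "/x/y/z", token)
print(req.get_method(), req.full_url)
```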
Example
Here is an example of an HTTP POST request:
# record.json
{
"parent_did": "/beamline=aaa/btr=bbb/cycle=ccc/sample_name=sss",
"did": "/beamline=aaa/btr=bbb/cycle=ccc/sample_name=sss/test=child",
"processing": "processing string, e.g. glibc-123-python-123",
"osinfo": {"name": "linux-cc7", "kernel": "1-2-3", "version": "cc7-123"},
"environments": [
{"name": "galaxy", "version": "version", "details": "details",
"parent_environment": "conda-123", "os_name": "linux-cc7"},
{"name": "conda-123", "version": "version", "details": "details",
"parent_environment": null, "os_name": "linux-cc7",
"packages": [
{"name": "numpy", "version": "123"},
{"name": "matplotlib", "version": "987"}
]
}
],
"scripts": [
{"name": "reader", "options": "-reader_options", "parent_script": null, "order_idx": 1},
{"name": "chap", "options": "-chap_options", "parent_script": "myscript", "order_idx": 2}
],
"input_files": [
{"name": "/tmp/file1.png"},
{"name": "/tmp/file2.png"}
],
"output_files": [
{"name": "/tmp/file1.png"}
],
"site": "Cornell",
"buckets": ["bucketABC"]
}
# inject new record
curl -v -X POST -H "Authorization: Bearer $token" \
-H "Content-type: application/json" \
-d@./record.json \
http://localhost:8310/dataset
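The same injection can be done from Python. A minimal sketch that builds the authorized POST request; the trimmed-down record and the token value are placeholders (see record.json above for the full layout):

```python
import json
from urllib.request import Request

def inject_record(base, record, token):
    """Build an authorized HTTP POST request carrying a provenance
    record as a JSON body, mirroring the curl command above."""
    req = Request(
        base + "/dataset",
        data=json.dumps(record).encode("utf-8"),
        method="POST",
    )
    req.add_header("Authorization", "Bearer " + token)
    req.add_header("Content-type", "application/json")
    # send with urllib.request.urlopen(req) against a live service
    return req

# a trimmed-down record; the full structure is shown in record.json above
record = {
    "parent_did": "/beamline=aaa/btr=bbb/cycle=ccc/sample_name=sss",
    "did": "/beamline=aaa/btr=bbb/cycle=ccc/sample_name=sss/test=child",
    "site": "Cornell",
}
req = inject_record("http://localhost:8310", record, "abc123")
print(req.get_method(), req.full_url)
```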
For more (and up-to-date) examples, please see the data integration area of
this repository and look up the JSON input in the int_provenance.json file.