Generating submission summary report: generate_summary.py
TODO: see if this is still useful to wranglers
This tool generates two summary metadata reports for a submission or project. The first report counts the number of specific entities in the submission or project. Currently, the list of entities counted is hard-coded and incomplete. The second report summarises other useful pieces of metadata that can be used to populate project pages. It is also incomplete.
Pre-requisites and installation
- python3
- git
- requirements of the repo
To install the tool, clone the ingest-broker repository.
Usage
Move to the ingest broker directory.
cd ingest-broker
Install the requirements of the package, or run in Docker (instructions for this are in the main README).
pip install -r requirements.txt
Check the usage of generate_summary.py by using --help:
mfreeberg$ python generate_summary.py --help
usage: generate_summary.py [-h] H T U O
Process some integers.
positional arguments:
H the url of the ingest API (e.g
http://api.ingest.dev.data.humancellatlas.org)
T the type of summary (project or submission)
U the uuid of the project/submission
O summary output format
optional arguments:
-h, --help show this help message and exit
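The four positional arguments listed in the help output can also be assembled from Python, which may be handy when generating reports for several UUIDs. The following is only a sketch: the helper name `build_command` and its validation rules are hypothetical and not part of the tool; the allowed values for the summary type and output format are taken from the usage text in this document.

```python
import sys
import uuid

# Allowed values as described in this README (not read from the tool itself).
SUMMARY_TYPES = {"project", "submission"}
OUTPUT_FORMATS = {"json", "tsv"}


def build_command(api_url, summary_type, entity_uuid, output_format):
    """Return an argv list for generate_summary.py (hypothetical helper)."""
    if summary_type not in SUMMARY_TYPES:
        raise ValueError(f"summary type must be one of {sorted(SUMMARY_TYPES)}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"output format must be one of {sorted(OUTPUT_FORMATS)}")
    # uuid.UUID raises ValueError if the string is not a well-formed UUID.
    uuid.UUID(entity_uuid)
    return [sys.executable, "generate_summary.py",
            api_url, summary_type, entity_uuid, output_format]


cmd = build_command(
    "http://api.ingest.dev.data.humancellatlas.org",
    "project",
    "763e071c-34ed-4db5-9006-8929ccdf5b26",
    "tsv",
)
# To actually run it from the ingest-broker directory:
#   import subprocess; subprocess.run(cmd, check=True)
```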
Run the generate_summary.py script, supplying the ingest API URL, the type of summary (project or submission), the UUID of the project or submission envelope, respectively, and the desired output format (json or tsv). If you choose json, the report will be printed to the screen. If you choose tsv, the report will be written to report.tsv.
mfreeberg$ python3 generate_summary.py http://api.ingest.dev.data.humancellatlas.org project 763e071c-34ed-4db5-9006-8929ccdf5b26 tsv
mfreeberg$ cat report.tsv
entity count
dissociation_protocol 1
enrichment_protocol 1
library_preparation_protocol 1
sequencing_protocol 1
process 5096
donor_organism 8
specimen_from_organism 8
cell_suspension 2544
sequence_file 5088
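If the counts are needed in a script rather than on screen, report.tsv can be read back in with the standard library. This is a minimal sketch that assumes the file has a header row (`entity`, `count`) and exactly one tab between the two columns; the function name `read_entity_counts` is made up for illustration.

```python
import csv
import io


def read_entity_counts(tsv_text):
    """Parse 'entity<TAB>count' rows into a dict mapping entity -> int."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    counts = {}
    for row in reader:
        # Skip the header row and anything that is not a two-column row.
        if len(row) != 2 or row == ["entity", "count"]:
            continue
        counts[row[0]] = int(row[1])
    return counts


# A few rows copied from the example report above (tab-separated by assumption).
sample = (
    "entity\tcount\n"
    "process\t5096\n"
    "donor_organism\t8\n"
    "cell_suspension\t2544\n"
    "sequence_file\t5088\n"
)
counts = read_entity_counts(sample)
print(counts["sequence_file"] / counts["cell_suspension"])  # files per cell suspension
```

For a real file, pass the file's contents instead of `sample`, for example `read_entity_counts(open("report.tsv").read())`.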
Also generated is a file, scrape.tsv, which contains additional key pieces of metadata that can be useful for filling out project pages.
mfreeberg$ cat scrape.tsv
cell_type ['pancreatic A cell', 'acinar cell', 'type B pancreatic cell', 'pancreatic D cell', 'pancreatic ductal cell', 'mesenchymal cell']
num_total_estimated_cells 2544
organ ['pancreas', 'islet of Langerhans']
organoid_organ_model []
genus_species ['Homo sapiens']
num_donors 8
num_specimens 8
num_cell lines 0
num_organoids 0
num_cell suspension 2544
library_construction_approach ['Smart-seq2']
num_fastqs 5088
project_title ['Single cell transcriptome analysis of human pancreas reveals transcriptional signatures of aging and somatic mutation patterns.']
contact_names/emails ['Martin, Enge', 'Laura,,Huerta', 'Matthew,,Green', 'martin.enge@gmail.com', 'lauhuema@ebi.ac.uk', 'hewgreen@ebi.ac.uk']
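The values in scrape.tsv are either plain integers or Python-style list literals, so they can be turned back into Python objects fairly easily. The sketch below assumes each line is a key and a value separated by a single tab (some keys, such as `num_cell lines`, contain spaces, so splitting on whitespace would not work); the function names are hypothetical.

```python
import ast


def parse_scrape_line(line):
    """Split one 'key<TAB>value' line and convert the value to a Python object."""
    key, _, raw = line.rstrip("\n").partition("\t")
    raw = raw.strip()
    if raw.startswith("["):
        # List values look like Python literals; literal_eval parses them safely.
        value = ast.literal_eval(raw)
    elif raw.isdigit():
        value = int(raw)
    else:
        value = raw
    return key, value


def read_scrape(tsv_text):
    """Parse the whole file contents into a dict, ignoring blank lines."""
    return dict(parse_scrape_line(line)
                for line in tsv_text.splitlines() if line.strip())


# Rows copied from the example above (tab-separated by assumption).
sample = (
    "organ\t['pancreas', 'islet of Langerhans']\n"
    "organoid_organ_model\t[]\n"
    "num_donors\t8\n"
    "num_cell lines\t0\n"
)
summary = read_scrape(sample)
print(summary["organ"], summary["num_donors"])
```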
If you would like additional metadata reported by this tool, please make a request via a GitHub issue in the ingest-central repository.