Downloading SRA objects and converting them to FASTQ
When working with published datasets from GEO, sometimes only a single FASTQ file per run is available for download, when there should be two (R1/R2) or three (I1/R1/R2) depending on the type of sequencing. Downloading the original SRA objects and splitting them yourself recovers the separate files.
Prerequisites
- virtualenv
- python3
- sratoolkit
To download all sra objects from a project with pysradb
pysradb is a convenient way to prefetch all runs from a given project accession. Example usage to download all runs from an SRP accession:
- Activate your desired virtualenv
- Install pysradb with
pip install pysradb
pysradb download -y -t 3 --out-dir ./pysradb_downloads -p SRP063852
This command saves the sra object files under ./pysradb_downloads in a nested folder structure of SRP/SRX/SRR.
To convert each SRR sra object into separate FASTQ files, use the fastq-dump command from sratoolkit:
fastq-dump --split-files <path to file/accession>
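To confirm that a run was actually split into the expected two or three files, you can count the FASTQ files it produced. The sketch below assumes fastq-dump wrote uncompressed files named like SRR1234567_1.fastq, SRR1234567_2.fastq (a placeholder accession) into a single directory. count_split_fastqs is a hypothetical helper name, not part of sratoolkit.

```shell
# Hypothetical helper: count the split FASTQ files produced for one run.
# Usage: count_split_fastqs <fastq_dir> <run_accession>
count_split_fastqs() {
    local fastq_dir="$1"
    local run="$2"
    local count=0
    local f
    # Assumes output names of the form <run>_<n>.fastq (uncompressed).
    for f in "$fastq_dir/${run}"_*.fastq; do
        # If nothing matches, the pattern stays unexpanded, so check existence.
        if [ -e "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

For paired-end data, count_split_fastqs ./fastq SRR1234567 should print 2. A count of 1 suggests the run only contains a single read.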
You can use a bash for loop to iterate over all the runs that were downloaded.
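A minimal sketch of such a loop is shown here. It assumes the downloaded files end in .sra and sit exactly two levels below the download directory (SRP/SRX/SRR.sra); adjust the glob if your layout differs. convert_all_runs and the output directory are illustrative names.

```shell
# Hypothetical helper: run fastq-dump on every downloaded .sra file.
# Usage: convert_all_runs <download_dir> <fastq_out_dir>
convert_all_runs() {
    local download_dir="$1"
    local out_dir="$2"
    local sra_file
    mkdir -p "$out_dir"
    # Assumed layout: <download_dir>/<SRP>/<SRX>/<SRR>.sra
    for sra_file in "$download_dir"/*/*/*.sra; do
        # Skip the unexpanded pattern when no files match.
        [ -e "$sra_file" ] || continue
        echo "Converting $sra_file"
        # --split-files writes one FASTQ file per read in the run.
        fastq-dump --split-files --outdir "$out_dir" "$sra_file"
    done
}
```

With the download command above, this could be called as convert_all_runs ./pysradb_downloads ./fastq.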
It may be possible to also use fasterq-dump, but we haven’t tried this yet. Feel free to try it and expand this snippet.
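As a starting point for anyone who wants to try it, here is an untested sketch of the equivalent fasterq-dump call. It assumes the --split-files, --threads and --outdir options behave as described in the sratoolkit documentation. convert_run_fasterq and the default of 4 threads are our own illustrative choices.

```shell
# Hypothetical helper: convert one .sra file with fasterq-dump (untested on our data).
# Usage: convert_run_fasterq <sra_file> <fastq_out_dir> [threads]
convert_run_fasterq() {
    local sra_file="$1"
    local out_dir="$2"
    local threads="${3:-4}"   # default thread count is an arbitrary assumption
    mkdir -p "$out_dir"
    fasterq-dump --split-files --threads "$threads" --outdir "$out_dir" "$sra_file"
}
```

Note that fasterq-dump is documented to skip technical reads by default, so runs that include an index read (I1) may also need the --include-technical option. We have not verified this on these datasets.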