Downloading SRA objects and converting them to FASTQ files

Sometimes, when downloading published datasets from GEO, only a single FASTQ file is available per run, even though there should be two (R1/R2) or three (I1/R1/R2) depending on the type of sequencing. In that case, download the SRA objects instead and split each one into separate FASTQ files.

Prerequisites

You need pysradb (installed below) and the SRA Toolkit (sratoolkit), which provides fastq-dump.

To download all SRA objects from a project with pysradb

pysradb is a convenient way to fetch all runs for a given project accession. To download all runs from an SRP accession:

  1. Activate your desired virtualenv.
  2. Install pysradb with pip install pysradb.
  3. Download the runs, for example for SRP063852:
     pysradb download -y -t 3 --out-dir ./pysradb_downloads -p SRP063852

This saves the SRA object files under the --out-dir directory in an SRP/SRX/SRR folder structure.
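Before converting anything, it can help to list what was actually downloaded. This is a minimal sketch, assuming pysradb names each run file <accession>.sra; list_sra_runs and list_run_accessions are hypothetical helper names, and the search deliberately does not assume a fixed nesting depth.

```shell
#!/usr/bin/env bash
# List the .sra files saved under a pysradb output directory, one path per
# line, sorted. Assumes run files end in .sra; the depth is not assumed.
list_sra_runs() {
  local root=${1:-./pysradb_downloads}
  find "$root" -type f -name '*.sra' | sort
}

# Print only the run accessions (file names without the .sra suffix).
list_run_accessions() {
  list_sra_runs "$1" | while IFS= read -r path; do
    basename "$path" .sra
  done
}
```

For example, list_run_accessions ./pysradb_downloads prints one SRR accession per line.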

To convert each SRR SRA object into separate FASTQ files (one per read), use the fastq-dump command from the SRA Toolkit (sratoolkit) with --split-files:

fastq-dump --split-files <path to file/accession>
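Since the point of --split-files is to recover two (R1/R2) or three (I1/R1/R2) files per run, it is worth checking the output. This is a minimal sketch, assuming fastq-dump's default <accession>_<n>.fastq naming and uncompressed output; count_split_files and check_run are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Count the per-read FASTQ files written for one run.
# Assumes the default naming <accession>_<n>.fastq; if you pass --gzip,
# the files end in .fastq.gz and the pattern below must be adjusted.
count_split_files() {
  local outdir=$1 acc=$2 n=0 f
  for f in "$outdir/$acc"_[0-9]*.fastq; do
    [ -e "$f" ] || continue   # the glob matched nothing
    n=$((n + 1))
  done
  echo "$n"
}

# Warn (and return non-zero) when a run has fewer files than expected:
# 2 for R1/R2 data, 3 for I1/R1/R2 data.
check_run() {
  local outdir=$1 acc=$2 expected=$3 n
  n=$(count_split_files "$outdir" "$acc")
  if [ "$n" -lt "$expected" ]; then
    echo "WARNING: $acc has $n FASTQ file(s), expected $expected" >&2
    return 1
  fi
  echo "$acc: $n FASTQ files"
}
```

For example, after fastq-dump --split-files --outdir ./fastq <path to file>, run check_run ./fastq <accession> 3 for I1/R1/R2 data.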

You can use a bash for loop to iterate over all the runs that were downloaded.
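One way to write that loop is sketched below. It uses find rather than a fixed glob so it does not depend on how deeply pysradb nests the files, and a while-read loop rather than for so paths containing spaces survive. FASTQ_DUMP is a hypothetical override variable (not part of sratoolkit) that defaults to fastq-dump on your PATH; each run's FASTQ files are written next to its .sra file.

```shell
#!/usr/bin/env bash
# Run fastq-dump --split-files on every .sra file under a directory.
# FASTQ_DUMP is a hypothetical override; it defaults to fastq-dump on PATH.
dump_all_runs() {
  local root=${1:-./pysradb_downloads}
  local sra
  find "$root" -type f -name '*.sra' | sort | while IFS= read -r sra; do
    echo "Converting $sra" >&2
    # --outdir writes <accession>_1.fastq, <accession>_2.fastq (and _3)
    # beside the .sra file; </dev/null keeps the tool off the loop's input.
    "${FASTQ_DUMP:-fastq-dump}" --split-files --outdir "$(dirname "$sra")" "$sra" </dev/null \
      || echo "fastq-dump failed for $sra" >&2
  done
}
```

For example: dump_all_runs ./pysradb_downloads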

It may also be possible to use fasterq-dump instead, but we haven’t tried it yet. Feel free to try it and expand this snippet.
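As an untested starting point, a fasterq-dump version might look like the sketch below. The flags are documented fasterq-dump options; --include-technical matters for I1/R1/R2 data because fasterq-dump, unlike fastq-dump, skips technical (e.g. index) reads by default. fasterq_split_run and FASTERQ_DUMP are hypothetical names.

```shell
#!/usr/bin/env bash
# Untested alternative: convert one run with fasterq-dump.
#   --split-files        write each read to its own file (_1, _2, _3 ...)
#   --include-technical  also write technical reads such as index reads
#   --threads            number of worker threads
#   --outdir             directory for the FASTQ files
# FASTERQ_DUMP is a hypothetical override; it defaults to fasterq-dump on PATH.
fasterq_split_run() {
  local sra=$1
  local outdir=${2:-$(dirname "$sra")}   # default: next to the .sra file
  local threads=${3:-4}
  "${FASTERQ_DUMP:-fasterq-dump}" --split-files --include-technical \
    --threads "$threads" --outdir "$outdir" "$sra"
}
```

For example: fasterq_split_run <path to file> ./fastq 8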