Download ENA files using aspera
ENA offers the option of downloading data files through Aspera, which is usually faster than downloading each file through its individual link.
Install
To install:
- Download Aspera to your home directory on EC2 using the following command:
  wget https://ak-delivery04-mul.dhe.ibm.com/sar/CMA/OSA/08q6g/0/ibm-aspera-cli-3.9.6.1467.159c5b1-linux-64-release.sh
- Run the file to install Aspera:
  sh ibm-aspera-cli-3.9.6.1467.159c5b1-linux-64-release.sh
- After installing, export the path in your .bashrc file by running
  vim .bashrc
  and copying this export statement to the end of the file:
  export PATH=~/.aspera/cli/bin:$PATH
  The next time you log into the EC2, you will be able to run the commands without any additional step.
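If you prefer not to edit .bashrc by hand, the same export line can be appended from the shell. The sketch below uses a hypothetical helper, append_once, which adds the line only when it is not already in the file, so running it twice does not duplicate the entry.

```shell
# append_once FILE LINE: add LINE to FILE unless an identical line already exists.
# (append_once is a hypothetical helper name, not part of Aspera.)
append_once() {
    file=$1
    line=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage (single quotes keep $PATH unexpanded until .bashrc is sourced):
# append_once ~/.bashrc 'export PATH=~/.aspera/cli/bin:$PATH'
```

After adding the line, run source ~/.bashrc (or log in again); command -v ascp should then print the location of the ascp binary.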
Download files
Once Aspera is installed, you can download files locally by following the instructions on ENA’s ReadTheDocs page. Alternatively, follow these steps if you need to download a full dataset:
- Locate the project page (e.g. https://www.ebi.ac.uk/ena/browser/view/PRJEB40448)
- Download the JSON report at the bottom of the page and upload it to your own /data/ folder in the EC2
- Open a virtual session (the next step will take some time, so it’s better to leave it running under a virtual session)
- cd to your /data/ folder and run the following command:
  cat <name_of_report_file> | jq '.[].fastq_ftp' | grep -E -o "ftp\.[^;]*fastq\.gz" | sed 's/ftp.sra.ebi.ac.uk\///g' | \
    xargs -I{} sh -c "\
    ascp -QT -l 300m -P33001 \
    -i ~/.aspera/cli/etc/asperaweb_id_dsa.openssh \
    era-fasp@fasp.sra.ebi.ac.uk:{} \
    \$( echo {} |\
    awk -F/ '{print \$NF}' )"
  This command reads the report, isolates the remote file paths and downloads each file into the current folder. The backslash before each $ keeps your interactive shell from expanding it, so the substitution and the awk field are evaluated by the shell that sh -c starts. $NF is the last component of the path, that is, the file name.
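To see what each stage of the pipeline contributes, the text-processing part can be run on its own, without contacting ENA. The sketch below assumes a made-up fastq_ftp value (the accession ERR0000001 is a placeholder, not a real run) in the form jq prints it: a quoted string with the paired files separated by a semicolon.

```shell
# Hypothetical fastq_ftp value as printed by jq '.[].fastq_ftp'
sample='"ftp.sra.ebi.ac.uk/vol1/fastq/ERR000/001/ERR0000001/ERR0000001_1.fastq.gz;ftp.sra.ebi.ac.uk/vol1/fastq/ERR000/001/ERR0000001/ERR0000001_2.fastq.gz"'

# grep -o emits one match per line, splitting the pair at the semicolon;
# sed then strips the host so only the remote path is left.
paths=$(printf '%s\n' "$sample" \
    | grep -E -o "ftp\.[^;]*fastq\.gz" \
    | sed 's/ftp.sra.ebi.ac.uk\///g')
printf '%s\n' "$paths"

# awk splits each path on "/" and prints the last field ($NF), the file name.
# For this placeholder path the last field is also field 6.
names=$(printf '%s\n' "$paths" | awk -F/ '{print $NF}')
printf '%s\n' "$names"
```

The first printf lists the two remote paths starting at vol1/, and the second lists the two bare file names that ascp would use as local destinations.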
Useful tips
You can pass the -P argument to xargs to run several downloads in parallel. Note that this is the -P of xargs, not the -P33001 port option of ascp. Example:
cat <name_of_report_file> | jq '.[].fastq_ftp' | grep -E -o "ftp\.[^;]*fastq\.gz" | sed 's/ftp.sra.ebi.ac.uk\///g' | xargs -I{} -P <number_of_parallel_downloads> sh -c "ascp -QT -l 300m -P33001 -i ~/.aspera/cli/etc/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:{} \$( echo {} | awk -F/ '{print \$NF}' )"
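Before starting a long parallel run, it can help to preview the commands xargs would launch. The sketch below is a dry run under stated assumptions: the two paths are placeholders, echo is placed in front of ascp so nothing is downloaded, and the path is passed to sh -c as the positional parameter $1 instead of being substituted into the command string, which is one possible way to avoid the backslash escapes.

```shell
# Placeholder remote paths, in the form produced by the grep/sed stages
paths='vol1/fastq/ERR000/001/ERR0000001/ERR0000001_1.fastq.gz
vol1/fastq/ERR000/001/ERR0000001/ERR0000001_2.fastq.gz'

# -P 2 runs up to two jobs at once. Single quotes keep $1 for the inner shell;
# ${1##*/} strips everything up to the last "/" and leaves the file name.
preview=$(printf '%s\n' "$paths" | xargs -I{} -P 2 sh -c '
    echo ascp -QT -l 300m -P33001 \
        -i ~/.aspera/cli/etc/asperaweb_id_dsa.openssh \
        "era-fasp@fasp.sra.ebi.ac.uk:$1" "${1##*/}"
' _ {})
printf '%s\n' "$preview"
```

Removing the echo turns the preview into the real download. The order of the printed lines may vary between runs because the jobs run concurrently.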
The last argument, built from the final {}, is the local file name used for the download. If you want to save the files in a specific folder, create the folder first and prepend its name to the file name, as in the following example (where my_cool_fastq/ is the name of the folder). Example:
cat <name_of_report_file> | jq '.[].fastq_ftp' | grep -E -o "ftp\.[^;]*fastq\.gz" | sed 's/ftp.sra.ebi.ac.uk\///g' | xargs -I{} sh -c "ascp -QT -l 300m -P33001 -i ~/.aspera/cli/etc/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:{} \$( echo {} | awk -F/ '{print \"my_cool_fastq/\" \$NF}' )"
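The destination folder has to exist before the download starts. A minimal sketch of that preparation, assuming the folder name my_cool_fastq from the example above and a placeholder remote path; it only builds and prints the destination, without calling ascp:

```shell
# Create the target folder; -p makes this a no-op if it already exists
mkdir -p my_cool_fastq

# Placeholder remote path (not a real ENA run)
path='vol1/fastq/ERR000/001/ERR0000001/ERR0000001_1.fastq.gz'

# In awk, placing a string next to a field concatenates them
dest=$(echo "$path" | awk -F/ '{print "my_cool_fastq/" $NF}')
echo "$dest"
```

Inside the double-quoted sh -c string of the full command, the same awk program needs its inner double quotes and its $ escaped with backslashes.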