Downloading data from Node
Node limits downloads over HTTPS to 2 GB, so FTPS is the only viable option for retrieving a dataset. Node provides no instructions for this, so this SOP is based on communication with the Node support team.
Requirements:
- Node credentials: required to authenticate and connect to the FTPS server. You can register through Node or using this link
- lftp: it’s already installed on the EC2 instance; to install it on a Mac see the instructions here
- a list of the target run IDs: after logging into Node, this can be downloaded for any public dataset
Step 1 : Connecting to the server
At the moment the SSL certificate for the FTPS server is self-signed, so the option set ssl:verify-certificate no is needed. This should change in the future; when it does, this SOP will have to be updated to enable certificate verification.
$ lftp
lftp :~> set ftp:ssl-force true
lftp :~> set ssl:verify-certificate no
lftp :~> connect ftps://fms.biosino.org:2122
lftp fms.biosino.org:~> login your-username
Password: ****
The username is the full email address you used to sign in to Node.
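The Step 1 commands can also be passed to lftp in one go with its -e option, which runs the given commands and then leaves the session open. The following is a minimal sketch rather than a procedure confirmed by Node support: the function name node_lftp_commands and the variable NODE_USER are hypothetical, and it assumes lftp still prompts for the password after login, as it does in the interactive session.

```shell
# Build the Step 1 lftp commands as a single ';'-separated string.
# The only argument is the Node username (the full e-mail address).
node_lftp_commands() {
    # printf repeats the '%s; ' format once per argument.
    printf '%s; ' \
        'set ftp:ssl-force true' \
        'set ssl:verify-certificate no' \
        'connect ftps://fms.biosino.org:2122'
    printf 'login %s\n' "$1"
}
```

It could then be used as lftp -e "$(node_lftp_commands "$NODE_USER")", with NODE_USER set to your Node email. The certificate setting is kept in one place, which makes it easy to remove once the server has a verifiable certificate.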
Step 2 : Find the files
For each project there is a folder in the format /Public/byrun/OEXX/OERXXXX/ where each X is a single digit. The project folder contains one folder per run accession, each containing the fastq files for that run.
You can list the contents of the project folder to check that all the expected files are there. For example, glob expands a pattern that matches all the files, and find lists them:
lftp your-username@fms.biosino.org:~> glob find /Public/byrun/OEXX/OERXXXX/OERXXXX*/*.fq.gz
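To compare the listing against the list of target run IDs, a small local check can help. This is a sketch under stated assumptions: the listing has been saved to a local text file with one remote path per line (for instance by redirecting the find output to a file), the run ID list has one ID per line, and each run folder is named exactly after its run ID. The function name missing_runs and both file arguments are hypothetical.

```shell
# Print every run ID that has no *.fq.gz path in the saved listing.
# $1: file with one run ID per line
# $2: file with one remote path per line (saved find output)
missing_runs() {
    while IFS= read -r run_id || [ -n "$run_id" ]; do
        # Drop a trailing carriage return in case the list has Windows line endings.
        run_id=$(printf '%s' "$run_id" | tr -d '\r')
        # Skip blank lines.
        [ -z "$run_id" ] && continue
        # A run is present if some path has a /<run_id>/ folder and ends in .fq.gz
        if ! grep -q -- "/${run_id}/.*\.fq\.gz\$" "$2"; then
            printf '%s\n' "$run_id"
        fi
    done < "$1"
}
```

Calling missing_runs run_ids.txt listing.txt prints nothing when every run has at least one fastq file on the server; any ID it prints is worth checking before starting the download.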
Step 3 : Get the files
This step can take a long time, so it’s better to run it inside a screen session.
Select the files with the same pattern as above and download them with pget to your local directory. You can check your local directory with lpwd and change it with lcd.
lftp your-username@fms.biosino.org:~> glob pget /Public/byrun/OEXX/OERXXXX/OERXXXX*/*.fq.gz
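Because long transfers can be interrupted, it may be worth testing the downloaded archives once pget finishes. The sketch below is not part of the procedure agreed with Node support: it assumes the files sit directly inside one local directory, it relies on gzip -t to test each compressed file, and the function name check_fastq_gz is hypothetical.

```shell
# Test every *.fq.gz file directly inside a directory with `gzip -t`.
# Prints one "CORRUPT <path>" line per failing file and returns 1 if any failed.
# $1: directory to check (defaults to the current directory)
check_fastq_gz() {
    dir="${1:-.}"
    status=0
    for file in "$dir"/*.fq.gz; do
        # If nothing matches, the pattern stays unexpanded; skip it.
        [ -e "$file" ] || continue
        if ! gzip -t "$file" 2>/dev/null; then
            printf 'CORRUPT %s\n' "$file"
            status=1
        fi
    done
    return "$status"
}
```

Run from a normal shell as check_fastq_gz /path/to/downloads, it lists the files that need to be fetched again. It only confirms that each file is a complete gzip stream, not that its content matches the copy on the server.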