Update ENA runs

TOC {:toc}
Overview

This is the SOP for fixing datasets in the issue: ebi-ait/hca-ebi-wrangler-central#250

Requirements

Gain access to the EBI cluster
```
ssh ebi-cli.ebi.ac.uk
```
Install gsutil in your environment on the EBI cluster and log in using your EMBL-EBI Google account. Follow the instructions at https://cloud.google.com/storage/docs/gsutil_install to install gsutil. See the separate instructions on setting up your access to the Terra staging area for more details.
Get DSP’s Webin credentials. (Currently only an ingest developer has access to these.)
Clone the ingest-archiver repository. The scripts used in this SOP are in the ena directory of that repository.
```
git clone https://github.com/ebi-ait/ingest-archiver.git
cd ingest-archiver
pip install -r requirements.txt
```
Get your JWT Token from Ingest UI.
1. Log in to the Ingest UI at https://contribute.data.humancellatlas.org/login using an account that has the WRANGLER role.
2. In Chrome, right click and select Inspect to open the developer console. Select the Network tab.
3. Refresh the page, https://contribute.data.humancellatlas.org/home
4. Find the Authorization header of the request to https://api.ingest.archive.data.humancellatlas.org/auth/account
5. Copy the token that follows the Bearer prefix:
```
Authorization: Bearer <copy the very long string of random characters>
```
6. The token is valid for 1 hour. It is needed by the scripts run in the steps below.
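
Because the token expires quickly, it can help to check how long it has left before starting a long-running script. The following is a minimal sketch, assuming the token is a standard JWT whose payload carries an `exp` claim (seconds since the epoch). The function name `jwt_seconds_remaining` is hypothetical and not part of ingest-archiver, and the sketch only decodes the payload without verifying the signature.

```python
import base64
import json
import time


def jwt_seconds_remaining(token, now=None):
    """Return the seconds left before the token's `exp` claim (negative if expired)."""
    # A JWT has three base64url segments: header.payload.signature
    payload_b64 = token.split(".")[1]
    # Restore the padding that base64url encoding strips
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    # Assumption: the payload contains a numeric `exp` claim
    now = time.time() if now is None else now
    return claims["exp"] - now
```

For example, `jwt_seconds_remaining(token) / 60` gives the remaining minutes. If the value is small, copy a fresh token from the Ingest UI before running the scripts.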

Steps

1 - Suppress sequencing runs

Get the list of sequencing runs to be suppressed. This can be downloaded as TSV/JSON from the ENA Browser.
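
To paste the accessions into the helpdesk ticket, the downloaded TSV can be reduced to a plain list. This is a rough sketch, assuming the report has a header row with a column named `run_accession`. Check the header of your own download and adjust the `column` argument if it differs. The helper name `read_run_accessions` is hypothetical.

```python
import csv
import io


def read_run_accessions(tsv_text, column="run_accession"):
    """Return the non-empty values of `column` from a tab-separated report."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Skip rows where the column is missing or empty
    return [row[column] for row in reader if row.get(column)]
```

A typical use would be `read_run_accessions(open("runs.tsv").read())`, where `runs.tsv` is whatever name the downloaded report was saved under.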

File a ticket with the ENA helpdesk to suppress the old sequencing runs. Answer the form questions as follows:

Submitter: Broker
Query is related to: Suppression
I work on: Humans
Organisms classification: Not applicable
The work is: Other/not sure (Raw sequencing reads)

2 - Clear sequencing runs

Clear the sequencing run accessions from the file metadata. After this step, the file metadata JSON should no longer contain a property like the following: "insdc_run_accessions": [ "ERR6449905" ]

Update clear_run_accession_from_files.py with a JWT token from the Ingest UI, then run the following: python clear_run_accession_from_files.py <submission-uuid>
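
After the script has run, one way to confirm the result is to search a file metadata document for any leftover run accessions. The sketch below assumes you have already loaded the file metadata JSON into a Python dictionary. It searches recursively because the exact nesting of `insdc_run_accessions` is not spelled out here. The function name `find_run_accessions` is hypothetical and is not taken from the repository.

```python
def find_run_accessions(node):
    """Collect every value stored under an `insdc_run_accessions` key."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "insdc_run_accessions":
                # The property is normally a list, but tolerate a single value
                found.extend(value if isinstance(value, list) else [value])
            else:
                found.extend(find_run_accessions(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_run_accessions(item))
    return found
```

An empty list means the document is clear. Any accessions returned indicate that the file still needs to be cleared.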

3 - Submit new sequencing runs

Make sure that the metadata in Ingest contains sequencing experiment accessions. The submitter script will raise an error if any of the assay processes has no accession. The assay processes in the submission should have the following property:
```
 "insdc_experiment": {
   "insdc_experiment_accession": "ERX4319109"
 }
```
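
Checking for missing accessions before running the submitter script avoids hitting this error partway through. The following is a minimal sketch, assuming each assay process is available as a dictionary with the `insdc_experiment` property either at the top level or under a `content` key. That layout and the function name are assumptions, not the submitter script's own logic.

```python
def processes_missing_experiment_accession(processes):
    """Return the processes that lack insdc_experiment.insdc_experiment_accession."""
    missing = []
    for process in processes:
        # Assumption: the metadata sits under "content", else at the top level
        content = process.get("content", process)
        experiment = content.get("insdc_experiment") or {}
        if not experiment.get("insdc_experiment_accession"):
            missing.append(process)
    return missing
```

If the returned list is not empty, add the experiment accessions in Ingest before submitting.
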
Download all files from the Ingest or Terra upload area to any directory inside /nfs/production/hca/ on the EBI cluster; gsutil can be used for the download. The files may also be in the hca-util upload area, but those would need to be checked for validity, whereas files in the Ingest or Terra upload areas have already been validated. Prefer downloading from the Terra upload area, as downloading from the Ingest upload area incurs costs on our AWS account.

Compute the MD5 checksum of every file and write the results to a text file:

 gsutil hash -hm gs://broad-dsp-monster-hca-prod-ebi-storage/prod/<project_uuid>/data/* | grep -A1 "hex" | awk -F"/" '{printf $4 $1}' | awk -F"--" '{for (i=1;i<NF;i++)print $i}' | awk -F":" '{print $1 $3}' > <md-filename>.txt
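
As a cross-check, the MD5 of the downloaded copies can also be computed locally and compared with the values reported by gsutil. The sketch below uses only the Python standard library. The function names are hypothetical, and the returned dictionary is only for comparison; it is not the file format expected by the submitter script.

```python
import hashlib
from pathlib import Path


def md5_hex(path, chunk_size=1024 * 1024):
    """Return the hex MD5 of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_for_directory(directory):
    """Map each regular file name in `directory` to its hex MD5."""
    return {
        path.name: md5_hex(path)
        for path in sorted(Path(directory).iterdir())
        if path.is_file()
    }
```

A mismatch between a local value and the gsutil value suggests that the download was incomplete and should be repeated before uploading to Webin.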

Upload the files to the Webin FTP upload area. This can be done in parallel with checksumming.

$ cd <directory where you downloaded the files>
$ lftp webin2.ebi.ac.uk -u <webin-user>
# enter the Webin password when prompted, then run the following at the lftp prompt:
> mkdir <parent-dir>
> cd <parent-dir>
> mput *

Please refer to the ENA documentation for more details.

Run the submit_10x_fastq_files.py script. The receipt.xml and report.json files should be available after the script has run. The receipt.xml file contains the ENA REST API response. The report.json file records which files were updated with the run accessions from the ENA response.
```
python submit_10x_fastq_files.py <submission-uuid> <md5-filename> <jwt-token-from-ingest-ui> [--ftp_dir <parent-dir>]
```
Verify that the new runs were submitted. They should be visible in the Webin Portal, but it may take 48 hours before they become available in the ENA browser.
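
As a first check before looking in the Webin Portal, receipt.xml can be summarised with a few lines of Python. This is a sketch under an assumption: that the receipt follows the usual ENA submission receipt layout, with a RECEIPT root element carrying a `success` attribute, RUN elements carrying an `accession` attribute, and ERROR elements holding any error messages. Inspect your own receipt.xml to confirm this before relying on it. The function name `summarise_receipt` is hypothetical.

```python
import xml.etree.ElementTree as ET


def summarise_receipt(xml_text):
    """Extract the success flag, run accessions and error messages from a receipt."""
    root = ET.fromstring(xml_text)
    return {
        # Assumption: the root element has success="true" or success="false"
        "success": root.get("success") == "true",
        "runs": [run.get("accession") for run in root.iter("RUN")],
        "errors": [error.text for error in root.iter("ERROR")],
    }
```

For example, `summarise_receipt(open("receipt.xml").read())` should list one run accession per submitted run when the submission succeeded.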
