How to curate scATAC-seq Data
Overview
This guide is intended to highlight the required metadata for scATAC-seq data. Optional metadata fields should be evaluated on an individual project basis given the experimental design and available metadata.
The required metadata includes:
(1) ontology terms: library construction method, sequencing method
(2) analysis file types that we require based on our review of scATAC-seq technology/publications (not evaluated during metadata schema validation)
(1) Ontology terms
(2) Analysis file(s)
- Raw DNA sequencing data derived from sequencing of scATAC-seq libraries.
- A “peaks” file containing the genomic coordinates corresponding to scATAC-seq peak identification. There is no strict format requirement. The peaks might be recorded in a bed file or a simple txt file, for example.
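Because the peaks file has no strict format, a quick sanity check can help confirm that a submitted file really contains genomic coordinates. The sketch below is only an illustration: it assumes the first three whitespace-separated columns are chromosome, start, and end (as in a BED file), and the function name `read_peaks` is hypothetical. Files laid out differently would need a different parser.

```python
def read_peaks(lines):
    """Parse peak coordinates from an iterable of text lines.

    Assumption (not a requirement of this guide): the first three
    whitespace-separated columns are chromosome, start, end.
    Returns a list of (chrom, start, end) tuples.
    """
    peaks = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        # Skip blank lines, comments, and BED-style header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) < 3:
            raise ValueError(f"line {line_number}: expected at least 3 columns")
        # int() raises ValueError if a column is not an integer coordinate.
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if start < 0 or end <= start:
            raise ValueError(f"line {line_number}: invalid interval {start}-{end}")
        peaks.append((chrom, start, end))
    return peaks


# Usage with a hypothetical file name:
# with open("peaks.bed") as handle:
#     peaks = read_peaks(handle)
```

A file that fails this check is not necessarily invalid. It may simply use another layout (for example, a column header row), in which case the check should be adapted rather than the file rejected.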
Barcodes
The cell barcode read and UMI barcode read should be double-checked for each dataset, as the assignment might change depending on whether 10X is used and on how the user names their files. This link describes the barcode reads and lengths for 10X scATAC-seq.
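One way to double-check which file holds the barcode read is to tabulate the read lengths in each FASTQ file and compare them with the lengths documented for the library preparation. The following is a minimal sketch under stated assumptions: the FASTQ files use standard four-line records, and the helper names `open_fastq` and `read_length_counts` and the file name in the usage comment are hypothetical.

```python
import gzip
from collections import Counter


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_length_counts(lines, max_reads=1000):
    """Count sequence lengths among the first max_reads FASTQ records.

    Assumes four lines per record (header, sequence, '+', qualities).
    """
    lengths = Counter()
    for i, line in enumerate(lines):
        if i // 4 >= max_reads:
            break
        if i % 4 == 1:  # the second line of each record is the sequence
            lengths[len(line.rstrip("\r\n"))] += 1
    return lengths


# Usage with a hypothetical file name:
# with open_fastq("sample_R2.fastq.gz") as handle:
#     print(read_length_counts(handle))
```

A file whose reads are all short and of one uniform length is a likely candidate for the barcode read, but the result should still be confirmed against the protocol documentation rather than inferred from file names alone.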