How to curate CITE-seq Gene Expression & Surface Protein Data

Overview

This guide highlights the metadata required for CITE-seq data. Optional metadata fields should be evaluated for each project individually, based on its experimental design and available metadata.

Required metadata includes:

(1) ontology terms: library construction method, sequencing method

(2) analysis file types that we require, based on our review of CITE-seq technology and publications (not checked during metadata schema validation)

(3) sequence file types that we require, based on our review of CITE-seq technology and publications (not checked during metadata schema validation)

(1) Ontology terms

(2) Analysis file(s)

CITE-seq (cell surface protein profiling) EFO:0030008

  • gene expression matrix for scRNA-seq data
  • list of antibody-derived tags (ADT): barcode sequence and name
  • counts matrix for antibody-derived tag (ADT) library

CITE-seq (sample multiplexing) EFO:0030009

  • gene expression matrix for scRNA-seq data
  • list of hashtag oligos (HTO): barcode sequence and name
  • counts matrix for hashtag oligo (HTO) library

CITE-seq (cell surface protein profiling) EFO:0030008 & CITE-seq (sample multiplexing) EFO:0030009

  • gene expression matrix for scRNA-seq data
  • list of antibody-derived tags (ADT): barcode sequence and name
  • counts matrix for antibody-derived tag (ADT) library
  • list of hashtag oligos (HTO): barcode sequence and name
  • counts matrix for hashtag oligo (HTO) library
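
The analysis file requirements above can be sketched as a small lookup, which may help when checking a submission by hand. This is only an illustrative sketch: the file-kind labels, the dictionary, and both function names are hypothetical and are not fields of any metadata schema. Only the two EFO terms and the lists of required files come from this guide.

```python
# Hypothetical labels for the analysis file kinds listed in this guide.
# They are illustrative names, not metadata schema fields.
GEX_MATRIX = "gene expression matrix (scRNA-seq)"
ADT_LIST = "ADT list (barcode sequence and name)"
ADT_COUNTS = "ADT counts matrix"
HTO_LIST = "HTO list (barcode sequence and name)"
HTO_COUNTS = "HTO counts matrix"

# Required analysis files per library construction method term.
REQUIRED_ANALYSIS_FILES = {
    "EFO:0030008": [GEX_MATRIX, ADT_LIST, ADT_COUNTS],  # cell surface protein profiling
    "EFO:0030009": [GEX_MATRIX, HTO_LIST, HTO_COUNTS],  # sample multiplexing
}


def required_analysis_files(efo_terms):
    """Return the de-duplicated list of required file kinds for the given terms."""
    required = []
    for term in efo_terms:
        for kind in REQUIRED_ANALYSIS_FILES[term]:
            if kind not in required:
                required.append(kind)
    return required


def missing_analysis_files(efo_terms, provided):
    """Return the required file kinds that are absent from `provided`."""
    provided = set(provided)
    return [kind for kind in required_analysis_files(efo_terms) if kind not in provided]


if __name__ == "__main__":
    # A project that uses both ADT profiling and HTO multiplexing
    # but has so far supplied only the expression and ADT counts matrices.
    terms = ["EFO:0030008", "EFO:0030009"]
    for kind in missing_analysis_files(terms, [GEX_MATRIX, ADT_COUNTS]):
        print("missing:", kind)
```

When both terms apply, the gene expression matrix is counted once, matching the combined list above.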

(3) Sequence file(s)

CITE-seq (cell surface protein profiling) EFO:0030008

  • raw sequencing FASTQ: scRNA-seq data
  • raw sequencing FASTQ: ADT barcodes

N.B. A separate cell suspension ID should be created for each library type.

CITE-seq (sample multiplexing) EFO:0030009

  • raw sequencing FASTQ: scRNA-seq data
  • raw sequencing FASTQ: HTO barcodes

N.B. A separate cell suspension ID should be created for each library type.

CITE-seq (cell surface protein profiling) EFO:0030008 & CITE-seq (sample multiplexing) EFO:0030009

  • raw sequencing FASTQ: scRNA-seq data
  • raw sequencing FASTQ: ADT barcodes
  • raw sequencing FASTQ: HTO barcodes

N.B. A separate cell suspension ID should be created for each library type.
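
The rule of one cell suspension ID per library type can be illustrated with a short sketch. The ID naming pattern (base ID plus a library-type suffix), the dictionary, and the function names are assumptions made for illustration only. This guide does not prescribe a naming scheme, only that each library type gets its own ID.

```python
# Library types sequenced for each library construction method term,
# taken from the sequence file lists in this guide.
LIBRARY_TYPES_BY_TERM = {
    "EFO:0030008": ["scRNA-seq", "ADT"],  # cell surface protein profiling
    "EFO:0030009": ["scRNA-seq", "HTO"],  # sample multiplexing
}


def library_types(efo_terms):
    """Return the de-duplicated library types expected for the given terms."""
    ordered = []
    for term in efo_terms:
        for lib in LIBRARY_TYPES_BY_TERM[term]:
            if lib not in ordered:
                ordered.append(lib)
    return ordered


def plan_cell_suspensions(base_id, efo_terms):
    """Map each library type to its own cell suspension ID.

    The "<base_id>_<library type>" pattern is a hypothetical convention.
    """
    return {lib: f"{base_id}_{lib}" for lib in library_types(efo_terms)}


if __name__ == "__main__":
    # "donor1_cells" is a made-up base ID used only for this example.
    plan = plan_cell_suspensions("donor1_cells", ["EFO:0030008", "EFO:0030009"])
    for lib, suspension_id in plan.items():
        print(f"{lib}: FASTQ files link to cell suspension {suspension_id}")
```

Each FASTQ set (scRNA-seq, ADT, HTO) would then be linked to its own suspension ID rather than sharing one.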