Datasets

Ageing Drosophila brain

This dataset comes from Davie et al. (Cell 2018, GSE107451). It contains scRNA-seq data from fly brain cells, along with each cell’s age in days. It is useful for exploring a common scenario in multi-modal integration: scRNA-seq data aligned to a one-dimensional secondary modality. See the example in Visualization, which uses this dataset.

import schema
adata = schema.datasets.fly_brain()
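
Before pairing scRNA-seq with a one-dimensional secondary modality such as age, it can help to confirm that the secondary vector has exactly one value per cell. The following is a minimal sketch of such a check on toy data. It does not load the real dataset; `check_alignment` and `toy_ages` are hypothetical names introduced for illustration, and the ages are made up. With the real object, the cell count would be `adata.n_obs`; the name of the age column in `adata.obs` is not stated here and should be checked in the loaded object.

```python
from collections import Counter


def check_alignment(n_cells, secondary):
    """Check that a 1-D secondary modality has one value per cell.

    Returns a Counter mapping each distinct value to its number of cells.
    (Hypothetical helper, not part of the schema package.)
    """
    if len(secondary) != n_cells:
        raise ValueError(f"expected {n_cells} values, got {len(secondary)}")
    return Counter(secondary)


# Toy stand-in: 6 cells with made-up ages in days (not real data).
toy_ages = [1, 1, 3, 9, 9, 9]
cells_per_age = check_alignment(6, toy_ages)
print(sorted(cells_per_age.items()))
```

A length mismatch raises `ValueError`, which is usually preferable to silently misaligning cells and ages.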

Paired RNA-seq and ATAC-seq from mouse kidney cells

This dataset comes from Cao et al. (Science 2018, GSE117089). It contains paired RNA-seq and ATAC-seq data from mouse kidney cells. The AnnData object provided here has been additionally processed to remove genes and peaks with very low counts. It is useful for the case where one of the modalities is very sparse (here, ATAC-seq). See the example in Paired RNA-seq and ATAC-seq, which uses this dataset.

import schema
adata = schema.datasets.scicar_mouse_kidney()
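
To make the sparsity point concrete, here is a minimal sketch that measures the fraction of zero entries in a count matrix. It uses small made-up matrices rather than the real dataset; `zero_fraction`, `toy_rna`, and `toy_atac` are hypothetical names introduced for illustration. Where the ATAC-seq counts are stored inside the returned AnnData object is not stated here, so inspect the loaded object before applying this to real data.

```python
def zero_fraction(matrix):
    """Return the fraction of zero entries in a dense matrix given as rows.

    (Hypothetical helper, not part of the schema package.)
    """
    total = sum(len(row) for row in matrix)
    if total == 0:
        raise ValueError("matrix has no entries")
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / total


# Toy stand-ins with made-up counts (not real data).
toy_rna = [[3, 0, 5, 1], [0, 2, 4, 0], [7, 1, 0, 2]]   # 3 cells x 4 genes
toy_atac = [[0, 0, 1, 0], [0, 0, 0, 0], [1, 0, 0, 0]]  # 3 cells x 4 peaks

print(f"RNA zero fraction:  {zero_fraction(toy_rna):.2f}")
print(f"ATAC zero fraction: {zero_fraction(toy_atac):.2f}")
```

For a SciPy sparse matrix `X`, the same quantity can be obtained without densifying as `1 - X.nnz / (X.shape[0] * X.shape[1])`, provided no explicit zeros are stored.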