Data Types and File Types

Vitessce supports several data types, which denote in an abstract sense the kind of observations contained in a file (e.g., matrix, dataframe, image). For each data type, Vitessce may support multiple file types (datasets[].files[].fileType), each denoting a specific schema and file format that Vitessce knows how to read.

For example, a file that conforms to the obsEmbedding data type must contain embedding coordinates for each cell (assuming each observation represents a cell), computed via a method such as t-SNE or UMAP. Depending on which file format is more convenient, you may choose either the obsEmbedding.csv or the obsEmbedding.anndata.zarr file type.
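As a minimal sketch of that choice, the same embedding could be declared in either format as an entry of datasets[].files[]. The configuration fragment is written here as Python dictionaries. The URLs, the dataset uid and name, the CSV column names, and the contents of options are illustrative assumptions, not values taken from this page.

```python
import json

# Hypothetical URLs and column names, used only for illustration.
embedding_as_csv = {
    "fileType": "obsEmbedding.csv",
    "url": "https://example.com/cells.umap.csv",
    # Assumed option names: the column holding observation IDs and the
    # two columns holding the embedding coordinates.
    "options": {"obsIndex": "cell_id", "obsEmbedding": ["UMAP_1", "UMAP_2"]},
    "coordinationValues": {"obsType": "cell", "embeddingType": "UMAP"},
}

embedding_as_anndata = {
    "fileType": "obsEmbedding.anndata.zarr",
    "url": "https://example.com/cells.adata.zarr",
    # Assumed option name: the path of the embedding array inside the store.
    "options": {"path": "obsm/X_umap"},
    "coordinationValues": {"obsType": "cell", "embeddingType": "UMAP"},
}


def make_datasets(file_def):
    """Wrap one file definition in a datasets[] fragment (uid and name are made up)."""
    return [{"uid": "A", "name": "Example dataset", "files": [file_def]}]


def data_type_of(file_type):
    """Return the part of a file type name before the first dot, e.g. 'obsEmbedding'."""
    return file_type.split(".", 1)[0]


datasets = make_datasets(embedding_as_anndata)
print(json.dumps(datasets, indent=2))
```

Both definitions name a file type whose prefix is obsEmbedding, so either one satisfies the same data type; only the schema and storage format differ.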

Data Type | File Types | Convert from...
obsEmbedding
Per-observation 2D embedding coordinates. Typically used to store dimensionality reductions performed on cell-by-biomarker expression matrices.
obsPoints
Spatially-resolved 2D coordinates without a specified size. For example, individual RNA molecule x-y coordinates measured by FISH. (Supported by spatialBeta view.)
obsSpots
Spatially-resolved 2D coordinates with a specified size. For example, spot-based or bead-based spatial transcriptomics such as from 10x Visium. (Supported by spatialBeta view.)
obsSegmentations
Per-observation segmentation polygons or bitmasks. For example, cell or organelle segmentations.
obsLocations
2D coordinates representing precise locations corresponding to segmentations. For example, cell segmentation centroid coordinates to support lasso selection interactions.
obsSets
Lists or hierarchies of sets of observations. For example, cell type annotations or unsupervised clustering results.
obsLabels
Per-observation string labels. For example, alternate cell identifiers.
image
Multi-scale multiplexed imaging data, including OME-TIFF files and OME-NGFF Zarr stores.
obsFeatureMatrix
Observation-by-feature matrix. Typically used to store cell-by-gene expression matrices.
featureLabels
Per-feature string labels. For example, alternate gene identifiers.
genomic-profiles
Genomic profiles, such as ATAC-seq profiles.
sampleEdges
Tuples of (observationId, sampleId) to map observations to samples.
sampleSets
Lists or hierarchies of sets of samples.

Joint File Types

A joint file type is a pseudo-file type (pseudo in the sense that it does not correspond to any one data type). It allows a single file definition, and therefore a single URL, in the Vitessce configuration to expand into multiple file definitions of the atomic (i.e., non-joint) file types listed in the table above.

This is motivated by the fact that one file may store information corresponding to more than one data type. For instance, an AnnData file may store not only the obs-by-feature matrix (adata.X) but also multiple sets of 2D embedding coordinates (adata.obsm['X_umap'] and adata.obsm['X_pca']), spatial coordinates (adata.obsm['X_spatial']), and cell type annotations (e.g., adata.obs['cell_type']). Rather than defining five different files (corresponding to the atomic file types obsFeatureMatrix.anndata.zarr, obsEmbedding.anndata.zarr, obsEmbedding.anndata.zarr, obsLocations.anndata.zarr, and obsSets.anndata.zarr, respectively), one anndata.zarr joint file definition can be used instead.

Note that joint file type expansion is currently not performed recursively (i.e., a joint file type expansion function must return a list of atomic file definitions).
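The expansion described above can be sketched as a function that takes one anndata.zarr joint file definition and returns the five atomic definitions from the AnnData example. This is an illustration under stated assumptions, not Vitessce's implementation: the function name expand_anndata_zarr, the URL, and the shape of the options object are hypothetical.

```python
from typing import Any, Dict, List

FileDef = Dict[str, Any]

# Atomic file types this sketch is allowed to emit.
ATOMIC_FILE_TYPES = {
    "obsFeatureMatrix.anndata.zarr",
    "obsEmbedding.anndata.zarr",
    "obsLocations.anndata.zarr",
    "obsSets.anndata.zarr",
}


def expand_anndata_zarr(joint_def: FileDef) -> List[FileDef]:
    """Hypothetical expansion of one joint definition into atomic definitions."""
    url = joint_def["url"]
    options = joint_def.get("options", {})  # assumed option layout
    base = joint_def.get("coordinationValues", {})
    expanded: List[FileDef] = []

    def emit(file_type: str, opts: Any, extra: Dict[str, Any]) -> None:
        # Every atomic definition reuses the single URL of the joint definition.
        expanded.append({
            "fileType": file_type,
            "url": url,
            "options": opts,
            "coordinationValues": {**base, **extra},
        })

    if "obsFeatureMatrix" in options:
        emit("obsFeatureMatrix.anndata.zarr", options["obsFeatureMatrix"], {})
    for emb in options.get("obsEmbedding", []):
        # One atomic definition per embedding stored in the file.
        emit("obsEmbedding.anndata.zarr", {"path": emb["path"]},
             {"embeddingType": emb["embeddingType"]})
    if "obsLocations" in options:
        emit("obsLocations.anndata.zarr", options["obsLocations"], {})
    if "obsSets" in options:
        emit("obsSets.anndata.zarr", options["obsSets"], {})

    # Expansion is not recursive: the result may contain atomic types only.
    for file_def in expanded:
        if file_def["fileType"] not in ATOMIC_FILE_TYPES:
            raise ValueError("non-atomic file type: " + file_def["fileType"])
    return expanded


joint = {
    "fileType": "anndata.zarr",
    "url": "https://example.com/cells.adata.zarr",  # hypothetical URL
    "coordinationValues": {"obsType": "cell"},
    "options": {
        "obsFeatureMatrix": {"path": "X"},
        "obsEmbedding": [
            {"path": "obsm/X_umap", "embeddingType": "UMAP"},
            {"path": "obsm/X_pca", "embeddingType": "PCA"},
        ],
        "obsLocations": {"path": "obsm/X_spatial"},
        "obsSets": [{"name": "Cell Type", "path": "obs/cell_type"}],
    },
}

atomic_defs = expand_anndata_zarr(joint)
for file_def in atomic_defs:
    print(file_def["fileType"], file_def["options"])
```

The final check mirrors the non-recursive rule: because the function may only return atomic definitions, a joint file type can never expand into another joint file type.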