Data Types and File Types

Vitessce supports several data types, each of which denotes, in an abstract sense, the type of observations contained in a file (e.g., matrix, dataframe, image). For each data type, Vitessce may support multiple file types (specified via datasets[].files[].fileType in the configuration), each of which denotes a specific schema and file format that Vitessce knows how to read.
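To make the datasets[].files[].fileType path concrete, here is a minimal sketch of a configuration written as a Python dict. The version string, dataset uid and name, and URLs are placeholders chosen for illustration, and the sketch omits the view layout and coordination settings that a complete configuration would also need.

```python
# Sketch of the part of a Vitessce-style configuration that declares files.
# Assumptions: version string, uid, names, and URLs are illustrative placeholders.
config = {
    "version": "1.0.16",  # assumed schema version; check against your Vitessce release
    "name": "Example configuration",
    "datasets": [
        {
            "uid": "A",
            "name": "Example dataset",
            "files": [
                {
                    "fileType": "obsEmbedding.csv",
                    "url": "https://example.com/embedding.csv",
                },
                {
                    "fileType": "obsFeatureMatrix.anndata.zarr",
                    "url": "https://example.com/data.h5ad.zarr",
                },
            ],
        }
    ],
}


def list_file_types(cfg):
    """Collect every datasets[].files[].fileType value, in declaration order."""
    return [
        file_def["fileType"]
        for dataset in cfg["datasets"]
        for file_def in dataset["files"]
    ]


print(list_file_types(config))
```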

For example, a file that conforms to the obsEmbedding data type must contain embedding coordinates for each observation (e.g., each cell, when observations represent cells), computed via a method such as t-SNE or UMAP. Depending on which file format is more convenient, you may choose either the obsEmbedding.csv or the obsEmbedding.anndata.zarr file type.
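The choice between the two file types might look like the following sketch, which describes the same kind of data (UMAP coordinates per cell) in two ways. The URLs, column names, and the keys under options and coordinationValues are assumptions for illustration; consult the documented options of each file type before relying on them.

```python
# Two alternative file definitions that both satisfy the obsEmbedding data type.
# Assumptions: URLs, CSV column names, and option keys are illustrative only.
embedding_from_csv = {
    "fileType": "obsEmbedding.csv",
    "url": "https://example.com/cells.csv",
    "coordinationValues": {"obsType": "cell", "embeddingType": "UMAP"},
    # Assumed options: which column identifies the cell, which columns hold x and y.
    "options": {"obsIndex": "cell_id", "obsEmbedding": ["UMAP_1", "UMAP_2"]},
}

embedding_from_anndata = {
    "fileType": "obsEmbedding.anndata.zarr",
    "url": "https://example.com/cells.h5ad.zarr",
    "coordinationValues": {"obsType": "cell", "embeddingType": "UMAP"},
    # Assumed options: where the coordinates live inside the AnnData store.
    "options": {"path": "obsm/X_umap", "dims": [0, 1]},
}

# Both definitions share the data type; only the schema and file format differ.
data_types = {
    file_def["fileType"].split(".", 1)[0]
    for file_def in (embedding_from_csv, embedding_from_anndata)
}
print(data_types)
```

Only one of the two definitions would normally appear in a given dataset; which one depends on how the embedding was exported.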

Data Types and File Types

Data Type | File Types | Convert from...
Per-observation 2D embedding coordinates. Typically used to store dimensionality reductions performed on cell-by-biomarker expression matrices.
Spatially-resolved 2D coordinates without a specified size. For example, individual RNA molecule x-y coordinates measured by FISH. (Supported by spatialBeta view.)
Spatially-resolved 2D coordinates with a specified size. For example, spot-based or bead-based spatial transcriptomics such as from 10x Visium. (Supported by spatialBeta view.)
Per-observation segmentation polygons or bitmasks. For example, cell or organelle segmentations.
2D coordinates representing precise locations corresponding to segmentations. For example, cell segmentation centroid coordinates to support lasso selection interactions.
Lists or hierarchies of sets of observations. For example, cell type annotations or unsupervised clustering results.
Per-observation string labels. For example, alternate cell identifiers.
Multi-scale multiplexed imaging data, including OME-TIFF files and OME-NGFF Zarr stores.
Observation-by-feature matrix. Typically used to store cell-by-gene expression matrices.
Per-feature string labels. For example, alternate gene identifiers.
Genomic profiles, such as ATAC-seq profiles.
Tuples of (observationId, sampleId) to map observations to samples.
Lists or hierarchies of sets of samples.

Joint File Types

A joint file type is a pseudo-file type (pseudo in the sense that it does not correspond to any single data type). It allows a single file definition (and therefore a single URL) in the Vitessce configuration to expand into multiple file definitions of the atomic (i.e., non-joint) file types listed in the table above.

This is motivated by the fact that one file may store information corresponding to more than one data type. For instance, an AnnData file may store not only the obs-by-feature matrix (adata.X) but also multiple sets of 2D embedding coordinates (adata.obsm['X_umap'] and adata.obsm['X_pca']), spatial coordinates (adata.obsm['X_spatial']), and cell type annotations (e.g., adata.obs['cell_type']). Rather than defining five different files (corresponding to the atomic file types obsFeatureMatrix.anndata.zarr, obsEmbedding.anndata.zarr, obsEmbedding.anndata.zarr, obsLocations.anndata.zarr, and obsSets.anndata.zarr, respectively), one anndata.zarr joint file definition can be used instead.
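The contrast between five atomic definitions and one joint definition can be sketched as follows. The URL is a placeholder, and the shape of each options object (path, embeddingType, name) is an assumption made for illustration rather than a statement of the exact schema.

```python
# Five atomic definitions versus one joint definition for the same AnnData store.
# Assumptions: the URL is a placeholder and the option shapes are illustrative.
URL = "https://example.com/dataset.h5ad.zarr"

atomic_files = [
    {"fileType": "obsFeatureMatrix.anndata.zarr", "url": URL,
     "options": {"path": "X"}},
    {"fileType": "obsEmbedding.anndata.zarr", "url": URL,
     "options": {"path": "obsm/X_umap"},
     "coordinationValues": {"embeddingType": "UMAP"}},
    {"fileType": "obsEmbedding.anndata.zarr", "url": URL,
     "options": {"path": "obsm/X_pca"},
     "coordinationValues": {"embeddingType": "PCA"}},
    {"fileType": "obsLocations.anndata.zarr", "url": URL,
     "options": {"path": "obsm/X_spatial"}},
    {"fileType": "obsSets.anndata.zarr", "url": URL,
     "options": [{"name": "Cell Type", "path": "obs/cell_type"}]},
]

# The joint form names the URL once and groups the options by data type.
joint_file = {
    "fileType": "anndata.zarr",
    "url": URL,
    "options": {
        "obsFeatureMatrix": {"path": "X"},
        "obsEmbedding": [
            {"path": "obsm/X_umap", "embeddingType": "UMAP"},
            {"path": "obsm/X_pca", "embeddingType": "PCA"},
        ],
        "obsLocations": {"path": "obsm/X_spatial"},
        "obsSets": [{"name": "Cell Type", "path": "obs/cell_type"}],
    },
}

print(len(atomic_files), "atomic definitions vs 1 joint definition")
```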

Note that joint file type expansion is currently not performed recursively (i.e., a joint file type expansion function must return a list of atomic file definitions).
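The non-recursive constraint can be illustrated with a small sketch of a single-pass expansion. The names JOINT_FILE_TYPES, expand_anndata_zarr, and expand_files are hypothetical and are not Vitessce's internal API. As a simplification, a list-valued option here yields one atomic file per list entry.

```python
# Hypothetical single-pass expansion of joint file definitions.
# None of these names come from Vitessce itself; they only illustrate the rule
# that an expansion function must return atomic file definitions.
JOINT_FILE_TYPES = {"anndata.zarr"}  # hypothetical registry of joint types


def expand_anndata_zarr(file_def):
    """Expand one joint definition into a list of atomic definitions."""
    atomic = []
    for data_type, opts in file_def.get("options", {}).items():
        # Simplification: a list of entries becomes one atomic file per entry.
        entries = opts if isinstance(opts, list) else [opts]
        for entry in entries:
            atomic.append({
                "fileType": f"{data_type}.anndata.zarr",
                "url": file_def["url"],
                "options": entry,
            })
    return atomic


def expand_files(files):
    """Expand joint definitions once; atomic definitions pass through unchanged."""
    expanded = []
    for file_def in files:
        if file_def["fileType"] not in JOINT_FILE_TYPES:
            expanded.append(file_def)
            continue
        children = expand_anndata_zarr(file_def)
        # No second pass is made, so a joint child would never be expanded.
        for child in children:
            if child["fileType"] in JOINT_FILE_TYPES:
                raise ValueError("expansion must return atomic file definitions")
        expanded.extend(children)
    return expanded


files = [
    {
        "fileType": "anndata.zarr",
        "url": "https://example.com/dataset.h5ad.zarr",  # placeholder URL
        "options": {
            "obsFeatureMatrix": {"path": "X"},
            "obsEmbedding": [{"path": "obsm/X_umap"}, {"path": "obsm/X_pca"}],
        },
    },
    {"fileType": "obsEmbedding.csv", "url": "https://example.com/tsne.csv"},
]
expanded_files = expand_files(files)
print([f["fileType"] for f in expanded_files])
```

Rejecting joint children outright, rather than silently leaving them unexpanded, is one possible design choice for surfacing a misbehaving expansion function early.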