Detailed look into the mapping between AnnData and SingleCellExperiment objects

{anndataR} allows users to convert to and from SingleCellExperiment and AnnData objects. This can be done with or without extra user input as to which fields and slots of the respective objects should be converted and should be put where. Please take into account that lossless conversion is not always possible between AnnData and SingleCellExperiment, and please inspect the object before and after inspection to ensure that all data is correctly converted.

library(anndataR)
library(SingleCellExperiment)

We first generate a sample dataset to work with.

ad <- generate_dataset(
  n_obs = 10L,
  n_var = 20L,
  x_type = "numeric_matrix",
  layer_types = c("integer_matrix", "numeric_rsparse"),
  obs_types = c("integer", "numeric", "factor"),
  var_types = c("character", "numeric", "logical"),
  obsm_types = c("numeric_matrix", "numeric_csparse"),
  varm_types = c("numeric_matrix", "numeric_csparse"),
  obsp_types = c("numeric_matrix", "numeric_csparse"),
  varp_types = c("integer_matrix", "numeric_matrix"),
  uns_types = c("vec_integer", "vec_character", "df_integer"),
  format = "AnnData"
)

# add PCA reduction
ad$obsm[["X_pca"]] <- matrix(1:50, 10, 5)
ad$varm[["PCs"]] <- matrix(1:100, 20, 5)

ad$obsm[["X_umap"]] <- matrix(1:20, 10, 2)

Convert AnnData objects to SingleCellExperiment objects

Implicit conversion

{anndataR} will try to make a reasonable guess of which AnnData slots should end up in which SingleCellExperiment slots. A SingleCellExperiment object (this is converted from an AnnData object) consists of assays, colData, rowData, metadata, reducedDims, colPairs, rowPairs and metadata.

Each of these slots can be customized by the user by providing a mapping. We will go more into detail on these user-specified mappings in the mapping section.

By default, anndataR will try to guess a reasonable mapping. If you do not want this to happen, and you want nothing to be converted to a slot, you can pass an empty list.

Here, we showcase what happens if you do not provide any mapping for the conversion.

sce <- to_SingleCellExperiment(ad)
sce
#> class: SingleCellExperiment 
#> dim: 20 10 
#> metadata(3): vec_integer vec_character df_integer
#> assays(3): counts integer_matrix numeric_rsparse
#> rownames(20): gene1 gene2 ... gene19 gene20
#> rowData names(3): character numeric logical
#> colnames(10): cell1 cell2 ... cell9 cell10
#> colData names(3): integer numeric factor
#> reducedDimNames(2): pca umap
#> mainExpName: NULL
#> altExpNames(0):

In the following subsections, we detail how each of these implicit conversions work.

assays

In an AnnData object, count matrices can be present in the X slot or in the layers slot. In a SingleCellExperiment object, count matrices are stored in the assays slot, as a named list.

By default, the X slot and all the elements of the layers slot will be stored in the assays slot of the SingleCellExperiment object. We try to guess the name of the X slot: if there is no counts layer present, we will use the X slot as the counts assay. If there is a counts layer present, we will name the X slot the data assay. All other layers are stored as assays with their layer name.

In the below example, we will convert an AnnData object with a counts layer and two other layers to a SingleCellExperiment object. In order for the implicit conversion to work, we will not provide a layers_mapping. We explicitly pass empty lists to the other mapping arguments for clarity in the resulting object. They ensure that nothing gets converted to the respective slots.

sce_layers <- to_SingleCellExperiment(
  ad,
  colData_mapping = list(),
  rowData_mapping = list(),
  reduction_mapping = list(),
  colPairs_mapping = list(),
  rowPairs_mapping = list(),
  metadata_mapping = list()
)

sce_layers
#> class: SingleCellExperiment 
#> dim: 20 10 
#> metadata(0):
#> assays(3): counts integer_matrix numeric_rsparse
#> rownames(20): gene1 gene2 ... gene19 gene20
#> rowData names(0):
#> colnames(10): cell1 cell2 ... cell9 cell10
#> colData names(0):
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):

We can see that indeed, the X slot got stored as counts and the layers got stored as assays with the same name.

reductions

A dimensionality reduction can consist of multiple parts that are stored separately in the AnnData object. Take for example the very common PCA reduction. Usually, the principal components are stored in the obsm slot (usually called X_pca), the loadings in the varm slot (usually called PCs) and the explained variance in the uns slot (usually called pca).

In a SingleCellExperiment object, the reduced dimensions are usually stored in the reducedDims slot, as either an element of a named list, or as a LinearEmbeddingMatrix. In the first case, only the reduced dimensions are stored, in the second case, the loadings and associated metadata are stored as well.

We guess the mapping of the reductions the same way as we do for the Seurat conversion: - If the obsm slot contains a slot starting with X_, we will store this as a reducedDims slot and no associated varm or uns slots. - If the obsm slot contains a X_pca slot, we will also store the associated loadings (in varm) in a LinearEmbeddingMatrix.

sce_dimred <- to_SingleCellExperiment(
  ad,
  colData_mapping = list(),
  rowData_mapping = list(),
  colPairs_mapping = list(),
  rowPairs_mapping = list(),
  metadata_mapping = list()
)

reducedDims(sce_dimred)
#> List of length 2
#> names(2): pca umap

We can see that indeed, the pca dimred got converted to a LinearEmbeddingMatrix, comprising of the information in the obsm and varm slots. The umap dimred got stored as a reducedDims slot, and consists only of the reduced dimensions in the obsm slot.

colData, rowData, colPairs, rowPairs, metadata

The other SingleCellExperiment slots are easy one-to-one mappings of AnnData slots. We will assume that all colData is stored in the obs slot, all rowData is stored in the var slot, all colPairs are stored in the obsp slot and all rowPairs are stored in the varp slot, and all metadata is stored in the uns slot.

sce_implicit <- to_SingleCellExperiment(
  ad,
  assays_mapping = list(),
  reduction_mapping = list()
)

sce_implicit
#> class: SingleCellExperiment 
#> dim: 20 10 
#> metadata(3): vec_integer vec_character df_integer
#> assays(0):
#> rownames(20): gene1 gene2 ... gene19 gene20
#> rowData names(3): character numeric logical
#> colnames(10): cell1 cell2 ... cell9 cell10
#> colData names(3): integer numeric factor
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):

Explicit conversion

Each of the conversions can be customized, up to a point, by providing a mapping. You provide this in the form of a named list, where the names are the names of the SingleCellExperiment slots and the values are the names of the AnnData slots.

We will give an example for each of the mappings.

assays_mapping

sce_assays <- to_SingleCellExperiment(
  ad,
  assays_mapping = list(counts = "X", layer1 = "integer_matrix", layer2 = "numeric_rsparse"),
  colData_mapping = list(),
  rowData_mapping = list(),
  reduction_mapping = list(),
  colPairs_mapping = list(),
  rowPairs_mapping = list(),
  metadata_mapping = list()
)
sce_assays
#> class: SingleCellExperiment 
#> dim: 20 10 
#> metadata(0):
#> assays(3): counts layer1 layer2
#> rownames(20): gene1 gene2 ... gene19 gene20
#> rowData names(0):
#> colnames(10): cell1 cell2 ... cell9 cell10
#> colData names(0):
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):

You can see that the X slot got stored as counts, and the layers got stored as assays with the names layer1 and layer2.

colData, rowData, colPairs, rowPairs, metadata

sce <- to_SingleCellExperiment(
  ad,
  assays_mapping = list(),
  colData_mapping = list(coldata1 = "integer", coldata2 = "numeric"),
  rowData_mapping = list(rowdata1 = "character", rowdata2 = "logical"),
  reduction_mapping = list(),
  colPairs_mapping = list(colPairs_dense = "numeric_matrix", colPairs_sparse = "numeric_csparse"),
  rowPairs_mapping = list(rowPairs1 = "integer_matrix", rowPairs2 = "numeric_matrix"),
  metadata_mapping = list(vector1 = "vec_integer", vector2 = "vec_character", df = "df_integer")
)

sce
#> class: SingleCellExperiment 
#> dim: 20 10 
#> metadata(3): vector1 vector2 df
#> assays(0):
#> rownames(20): gene1 gene2 ... gene19 gene20
#> rowData names(2): rowdata1 rowdata2
#> colnames(10): cell1 cell2 ... cell9 cell10
#> colData names(2): coldata1 coldata2
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):

Here you can see that the anndata slots (specified as values in the mapping) get stored in the corresponding SingleCellExperiment slots (the columns of which are the names of the mapping). e.g. the integer column of the AnnData obs in the AnnData object gets stored in the coldata1 column of the colData of the SingleCellExperiment object.

reductions_mapping

sce <- to_SingleCellExperiment(
  ad,
  assays_mapping = list(counts = "X"),
  colData_mapping = list(),
  rowData_mapping = list(),
  reduction_mapping = list("pca" = list("obsm" = "X_pca", "varm" = "PCs"), "umap" = list("obsm" = "X_umap")),
  colPairs_mapping = list(),
  rowPairs_mapping = list(),
  metadata_mapping = list()
)
sce
#> class: SingleCellExperiment 
#> dim: 20 10 
#> metadata(0):
#> assays(1): counts
#> rownames(20): gene1 gene2 ... gene19 gene20
#> rowData names(0):
#> colnames(10): cell1 cell2 ... cell9 cell10
#> colData names(0):
#> reducedDimNames(2): pca umap
#> mainExpName: NULL
#> altExpNames(0):

Here, we explicitly provide a mapping for the reductions slot. We specify that we will store the reduction, characterized by the X_pca data in the obsm slot and the PCs data in the varm slot, as a LinearEmbeddingMatrix in the reducedDims slot under the name pca. We will also store the reduction characterized by the X_umap data in the obsm slot as a reducedDims slot under the name umap.

Convert SingleCellExperiment objects to AnnData objects

The reverse, converting SingleCellExperiment objects to AnnData objects works in a similar way. There’s an implicit conversion, where we attempt a standard conversion, but the user can always provide an explicit mapping as well.

ad <- from_SingleCellExperiment(sce)

Implicit conversion

layers

If there is no layer_mapping or x_mapping provided, we will try to guess the mapping. We will simply map all the assays in the SingleCellExperiment object to the layers slot of the AnnData object. Watch out: if there is no x_mapping provided, none of the assays will be stored in the X slot of the AnnData object, and it will remain empty.

ad_assays <- from_SingleCellExperiment(
  sce,
  obs_mapping = list(),
  var_mapping = list(),
  obsm_mapping = list(),
  varm_mapping = list(),
  obsp_mapping = list(),
  varp_mapping = list(),
  uns_mapping = list()
)
ad_assays
#> AnnData object with n_obs × n_vars = 10 × 20
#>     layers: 'counts'

obsm and varm

If there is no reduction_mapping provided, we will try to guess the mapping. This considers both the obsm_mapping and the varm_mapping arguments. By default, we will not map anything to the varm slot, as there is no direct equivalent in the SingleCellExperiment object. However, if the reducedDims slot contains a LinearEmbeddingMatrix, we will store the loadings in the varm slot.

We will store the reduced dimensions in the obsm slot, with the name of the reducedDims prepended by an X_ as the name of the obsm slot.

ad_reductions <- from_SingleCellExperiment(
  sce,
  obs_mapping = list(),
  var_mapping = list(),
  obsp_mapping = list(),
  varp_mapping = list(),
  uns_mapping = list()
)
ad_reductions
#> AnnData object with n_obs × n_vars = 10 × 20
#>     obsm: 'X_pca', 'X_umap'
#>     varm: 'pcas'
#>     layers: 'counts'

obs, var, obsp, varp, uns

The conversion of obs, var, obsp, varp and uns is straightforward: there’s a one-to-one mapping between the SingleCellExperiment slots and the AnnData slots. We assume that all colData is stored in the obs slot, all rowData is stored in the var slot, all colPairs are stored in the obsp slot and all rowPairs are stored in the varp slot, and all metadata is stored in the uns slot.

ad <- from_SingleCellExperiment(
  sce,
  x_mapping = "counts"
)
ad
#> AnnData object with n_obs × n_vars = 10 × 20
#>     obsm: 'X_pca', 'X_umap'
#>     varm: 'pcas'

Explicit conversion

It’s also possible to provide an explicit mapping for the conversion from SingleCellExperiment to AnnData. For all of the simpler mappings, such as layers_mapping, obs_mapping, var_mapping, obsp_mapping, varp_mapping and uns_mapping, you can provide a named list where the names are the names of the AnnData columns and the values are the names of the SingleCellExperiment columns in their respective slots.

The obsm_mapping and varm_mapping are a bit more complex, as they can contain multiple elements. Each of the obsm_mapping and varm_mapping elements should be a named list, where the names are the names of the AnnData columns where the values are stored. The values should be a named list as well, containing as a first element the name of the SingleCellExperiment slot where the data is stored, and as a second element the name of the slot in the SingleCellExperiment slot where the data is stored.

ad_obsm <- from_SingleCellExperiment(
  sce,
  layers_mapping = list(),
  obs_mapping = list(),
  obsm_mapping = list(X_pca = c("reducedDim", "pca"), X_umap = c("reducedDim", "umap")),
  varm_mapping = list(PCs = c("reducedDim", "pca")),
  obsp_mapping = list(),
  varp_mapping = list(),
  uns_mapping = list()
)

ad_obsm
#> AnnData object with n_obs × n_vars = 10 × 20
#>     obsm: 'X_pca', 'X_umap'
#>     varm: 'PCs'