{anndataR} allows users to convert to and from SingleCellExperiment and AnnData objects. This can be done with or without extra user input as to which fields and slots of the respective objects should be converted and should be put where. Please take into account that lossless conversion is not always possible between AnnData and SingleCellExperiment, and please inspect the object before and after inspection to ensure that all data is correctly converted.
We first generate a sample dataset to work with.
ad <- generate_dataset(
n_obs = 10L,
n_var = 20L,
x_type = "numeric_matrix",
layer_types = c("integer_matrix", "numeric_rsparse"),
obs_types = c("integer", "numeric", "factor"),
var_types = c("character", "numeric", "logical"),
obsm_types = c("numeric_matrix", "numeric_csparse"),
varm_types = c("numeric_matrix", "numeric_csparse"),
obsp_types = c("numeric_matrix", "numeric_csparse"),
varp_types = c("integer_matrix", "numeric_matrix"),
uns_types = c("vec_integer", "vec_character", "df_integer"),
format = "AnnData"
)
# add PCA reduction
ad$obsm[["X_pca"]] <- matrix(1:50, 10, 5)
ad$varm[["PCs"]] <- matrix(1:100, 20, 5)
ad$obsm[["X_umap"]] <- matrix(1:20, 10, 2)
{anndataR} will try to make a reasonable guess of
which AnnData slots should end up in which SingleCellExperiment slots. A
SingleCellExperiment object (this is converted from an AnnData object)
consists of assays
, colData
,
rowData
, metadata
, reducedDims
,
colPairs
, rowPairs
and
metadata
.
Each of these slots can be customized by the user by providing a mapping. We will go more into detail on these user-specified mappings in the mapping section.
By default, anndataR will try to guess a reasonable mapping. If you do not want this to happen, and you want nothing to be converted to a slot, you can pass an empty list.
Here, we showcase what happens if you do not provide any mapping for the conversion.
sce <- to_SingleCellExperiment(ad)
sce
#> class: SingleCellExperiment
#> dim: 20 10
#> metadata(3): vec_integer vec_character df_integer
#> assays(3): counts integer_matrix numeric_rsparse
#> rownames(20): gene1 gene2 ... gene19 gene20
#> rowData names(3): character numeric logical
#> colnames(10): cell1 cell2 ... cell9 cell10
#> colData names(3): integer numeric factor
#> reducedDimNames(2): pca umap
#> mainExpName: NULL
#> altExpNames(0):
In the following subsections, we detail how each of these implicit conversions work.
In an AnnData object, count matrices can be present in the
X
slot or in the layers
slot. In a
SingleCellExperiment object, count matrices are stored in the
assays
slot, as a named list.
By default, the X
slot and all the elements of the
layers
slot will be stored in the assays
slot
of the SingleCellExperiment object. We try to guess the name of the
X
slot: if there is no counts
layer present,
we will use the X
slot as the counts
assay. If
there is a counts
layer present, we will name the
X
slot the data
assay. All other layers are
stored as assays with their layer name.
In the below example, we will convert an AnnData object with a
counts
layer and two other layers to a SingleCellExperiment
object. In order for the implicit conversion to work, we will not
provide a layers_mapping
. We explicitly pass empty lists to
the other mapping arguments for clarity in the resulting object. They
ensure that nothing gets converted to the respective slots.
sce_layers <- to_SingleCellExperiment(
ad,
colData_mapping = list(),
rowData_mapping = list(),
reduction_mapping = list(),
colPairs_mapping = list(),
rowPairs_mapping = list(),
metadata_mapping = list()
)
sce_layers
#> class: SingleCellExperiment
#> dim: 20 10
#> metadata(0):
#> assays(3): counts integer_matrix numeric_rsparse
#> rownames(20): gene1 gene2 ... gene19 gene20
#> rowData names(0):
#> colnames(10): cell1 cell2 ... cell9 cell10
#> colData names(0):
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
We can see that indeed, the X
slot got stored as
counts
and the layers
got stored as
assays
with the same name.
A dimensionality reduction can consist of multiple parts that are
stored separately in the AnnData object. Take for example the very
common PCA
reduction. Usually, the principal components are
stored in the obsm
slot (usually called
X_pca
), the loadings in the varm
slot (usually
called PCs
) and the explained variance in the
uns
slot (usually called pca
).
In a SingleCellExperiment object, the reduced dimensions are usually
stored in the reducedDims
slot, as either an element of a
named list, or as a LinearEmbeddingMatrix
. In the first
case, only the reduced dimensions are stored, in the second case, the
loadings and associated metadata are stored as well.
We guess the mapping of the reductions the same way as we do for the
Seurat conversion: - If the obsm
slot contains a slot
starting with X_
, we will store this as a
reducedDims
slot and no associated varm
or
uns
slots. - If the obsm
slot contains a
X_pca
slot, we will also store the associated loadings (in
varm
) in a LinearEmbeddingMatrix.
sce_dimred <- to_SingleCellExperiment(
ad,
colData_mapping = list(),
rowData_mapping = list(),
colPairs_mapping = list(),
rowPairs_mapping = list(),
metadata_mapping = list()
)
reducedDims(sce_dimred)
#> List of length 2
#> names(2): pca umap
We can see that indeed, the pca
dimred got converted to
a LinearEmbeddingMatrix
, comprising of the information in
the obsm
and varm
slots. The umap
dimred got stored as a reducedDims
slot, and consists only
of the reduced dimensions in the obsm
slot.
The other SingleCellExperiment slots are easy one-to-one mappings of
AnnData slots. We will assume that all colData
is stored in
the obs
slot, all rowData
is stored in the
var
slot, all colPairs
are stored in the
obsp
slot and all rowPairs
are stored in the
varp
slot, and all metadata
is stored in the
uns
slot.
sce_implicit <- to_SingleCellExperiment(
ad,
assays_mapping = list(),
reduction_mapping = list()
)
sce_implicit
#> class: SingleCellExperiment
#> dim: 20 10
#> metadata(3): vec_integer vec_character df_integer
#> assays(0):
#> rownames(20): gene1 gene2 ... gene19 gene20
#> rowData names(3): character numeric logical
#> colnames(10): cell1 cell2 ... cell9 cell10
#> colData names(3): integer numeric factor
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
Each of the conversions can be customized, up to a point, by providing a mapping. You provide this in the form of a named list, where the names are the names of the SingleCellExperiment slots and the values are the names of the AnnData slots.
We will give an example for each of the mappings.
sce_assays <- to_SingleCellExperiment(
ad,
assays_mapping = list(counts = "X", layer1 = "integer_matrix", layer2 = "numeric_rsparse"),
colData_mapping = list(),
rowData_mapping = list(),
reduction_mapping = list(),
colPairs_mapping = list(),
rowPairs_mapping = list(),
metadata_mapping = list()
)
sce_assays
#> class: SingleCellExperiment
#> dim: 20 10
#> metadata(0):
#> assays(3): counts layer1 layer2
#> rownames(20): gene1 gene2 ... gene19 gene20
#> rowData names(0):
#> colnames(10): cell1 cell2 ... cell9 cell10
#> colData names(0):
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
You can see that the X
slot got stored as
counts
, and the layers
got stored as assays
with the names layer1
and layer2
.
sce <- to_SingleCellExperiment(
ad,
assays_mapping = list(),
colData_mapping = list(coldata1 = "integer", coldata2 = "numeric"),
rowData_mapping = list(rowdata1 = "character", rowdata2 = "logical"),
reduction_mapping = list(),
colPairs_mapping = list(colPairs_dense = "numeric_matrix", colPairs_sparse = "numeric_csparse"),
rowPairs_mapping = list(rowPairs1 = "integer_matrix", rowPairs2 = "numeric_matrix"),
metadata_mapping = list(vector1 = "vec_integer", vector2 = "vec_character", df = "df_integer")
)
sce
#> class: SingleCellExperiment
#> dim: 20 10
#> metadata(3): vector1 vector2 df
#> assays(0):
#> rownames(20): gene1 gene2 ... gene19 gene20
#> rowData names(2): rowdata1 rowdata2
#> colnames(10): cell1 cell2 ... cell9 cell10
#> colData names(2): coldata1 coldata2
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
Here you can see that the anndata slots (specified as values in the
mapping) get stored in the corresponding SingleCellExperiment slots (the
columns of which are the names of the mapping). e.g. the
integer
column of the AnnData obs
in the
AnnData object gets stored in the coldata1
column of the
colData
of the SingleCellExperiment object.
sce <- to_SingleCellExperiment(
ad,
assays_mapping = list(counts = "X"),
colData_mapping = list(),
rowData_mapping = list(),
reduction_mapping = list("pca" = list("obsm" = "X_pca", "varm" = "PCs"), "umap" = list("obsm" = "X_umap")),
colPairs_mapping = list(),
rowPairs_mapping = list(),
metadata_mapping = list()
)
sce
#> class: SingleCellExperiment
#> dim: 20 10
#> metadata(0):
#> assays(1): counts
#> rownames(20): gene1 gene2 ... gene19 gene20
#> rowData names(0):
#> colnames(10): cell1 cell2 ... cell9 cell10
#> colData names(0):
#> reducedDimNames(2): pca umap
#> mainExpName: NULL
#> altExpNames(0):
Here, we explicitly provide a mapping for the reductions
slot. We specify that we will store the reduction, characterized by the
X_pca
data in the obsm
slot and the
PCs
data in the varm
slot, as a
LinearEmbeddingMatrix
in the reducedDims
slot
under the name pca
. We will also store the reduction
characterized by the X_umap
data in the obsm
slot as a reducedDims
slot under the name
umap
.
The reverse, converting SingleCellExperiment objects to AnnData objects works in a similar way. There’s an implicit conversion, where we attempt a standard conversion, but the user can always provide an explicit mapping as well.
If there is no layer_mapping
or x_mapping
provided, we will try to guess the mapping. We will simply map all the
assays in the SingleCellExperiment object to the layers
slot of the AnnData object. Watch out: if there is no
x_mapping
provided, none of the assays
will be
stored in the X
slot of the AnnData object, and it will
remain empty.
If there is no reduction_mapping
provided, we will try
to guess the mapping. This considers both the obsm_mapping
and the varm_mapping
arguments. By default, we will not map
anything to the varm
slot, as there is no direct equivalent
in the SingleCellExperiment object. However, if the
reducedDims
slot contains a
LinearEmbeddingMatrix
, we will store the loadings in the
varm
slot.
We will store the reduced dimensions in the obsm
slot,
with the name of the reducedDims
prepended by an
X_
as the name of the obsm
slot.
The conversion of obs
, var
,
obsp
, varp
and uns
is
straightforward: there’s a one-to-one mapping between the
SingleCellExperiment slots and the AnnData slots. We assume that all
colData
is stored in the obs
slot, all
rowData
is stored in the var
slot, all
colPairs
are stored in the obsp
slot and all
rowPairs
are stored in the varp
slot, and all
metadata
is stored in the uns
slot.
It’s also possible to provide an explicit mapping for the conversion
from SingleCellExperiment to AnnData. For all of the simpler mappings,
such as layers_mapping
, obs_mapping
,
var_mapping
, obsp_mapping
,
varp_mapping
and uns_mapping
, you can provide
a named list where the names are the names of the AnnData columns and
the values are the names of the SingleCellExperiment columns in their
respective slots.
The obsm_mapping
and varm_mapping
are a bit
more complex, as they can contain multiple elements. Each of the
obsm_mapping
and varm_mapping
elements should
be a named list, where the names are the names of the AnnData columns
where the values are stored. The values should be a named list as well,
containing as a first element the name of the SingleCellExperiment slot
where the data is stored, and as a second element the name of the slot
in the SingleCellExperiment slot where the data is stored.
ad_obsm <- from_SingleCellExperiment(
sce,
layers_mapping = list(),
obs_mapping = list(),
obsm_mapping = list(X_pca = c("reducedDim", "pca"), X_umap = c("reducedDim", "umap")),
varm_mapping = list(PCs = c("reducedDim", "pca")),
obsp_mapping = list(),
varp_mapping = list(),
uns_mapping = list()
)
ad_obsm
#> AnnData object with n_obs × n_vars = 10 × 20
#> obsm: 'X_pca', 'X_umap'
#> varm: 'PCs'