--- title: "Detailed look into the mapping between AnnData and SingleCellExperiment objects" author: - name: Louise Deconinck package: anndataR output: BiocStyle::html_document vignette: > %\VignetteIndexEntry{Detailed look into the mapping between AnnData and SingleCellExperiment objects} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` **{anndataR}** allows users to convert to and from SingleCellExperiment and AnnData objects. This can be done with or without extra user input as to which fields and slots of the respective objects should be converted and should be put where. Please take into account that lossless conversion is not always possible between AnnData and SingleCellExperiment, and please inspect the object before and after inspection to ensure that all data is correctly converted. ```{r deps, message=FALSE, warning=FALSE} library(anndataR) library(SingleCellExperiment) ``` We first generate a sample dataset to work with. ```{r construct_anndata} ad <- generate_dataset( n_obs = 10L, n_var = 20L, x_type = "numeric_matrix", layer_types = c("integer_matrix", "numeric_rsparse"), obs_types = c("integer", "numeric", "factor"), var_types = c("character", "numeric", "logical"), obsm_types = c("numeric_matrix", "numeric_csparse"), varm_types = c("numeric_matrix", "numeric_csparse"), obsp_types = c("numeric_matrix", "numeric_csparse"), varp_types = c("integer_matrix", "numeric_matrix"), uns_types = c("vec_integer", "vec_character", "df_integer"), format = "AnnData" ) # add PCA reduction ad$obsm[["X_pca"]] <- matrix(1:50, 10, 5) ad$varm[["PCs"]] <- matrix(1:100, 20, 5) ad$obsm[["X_umap"]] <- matrix(1:20, 10, 2) ``` # Convert AnnData objects to SingleCellExperiment objects ## Implicit conversion **{anndataR}** will try to make a reasonable guess of which AnnData slots should end up in which SingleCellExperiment slots. A SingleCellExperiment object (this is converted from an AnnData object) consists of `assays`, `colData`, `rowData`, `metadata`, `reducedDims`, `colPairs`, `rowPairs` and `metadata`. Each of these slots can be customized by the user by providing a mapping. We will go more into detail on these user-specified mappings in the mapping section. By default, anndataR will try to guess a reasonable mapping. If you do not want this to happen, and you want nothing to be converted to a slot, you can pass an empty list. Here, we showcase what happens if you do not provide any mapping for the conversion. ```{r implicit} sce <- to_SingleCellExperiment(ad) sce ``` In the following subsections, we detail how each of these implicit conversions work. ### assays In an AnnData object, count matrices can be present in the `X` slot or in the `layers` slot. In a SingleCellExperiment object, count matrices are stored in the `assays` slot, as a named list. By default, the `X` slot and all the elements of the `layers` slot will be stored in the `assays` slot of the SingleCellExperiment object. We try to guess the name of the `X` slot: if there is no `counts` layer present, we will use the `X` slot as the `counts` assay. If there is a `counts` layer present, we will name the `X` slot the `data` assay. All other layers are stored as assays with their layer name. In the below example, we will convert an AnnData object with a `counts` layer and two other layers to a SingleCellExperiment object. In order for the implicit conversion to work, we will not provide a `layers_mapping`. We explicitly pass empty lists to the other mapping arguments for clarity in the resulting object. They ensure that nothing gets converted to the respective slots. ```{r implicit_layers} sce_layers <- to_SingleCellExperiment( ad, colData_mapping = list(), rowData_mapping = list(), reduction_mapping = list(), colPairs_mapping = list(), rowPairs_mapping = list(), metadata_mapping = list() ) sce_layers ``` We can see that indeed, the `X` slot got stored as `counts` and the `layers` got stored as `assays` with the same name. ### reductions A dimensionality reduction can consist of multiple parts that are stored separately in the AnnData object. Take for example the very common `PCA` reduction. Usually, the principal components are stored in the `obsm` slot (usually called `X_pca`), the loadings in the `varm` slot (usually called `PCs`) and the explained variance in the `uns` slot (usually called `pca`). In a SingleCellExperiment object, the reduced dimensions are usually stored in the `reducedDims` slot, as either an element of a named list, or as a `LinearEmbeddingMatrix`. In the first case, only the reduced dimensions are stored, in the second case, the loadings and associated metadata are stored as well. We guess the mapping of the reductions the same way as we do for the Seurat conversion: - If the `obsm` slot contains a slot starting with `X_`, we will store this as a `reducedDims` slot and no associated `varm` or `uns` slots. - If the `obsm` slot contains a `X_pca` slot, we will also store the associated loadings (in `varm`) in a LinearEmbeddingMatrix. ```{r implicit_pca} sce_dimred <- to_SingleCellExperiment( ad, colData_mapping = list(), rowData_mapping = list(), colPairs_mapping = list(), rowPairs_mapping = list(), metadata_mapping = list() ) reducedDims(sce_dimred) ``` We can see that indeed, the `pca` dimred got converted to a `LinearEmbeddingMatrix`, comprising of the information in the `obsm` and `varm` slots. The `umap` dimred got stored as a `reducedDims` slot, and consists only of the reduced dimensions in the `obsm` slot. ### colData, rowData, colPairs, rowPairs, metadata The other SingleCellExperiment slots are easy one-to-one mappings of AnnData slots. We will assume that all `colData` is stored in the `obs` slot, all `rowData` is stored in the `var` slot, all `colPairs` are stored in the `obsp` slot and all `rowPairs` are stored in the `varp` slot, and all `metadata` is stored in the `uns` slot. ```{r implicit_easy} sce_implicit <- to_SingleCellExperiment( ad, assays_mapping = list(), reduction_mapping = list() ) sce_implicit ``` ## Explicit conversion Each of the conversions can be customized, up to a point, by providing a mapping. You provide this in the form of a named list, where the names are the names of the SingleCellExperiment slots and the values are the names of the AnnData slots. We will give an example for each of the mappings. ### assays_mapping ```{r explicit_assays} sce_assays <- to_SingleCellExperiment( ad, assays_mapping = list(counts = "X", layer1 = "integer_matrix", layer2 = "numeric_rsparse"), colData_mapping = list(), rowData_mapping = list(), reduction_mapping = list(), colPairs_mapping = list(), rowPairs_mapping = list(), metadata_mapping = list() ) sce_assays ``` You can see that the `X` slot got stored as `counts`, and the `layers` got stored as assays with the names `layer1` and `layer2`. ### colData, rowData, colPairs, rowPairs, metadata ```{r explicit col_data} sce <- to_SingleCellExperiment( ad, assays_mapping = list(), colData_mapping = list(coldata1 = "integer", coldata2 = "numeric"), rowData_mapping = list(rowdata1 = "character", rowdata2 = "logical"), reduction_mapping = list(), colPairs_mapping = list(colPairs_dense = "numeric_matrix", colPairs_sparse = "numeric_csparse"), rowPairs_mapping = list(rowPairs1 = "integer_matrix", rowPairs2 = "numeric_matrix"), metadata_mapping = list(vector1 = "vec_integer", vector2 = "vec_character", df = "df_integer") ) sce ``` Here you can see that the anndata slots (specified as values in the mapping) get stored in the corresponding SingleCellExperiment slots (the columns of which are the names of the mapping). e.g. the `integer` column of the AnnData `obs` in the AnnData object gets stored in the `coldata1` column of the `colData` of the SingleCellExperiment object. ### reductions_mapping ```{r explicit_reductions} sce <- to_SingleCellExperiment( ad, assays_mapping = list(counts = "X"), colData_mapping = list(), rowData_mapping = list(), reduction_mapping = list("pca" = list("obsm" = "X_pca", "varm" = "PCs"), "umap" = list("obsm" = "X_umap")), colPairs_mapping = list(), rowPairs_mapping = list(), metadata_mapping = list() ) sce ``` Here, we explicitly provide a mapping for the `reductions` slot. We specify that we will store the reduction, characterized by the `X_pca` data in the `obsm` slot and the `PCs` data in the `varm` slot, as a `LinearEmbeddingMatrix` in the `reducedDims` slot under the name `pca`. We will also store the reduction characterized by the `X_umap` data in the `obsm` slot as a `reducedDims` slot under the name `umap`. # Convert SingleCellExperiment objects to AnnData objects The reverse, converting SingleCellExperiment objects to AnnData objects works in a similar way. There's an implicit conversion, where we attempt a standard conversion, but the user can always provide an explicit mapping as well. ```{r construct sce} ad <- from_SingleCellExperiment(sce) ``` ## Implicit conversion ### layers If there is no `layer_mapping` or `x_mapping` provided, we will try to guess the mapping. We will simply map all the assays in the SingleCellExperiment object to the `layers` slot of the AnnData object. Watch out: if there is no `x_mapping` provided, none of the `assays` will be stored in the `X` slot of the AnnData object, and it will remain empty. ```{r to_ad_implicit_assays} ad_assays <- from_SingleCellExperiment( sce, obs_mapping = list(), var_mapping = list(), obsm_mapping = list(), varm_mapping = list(), obsp_mapping = list(), varp_mapping = list(), uns_mapping = list() ) ad_assays ``` ### obsm and varm If there is no `reduction_mapping` provided, we will try to guess the mapping. This considers both the `obsm_mapping` and the `varm_mapping` arguments. By default, we will not map anything to the `varm` slot, as there is no direct equivalent in the SingleCellExperiment object. However, if the `reducedDims` slot contains a `LinearEmbeddingMatrix`, we will store the loadings in the `varm` slot. We will store the reduced dimensions in the `obsm` slot, with the name of the `reducedDims` prepended by an `X_` as the name of the `obsm` slot. ```{r to_ad_implicit_reductions} ad_reductions <- from_SingleCellExperiment( sce, obs_mapping = list(), var_mapping = list(), obsp_mapping = list(), varp_mapping = list(), uns_mapping = list() ) ad_reductions ``` ### obs, var, obsp, varp, uns The conversion of `obs`, `var`, `obsp`, `varp` and `uns` is straightforward: there's a one-to-one mapping between the SingleCellExperiment slots and the AnnData slots. We assume that all `colData` is stored in the `obs` slot, all `rowData` is stored in the `var` slot, all `colPairs` are stored in the `obsp` slot and all `rowPairs` are stored in the `varp` slot, and all `metadata` is stored in the `uns` slot. ```{r implicit_all} ad <- from_SingleCellExperiment( sce, x_mapping = "counts" ) ad ``` ## Explicit conversion It's also possible to provide an explicit mapping for the conversion from SingleCellExperiment to AnnData. For all of the simpler mappings, such as `layers_mapping`, `obs_mapping`, `var_mapping`, `obsp_mapping`, `varp_mapping` and `uns_mapping`, you can provide a named list where the names are the names of the AnnData columns and the values are the names of the SingleCellExperiment columns in their respective slots. The `obsm_mapping` and `varm_mapping` are a bit more complex, as they can contain multiple elements. Each of the `obsm_mapping` and `varm_mapping` elements should be a named list, where the names are the names of the AnnData columns where the values are stored. The values should be a named list as well, containing as a first element the name of the SingleCellExperiment slot where the data is stored, and as a second element the name of the slot in the SingleCellExperiment slot where the data is stored. ```{r to_ad_explicit_obsm} ad_obsm <- from_SingleCellExperiment( sce, layers_mapping = list(), obs_mapping = list(), obsm_mapping = list(X_pca = c("reducedDim", "pca"), X_umap = c("reducedDim", "umap")), varm_mapping = list(PCs = c("reducedDim", "pca")), obsp_mapping = list(), varp_mapping = list(), uns_mapping = list() ) ad_obsm ```