--- title: "Detailed look into the mapping between AnnData and Seurat objects" author: - name: Louise Deconinck package: anndataR output: BiocStyle::html_document vignette: > %\VignetteIndexEntry{Detailed look into the mapping between AnnData and Seurat objects} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` **{anndataR}** provides a way to convert between `AnnData` and `Seurat` objects. This vignette will provide a detailed look into the mapping between the two objects. The conversion can be done with or without extra user input as to which fields and slots of the respective objects should be converted and put into the other object. Please take into account that lossless conversion is not always possible between AnnData and Seurat objects due to differences in the object structure. Have a look at our known issues section for more information and please inspect the object before and after conversion to make sure that the conversion was successful. ```{r deps} suppressMessages({ library(anndataR) library(Seurat) }) ``` We will first generate a sample dataset to work with: ```{r construct_anndata} ad <- generate_dataset( n_obs = 10L, n_var = 20L, x_type = "numeric_matrix", layer_types = c("integer_matrix", "numeric_rsparse"), obs_types = c("integer", "numeric", "factor"), var_types = c("character", "numeric", "logical"), obsm_types = c("numeric_matrix", "numeric_csparse"), varm_types = c("numeric_matrix", "numeric_csparse"), obsp_types = c("numeric_matrix", "numeric_csparse"), varp_types = c("integer_matrix", "numeric_matrix"), uns_types = c("vec_integer", "vec_character", "df_integer"), format = "AnnData" ) # add PCA reduction ad$obsm[["X_pca"]] <- matrix(1:50, 10, 5) ad$varm[["PCs"]] <- matrix(1:100, 20, 5) ad$obsm[["X_umap"]] <- matrix(1:20, 10, 2) ad$obsp[["connectivities"]] <- ad$obsp[["numeric_matrix"]] ad$obsp[["connectivities_sparse"]] <- ad$obsp[["numeric_csparse"]] ``` # Convert AnnData objects to Seurat objects By default, anndataR will try to guess a reasonable mapping. If you do not want this to happen, and you want nothing of that specific slot to be converted, you can pass an empty list. We will first detail how anndataR guesses a reasonable mapping. In the section after that, we will detail how you can provide your own mappings. ## Implicit mapping Here, we showcase what happens if you do not provide any mapping information for the conversion. ```{r to_Seurat_implicit} seurat_obj <- to_Seurat(ad) seurat_obj ``` In the following subsections, we detail how each of these implicit conversions work. If we explicitly don't want to convert a certain slot, we can pass an empty list. ### assay_name An anndata object can only have one assay, while a Seurat object can have multiple assays. If you want to work with multiple assays, you can use (https://mudata.readthedocs.io/en/latest/)[MuData] objects. The `assay_name` parameter will only determine what the name of the assay in the Seurat object will be. By default, this will be "RNA". ```{r to_Seurat explicit} to_Seurat( adata = ad, assay_name = "Other_assay", object_metadata_mapping = list(), layers_mapping = list(counts = NULL), assay_metadata_mapping = list(), reduction_mapping = list(), graph_mapping = list(), misc_mapping = list() ) ``` ### object_metadata_mapping Seurat allows for metadata to be stored on the object level, corresponding to cell (or observation) level metadata. This corresponds with the `obs` slot in the AnnData object. If no information is provided, anndataR will try to convert the whole `obs` slot. You can see that Seurat adds some columns by default (`orig.ident`, `nCount_RNA`, `nFeature_RNA`). ```{r to_Seurat_implicit_object_metadata} seurat_object <- to_Seurat( adata = ad, assay_name = "RNA", layers_mapping = list(counts = NULL), assay_metadata_mapping = list(), reduction_mapping = list(), graph_mapping = list(), misc_mapping = list() ) seurat_object[[]] ``` ### layers_mapping Seurat allows for multiple layers to be stored in the object. This corresponds with the `layers` slot in the AnnData object. If no mapping is provided, we will try to guess which layer the `X` anndata slot corresponds to by checking whether another layer called `counts` or `data` exists in the AnnData object. The other layers of the AnnData object will be copied by name. ```{r to_Seurat_implicit_layers} seurat_object <- to_Seurat( adata = ad, assay_name = "RNA", object_metadata_mapping = list(), assay_metadata_mapping = list(), reduction_mapping = list(), graph_mapping = list(), misc_mapping = list() ) Layers(seurat_object[["RNA"]]) ``` ### assay_metadata_mapping Besides metadata on the object level, Seurat also stores metadata on the assay level. This corresponds to gene (or variable or feature) level metadata. This corresponds with the `var` slot in the AnnData object. If no information is provided, anndataR will try to convert the whole `var` slot. ```{r to_Seurat_implicit_assay_metadata} seurat_object <- to_Seurat( adata = ad, assay_name = "RNA", object_metadata_mapping = list(), layers_mapping = list(counts = NULL), reduction_mapping = list(), graph_mapping = list(), misc_mapping = list() ) seurat_object[["RNA"]][[]] ``` ### reduction_mapping Reductions are a bit more complicated. Some reductions have a standard way of being saved in an AnnData objecct, some don't. We assume that a PCA will always be stored as `X_pca` in the `obsm` slot, with the loadings as `PCs` in the `varm` slot. For other reductions, we will assume that they will start with `X_` in the `obsm` slot, with the characters after the `X_` being the name of the reduction. Elements in the `obsm` slot not starting with `X_` will not be converted. ```{r to_Seurat_implicit_reduction} seurat_object <- to_Seurat( adata = ad, assay_name = "RNA", object_metadata_mapping = list(), layers_mapping = list(counts = NULL), assay_metadata_mapping = list(), graph_mapping = list(), misc_mapping = list() ) Reductions(seurat_object) ``` ### graph_mapping Graphs of cells are stored in the `obsp` slot in the AnnData object. If a graph is stored with the name `connectivities`, we will store it as `nn` in the Seurat object. Graphs starting with `connectivities_{name}` will be stored as `{name}` in the Seurat object. Other graphs will not be converted. ```{r to_Seurat_implicit_graph} seurat_object <- to_Seurat( adata = ad, assay_name = "RNA", object_metadata_mapping = list(), layers_mapping = list(counts = NULL), assay_metadata_mapping = list(), reduction_mapping = list(), misc_mapping = list() ) Graphs(seurat_object) ``` ### misc_mapping The misc slot in the Seurat object is used to store any other information that does not fit in the other slots. By default, it will copy the `uns` slot from the AnnData object. ```{r to_Seurat_implicit_misc} seurat_object <- to_Seurat( adata = ad, assay_name = "RNA", object_metadata_mapping = list(), layers_mapping = list(counts = NULL), assay_metadata_mapping = list(), reduction_mapping = list(), graph_mapping = list() ) Misc(seurat_object) ``` ## Explicit mapping ### assay_name As stated in the section on implicit mapping, an anndata object can only have one assay, while a Seurat object can have multiple assays. The name you pass here will only determine what the assay is called in the Seurat object. By default, it is "RNA". ```{r to_Seurat explicit_assay_name} seurat_obj <- to_Seurat( adata = ad, assay_name = "Other_assay" ) Assays(seurat_obj) ``` ### layers_mapping We can explicitly provide a layers mapping. The names of the named list refer to what the layer will be called in the Seurat object. The values refer to what the layers are called in the anndata object. You must provide at least a layer called `counts` or `data` in the mapping. ```{r to_Seurat explicit_layers} seurat_obj <- to_Seurat( adata = ad, layers_mapping = list(counts = "integer_matrix", other_layer2 = "numeric_rsparse") ) Layers(seurat_obj, assay = "RNA") ``` ### object_metadata_mapping You can provide a mapping for object-level metadata in Seurat, which corresponds to cell-level metadata, and thus with the `obs` slot in the AnnData object. The names of the named list refer to what the metadata will be called in the Seurat object. The values refer to what the columns are called in the `obs` slot of the anndata object. ```{r to_Seurat explicit_object_metadata} seurat_obj <- to_Seurat( adata = ad, object_metadata_mapping = list(metadata1 = "integer", metadata2 = "numeric") ) seurat_obj[[]] ``` ### assay_metadata_mapping You can provide a mapping for assay-level metadata, which corresponds to gene-level metadata, and with the `var` slot in the AnnData object. The names of the named list refer to what the metadata will be called in the Seurat object. The values refer to what the metadata is called in the `var` slot of the anndata object. ```{r to_Seurat explicit_assay_metadata} seurat_obj <- to_Seurat( adata = ad, assay_metadata_mapping = list(metadata1 = "numeric", metadata2 = "logical") ) seurat_obj[["RNA"]][[]] ``` ### reduction_mapping You can provide a mapping for the `reductions` slot. The names of the named list refer to what the reduction will be called. The values must be a named list, containing the keys: `key`, `obsm` and maybe `varm`, if appropriate for the reduction. ```{r to_Seurat explicit_reduction} seurat_obj <- to_Seurat( adata = ad, reduction_mapping = list( pca = list(key = "PC", obsm = "X_pca", varm = "PCs"), umap = list(key = "UMAP", obsm = "X_umap") ) ) Reductions(seurat_obj) ``` ### graph_mapping You can provide a mapping for the `graph` slot. The names of the named list refer to what the graph will be called in the Seurat object. The values refer to names in the `obsp` slot of the anndata object. stored in the `obsp` slot. ```{r to_Seurat explicit_graph} seurat_obj <- to_Seurat( adata = ad, graph_mapping = list(connectivities = "numeric_matrix", connectivities_sparse = "numeric_csparse") ) Graphs(seurat_obj) ``` ### misc_mapping You must provide a named list. Names of the list correspond to the name the item will be assigned to in the `misc` of the Seurat object). Items can be a vector with two elements. The first is the name of a slot in the AnnData object (e.g. `X`, `layers`, `obs`, `var`, `obsm`, `varm`, `obsp`, `varp`, `uns`) and the second is the name of the item to transfer from that slot. ```{r to_Seurat explicit_misc1} seurat_obj <- to_Seurat( adata = ad, misc_mapping = list(varp_neighbors = c("varp", "integer_matrix"), uns_vec = c("uns", "vec_integer")) ) Misc(seurat_obj) ``` Giving an item with one element can be used to copy the whole AnnData slot to `misc`. ```{r to_Seurat explicit_misc2} seurat_obj <- to_Seurat( adata = ad, misc_mapping = list(metadata_from_anndata = "uns", varp = "varp") ) Misc(seurat_obj) ``` Of course you can also mix these approaches in a single mapping. ```{r to_Seurat explicit_misc3} seurat_obj <- to_Seurat( adata = ad, misc_mapping = list(metadata_from_anndata = "uns", obs_num = c("obs", "numeric")) ) Misc(seurat_obj) ``` # Convert Seurat objects to AnnData objects ## Implicit mapping The reverse, converting Seurat objects to AnnData objects, works in a similar way. There is an implicit conversion, where we attempt a standard conversion, but the user can always provide an explicit mapping. ```{r create_Seurat} suppressWarnings({ counts <- matrix(rbinom(600, 20000, .001), nrow = 20) obj <- CreateSeuratObject(counts = counts) |> NormalizeData() |> FindVariableFeatures() |> ScaleData() |> RunPCA(npcs = 10L) |> FindNeighbors() |> RunUMAP(dims = 1:10) obj@misc[["data"]] <- "some data" }) ``` ```{r from_Seurat_implicit} ad <- from_Seurat(obj) ad ``` ### assay_name The `assay_name` argument determines which assay in the Seurat object will be converted to the AnnData object. By default it's the `DefaultAssay`. ```{r fsi_assay_name} ad <- from_Seurat( obj, assay_name = NULL, layers_mapping = list(), obs_mapping = list(), var_mapping = list(), obsm_mapping = list(), varm_mapping = list(), obsp_mapping = list(), varp_mapping = list(), uns_mapping = list() ) ad ``` ### x_mapping By default, no data will be put into the `X` slot of the AnnData object. If you want to put the data from a Seurat layer, such as `counts` or `data` into the `X` slot, you need to provide a mapping. ```{r fsi_x_mapping} ad <- from_Seurat( obj, assay_name = "RNA", layers_mapping = list(), obs_mapping = list(), var_mapping = list(), obsm_mapping = list(), varm_mapping = list(), obsp_mapping = list(), varp_mapping = list(), uns_mapping = list() ) ad$X ``` As you can see, by default, no information was copied to the `X` slot of the AnnData object. ### layers_mapping By default, all layers in the Seurat object will be copied to the AnnData object. This means that the `X` slot will be `NULL` (empty). ```{r fsi_layers_mapping} ad <- from_Seurat( obj, obs_mapping = list(), var_mapping = list(), obsm_mapping = list(), varm_mapping = list(), obsp_mapping = list(), varp_mapping = list(), uns_mapping = list() ) names(ad$layers) ``` ### obs_mapping The obs slot is used to store observation-level annotations, such as cell-level metadata. By default, all information in the `meta.data` slot of the Seurat object is copied to the `obs` slot of the AnnData object. ```{r fsi_obs_mapping} ad <- from_Seurat( obj, layers_mapping = list(), var_mapping = list(), obsm_mapping = list(), varm_mapping = list(), obsp_mapping = list(), varp_mapping = list(), uns_mapping = list() ) names(ad$obs) ``` ### var_mapping The var slot is used to store feature-level annotations, such as gene-level metadata. By default, all information in the `meta.data` slot of the specified assay of the Seurat object is copied to the `var` slot of the AnnData object. ```{r fsi_var_mapping} ad <- from_Seurat( obj, layers_mapping = list(), obs_mapping = list(), obsm_mapping = list(), varm_mapping = list(), obsp_mapping = list(), varp_mapping = list(), uns_mapping = list() ) names(ad$var) ``` ### obsm_mapping The obsm slot is used to store multidimensional observation-level annotations, such as dimensionality reductions. By default, it will prefix the name of all Seurat reductions with `X_`. ```{r fsi_obsm_mapping} ad <- from_Seurat( obj, layers_mapping = list(), obs_mapping = list(), var_mapping = list(), varm_mapping = list(), obsp_mapping = list(), varp_mapping = list(), uns_mapping = list() ) ad ``` ### varm_mapping The varm slot is used to store multidimensional feature-level annotations. This is mainly used to store the loadings of dimensionality reductions. By default, all loadings of dimensionality reductions will be copied to the `varm` slot of the AnnData object. ```{r fsi_varm_mapping} ad <- from_Seurat( obj, layers_mapping = list(), obs_mapping = list(), var_mapping = list(), obsm_mapping = list(), obsp_mapping = list(), varp_mapping = list(), uns_mapping = list() ) ad$varm ``` ### obsp_mapping The obsp slot is used to store pairwise relations between observations. This is mainly used to store the connectivities of a graph. By default, all Seurat graphs are copied to the `obsp` slot of the AnnData object. ```{r fsi_obsp_mapping} ad <- from_Seurat( obj, layers_mapping = list(), obs_mapping = list(), var_mapping = list(), obsm_mapping = list(), varm_mapping = list(), varp_mapping = list(), uns_mapping = list() ) ad$obsp ``` ### varp_mapping The varp slot is used to store pairwise relations between features. By default, no information is copied to the `varp` slot of the AnnData object. ```{r fsi_varp_mapping} ad <- from_Seurat( obj, layers_mapping = list(), obs_mapping = list(), var_mapping = list(), obsm_mapping = list(), varm_mapping = list(), obsp_mapping = list(), uns_mapping = list() ) ad$varp ``` ### uns_mapping The uns slot is used for any non-structured miscellaneous data. By default all information in the `misc` slot of the Seurat object is copied to the `uns` slot of the AnnData object. ```{r fsi_uns_mapping} ad <- from_Seurat( obj, layers_mapping = list(), obs_mapping = list(), var_mapping = list(), obsm_mapping = list(), varm_mapping = list(), obsp_mapping = list(), varp_mapping = list(), ) ad$uns ``` ## Explicit mapping It's also possible to provide an explicit mapping for the conversion from Seurat to AnnData objects. For `layers_mapping`, `obs_mapping` and `var_mapping`, you can provide a named list with the names of the items in the AnnData object as names and the names of the items in the Seurat object as values. ```{r fse_explicit} ad <- from_Seurat( obj, layers_mapping = list(counts = "counts", data = "data"), obs_mapping = list(metadata1 = "orig.ident", metadata2 = "nCount_RNA"), var_mapping = list(metadata_mean = "vf_vst_counts_mean", metadata_variance = "vf_vst_counts_variance"), obsm_mapping = list(), varm_mapping = list(), obsp_mapping = list(), varp_mapping = list(), uns_mapping = list() ) ad ``` ### obsm_mapping, varm_mapping, obsp_mapping, varp_mapping, uns_mapping The mappings for `obsm`, `varm` and `obsp` all work in the same way. You need to provide a named list, where the name of the list refers to the name of the item in that slot of the AnnData object. The values of the list must themselves be vectors of length 2 (for `obsm`) or 2 or 3 (for `varm`, `obsp`, `varp` and `uns`). The first element of the vector refers to a slot in the the Seurat object and the second element refers to the name of the item in that slot. The third element (if provided) refers to the name of a column in that item. For `obsm`, the first element must be `reductions` or `misc`. For `varm`, the first element must be `reductions` or `misc`. For `obsp`, the first element must be `graphs` or `misc`. For `varp`, the first element must be `misc`. For `uns`, the first element must be `misc`. ```{r fse_explicit_obsm} ad <- from_Seurat( obj, layers_mapping = list(), obs_mapping = list(), var_mapping = list(), obsm_mapping = list(X_pca = c("reductions", "pca"), X_umap = c("reductions", "umap")), varm_mapping = list(PCs = c("reductions", "pca")), obsp_mapping = list(connectivities = c("graphs", "RNA_nn")), varp_mapping = list(), uns_mapping = list(extra_data = c("misc", "data")) ) ad ```