Read/write SingleCellExperiment objects using anndataR

This vignette demonstrates how to read and write SingleCellExperiment objects using the {anndataR} package, leveraging the interoperability between SingleCellExperiment and the AnnData format.

Check out ?anndataR for a full list of the functions provided by this package.

Introduction

SingleCellExperiment is a widely used class for storing single-cell data in R, especially within the Bioconductor ecosystem. {anndataR} enables conversion between SingleCellExperiment objects and AnnData objects, allowing you to leverage the strengths of both the scverse and Bioconductor ecosystems.

Prerequisites

Before you begin, make sure you have both SingleCellExperiment and {anndataR} installed. You can install them using the following code:

if (!requireNamespace("pak", quietly = TRUE)) {
    install.packages("pak")
}
pak::pak(c("SingleCellExperiment", "SummarizedExperiment"))
pak::pak("scverse/anndataR")

Converting an AnnData Object to a SingleCellExperiment Object

Using an example .h5ad file included in the package, we will demonstrate how to read an .h5ad file and convert it to a SingleCellExperiment object.

library(anndataR)
library(SingleCellExperiment)

h5ad_file <- system.file("extdata", "example.h5ad", package = "anndataR")

Read the .h5ad file:

adata <- read_h5ad(h5ad_file)
adata
#> AnnData object with n_obs × n_vars = 50 × 100
#>     obs: 'Float', 'FloatNA', 'Int', 'IntNA', 'Bool', 'BoolNA', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'leiden'
#>     var: 'String', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
#>     uns: 'Bool', 'BoolNA', 'Category', 'DataFrameEmpty', 'Int', 'IntNA', 'IntScalar', 'Sparse1D', 'String', 'String2D', 'StringScalar', 'hvg', 'leiden', 'log1p', 'neighbors', 'pca', 'rank_genes_groups', 'umap'
#>     obsm: 'X_pca', 'X_umap'
#>     varm: 'PCs'
#>     layers: 'counts', 'csc_counts', 'dense_X', 'dense_counts'
#>     obsp: 'connectivities', 'distances'

Convert to a SingleCellExperiment object:

sce_obj <- adata$to_SingleCellExperiment()
sce_obj
#> class: SingleCellExperiment 
#> dim: 100 50 
#> metadata(18): Bool BoolNA ... rank_genes_groups umap
#> assays(5): data counts csc_counts dense_X dense_counts
#> rownames(100): Gene000 Gene001 ... Gene098 Gene099
#> rowData names(11): String n_cells_by_counts ... dispersions
#>   dispersions_norm
#> colnames(50): Cell000 Cell001 ... Cell048 Cell049
#> colData names(11): Float FloatNA ... log1p_total_counts leiden
#> reducedDimNames(2): pca umap
#> mainExpName: NULL
#> altExpNames(0):

Note that there is no one-to-one mapping possible between the AnnData and SingleCellExperiment data structures, so some information might be lost during conversion. It is recommended to carefully inspect the converted object to ensure that all necessary information has been transferred.

See the in-depth vignette on conversion to and from SingleCellExperiment objects for more details on how to customize the conversion process. For instance:

adata$to_SingleCellExperiment(
  assays_mapping = list(),
  colData_mapping = list(coldata1 = "integer", coldata2 = "numeric"),
  rowData_mapping = list(rowdata1 = "character", rowdata2 = "logical")
)

Convert a SingleCellExperiment Object to an AnnData Object

Here’s an example demonstrating how to create a SingleCellExperiment object from scratch, then convert it to AnnData and save it as .h5ad

counts <- matrix(rbinom(20000, 1000, .001), nrow = 100)
sce_obj <- SingleCellExperiment(list(counts = counts))

sce_obj
#> class: SingleCellExperiment 
#> dim: 100 200 
#> metadata(0):
#> assays(1): counts
#> rownames: NULL
#> rowData names(0):
#> colnames: NULL
#> colData names(0):
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):

You can convert the SingleCellExperiment object to an AnnData object using the from_SingleCellExperiment function:

adata <- from_SingleCellExperiment(sce_obj)
adata
#> AnnData object with n_obs × n_vars = 200 × 100
#>     layers: 'counts'

Again note that there is no one-to-one mapping possible between the AnnData and SingleCellExperiment data structures, so some information might be lost during conversion. It is recommended to carefully inspect the converted object to ensure that all necessary information has been transferred.

See the in-depth vignette on conversion to and from SingleCellExperiment objects for more details on how to customize the conversion process. Example:

from_SingleCellExperiment(
  sce_obj,
  x_mapping = "counts",
  layers_mapping = list()
)
#> AnnData object with n_obs × n_vars = 200 × 100

Session info

sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] Seurat_5.2.1                tidyr_1.3.1                
#>  [3] dplyr_1.1.4                 purrr_1.0.4                
#>  [5] stringr_1.5.1               rprojroot_2.0.4            
#>  [7] knitr_1.49                  tibble_3.2.1               
#>  [9] rmarkdown_2.29              anndataR_0.99.0            
#> [11] SingleCellExperiment_1.29.1 SummarizedExperiment_1.37.0
#> [13] Biobase_2.67.0              GenomicRanges_1.59.1       
#> [15] GenomeInfoDb_1.43.4         IRanges_2.41.2             
#> [17] S4Vectors_0.45.2            BiocGenerics_0.53.6        
#> [19] generics_0.1.3              MatrixGenerics_1.19.1      
#> [21] matrixStats_1.5.0           SeuratObject_5.0.2         
#> [23] sp_2.2-0                    BiocStyle_2.35.0           
#> 
#> loaded via a namespace (and not attached):
#>   [1] RColorBrewer_1.1-3      sys_3.4.3               jsonlite_1.8.9         
#>   [4] magrittr_2.0.3          spatstat.utils_3.1-2    farver_2.1.2           
#>   [7] vctrs_0.6.5             ROCR_1.0-11             spatstat.explore_3.3-4 
#>  [10] htmltools_0.5.8.1       S4Arrays_1.7.2          SparseArray_1.7.5      
#>  [13] sass_0.4.9              sctransform_0.4.1       parallelly_1.42.0      
#>  [16] KernSmooth_2.23-26      bslib_0.9.0             htmlwidgets_1.6.4      
#>  [19] ica_1.0-3               plyr_1.8.9              plotly_4.10.4          
#>  [22] zoo_1.8-12              cachem_1.1.0            buildtools_1.0.0       
#>  [25] igraph_2.1.4            mime_0.12               lifecycle_1.0.4        
#>  [28] pkgconfig_2.0.3         Matrix_1.7-2            R6_2.5.1               
#>  [31] fastmap_1.2.0           GenomeInfoDbData_1.2.13 fitdistrplus_1.2-2     
#>  [34] future_1.34.0           shiny_1.10.0            digest_0.6.37          
#>  [37] colorspace_2.1-1        patchwork_1.3.0         tensor_1.5             
#>  [40] RSpectra_0.16-2         irlba_2.3.5.1           progressr_0.15.1       
#>  [43] spatstat.sparse_3.1-0   polyclip_1.10-7         httr_1.4.7             
#>  [46] abind_1.4-8             compiler_4.4.2          withr_3.0.2            
#>  [49] bit64_4.6.0-1           fastDummies_1.7.5       MASS_7.3-64            
#>  [52] DelayedArray_0.33.5     tools_4.4.2             lmtest_0.9-40          
#>  [55] httpuv_1.6.15           future.apply_1.11.3     goftest_1.2-3          
#>  [58] glue_1.8.0              nlme_3.1-167            promises_1.3.2         
#>  [61] grid_4.4.2              Rtsne_0.17              cluster_2.1.8          
#>  [64] reshape2_1.4.4          hdf5r_1.3.12            spatstat.data_3.1-4    
#>  [67] gtable_0.3.6            data.table_1.16.4       XVector_0.47.2         
#>  [70] spatstat.geom_3.3-5     RcppAnnoy_0.0.22        ggrepel_0.9.6          
#>  [73] RANN_2.6.2              pillar_1.10.1           spam_2.11-1            
#>  [76] RcppHNSW_0.6.0          later_1.4.1             splines_4.4.2          
#>  [79] lattice_0.22-6          deldir_2.0-4            survival_3.8-3         
#>  [82] bit_4.5.0.1             tidyselect_1.2.1        maketools_1.3.1        
#>  [85] miniUI_0.1.1.1          pbapply_1.7-2           gridExtra_2.3          
#>  [88] scattermore_1.2         xfun_0.50               stringi_1.8.4          
#>  [91] UCSC.utils_1.3.1        lazyeval_0.2.2          yaml_2.3.10            
#>  [94] evaluate_1.0.3          codetools_0.2-20        BiocManager_1.30.25    
#>  [97] cli_3.6.3               uwot_0.2.2              xtable_1.8-4           
#> [100] reticulate_1.40.0       munsell_0.5.1           jquerylib_0.1.4        
#> [103] Rcpp_1.0.14             spatstat.random_3.3-2   globals_0.16.3         
#> [106] png_0.1-8               spatstat.univar_3.1-1   parallel_4.4.2         
#> [109] ggplot2_3.5.1           dotCall64_1.2           listenv_0.9.1          
#> [112] viridisLite_0.4.2       scales_1.3.0            ggridges_0.5.6         
#> [115] crayon_1.5.3            rlang_1.1.5             cowplot_1.1.3