Known issues

This document lists known issues with the package and suggests possible solutions.

Issue: None’s are being dropped from uns

  • Affected backend: HDF5AnnData
  • Affected slot(s): uns, uns_nested
  • Affected dtype(s): empty, none
  • Probable cause: read
  • To investigate: TRUE
  • To fix: TRUE

Error message

Error: names(adata_r$uns) (`actual`) not equal to reticulate::py_to_r(adata_py$uns) (`expected`).

`actual` is NULL
`expected` is a list

Proposed solution

Debug and fix

Issue: Python object is not being converted correctly.

  • Affected backend: HDF5AnnData
  • Affected slot(s): uns, uns_nested
  • Affected dtype(s): categorical, categorical_missing_values, categorical_ordered, categorical_ordered_missing_values
  • Probable cause: reticulate
  • To investigate: TRUE
  • To fix: TRUE

Error message

<python.builtin.AttributeError/python.builtin.Exception/python.builtin.BaseException/python.builtin.object/error/condition>
Error in `py_get_attr(x, name)`: AttributeError: 'Categorical' object has no attribute 'get_values'. Did you mean: 'sort_values'?
Run `reticulate::py_last_error()` for details.
Backtrace:
    ▆
  1. ├─testthat::expect_equal(adata_r$uns[[name]], reticulate::py_to_r(adata_py$uns[[name]])) at test-roundtrip-uns.R:80:5
  2. │ └─testthat::quasi_label(enquo(expected), expected.label, arg = "expected")
  3. │   └─rlang::eval_bare(expr, quo_get_env(quo))
  4. ├─reticulate::py_to_r(adata_py$uns[[name]])
  5. └─reticulate:::py_to_r.pandas.core.arrays.categorical.Categorical(adata_py$uns[[name]])
  6.   ├─reticulate::py_to_r(x$get_values())
  7.   │ ├─reticulate::is_py_object(x <- py_to_r_cpp(x))
  8.   │ └─reticulate:::py_to_r_cpp(x)
  9.   ├─x$get_values
10.   └─reticulate:::`$.python.builtin.object`(x, "get_values")
11.     └─reticulate:::py_get_attr_or_item(x, name, TRUE)
12.       └─reticulate::py_get_attr(x, name)

Proposed solution

Debug and fix

Issue: Python object is not being converted correctly.

  • Affected backend: HDF5AnnData
  • Affected slot(s): uns, uns_nested
  • Affected dtype(s): nullable_boolean_array, nullable_integer_array
  • Probable cause: reticulate
  • To investigate: TRUE
  • To fix: TRUE

Error message

adata_r$uns[[name]] (`actual`) not equal to reticulate::py_to_r(adata_py$uns[[name]]) (`expected`).

`actual` is a logical vector (NA, FALSE, TRUE, FALSE, TRUE, ...)
`expected` is an S3 object of class <pandas.core.arrays.boolean.BooleanArray/pandas.core.arrays.masked.BaseMaskedArray/pandas.core.arraylike.OpsMixin/pandas.core.arrays.base.ExtensionArray/python.builtin.object>, an environment

Proposed solution

Debug and fix

Issue: The data type is different after the roundtrip test.

  • Affected backend: HDF5AnnData
  • Affected slot(s): uns, uns_nested
  • Affected dtype(s): boolean, char, float, integer, nan, string
  • Probable cause: write
  • To investigate: TRUE
  • To fix: TRUE

Error message

bi$type(a) (`actual`) not equal to bi$type(b) (`expected`).

`attr(actual, 'py_object')$pyobj` is <pointer: 0x7f4af9694d00>
`attr(expected, 'py_object')$pyobj` is <pointer: 0x7f4af9f5eca0>
Backtrace:
    ▆
1. └─anndataR:::expect_equal_py(...) at test-roundtrip-uns.R:109:5
2.   └─testthat::expect_equal(bi$type(a), bi$type(b)) at tests/testthat/helper-expect_equal_py.R:7:3

Proposed solution

Debug and fix

Issue: hdf5py writes the attribute as a H5T_STD_I64LE, hdf5r writes it as H5T_STD_I32LE.

  • Affected backend: HDF5AnnData
  • Affected slot(s): X
  • Affected dtype(s): float_csparse, float_csparse_nas
  • Probable cause: h5diff
  • To investigate: TRUE
  • To fix: TRUE

Error message

Warning: different storage datatype
<shape> has file datatype H5T_STD_I64LE
<shape> has file datatype H5T_STD_I32LE
attribute: <shape of </X>> and <shape of </X>>  

Proposed solution

We should investigate if we can specify the type with which an attribute should be written.

Issue: hdf5py has max dimensions as 2^64 - 1, the max val for an unsigned int. hdf5r has it as the actual value

  • Affected backend: HDF5AnnData
  • Affected slot(s): X, obsm, varm, obsp, varp
  • Affected dtype(s): float_csparse, float_csparse_nas, float_rsparse, float_rsparse_nas
  • Probable cause: h5diff
  • To investigate: TRUE
  • To fix: FALSE

Error message

dataset: </X/data> and </X/data>
Not comparable: </X/data> has rank 1, dimensions [200], max dimensions [18446744073709551615]
and </X/data> has rank 1, dimensions [108], max dimensions [108]
0 differences found
dataset: </X/indices> and </X/indices>
Not comparable: </X/indices> has rank 1, dimensions [200], max dimensions [18446744073709551615]
and </X/indices> has rank 1, dimensions [108], max dimensions [108]
0 differences found
dataset: </X/indptr> and </X/indptr>
Warning: different maximum dimensions
</X/indptr> has max dimensions [18446744073709551615]
</X/indptr> has max dimensions [21]

Proposed solution

We should investigate if something goes wrong with h5py, but I think hdf5 provides the expected behaviour.

Issue: hdf5py writes a nullable integer array with type H5T_SGN_2, hdf5r writes with type H5T_SGN_NONE

  • Affected backend: HDF5AnnData
  • Affected slot(s): obs, var
  • Affected dtype(s): integer_with_nas
  • Probable cause: h5diff
  • To investigate: TRUE
  • To fix: TRUE

Error message

dataset: </var/nullable_integer_array/mask> and </var/integer_with_nas/mask>
Warning: different storage datatype
Not comparable: </var/nullable_integer_array/mask> has sign H5T_SGN_2 and </var/integer_with_nas/mask> has sign H5T_SGN_NONE
0 differences found

Proposed solution

We should investigate if we can specify the type with which an attribute should be written.

Issue: hdf5py writes a nullable integer array with type H5T_STD_I64LE, hdf5r writes with type H5T_STD_I32LE

  • Affected backend: HDF5AnnData
  • Affected slot(s): obs, var
  • Affected dtype(s): nullable_integer_array
  • Probable cause: h5diff
  • To investigate: TRUE
  • To fix: TRUE

Error message

dataset: </var/nullable_integer_array/values> and </var/integer_with_nas/values>
Warning: different storage datatype
</var/nullable_integer_array/values> has file datatype H5T_STD_I64LE
</var/integer_with_nas/values> has file datatype H5T_STD_I32LE
size:           [20]           [20]
position        values          values          difference          
------------------------------------------------------------
[ 0 ]          0               1               1              
1 differences found

Proposed solution

We should investigate if we can specify the type with which an attribute should be written.

Issue: On position 0, hdf5py writes a 0 in the values array, hdf5r writes a 1.

  • Affected backend: HDF5AnnData
  • Affected slot(s): obs, var
  • Affected dtype(s): nullable_integer_array
  • Probable cause: h5diff
  • To investigate: TRUE
  • To fix: TRUE

Error message

dataset: </var/nullable_integer_array/values> and </var/integer_with_nas/values>
Warning: different storage datatype
</var/nullable_integer_array/values> has file datatype H5T_STD_I64LE
</var/integer_with_nas/values> has file datatype H5T_STD_I32LE
size:           [20]           [20]
position        values          values          difference          
------------------------------------------------------------
[ 0 ]          0               1               1              
1 differences found

Proposed solution

We should investigate why this difference happens.

Issue: After conversion of a sparse matrix, also containing NAs to a SelfHits object, the distinction between NA and 0 is lost.

  • Affected backend: to_SCE
  • Affected slot(s): obsp, varp
  • Affected dtype(s): numeric_csparse_with_nas, numeric_rsparse_with_nas, integer_csparse_with_nas, integer_rsparse_with_nas
  • Probable cause: convert
  • To investigate: TRUE
  • To fix: FALSE

Error message

`sce_matrix` (`actual`) not equal to `ad_matrix` (`expected`).

actual vs expected
                [, 1] [, 2] [, 3] [, 4] [, 5] [, 6] [, 7] [, 8] [, 9] [,10]
  actual[1, ]       NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
- actual[2, ]       NA    NA    NA    NA   1.2    NA    NA    NA    NA    NA
+ expected[2, ]   0.00     0     0     0   1.2     0  0.00     0     0     0
  actual[3, ]       NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
- actual[4, ]     0.48    NA    NA    NA    NA    NA  0.66    NA    NA    NA
+ expected[4, ]   0.48     0     0     0   0.0     0  0.66     0     0     0

Proposed solution

Upstream the issue, is this intentional?

Issue: After conversion of a sparse matrix, also containing NAs to a SelfHits object, the distinction between NA and 0 is lost.

  • Affected backend: to_SCE
  • Affected slot(s): obsp, varp
  • Affected dtype(s): numeric_dense_with_nas, numeric_matrix_with_nas, integer_dense_with_nas, integer_matrix_with_nas
  • Probable cause: convert
  • To investigate: TRUE
  • To fix: FALSE

Error message

`sce_matrix` (`actual`) not equal to `ad_matrix` (`expected`).

actual vs expected
                      [, 1]     [, 2]     [, 3]     [, 4]     [, 5]     [, 6]      [, 7]     [, 8]     [, 9]     [,10]
- actual[1, ]    0.00000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.00000000 0.0000000 0.0000000 0.0000000
+ expected[1, ]          NA        NA        NA        NA        NA        NA         NA        NA        NA        NA
  actual[2, ]    0.30879331 0.3489866 0.9774142 0.5004646 0.5611313 0.8525832 0.06551198 0.1663290 0.1574261 0.4143122
- actual[3, ]    0.00000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.00000000 0.0000000 0.0000000 0.0000000
+ expected[3, ]          NA        NA        NA        NA        NA        NA         NA        NA        NA        NA

Proposed solution

Upstream the issue, is this intentional?

Issue: converted sce object has dimnames(), whilst the original anndata does not.

  • Affected backend: to_SCE
  • Affected slot(s): obsm, varm
  • Affected dtype(s): pca
  • Probable cause: convert
  • To investigate: TRUE
  • To fix: FALSE

Error message

sampleFactors(reducedDims(sce)$pca) (`actual`) not equal to ad$obsm[["X_pca"]] (`expected`).
`dimnames(actual)` is a list `dimnames(expected)` is absent

Proposed solution

Investigate if this is a problem or not.