Title: | Simple Simulation of Single-cell RNA Sequencing Data |
---|---|
Description: | Splatter is a package for the simulation of single-cell RNA sequencing count data. It provides a simple interface for creating complex simulations that are reproducible and well-documented. Parameters can be estimated from real data and functions are provided for comparing real and simulated datasets. |
Authors: | Luke Zappia [aut, cre] (<https://orcid.org/0000-0001-7744-8565>, lazappi), Belinda Phipson [aut] (<https://orcid.org/0000-0002-1711-7454>, bphipson), Christina Azodi [ctb] (<https://orcid.org/0000-0002-6097-606X>, azodichr), Alicia Oshlack [aut] |
Maintainer: | Luke Zappia <[email protected]> |
License: | GPL-3 + file LICENSE |
Version: | 1.31.0 |
Built: | 2024-11-21 17:23:40 UTC |
Source: | https://github.com/Oshlack/splatter |
Add gene lengths to an SingleCellExperiment object
addGeneLengths( sce, method = c("generate", "sample"), loc = 7.9, scale = 0.7, lengths = NULL )
addGeneLengths( sce, method = c("generate", "sample"), loc = 7.9, scale = 0.7, lengths = NULL )
sce |
SingleCellExperiment to add gene lengths to. |
method |
Method to use for creating lengths. |
loc |
Location parameter for the generate method. |
scale |
Scale parameter for the generate method. |
lengths |
Vector of lengths for the sample method. |
This function adds simulated gene lengths to the
rowData
slot of a
SingleCellExperiment
object that can be
used for calculating length normalised expression values such as TPM or FPKM.
The generate
method simulates lengths using a (rounded) log-normal
distribution, with the default loc
and scale
parameters based
on human protein-coding genes. Alternatively the sample
method can be
used which randomly samples lengths (with replacement) from a supplied
vector.
SingleCellExperiment with added gene lengths
# Default generate method sce <- simpleSimulate() sce <- addGeneLengths(sce) head(rowData(sce)) # Sample method (human coding genes) ## Not run: library(TxDb.Hsapiens.UCSC.hg19.knownGene) library(GenomicFeatures) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene tx.lens <- transcriptLengths(txdb, with.cds_len = TRUE) tx.lens <- tx.lens[tx.lens$cds_len > 0, ] gene.lens <- max(splitAsList(tx.lens$tx_len, tx.lens$gene_id)) sce <- addGeneLengths(sce, method = "sample", lengths = gene.lens) ## End(Not run)
# Default generate method sce <- simpleSimulate() sce <- addGeneLengths(sce) head(rowData(sce)) # Sample method (human coding genes) ## Not run: library(TxDb.Hsapiens.UCSC.hg19.knownGene) library(GenomicFeatures) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene tx.lens <- transcriptLengths(txdb, with.cds_len = TRUE) tx.lens <- tx.lens[tx.lens$cds_len > 0, ] gene.lens <- max(splitAsList(tx.lens$tx_len, tx.lens$gene_id)) sce <- addGeneLengths(sce, method = "sample", lengths = gene.lens) ## End(Not run)
Estimate simulation parameters for the BASiCS simulation from a real dataset.
BASiCSEstimate( counts, spike.info = NULL, batch = NULL, n = 20000, thin = 10, burn = 5000, regression = TRUE, params = newBASiCSParams(), verbose = TRUE, progress = TRUE, ... ) ## S3 method for class 'SingleCellExperiment' BASiCSEstimate( counts, spike.info = NULL, batch = NULL, n = 20000, thin = 10, burn = 5000, regression = TRUE, params = newBASiCSParams(), verbose = TRUE, progress = TRUE, ... ) ## S3 method for class 'matrix' BASiCSEstimate( counts, spike.info = NULL, batch = NULL, n = 20000, thin = 10, burn = 5000, regression = TRUE, params = newBASiCSParams(), verbose = TRUE, progress = TRUE, ... )
BASiCSEstimate( counts, spike.info = NULL, batch = NULL, n = 20000, thin = 10, burn = 5000, regression = TRUE, params = newBASiCSParams(), verbose = TRUE, progress = TRUE, ... ) ## S3 method for class 'SingleCellExperiment' BASiCSEstimate( counts, spike.info = NULL, batch = NULL, n = 20000, thin = 10, burn = 5000, regression = TRUE, params = newBASiCSParams(), verbose = TRUE, progress = TRUE, ... ) ## S3 method for class 'matrix' BASiCSEstimate( counts, spike.info = NULL, batch = NULL, n = 20000, thin = 10, burn = 5000, regression = TRUE, params = newBASiCSParams(), verbose = TRUE, progress = TRUE, ... )
counts |
either a counts matrix or a SingleCellExperiment object containing count data to estimate parameters from. |
spike.info |
data.frame describing spike-ins with two columns: "Name"
giving the names of the spike-in features (must match
|
batch |
vector giving the batch that each cell belongs to. |
n |
total number of MCMC iterations. Must be |
thin |
thining period for the MCMC sampler. Must be |
burn |
burn-in period for the MCMC sampler. Must be in the range
|
regression |
logical. Whether to use regression to identify
over-dispersion. See |
params |
BASiCSParams object to store estimated values in. |
verbose |
logical. Whether to print progress messages. |
progress |
logical. Whether to print additional BASiCS progress messages. |
... |
Optional parameters passed to |
This function is just a wrapper around BASiCS_MCMC
that
takes the output and converts it to a BASiCSParams object. Either a set of
spike-ins or batch information (or both) must be supplied. If only batch
information is provided there must be at least two batches. See
BASiCS_MCMC
for details.
BASiCSParams object containing the estimated parameters.
# Load example data library(scuttle) set.seed(1) sce <- mockSCE() spike.info <- data.frame( Name = rownames(sce)[1:10], Input = rnorm(10, 500, 200), stringsAsFactors = FALSE ) params <- BASiCSEstimate(sce[1:100, 1:30], spike.info) params
# Load example data library(scuttle) set.seed(1) sce <- mockSCE() spike.info <- data.frame( Name = rownames(sce)[1:10], Input = rnorm(10, 500, 200), stringsAsFactors = FALSE ) params <- BASiCSEstimate(sce[1:100, 1:30], spike.info) params
S4 class that holds parameters for the BASiCS simulation.
The BASiCS simulation uses the following parameters:
nGenes
The number of genes to simulate.
nCells
The number of cells to simulate.
[seed]
Seed to use for generating random numbers.
nBatches
Number of batches to simulate.
batchCells
Number of cells in each batch.
gene.params
A data.frame
containing gene
parameters with two columns: Mean
(mean expression for
each biological gene) and Delta
(cell-to-cell
heterogeneity for each biological gene).
nSpikes
The number of spike-ins to simulate.
spike.means
Input molecules for each spike-in.
cell.params
A data.frame
containing gene
parameters with two columns: Phi
(mRNA content factor for
each cell, scaled to sum to the number of cells in each batch)
and S
(capture efficient for each cell).
theta
Technical variability parameter for each batch.
The parameters not shown in brackets can be estimated from real data using
BASiCSEstimate
. For details of the BASiCS simulation see
BASiCSSimulate
.
Simulate counts using the BASiCS method.
BASiCSSimulate( params = newBASiCSParams(), sparsify = TRUE, verbose = TRUE, ... )
BASiCSSimulate( params = newBASiCSParams(), sparsify = TRUE, verbose = TRUE, ... )
params |
BASiCSParams object containing simulation parameters. |
sparsify |
logical. Whether to automatically convert assays to sparse matrices if there will be a size reduction. |
verbose |
logical. Whether to print progress messages |
... |
any additional parameter settings to override what is provided in
|
This function is just a wrapper around BASiCS_Sim
that
takes a BASiCSParams
, runs the simulation then converts the
output to a SingleCellExperiment
object.
See BASiCS_Sim
for more details of how the simulation
works.
SingleCellExperiment containing simulated counts
Vallejos CA, Marioni JC, Richardson S. BASiCS: Bayesian Analysis of Single-Cell Sequencing data. PLoS Computational Biology (2015).
Paper: 10.1371/journal.pcbi.1004333
Code: https://github.com/catavallejos/BASiCS
if (requireNamespace("BASiCS", quietly = TRUE)) { sim <- BASiCSSimulate() }
if (requireNamespace("BASiCS", quietly = TRUE)) { sim <- BASiCSSimulate() }
Combine the data from several SingleCellExperiment objects and produce some basic plots comparing them.
compareSCEs( sces, point.size = 0.1, point.alpha = 0.1, fits = TRUE, colours = NULL )
compareSCEs( sces, point.size = 0.1, point.alpha = 0.1, fits = TRUE, colours = NULL )
sces |
named list of SingleCellExperiment objects to combine and compare. |
point.size |
size of points in scatter plots. |
point.alpha |
opacity of points in scatter plots. |
fits |
whether to include fits in scatter plots. |
colours |
vector of colours to use for each dataset. |
The returned list has three items:
RowData
Combined row data from the provided SingleCellExperiments.
ColData
Combined column data from the provided SingleCellExperiments.
Plots
Comparison plots
Means
Boxplot of mean distribution.
Variances
Boxplot of variance distribution.
MeanVar
Scatter plot with fitted lines showing the mean-variance relationship.
LibrarySizes
Boxplot of the library size distribution.
ZerosGene
Boxplot of the percentage of each gene that is zero.
ZerosCell
Boxplot of the percentage of each cell that is zero.
MeanZeros
Scatter plot with fitted lines showing the mean-zeros relationship.
VarGeneCor
Heatmap of correlation of the 100 most variable genes.
The plots returned by this function are created using
ggplot
and are only a sample of the kind of plots you
might like to consider. The data used to create these plots is also returned
and should be in the correct format to allow you to create further plots
using ggplot
.
List containing the combined datasets and plots.
sim1 <- splatSimulate(nGenes = 1000, batchCells = 20) sim2 <- simpleSimulate(nGenes = 1000, nCells = 20) comparison <- compareSCEs(list(Splat = sim1, Simple = sim2)) names(comparison) names(comparison$Plots)
sim1 <- splatSimulate(nGenes = 1000, batchCells = 20) sim2 <- simpleSimulate(nGenes = 1000, nCells = 20) comparison <- compareSCEs(list(Splat = sim1, Simple = sim2)) names(comparison) names(comparison$Plots)
Combine the data from several SingleCellExperiment objects and produce some basic plots comparing them to a reference.
diffSCEs( sces, ref, point.size = 0.1, point.alpha = 0.1, fits = TRUE, colours = NULL )
diffSCEs( sces, ref, point.size = 0.1, point.alpha = 0.1, fits = TRUE, colours = NULL )
sces |
named list of SingleCellExperiment objects to combine and compare. |
ref |
string giving the name of the SingleCellExperiment to use as the reference |
point.size |
size of points in scatter plots. |
point.alpha |
opacity of points in scatter plots. |
fits |
whether to include fits in scatter plots. |
colours |
vector of colours to use for each dataset. |
This function aims to look at the differences between a reference SingleCellExperiment and one or more others. It requires each SingleCellExperiment to have the same dimensions. Properties are compared by ranks, for example when comparing the means the values are ordered and the differences between the reference and another dataset plotted. A series of Q-Q plots are also returned.
The returned list has five items:
Reference
The SingleCellExperiment used as the reference.
RowData
Combined feature data from the provided SingleCellExperiments.
ColData
Combined column data from the provided SingleCellExperiments.
Plots
Difference plots
Means
Boxplot of mean differences.
Variances
Boxplot of variance differences.
MeanVar
Scatter plot showing the difference from the reference variance across expression ranks.
LibraeySizes
Boxplot of the library size differences.
ZerosGene
Boxplot of the differences in the percentage of each gene that is zero.
ZerosCell
Boxplot of the differences in the percentage of each cell that is zero.
MeanZeros
Scatter plot showing the difference from the reference percentage of zeros across expression ranks.
QQPlots
Quantile-Quantile plots
Means
Q-Q plot of the means.
Variances
Q-Q plot of the variances.
LibrarySizes
Q-Q plot of the library sizes.
ZerosGene
Q-Q plot of the percentage of zeros per gene.
ZerosCell
Q-Q plot of the percentage of zeros per cell.
The plots returned by this function are created using
ggplot
and are only a sample of the kind of plots you
might like to consider. The data used to create these plots is also returned
and should be in the correct format to allow you to create further plots
using ggplot
.
List containing the combined datasets and plots.
sim1 <- splatSimulate(nGenes = 1000, batchCells = 20) sim2 <- simpleSimulate(nGenes = 1000, nCells = 20) difference <- diffSCEs(list(Splat = sim1, Simple = sim2), ref = "Simple") names(difference) names(difference$Plots)
sim1 <- splatSimulate(nGenes = 1000, batchCells = 20) sim2 <- simpleSimulate(nGenes = 1000, nCells = 20) difference <- diffSCEs(list(Splat = sim1, Simple = sim2), ref = "Simple") names(difference) names(difference$Plots)
Accessor function for getting parameter values.
getParam(object, name) ## S4 method for signature 'Params' getParam(object, name)
getParam(object, name) ## S4 method for signature 'Params' getParam(object, name)
object |
object to get parameter from. |
name |
name of the parameter to get. |
The extracted parameter value
params <- newSimpleParams() getParam(params, "nGenes")
params <- newSimpleParams() getParam(params, "nGenes")
Get multiple parameter values from a Params object.
getParams(params, names)
getParams(params, names)
params |
Params object to get values from. |
names |
vector of names of the parameters to get. |
List with the values of the selected parameters.
params <- newSimpleParams() getParams(params, c("nGenes", "nCells", "mean.rate"))
params <- newSimpleParams() getParams(params, c("nGenes", "nCells", "mean.rate"))
Estimate simulation parameters for the Kersplat simulation from a real dataset. See the individual estimation functions for more details on how this is done.
kersplatEstimate(counts, params = newKersplatParams(), verbose = TRUE) ## S3 method for class 'SingleCellExperiment' kersplatEstimate(counts, params = newKersplatParams(), verbose = TRUE) ## S3 method for class 'matrix' kersplatEstimate(counts, params = newKersplatParams(), verbose = TRUE)
kersplatEstimate(counts, params = newKersplatParams(), verbose = TRUE) ## S3 method for class 'SingleCellExperiment' kersplatEstimate(counts, params = newKersplatParams(), verbose = TRUE) ## S3 method for class 'matrix' kersplatEstimate(counts, params = newKersplatParams(), verbose = TRUE)
counts |
either a counts matrix or a SingleCellExperiment object containing count data to estimate parameters from. |
params |
KersplatParams object to store estimated values in. |
verbose |
logical. Whether to print progress messages. |
KersplatParams object containing the estimated parameters.
kersplatEstMean
, kersplatEstBCV
,
kersplatEstLib
if (requireNamespace("igraph", quietly = TRUE)) { # Load example data library(scuttle) set.seed(1) sce <- mockSCE() params <- kersplatEstimate(sce) params }
if (requireNamespace("igraph", quietly = TRUE)) { # Load example data library(scuttle) set.seed(1) sce <- mockSCE() params <- kersplatEstimate(sce) params }
S4 class that holds parameters for the Kersplat simulation.
The Kersplat simulation uses the following parameters:
nGenes
The number of genes to simulate.
nCells
The number of cells to simulate.
[seed]
Seed to use for generating random numbers.
mean.shape
Shape parameter for the mean gamma distribution.
mean.rate
Rate parameter for the mean gamma distribution.
mean.outProb
Probability that a gene is an expression outlier.
mean.outFacLoc
Location (meanlog) parameter for the expression outlier factor log-normal distribution.
mean.outFacScale
Scale (sdlog) parameter for the expression outlier factor log-normal distribution.
mean.dens
density
object describing
the log gene mean density.
[mean.method]
Method to use for simulating gene means. Either "fit" to sample from a gamma distribution (with expression outliers) or "density" to sample from the provided density object.
[mean.values]
Vector of means for each gene.
bcv.common
Underlying common dispersion across all genes.
[bcv.df]
Degrees of Freedom for the BCV inverse chi-squared distribution.
[network.graph]
Graph containing the gene network.
[network.nRegs]
Number of regulators in the network.
[paths.programs]
Number of expression programs.
[paths.design]
data.frame describing path
structure. See kersplatSimPaths
for details.
lib.loc
Location (meanlog) parameter for the library size log-normal distribution, or mean parameter if a normal distribution is used.
lib.scale
Scale (sdlog) parameter for the library size log-normal distribution, or sd parameter if a normal distribution is used.
lib.dens
density
object describing
the library size density.
[lib.method]
Method to use for simulating library sizes. Either "fit" to sample from a log-normal distribution or "density" to sample from the provided density object.
[cells.design]
data.frame describing cell
structure. See kersplatSimCellMeans
for details.
[doublet.prop]
Proportion of cells that are doublets.
[ambient.scale]
Scaling factor for the library size log-normal distribution when generating ambient library sizes.
[ambient.nEmpty]
Number of empty cells to simulate.
The parameters not shown in brackets can be estimated from real data using
kersplatEstimate
. For details of the Kersplat simulation
see kersplatSimulate
.
Sample cells for the Kersplat simulation
kersplatSample(params, sparsify = TRUE, verbose = TRUE)
kersplatSample(params, sparsify = TRUE, verbose = TRUE)
params |
KersplatParams object containing simulation parameters. |
sparsify |
logical. Whether to automatically convert assays to sparse matrices if there will be a size reduction. |
verbose |
logical. Whether to print progress messages |
The second stage is a two-step Kersplat simulation is to generate cells based
on a complete KersplatParams
object.
intermediate parameters.
The sampling process involves the following steps:
Simulate library sizes for each cell
Simulate means for each cell
Simulate endogenous counts for each cell
Simulate ambient counts for each cell
Simulate final counts for each cell
The final output is a
SingleCellExperiment
object that
contains the simulated counts but also the values for various intermediate
steps. These are stored in the colData
(for cell specific
information), rowData
(for gene specific information) or
assays
(for gene by cell matrices) slots. This additional
information includes:
colData
Unique cell identifier.
Whether the cell is a Cell, Doublet or Empty.
The expected number of endogenous counts for that cell.
The expected number of ambient counts for that cell.
The path the cell belongs to.
How far along the path each cell is.
For doublets the path of the first partner in the
doublet (otherwise NA
).
For doublets the step of the first partner in the
doublet (otherwise NA
).
For doublets the path of the second partner in the
doublet (otherwise NA
).
For doublets the step of the second partner in the
doublet (otherwise NA
).
rowData
Unique gene identifier.
The base expression level for that gene.
The ambient expression level for that gene.
assays
The mean expression of genes in each cell after any differential expression and adjusted for expected library size.
Endogenous count matrix.
Ambient count matrix.
Final count matrix.
Values that have been added by Splatter are named using UpperCamelCase
in order to differentiate them from the values added by analysis packages
which typically use underscore_naming
.
SingleCellExperiment object containing the simulated counts and intermediate values.
kersplatSimLibSizes
, kersplatSimCellMeans
,
kersplatSimCellCounts
, kersplatSimAmbientCounts
,
kersplatSimCounts
if (requireNamespace("igraph", quietly = TRUE)) { params <- kersplatSetup() sim <- kersplatSample(params) }
if (requireNamespace("igraph", quietly = TRUE)) { params <- kersplatSetup() sim <- kersplatSample(params) }
Setup the parameters required for the Kersplat simulation
kersplatSetup(params = newKersplatParams(), verbose = TRUE, ...)
kersplatSetup(params = newKersplatParams(), verbose = TRUE, ...)
params |
KersplatParams object containing simulation parameters. |
verbose |
logical. Whether to print progress messages |
... |
any additional parameter settings to override what is provided in
|
The first stage is a two-step Kersplat simulation is to generate some of the
intermediate parameters. The resulting parameters allow multiple simulated
datasets to be generated from the same biological structure (using
kersplatSample
). As with all the other parameters these values
can be manually overwritten if desired.
The setup involves the following steps:
Generate a gene network (if not already present)
Select regulator genes (if not already present)
Simulate gene means (if not already present)
Simulate cell paths
The resulting KersplatParams
object will have the following
parameters set (if they weren't already).
mean.values
network.graph
network.regsSet
paths.means
See KersplatParams
for more details about these parameters and
the functions for the individual steps for more details about the process.
A complete KersplatParams object
kersplatGenNetwork
, kersplatSelectRegs
,
kersplatSimGeneMeans
, kersplatSimPaths
,
KersplatParams
if (requireNamespace("igraph", quietly = TRUE)) { params <- kersplatSetup() }
if (requireNamespace("igraph", quietly = TRUE)) { params <- kersplatSetup() }
Simulate scRNA-seq count data using the Kersplat model
kersplatSimulate( params = newKersplatParams(), sparsify = TRUE, verbose = TRUE, ... )
kersplatSimulate( params = newKersplatParams(), sparsify = TRUE, verbose = TRUE, ... )
params |
KersplatParams object containing simulation parameters. |
sparsify |
logical. Whether to automatically convert assays to sparse matrices if there will be a size reduction. |
verbose |
logical. Whether to print progress messages |
... |
any additional parameter settings to override what is provided in
|
This functions is for simulating data in a single step. It consists of a
call to kersplatSetup
followed by a call to
kersplatSample
. Please see the documentation for those
functions for more details of the individual steps.
SingleCellExperiment containing simulated counts and intermediate values
if (requireNamespace("igraph", quietly = TRUE)) { sim <- kersplatSimulate }
if (requireNamespace("igraph", quietly = TRUE)) { sim <- kersplatSimulate }
List all the simulations that are currently available in Splatter with a brief description.
listSims(print = TRUE)
listSims(print = TRUE)
print |
logical. Whether to print to the console. |
Invisibly returns a data.frame containing the information that is displayed.
listSims()
listSims()
Estimate simulation parameters for the Lun2 simulation from a real dataset.
lun2Estimate( counts, plates, params = newLun2Params(), min.size = 200, verbose = TRUE, BPPARAM = SerialParam() ) ## S3 method for class 'SingleCellExperiment' lun2Estimate( counts, plates, params = newLun2Params(), min.size = 200, verbose = TRUE, BPPARAM = SerialParam() ) ## S3 method for class 'matrix' lun2Estimate( counts, plates, params = newLun2Params(), min.size = 200, verbose = TRUE, BPPARAM = SerialParam() )
lun2Estimate( counts, plates, params = newLun2Params(), min.size = 200, verbose = TRUE, BPPARAM = SerialParam() ) ## S3 method for class 'SingleCellExperiment' lun2Estimate( counts, plates, params = newLun2Params(), min.size = 200, verbose = TRUE, BPPARAM = SerialParam() ) ## S3 method for class 'matrix' lun2Estimate( counts, plates, params = newLun2Params(), min.size = 200, verbose = TRUE, BPPARAM = SerialParam() )
counts |
either a counts matrix or a SingleCellExperiment object containing count data to estimate parameters from. |
plates |
integer vector giving the plate that each cell originated from. |
params |
Lun2Params object to store estimated values in. |
min.size |
minimum size of clusters when identifying group of cells in the data. |
verbose |
logical. Whether to show progress messages. |
BPPARAM |
A |
See Lun2Params
for more details on the parameters.
LunParams object containing the estimated parameters.
# Load example data library(scuttle) set.seed(1) sce <- mockSCE() plates <- as.numeric(factor(colData(sce)$Mutation_Status)) params <- lun2Estimate(sce, plates, min.size = 20) params
# Load example data library(scuttle) set.seed(1) sce <- mockSCE() plates <- as.numeric(factor(colData(sce)$Mutation_Status)) params <- lun2Estimate(sce, plates, min.size = 20) params
S4 class that holds parameters for the Lun2 simulation.
The Lun2 simulation uses the following parameters:
nGenes
The number of genes to simulate.
nCells
The number of cells to simulate.
[seed]
Seed to use for generating random numbers.
gene.params
A data.frame
containing gene
parameters with two columns: Mean
(mean expression for
each gene) and Disp
(dispersion for each gene).
zi.params
A data.frame
containing
zero-inflated gene parameters with three columns: Mean
(mean expression for each gene), Disp
(dispersion for
each, gene), and Prop
(zero proportion for each gene).
[nPlates]
The number of plates to simulate.
plate.ingroup
Character vector giving the plates considered to be part of the "ingroup".
plate.mod
Plate effect modifier factor. The plate effect variance is divided by this value.
plate.var
Plate effect variance.
cell.plates
Factor giving the plate that each cell comes from.
cell.libSizes
Library size for each cell.
cell.libMod
Modifier factor for library sizes. The library sizes are multiplied by this value.
de.nGenes
Number of differentially expressed genes.
de.fc
Fold change for differentially expressed genes.
The parameters not shown in brackets can be estimated from real data using
lun2Estimate
. For details of the Lun2 simulation see
lun2Simulate
.
Simulate single-cell RNA-seq count data using the method described in Lun and Marioni "Overcoming confounding plate effects in differential expression analyses of single-cell RNA-seq data".
lun2Simulate( params = newLun2Params(), zinb = FALSE, sparsify = TRUE, verbose = TRUE, ... )
lun2Simulate( params = newLun2Params(), zinb = FALSE, sparsify = TRUE, verbose = TRUE, ... )
params |
Lun2Params object containing simulation parameters. |
zinb |
logical. Whether to use a zero-inflated model. |
sparsify |
logical. Whether to automatically convert assays to sparse matrices if there will be a size reduction. |
verbose |
logical. Whether to print progress messages. |
... |
any additional parameter settings to override what is provided in
|
The Lun2 simulation uses a negative-binomial distribution where the means and
dispersions have been sampled from a real dataset
(using lun2Estimate
). The other core feature of the Lun2
simulation is the addition of plate effects. Differential expression can be
added between two groups of plates (an "ingroup" and all other plates).
Library size factors are also applied and optionally a zero-inflated
negative-binomial can be used.
If the number of genes to simulate differs from the number of provided gene parameters or the number of cells to simulate differs from the number of library sizes the relevant parameters will be sampled with a warning. This allows any number of genes or cells to be simulated regardless of the number in the dataset used in the estimation step but has the downside that some genes or cells may be simulated multiple times.
SingleCellExperiment containing simulated counts.
Lun ATL, Marioni JC. Overcoming confounding plate effects in differential expression analyses of single-cell RNA-seq data. Biostatistics (2017).
Paper: dx.doi.org/10.1093/biostatistics/kxw055
Code: https://github.com/MarioniLab/PlateEffects2016
sim <- lun2Simulate()
sim <- lun2Simulate()
Estimate simulation parameters for the Lun simulation from a real dataset.
lunEstimate(counts, params = newLunParams()) ## S3 method for class 'SingleCellExperiment' lunEstimate(counts, params = newLunParams()) ## S3 method for class 'matrix' lunEstimate(counts, params = newLunParams())
lunEstimate(counts, params = newLunParams()) ## S3 method for class 'SingleCellExperiment' lunEstimate(counts, params = newLunParams()) ## S3 method for class 'matrix' lunEstimate(counts, params = newLunParams())
counts |
either a counts matrix or an SingleCellExperiment object containing count data to estimate parameters from. |
params |
LunParams object to store estimated values in. |
The nGenes
and nCells
parameters are taken from the size of the
input data. No other parameters are estimated. See LunParams
for more details on the parameters.
LunParams object containing the estimated parameters.
# Load example data library(scuttle) set.seed(1) sce <- mockSCE() params <- lunEstimate(sce) params
# Load example data library(scuttle) set.seed(1) sce <- mockSCE() params <- lunEstimate(sce) params
S4 class that holds parameters for the Lun simulation.
The Lun simulation uses the following parameters:
nGenes
The number of genes to simulate.
nCells
The number of cells to simulate.
[nGroups]
The number of groups to simulate.
[groupCells]
Vector giving the number of cells in each simulation group/path.
[seed]
Seed to use for generating random numbers.
[mean.shape]
Shape parameter for the mean gamma distribution.
[mean.rate]
Rate parameter for the mean gamma distribution.
[count.disp]
The dispersion parameter for the counts negative binomial distribution.
[de.nGenes]
The number of genes that are differentially expressed in each group
[de.upProp]
The proportion of differentially expressed genes that are up-regulated in each group
[de.upFC]
The fold change for up-regulated genes
[de.downFC]
The fold change for down-regulated genes
The parameters not shown in brackets can be estimated from real data using
lunEstimate
. For details of the Lun simulation see
lunSimulate
.
Simulate single-cell RNA-seq count data using the method described in Lun, Bach and Marioni "Pooling across cells to normalize single-cell RNA sequencing data with many zero counts".
lunSimulate(params = newLunParams(), sparsify = TRUE, verbose = TRUE, ...)
lunSimulate(params = newLunParams(), sparsify = TRUE, verbose = TRUE, ...)
params |
LunParams object containing Lun simulation parameters. |
sparsify |
logical. Whether to automatically convert assays to sparse matrices if there will be a size reduction. |
verbose |
logical. Whether to print progress messages. |
... |
any additional parameter settings to override what is provided in
|
The Lun simulation generates gene mean expression levels from a gamma
distribution with shape = mean.shape
and rate = mean.rate
.
Counts are then simulated from a negative binomial distribution with
mu = means
and size = 1 / bcv.common
. In addition each cell is
given a size factor (2 ^ rnorm(nCells, mean = 0, sd = 0.5)
) and
differential expression can be simulated with fixed fold changes.
See LunParams
for details of the parameters.
SingleCellExperiment object containing the simulated counts and intermediate values.
Lun ATL, Bach K, Marioni JC. Pooling across cells to normalize single-cell RNA sequencing data with many zero counts. Genome Biology (2016).
Paper: dx.doi.org/10.1186/s13059-016-0947-7
Code: https://github.com/MarioniLab/Deconvolution2016
sim <- lunSimulate()
sim <- lunSimulate()
Combine the plots from compareSCEs
into a single panel.
makeCompPanel( comp, title = "Comparison", labels = c("Means", "Variance", "Mean-variance relationship", "Library size", "Zeros per gene", "Zeros per cell", "Mean-zeros relationship") )
makeCompPanel( comp, title = "Comparison", labels = c("Means", "Variance", "Mean-variance relationship", "Library size", "Zeros per gene", "Zeros per cell", "Mean-zeros relationship") )
comp |
list returned by |
title |
title for the panel. |
labels |
vector of labels for each of the seven plots. |
Combined panel plot
sim1 <- splatSimulate(nGenes = 1000, batchCells = 20) sim2 <- simpleSimulate(nGenes = 1000, nCells = 20) comparison <- compareSCEs(list(Splat = sim1, Simple = sim2)) panel <- makeCompPanel(comparison)
sim1 <- splatSimulate(nGenes = 1000, batchCells = 20) sim2 <- simpleSimulate(nGenes = 1000, nCells = 20) comparison <- compareSCEs(list(Splat = sim1, Simple = sim2)) panel <- makeCompPanel(comparison)
Combine the plots from diffSCEs
into a single panel.
makeDiffPanel( diff, title = "Difference comparison", labels = c("Means", "Variance", "Library size", "Zeros per cell", "Zeros per gene", "Mean-variance relationship", "Mean-zeros relationship") )
makeDiffPanel( diff, title = "Difference comparison", labels = c("Means", "Variance", "Library size", "Zeros per cell", "Zeros per gene", "Mean-variance relationship", "Mean-zeros relationship") )
diff |
list returned by |
title |
title for the panel. |
labels |
vector of labels for each of the seven sections. |
Combined panel plot
sim1 <- splatSimulate(nGenes = 1000, batchCells = 20) sim2 <- simpleSimulate(nGenes = 1000, nCells = 20) difference <- diffSCEs(list(Splat = sim1, Simple = sim2), ref = "Simple") panel <- makeDiffPanel(difference)
sim1 <- splatSimulate(nGenes = 1000, batchCells = 20) sim2 <- simpleSimulate(nGenes = 1000, nCells = 20) difference <- diffSCEs(list(Splat = sim1, Simple = sim2), ref = "Simple") panel <- makeDiffPanel(difference)
Combine the plots from compSCEs
and diffSCEs
into a
single panel.
makeOverallPanel( comp, diff, title = "Overall comparison", row.labels = c("Means", "Variance", "Mean-variance relationship", "Library size", "Zeros per cell", "Zeros per gene", "Mean-zeros relationship") )
makeOverallPanel( comp, diff, title = "Overall comparison", row.labels = c("Means", "Variance", "Mean-variance relationship", "Library size", "Zeros per cell", "Zeros per gene", "Mean-zeros relationship") )
comp |
list returned by |
diff |
list returned by |
title |
title for the panel. |
row.labels |
vector of labels for each of the seven rows. |
Combined panel plot
sim1 <- splatSimulate(nGenes = 1000, batchCells = 20) sim2 <- simpleSimulate(nGenes = 1000, nCells = 20) comparison <- compareSCEs(list(Splat = sim1, Simple = sim2)) difference <- diffSCEs(list(Splat = sim1, Simple = sim2), ref = "Simple") panel <- makeOverallPanel(comparison, difference)
sim1 <- splatSimulate(nGenes = 1000, batchCells = 20) sim2 <- simpleSimulate(nGenes = 1000, nCells = 20) comparison <- compareSCEs(list(Splat = sim1, Simple = sim2)) difference <- diffSCEs(list(Splat = sim1, Simple = sim2), ref = "Simple") panel <- makeOverallPanel(comparison, difference)
Estimate simulation parameters for the mfa simulation from a real dataset.
mfaEstimate(counts, params = newMFAParams()) ## S3 method for class 'SingleCellExperiment' mfaEstimate(counts, params = newMFAParams()) ## S3 method for class 'matrix' mfaEstimate(counts, params = newMFAParams())
mfaEstimate(counts, params = newMFAParams()) ## S3 method for class 'SingleCellExperiment' mfaEstimate(counts, params = newMFAParams()) ## S3 method for class 'matrix' mfaEstimate(counts, params = newMFAParams())
counts |
either a counts matrix or a SingleCellExperiment object containing count data to estimate parameters from. |
params |
MFAParams object to store estimated values in. |
The nGenes
and nCells
parameters are taken from the size of the
input data. The dropout lambda parameter is estimate using
empirical_lambda
. See MFAParams
for more
details on the parameters.
MFAParams object containing the estimated parameters.
# Load example data if (requireNamespace("mfa", quietly = TRUE)) { library(mfa) synth <- create_synthetic( C = 20, G = 5, zero_negative = TRUE, model_dropout = TRUE ) params <- mfaEstimate(synth$X) params }
# Load example data if (requireNamespace("mfa", quietly = TRUE)) { library(mfa) synth <- create_synthetic( C = 20, G = 5, zero_negative = TRUE, model_dropout = TRUE ) params <- mfaEstimate(synth$X) params }
S4 class that holds parameters for the mfa simulation.
The mfa simulation uses the following parameters:
nGenes
The number of genes to simulate.
nCells
The number of cells to simulate.
[seed]
Seed to use for generating random numbers.
[trans.prop]
Proportion of genes that show transient expression. These genes are briefly up or down-regulated before returning to their initial state
[zero.neg]
Logical. Whether to set negative expression values to zero. This will zero-inflate the data.
[dropout.present]
Logical. Whether to simulate dropout.
dropout.lambda
Lambda parameter for the exponential dropout function.
The parameters not shown in brackets can be estimated from real data using
mfaEstimate
. See create_synthetic
for more
details about the parameters. For details of the Splatter implementation of
the mfa simulation see mfaSimulate
.
Simulate a bifurcating pseudotime path using the mfa method.
mfaSimulate(params = newMFAParams(), sparsify = TRUE, verbose = TRUE, ...)
mfaSimulate(params = newMFAParams(), sparsify = TRUE, verbose = TRUE, ...)
params |
MFAParams object containing simulation parameters. |
sparsify |
logical. Whether to automatically convert assays to sparse matrices if there will be a size reduction. |
verbose |
Logical. Whether to print progress messages. |
... |
any additional parameter settings to override what is provided in
|
This function is just a wrapper around create_synthetic
that takes a MFAParams
, runs the simulation then converts the
output from log-expression to counts and returns a
SingleCellExperiment
object. See
create_synthetic
and the mfa paper for more details about
how the simulation works.
SingleCellExperiment containing simulated counts
Campbell KR, Yau C. Probabilistic modeling of bifurcations in single-cell gene expression data using a Bayesian mixture of factor analyzers. Wellcome Open Research (2017).
Paper: 10.12688/wellcomeopenres.11087.1
Code: https://github.com/kieranrcampbell/mfa
if (requireNamespace("mfa", quietly = TRUE)) { sim <- mfaSimulate() }
if (requireNamespace("mfa", quietly = TRUE)) { sim <- mfaSimulate() }
Reduce the size of a SingleCellExperiment object by unneeded information.
minimiseSCE( sce, rowData.keep = FALSE, colData.keep = FALSE, metadata.keep = FALSE, assays.keep = "counts", sparsify = c("auto", "all", "none"), verbose = TRUE )
minimiseSCE( sce, rowData.keep = FALSE, colData.keep = FALSE, metadata.keep = FALSE, assays.keep = "counts", sparsify = c("auto", "all", "none"), verbose = TRUE )
sce |
SingleCellExperiment object |
rowData.keep |
Either TRUE (keep all rowData columns), FALSE (remove all rowData columns) or a character vector with the names of the rowData columns to keep |
colData.keep |
Either TRUE (keep all colData columns), FALSE (remove all colData columns) or a character vector with the names of the colData columns to keep |
metadata.keep |
Either TRUE (keep all metadata), FALSE (remove all metadata) or a character vector with the names of the metadata items to keep |
assays.keep |
Either TRUE (keep all assays), FALSE (remove all assays) or a character vector with the names of the assays to keep |
sparsify |
Whether to convert assay matrices to sparse format. Either "all", "none" or "auto" (default) to only convert those matrices that will result in a size reduction |
verbose |
Whether to print status messages |
SingleCellExperiment object
sce <- splatSimulate(verbose = FALSE) sce.min <- minimiseSCE(sce, verbose = FALSE) object.size(sce) object.size(sce.min)
sce <- splatSimulate(verbose = FALSE) sce.min <- minimiseSCE(sce, verbose = FALSE) object.size(sce) object.size(sce.min)
Quick function to generate mock eQTL mapping results, with parameters estimated using real eQTL mapping results from GTEx using thyroid tissue.
mockBulkeQTL(n.genes = 500, seed = NULL)
mockBulkeQTL(n.genes = 500, seed = NULL)
n.genes |
Number of genes in mock eQTL data. |
seed |
Optional: seed for random seed |
data.frame containing mock bulk eQTL mapping results.
eqtl <- mockBulkeQTL()
eqtl <- mockBulkeQTL()
Quick function to generate mock bulk expression data for a population, with parameters estimated using real thyroid tissue data from GTEx.
mockBulkMatrix(n.genes = 100, n.samples = 50, seed = NULL)
mockBulkMatrix(n.genes = 100, n.samples = 50, seed = NULL)
n.genes |
Number of genes in mock bulk data. |
n.samples |
Number of samples in mock bulk data. |
seed |
Optional: seed for random seed |
matrix containing mock bulk expression data.
bulk <- mockBulkMatrix
bulk <- mockBulkMatrix
Quick function to generate matching mock VCF, bulk expression, and eQTL data, useful for running splatPopEmpiricalMeans
mockEmpiricalSet( n.genes = 20, n.snps = 1000, n.samples = 10, chromosome = 1, chr.length = 2e+06, seed = NULL )
mockEmpiricalSet( n.genes = 20, n.snps = 1000, n.samples = 10, chromosome = 1, chr.length = 2e+06, seed = NULL )
n.genes |
Number of genes in mock eQTL data. |
n.snps |
Number of SNPs in mock vcf file. |
n.samples |
Number of samples in mock bulk data. |
chromosome |
Chromosome name |
chr.length |
Length of mock chromosome |
seed |
Optional: seed for random seed |
list(gff=mockGFF, vcf=mockVCF, means=mockMEANS, eqtl=mockEQTL)
empirical <- mockEmpiricalSet()
empirical <- mockEmpiricalSet()
Quick function to generate a mock gff.
mockGFF(n.genes = 50, chromosome = 1, chr.length = 2e+06, seed = NULL)
mockGFF(n.genes = 50, chromosome = 1, chr.length = 2e+06, seed = NULL)
n.genes |
Number of genes in mock gff file |
chromosome |
Chromosome name |
chr.length |
Length of mock chromosome |
seed |
Optional: seed for random seed |
data.frame containing mock gff data.
gff <- mockGFF()
gff <- mockGFF()
Quick function to generate mock vcf file. Note this data has unrealistic population structure.
mockVCF( n.snps = 200, n.samples = 5, chromosome = 1, chr.length = 2e+06, seed = NULL )
mockVCF( n.snps = 200, n.samples = 5, chromosome = 1, chr.length = 2e+06, seed = NULL )
n.snps |
Number of SNPs in mock vcf file. |
n.samples |
Number of samples in mock bulk data. |
chromosome |
Chromosome name |
chr.length |
Length of mock chromosome |
seed |
Optional: seed for random seed |
data.frame containing mock vcf data.
vcf <- mockVCF()
vcf <- mockVCF()
Create a new Params object. Functions exist for each of the different Params subtypes.
newBASiCSParams(...) newKersplatParams(...) newLun2Params(...) newLunParams(...) newMFAParams(...) newPhenoParams(...) newSCDDParams(...) newSimpleParams(...) newSparseDCParams(...) newSplatParams(...) newSplatPopParams(...) newZINBParams(...)
newBASiCSParams(...) newKersplatParams(...) newLun2Params(...) newLunParams(...) newMFAParams(...) newPhenoParams(...) newSCDDParams(...) newSimpleParams(...) newSparseDCParams(...) newSplatParams(...) newSplatPopParams(...) newZINBParams(...)
... |
additional parameters passed to |
New Params object.
params <- newSimpleParams() params <- newSimpleParams(nGenes = 200, nCells = 10)
params <- newSimpleParams() params <- newSimpleParams(nGenes = 200, nCells = 10)
Virtual S4 class that all other Params classes inherit from.
The Params class defines the following parameters:
nGenes
The number of genes to simulate.
nCells
The number of cells to simulate.
[seed]
Seed to use for generating random numbers.
The parameters not shown in brackets can be estimated from real data.
Estimate simulation parameters for the PhenoPath simulation from a real dataset.
phenoEstimate(counts, params = newPhenoParams()) ## S3 method for class 'SingleCellExperiment' phenoEstimate(counts, params = newPhenoParams()) ## S3 method for class 'matrix' phenoEstimate(counts, params = newPhenoParams())
phenoEstimate(counts, params = newPhenoParams()) ## S3 method for class 'SingleCellExperiment' phenoEstimate(counts, params = newPhenoParams()) ## S3 method for class 'matrix' phenoEstimate(counts, params = newPhenoParams())
counts |
either a counts matrix or an SingleCellExperiment object containing count data to estimate parameters from. |
params |
PhenoParams object to store estimated values in. |
The nGenes
and nCells
parameters are taken from the size of the
input data. The total number of genes is evenly divided into the four types.
See PhenoParams
for more details on the parameters.
PhenoParams object containing the estimated parameters.
if (requireNamespace("phenopath", quietly = TRUE)) { # Load example data library(scuttle) set.seed(1) sce <- mockSCE() params <- phenoEstimate(sce) params }
if (requireNamespace("phenopath", quietly = TRUE)) { # Load example data library(scuttle) set.seed(1) sce <- mockSCE() params <- phenoEstimate(sce) params }
S4 class that holds parameters for the PhenoPath simulation.
The PhenoPath simulation uses the following parameters:
nGenes
The number of genes to simulate.
nCells
The number of cells to simulate.
[seed]
Seed to use for generating random numbers.
[n.de]
Number of genes to simulate from the differential expression regime
[n.pst]
Number of genes to simulate from the pseudotime regime
[n.pst.beta]
Number of genes to simulate from the pseudotime + beta interactions regime
[n.de.pst.beta]
Number of genes to simulate from the differential expression + pseudotime + interactions regime
The parameters not shown in brackets can be estimated from real data using
phenoEstimate
. For details of the PhenoPath simulation
see phenoSimulate
.
Simulate counts from a pseudotime trajectory using the PhenoPath method.
phenoSimulate(params = newPhenoParams(), sparsify = TRUE, verbose = TRUE, ...)
phenoSimulate(params = newPhenoParams(), sparsify = TRUE, verbose = TRUE, ...)
params |
PhenoParams object containing simulation parameters. |
sparsify |
logical. Whether to automatically convert assays to sparse matrices if there will be a size reduction. |
verbose |
logical. Whether to print progress messages |
... |
any additional parameter settings to override what is provided in
|
This function is just a wrapper around
simulate_phenopath
that takes a
PhenoParams
, runs the simulation then converts the
output from log-expression to counts and returns a
SingleCellExperiment
object. The original
simulated log-expression values are returned in the LogExprs
assay.
See simulate_phenopath
and the PhenoPath paper for
more details about how the simulation works.
SingleCellExperiment containing simulated counts
Campbell K, Yau C. Uncovering genomic trajectories with heterogeneous genetic and environmental backgrounds across single-cells and populations. bioRxiv (2017).
Paper: 10.1101/159913
Code: https://github.com/kieranrcampbell/phenopath
if (requireNamespace("phenopath", quietly = TRUE)) { sim <- phenoSimulate() }
if (requireNamespace("phenopath", quietly = TRUE)) { sim <- phenoSimulate() }
Estimate simulation parameters for the scDD simulation from a real dataset.
scDDEstimate( counts, params = newSCDDParams(), verbose = TRUE, BPPARAM = SerialParam(), ... ) ## S3 method for class 'matrix' scDDEstimate( counts, params = newSCDDParams(), verbose = TRUE, BPPARAM = SerialParam(), conditions, ... ) ## S3 method for class 'SingleCellExperiment' scDDEstimate( counts, params = newSCDDParams(), verbose = TRUE, BPPARAM = SerialParam(), condition = "condition", ... ) ## Default S3 method: scDDEstimate( counts, params = newSCDDParams(), verbose = TRUE, BPPARAM = SerialParam(), condition, ... )
scDDEstimate( counts, params = newSCDDParams(), verbose = TRUE, BPPARAM = SerialParam(), ... ) ## S3 method for class 'matrix' scDDEstimate( counts, params = newSCDDParams(), verbose = TRUE, BPPARAM = SerialParam(), conditions, ... ) ## S3 method for class 'SingleCellExperiment' scDDEstimate( counts, params = newSCDDParams(), verbose = TRUE, BPPARAM = SerialParam(), condition = "condition", ... ) ## Default S3 method: scDDEstimate( counts, params = newSCDDParams(), verbose = TRUE, BPPARAM = SerialParam(), condition, ... )
counts |
either a counts matrix or a SingleCellExperiment object containing count data to estimate parameters from. |
params |
SCDDParams object to store estimated values in. |
verbose |
logical. Whether to show progress messages. |
BPPARAM |
A |
... |
further arguments passed to or from other methods. |
conditions |
Vector giving the condition that each cell belongs to. Conditions can be 1 or 2. |
condition |
String giving the column that represents biological group of interest. |
This function applies preprocess
to the counts then uses
scDD
to estimate the numbers of each gene type to
simulate. The output is then converted to a SCDDParams object. See
preprocess
and scDD
for details.
SCDDParams object containing the estimated parameters.
if (requireNamespace("scDD", quietly = TRUE)) { library(scuttle) set.seed(1) sce <- mockSCE(ncells = 20, ngenes = 100) colData(sce)$condition <- sample(1:2, ncol(sce), replace = TRUE) params <- scDDEstimate(sce, condition = "condition") params }
if (requireNamespace("scDD", quietly = TRUE)) { library(scuttle) set.seed(1) sce <- mockSCE(ncells = 20, ngenes = 100) colData(sce)$condition <- sample(1:2, ncol(sce), replace = TRUE) params <- scDDEstimate(sce, condition = "condition") params }
S4 class that holds parameters for the scDD simulation.
The SCDD simulation uses the following parameters:
nGenes
The number of genes to simulate (not used).
nCells
The number of cells to simulate in each condition.
[seed]
Seed to use for generating random numbers.
SCdat
SingleCellExperiment
containing real
data.
nDE
Number of DE genes to simulate.
nDP
Number of DP genes to simulate.
nDM
Number of DM genes to simulate.
nDB
Number of DB genes to simulate.
nEE
Number of EE genes to simulate.
nEP
Number of EP genes to simulate.
[sd.range]
Interval for fold change standard deviations.
[modeFC]
Values for DP, DM and DB mode fold changes.
[varInflation]
Variance inflation factors for each
condition. If all equal to 1 will be set to NULL
(default).
[condition]
String giving the column that represents biological group of interest.
The parameters not shown in brackets can be estimated from real data using
scDDEstimate
. See simulateSet
for more
details about the parameters. For details of the Splatter implementation of
the scDD simulation see scDDSimulate
.
Simulate counts using the scDD method.
scDDSimulate( params = newSCDDParams(), plots = FALSE, plot.file = NULL, sparsify = TRUE, verbose = TRUE, BPPARAM = SerialParam(), ... )
scDDSimulate( params = newSCDDParams(), plots = FALSE, plot.file = NULL, sparsify = TRUE, verbose = TRUE, BPPARAM = SerialParam(), ... )
params |
SCDDParams object containing simulation parameters. |
plots |
logical. whether to generate scDD fold change and validation plots. |
plot.file |
File path to save plots as PDF. |
sparsify |
logical. Whether to automatically convert assays to sparse matrices if there will be a size reduction. |
verbose |
logical. Whether to print progress messages |
BPPARAM |
A |
... |
any additional parameter settings to override what is provided in
|
This function is just a wrapper around simulateSet
that
takes a SCDDParams
, runs the simulation then converts the
output to a SingleCellExperiment
object.
See simulateSet
for more details about how the simulation
works.
SingleCellExperiment containing simulated counts
Korthauer KD, Chu L-F, Newton MA, Li Y, Thomson J, Stewart R, et al. A statistical approach for identifying differential distributions in single-cell RNA-seq experiments. Genome Biology (2016).
Paper: 10.1186/s13059-016-1077-y
Code: https://github.com/kdkorthauer/scDD
sim <- scDDSimulate()
sim <- scDDSimulate()
Function for setting parameter values.
setParam(object, name, value) ## S4 method for signature 'BASiCSParams' setParam(object, name, value) ## S4 method for signature 'KersplatParams' setParam(object, name, value) ## S4 method for signature 'Lun2Params' setParam(object, name, value) ## S4 method for signature 'LunParams' setParam(object, name, value) ## S4 method for signature 'Params' setParam(object, name, value) ## S4 method for signature 'PhenoParams' setParam(object, name, value) ## S4 method for signature 'SCDDParams' setParam(object, name, value) ## S4 method for signature 'SplatParams' setParam(object, name, value) ## S4 method for signature 'SplatPopParams' setParam(object, name, value) ## S4 method for signature 'ZINBParams' setParam(object, name, value)
setParam(object, name, value) ## S4 method for signature 'BASiCSParams' setParam(object, name, value) ## S4 method for signature 'KersplatParams' setParam(object, name, value) ## S4 method for signature 'Lun2Params' setParam(object, name, value) ## S4 method for signature 'LunParams' setParam(object, name, value) ## S4 method for signature 'Params' setParam(object, name, value) ## S4 method for signature 'PhenoParams' setParam(object, name, value) ## S4 method for signature 'SCDDParams' setParam(object, name, value) ## S4 method for signature 'SplatParams' setParam(object, name, value) ## S4 method for signature 'SplatPopParams' setParam(object, name, value) ## S4 method for signature 'ZINBParams' setParam(object, name, value)
object |
object to set parameter in. |
name |
name of the parameter to set. |
value |
value to set the parameter to. |
Object with new parameter value.
params <- newSimpleParams() setParam(params, "nGenes", 100)
params <- newSimpleParams() setParam(params, "nGenes", 100)
Set multiple parameters in a Params object.
setParams(object, update = NULL, ...) ## S4 method for signature 'KersplatParams' setParams(object, update = NULL, ...) ## S4 method for signature 'Params' setParams(object, update = NULL, ...) ## S4 method for signature 'SplatParams' setParams(object, update = NULL, ...)
setParams(object, update = NULL, ...) ## S4 method for signature 'KersplatParams' setParams(object, update = NULL, ...) ## S4 method for signature 'Params' setParams(object, update = NULL, ...) ## S4 method for signature 'SplatParams' setParams(object, update = NULL, ...)
object |
Params object to set parameters in. |
update |
list of parameters to set where |
... |
additional parameters to set. These are combined with any
parameters specified in |
Each parameter is set by a call to setParam
. If the same
parameter is specified multiple times it will be set multiple times.
Parameters can be specified using a list via update
(useful when
collecting parameter values in some way) or individually (useful when setting
them manually), see examples.
Params object with updated values.
params <- newSimpleParams() params # Set individually params <- setParams(params, nGenes = 1000, nCells = 50) params # Set via update list params <- setParams(params, list(mean.rate = 0.2, mean.shape = 0.8)) params
params <- newSimpleParams() params # Set individually params <- setParams(params, nGenes = 1000, nCells = 50) params # Set via update list params <- setParams(params, list(mean.rate = 0.2, mean.shape = 0.8)) params
Estimate simulation parameters for the simple simulation from a real dataset.
simpleEstimate(counts, params = newSimpleParams()) ## S3 method for class 'SingleCellExperiment' simpleEstimate(counts, params = newSimpleParams()) ## S3 method for class 'matrix' simpleEstimate(counts, params = newSimpleParams())
simpleEstimate(counts, params = newSimpleParams()) ## S3 method for class 'SingleCellExperiment' simpleEstimate(counts, params = newSimpleParams()) ## S3 method for class 'matrix' simpleEstimate(counts, params = newSimpleParams())
counts |
either a counts matrix or a SingleCellExperiment object containing count data to estimate parameters from. |
params |
SimpleParams object to store estimated values in. |
The nGenes
and nCells
parameters are taken from the size of the
input data. The mean parameters are estimated by fitting a gamma distribution
to the library size normalised mean expression level using
fitdist
. See SimpleParams
for more
details on the parameters.
SimpleParams object containing the estimated parameters.
# Load example data library(scuttle) set.seed(1) sce <- mockSCE() params <- simpleEstimate(sce) params
# Load example data library(scuttle) set.seed(1) sce <- mockSCE() params <- simpleEstimate(sce) params
S4 class that holds parameters for the simple simulation.
The simple simulation uses the following parameters:
nGenes
The number of genes to simulate.
nCells
The number of cells to simulate.
[seed]
Seed to use for generating random numbers.
mean.shape
The shape parameter for the mean gamma distribution.
mean.rate
The rate parameter for the mean gamma distribution.
[count.disp]
The dispersion parameter for the counts negative binomial distribution.
The parameters not shown in brackets can be estimated from real data using
simpleEstimate
. For details of the simple simulation
see simpleSimulate
.
Simulate counts from a simple negative binomial distribution without simulated library sizes, differential expression etc.
simpleSimulate( params = newSimpleParams(), sparsify = TRUE, verbose = TRUE, ... )
simpleSimulate( params = newSimpleParams(), sparsify = TRUE, verbose = TRUE, ... )
params |
SimpleParams object containing simulation parameters. |
sparsify |
logical. Whether to automatically convert assays to sparse matrices if there will be a size reduction. |
verbose |
logical. Whether to print progress messages |
... |
any additional parameter settings to override what is provided in
|
Gene means are simulated from a gamma distribution with
shape = mean.shape
and rate = mean.rate
. Counts are then
simulated from a negative binomial distribution with mu = means
and
size = 1 / counts.disp
. See SimpleParams
for more
details of the parameters.
SingleCellExperiment containing simulated counts
sim <- simpleSimulate() # Override default parameters sim <- simpleSimulate(nGenes = 1000, nCells = 50)
sim <- simpleSimulate() # Override default parameters sim <- simpleSimulate(nGenes = 1000, nCells = 50)
Estimate simulation parameters for the SparseDC simulation from a real dataset.
sparseDCEstimate( counts, conditions, nclusters, norm = TRUE, params = newSparseDCParams() ) ## S3 method for class 'SingleCellExperiment' sparseDCEstimate( counts, conditions, nclusters, norm = TRUE, params = newSparseDCParams() ) ## S3 method for class 'matrix' sparseDCEstimate( counts, conditions, nclusters, norm = TRUE, params = newSparseDCParams() )
sparseDCEstimate( counts, conditions, nclusters, norm = TRUE, params = newSparseDCParams() ) ## S3 method for class 'SingleCellExperiment' sparseDCEstimate( counts, conditions, nclusters, norm = TRUE, params = newSparseDCParams() ) ## S3 method for class 'matrix' sparseDCEstimate( counts, conditions, nclusters, norm = TRUE, params = newSparseDCParams() )
counts |
either a counts matrix or an SingleCellExperiment object containing count data to estimate parameters from. |
conditions |
numeric vector giving the condition each cell belongs to. |
nclusters |
number of cluster present in the dataset. |
norm |
logical, whether to library size normalise counts before estimation. Set this to FALSE if counts is already normalised. |
params |
PhenoParams object to store estimated values in. |
The nGenes
and nCells
parameters are taken from the size of the
input data. The counts are preprocessed using
pre_proc_data
and then parameters are estimated using
sparsedc_cluster
using lambda values calculated using
lambda1_calculator
and
lambda2_calculator
.
See SparseDCParams
for more details on the parameters.
SparseParams object containing the estimated parameters.
if (requireNamespace("SparseDC", quietly = TRUE)) { # Load example data library(scuttle) set.seed(1) sce <- mockSCE(ncells = 20, ngenes = 100) conditions <- sample(1:2, ncol(sce), replace = TRUE) params <- sparseDCEstimate(sce, conditions, nclusters = 3) params }
if (requireNamespace("SparseDC", quietly = TRUE)) { # Load example data library(scuttle) set.seed(1) sce <- mockSCE(ncells = 20, ngenes = 100) conditions <- sample(1:2, ncol(sce), replace = TRUE) params <- sparseDCEstimate(sce, conditions, nclusters = 3) params }
S4 class that holds parameters for the SparseDC simulation.
The SparseDC simulation uses the following parameters:
nGenes
The number of genes to simulate in each condition.
nCells
The number of cells to simulate.
[seed]
Seed to use for generating random numbers.
markers.n
Number of marker genes to simulate for each cluster.
markers.shared
Number of marker genes for each cluster
shared between conditions. Must be less than or equal to
markers.n
.
[markers.same]
Logical. Whether each cluster should have the same set of marker genes.
clusts.c1
Numeric vector of clusters present in condition 1. The number of times a cluster is repeated controls the proportion of cells from that cluster.
clusts.c2
Numeric vector of clusters present in condition 2. The number of times a cluster is repeated controls the proportion of cells from that cluster.
[mean.lower]
Lower bound for cluster gene means.
[mean.upper]
Upper bound for cluster gene means.
The parameters not shown in brackets can be estimated from real data using
sparseDCEstimate
. For details of the SparseDC simulation
see sparseDCSimulate
.
Simulate counts from cluster in two conditions using the SparseDC method.
sparseDCSimulate( params = newSparseDCParams(), sparsify = TRUE, verbose = TRUE, ... )
sparseDCSimulate( params = newSparseDCParams(), sparsify = TRUE, verbose = TRUE, ... )
params |
SparseDCParams object containing simulation parameters. |
sparsify |
logical. Whether to automatically convert assays to sparse matrices if there will be a size reduction. |
verbose |
logical. Whether to print progress messages |
... |
any additional parameter settings to override what is provided in
|
This function is just a wrapper around
sim_data
that takes a
SparseDCParams
, runs the simulation then converts the
output from log-expression to counts and returns a
SingleCellExperiment
object. The original
simulated log-expression values are returned in the LogExprs
assay.
See sim_data
and the SparseDC paper for
more details about how the simulation works.
SingleCellExperiment containing simulated counts
Campbell K, Yau C. Uncovering genomic trajectories with heterogeneous genetic and environmental backgrounds across single-cells and populations. bioRxiv (2017).
Barron M, Zhang S, Li J. A sparse differential clustering algorithm for tracing cell type changes via single-cell RNA-sequencing data. Nucleic Acids Research (2017).
Paper: 10.1093/nar/gkx1113
if (requireNamespace("SparseDC", quietly = TRUE)) { sim <- sparseDCSimulate() }
if (requireNamespace("SparseDC", quietly = TRUE)) { sim <- sparseDCSimulate() }
Estimate simulation parameters for the Splat simulation from a real dataset. See the individual estimation functions for more details on how this is done.
splatEstimate(counts, params = newSplatParams()) ## S3 method for class 'SingleCellExperiment' splatEstimate(counts, params = newSplatParams()) ## S3 method for class 'matrix' splatEstimate(counts, params = newSplatParams())
splatEstimate(counts, params = newSplatParams()) ## S3 method for class 'SingleCellExperiment' splatEstimate(counts, params = newSplatParams()) ## S3 method for class 'matrix' splatEstimate(counts, params = newSplatParams())
counts |
either a counts matrix or a SingleCellExperiment object containing count data to estimate parameters from. |
params |
SplatParams object to store estimated values in. |
SplatParams object with estimated values.
splatEstMean
, splatEstLib
,
splatEstOutlier
, splatEstBCV
,
splatEstDropout
# Load example data library(scuttle) set.seed(1) sce <- mockSCE() params <- splatEstimate(sce) params
# Load example data library(scuttle) set.seed(1) sce <- mockSCE() params <- splatEstimate(sce) params
S4 class that holds parameters for the Splat simulation.
The Splat simulation requires the following parameters:
nGenes
The number of genes to simulate.
nCells
The number of cells to simulate.
[seed]
Seed to use for generating random numbers.
[nBatches]
The number of batches to simulate.
[batchCells]
Vector giving the number of cells in each batch.
[batch.facLoc]
Location (meanlog) parameter for the batch effect factor log-normal distribution. Can be a vector.
[batch.facScale]
Scale (sdlog) parameter for the batch effect factor log-normal distribution. Can be a vector.
[batch.rmEffect]
Logical, removes the batch effect and continues with the simulation when TRUE. This allows the user to test batch removal algorithms without having to calculate the new expected cell means with batch removed.
mean.shape
Shape parameter for the mean gamma distribution.
mean.rate
Rate parameter for the mean gamma distribution.
lib.loc
Location (meanlog) parameter for the library size log-normal distribution, or mean parameter if a normal distribution is used.
lib.scale
Scale (sdlog) parameter for the library size log-normal distribution, or sd parameter if a normal distribution is used.
lib.norm
Logical. Whether to use a normal distribution for library sizes instead of a log-normal.
out.prob
Probability that a gene is an expression outlier.
out.facLoc
Location (meanlog) parameter for the expression outlier factor log-normal distribution.
out.facScale
Scale (sdlog) parameter for the expression outlier factor log-normal distribution.
[nGroups]
The number of groups or paths to simulate.
[group.prob]
Probability that a cell comes from a group.
[de.prob]
Probability that a gene is differentially expressed in a group. Can be a vector.
[de.downProb]
Probability that a differentially expressed gene is down-regulated. Can be a vector.
[de.facLoc]
Location (meanlog) parameter for the differential expression factor log-normal distribution. Can be a vector.
[de.facScale]
Scale (sdlog) parameter for the differential expression factor log-normal distribution. Can be a vector.
bcv.common
Underlying common dispersion across all genes.
bcv.df
Degrees of Freedom for the BCV inverse chi-squared distribution.
dropout.type
The type of dropout to simulate. "none" indicates no dropout, "experiment" is global dropout using the same parameters for every cell, "batch" uses the same parameters for every cell in each batch, "group" uses the same parameters for every cell in each groups and "cell" uses a different set of parameters for each cell.
dropout.mid
Midpoint parameter for the dropout logistic function.
dropout.shape
Shape parameter for the dropout logistic function.
[path.from]
Vector giving the originating point of each path. This allows path structure such as a cell type which differentiates into an intermediate cell type that then differentiates into two mature cell types. A path structure of this form would have a "from" parameter of c(0, 1, 1) (where 0 is the origin). If no vector is given all paths will start at the origin.
[path.nSteps]
Vector giving the number of steps to
simulate along each path. If a single value is given it will be
applied to all paths. This parameter was previously called
path.length
.
[path.skew]
Vector giving the skew of each path. Values closer to 1 will give more cells towards the starting population, values closer to 0 will give more cells towards the final population. If a single value is given it will be applied to all paths.
[path.nonlinearProb]
Probability that a gene follows a non-linear path along the differentiation path. This allows more complex gene patterns such as a gene being equally expressed at the beginning an end of a path but lowly expressed in the middle.
[path.sigmaFac]
Sigma factor for non-linear gene paths. A higher value will result in more extreme non-linear variations along a path.
The parameters not shown in brackets can be estimated from real data using
splatEstimate
. For details of the Splat simulation
see splatSimulate
.
Estimate simulation parameters for the eQTL population simulation from real data. See the individual estimation functions for more details on how this is done.
splatPopEstimate( counts = NULL, means = NULL, eqtl = NULL, params = newSplatPopParams() )
splatPopEstimate( counts = NULL, means = NULL, eqtl = NULL, params = newSplatPopParams() )
counts |
either a counts matrix or a SingleCellExperiment object containing count data to estimate parameters from. |
means |
Matrix of real gene means across a population, where each row is a gene and each column is an individual in the population. |
eqtl |
data.frame with all or top eQTL pairs from a real eQTL analysis. Must include columns: 'gene_id', 'pval_nominal', and 'slope'. |
params |
SplatPopParams object containing parameters for the
simulation of the mean expression levels for the population.
See |
SplatPopParams object containing the estimated parameters.
splatPopEstimateEffectSize
,
splatPopEstimateMeanCV
if (requireNamespace("VariantAnnotation", quietly = TRUE) && requireNamespace("preprocessCore", quietly = TRUE)) { # Load example data library(scuttle) sce <- mockSCE() params <- splatPopEstimate(sce) }
if (requireNamespace("VariantAnnotation", quietly = TRUE) && requireNamespace("preprocessCore", quietly = TRUE)) { # Load example data library(scuttle) sce <- mockSCE() params <- splatPopEstimate(sce) }
S4 class that holds parameters for the splatPop simulation.
In addition to the SplatParams
parameters, splatPop simulation
requires the following parameters:
[similarity.scale]
Scaling factor for pop.cv.param.rate, where values larger than 1 increase the similarity between individuals in the population and values less than one make the individuals less similar.
[eqtl.n]
The number (>1) or percent (<=1) of genes to assign eQTL effects.
[eqtl.dist]
Maximum distance between eSNP and eGene
[eqtl.maf.min]
Minimum Minor Allele Frequency of eSNPs.
[eqtl.maf.max]
Maximum Minor Allele Frequency of eSNPs.
[eqtl.coreg]
Proportion of eGenes to have a shared eSNP (i.e., co-regulated genes)
[eqtl.group.specific]
Percent of eQTL effects to simulate as group specific.
[eqtl.condition.specific]
Percent of eQTL effects to simulate as condition specific.
eqtl.ES.shape
Shape parameter for the effect size gamma distribution.
eqtl.ES.rate
Rate parameter for the effect size gamma distribution.
pop.mean.shape
Shape parameter for the mean (i.e. bulk) expression gamma distribution
pop.mean.rate
Rate parameter for the mean (i.e. bulk) expression gamma distribution
pop.cv.param
Dataframe containing gene mean bin range, and the CV shape, and CV rate parameters for each of those bins.
batch.size
The number of donors in each pool/batch.
nCells.sample
True/False if nCells should be set as nCells or sampled from a gamma distribution for each batch/donor.
nCells.shape
Shape parameter for the nCells per batch per donor distribution.
nCells.rate
Rate parameter for the nCells per batch per donor distribution.
[nConditions]
The number of conditions/treatments to divide samples into.
[condition.prob]
Probability that a sample belongs to each condition/treatment group. Can be a vector.
[cde.prob]
Probability that a gene is differentially expressed in a condition group. Can be a vector.
[cde.downProb]
Probability that a conditionally differentially expressed gene is down-regulated. Can be a vector.
[cde.facLoc]
Location (meanlog) parameter for the conditional differential expression factor log-normal distribution. Can be a vector.
[cde.facScale]
Scale (sdlog) parameter for the conditional differential expression factor log-normal distribution. Can be a vector.
The parameters not shown in brackets can be estimated from real data using
splatPopEstimate
. For details of the eQTL simulation
see splatPopSimulate
.
Parse splatPop key information from empirical data provided.
splatPopParseEmpirical( vcf = vcf, gff = gff, eqtl = eqtl, means = means, params = params )
splatPopParseEmpirical( vcf = vcf, gff = gff, eqtl = eqtl, means = means, params = params )
vcf |
VariantAnnotation object containing genotypes of samples. |
gff |
Either NULL or a data.frame object containing a GFF/GTF file. |
eqtl |
Either NULL or if simulating population parameters directly from empirical data, a data.frame with empirical/desired eQTL results. To see required format, run 'mockEmpiricalSet()' and see eqtl output. |
means |
Either NULL or if simulating population parameters directly from empirical data, a Matrix of real gene means across a population, where each row is a gene and each column is an individual in the population. To see required format, run 'mockEmpiricalSet()' and see means output. |
params |
SplatPopParams object containing parameters for population
scale simulations. See |
NOTE: This function will cause some of the parameters in the splatPopParams object to be ignored, such as population level gene mean and variance and eQTL parameters.
This function will ignore a number of parameters defined in splatPopParams, instead pulling key information directly from provided VCF, GFF, gene means, and eQTL mapping result data provided.
A partial splatPop 'key'
For each sample, expression values are quantile normalized (qgamma) using the gamma distribution parameterized from splatEstimate(). This ensures the simulated gene means reflect the distribution expected from a sc dataset and not a bulk dataset.
splatPopQuantNorm(params, means)
splatPopQuantNorm(params, means)
params |
SplatPopParams object containing parameters for population
scale simulations. See |
means |
Mean gene expression matrix with eQTL effects. |
matrix of quantile normalized gene mean expression levels.
if (requireNamespace("VariantAnnotation", quietly = TRUE) && requireNamespace("preprocessCore", quietly = TRUE)) { bulk.means <- mockBulkMatrix(n.genes = 100, n.samples = 100) bulk.qnorm <- splatPopQuantNorm(newSplatPopParams(), bulk.means) }
if (requireNamespace("VariantAnnotation", quietly = TRUE) && requireNamespace("preprocessCore", quietly = TRUE)) { bulk.means <- mockBulkMatrix(n.genes = 100, n.samples = 100) bulk.qnorm <- splatPopQuantNorm(newSplatPopParams(), bulk.means) }
Simulate scRNA-seq count data using the splat model for a population of individuals with correlation structure.
splatPopSimulate( params = newSplatPopParams(nGenes = 50), vcf = mockVCF(), method = c("single", "groups", "paths"), gff = NULL, eqtl = NULL, means = NULL, key = NULL, counts.only = FALSE, sparsify = TRUE, verbose = TRUE, ... )
splatPopSimulate( params = newSplatPopParams(nGenes = 50), vcf = mockVCF(), method = c("single", "groups", "paths"), gff = NULL, eqtl = NULL, means = NULL, key = NULL, counts.only = FALSE, sparsify = TRUE, verbose = TRUE, ... )
params |
SplatPopParams object containing parameters for population
scale simulations. See |
vcf |
VariantAnnotation object containing genotypes of samples. |
method |
which simulation method to use. Options are "single" which produces a single population, "groups" which produces distinct groups (eg. cell types), "paths" which selects cells from continuous trajectories (eg. differentiation processes). |
gff |
Either NULL or a data.frame object containing a GFF/GTF file. |
eqtl |
Either NULL or if simulating population parameters directly from empirical data, a data.frame with empirical/desired eQTL results. To see required format, run 'mockEmpiricalSet()' and see eqtl output. |
means |
Either NULL or if simulating population parameters directly from empirical data, a Matrix of real gene means across a population, where each row is a gene and each column is an individual in the population. To see required format, run 'mockEmpiricalSet()' and see means output. |
key |
Either NULL or a data.frame object containing a full or partial splatPop key. |
counts.only |
logical. Whether to save only counts in sce object. |
sparsify |
logical. Whether to automatically convert assays to sparse matrices if there will be a size reduction. |
verbose |
logical. Whether to print progress messages. |
... |
any additional parameter settings to override what is provided in
|
This functions is for simulating data in a single step. It consists of a
call to splatPopSimulateMeans
, which simulates a mean
expression level per gene per sample, followed by a call to
splatPopSimulateSC
, which uses the splat model to simulate
single-cell counts per individual. Please see the documentation for those
functions for more details.
SingleCellExperiment object containing simulated counts, intermediate values like the gene means simulated in 'splatPopSimulateMeans', and information about the differential expression and eQTL effects assigned to each gene.
splatPopSimulateMeans
, splatPopSimulateSC
if (requireNamespace("VariantAnnotation", quietly = TRUE) && requireNamespace("preprocessCore", quietly = TRUE)) { vcf <- mockVCF() gff <- mockGFF() sim <- splatPopSimulate(vcf = vcf, gff = gff, sparsify = FALSE) }
if (requireNamespace("VariantAnnotation", quietly = TRUE) && requireNamespace("preprocessCore", quietly = TRUE)) { vcf <- mockVCF() gff <- mockGFF() sim <- splatPopSimulate(vcf = vcf, gff = gff, sparsify = FALSE) }
Simulate mean expression levels for all genes for all samples, with between sample correlation structure simulated with eQTL effects and with the option to simulate multiple groups (i.e. cell-types).
splatPopSimulateMeans( vcf = mockVCF(), params = newSplatPopParams(nGenes = 1000), verbose = TRUE, key = NULL, gff = NULL, eqtl = NULL, means = NULL, ... )
splatPopSimulateMeans( vcf = mockVCF(), params = newSplatPopParams(nGenes = 1000), verbose = TRUE, key = NULL, gff = NULL, eqtl = NULL, means = NULL, ... )
vcf |
VariantAnnotation object containing genotypes of samples. |
params |
SplatPopParams object containing parameters for population
scale simulations. See |
verbose |
logical. Whether to print progress messages. |
key |
Either FALSE or a data.frame object containing a full or partial splatPop key. |
gff |
Either NULL or a data.frame object containing a GFF/GTF file. |
eqtl |
Either NULL or if simulating population parameters directly from empirical data, a data.frame with empirical/desired eQTL results. To see required format, run 'mockEmpiricalSet()' and see eqtl output. |
means |
Either NULL or if simulating population parameters directly from empirical data, a Matrix of real gene means across a population, where each row is a gene and each column is an individual in the population. To see required format, run 'mockEmpiricalSet()' and see means output. |
... |
any additional parameter settings to override what is provided in
|
SplatPopParams can be set in a variety of ways. 1. If
not provided, default parameters are used. 2. Default parameters can be
overridden by supplying desired parameters using setParams
.
3. Parameters can be estimated from real data of your choice using
splatPopEstimate
.
'splatPopSimulateMeans' involves the following steps:
Load population key or generate random or GFF/GTF based key.
Format and subset genotype data from the VCF file.
If not in key, assign expression mean and variance to each gene.
If not in key, assign eGenes-eSNPs pairs and effect sizes.
If not in key and groups >1, assign subset of eQTL associations as group-specific and assign DEG group effects.
Simulate mean gene expression matrix without eQTL effects
Quantile normalize by sample to fit single-cell expression distribution as defined in 'splatEstimate'.
Add quantile normalized gene mean and cv info the eQTL key.
Add eQTL effects to means matrix.
A list containing: 'means' a matrix (or list of matrices if n.groups > 1) with the simulated mean gene expression value for each gene (row) and each sample (column), 'key' a data.frame with population information including eQTL and group effects, and 'condition' a named array containing conditional group assignments for each sample.
splatPopParseVCF
, splatPopParseGenes
,
splatPopAssignMeans
,
splatPopQuantNorm
, splatPopQuantNormKey
splatPopeQTLEffects
, splatPopGroupEffects
,
splatPopSimMeans
, splatPopSimEffects
,
if (requireNamespace("VariantAnnotation", quietly = TRUE) && requireNamespace("preprocessCore", quietly = TRUE)) { means <- splatPopSimulateMeans() }
if (requireNamespace("VariantAnnotation", quietly = TRUE) && requireNamespace("preprocessCore", quietly = TRUE)) { means <- splatPopSimulateMeans() }
Simulate count data for a population from a fictional single-cell RNA-seq experiment using the Splat method.
splatPopSimulateSC( sim.means, params, key, method = c("single", "groups", "paths"), counts.only = FALSE, conditions = NULL, sparsify = TRUE, verbose = TRUE, ... )
splatPopSimulateSC( sim.means, params, key, method = c("single", "groups", "paths"), counts.only = FALSE, conditions = NULL, sparsify = TRUE, verbose = TRUE, ... )
sim.means |
Matrix or list of matrices of gene means for the population. Output from 'splatPopSimulateMeans()'. |
params |
SplatPopParams object containing parameters for population
scale simulations. See |
key |
data.frame object containing a full or partial splatPop key. Output from 'splatPopSimulateMeans()'. |
method |
which simulation method to use. Options are "single" which produces a single cell population for each sample, "groups" which produces distinct groups (eg. cell types) for each sample (note, this creates separate groups from those created in 'popSimulate' with only DE effects), and "paths" which selects cells from continuous trajectories (eg. differentiation processes). |
counts.only |
logical. Whether to return only the counts. |
conditions |
named array with conditional group assignment for each sample. Output from 'splatPopSimulateMeans()'. |
sparsify |
logical. Whether to automatically convert assays to sparse matrices if there will be a size reduction. |
verbose |
logical. Whether to print progress messages. |
... |
any additional parameter settings to override what is provided in
|
SingleCellExperiment object containing simulated counts, intermediate values like the gene means simulated in 'splatPopSimulateMeans', and information about the differential expression and eQTL effects assigned to each gene.
if (requireNamespace("VariantAnnotation", quietly = TRUE) && requireNamespace("preprocessCore", quietly = TRUE)) { params <- newSplatPopParams() sim.means <- splatPopSimulateMeans() sim <- splatPopSimulateSC(sim.means$means, params, sim.means$key) }
if (requireNamespace("VariantAnnotation", quietly = TRUE) && requireNamespace("preprocessCore", quietly = TRUE)) { params <- newSplatPopParams() sim.means <- splatPopSimulateMeans() sim <- splatPopSimulateSC(sim.means$means, params, sim.means$key) }
Simulate count data from a fictional single-cell RNA-seq experiment using the Splat method.
splatSimulate( params = newSplatParams(), method = c("single", "groups", "paths"), sparsify = TRUE, verbose = TRUE, ... ) splatSimulateSingle(params = newSplatParams(), verbose = TRUE, ...) splatSimulateGroups(params = newSplatParams(), verbose = TRUE, ...) splatSimulatePaths(params = newSplatParams(), verbose = TRUE, ...)
splatSimulate( params = newSplatParams(), method = c("single", "groups", "paths"), sparsify = TRUE, verbose = TRUE, ... ) splatSimulateSingle(params = newSplatParams(), verbose = TRUE, ...) splatSimulateGroups(params = newSplatParams(), verbose = TRUE, ...) splatSimulatePaths(params = newSplatParams(), verbose = TRUE, ...)
params |
SplatParams object containing parameters for the simulation.
See |
method |
which simulation method to use. Options are "single" which produces a single population, "groups" which produces distinct groups (eg. cell types), or "paths" which selects cells from continuous trajectories (eg. differentiation processes). |
sparsify |
logical. Whether to automatically convert assays to sparse matrices if there will be a size reduction. |
verbose |
logical. Whether to print progress messages. |
... |
any additional parameter settings to override what is provided in
|
Parameters can be set in a variety of ways. If no parameters are provided
the default parameters are used. Any parameters in params
can be
overridden by supplying additional arguments through a call to
setParams
. This design allows the user flexibility in
how they supply parameters and allows small adjustments without creating a
new SplatParams
object. See examples for a demonstration of how this
can be used.
The simulation involves the following steps:
Set up simulation object
Simulate library sizes
Simulate gene means
Simulate groups/paths
Simulate BCV adjusted cell means
Simulate true counts
Simulate dropout
Create final dataset
The final output is a
SingleCellExperiment
object that
contains the simulated counts but also the values for various intermediate
steps. These are stored in the colData
(for cell specific
information), rowData
(for gene specific information) or
assays
(for gene by cell matrices) slots. This additional
information includes:
colData
Unique cell identifier.
The group or path the cell belongs to.
The expected library size for that cell.
how far along the path each cell is.
rowData
Unique gene identifier.
The base expression level for that gene.
Expression outlier factor for that gene. Values of 1 indicate the gene is not an expression outlier.
Expression level after applying outlier factors.
The batch effects factor for each gene for a particular batch.
The differential expression factor for each gene in a particular group. Values of 1 indicate the gene is not differentially expressed.
Factor applied to genes that have non-linear changes in expression along a path.
assays
The mean expression of genes in each cell after adding batch effects.
The mean expression of genes in each cell after any differential expression and adjusted for expected library size.
The Biological Coefficient of Variation for each gene in each cell.
The mean expression level of genes in each cell adjusted for BCV.
The simulated counts before dropout.
Logical matrix showing which values have been dropped in which cells.
Values that have been added by Splatter are named using UpperCamelCase
in order to differentiate them from the values added by analysis packages
which typically use underscore_naming
.
SingleCellExperiment object containing the simulated counts and intermediate values.
Zappia L, Phipson B, Oshlack A. Splatter: simulation of single-cell RNA sequencing data. Genome Biology (2017).
Paper: 10.1186/s13059-017-1305-0
Code: https://github.com/Oshlack/splatter
splatSimLibSizes
, splatSimGeneMeans
,
splatSimBatchEffects
, splatSimBatchCellMeans
,
splatSimDE
, splatSimCellMeans
,
splatSimBCVMeans
, splatSimTrueCounts
,
splatSimDropout
# Simulation with default parameters sim <- splatSimulate() # Simulation with different number of genes sim <- splatSimulate(nGenes = 1000) # Simulation with custom parameters params <- newSplatParams(nGenes = 100, mean.rate = 0.5) sim <- splatSimulate(params) # Simulation with adjusted custom parameters sim <- splatSimulate(params, mean.rate = 0.6, out.prob = 0.2) # Simulate groups sim <- splatSimulate(method = "groups") # Simulate paths sim <- splatSimulate(method = "paths")
# Simulation with default parameters sim <- splatSimulate() # Simulation with different number of genes sim <- splatSimulate(nGenes = 1000) # Simulation with custom parameters params <- newSplatParams(nGenes = 100, mean.rate = 0.5) sim <- splatSimulate(params) # Simulation with adjusted custom parameters sim <- splatSimulate(params, mean.rate = 0.6, out.prob = 0.2) # Simulate groups sim <- splatSimulate(method = "groups") # Simulate paths sim <- splatSimulate(method = "paths")
Summarise the results of diffSCEs
. Calculates the Median
Absolute Deviation (MAD), Mean Absolute Error (MAE), Root Mean Squared
Error (RMSE) and Kolmogorov-Smirnov (KS) statistics for the various
properties and ranks them.
summariseDiff(diff)
summariseDiff(diff)
diff |
Output from |
data.frame with MADs, MAEs, RMSEs, scaled statistics and ranks
sim1 <- splatSimulate(nGenes = 1000, batchCells = 20) sim2 <- simpleSimulate(nGenes = 1000, nCells = 20) difference <- diffSCEs(list(Splat = sim1, Simple = sim2), ref = "Simple") summary <- summariseDiff(difference) head(summary)
sim1 <- splatSimulate(nGenes = 1000, batchCells = 20) sim2 <- simpleSimulate(nGenes = 1000, nCells = 20) difference <- diffSCEs(list(Splat = sim1, Simple = sim2), ref = "Simple") summary <- summariseDiff(difference) head(summary)
Estimate simulation parameters for the ZINB-WaVE simulation from a real dataset.
zinbEstimate( counts, design.samples = NULL, design.genes = NULL, common.disp = TRUE, iter.init = 2, iter.opt = 25, stop.opt = 1e-04, params = newZINBParams(), verbose = TRUE, BPPARAM = SerialParam(), ... ) ## S3 method for class 'SingleCellExperiment' zinbEstimate( counts, design.samples = NULL, design.genes = NULL, common.disp = TRUE, iter.init = 2, iter.opt = 25, stop.opt = 1e-04, params = newZINBParams(), verbose = TRUE, BPPARAM = SerialParam(), ... ) ## S3 method for class 'matrix' zinbEstimate( counts, design.samples = NULL, design.genes = NULL, common.disp = TRUE, iter.init = 2, iter.opt = 25, stop.opt = 1e-04, params = newZINBParams(), verbose = TRUE, BPPARAM = SerialParam(), ... )
zinbEstimate( counts, design.samples = NULL, design.genes = NULL, common.disp = TRUE, iter.init = 2, iter.opt = 25, stop.opt = 1e-04, params = newZINBParams(), verbose = TRUE, BPPARAM = SerialParam(), ... ) ## S3 method for class 'SingleCellExperiment' zinbEstimate( counts, design.samples = NULL, design.genes = NULL, common.disp = TRUE, iter.init = 2, iter.opt = 25, stop.opt = 1e-04, params = newZINBParams(), verbose = TRUE, BPPARAM = SerialParam(), ... ) ## S3 method for class 'matrix' zinbEstimate( counts, design.samples = NULL, design.genes = NULL, common.disp = TRUE, iter.init = 2, iter.opt = 25, stop.opt = 1e-04, params = newZINBParams(), verbose = TRUE, BPPARAM = SerialParam(), ... )
counts |
either a counts matrix or a SingleCellExperiment object containing count data to estimate parameters from. |
design.samples |
design matrix of sample-level covariates. |
design.genes |
design matrix of gene-level covariates. |
common.disp |
logical. Whether or not a single dispersion for all features is estimated. |
iter.init |
number of iterations to use for initialization. |
iter.opt |
number of iterations to use for optimization. |
stop.opt |
stopping criterion for optimization. |
params |
ZINBParams object to store estimated values in. |
verbose |
logical. Whether to print progress messages. |
BPPARAM |
A |
... |
additional arguments passes to |
The function is a wrapper around zinbFit
that takes
the fitted model and inserts it into a ZINBParams
object. See
ZINBParams
for more details on the parameters and
zinbFit
for details of the estimation procedure.
ZINBParams object containing the estimated parameters.
if (requireNamespace("zinbwave", quietly = TRUE)) { library(scuttle) set.seed(1) sce <- mockSCE(ncells = 20, ngenes = 100) params <- zinbEstimate(sce) params }
if (requireNamespace("zinbwave", quietly = TRUE)) { library(scuttle) set.seed(1) sce <- mockSCE(ncells = 20, ngenes = 100) params <- zinbEstimate(sce) params }
S4 class that holds parameters for the ZINB-WaVE simulation.
The ZINB-WaVE simulation uses the following parameters:
nGenes
The number of genes to simulate.
nCells
The number of cells to simulate.
[seed]
Seed to use for generating random numbers.
model
Object describing a ZINB model.
The majority of the parameters for this simulation are stored in a
ZinbModel
object. Please refer to the documentation
for this class and its constructor(zinbModel
) for
details about all the parameters.
The parameters not shown in brackets can be estimated from real data using
zinbEstimate
. For details of the ZINB-WaVE simulation
see zinbSimulate
.
Simulate counts using the ZINB-WaVE method.
zinbSimulate(params = newZINBParams(), sparsify = TRUE, verbose = TRUE, ...)
zinbSimulate(params = newZINBParams(), sparsify = TRUE, verbose = TRUE, ...)
params |
ZINBParams object containing simulation parameters. |
sparsify |
logical. Whether to automatically convert assays to sparse matrices if there will be a size reduction. |
verbose |
logical. Whether to print progress messages |
... |
any additional parameter settings to override what is provided in
|
This function is just a wrapper around zinbSim
that
takes a ZINBParams
, runs the simulation then converts the
output to a SingleCellExperiment
object.
See zinbSim
and the ZINB-WaVE paper for
more details about how the simulation works.
SingleCellExperiment containing simulated counts
Campbell K, Yau C. Uncovering genomic trajectories with heterogeneous genetic and environmental backgrounds across single-cells and populations. bioRxiv (2017).
Risso D, Perraudeau F, Gribkova S, Dudoit S, Vert J-P. ZINB-WaVE: A general and flexible method for signal extraction from single-cell RNA-seq data bioRxiv (2017).
Paper: 10.1101/125112
Code: https://github.com/drisso/zinbwave
if (requireNamespace("zinbwave", quietly = TRUE)) { sim <- zinbSimulate() }
if (requireNamespace("zinbwave", quietly = TRUE)) { sim <- zinbSimulate() }