scalex.SCALEX
- scalex.SCALEX(data_list=None, batch_categories=None, profile='RNA', batch_name='batch', min_features=600, min_cells=3, target_sum=None, n_top_features=None, join='inner', batch_key='batch', processed=False, fraction=None, n_obs=None, use_layer='X', backed=False, batch_size=64, lr=0.0002, max_iteration=30000, seed=124, gpu=0, outdir=None, projection=None, repeat=False, impute=None, chunk_size=20000, ignore_umap=False, verbose=False, assess=False, show=True, eval=False, num_workers=4)
Online single-cell data integration through projecting heterogeneous datasets into a common cell-embedding space
- Parameters:
data_list (Union[str, AnnData, List, None]) – A list of paths to the AnnData matrices to concatenate. Each matrix is referred to as a ‘batch’.
batch_categories (Optional[List]) – Categories for the batch annotation. By default, use increasing numbers.
profile (str) – Specify the single-cell profile, RNA or ATAC. Default: RNA.
batch_name (str) – Use this annotation in obs as the batches for training the model. Default: ‘batch’.
min_features (int) – Filter out cells with fewer than min_features detected features. Default: 600.
min_cells (int) – Filter out genes detected in fewer than min_cells cells. Default: 3.
n_top_features (Optional[int]) – Number of highly variable genes to keep. Default: 2000.
join (str) – Use the intersection (‘inner’) or union (‘outer’) of the variables of the different batches.
batch_key (str) – Add the batch annotation to obs using this key. Default: ‘batch’.
batch_size (int) – Number of samples per batch to load. Default: 64.
lr (float) – Learning rate. Default: 2e-4.
max_iteration (int) – Maximum number of training iterations. One iteration trains on one mini-batch of batch_size samples. Default: 30000.
seed (int) – Random seed for torch and numpy. Default: 124.
gpu (int) – Index of the GPU to use if a GPU is available. Default: 0.
outdir (Optional[str]) – Output directory. Default: ‘output/’.
projection (Optional[str]) – Use for projecting a new dataset. Pass the folder containing the pre-trained model. If None, no projection is performed. Default: None.
repeat (bool) – Use with projection. If False, concatenate the reference and projection datasets for downstream analysis. If True, use only the projection datasets. Default: False.
impute (Optional[str]) – If True, calculate the imputed gene expression and store it at adata.layers[‘impute’]. Default: False.
chunk_size (int) – Number of samples from the same batch to transform. Default: 20000.
ignore_umap (bool) – If True, do not perform UMAP for visualization or Leiden for clustering. Default: False.
verbose (bool) – Verbosity, True or False. Default: False.
assess (bool) – If True, calculate the entropy_batch_mixing score and the silhouette score to evaluate the integration results. Default: False.
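The parameters above can be grouped into two typical calls: a first integration run, and a later projection of new data onto a pre-trained model. The following is a minimal sketch, not the library's own example: the helper names `build_scalex_kwargs` and `run_scalex` and all file paths are hypothetical, and only keyword names that appear in the signature above are used.

```python
from typing import Any, Dict, List, Optional


def build_scalex_kwargs(
    data_list: List[str],
    outdir: str,
    projection: Optional[str] = None,
    repeat: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: collect keyword arguments for scalex.SCALEX."""
    kwargs: Dict[str, Any] = {
        "data_list": data_list,   # paths to the AnnData files, one per batch
        "profile": "RNA",
        "min_features": 600,      # cell filter threshold
        "min_cells": 3,           # gene filter threshold
        "join": "inner",          # intersection of variables across batches
        "batch_size": 64,
        "lr": 2e-4,
        "max_iteration": 30000,
        "seed": 124,
        "gpu": 0,
        "outdir": outdir,
    }
    if projection is not None:
        # Folder holding the pre-trained model from an earlier run.
        kwargs["projection"] = projection
        kwargs["repeat"] = repeat
    return kwargs


def run_scalex(kwargs: Dict[str, Any]):
    """Call scalex.SCALEX; imported lazily so the sketch loads without the package."""
    import scalex

    return scalex.SCALEX(**kwargs)


# Placeholder file names, assumed for illustration only.
integration = build_scalex_kwargs(["batch1.h5ad", "batch2.h5ad"], outdir="output/")
projection = build_scalex_kwargs(
    ["new_batch.h5ad"], outdir="projection/", projection="output/", repeat=True
)

# One iteration consumes one mini-batch, so the default settings draw
# max_iteration * batch_size samples in total during training.
samples_drawn = integration["max_iteration"] * integration["batch_size"]
print(samples_drawn)  # 1920000
```

Calling `run_scalex(integration)` would start training and requires the scalex package and real input files; the sketch deliberately stops before that step.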
- Return type:
AnnData
- Returns:
The output folder contains
adata.h5ad – The AnnData matrix after batch-effect removal. The low-dimensional representation of the data is stored at adata.obsm[‘latent’].
checkpoint – model.pt contains the variables of the model and config.pt contains the parameters of the model.
log.txt – Records raw data information, filter conditions, model parameters etc.
umap.pdf – UMAP plot for visualization.
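As a rough illustration of how the output folder might be consumed afterwards, the sketch below maps the listed files to paths and loads the latent embedding. It assumes that checkpoint is a subfolder holding model.pt and config.pt, which the list above suggests but does not state outright; `expected_outputs` and `load_latent` are hypothetical helper names.

```python
from pathlib import Path
from typing import Dict


def expected_outputs(outdir: str) -> Dict[str, Path]:
    """Hypothetical helper: paths of the files listed for the output folder."""
    root = Path(outdir)
    return {
        "adata": root / "adata.h5ad",
        # Assumption: checkpoint is a directory containing both .pt files.
        "model": root / "checkpoint" / "model.pt",
        "config": root / "checkpoint" / "config.pt",
        "log": root / "log.txt",
        "umap": root / "umap.pdf",
    }


def load_latent(outdir: str):
    """Read adata.h5ad and return the embedding stored at adata.obsm['latent']."""
    import anndata as ad  # imported lazily; needs the anndata package

    adata = ad.read_h5ad(expected_outputs(outdir)["adata"])
    return adata.obsm["latent"]


paths = expected_outputs("output")
for name, path in paths.items():
    print(name, path.as_posix())
```

`load_latent("output")` only works once a run has written adata.h5ad; umap.pdf may be absent when ignore_umap is True.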