scalex.SCALEX

scalex.SCALEX(data_list=None, batch_categories=None, profile='RNA', batch_name='batch', min_features=600, min_cells=3, target_sum=None, n_top_features=None, join='inner', batch_key='batch', processed=False, fraction=None, n_obs=None, use_layer='X', backed=False, batch_size=64, lr=0.0002, max_iteration=30000, seed=124, gpu=0, outdir=None, projection=None, repeat=False, impute=None, chunk_size=20000, ignore_umap=False, verbose=False, assess=False, show=True, eval=False, num_workers=4)

Online single-cell data integration through projecting heterogeneous datasets into a common cell-embedding space

Parameters:

data_list (Union[str, AnnData, List, None]) – A path, an AnnData object, or a list of paths or AnnData objects to concatenate. Each matrix is referred to as a ‘batch’.
batch_categories (Optional[List]) – Categories for the batch annotation. By default, use increasing numbers.
profile (str) – Specify the single-cell profile, RNA or ATAC. Default: RNA.
batch_name (str) – Use this annotation in obs as the batches for training the model. Default: ‘batch’.
min_features (int) – Filter out cells with fewer than min_features detected features. Default: 600.
min_cells (int) – Filter out genes detected in fewer than min_cells cells. Default: 3.
n_top_features (Optional[int]) – Number of highly-variable genes to keep. Default: 2000.
join (str) – Use the intersection (‘inner’) or union (‘outer’) of the variables of different batches. Default: ‘inner’.
batch_key (str) – Add the batch annotation to obs using this key. Default: ‘batch’.
batch_size (int) – Number of samples per batch to load. Default: 64.
lr (float) – Learning rate. Default: 2e-4.
max_iteration (int) – Max iterations for training. Training one batch_size samples is one iteration. Default: 30000.
seed (int) – Random seed for torch and numpy. Default: 124.
gpu (int) – Index of GPU to use if GPU is available. Default: 0.
outdir (Optional[str]) – Output directory. Default: ‘output/’.
projection (Optional[str]) – Use for new dataset projection. Input the folder containing the pre-trained model. If None, don’t do projection. Default: None.
repeat (bool) – Use with projection. If False, concatenate the reference and projection datasets for downstream analysis. If True, only use projection datasets. Default: False.
impute (Optional[str]) – If True, calculate the imputed gene expression and store it at adata.layers[‘impute’]. Default: False.
chunk_size (int) – Number of samples from the same batch to transform. Default: 20000.
ignore_umap (bool) – If True, skip UMAP visualization and Leiden clustering. Default: False.
verbose (bool) – Verbosity, True or False. Default: False.
assess (bool) – If True, calculate the entropy_batch_mixing score and silhouette score to evaluate integration results. Default: False.
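
As a minimal sketch of how these parameters fit together, the helpers below assemble keyword arguments for an integration run and for a later projection run. The helper names (`integration_kwargs`, `projection_kwargs`, `approx_passes`, `run`) and any file paths passed to them are hypothetical; only the parameter names and defaults are taken from the signature above. `approx_passes` is a rough estimate that assumes each iteration consumes exactly `batch_size` cells, as the `max_iteration` description states. The `SCALEX` import is deferred so the helpers can be inspected without the package installed.

```python
def integration_kwargs(paths, outdir="output/"):
    """Keyword arguments for a first integration run (paths are hypothetical)."""
    return dict(
        data_list=list(paths),  # one entry per batch
        profile="RNA",
        min_features=600,
        min_cells=3,
        join="inner",
        batch_size=64,
        lr=2e-4,
        max_iteration=30000,
        seed=124,
        outdir=outdir,
    )


def projection_kwargs(new_paths, model_dir, outdir="projection/"):
    """Keyword arguments for projecting new data with a pre-trained model."""
    return dict(
        data_list=list(new_paths),
        projection=model_dir,  # folder containing the pre-trained model
        repeat=False,          # keep reference and projection datasets together
        outdir=outdir,
    )


def approx_passes(n_cells, batch_size=64, max_iteration=30000):
    """Rough number of passes over the data: cells seen / cells available."""
    return max_iteration * batch_size / n_cells


def run(kwargs):
    """Call SCALEX; needs the scalex package and existing input files."""
    from scalex import SCALEX  # imported lazily, see lead-in
    return SCALEX(**kwargs)
```

With the default settings, a dataset of 19,200 cells would be seen roughly `approx_passes(19200)` times, that is 30000 × 64 / 19200 = 100 passes under the stated assumption.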

Return type:

AnnData

Returns:

The output folder contains:
adata.h5ad – The AnnData matrix after batch-effect removal. The low-dimensional representation of the data is stored at adata.obsm[‘latent’].
checkpoint – model.pt contains the variables of the model and config.pt contains the parameters of the model.
log.txt – Records raw data information, filter conditions, model parameters etc.
umap.pdf – UMAP plot for visualization.
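
The outputs listed above could be located and read back along the following lines. This is a sketch, not part of the SCALEX API: `output_paths` and `load_latent` are hypothetical helper names, and treating `checkpoint` as a subdirectory holding `model.pt` and `config.pt` is an assumption based on the description above. Reading the file relies on `anndata.read_h5ad`, imported lazily so the path helper works without anndata installed.

```python
from pathlib import Path


def output_paths(outdir):
    """Expected locations of the files described above (layout partly assumed)."""
    root = Path(outdir)
    return {
        "adata": root / "adata.h5ad",
        "model": root / "checkpoint" / "model.pt",    # assumed subdirectory
        "config": root / "checkpoint" / "config.pt",  # assumed subdirectory
        "log": root / "log.txt",
        "umap": root / "umap.pdf",
    }


def load_latent(outdir):
    """Read adata.h5ad and return the integrated embedding from obsm['latent']."""
    import anndata as ad  # imported lazily, see lead-in

    adata = ad.read_h5ad(output_paths(outdir)["adata"])
    return adata.obsm["latent"]
```

The returned array has one row per cell and can be passed to downstream clustering or visualization tools.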