scalex.SCALEX

scalex.SCALEX(data_list=None, batch_categories=None, profile='RNA', join='inner', batch_key='batch', batch_name='batch', min_features=600, min_cells=3, target_sum=None, n_top_features=None, processed=False, batch_size=64, lr=0.0002, max_iteration=30000, seed=124, gpu=0, outdir='output/', projection=None, repeat=False, impute=None, chunk_size=20000, ignore_umap=False, verbose=False, assess=False, show=True, eval=False, test_list=None, test_batch_categories=None)

Single-Cell integrative Analysis via Latent feature Extraction

Parameters

data_list – A list of paths to AnnData matrices to concatenate. Each matrix is referred to as a ‘batch’.
batch_categories – Categories for the batch annotation. By default, increasing numbers are used.
profile – Specify the single-cell profile, RNA or ATAC. Default: RNA.
join – Use the intersection (‘inner’) or union (‘outer’) of the variables (features) of the different batches. Default: ‘inner’.
batch_key – Add the batch annotation to obs using this key. By default, batch_key=’batch’.
batch_name – The annotation in obs whose values are used as batches when training the model. Default: ‘batch’.
min_features – Filter out cells in which fewer than min_features features are detected. Default: 600.
min_cells – Filter out genes that are detected in fewer than min_cells cells. Default: 3.
n_top_features – Number of highly-variable genes to keep. Default: 2000.
batch_size – Number of samples per batch to load. Default: 64.
lr – Learning rate. Default: 2e-4.
max_iteration – Maximum number of training iterations. One iteration processes one mini-batch of batch_size samples. Default: 30000.
seed – Random seed for torch and numpy. Default: 124.
gpu – Index of GPU to use if GPU is available. Default: 0.
outdir – Output directory. Default: ‘output/’.
projection – Path to the folder containing a pre-trained model, used to project a new dataset onto it. If None, no projection is performed. Default: None.
repeat – Used together with projection. If False, the reference and projected datasets are concatenated for downstream analysis; if True, only the projected datasets are used. Default: False.
impute – If True, calculate the imputed gene expression and store it at adata.layers[‘impute’]. Default: False.
chunk_size – Number of samples from the same batch transformed at a time. Default: 20000.
ignore_umap – If True, skip UMAP visualization and Leiden clustering. Default: False.
verbose – Verbosity, True or False. Default: False.
assess – If True, calculate the entropy_batch_mixing score and silhouette score to evaluate integration results. Default: False.
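The difference between join=‘inner’ and join=‘outer’ can be illustrated with plain Python sets. This is a minimal sketch, not SCALEX's code: the helper combined_features and the gene lists are hypothetical placeholders chosen only to show which features survive each mode.

```python
def combined_features(batches, join="inner"):
    """Return the sorted feature names kept when concatenating batches.

    Hypothetical helper: 'inner' keeps features present in every batch,
    'outer' keeps features present in at least one batch.
    """
    feature_sets = [set(b) for b in batches]
    if join == "inner":
        kept = set.intersection(*feature_sets)
    elif join == "outer":
        kept = set.union(*feature_sets)
    else:
        raise ValueError("join must be 'inner' or 'outer'")
    return sorted(kept)


# Placeholder gene lists for two batches.
batch1 = ["CD3E", "CD19", "MS4A1", "LYZ"]
batch2 = ["CD3E", "LYZ", "NKG7"]

print(combined_features([batch1, batch2], join="inner"))  # ['CD3E', 'LYZ']
print(combined_features([batch1, batch2], join="outer"))  # ['CD19', 'CD3E', 'LYZ', 'MS4A1', 'NKG7']
```

With ‘inner’ only shared features remain, which keeps every batch fully observed; ‘outer’ keeps all features at the cost of entries that are missing in some batches.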
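The min_features and min_cells thresholds count nonzero entries along the two axes of a cells × features matrix. The sketch below applies them to a dense NumPy array, filtering cells first and then features. It is an illustration under assumptions: the helper filter_counts is hypothetical, SCALEX itself works on AnnData objects, and the filtering order shown here is assumed rather than taken from SCALEX (Scanpy's sc.pp.filter_cells(min_genes=...) and sc.pp.filter_genes(min_cells=...) perform comparable steps).

```python
import numpy as np


def filter_counts(X, min_features=600, min_cells=3):
    """Filter a cells x features count matrix (hypothetical helper).

    Keeps cells with at least `min_features` detected (nonzero) features,
    then keeps features detected in at least `min_cells` of the remaining cells.
    Returns the filtered matrix plus the boolean cell and feature masks.
    """
    X = np.asarray(X)
    features_per_cell = (X > 0).sum(axis=1)   # detected features in each cell
    cell_mask = features_per_cell >= min_features
    X = X[cell_mask]
    cells_per_feature = (X > 0).sum(axis=0)   # cells in which each feature is detected
    feature_mask = cells_per_feature >= min_cells
    return X[:, feature_mask], cell_mask, feature_mask


# Toy 4-cell x 4-feature matrix with small thresholds for illustration.
X = np.array([
    [1, 0, 2, 0],
    [0, 0, 1, 0],   # only 1 detected feature -> removed when min_features=2
    [3, 1, 0, 0],
    [2, 0, 5, 1],
])
X_filtered, cell_mask, feature_mask = filter_counts(X, min_features=2, min_cells=2)
print(X_filtered)
```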
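Because each iteration consumes one mini-batch, max_iteration and batch_size together bound how many cells the model sees: with the defaults, 30000 × 64 = 1,920,000 samples. The hypothetical helper approx_epochs below turns that budget into an approximate number of passes over a dataset, assuming each epoch consists of ceil(n_cells / batch_size) mini-batches; how SCALEX rounds or schedules epochs internally is not stated here.

```python
import math


def approx_epochs(n_cells, max_iteration=30000, batch_size=64):
    """Approximate passes over the data implied by an iteration budget.

    Hypothetical helper: assumes each iteration consumes one mini-batch of
    `batch_size` cells and that one epoch has ceil(n_cells / batch_size) batches.
    """
    batches_per_epoch = math.ceil(n_cells / batch_size)
    return max_iteration / batches_per_epoch


print(approx_epochs(6400))     # 100 batches per epoch -> 300.0 epochs
print(approx_epochs(100_000))  # about 19 epochs for a 100,000-cell dataset
```

Under this assumption, larger datasets receive fewer passes for the same max_iteration, so the value may need raising for very large inputs.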
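chunk_size limits how many cells are passed through the model in one call, which bounds memory use during transformation. The sketch below shows the general chunking pattern with NumPy; transform_in_chunks and the row_sums stand-in for the trained encoder are hypothetical and are not SCALEX's internals.

```python
import numpy as np


def transform_in_chunks(X, transform, chunk_size=20000):
    """Apply `transform` to consecutive blocks of rows and stack the results.

    Hypothetical helper: processes at most `chunk_size` cells per call, so the
    result matches transform(X) while peak memory depends on the chunk size.
    """
    outputs = [
        transform(X[start:start + chunk_size])
        for start in range(0, X.shape[0], chunk_size)
    ]
    return np.vstack(outputs)


def row_sums(block):
    """Stand-in for a trained encoder: maps each cell to a 1-D value."""
    return block.sum(axis=1, keepdims=True)


X = np.arange(10, dtype=float).reshape(5, 2)  # 5 cells x 2 features
print(transform_in_chunks(X, row_sums, chunk_size=2).ravel())
```

Chunking changes only memory use, not the result, as long as the transform treats each cell independently.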
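The entropy of batch mixing rewards neighbourhoods in the latent space that contain cells from several batches. The sketch below shows one common, simplified formulation: sample cells, take their k nearest neighbours (brute-force Euclidean distance, excluding the cell itself), compute the Shannon entropy (natural log) of the neighbours' batch labels, and average. The function batch_mixing_entropy, its parameters and the toy data are hypothetical; SCALEX's own entropy_batch_mixing may use different neighbour counts, sampling and pooling.

```python
import numpy as np


def batch_mixing_entropy(latent, batches, n_neighbors=5, n_samples=50, seed=0):
    """Simplified entropy-of-batch-mixing score (illustrative sketch).

    Higher values mean better-mixed batches; the maximum is log(number of batches).
    """
    latent = np.asarray(latent, dtype=float)
    batches = np.asarray(batches)
    labels = np.unique(batches)
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(latent), size=min(n_samples, len(latent)), replace=False)
    entropies = []
    for i in idx:
        dists = np.linalg.norm(latent - latent[i], axis=1)
        dists[i] = np.inf  # exclude the query cell itself
        neighbors = np.argsort(dists)[:n_neighbors]
        # Fraction of neighbours from each batch; drop empty batches before log.
        p = np.array([(batches[neighbors] == b).mean() for b in labels])
        p = p[p > 0]
        entropies.append(float(np.sum(p * np.log(1.0 / p))))
    return float(np.mean(entropies))


rng = np.random.default_rng(1)
batches = np.repeat(["a", "b"], 100)
mixed = rng.normal(size=(200, 2))  # both batches drawn from the same distribution
separated = mixed.copy()
separated[100:] += 50.0            # shift batch "b" far away: no mixing

print(batch_mixing_entropy(mixed, batches))      # well above 0, below log(2)
print(batch_mixing_entropy(separated, batches))  # 0.0
```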

Returns

The output folder contains:
adata.h5ad – The AnnData matrix after batch-effect removal. The low-dimensional representation of the data is stored in adata.obsm[‘latent’].
checkpoint – model.pt contains the variables of the model and config.pt contains the parameters of the model.
log.txt – Records raw data information, filter conditions, model parameters etc.
umap.pdf – UMAP plot for visualization.