scalex.SCALEX

scalex.SCALEX(data_list=None, batch_categories=None, profile='RNA', join='inner', batch_key='batch', batch_name='batch', min_features=600, min_cells=3, target_sum=None, n_top_features=None, processed=False, batch_size=64, lr=0.0002, max_iteration=30000, seed=124, gpu=0, outdir='output/', projection=None, repeat=False, impute=None, chunk_size=20000, ignore_umap=False, verbose=False, assess=False, show=True, eval=False, test_list=None, test_batch_categories=None)

Single-Cell integrative Analysis via Latent feature Extraction

Parameters
  • data_list – A list of paths to the AnnData matrices to concatenate. Each matrix is referred to as a ‘batch’.

  • batch_categories – Categories for the batch annotation. By default, use increasing numbers.

  • profile – Specify the single-cell profile, RNA or ATAC. Default: RNA.

  • join – Use intersection (‘inner’) or union (‘outer’) of variables of different batches.

  • batch_key – Add the batch annotation to obs using this key. Default: ‘batch’.

  • batch_name – Use this annotation in obs as the batch label when training the model. Default: ‘batch’.

  • min_features – Filter out cells in which fewer than min_features features are detected. Default: 600.

  • min_cells – Filter out genes that are detected in fewer than min_cells cells. Default: 3.

  • n_top_features – Number of highly variable genes to keep. Documented default: 2000 (the signature above shows None).

  • batch_size – Number of samples to load per mini-batch. Default: 64.

  • lr – Learning rate. Default: 2e-4.

  • max_iteration – Maximum number of training iterations. One iteration trains on one mini-batch of batch_size samples. Default: 30000.

  • seed – Random seed for torch and numpy. Default: 124.

  • gpu – Index of GPU to use if GPU is available. Default: 0.

  • outdir – Output directory. Default: ‘output/’.

  • projection – Use for new dataset projection. Input the folder containing the pre-trained model. If None, don’t do projection. Default: None.

  • repeat – Use with projection. If False, concatenate the reference and projection datasets for downstream analysis. If True, only use projection datasets. Default: False.

  • impute – If True, calculate the imputed gene expression and store it in adata.layers[‘impute’]. Documented default: False (the signature above shows None).

  • chunk_size – Number of samples from the same batch to transform. Default: 20000.

  • ignore_umap – If True, skip both the UMAP embedding used for visualization and the Leiden clustering. Default: False.

  • verbose – Verbosity, True or False. Default: False.

  • assess – If True, calculate the entropy_batch_mixing score and silhouette score to evaluate integration results. Default: False.
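As a minimal sketch of how the parameters above fit together, the following Python snippet assembles keyword arguments for an integration run and for a projection run. The helper names (build_scalex_kwargs, run_scalex), the input file names and the batch labels are hypothetical and used only for illustration; only the keyword names and values come from this page. Passing an earlier run's output folder as projection is an assumption based on the description "the folder containing the pre-trained model". The call into scalex is imported lazily and left commented out because it needs real input files.

```python
def build_scalex_kwargs(paths, batch_names, outdir="output/", projection=None):
    """Collect documented scalex.SCALEX keyword arguments in one dict.

    Hypothetical helper: it only builds the arguments, it does not run anything.
    """
    if len(paths) != len(batch_names):
        raise ValueError("expected one batch category per input path")
    kwargs = dict(
        data_list=list(paths),               # one AnnData file per batch
        batch_categories=list(batch_names),  # labels for the batch annotation
        profile="RNA",
        join="inner",                        # keep only variables shared by all batches
        min_features=600,
        min_cells=3,
        n_top_features=2000,
        batch_size=64,
        lr=2e-4,
        max_iteration=30000,
        seed=124,
        gpu=0,
        outdir=outdir,
    )
    if projection is not None:
        # Assumption: `projection` points at a folder holding a pre-trained model.
        kwargs["projection"] = projection
        kwargs["repeat"] = False  # keep reference and projected data together
    return kwargs


def run_scalex(kwargs):
    """Call scalex.SCALEX with the assembled arguments (requires scalex installed)."""
    import scalex  # imported lazily so the helper above works without scalex
    return scalex.SCALEX(**kwargs)


# Hypothetical file names and batch labels.
integration = build_scalex_kwargs(["batch1.h5ad", "batch2.h5ad"], ["donor1", "donor2"])
projection = build_scalex_kwargs(
    ["query.h5ad"], ["query"], outdir="projection/", projection="output/"
)
# run_scalex(integration)  # uncomment once the input files exist
```

Keeping the arguments in a dict makes it easy to log or compare the settings of two runs before starting a long training job.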

Returns

  The output folder contains:

  • adata.h5ad – The AnnData matrix after batch-effect removal. The low-dimensional representation of the data is stored in adata.obsm[‘latent’].

  • checkpoint – model.pt contains the variables of the model and config.pt contains the parameters of the model.

  • log.txt – Records raw data information, filter conditions, model parameters, etc.

  • umap.pdf – UMAP plot for visualization.
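The output files listed above could be located and inspected along the following lines. This is a sketch under stated assumptions: the helper names (expected_outputs, load_latent) are hypothetical, and placing model.pt and config.pt inside a checkpoint subfolder is an assumption drawn from the wording of the checkpoint entry. Reading the result uses anndata.read_h5ad, imported lazily so the path helper works without anndata installed.

```python
from pathlib import Path


def expected_outputs(outdir="output/"):
    """Map each documented output to its expected path (hypothetical helper)."""
    out = Path(outdir)
    return {
        "adata": out / "adata.h5ad",
        # Assumption: both checkpoint files live in a 'checkpoint' subfolder.
        "model": out / "checkpoint" / "model.pt",
        "config": out / "checkpoint" / "config.pt",
        "log": out / "log.txt",
        "umap": out / "umap.pdf",
    }


def load_latent(outdir="output/"):
    """Return the low-dimensional representation stored in adata.obsm['latent']."""
    import anndata as ad  # imported lazily; requires anndata to be installed

    adata = ad.read_h5ad(expected_outputs(outdir)["adata"])
    return adata.obsm["latent"]


# List which of the documented outputs are not present yet.
missing = [name for name, path in expected_outputs().items() if not path.exists()]
```

Checking for missing files before calling load_latent gives a clearer error than a failed read when a run has not finished.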