Usage

SCALEX provides both a command-line tool and an API function for use in Jupyter notebooks.

Command line

Run SCALEX after installation:

SCALEX.py --data_list data1 data2 --batch_categories batch_name1 batch_name2

data_list: path to the data file of each batch of the single-cell dataset

batch_categories: name of each batch; if not specified, batches are numbered from 0 to N
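When SCALEX is launched from a script rather than typed by hand, the command above can be assembled programmatically. The following is a minimal sketch, not part of SCALEX: the helper name build_scalex_command is hypothetical, it uses only the two options described above, and the check that there is one batch name per input file is an assumption based on the option descriptions.

```python
import shlex


def build_scalex_command(data_list, batch_categories=None):
    """Return the SCALEX.py argument list for the given inputs (hypothetical helper)."""
    if not data_list:
        raise ValueError("data_list must contain at least one path")
    cmd = ["SCALEX.py", "--data_list", *data_list]
    if batch_categories is not None:
        # Assumption: one batch name is expected per input file.
        if len(batch_categories) != len(data_list):
            raise ValueError("need one batch name per input file")
        cmd += ["--batch_categories", *batch_categories]
    return cmd


cmd = build_scalex_command(["data1", "data2"], ["batch_name1", "batch_name2"])
print(shlex.join(cmd))
# SCALEX.py --data_list data1 data2 --batch_categories batch_name1 batch_name2
```

To actually run the command, the list can be passed to subprocess.run(cmd, check=True) once SCALEX is installed.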

Input

Input can be one of the following:

  • a single file in h5ad, csv, txt, or mtx format, or a compressed version of one of these

  • multiple files in any of the above formats

Note

h5ad file input

  • SCALEX uses the batch column in the obs of the AnnData object read from the h5ad file as batch information

  • Users can specify any other column in the obs with the option: --batch_name name

  • If multiple inputs are given, SCALEX by default treats each file as a separate batch and overwrites any previous batch information. Users can change the name of the concatenation key with the option --batch_key other_name

Output

Output is saved in the output folder and includes:

  • checkpoint: the saved model, which can be used with the option --checkpoint or -c to reproduce results

  • adata.h5ad: preprocessed data and results, including latent representations, clustering, and imputation

  • umap.png: UMAP visualization of latent representations of cells

  • log.txt: log file of training process

Useful options

  • output folder for saving results: [-o] or [--outdir]

  • filter rare genes, default 3: [--min_cell]

  • filter low-quality cells, default 600: [--min_gene]

  • select the number of highly variable genes, keep all genes with -1, default 2000: [--n_top_genes]
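What the two filtering thresholds mean can be sketched with NumPy. This is an illustration under stated assumptions, not SCALEX's preprocessing code: it assumes min_gene is the minimum number of expressed genes a cell needs to be kept, min_cell is the minimum number of cells a gene must be expressed in, and cells are filtered before genes. The function name filter_counts is hypothetical, and highly variable gene selection is not shown.

```python
import numpy as np


def filter_counts(counts, min_gene=600, min_cell=3):
    """Filter a cells x genes count matrix (hypothetical helper).

    Returns the filtered matrix plus boolean masks of kept cells and genes.
    """
    # Keep cells that express at least `min_gene` genes.
    keep_cells = (counts > 0).sum(axis=1) >= min_gene
    counts = counts[keep_cells]
    # Among the remaining cells, keep genes expressed in at least `min_cell` cells.
    keep_genes = (counts > 0).sum(axis=0) >= min_cell
    return counts[:, keep_genes], keep_cells, keep_genes


# Toy matrix: 4 cells x 4 genes, with small thresholds so the effect is visible.
counts = np.array([
    [1, 0, 2, 0],
    [3, 1, 0, 0],
    [0, 2, 1, 0],
    [0, 0, 0, 5],
])
filtered, keep_cells, keep_genes = filter_counts(counts, min_gene=2, min_cell=2)
print(filtered.shape)  # (3, 3)
```

In the toy matrix the last cell expresses a single gene and is dropped, after which the last gene is no longer expressed in any remaining cell and is dropped as well.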

Help

For more usage information, run:

SCALEX.py --help

API function

Use SCALEX in a Jupyter notebook:

from scalex.function import SCALEX
adata = SCALEX(data_list, batch_categories)
or

adata = SCALEX([adata_1, adata_2])

The function's parameters are similar to the command-line options. Input can be a list of file paths, a list of AnnData objects, or a single concatenated AnnData object. Output is an AnnData object for further analysis with scanpy.

AnnData

SCALEX works with scanpy and anndata; the latter provides the AnnData class.

http://falexwolf.de/img/scanpy/anndata.svg

At the most basic level, an AnnData object adata stores a data matrix adata.X, annotation of observations adata.obs and variables adata.var as pd.DataFrame and unstructured annotation adata.uns as dict. Names of observations and variables can be accessed via adata.obs_names and adata.var_names, respectively. AnnData objects can be sliced like dataframes, for example, adata_subset = adata[:, list_of_gene_names]. For more, see this blog post.

To read a data file to an AnnData object, call:

import scanpy as sc
adata = sc.read(filename)

This initializes an AnnData object. To add further annotation, use, e.g., pd.read_csv:

import pandas as pd
anno = pd.read_csv(filename_sample_annotation)
adata.obs['cell_groups'] = anno['cell_groups']  # categorical annotation of type pandas.Categorical
adata.obs['time'] = anno['time']                # numerical annotation of type float
# alternatively, you could also set the whole dataframe
# adata.obs = anno

To write, use:

adata.write(filename)
adata.write_csvs(filename)
adata.write_loom(filename)