Usage
SCALEX provides both a command-line tool and an API function for use in Jupyter notebooks.
Command line
Run SCALEX after installation:
SCALEX.py --data_list data1 data2 --batch_categories batch_name1 batch_name2
data_list
: path to the data of each batch of the single-cell dataset
batch_categories
: name of each batch; if not specified, batches are named by number starting from 0, in input order
Input
Input can be one of the following:
a single file in h5ad, csv, txt, or mtx format, or a compressed version of such a file
multiple files in the above formats
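To make the accepted inputs concrete, here is a small sketch that classifies a path by its file extension. The helper `detect_input_format` is hypothetical and not part of SCALEX, and the set of compression suffixes it recognizes (gz, bz2, zip, xz) is an assumption; SCALEX's own reader may accept a different set.

```python
from pathlib import Path

# Formats listed above; the compression suffixes are an assumption
SUPPORTED_FORMATS = {"h5ad", "csv", "txt", "mtx"}
COMPRESSION_SUFFIXES = {"gz", "bz2", "zip", "xz"}


def detect_input_format(path):
    """Hypothetical helper: return (format, is_compressed) for one input path.

    Not part of SCALEX; it only illustrates which inputs the list above covers.
    """
    suffixes = [s.lstrip(".").lower() for s in Path(path).suffixes]
    compressed = bool(suffixes) and suffixes[-1] in COMPRESSION_SUFFIXES
    if compressed:
        suffixes = suffixes[:-1]  # drop the compression suffix, e.g. ".gz"
    if not suffixes or suffixes[-1] not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported input: {path}")
    return suffixes[-1], compressed


print(detect_input_format("data/matrix.mtx.gz"))
```

Only the last one or two suffixes are inspected, so a name such as `sample.v2.csv` is still recognized as csv.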
Note
h5ad file input
SCALEX will use the batch column in the obs of the AnnData object read from the h5ad file as batch information. Users can specify any column in obs with the option:
--batch_name name
If multiple inputs are given, SCALEX by default takes each file as an individual batch, overwriting any previous batch information. Users can change the name of the obs column that stores these per-file labels via the option:
--batch_key other_name
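The per-file batch behavior described above can be illustrated with plain pandas. This is a minimal sketch, not SCALEX's implementation: the helper `label_batches` is hypothetical, it works on obs tables only, and it assumes default batch names of "0", "1", ... for N inputs.

```python
import pandas as pd


def label_batches(obs_list, batch_categories=None, batch_key="batch"):
    """Hypothetical sketch: tag each input's obs table with its batch name
    and concatenate, overwriting any existing batch_key column."""
    if batch_categories is None:
        # Assumed default naming: "0", "1", ..., "N-1" for N inputs
        batch_categories = [str(i) for i in range(len(obs_list))]
    if len(batch_categories) != len(obs_list):
        raise ValueError("need one batch name per input")
    labeled = []
    for obs, name in zip(obs_list, batch_categories):
        obs = obs.copy()       # leave the caller's tables untouched
        obs[batch_key] = name  # replaces any previous batch information
        labeled.append(obs)
    combined = pd.concat(labeled)
    combined[batch_key] = pd.Categorical(combined[batch_key],
                                         categories=batch_categories)
    return combined


obs_a = pd.DataFrame({"batch": ["old", "old"]}, index=["a1", "a2"])
obs_b = pd.DataFrame(index=["b1"])
combined = label_batches([obs_a, obs_b], ["batch_name1", "batch_name2"])
print(combined)
```

Passing `batch_key="other_name"` writes the labels to a new column instead, which mirrors what the `--batch_key` option is described as doing.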
Output
Output will be saved in the output folder, including:
checkpoint: saved model, which can be used to reproduce results together with the option --checkpoint or -c
adata.h5ad: preprocessed data and results, including the latent representation, clustering, and imputation
umap.png: UMAP visualization of latent representations of cells
log.txt: log file of training process
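After a run, a quick sanity check is to confirm that the outputs listed above were written. The helper `missing_outputs` below is hypothetical; the file names are copied from the list above, and it assumes `checkpoint` may be either a file or a folder, so it only tests for existence.

```python
from pathlib import Path

# Output names as listed above; "checkpoint" may be a file or a folder
EXPECTED_OUTPUTS = ["checkpoint", "adata.h5ad", "umap.png", "log.txt"]


def missing_outputs(outdir):
    """Hypothetical helper: return the expected outputs absent from outdir."""
    outdir = Path(outdir)
    return [name for name in EXPECTED_OUTPUTS if not (outdir / name).exists()]
```

For example, `missing_outputs("output")` returns an empty list when every expected output is present in the folder `output`.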
Useful options
output folder for saving results: [-o] or [--outdir]
filter out rare genes, i.e. genes detected in fewer than this many cells, default 3: [--min_cell]
filter out low-quality cells, i.e. cells with fewer than this many detected genes, default 600: [--min_gene]
number of highly variable genes to select (use -1 to keep all genes), default 2000: [--n_top_genes]
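When SCALEX is launched from a script, it can help to assemble the argument list programmatically. The sketch below uses a hypothetical helper, `build_scalex_command`; the flag names are copied from this page and may differ in your installed version, so check `SCALEX.py --help` first.

```python
import shlex


def build_scalex_command(data_list, batch_categories=None, outdir=None,
                         min_cell=None, min_gene=None, n_top_genes=None):
    """Hypothetical helper: assemble a SCALEX.py argument list.

    Flag names are taken from this page; verify them with `SCALEX.py --help`.
    """
    cmd = ["SCALEX.py", "--data_list", *data_list]
    if batch_categories is not None:
        if len(batch_categories) != len(data_list):
            raise ValueError("give one batch name per input in data_list")
        cmd += ["--batch_categories", *batch_categories]
    # Only pass options the caller set; unset options keep SCALEX's defaults
    optional = {"--outdir": outdir, "--min_cell": min_cell,
                "--min_gene": min_gene, "--n_top_genes": n_top_genes}
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


cmd = build_scalex_command(["data1.h5ad", "data2.h5ad"],
                           ["batch_name1", "batch_name2"],
                           outdir="output", n_top_genes=-1)
print(shlex.join(cmd))
```

The resulting list can be run with `subprocess.run(cmd, check=True)`, which raises an error if SCALEX exits with a failure status.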
Help
For more usage options of SCALEX, run:
SCALEX.py --help
API function
Use SCALEX in a Jupyter notebook:
from scalex.function import SCALEX
adata = SCALEX(data_list, batch_categories)
# or
adata = SCALEX([adata_1, adata_2])
The parameters work like the corresponding command-line options. Input can be a list of h5ad files, a list of AnnData objects, or a single concatenated AnnData object. The output is an AnnData object for further analysis with scanpy.
AnnData
SCALEX supports scanpy and anndata, which provides the AnnData class.
At the most basic level, an AnnData object adata stores a data matrix adata.X, annotation of observations adata.obs and variables adata.var as pd.DataFrame, and unstructured annotation adata.uns as dict. Names of observations and variables can be accessed via adata.obs_names and adata.var_names, respectively. AnnData objects can be sliced like dataframes, for example, adata_subset = adata[:, list_of_gene_names].
For more, see this blog post.
To read a data file into an AnnData object, call:
import scanpy as sc
adata = sc.read(filename)
to initialize an AnnData object. Optionally, add further annotation using, e.g., pd.read_csv:
import pandas as pd
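anno = pd.read_csv(filename_sample_annotation, index_col=0)  # assumes the first column holds cell names matching adata.obs_names
adata.obs['cell_groups'] = anno['cell_groups'].astype('category')  # categorical annotation of type pandas.Categorical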
adata.obs['time'] = anno['time'] # numerical annotation of type float
# alternatively, you could also set the whole dataframe
# adata.obs = anno
To write, use:
adata.write(filename)
adata.write_csvs(filename)
adata.write_loom(filename)