scalex.data.load_data

scalex.data.load_data(data_list, batch_categories=None, profile='RNA', join='inner', batch_key='batch', batch_name='batch', min_features=600, min_cells=3, target_sum=None, n_top_features=None, backed=False, batch_size=64, chunk_size=20000, fraction=None, n_obs=None, processed=False, log=None, num_workers=4, use_layer='X')

Load a dataset and preprocess it.

Parameters:

data_list – A list of paths to AnnData matrices to concatenate. Each matrix is referred to as a ‘batch’.
batch_categories – Categories for the batch annotation. By default, use increasing numbers.
join – Use intersection (‘inner’) or union (‘outer’) of variables of different batches. Default: ‘inner’.
batch_key – Add the batch annotation to obs using this key. Default: ‘batch’.
batch_name – Use this annotation in obs as the batches for training the model. Default: ‘batch’.
min_features – Filter out cells with fewer than min_features detected features. Default: 600.
min_cells – Filter out genes detected in fewer than min_cells cells. Default: 3.
n_top_features – Number of highly variable genes to keep. Default: None.
batch_size – Number of samples per mini-batch to load. Default: 64.
chunk_size – Number of samples from the same batch to transform. Default: 20000.
log – If provided, record each operation in the log file. Default: None.
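
The join, min_features and min_cells parameters can be illustrated with a small pure-Python sketch. This is not the SCALEX implementation: the helper names join_variables and filter_counts are hypothetical, counts are plain nested lists rather than AnnData matrices, and filtering cells before genes is an assumption made for the illustration.

```python
def join_variables(batches, join="inner"):
    """Combine variable (gene) names across batches.

    'inner' keeps names present in every batch, 'outer' keeps names
    present in at least one batch. Hypothetical helper for illustration.
    """
    sets = [set(names) for names in batches]
    if join == "inner":
        combined = set.intersection(*sets)
    elif join == "outer":
        combined = set.union(*sets)
    else:
        raise ValueError("join must be 'inner' or 'outer'")
    return sorted(combined)


def filter_counts(counts, min_features=600, min_cells=3):
    """Filter a cells-by-genes count table given as a list of rows.

    Assumption: cells are filtered first, then genes are filtered
    against the remaining cells.
    """
    # Keep cells in which at least min_features genes are detected (count > 0).
    kept_cells = [
        row for row in counts
        if sum(1 for c in row if c > 0) >= min_features
    ]
    n_genes = len(counts[0]) if counts else 0
    # Keep genes detected in at least min_cells of the remaining cells.
    kept_genes = [
        j for j in range(n_genes)
        if sum(1 for row in kept_cells if row[j] > 0) >= min_cells
    ]
    filtered = [[row[j] for j in kept_genes] for row in kept_cells]
    return filtered, kept_genes


# Two batches that share only genes B and C.
batches = [["A", "B", "C"], ["B", "C", "D"]]
print(join_variables(batches, "inner"))  # ['B', 'C']
print(join_variables(batches, "outer"))  # ['A', 'B', 'C', 'D']

# Four cells by four genes, with small thresholds for readability.
counts = [[1, 0, 2, 0], [0, 0, 3, 1], [4, 1, 5, 0], [0, 0, 0, 0]]
print(filter_counts(counts, min_features=2, min_cells=2))
# ([[1, 2], [0, 3], [4, 5]], [0, 2])
```

With the default thresholds, a cell needs at least 600 detected features and a gene must be detected in at least 3 cells to be kept.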

Returns:

adata – The AnnData object after combination and preprocessing.
trainloader – An iterable over the given dataset for training.
testloader – An iterable over the given dataset for testing.
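
A call might look like the sketch below. The wrapper load_batches and the file paths are hypothetical, and unpacking the result as (adata, trainloader, testloader) assumes the function returns the three values in the order listed above. The keyword values simply repeat the defaults from the signature.

```python
def load_batches(paths):
    """Hypothetical wrapper around scalex.data.load_data."""
    # Imported lazily so the sketch can be read without SCALEX installed.
    from scalex.data import load_data

    # Assumption: the return value unpacks in the documented order.
    adata, trainloader, testloader = load_data(
        paths,
        join="inner",        # keep only variables shared by all batches
        batch_key="batch",   # obs column that receives the batch annotation
        min_features=600,
        min_cells=3,
        batch_size=64,
    )
    return adata, trainloader, testloader


# Example call with placeholder paths (these files are not part of SCALEX):
# adata, trainloader, testloader = load_batches(["batch1.h5ad", "batch2.h5ad"])
```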