scalex.data.load_data

scalex.data.load_data(data_list, batch_categories=None, profile='RNA', join='inner', batch_key='batch', batch_name='batch', min_features=600, min_cells=3, target_sum=None, n_top_features=None, backed=False, batch_size=64, chunk_size=20000, fraction=None, n_obs=None, processed=False, log=None, num_workers=4, use_layer='X')

Load a dataset with preprocessing.

Parameters:
  • data_list – A list of paths to the AnnData matrices to concatenate. Each matrix is referred to as a ‘batch’.

  • batch_categories – Categories for the batch annotation. By default, use increasing numbers.

  • join – Use intersection (‘inner’) or union (‘outer’) of variables of different batches. Default: ‘inner’.

  • batch_key – Add the batch annotation to obs using this key. Default: ‘batch’.

  • batch_name – Use this annotation in obs as the batch label when training the model. Default: ‘batch’.

  • min_features – Filter out cells with fewer than min_features detected features. Default: 600.

  • min_cells – Filter out genes detected in fewer than min_cells cells. Default: 3.

  • n_top_features – Number of highly variable genes to keep. Default: None.

  • batch_size – Number of samples per mini-batch to load. Default: 64.

  • chunk_size – Number of samples from the same batch to transform. Default: 20000.

  • log – If given, each operation is recorded in the log file. Default: None.
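
The semantics of join, min_features and min_cells described above can be illustrated with a small pure-Python sketch. This is not SCALEX code: the toy batches and the helper names (variables, join_variables, filter_cells, filter_genes) are hypothetical and only mirror the documented behaviour, assuming a cell or gene is kept when its count reaches the threshold.

```python
# Toy batches: cell -> {gene: count}. Illustrative data only, not SCALEX internals.
batch_a = {
    "c1": {"g1": 3, "g2": 0, "g3": 1},
    "c2": {"g1": 0, "g2": 0, "g3": 5},
}
batch_b = {
    "c3": {"g2": 2, "g3": 4, "g4": 1},
}


def variables(batch):
    """Return the set of genes (variables) present in a batch."""
    genes = set()
    for counts in batch.values():
        genes.update(counts)
    return genes


def join_variables(batches, join="inner"):
    """Intersect ('inner') or unite ('outer') the variables of all batches."""
    gene_sets = [variables(b) for b in batches]
    if join == "inner":
        return sorted(set.intersection(*gene_sets))
    if join == "outer":
        return sorted(set.union(*gene_sets))
    raise ValueError("join must be 'inner' or 'outer'")


def filter_cells(batch, min_features):
    """Keep cells in which at least min_features genes have a nonzero count."""
    return sorted(
        cell
        for cell, counts in batch.items()
        if sum(1 for c in counts.values() if c > 0) >= min_features
    )


def filter_genes(batch, min_cells):
    """Keep genes that have a nonzero count in at least min_cells cells."""
    return sorted(
        gene
        for gene in variables(batch)
        if sum(1 for counts in batch.values() if counts.get(gene, 0) > 0) >= min_cells
    )


shared_genes = join_variables([batch_a, batch_b], join="inner")
all_genes = join_variables([batch_a, batch_b], join="outer")
kept_cells = filter_cells(batch_a, min_features=2)
kept_genes = filter_genes(batch_a, min_cells=2)
```

With join='inner' only genes present in every batch survive, so batches with little overlap can lose many variables; join='outer' keeps them all.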

Returns:

  • adata – The AnnData object after combination and preprocessing.

  • trainloader – An iterable over the given dataset for training.

  • testloader – An iterable over the given dataset for testing.
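
A call might look like the sketch below. It assumes the three documented return values come back as a tuple in the order listed above, and that scalex is installed. The wrapper name load_batches and the keyword values are illustrative choices, not recommendations from the library.

```python
# Keyword arguments taken from the documented signature; values are illustrative.
LOAD_KWARGS = dict(
    join="inner",         # keep only variables shared by all batches
    batch_key="batch",    # obs column that receives the batch annotation
    min_features=600,     # documented default
    min_cells=3,          # documented default
    n_top_features=2000,  # illustrative value; the signature default is None
    batch_size=64,        # documented default
)


def load_batches(paths):
    """Hypothetical wrapper around scalex.data.load_data.

    Assumes load_data returns (adata, trainloader, testloader) in that order.
    """
    # Imported lazily so this module can be imported without scalex installed.
    from scalex.data import load_data

    adata, trainloader, testloader = load_data(paths, **LOAD_KWARGS)
    return adata, trainloader, testloader
```

For example, load_batches(["batch1.h5ad", "batch2.h5ad"]) would load two batches, where both file names are placeholders for your own AnnData files.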