topact.countdata

Classes storing gene expression data.

Module Contents

Classes

CountData

A collection of gene expression readings.

CountMatrix

A collection of gene expression readings recorded in a sparse matrix.

CountTable

A collection of gene expression readings recorded in a table.

Functions

matching(values, pattern)

Given values and a pattern, returns all values matching the pattern.

_apply_or_not(func, values, default)

topact.countdata.matching(values, pattern)

Given values and a pattern, returns all values matching the pattern.

Parameters
  • values (Sequence[str]) – A sequence of strings.

  • pattern (Pattern | Collection[str]) – A regular expression or a collection of strings.

Returns

All elements of values that match pattern (if it is a regular expression) or are elements of pattern (if it is a collection).

Raises

TypeError – If pattern is neither a regular expression nor a collection.

Return type

Iterable[str]
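The matching rule above can be illustrated with a minimal sketch. It is not the library's implementation: matching_sketch is a hypothetical name, and treating a regular expression as matching when re.Pattern.match succeeds (anchored at the start of the string) is an assumption; the real function might use search or fullmatch instead.

```python
from __future__ import annotations

import re
from collections.abc import Collection, Iterable, Sequence


def matching_sketch(values: Sequence[str], pattern: re.Pattern | Collection[str]) -> Iterable[str]:
    """Hypothetical stand-in mirroring the documented behaviour of matching()."""
    if isinstance(pattern, re.Pattern):
        # Assumption: "match" means re.Pattern.match succeeds (anchored at the start).
        return [v for v in values if pattern.match(v)]
    if isinstance(pattern, Collection):
        # Collection case: keep values that are elements of pattern.
        return [v for v in values if v in pattern]
    # Neither a regular expression nor a collection, as in the documented TypeError.
    raise TypeError("pattern must be a regular expression or a collection")
```

For example, re.compile(r"mt-") keeps identifiers such as "mt-Co1", while {"Gapdh"} keeps only "Gapdh".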

topact.countdata._apply_or_not(func, values, default)

Parameters
  • func (Callable[[str], int]) –

  • values (Iterable[str] | Iterable[int] | None) –

  • default (Iterable[int]) –

Return type

Iterator[int]
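_apply_or_not carries no description. Judging only from its type hints, and from the way samples and genes may be passed elsewhere in this module as identifiers, indices, or None, one plausible reading is sketched below. Both the name apply_or_not_sketch and the behaviour (None falls back to default, strings are converted with func, integers pass through) are assumptions, not the library's documented contract.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable, Iterator


def apply_or_not_sketch(
    func: Callable[[str], int],
    values: Iterable[str] | Iterable[int] | None,
    default: Iterable[int],
) -> Iterator[int]:
    # Assumption: None means "use the defaults" (e.g. every index).
    if values is None:
        return iter(default)
    # Assumption: identifiers (str) are converted to indices with func; ints pass through.
    return (func(v) if isinstance(v, str) else v for v in values)
```

Under this reading, passing {"s1": 0, "s2": 1}.__getitem__ as func turns ["s2", "s1"] into 1, 0.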

class topact.countdata.CountData(genes=None, samples=None, num_genes=None, num_samples=None)

Bases: abc.ABC

A collection of gene expression readings.

Parameters
  • genes (MutableSequence[str] | None) –

  • samples (MutableSequence[str] | None) –

  • num_genes (int | None) –

  • num_samples (int | None) –

genes

An ordered list of gene identifiers.

samples

An ordered list of sample identifiers.

num_genes

The number of genes in the domain, i.e. len(genes).

num_samples

The number of samples in the domain, i.e. len(samples).

metadata

A dataframe where each row corresponds to a sample.

add_metadata(header, values)

Add values to metadata under a header.

If values is a mapping, each sample's entry is looked up by its sample identifier. Otherwise, values is assumed to be in the same order as samples, so the entry for sample i is values[i].

Parameters
  • header (str) – A string denoting the column name for the new data.

  • values (Mapping[str, Any] | Sequence[Any]) – The new metadata. Either a mapping from sample identifiers to values, or a sequence of the same length as samples.
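The two input forms accepted by add_metadata can be sketched with the standard library. This is a hypothetical stand-in: metadata is modelled as a dict of columns aligned with samples rather than the documented dataframe, and filling samples missing from a mapping with None, as well as raising ValueError on a length mismatch, are choices made for this sketch only.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence
from typing import Any


def add_metadata_sketch(
    samples: Sequence[str],
    metadata: dict[str, list[Any]],
    header: str,
    values: Mapping[str, Any] | Sequence[Any],
) -> None:
    # Stand-in: metadata[header] is a column aligned with samples.
    if isinstance(values, Mapping):
        # Mapping case: look each sample's entry up by its identifier.
        # Assumption: samples absent from the mapping get None.
        metadata[header] = [values.get(s) for s in samples]
    else:
        # Sequence case: entry i belongs to sample i, so lengths must agree.
        if len(values) != len(samples):
            raise ValueError("values must have the same length as samples")
        metadata[header] = list(values)
```

A mapping such as {"s2": "T", "s1": "B"} is reordered to follow samples, while a plain list is taken position by position.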

abstract filter_genes(pattern)

Filters genes according to a pattern.

Edits the count data so that only genes matching the pattern are included.

Parameters

pattern (Pattern | Collection[str]) – Either a regular expression or a collection of gene identifiers; only genes matching it are kept.

match_by_metadata(header, pattern)

Returns all samples whose metadata value under the given header matches the pattern.

Parameters
  • header (str) – A header of the object’s metadata.

  • pattern (str) – A string to be matched against.

Returns

An iterable of all samples whose metadata value under the given header matches the pattern.

Return type

Iterator[str]
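A minimal sketch of this lookup follows. It assumes that "matches" means string equality (the documentation does not say whether pattern is compared literally or as a regular expression), and it models metadata as a hypothetical dict mapping each header to a {sample: value} dict instead of a dataframe.

```python
from __future__ import annotations

from collections.abc import Iterator, Mapping


def match_by_metadata_sketch(
    metadata: Mapping[str, Mapping[str, str]], header: str, pattern: str
) -> Iterator[str]:
    # metadata[header] maps each sample identifier to its value under header.
    column = metadata[header]
    # Assumption: "matches" is read as plain string equality.
    return (sample for sample, value in column.items() if value == pattern)
```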

group_by_metadata(header)

Returns all samples, grouped by their value under the given metadata header.

Parameters

header (str) – A header of the object’s metadata.

Returns

A dictionary whose keys are all values under the given header, each mapping to an iterator over the samples that have that value.

Return type

Dict[str, Iterator[str]]
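The grouping can be sketched as below, again modelling metadata as a hypothetical dict of {sample: value} columns rather than a dataframe. The sketch returns iterators to match the documented return type.

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Iterator, Mapping


def group_by_metadata_sketch(
    metadata: Mapping[str, Mapping[str, str]], header: str
) -> dict[str, Iterator[str]]:
    groups: defaultdict[str, list[str]] = defaultdict(list)
    # Collect samples under each distinct value of the header.
    for sample, value in metadata[header].items():
        groups[value].append(sample)
    # Match the documented return type: each key maps to an iterator over samples.
    return {value: iter(samples) for value, samples in groups.items()}
```

Because the values are iterators, each group can be consumed only once; wrap it in list() if it is needed more than once.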

class topact.countdata.CountMatrix(matrix, **kwargs)

Bases: CountData

A collection of gene expression readings recorded in a sparse matrix.

matrix

A sparse matrix whose (i, j) entry is the expression of genes[j] in samples[i].
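To make the orientation concrete (rows are samples, columns are genes), the sketch below builds a dictionary-of-keys stand-in from (sample, gene, count) readings. The concrete sparse matrix type used by the library is not stated here, so the dict representation and the helper name build_dok are hypothetical; only nonzero counts are stored, as in a sparse matrix.

```python
from __future__ import annotations


def build_dok(
    samples: list[str], genes: list[str], readings: list[tuple[str, str, int]]
) -> dict[tuple[int, int], int]:
    """Sparse (dictionary-of-keys) stand-in: only nonzero entries are stored."""
    sample_index = {s: i for i, s in enumerate(samples)}
    gene_index = {g: j for j, g in enumerate(genes)}
    matrix: dict[tuple[int, int], int] = {}
    for sample, gene, count in readings:
        if count:
            # Row i is the sample, column j is the gene.
            matrix[(sample_index[sample], gene_index[gene])] = count
    return matrix
```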

expression(samples=None, genes=None)

The expression sub-matrix for the given samples and genes.

Parameters
  • samples (Sequence[str] | Sequence[int] | None) – A sequence of either sample identifiers or sample indices.

  • genes (Sequence[str] | Sequence[int] | None) – A sequence of either gene identifiers or gene indices.

Returns

A 2D sparse array containing the expression of the given genes in the given samples.
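Sub-matrix extraction can be sketched on the same dictionary-of-keys stand-in. The helper expression_sketch and its sel_samples/sel_genes parameters are hypothetical, and treating None as "all samples" or "all genes" is an assumption; identifiers are resolved to indices and the result is renumbered to positions within the sub-matrix.

```python
from __future__ import annotations

from collections.abc import Sequence


def expression_sketch(
    matrix: dict[tuple[int, int], float],
    samples: list[str],
    genes: list[str],
    sel_samples: Sequence[str] | Sequence[int] | None = None,
    sel_genes: Sequence[str] | Sequence[int] | None = None,
) -> dict[tuple[int, int], float]:
    def to_indices(selection, idents):
        # Assumption: None selects everything; identifiers become positions.
        if selection is None:
            return list(range(len(idents)))
        return [idents.index(x) if isinstance(x, str) else x for x in selection]

    rows = to_indices(sel_samples, samples)
    cols = to_indices(sel_genes, genes)
    # Keep only stored (nonzero) entries, renumbered to sub-matrix positions.
    return {
        (a, b): matrix[(i, j)]
        for a, i in enumerate(rows)
        for b, j in enumerate(cols)
        if (i, j) in matrix
    }
```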

avg_expression(samples=None, genes=None)

The average expression sub-matrix for the given samples and genes.

Parameters
  • samples (Sequence[str] | Sequence[int] | None) –

  • genes (Sequence[str] | Sequence[int] | None) –

expressed_genes(samples=None, output_type='ident')

Returns all genes expressed at least once in the given samples.

Parameters
  • samples (Sequence[str] | Sequence[int] | None) –

  • output_type (str) –

Return type

Iterator[int] | Iterator[str]
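The "expressed at least once" rule can be sketched on a dictionary-of-keys stand-in for the matrix. Only output_type='ident' appears in the signature; returning column indices for any other value (such as the hypothetical "index" used below) is an assumption drawn from the Iterator[int] | Iterator[str] return type, and expressed_genes_sketch is a hypothetical name.

```python
from __future__ import annotations

from collections.abc import Iterator, Sequence


def expressed_genes_sketch(
    matrix: dict[tuple[int, int], float],
    genes: Sequence[str],
    sample_rows: Sequence[int] | None = None,
    output_type: str = "ident",
) -> Iterator[int] | Iterator[str]:
    selected = None if sample_rows is None else set(sample_rows)
    # A gene counts as expressed if any selected sample has a nonzero reading for it.
    cols = sorted(
        {j for (i, j), value in matrix.items() if value and (selected is None or i in selected)}
    )
    if output_type == "ident":
        return (genes[j] for j in cols)
    # Assumption: any other output_type yields column indices.
    return iter(cols)
```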

rescale_genes(factors)

Rescales gene expression according to the given factors.

Column j of the expression matrix (gene j) is multiplied by factors[j].

Parameters

factors (Sequence[float]) – A sequence of factors by which columns are rescaled.
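Column-wise rescaling amounts to multiplying every stored entry (i, j) by factors[j]. The sketch below does this in place on a dictionary-of-keys stand-in; the helper name is hypothetical and the real method operates on the object's own sparse matrix.

```python
from __future__ import annotations

from collections.abc import Sequence


def rescale_genes_sketch(matrix: dict[tuple[int, int], float], factors: Sequence[float]) -> None:
    # Entry (i, j) is multiplied by factors[j], i.e. column j (gene j) is rescaled.
    # Updating values of existing keys while iterating does not change the dict's size.
    for i, j in matrix:
        matrix[(i, j)] *= factors[j]
```

Zero entries are not stored in a sparse structure, so they need no update.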

filter_genes(pattern)

Filters genes according to a pattern.

Edits the count data so that only genes matching the pattern are included.

Parameters

pattern (Pattern | Collection[str]) – Either a regular expression or a collection of gene identifiers; only genes matching it are kept.
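For a matrix, filtering genes means dropping columns and keeping the gene list in step with the remaining columns. The sketch below returns new objects instead of editing in place, uses a dictionary-of-keys stand-in for the sparse matrix, and assumes that regular expressions are applied with re.Pattern.match; filter_genes_sketch is a hypothetical name.

```python
from __future__ import annotations

import re
from collections.abc import Collection


def filter_genes_sketch(
    matrix: dict[tuple[int, int], float],
    genes: list[str],
    pattern: re.Pattern | Collection[str],
) -> tuple[dict[tuple[int, int], float], list[str]]:
    # Assumption: regexes use re.Pattern.match; collections use membership.
    if isinstance(pattern, re.Pattern):
        keep = [j for j, g in enumerate(genes) if pattern.match(g)]
    else:
        keep = [j for j, g in enumerate(genes) if g in pattern]
    # Old column index -> new column index, so surviving columns keep their order.
    new_col = {old: new for new, old in enumerate(keep)}
    new_matrix = {(i, new_col[j]): v for (i, j), v in matrix.items() if j in new_col}
    return new_matrix, [genes[j] for j in keep]
```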

to_count_table(**kwargs)

Converts the CountMatrix into a CountTable.

Parameters
  • gene_col – The name of the gene ID column.

  • sample_col – The name of the sample ID column.

  • count_col – The name of the count column.

Returns

A CountTable holding the same expression data.
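The conversion can be pictured as turning each stored matrix entry into one row of a long-format table. In this sketch the table is a hypothetical list of dicts, the default column names are borrowed from the CountTable signature, and omitting zero counts is an assumption.

```python
from __future__ import annotations

from typing import Any


def to_count_table_sketch(
    matrix: dict[tuple[int, int], float],
    samples: list[str],
    genes: list[str],
    gene_col: str = "gene",
    sample_col: str = "sample",
    count_col: str = "count",
) -> list[dict[str, Any]]:
    # One row per stored (nonzero) entry, ordered by (sample index, gene index).
    # Assumption: zero counts are not listed.
    return [
        {gene_col: genes[j], sample_col: samples[i], count_col: value}
        for (i, j), value in sorted(matrix.items())
        if value
    ]
```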

class topact.countdata.CountTable(table, genes=None, samples=None, gene_col='gene', sample_col='sample', count_col='count', **kwargs)

Bases: CountData

A collection of gene expression readings recorded in a table.

Parameters
  • table –

  • genes (MutableSequence[str] | None) –

  • samples (MutableSequence[str] | None) –

  • gene_col (str) – The name of the gene ID column.

  • sample_col (str) – The name of the sample ID column.

  • count_col (str) – The name of the count column.

genes

An ordered list of gene identifiers.

samples

An ordered list of sample identifiers.

num_genes

The number of genes in the domain, i.e. len(genes).

num_samples

The number of samples in the domain, i.e. len(samples).

metadata

A dataframe where each row corresponds to a sample.

filter_genes(pattern)

Filters genes according to a pattern.

Edits the count data so that only genes matching the pattern are included.

Parameters

pattern (Pattern | Collection[str]) – Either a regular expression or a collection of gene identifiers; only genes matching it are kept.
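For a table, filtering genes keeps only the rows whose gene column holds a surviving identifier. The sketch below models the table as a hypothetical list of dicts keyed by gene_col, returns new objects rather than editing in place, and assumes regular expressions are applied with re.Pattern.match.

```python
from __future__ import annotations

import re
from collections.abc import Collection
from typing import Any


def filter_table_genes_sketch(
    rows: list[dict[str, Any]],
    genes: list[str],
    pattern: re.Pattern | Collection[str],
    gene_col: str = "gene",
) -> tuple[list[dict[str, Any]], list[str]]:
    def keep(gene: str) -> bool:
        # Assumption: regexes use re.Pattern.match; collections use membership.
        if isinstance(pattern, re.Pattern):
            return pattern.match(gene) is not None
        return gene in pattern

    kept_genes = [g for g in genes if keep(g)]
    kept = set(kept_genes)
    # Drop every reading whose gene was filtered out.
    return [row for row in rows if row[gene_col] in kept], kept_genes
```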

abstract toCountMatrix()