topact.countdata
Classes storing gene expression data.
Module Contents
Classes
CountData – A collection of gene expression readings.
CountMatrix – A collection of gene expression readings recorded in a sparse matrix.
CountTable – A collection of gene expression readings.
Functions
matching(values, pattern) – Given values and a pattern, returns all values matching the pattern.
- topact.countdata.matching(values, pattern)
Given values and a pattern, returns all values matching the pattern.
- Parameters
values (Sequence[str]) – A sequence of strings.
pattern (Pattern | Collection[str]) – A regular expression or a collection of strings.
- Returns
All elements of values which match or are an element of pattern.
- Raises
TypeError – If pattern is neither a regular expression nor a collection.
- Return type
Iterable[str]
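The behaviour described above can be sketched in plain Python. This is a hypothetical re-implementation for illustration, not the library's source: the name matching_sketch is made up, and testing a regular expression with Pattern.match (anchored at the start of the string) is an assumption.

```python
import re
from collections.abc import Collection, Iterable, Sequence
from typing import Union


def matching_sketch(values: Sequence[str],
                    pattern: Union[re.Pattern, Collection[str]]) -> Iterable[str]:
    """Hypothetical stand-in for the documented behaviour of matching()."""
    if isinstance(pattern, re.Pattern):
        # Assumption: a regular expression is tested with Pattern.match,
        # i.e. it must match at the start of the value.
        return [v for v in values if pattern.match(v)]
    if isinstance(pattern, Collection):
        # A collection keeps exactly the values it contains.
        return [v for v in values if v in pattern]
    raise TypeError("pattern must be a regular expression or a collection")


genes = ["Actb", "Gapdh", "mt-Co1", "mt-Nd1"]
by_regex = list(matching_sketch(genes, re.compile(r"mt-")))
by_set = list(matching_sketch(genes, {"Actb", "Gapdh"}))
```

Here by_regex holds the two identifiers that start with "mt-", and by_set holds the two identifiers listed in the set, in their original order.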
- topact.countdata._apply_or_not(func, values, default)
- Parameters
func (Callable[[str], int]) –
values (Iterable[str] | Iterable[int] | None) –
default (Iterable[int]) –
- Return type
Iterator[int]
- class topact.countdata.CountData(genes=None, samples=None, num_genes=None, num_samples=None)
Bases: abc.ABC
A collection of gene expression readings.
- Parameters
genes (MutableSequence[str] | None) –
samples (MutableSequence[str] | None) –
num_genes (int | None) –
num_samples (int | None) –
- genes
An ordered list of gene identifiers.
- samples
An ordered list of sample identifiers.
- num_genes
The number of genes in the domain, i.e. len(genes).
- num_samples
The number of samples in the domain, i.e. len(samples).
- metadata
A dataframe where each row corresponds to a sample.
- add_metadata(header, values)
Add values to metadata under a header.
If values is a mapping then this is used to infer new entries. Otherwise, it is assumed that the metadata entry for sample i is simply given by value i.
- Parameters
header (str) – A string denoting the column name for the new data.
values (Mapping[str, Any] | Sequence[Any]) – The new metadata. Either a mapping from samples or a sequence of the same length as samples.
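The two accepted forms of values can be illustrated with a small sketch. It uses a plain dict of columns as a stand-in for the metadata dataframe; the function name add_metadata_sketch, the use of None for samples missing from a mapping, and the length check are all assumptions rather than documented behaviour.

```python
from collections.abc import Mapping, Sequence
from typing import Any, Union


def add_metadata_sketch(metadata: dict,
                        samples: Sequence[str],
                        header: str,
                        values: Union[Mapping[str, Any], Sequence[Any]]) -> None:
    """Hypothetical illustration of mapping vs. sequence input."""
    if isinstance(values, Mapping):
        # Look each sample up by identifier.
        # Assumption: samples absent from the mapping get None.
        column = [values.get(sample) for sample in samples]
    else:
        # Positional input: entry i belongs to sample i.
        if len(values) != len(samples):
            raise ValueError("values must have one entry per sample")
        column = list(values)
    metadata[header] = column


samples = ["s0", "s1", "s2"]
metadata = {}
add_metadata_sketch(metadata, samples, "tissue", {"s0": "liver", "s2": "brain"})
add_metadata_sketch(metadata, samples, "batch", [1, 1, 2])
```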
- abstract filter_genes(pattern)
Filters genes according to a pattern.
Edits the count data so that only genes matching the pattern are included.
- Parameters
pattern (Pattern | Collection[str]) – Either a regular expression or a collection identifying gene identifiers to be kept.
- match_by_metadata(header, pattern)
Returns all samples matching the given metadata value.
- Parameters
header (str) – A header of the object’s metadata.
pattern (str) – A string to be matched against.
- Returns
An iterable of all samples whose metadata value under the given header matches the pattern.
- Return type
Iterator[str]
- group_by_metadata(header)
Returns all samples organised by the given metadata value.
- Parameters
header (str) – A header of the object’s metadata.
- Returns
A dictionary whose keys are all values under the given header, each mapping to an iterable of all samples with that value.
- Return type
Dict[str, Iterator[str]]
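The grouping can be pictured with the following sketch, which works on a list of samples and a parallel list of metadata values instead of a dataframe. The name group_by_sketch is hypothetical, and the sketch returns lists where the documented return type uses iterators.

```python
from collections import defaultdict


def group_by_sketch(samples, column):
    """Group sample identifiers by their value in one metadata column."""
    groups = defaultdict(list)
    for sample, value in zip(samples, column):
        groups[value].append(sample)
    return dict(groups)


samples = ["s0", "s1", "s2", "s3"]
tissue = ["liver", "brain", "liver", "brain"]
groups = group_by_sketch(samples, tissue)
```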
- class topact.countdata.CountMatrix(matrix, **kwargs)
Bases: CountData
A collection of gene expression readings recorded in a sparse matrix.
- matrix
A sparse matrix whose [i, j]th entry is the expression of genes[j] in samples[i].
- expression(samples=None, genes=None)
The expression sub-matrix for the given samples and genes.
- Parameters
samples (Sequence[str] | Sequence[int] | None) – A sequence of either sample identifiers or sample indices.
genes (Sequence[str] | Sequence[int] | None) – A sequence of either gene identifiers or gene indices.
- Returns
A 2D sparse array containing the expression of the given genes in the given samples.
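Because samples and genes may each be given as identifiers or as indices, a selection first has to be resolved to row and column indices. The sketch below shows one way this could work, with nested lists standing in for the sparse matrix. The helper names resolve and expression_sketch are hypothetical, and treating None as "select everything" is an assumption based on the default arguments.

```python
def resolve(labels, selection):
    """Turn identifiers or indices into indices; None selects everything."""
    if selection is None:
        return list(range(len(labels)))
    lookup = {label: i for i, label in enumerate(labels)}
    # Strings are looked up by identifier, integers are used as given.
    return [lookup[s] if isinstance(s, str) else s for s in selection]


def expression_sketch(matrix, samples, genes, sample_sel=None, gene_sel=None):
    """Rows are samples and columns are genes, as for the matrix attribute."""
    rows = resolve(samples, sample_sel)
    cols = resolve(genes, gene_sel)
    return [[matrix[i][j] for j in cols] for i in rows]


samples = ["s0", "s1", "s2"]
genes = ["Actb", "Gapdh", "Sox2"]
matrix = [[5, 0, 1],
          [0, 2, 0],
          [3, 0, 4]]

# Samples by identifier, genes by index.
sub = expression_sketch(matrix, samples, genes, ["s2", "s0"], [0, 2])
```

In this sketch the rows of the result follow the order of the requested samples, so sub lists the readings for s2 before those for s0.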
- avg_expression(samples=None, genes=None)
The average expression sub-matrix for the given samples and genes.
- Parameters
samples (Sequence[str] | Sequence[int] | None) –
genes (Sequence[str] | Sequence[int] | None) –
- expressed_genes(samples=None, output_type='ident')
Returns all genes expressed at least once in a list of samples.
- Parameters
samples (Sequence[str] | Sequence[int] | None) –
output_type (str) –
- Return type
Iterator[int] | Iterator[str]
- rescale_genes(factors)
Rescales gene expression according to the given factors.
Column j of the gene matrix is multiplied by the jth value of factors.
- Parameters
factors (Sequence[float]) – A sequence of factors by which columns are rescaled.
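The column-wise rescaling amounts to the arithmetic below, shown on nested lists rather than a sparse matrix. The name rescale_genes_sketch and the length check are assumptions; the sketch also returns a new matrix, whereas the method is described as rescaling the stored data.

```python
def rescale_genes_sketch(matrix, factors):
    """Multiply column j of every row by factors[j]."""
    if any(len(row) != len(factors) for row in matrix):
        raise ValueError("need exactly one factor per gene column")
    return [[x * f for x, f in zip(row, factors)] for row in matrix]


matrix = [[5, 0, 1],
          [0, 2, 0]]
scaled = rescale_genes_sketch(matrix, [0.5, 2.0, 10.0])
```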
- filter_genes(pattern)
Filters genes according to a pattern.
Edits the count data so that only genes matching the pattern are included.
- Parameters
pattern (Pattern | Collection[str]) – Either a regular expression or a collection identifying gene identifiers to be kept.
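For a matrix whose columns are genes, filtering means dropping both the gene identifiers and the matching columns together. The sketch below illustrates this on nested lists. The name filter_genes_sketch is hypothetical, it returns new objects instead of editing in place, and the use of Pattern.match for regular expressions is an assumption.

```python
import re


def filter_genes_sketch(matrix, genes, pattern):
    """Keep only the columns whose gene identifier matches the pattern."""
    if isinstance(pattern, re.Pattern):
        # Assumption: regular expressions are tested with Pattern.match.
        keep = [j for j, gene in enumerate(genes) if pattern.match(gene)]
    else:
        keep = [j for j, gene in enumerate(genes) if gene in pattern]
    new_genes = [genes[j] for j in keep]
    new_matrix = [[row[j] for j in keep] for row in matrix]
    return new_matrix, new_genes


genes = ["Actb", "mt-Co1", "Sox2", "mt-Nd1"]
matrix = [[5, 9, 1, 7],
          [0, 8, 2, 6]]

# The negative lookahead keeps identifiers that do NOT start with "mt-".
filtered, kept = filter_genes_sketch(matrix, genes, re.compile(r"(?!mt-)"))
```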
- to_count_table(**kwargs)
Converts the CountMatrix into a CountTable.
- Parameters
gene_col – The name of the gene ID column.
sample_col – The name of the sample ID column.
count_col – The name of the count column.
- Returns
A CountTable holding the same expression data.
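The relationship between the two layouts can be sketched as a conversion from a samples-by-genes matrix to long-form rows, one per reading. The column names default to the values shown in the CountTable signature. The function name to_long_form, the list-of-dicts output, and the omission of zero counts are assumptions made for illustration.

```python
def to_long_form(matrix, samples, genes,
                 gene_col="gene", sample_col="sample", count_col="count"):
    """Flatten a samples-by-genes matrix into one row per nonzero reading."""
    rows = []
    for i, sample in enumerate(samples):
        for j, gene in enumerate(genes):
            count = matrix[i][j]
            if count:  # assumption: zero entries are left out
                rows.append({gene_col: gene, sample_col: sample, count_col: count})
    return rows


samples = ["s0", "s1"]
genes = ["Actb", "Sox2"]
matrix = [[5, 0],
          [0, 2]]
rows = to_long_form(matrix, samples, genes)
```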
- class topact.countdata.CountTable(table, genes=None, samples=None, gene_col='gene', sample_col='sample', count_col='count', **kwargs)
Bases: CountData
A collection of gene expression readings.
- Parameters
genes (MutableSequence[str] | None) –
samples (MutableSequence[str] | None) –
gene_col (str) –
sample_col (str) –
count_col (str) –
- genes
An ordered list of gene identifiers.
- samples
An ordered list of sample identifiers.
- num_genes
The number of genes in the domain, i.e. len(genes).
- num_samples
The number of samples in the domain, i.e. len(samples).
- metadata
A dataframe where each row corresponds to a sample.
- filter_genes(pattern)
Filters genes according to a pattern.
Edits the count data so that only genes matching the pattern are included.
- Parameters
pattern (Pattern | Collection[str]) – Either a regular expression or a collection identifying gene identifiers to be kept.
- abstract toCountMatrix()