bx.motif.pwm module

Classes for working with position specific matrices.

class bx.motif.pwm.BaseMatrix(alphabet=None, sorted_alphabet=None, char_to_index=None, values=None)

Bases: object

Base class for position specific matrices.

classmethod create_from_other(other, values=None)

Create a new matrix with attributes taken from other, but with values taken from values if provided.

classmethod from_rows(alphabet, rows)

Create a new matrix for sequences over the alphabet alphabet, taking values from rows. rows is a list whose length is the width of the matrix; each element is a list of values, one per character, in the order those characters appear in alphabet.
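The row layout described above can be illustrated without the library. This is a minimal sketch in plain Python, not the library's implementation: the helper name matrix_from_rows and the list-of-dicts representation are assumptions made for illustration only.

```python
def matrix_from_rows(alphabet, rows):
    """Hypothetical helper: pair each row's values with the characters of
    `alphabet`, position by position. One row per matrix column (position)."""
    matrix = []
    for row in rows:
        if len(row) != len(alphabet):
            raise ValueError("each row needs exactly one value per alphabet character")
        matrix.append(dict(zip(alphabet, row)))
    return matrix


# Three positions over the DNA alphabet; values are listed in "ACGT" order.
rows = [
    [8, 0, 1, 1],
    [0, 9, 0, 1],
    [1, 1, 8, 0],
]
matrix = matrix_from_rows("ACGT", rows)
width = len(matrix)  # the number of rows is the width of the matrix
```

Here the second row assigns 9 to 'C' because 'C' is the second character of the alphabet string.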

reverse_complement()

Create the reverse complement of this matrix. The result probably only makes sense if the alphabet is that of DNA ('A', 'C', 'G', 'T').
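Conceptually, reverse complementing a position matrix reverses the order of the positions and swaps each base with its complement. The sketch below shows that idea on a list-of-dicts layout; the function name and the layout are assumptions for illustration and do not reflect how the library stores its values.

```python
# Watson-Crick pairing for the DNA alphabet.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement_rows(rows):
    """Hypothetical helper: reverse the position order and relabel each
    value with the complementary base."""
    return [
        {COMPLEMENT[base]: value for base, value in row.items()}
        for row in reversed(rows)
    ]


counts = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 0, "T": 1},
]
rc = reverse_complement_rows(counts)
```

Applying the operation twice returns the original matrix, which is a quick sanity check for any implementation.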

property width

Return the width (size along the sequence axis) of this matrix.

class bx.motif.pwm.FrequencyMatrix(alphabet=None, sorted_alphabet=None, char_to_index=None, values=None)

Bases: BaseMatrix

A position specific count/frequency matrix.

DEFAULT_CORRECTION = 1e-10

Default correction applied when dealing with counts of zero, chosen so that the resulting scoring matrices match those produced by CREAD.

to_logodds_scoring_matrix(background=None, correction=1e-10)

Create a standard logodds scoring matrix.
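As a rough illustration of what a log-odds conversion involves, the sketch below turns counts into log2 ratios of observed frequency to background frequency. Several details are assumptions rather than documented behavior: the base-2 logarithm, the uniform default background, and the use of correction as a floor on each count so that zero counts stay finite. The function name logodds_rows is hypothetical.

```python
import math


def logodds_rows(count_rows, background=None, correction=1e-10):
    """Hypothetical helper: log2(frequency / background) per position.
    Assumes `correction` is a lower bound applied to each count."""
    alphabet = sorted(count_rows[0])
    if background is None:
        # Assumed default: every character equally likely.
        background = {ch: 1.0 / len(alphabet) for ch in alphabet}
    scored = []
    for row in count_rows:
        total = sum(row.values())
        scored.append({
            ch: math.log2(max(row[ch], correction) / total / background[ch])
            for ch in alphabet
        })
    return scored


counts = [{"A": 8, "C": 0, "G": 1, "T": 1}]
scores = logodds_rows(counts)
```

With this formulation, a base seen more often than the background predicts gets a positive score, and a base never observed gets a large negative but finite score.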

to_stormo_scoring_matrix(background=None)

Create a scoring matrix from this count matrix using the method from:

Hertz, G.Z. and G.D. Stormo (1999). Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 15(7): 563-577.
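The weight commonly attributed to the cited paper is ln[(n + p) / ((N + 1) p)], where n is the count of a character at a position, N is the total count at that position, and p is the character's background probability. The sketch below applies that formula under stated assumptions: the natural logarithm, a uniform default background, and a list-of-dicts layout. Whether the library uses exactly this form or this log base is not stated here, and the name stormo_rows is hypothetical.

```python
import math


def stormo_rows(count_rows, background=None):
    """Hypothetical helper: ln((n + p) / ((N + 1) * p)) per character,
    where the background probability p acts as a pseudocount."""
    alphabet = sorted(count_rows[0])
    if background is None:
        # Assumed default: every character equally likely.
        background = {ch: 1.0 / len(alphabet) for ch in alphabet}
    scored = []
    for row in count_rows:
        total = sum(row.values())
        scored.append({
            ch: math.log((row[ch] + background[ch]) / (total + 1) / background[ch])
            for ch in alphabet
        })
    return scored


counts = [{"A": 8, "C": 0, "G": 1, "T": 1}]
weights = stormo_rows(counts)
```

Because the background probability is added to every count, zero counts need no separate correction term in this formulation.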

class bx.motif.pwm.ScoringMatrix(alphabet=None, sorted_alphabet=None, char_to_index=None, values=None)

Bases: BaseMatrix

A position specific matrix containing values that are suitable for scoring a sequence.

score_string(string)

Score each valid position in string using this scoring matrix. Positions which were not scored are set to nan.
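The behavior described above can be sketched as a sliding window over the string. This is an illustration only: it assumes that each score is stored at the index where its window starts, and that a window is skipped (left as nan) when it runs past the end of the string or contains a character outside the alphabet. The function name and the list-of-dicts layout are hypothetical.

```python
import math


def score_string_sketch(score_rows, string):
    """Hypothetical helper: sum the per-position scores for every window of
    len(score_rows) characters; unscored positions stay nan."""
    width = len(score_rows)
    scores = [math.nan] * len(string)
    for start in range(len(string) - width + 1):
        window = string[start:start + width]
        # Skip windows containing characters the matrix cannot score.
        if all(ch in row for ch, row in zip(window, score_rows)):
            scores[start] = sum(row[ch] for ch, row in zip(window, score_rows))
    return scores


score_rows = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
]
result = score_string_sketch(score_rows, "ACNAC")
```

In this example the windows starting at indices 0 and 3 both read "AC" and are scored, while the windows touching 'N' and the final index, where no full window fits, remain nan.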

score_string_with_gaps(string)

Score each valid position in string using this scoring matrix. Positions which were not scored are set to nan. Gap characters are ignored (matrices score across them).
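One way to score across gaps is to build each window from the next non-gap characters, so that gap columns contribute nothing. The sketch below assumes '-' as the gap character and records each score at the column of the window's first non-gap character; both choices, along with the function name, are assumptions for illustration rather than documented library behavior.

```python
import math


def score_string_with_gaps_sketch(score_rows, string, gap="-"):
    """Hypothetical helper: like a sliding-window scorer, but windows are
    formed from consecutive non-gap characters only."""
    width = len(score_rows)
    scores = [math.nan] * len(string)
    # Column indices of all non-gap characters, in order.
    positions = [i for i, ch in enumerate(string) if ch != gap]
    for k in range(len(positions) - width + 1):
        window = positions[k:k + width]
        chars = [string[i] for i in window]
        if all(ch in row for ch, row in zip(chars, score_rows)):
            scores[window[0]] = sum(row[ch] for ch, row in zip(chars, score_rows))
    return scores


score_rows = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
]
gapped = score_string_with_gaps_sketch(score_rows, "A-CA")
```

The window starting at column 0 spans the gap and reads "AC"; the gap column itself and the last column, where no full window remains, stay nan.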