bx.seqmapping module

Classes for char-to-int mapping and int-to-int mapping.

Author: James Taylor (james@bx.psu.edu)

The char-to-int mapping can be used to translate a list of strings over some alphabet to a single int array (for example, to encode a multiple sequence alignment).
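The idea of collapsing several aligned strings into one int array can be sketched in pure Python. This is only an illustration of the concept, not the bx.seqmapping API: the names make_char_mapping and encode_columns, the base-N column encoding, and the use of -1 for unmapped characters are all assumptions made for this example.

```python
def make_char_mapping(alphabet):
    """Hypothetical helper: assign each character of `alphabet` a small int."""
    return {ch: i for i, ch in enumerate(alphabet)}


def encode_columns(rows, char_to_int, base):
    """Encode equal-length strings column by column into one list of ints.

    Each column is read as a base-`base` number with one digit per row.
    A column containing an unmapped character is encoded as -1 (an
    assumption of this sketch).
    """
    if len({len(row) for row in rows}) > 1:
        raise ValueError("all rows must have the same length")
    encoded = []
    for column in zip(*rows):
        value = 0
        for ch in column:
            digit = char_to_int.get(ch, -1)
            if digit < 0:
                value = -1
                break
            value = value * base + digit
        encoded.append(value)
    return encoded


dna = make_char_mapping("ACGT")
alignment = ["ACGT", "AGGT"]  # two aligned sequences, four columns
codes = encode_columns(alignment, dna, base=4)
print(codes)  # [0, 6, 10, 15]
```

With two rows over a four-letter alphabet, every column becomes one of 16 possible codes, so the whole alignment fits in a single flat int array.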

The int-to-int mapping is particularly useful for creating partitions, and provides methods to merge/split symbols in the output mapping.
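To make the partition idea concrete, here is a minimal pure-Python sketch of an int-to-int table with merge and split operations. The class name PartitionMapping, its method names, and the renumbering policy are hypothetical and chosen for illustration; they are not the signatures of the classes in this module.

```python
class PartitionMapping:
    """Hypothetical int-to-int table: input symbol -> output (partition) symbol."""

    def __init__(self, in_size):
        # Start from the identity partition: every input is its own class.
        self.table = list(range(in_size))

    def out_size(self):
        return max(self.table) + 1

    def merge(self, a, b):
        """Merge output symbols a and b, renumbering so outputs stay contiguous."""
        low, high = min(a, b), max(a, b)
        if low == high:
            return
        for i, out in enumerate(self.table):
            if out == high:
                self.table[i] = low
            elif out > high:
                self.table[i] = out - 1

    def split(self, in_symbol):
        """Move one input symbol into a new output symbol of its own."""
        current = self.table[in_symbol]
        if self.table.count(current) == 1:
            return  # already alone in its class; nothing to split
        self.table[in_symbol] = self.out_size()


m = PartitionMapping(4)
m.merge(1, 2)
print(m.table)  # [0, 1, 1, 2]
m.split(2)
print(m.table)  # [0, 1, 3, 2]
```

Keeping the output symbols contiguous after each merge means the output alphabet size is always max(table) + 1, which is convenient when the reduced symbols index into a model's parameter arrays.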

The two forms of mapping can be combined, for example to encode a multiple sequence alignment in a reduced alphabet defined by a partition of alignment columns. Many of the helper methods provided are for solving such alignment-oriented problems.
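A rough sketch of the combined use, again in plain Python rather than with this module's classes: columns are first encoded as ints, then passed through a partition table that reduces them to a smaller alphabet. The function name reduce_alignment and the example partition (0 for columns where both rows agree, 1 otherwise) are assumptions made purely for illustration.

```python
def reduce_alignment(rows, alphabet, partition):
    """Encode alignment columns as ints, then map them through `partition`.

    `partition` is a list indexed by column code; columns containing a
    character outside `alphabet` yield -1 (an assumption of this sketch).
    """
    char_to_int = {ch: i for i, ch in enumerate(alphabet)}
    base = len(alphabet)
    reduced = []
    for column in zip(*rows):
        code = 0
        for ch in column:
            digit = char_to_int.get(ch)
            if digit is None:
                code = -1
                break
            code = code * base + digit
        reduced.append(partition[code] if code >= 0 else -1)
    return reduced


# Partition of the 16 possible two-row DNA columns into "match" and "mismatch".
match_partition = [0 if code // 4 == code % 4 else 1 for code in range(16)]

reduced = reduce_alignment(["ACGT", "AGGT"], "ACGT", match_partition)
print(reduced)  # [0, 1, 0, 0]
```

Here a 16-symbol column alphabet is reduced to two symbols; a search over encodings would vary the partition while the column encoding step stays fixed.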

This code was originally written for the ESPERR project, which includes software for searching for alignment encodings that work well for specific classification problems, using various Markov chain classifiers over the reduced encodings.

Most of the core implementation is in the Pyrex/C extension “_seqmapping.pyx” for performance reasons (specifically, to avoid the excessive bounds checking that would make a sequence/array lookup-heavy problem like this slow in pure Python).

bx.seqmapping.alignment_mapping_from_file(f, char_mapping=<bx._seqmapping.CharToIntArrayMapping object>): Create a mapping from a file of alignment columns.

bx.seqmapping.identity_mapping(size)

bx.seqmapping.second_mapping_from_file(f, first_mapping, char_mapping=<bx._seqmapping.CharToIntArrayMapping object>)