bx.seq.seq module

Classes to support “biological sequence” files.

Author: Bob Harris (rsharris@bx.psu.edu)

class bx.seq.seq.SeqFile(file=None, revcomp=False, name='', gap=None)

Bases: object

A biological sequence is a sequence of bytes or characters. Usually these represent DNA (A,C,G,T), proteins, or some variation of those.

class attributes:

file: file object containing the sequence

revcomp: whether gets from this sequence should be reverse-complemented

False => no reverse complement
True => (same as “-5’”)
“maf” => (same as “-5’”)
“+5’” => minus strand is from plus strand’s 5’ end (same as “-3’”)
“+3’” => minus strand is from plus strand’s 3’ end (same as “-5’”)
“-5’” => minus strand is from its 5’ end (as per MAF file format)
“-3’” => minus strand is from its 3’ end (as per genome browser, but with origin-zero)

name: usually a species and/or chromosome name (e.g. “mule.chr5”); if
the file contains a name, that overrides this one

gap: gap character that aligners should use for gaps in this sequence
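The revcomp values listed above reduce to three distinct behaviours: no reverse complement, “-5’”, and “-3’”. The following is a minimal sketch of that reduction, not code taken from the module; the helper name normalize_revcomp and the choice to raise ValueError for unlisted values are assumptions made for illustration.

```python
# Hypothetical helper (not part of bx.seq.seq): collapse the documented
# revcomp aliases onto the three canonical values.
_ALIASES = {
    "maf": "-5'",  # MAF files count minus-strand positions from its 5' end
    "+3'": "-5'",  # plus strand's 3' end is the minus strand's 5' end
    "+5'": "-3'",  # plus strand's 5' end is the minus strand's 3' end
}


def normalize_revcomp(revcomp):
    """Return False, "-5'" or "-3'" for any documented revcomp value."""
    if revcomp is False:
        return False
    if revcomp is True:
        return "-5'"
    if revcomp in ("-5'", "-3'"):
        return revcomp
    if revcomp in _ALIASES:
        return _ALIASES[revcomp]
    # Assumption: unlisted values are rejected rather than passed through.
    raise ValueError("unrecognized revcomp value: %r" % (revcomp,))


canonical = [normalize_revcomp(v) for v in (False, True, "maf", "+5'", "+3'")]
```

Reducing the aliases once, up front, means any later coordinate handling only needs to distinguish the two minus-strand conventions.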

close()

extract_name(line)

get(start, length): Fetch subsequence starting at position start with length length. This method is picky about parameters, the requested interval must have non-negative length and fit entirely inside the NIB sequence, the returned string will contain exactly ‘length’ characters, or an AssertionError will be generated.

raw_fetch(start, length)

reverse_complement(text)

set_text(text)

class bx.seq.seq.SeqReader(file, revcomp=False, name='', gap=None)

Bases: object

Iterate over all sequences in a file in order

close()

class bx.seq.seq.SeqReaderIter(reader)

Bases: object
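A reader class paired with a separate iterator class is a common Python pattern for walking the records of a file in order. The sketch below shows one way such a pair can fit together. ToyReader and ToyReaderIter are hypothetical classes written for this example, the one-record-per-line format is invented, and nothing here is claimed to match how SeqReader or SeqReaderIter are implemented.

```python
import io


class ToyReader:
    """Hypothetical reader: each non-empty line of the file is one record."""

    def __init__(self, file):
        self.file = file

    def close(self):
        self.file.close()

    def __iter__(self):
        # Delegate iteration to a small helper object.
        return ToyReaderIter(self)

    def __next__(self):
        # Return the next record, or None once the file is exhausted.
        for line in self.file:
            line = line.strip()
            if line:
                return line
        return None


class ToyReaderIter:
    """Turns the reader's 'None at end' convention into StopIteration."""

    def __init__(self, reader):
        self.reader = reader

    def __iter__(self):
        return self

    def __next__(self):
        record = next(self.reader)
        if record is None:
            raise StopIteration
        return record


reader = ToyReader(io.StringIO("ACGT\n\nGGCC\nTTAA\n"))
records = list(reader)  # records come back in file order
reader.close()
```

Keeping the end-of-file test in the iterator lets a format-specific reader subclass override only the record-parsing step.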