bx.intervals.intersection module

Data structure for performing intersect queries on a set of intervals which preserves all information about the intervals (unlike bitset projection methods).
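As a rough illustration of what "preserves all information" means, here is a minimal pure-Python sketch. It is not bx-python code, and the helper names `project_to_bitset` and `overlapping` are hypothetical. A bitset projection records only which positions are covered, whereas an interval-preserving query can still report which intervals, and which attached values, overlap a window.

```python
# Illustrative sketch only: plain Python, not the bx-python implementation.
# Intervals are half-open (start, end, value) tuples.

def project_to_bitset(intervals, size):
    """Mark every covered position; which interval covered it is lost."""
    bits = [False] * size
    for start, end, _value in intervals:
        for pos in range(start, min(end, size)):
            bits[pos] = True
    return bits


def overlapping(intervals, start, end):
    """Return the full (start, end, value) records overlapping [start, end)."""
    return [iv for iv in intervals if iv[0] < end and iv[1] > start]


intervals = [(0, 10, "gene_a"), (3, 7, "gene_b")]

bits = project_to_bitset(intervals, 12)
covered = any(bits[2:5])             # the bitset can only answer "is it covered?"
hits = overlapping(intervals, 2, 5)  # the interval query also says by what
```

The bitset can answer only whether positions 2 to 4 are covered; the interval query returns both records together with their values.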

Authors:

James Taylor (james@jamestaylor.org), Ian Schenk (ian.schenck@gmail.com), Brent Pedersen (bpederse@gmail.com)

bx.intervals.intersection.Intersecter

alias of IntervalTree

class bx.intervals.intersection.Interval

Bases: object

Basic feature, with required integer start and end properties. Also accepts an optional strand given as +1 or -1 (used for up/downstream queries), a name, and arbitrary data passed in via the value keyword argument.

>>> from bx.intervals.intersection import Interval
>>> from collections import OrderedDict
>>> f1 = Interval(23, 36)
>>> f2 = Interval(34, 48, value=OrderedDict([('chr', 12), ('anno', 'transposon')]))
chrom
end
start
strand
value
class bx.intervals.intersection.IntervalNode

Bases: object

A single node of an IntervalTree.

NOTE: Unless you really know what you are doing, you probably should use IntervalTree rather than using this directly.

end
find(start, end, sort=True)

Given a start and an end, return a list of features falling within that range.

insert(start, end, interval)

Insert a new IntervalNode into the tree of which this node is currently the root. The return value is the new root of the tree (which may or may not be this node!)
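The reason an insert can return a different root is not spelled out here. One common cause is a rotation in a self-balancing tree. The sketch below is a hypothetical treap-style node in plain Python, not the IntervalNode implementation, and all of its names are assumptions. It shows why callers should always keep the value returned by insert.

```python
import random


class Node:
    """Hypothetical treap node keyed on interval start (illustration only)."""

    def __init__(self, start, end, value, rng):
        self.start, self.end, self.value = start, end, value
        self.priority = rng.random()  # random heap priority keeps the tree balanced on average
        self.left = None
        self.right = None

    def insert(self, start, end, value, rng):
        """Insert below this node and return the (possibly new) subtree root."""
        if start < self.start:
            if self.left is None:
                self.left = Node(start, end, value, rng)
            else:
                self.left = self.left.insert(start, end, value, rng)
            if self.left.priority > self.priority:
                return self._rotate_right()
        else:
            if self.right is None:
                self.right = Node(start, end, value, rng)
            else:
                self.right = self.right.insert(start, end, value, rng)
            if self.right.priority > self.priority:
                return self._rotate_left()
        return self

    def _rotate_right(self):
        new_root = self.left
        self.left = new_root.right
        new_root.right = self
        return new_root

    def _rotate_left(self):
        new_root = self.right
        self.right = new_root.left
        new_root.left = self
        return new_root


def in_order(node):
    """Return (start, end) pairs in ascending start order."""
    if node is None:
        return []
    return in_order(node.left) + [(node.start, node.end)] + in_order(node.right)


rng = random.Random(0)
root = Node(0, 10, "a", rng)
for s, e in [(3, 7), (3, 40), (13, 50)]:
    root = root.insert(s, e, None, rng)  # always keep the returned root
```

Discarding the return value would leave the caller holding a node that may now sit somewhere below the real root, so later queries could miss part of the tree.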

intersect(start, end, sort=True)

Given a start and an end, return a list of features falling within that range.

interval
left(position, n=1, max_dist=2500)

Find n features with a start greater than position.

f: an Interval object (or anything with an end attribute)
n: the number of features to return
max_dist: the maximum distance to look before giving up

left_node
right(position, n=1, max_dist=2500)

Find n features with an end less than position.

f: an Interval object (or anything with a start attribute)
n: the number of features to return
max_dist: the maximum distance to look before giving up

right_node
root_node
start
traverse(func)
class bx.intervals.intersection.IntervalTree

Bases: object

Data structure for performing window intersect queries on a set of possibly overlapping 1d intervals.

Usage

Create an empty IntervalTree

>>> from bx.intervals.intersection import Interval, IntervalTree
>>> intersecter = IntervalTree()

An interval is a start and end position and a value (possibly None). You can add any object as an interval:

>>> intersecter.insert( 0, 10, "food" )
>>> intersecter.insert( 3, 7, dict(foo='bar') )
>>> intersecter.find( 2, 5 )
['food', {'foo': 'bar'}]

If the object has start and end attributes (like the Interval class) there are some shortcuts:

>>> intersecter = IntervalTree()
>>> intersecter.insert_interval( Interval( 0, 10 ) )
>>> intersecter.insert_interval( Interval( 3, 7 ) )
>>> intersecter.insert_interval( Interval( 3, 40 ) )
>>> intersecter.insert_interval( Interval( 13, 50 ) )
>>> intersecter.find( 30, 50 )
[Interval(3, 40), Interval(13, 50)]
>>> intersecter.find( 100, 200 )
[]

Before/after for intervals

>>> intersecter.before_interval( Interval( 10, 20 ) )
[Interval(3, 7)]
>>> intersecter.before_interval( Interval( 5, 20 ) )
[]

Upstream/downstream

>>> intersecter.upstream_of_interval(Interval(11, 12))
[Interval(0, 10)]
>>> intersecter.upstream_of_interval(Interval(11, 12, strand="-"))
[Interval(13, 50)]
>>> intersecter.upstream_of_interval(Interval(1, 2, strand="-"), num_intervals=3)
[Interval(3, 7), Interval(3, 40), Interval(13, 50)]
add(start, end, value=None)

Insert the interval [start,end) associated with value value.

add_interval(interval)

Insert an “interval” like object (one with at least start and end attributes)

after(position, num_intervals=1, max_dist=2500)

Find num_intervals intervals that lie after position and are no further than max_dist positions away

after_interval(interval, num_intervals=1, max_dist=2500)

Find num_intervals intervals that lie completely after interval and are no further than max_dist positions away

before(position, num_intervals=1, max_dist=2500)

Find num_intervals intervals that lie before position and are no further than max_dist positions away

before_interval(interval, num_intervals=1, max_dist=2500)

Find num_intervals intervals that lie completely before interval and are no further than max_dist positions away
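The before/after semantics can be approximated with a linear scan. The following is a plain-Python sketch under stated assumptions, not the library's tree-based code: the function names are hypothetical, intervals are (start, end) tuples, comparisons are strict, and distance is measured from position to the nearest edge of each candidate. The strict comparison is inferred from the usage example, where before_interval(Interval(10, 20)) returns Interval(3, 7) rather than Interval(0, 10).

```python
def naive_before(intervals, position, num_intervals=1, max_dist=2500):
    """Nearest intervals ending strictly before position, within max_dist."""
    candidates = [
        iv for iv in intervals
        if iv[1] < position and position - iv[1] <= max_dist
    ]
    candidates.sort(key=lambda iv: iv[1], reverse=True)  # closest end first
    return candidates[:num_intervals]


def naive_after(intervals, position, num_intervals=1, max_dist=2500):
    """Nearest intervals starting strictly after position, within max_dist."""
    candidates = [
        iv for iv in intervals
        if iv[0] > position and iv[0] - position <= max_dist
    ]
    candidates.sort(key=lambda iv: iv[0])  # closest start first
    return candidates[:num_intervals]


def naive_before_interval(intervals, query, num_intervals=1, max_dist=2500):
    """'Completely before' is taken relative to the query's start."""
    return naive_before(intervals, query[0], num_intervals, max_dist)


def naive_after_interval(intervals, query, num_intervals=1, max_dist=2500):
    """'Completely after' is taken relative to the query's end."""
    return naive_after(intervals, query[1], num_intervals, max_dist)


intervals = [(0, 10), (3, 7), (3, 40), (13, 50)]
```

The order in which the library returns multiple hits is not specified here; this sketch simply returns the nearest candidates first.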

downstream_of_interval(interval, num_intervals=1, max_dist=2500)

Find num_intervals intervals that lie completely downstream of interval and are no further than max_dist positions away

find(start, end)

Return a sorted list of all intervals overlapping [start,end).
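The half-open notation [start,end) implies that two intervals which merely touch do not overlap. A minimal sketch of that predicate, with a linear scan standing in for the tree query, might look as follows. The names `overlaps` and `naive_find` are hypothetical and this is not the library's implementation.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True when half-open [a_start, a_end) and [b_start, b_end) share a position."""
    return a_start < b_end and b_start < a_end


def naive_find(intervals, start, end):
    """Linear-scan stand-in for a tree query; hits are sorted by (start, end)."""
    return sorted(iv for iv in intervals if overlaps(iv[0], iv[1], start, end))


intervals = [(0, 10), (3, 7), (3, 40), (13, 50)]
hits = naive_find(intervals, 30, 50)
```

A tree-based structure answers the same question without visiting every interval, which is the point of using IntervalTree over a scan like this one.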

insert(start, end, value=None)

Insert the interval [start,end) associated with value value.

insert_interval(interval)

Insert an “interval” like object (one with at least start and end attributes)
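The "interval-like" requirement is duck typing: only start and end attributes are read. The sketch below uses a hypothetical `Exon` dataclass and a hypothetical list-backed `TinyStore` to show the relationship suggested by the usage examples, where inserting an interval-like object stores the object itself as the value that find later returns. It assumes nothing about the library's internals.

```python
from dataclasses import dataclass


@dataclass
class Exon:
    """Hypothetical interval-like object: only start and end are required."""
    start: int
    end: int
    gene: str = ""


class TinyStore:
    """List-backed stand-in for an interval container (illustration only)."""

    def __init__(self):
        self._items = []

    def insert(self, start, end, value=None):
        self._items.append((start, end, value))

    def insert_interval(self, interval):
        # The object itself becomes the stored value.
        self.insert(interval.start, interval.end, interval)

    def find(self, start, end):
        """Return the stored values whose intervals overlap [start, end)."""
        hits = [it for it in self._items if it[0] < end and it[1] > start]
        hits.sort(key=lambda it: (it[0], it[1]))
        return [value for _s, _e, value in hits]


store = TinyStore()
store.insert_interval(Exon(0, 10, "a"))
store.insert_interval(Exon(3, 7, "b"))
found = store.find(2, 5)
```

Because the whole object is stored, any extra attributes (here `gene`) remain available on the query results.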

traverse(fn)

Call fn for each element in the tree.

upstream_of_interval(interval, num_intervals=1, max_dist=2500)

Find num_intervals intervals that lie completely upstream of interval and are no further than max_dist positions away
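Judging from the usage examples, "upstream" depends on strand: on the plus strand it means lower coordinates, and on the minus strand it means higher coordinates, with downstream being the mirror image. The following plain-Python sketch encodes that reading with hypothetical helper names and (start, end) tuples. Accepting both "-" and -1 as minus-strand markers is an assumption, as is the use of strict comparisons.

```python
def _before(intervals, position, n, max_dist):
    """Nearest intervals ending strictly before position, within max_dist."""
    hits = [iv for iv in intervals
            if iv[1] < position and position - iv[1] <= max_dist]
    hits.sort(key=lambda iv: iv[1], reverse=True)
    return hits[:n]


def _after(intervals, position, n, max_dist):
    """Nearest intervals starting strictly after position, within max_dist."""
    hits = [iv for iv in intervals
            if iv[0] > position and iv[0] - position <= max_dist]
    hits.sort(key=lambda iv: iv[0])
    return hits[:n]


def naive_upstream(intervals, start, end, strand="+", num_intervals=1, max_dist=2500):
    """Upstream is 'before' on the plus strand and 'after' on the minus strand."""
    if strand in ("-", -1):
        return _after(intervals, end, num_intervals, max_dist)
    return _before(intervals, start, num_intervals, max_dist)


def naive_downstream(intervals, start, end, strand="+", num_intervals=1, max_dist=2500):
    """Downstream mirrors upstream."""
    if strand in ("-", -1):
        return _before(intervals, start, num_intervals, max_dist)
    return _after(intervals, end, num_intervals, max_dist)


intervals = [(0, 10), (3, 7), (3, 40), (13, 50)]
```

With these inputs the sketch reproduces the three upstream results shown in the usage section for IntervalTree.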