skbio.metadata.IntervalMetadata
class skbio.metadata.IntervalMetadata(upper_bound, copy_from=None)
Stores the interval features.
An IntervalMetadata object allows storage, modification, and querying of interval features covering a region of a single coordinate system. For instance, it can be used to store functional annotations of genes across a genome. It can also be used with sequence alignments. This object is typically coupled with another object, such as a Sequence object (or one of its subclasses) or a TabularMSA object.
Parameters:
- upper_bound : int or None
Defines the exclusive upper bound of the interval features. No coordinate can be greater than it. None means that the coordinate space is unbounded.
- copy_from : IntervalMetadata or None, optional
If not None, create the new object by shallow copying the input IntervalMetadata object. The upper bound of the new object is set to the upper_bound parameter specified, not copied from copy_from.
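As a rough illustration of the upper_bound contract, the hypothetical helper check_bounds below validates (start, end) pairs against an exclusive upper bound, treating None as an unbounded space. It assumes the inclusive lower bound is 0 and that each start must not exceed its end; it is a sketch, not skbio's own validation code.

```python
from typing import Iterable, Optional, Tuple

Bound = Tuple[int, int]


def check_bounds(bounds: Iterable[Bound], upper_bound: Optional[int],
                 lower_bound: int = 0) -> None:
    """Raise ValueError if any (start, end) pair falls outside the space.

    Hypothetical helper: assumes an inclusive lower bound (0 by default)
    and an exclusive upper bound; None means there is no upper limit.
    """
    for start, end in bounds:
        if start > end:
            raise ValueError(f"start {start} is greater than end {end}")
        if start < lower_bound:
            raise ValueError(f"start {start} is below lower bound {lower_bound}")
        # With an exclusive upper bound, end may equal it but not exceed it.
        if upper_bound is not None and end > upper_bound:
            raise ValueError(f"end {end} exceeds upper bound {upper_bound}")


check_bounds([(3, 9), (0, 10)], upper_bound=10)  # passes: (0, 10) spans the whole space
check_bounds([(5, 1000)], upper_bound=None)      # passes: the space is unbounded
```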
See also
Notes
This class stores the coordinates of all feature bounds in an interval tree, which speeds up queries by bound. To save computation, building the interval tree is deferred until it is needed: the tree is rebuilt from all coordinates only when information must be fetched from it.
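The deferred rebuilding described in these notes can be sketched as a stale flag that mutating methods set, plus a decorator that rebuilds the index before any method that reads it. In this minimal sketch, the names LazyIntervals, rebuild_if_stale, and _is_stale are hypothetical stand-ins for IntervalMetadata, _rebuild_tree, and _is_stale_tree, and a sorted list of start coordinates searched with bisect replaces the real interval tree.

```python
import functools
from bisect import bisect_left


def rebuild_if_stale(method):
    """Decorator: rebuild the index before running method if it is stale."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        if self._is_stale:
            self._rebuild()
        return method(self, *args, **kwargs)
    return wrapper


class LazyIntervals:
    """Hypothetical container with a lazily rebuilt, sorted index.

    A sorted list of start coordinates stands in for a real interval tree.
    """

    def __init__(self):
        self._intervals = []   # (start, end) pairs, half-open
        self._index = []       # the same pairs, sorted by start
        self._starts = []      # start coordinates, parallel to _index
        self._is_stale = False
        self.rebuilds = 0      # counts rebuilds, for illustration only

    def add(self, start, end):
        self._intervals.append((start, end))
        # Mutations never touch the index; they only mark it stale.
        self._is_stale = True

    def _rebuild(self):
        self._index = sorted(self._intervals)
        self._starts = [s for s, _ in self._index]
        self._is_stale = False
        self.rebuilds += 1

    @rebuild_if_stale
    def overlapping(self, start, end):
        """Return stored intervals overlapping the half-open span [start, end)."""
        # Only intervals that start before end can overlap the span.
        cutoff = bisect_left(self._starts, end)
        return [(s, e) for s, e in self._index[:cutoff] if e > start]


li = LazyIntervals()
li.add(3, 9)
li.add(3, 7)
li.add(1, 2)
print(li.overlapping(7, 9))  # [(3, 9)]; the first query triggers a rebuild
print(li.rebuilds)           # 1
```

Repeated queries with no intervening mutation reuse the index, so a run of additions followed by many queries pays for a single rebuild.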
When you add a method to this class and the method needs to fetch information from IntervalMetadata._interval_tree, decorate it with _rebuild_tree. This decorator checks whether the current interval tree is stale and rebuilds it if so. Additionally, if your method adds, deletes, or changes the coordinates of any interval features, set self._is_stale_tree to True at the end of the method to mark the interval tree as stale.
Examples
Let’s say we have a sequence of length 10 and want to add annotation to it. Create an IntervalMetadata object:
>>> from skbio.metadata import Interval, IntervalMetadata
>>> im = IntervalMetadata(10)
Let’s add annotations of 3 genes:
>>> im.add(bounds=[(3, 9)],
...        metadata={'gene': 'sagB'})
Interval(interval_metadata=..., bounds=[(3, 9)], fuzzy=[(False, False)], metadata={'gene': 'sagB'})
>>> im.add(bounds=[(3, 7)],
...        metadata={'gene': 'sagC'})
Interval(interval_metadata=..., bounds=[(3, 7)], fuzzy=[(False, False)], metadata={'gene': 'sagC'})
>>> im.add(bounds=[(1, 2), (4, 7)],
...        metadata={'gene': 'sagA'})
Interval(interval_metadata=..., bounds=[(1, 2), (4, 7)], fuzzy=[(False, False), (False, False)], metadata={'gene': 'sagA'})
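In the output above, each Interval carries one fuzzy pair per span, and every pair is (False, False) when no fuzzy argument is passed. The hypothetical default_fuzzy helper below sketches that defaulting rule; the length check is an assumption about how mismatched input might be rejected, not documented skbio behavior.

```python
from typing import List, Optional, Tuple

Bound = Tuple[int, int]
Fuzzy = Tuple[bool, bool]


def default_fuzzy(bounds: List[Bound],
                  fuzzy: Optional[List[Fuzzy]] = None) -> List[Fuzzy]:
    """Return one (start_is_fuzzy, end_is_fuzzy) pair per (start, end) bound.

    Hypothetical helper: missing flags default to (False, False) for every
    bound; an explicit list must have one pair per bound (assumed check).
    """
    if fuzzy is None:
        return [(False, False)] * len(bounds)
    if len(fuzzy) != len(bounds):
        raise ValueError("expected one fuzzy pair per bound")
    return list(fuzzy)


print(default_fuzzy([(1, 2), (4, 7)]))  # [(False, False), (False, False)]
```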
Show the object representation:
>>> im
3 interval features
-------------------
Interval(interval_metadata=..., bounds=[(3, 9)], fuzzy=[(False, False)], metadata={'gene': 'sagB'})
Interval(interval_metadata=..., bounds=[(3, 7)], fuzzy=[(False, False)], metadata={'gene': 'sagC'})
Interval(interval_metadata=..., bounds=[(1, 2), (4, 7)], fuzzy=[(False, False), (False, False)], metadata={'gene': 'sagA'})
We can sort the genes by their bounds:
>>> im.sort()
>>> im
3 interval features
-------------------
Interval(interval_metadata=..., bounds=[(1, 2), (4, 7)], fuzzy=[(False, False), (False, False)], metadata={'gene': 'sagA'})
Interval(interval_metadata=..., bounds=[(3, 7)], fuzzy=[(False, False)], metadata={'gene': 'sagC'})
Interval(interval_metadata=..., bounds=[(3, 9)], fuzzy=[(False, False)], metadata={'gene': 'sagB'})
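The exact sort key is not stated here. One key that reproduces the order shown above is the start of the first span followed by the end of the last span; the sketch below assumes that key and applies it to plain dictionaries, with a hypothetical sort_features function mirroring the ascending parameter of sort.

```python
features = [
    {"gene": "sagB", "bounds": [(3, 9)]},
    {"gene": "sagC", "bounds": [(3, 7)]},
    {"gene": "sagA", "bounds": [(1, 2), (4, 7)]},
]


def sort_key(feature):
    """Assumed key: start of the first span, then end of the last span."""
    bounds = feature["bounds"]
    return (bounds[0][0], bounds[-1][1])


def sort_features(features, ascending=True):
    """Hypothetical stand-in for sort(): returns a new sorted list."""
    return sorted(features, key=sort_key, reverse=not ascending)


print([f["gene"] for f in sort_features(features)])  # ['sagA', 'sagC', 'sagB']
```

Here sagA sorts first because its first span starts at 1, and sagC precedes sagB because both start at 3 but sagC ends earlier.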
Query the genes by bound and/or metadata:
>>> intvls = im.query([(1, 2)], metadata={'gene': 'foo'})
>>> list(intvls)
[]
>>> intvls = im.query([(7, 9)])
>>> list(intvls)
[Interval(interval_metadata=..., bounds=[(3, 9)], fuzzy=[(False, False)], metadata={'gene': 'sagB'})]
>>> intvls = im.query(metadata={'gene': 'sagA'})
>>> intvls = list(intvls)
>>> intvls
[Interval(interval_metadata=..., bounds=[(1, 2), (4, 7)], fuzzy=[(False, False), (False, False)], metadata={'gene': 'sagA'})]
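The query results above are consistent with two conditions: a feature must have a span overlapping at least one query span, treating spans as half-open so that (3, 7) does not touch (7, 9), and its metadata must contain every requested key/value pair. The sketch below applies those assumed rules to plain dictionaries with a hypothetical query function; it is not skbio's implementation, which uses the interval tree.

```python
def spans_overlap(a, b):
    """Half-open spans [a0, a1) and [b0, b1) overlap if each starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]


def query(features, bounds=None, metadata=None):
    """Yield features overlapping any query span and containing all metadata pairs."""
    for feature in features:
        if bounds is not None and not any(
                spans_overlap(q, span)
                for q in bounds for span in feature["bounds"]):
            continue
        if metadata is not None and not all(
                key in feature["metadata"] and feature["metadata"][key] == value
                for key, value in metadata.items()):
            continue
        yield feature


features = [
    {"bounds": [(3, 9)], "metadata": {"gene": "sagB"}},
    {"bounds": [(3, 7)], "metadata": {"gene": "sagC"}},
    {"bounds": [(1, 2), (4, 7)], "metadata": {"gene": "sagA"}},
]

print(list(query(features, [(1, 2)], {"gene": "foo"})))               # []
print([f["metadata"]["gene"] for f in query(features, [(7, 9)])])     # ['sagB']
```

The first query returns nothing even though sagA overlaps (1, 2), because its metadata does not match.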
Drop the gene(s) returned by the query:
>>> im.drop(intvls)
>>> im.sort()
>>> im
2 interval features
-------------------
Interval(interval_metadata=..., bounds=[(3, 7)], fuzzy=[(False, False)], metadata={'gene': 'sagC'})
Interval(interval_metadata=..., bounds=[(3, 9)], fuzzy=[(False, False)], metadata={'gene': 'sagB'})
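The drop method also accepts a negate flag. A minimal sketch of the assumed semantics follows: features are matched by object identity rather than equality, and negate=True keeps the listed features while dropping the rest. The drop function here is hypothetical and works on plain lists.

```python
def drop(features, to_drop, negate=False):
    """Return the features that survive a drop.

    Matching is by object identity (id), not equality, so a distinct but
    equal feature is not dropped. With negate=True the listed features are
    kept and all others are dropped.
    """
    drop_ids = {id(f) for f in to_drop}
    return [f for f in features if (id(f) in drop_ids) == negate]


sag_a = {"gene": "sagA"}
sag_b = {"gene": "sagB"}
sag_c = {"gene": "sagC"}
features = [sag_b, sag_c, sag_a]

print(drop(features, [sag_a]))               # [{'gene': 'sagB'}, {'gene': 'sagC'}]
print(drop(features, [sag_a], negate=True))  # [{'gene': 'sagA'}]
```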
Attributes
default_write_format
lower_bound
The inclusive lower bound of interval features.
num_interval_features
The total number of interval features.
upper_bound
The exclusive upper bound of interval features.
Built-ins
__copy__()
Return a shallow copy.
__deepcopy__(memo)
Return a deep copy.
__eq__(other)
Test if this object is equal to another.
__ge__(value, /)
Return self>=value.
__getstate__(/)
Helper for pickle.
__gt__(value, /)
Return self>value.
__le__(value, /)
Return self<=value.
__lt__(value, /)
Return self<value.
__ne__(other)
Test if this object is not equal to another.
__str__(/)
Return str(self).
Methods
add(bounds[, fuzzy, metadata])
Create and add an Interval to this IntervalMetadata.
concat(interval_metadata)
Concatenate an iterable of IntervalMetadata objects.
drop(intervals[, negate])
Drop Interval objects.
merge(other)
Merge the interval features of another IntervalMetadata object.
query([bounds, metadata])
Yield Interval objects with the specified bounds and attributes.
read(file[, format])
Create a new IntervalMetadata instance from a file.
sort([ascending])
Sort interval features by their coordinates.
write(file[, format])
Write an instance of IntervalMetadata to a file.
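The concat method is described only briefly above. Assuming it behaves like joining the underlying sequences end to end, each object's features must be shifted by the combined upper bounds of the objects before it. The hypothetical concat_bounds function below sketches that coordinate arithmetic on plain tuples; it is not skbio's implementation, and it simply refuses unbounded (None) spaces because no offset past them is defined.

```python
from typing import List, Sequence, Tuple

Bound = Tuple[int, int]


def concat_bounds(blocks: Sequence[Tuple[int, List[List[Bound]]]]):
    """Concatenate coordinate spaces, shifting each block's features.

    blocks holds (upper_bound, features) pairs, where each feature is a list
    of (start, end) spans. Features in a block are shifted by the summed
    upper bounds of all earlier blocks. Returns (new_upper_bound, features).
    """
    offset = 0
    shifted = []
    for upper_bound, features in blocks:
        if upper_bound is None:
            raise ValueError("cannot offset past an unbounded coordinate space")
        for spans in features:
            shifted.append([(start + offset, end + offset) for start, end in spans])
        offset += upper_bound
    return offset, shifted


# A length-10 space followed by a length-5 space.
print(concat_bounds([(10, [[(3, 9)]]), (5, [[(0, 2), (3, 5)]])]))
# (15, [[(3, 9)], [(10, 12), (13, 15)]])
```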