skbio.metadata.IntervalMetadata

class skbio.metadata.IntervalMetadata(upper_bound, copy_from=None)

Stores the interval features.

An IntervalMetadata object allows storage, modification, and querying of interval features covering a region of a single coordinate system. For instance, it can be used to store functional annotations of genes across a genome. It can also be applied to sequence alignments.

This object is typically coupled with another object, such as a Sequence object (or its child class), or a TabularMSA object.

Parameters:
upper_bound : int or None

Defines the exclusive upper bound of the interval features. No coordinate can be greater than it. None means that the coordinate space is unbounded.

copy_from : IntervalMetadata or None, optional

If not None, create the new object as a shallow copy of the given IntervalMetadata object. The upper bound of the new object is set to the specified upper_bound parameter.
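The upper_bound parameter describes a half-open coordinate space: the lower bound is inclusive and the upper bound exclusive, so a feature may end exactly at upper_bound but not beyond it. The sketch below checks (start, end) pairs against such a space. The helper `check_bounds` is hypothetical and not part of scikit-bio, and the default lower bound of 0 is an assumption to verify against the `lower_bound` attribute.

```python
from typing import Iterable, Optional, Tuple


def check_bounds(bounds: Iterable[Tuple[int, int]],
                 upper_bound: Optional[int],
                 lower_bound: int = 0) -> None:
    """Raise ValueError if any (start, end) pair falls outside the space.

    Hypothetical helper, not scikit-bio API. lower_bound=0 is an assumption;
    upper_bound=None means the coordinate space is unbounded.
    """
    for start, end in bounds:
        if start > end:
            raise ValueError(f"start {start} is greater than end {end}")
        if start < lower_bound:
            raise ValueError(f"start {start} is below lower bound {lower_bound}")
        # The upper bound is exclusive for positions, so end == upper_bound
        # is allowed, but no coordinate may be greater than it.
        if upper_bound is not None and end > upper_bound:
            raise ValueError(f"end {end} exceeds upper bound {upper_bound}")


check_bounds([(3, 9), (1, 2)], 10)  # fits inside a length-10 sequence
check_bounds([(0, 10)], 10)         # ending exactly at the upper bound is fine
check_bounds([(3, 11)], None)       # unbounded space accepts any end
```

With `upper_bound=10`, a pair such as `(3, 11)` raises `ValueError`.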

Notes

This class stores the coordinates of all feature bounds in an interval tree, which speeds up queries by bound. To save computation, building the interval tree is deferred until it is needed: the tree is rebuilt from all coordinates only when information must be fetched from it.

When adding a method to this class that needs to fetch information from IntervalMetadata._interval_tree, decorate it with _rebuild_tree. This decorator checks whether the current interval tree is stale and rebuilds it if so. Additionally, if the method adds, deletes, or changes the coordinates of any interval features, it should set self._is_stale_tree to True at the end to mark the interval tree as stale.
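The deferred-rebuild pattern described in these notes can be sketched in isolation. In this toy version, a sorted list stands in for the interval tree, and the names `rebuild_index`, `LazyIntervals`, and `_is_stale` are hypothetical analogues of `_rebuild_tree` and `_is_stale_tree`, not scikit-bio's actual code. Mutators only flag the index as stale; the decorated query method pays the rebuild cost once, on demand.

```python
import functools
from typing import List, Tuple


def rebuild_index(method):
    """Rebuild the index before running `method` if it is marked stale."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        if self._is_stale:
            self._rebuild()
        return method(self, *args, **kwargs)
    return wrapper


class LazyIntervals:
    """Toy container whose index is rebuilt only when a query needs it."""

    def __init__(self) -> None:
        self._intervals: List[Tuple[int, int]] = []
        self._index: List[Tuple[int, int]] = []
        self._is_stale = False
        self.rebuild_count = 0  # for demonstration only

    def _rebuild(self) -> None:
        # Stand-in for building an interval tree: a sorted copy of the bounds.
        self._index = sorted(self._intervals)
        self._is_stale = False
        self.rebuild_count += 1

    def add(self, start: int, end: int) -> None:
        self._intervals.append((start, end))
        # Coordinates changed, so mark the index stale at the end of the method.
        self._is_stale = True

    @rebuild_index
    def overlapping(self, start: int, end: int) -> List[Tuple[int, int]]:
        # Half-open overlap: [s, e) and [start, end) overlap iff s < end and start < e.
        return [(s, e) for s, e in self._index if s < end and start < e]


ivs = LazyIntervals()
ivs.add(3, 9)
ivs.add(3, 7)                 # no rebuild has happened yet
print(ivs.overlapping(7, 9))  # first query triggers a single rebuild
print(ivs.rebuild_count)
```

Any number of consecutive `add` calls costs at most one rebuild, which is the saving the notes refer to.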

Examples

Suppose we have a sequence of length 10 and want to add annotations to it. Create an IntervalMetadata object:

>>> from skbio.metadata import Interval, IntervalMetadata
>>> im = IntervalMetadata(10)

Let’s add annotations for 3 genes:

>>> im.add(bounds=[(3, 9)],
...        metadata={'gene': 'sagB'})
Interval(interval_metadata=..., bounds=[(3, 9)], fuzzy=[(False, False)], metadata={'gene': 'sagB'})
>>> im.add(bounds=[(3, 7)],
...        metadata={'gene': 'sagC'})
Interval(interval_metadata=..., bounds=[(3, 7)], fuzzy=[(False, False)], metadata={'gene': 'sagC'})
>>> im.add(bounds=[(1, 2), (4, 7)],
...        metadata={'gene': 'sagA'})
Interval(interval_metadata=..., bounds=[(1, 2), (4, 7)], fuzzy=[(False, False), (False, False)], metadata={'gene': 'sagA'})

Show the object representation:

>>> im
3 interval features
-------------------
Interval(interval_metadata=..., bounds=[(3, 9)], fuzzy=[(False, False)], metadata={'gene': 'sagB'})
Interval(interval_metadata=..., bounds=[(3, 7)], fuzzy=[(False, False)], metadata={'gene': 'sagC'})
Interval(interval_metadata=..., bounds=[(1, 2), (4, 7)], fuzzy=[(False, False), (False, False)], metadata={'gene': 'sagA'})

We can sort the genes by their bounds:

>>> im.sort()
>>> im
3 interval features
-------------------
Interval(interval_metadata=..., bounds=[(1, 2), (4, 7)], fuzzy=[(False, False), (False, False)], metadata={'gene': 'sagA'})
Interval(interval_metadata=..., bounds=[(3, 7)], fuzzy=[(False, False)], metadata={'gene': 'sagC'})
Interval(interval_metadata=..., bounds=[(3, 9)], fuzzy=[(False, False)], metadata={'gene': 'sagB'})
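The sorted order above is consistent with comparing each feature's list of bounds lexicographically. The sketch below reproduces that order with plain Python data; the `features` list of (bounds, metadata) pairs is a hypothetical stand-in for Interval objects, and scikit-bio's exact sort key may differ in cases this example does not exercise.

```python
# Hypothetical stand-ins for the three Interval objects: (bounds, metadata).
features = [
    ([(3, 9)], {"gene": "sagB"}),
    ([(3, 7)], {"gene": "sagC"}),
    ([(1, 2), (4, 7)], {"gene": "sagA"}),
]

# Lists of tuples compare element by element, so [(1, 2), (4, 7)] sorts
# before [(3, 7)], and [(3, 7)] sorts before [(3, 9)].
features.sort(key=lambda feature: feature[0])

for bounds, metadata in features:
    print(metadata["gene"], bounds)
```

This prints sagA, sagC, and sagB in that order, matching the doctest output.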

Query the genes by bound and/or metadata:

>>> intvls = im.query([(1, 2)], metadata={'gene': 'foo'})
>>> list(intvls)
[]
>>> intvls = im.query([(7, 9)])
>>> list(intvls)
[Interval(interval_metadata=..., bounds=[(3, 9)], fuzzy=[(False, False)], metadata={'gene': 'sagB'})]
>>> intvls = im.query(metadata={'gene': 'sagA'})
>>> intvls = list(intvls)
>>> intvls
[Interval(interval_metadata=..., bounds=[(1, 2), (4, 7)], fuzzy=[(False, False), (False, False)], metadata={'gene': 'sagA'})]
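The query results above follow from half-open overlap: (3, 7) and (7, 9) share no position, so only sagB matches the bound (7, 9). The brute-force sketch below reproduces the three queries. The `query` function here is hypothetical and scans linearly instead of using an interval tree; treating a feature as a hit when it overlaps any one query bound and matches every requested metadata item is an assumption about the library's semantics.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Bounds = List[Tuple[int, int]]
Feature = Tuple[Bounds, Dict[str, str]]


def query(features: List[Feature],
          bounds: Optional[Bounds] = None,
          metadata: Optional[Dict[str, str]] = None) -> Iterator[Feature]:
    """Yield features overlapping any query bound and matching all metadata.

    Hypothetical linear scan, not scikit-bio's implementation.
    """
    for feat_bounds, feat_meta in features:
        if bounds is not None:
            # Half-open overlap: [s, e) and [qs, qe) overlap iff s < qe and qs < e.
            if not any(s < qe and qs < e
                       for s, e in feat_bounds for qs, qe in bounds):
                continue
        if metadata is not None:
            # Every requested key must be present with the requested value.
            if any(k not in feat_meta or feat_meta[k] != v
                   for k, v in metadata.items()):
                continue
        yield feat_bounds, feat_meta


features = [
    ([(3, 9)], {"gene": "sagB"}),
    ([(3, 7)], {"gene": "sagC"}),
    ([(1, 2), (4, 7)], {"gene": "sagA"}),
]
print(list(query(features, [(1, 2)], {"gene": "foo"})))
print([m["gene"] for _, m in query(features, [(7, 9)])])
print([m["gene"] for _, m in query(features, metadata={"gene": "sagA"})])
```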

Drop the gene(s) returned by the query:

>>> im.drop(intvls)
>>> im.sort()
>>> im
2 interval features
-------------------
Interval(interval_metadata=..., bounds=[(3, 7)], fuzzy=[(False, False)], metadata={'gene': 'sagC'})
Interval(interval_metadata=..., bounds=[(3, 9)], fuzzy=[(False, False)], metadata={'gene': 'sagB'})

Attributes

default_write_format

Default write format for this object: gff3.

lower_bound

The inclusive lower bound of interval features.

num_interval_features

The total number of interval features.

upper_bound

The exclusive upper bound of interval features.

Methods

add

Create and add an Interval to this IntervalMetadata.

concat

Concatenate an iterable of IntervalMetadata objects.

drop

Drop Interval objects.

merge

Merge the interval features of another IntervalMetadata object.

query

Yield Interval objects matching the given bounds and metadata.

read

Create a new IntervalMetadata instance from a file.

sort

Sort interval features by their coordinates.

write

Write an instance of IntervalMetadata to a file.

Special methods

__copy__

Return a shallow copy.

__deepcopy__

Return a deep copy.

__eq__

Test if this object is equal to another.

__ne__

Test if this object is not equal to another.

__str__

Return a string representation of this object.

Special methods (inherited)

__ge__

Return self>=value.

__getstate__

Helper for pickle.

__gt__

Return self>value.

__le__

Return self<=value.

__lt__

Return self<value.

Details

default_write_format = 'gff3'

Default write format for this object: gff3.

lower_bound

The inclusive lower bound of interval features.

num_interval_features

The total number of interval features.

upper_bound

The exclusive upper bound of interval features.

__copy__()

Return a shallow copy.

See also

__deepcopy__

Notes

The IntervalMetadata copy will have copies of the Interval objects present in this object. The metadata dictionary of each Interval object will be a shallow copy.

__deepcopy__(memo)

Return a deep copy.

See also

__copy__

Notes

The IntervalMetadata copy will have copies of the Interval objects present in this object. The metadata dictionary of each Interval object will be a deep copy.
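The difference between the two copy notes is ordinary Python copy semantics applied to each Interval's metadata dictionary. The snippet below uses only the standard library `copy` module on a plain dict, assuming metadata values are ordinary Python objects; the `notes` key is hypothetical, and no scikit-bio call is made.

```python
import copy

# Hypothetical metadata dict with a nested mutable value.
metadata = {"gene": "sagA", "notes": ["putative"]}

shallow = copy.copy(metadata)    # new dict, same nested objects
deep = copy.deepcopy(metadata)   # new dict, nested objects copied too

# Rebinding a top-level key only affects the dict it is done on.
shallow["gene"] = "sagA2"
# Mutating a nested list shows through the shallow copy but not the deep one.
metadata["notes"].append("reviewed")

print(metadata["gene"], shallow["gene"])
print(shallow["notes"] is metadata["notes"])
print(deep["notes"])
```

In short, after `copy.copy` nested values are still shared, so in-place changes to them are visible from both objects; after `copy.deepcopy` they are independent.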

__eq__(other)

Test if this object is equal to another.

It first checks whether the two objects share the same coordinate space. If so, it then checks whether all interval features are equal between the two objects after sorting them by bounds.

Parameters:
other : IntervalMetadata

Interval metadata to test for equality against.

Returns:
bool

Indicates if the two objects are equal.
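The two-step equality check can be sketched with plain data. The function `features_equal` and its (bounds, metadata) tuples are hypothetical; comparing only upper bounds assumes both objects share the same lower bound, and the sketch omits the `fuzzy` flags shown in the examples above. Sorting makes the comparison independent of insertion order, and metadata items break ties between features with identical bounds.

```python
from typing import Dict, List, Optional, Tuple

Feature = Tuple[List[Tuple[int, int]], Dict[str, str]]


def sort_key(feature: Feature):
    bounds, metadata = feature
    # Metadata items break ties between features with identical bounds.
    return (bounds, sorted(metadata.items()))


def features_equal(upper_a: Optional[int], feats_a: List[Feature],
                   upper_b: Optional[int], feats_b: List[Feature]) -> bool:
    """Hypothetical order-insensitive equality check, not scikit-bio API."""
    # Step 1: the coordinate spaces must match (same lower bound assumed).
    if upper_a != upper_b:
        return False
    # Step 2: compare the features after sorting both sides the same way.
    return sorted(feats_a, key=sort_key) == sorted(feats_b, key=sort_key)


a = [([(3, 9)], {"gene": "sagB"}), ([(1, 2), (4, 7)], {"gene": "sagA"})]
b = list(reversed(a))
print(features_equal(10, a, 10, b))  # same features, different order
print(features_equal(10, a, 20, b))  # different coordinate spaces
```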

__ne__(other)

Test if this object is not equal to another.

Parameters:
other : IntervalMetadata

Interval metadata to test for inequality against.

Returns:
bool

Indicates if the two objects are not equal.

__str__()

Return a string representation of this object.

Required to inherit from SkbioObject.

Returns:
str

String representation of this IntervalMetadata object.