I/O Registry (skbio.io.registry)#
Classes#
IORegistry | Create a registry of formats and implementations which map to classes.
Format | Defines a format on which readers, writers, and sniffers can be registered.
Functions#
create_format | Create new file formats.
Exceptions#
DuplicateRegistrationError | Raised when a function is already registered in skbio.io.
InvalidRegistrationError | Raised if a function doesn't meet the expected API of its registration.
Creating a new format for scikit-bio#
scikit-bio makes it simple to add new file formats to its I/O registry. scikit-bio maintains a singleton of the IORegistry class called io_registry. This is where all scikit-bio file formats are registered. You could also instantiate your own IORegistry, but that is not the focus of this tutorial.
The first step to creating a new format is to add a submodule in skbio/io/format/ named after the file format you are implementing. For example, if the format you are implementing is called myformat then you would create a file called skbio/io/format/myformat.py.
The next step is to import the create_format() factory from skbio.io. This will allow you to create a new Format object that io_registry will know about.
Ideally, give the result of create_format() the same name as your file. For example:
from skbio.io import create_format
myformat = create_format('myformat')
The myformat object is what we will use to register our new functionality.
At this point you should evaluate whether your format is binary or text. If your format is binary, your create_format() call should look like this:
myformat = create_format('myformat', encoding='binary')
Alternatively, if your format is text and has a specific encoding or newline handling, you can specify that as well:
myformat = create_format('myformat', encoding='ascii', newline='\n')
This will ensure that our registry will open files with a default encoding of ‘ascii’ for ‘myformat’ and expect all newlines to be ‘\n’ characters.
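As a rough illustration of what these options mean, the sketch below uses only Python's standard io module. It assumes the registry's encoding and newline options behave like the parameters of the same name on Python's built-in text handles; that is an assumption made for this example, not something verified against scikit-bio's internals.

```python
import io

raw = b"first\r\nsecond\n"

# Text mode: decode as ASCII and treat only "\n" as a line terminator.
# With newline="\n", line endings are returned untranslated, so the "\r"
# of a Windows-style ending stays in the data instead of being dropped.
text_fh = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline="\n")
text_lines = text_fh.readlines()

# Binary mode: no decoding at all, so the handle yields bytes.
binary_fh = io.BytesIO(raw)
binary_lines = binary_fh.readlines()
```

Here text_lines holds str objects and binary_lines holds bytes objects, which is the practical difference a reader or sniffer has to be written for.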
Having worked out these details, we are ready to register the actual functionality of our format (e.g., sniffer, readers, and writers).
To create a sniffer, decorate your sniffer function as follows:
@myformat.sniffer()
def _myformat_sniffer(fh):
    # do something with `fh` to determine the membership of the file
    return False, {}  # placeholder result: (is_match, suggested kwargs)
For further details on sniffer functions see Format.sniffer().
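The body of a sniffer might look like the following sketch, shown without the decorator so that it runs on its own. The "#toyformat" header line, the delimiter detection, and the (bool, dict) return shape are assumptions made for this example; check Format.sniffer() for the exact contract.

```python
import io


def sniff_toyformat(fh):
    """Return (is_match, suggested_kwargs) for a hypothetical format."""
    header = fh.readline().rstrip("\n")
    if header != "#toyformat":
        return False, {}
    # Peek at the first data line to guess which delimiter a reader needs.
    second = fh.readline()
    delimiter = "\t" if "\t" in second else ","
    return True, {"delimiter": delimiter}


match = sniff_toyformat(io.StringIO("#toyformat\na\tb\n"))
no_match = sniff_toyformat(io.StringIO(">seq1\nACGT\n"))
```

A sniffer like this should stay cheap: it inspects only the first couple of lines rather than parsing the whole file.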
Creating a reader is very similar, but has one difference:
@myformat.reader(SomeSkbioClass)
def _myformat_to_some_skbio_class(fh, kwarg1='default', extra=FileSentinel):
    # parse `fh` and return a SomeSkbioClass instance here
    # `extra` will also be an open filehandle if provided else None
    ...
Here we bound a function to a specific class. We also demonstrated using our FileSentinel object to indicate to the registry that this reader can take auxiliary files that should be handled in the same way as the primary file. For further details on reader functions see Format.reader().
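As a sketch of the parsing logic such a reader could contain, the example below reads a hypothetical "#toyformat" file into a small dataclass. ToyTable, the file layout, and the plain extra=None default are all invented for illustration; in a registered reader the default would be FileSentinel and the registry would supply the open handle.

```python
import io
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical target class standing in for SomeSkbioClass."""
    rows: list
    notes: list = field(default_factory=list)


def read_toyformat(fh, delimiter="\t", extra=None):
    header = fh.readline().rstrip("\n")
    if header != "#toyformat":
        raise ValueError("missing '#toyformat' header line")
    # Every remaining non-blank line is one delimited row.
    rows = [line.rstrip("\n").split(delimiter) for line in fh if line.strip()]
    # The auxiliary handle is optional; treat None as "not provided".
    notes = [line.rstrip("\n") for line in extra] if extra is not None else []
    return ToyTable(rows=rows, notes=notes)


table = read_toyformat(
    io.StringIO("#toyformat\na\tb\nc\td\n"),
    extra=io.StringIO("note 1\n"),
)
```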
Creating a writer is about the same:
@myformat.writer(SomeSkbioClass)
def _some_skbio_class_to_myformat(obj, fh, kwarg1='whatever',
                                  extra=FileSentinel):
    # write the contents of `obj` into `fh` and whatever else into `extra`
    # do not return anything, it will be ignored
    ...
This mirrors the reader above, in reverse. The object being written is the first parameter, and the file is the second. For further details on writer functions see Format.writer().
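A writer body for the same kind of hypothetical "#toyformat" layout could be sketched as below. The function name, the file layout, and the use of extra=None in place of FileSentinel are assumptions for this standalone example.

```python
import io


def write_toyformat(obj, fh, delimiter="\t", extra=None):
    # `obj` comes first and the primary file handle second.
    fh.write("#toyformat\n")
    for row in obj:
        fh.write(delimiter.join(row) + "\n")
    # Write a short summary to the auxiliary handle, if one was given.
    if extra is not None:
        extra.write(f"{len(obj)} rows written\n")
    # Nothing is returned; the caller owns both handles.


main_fh, aux_fh = io.StringIO(), io.StringIO()
result = write_toyformat([["a", "b"], ["c", "d"]], main_fh, extra=aux_fh)
```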
Note
When raising errors in readers and writers, the error should be a subclass of FileFormatError specific to your new format.
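One way to follow this advice is sketched below. MyFormatError and parse_header are hypothetical names, and the local fallback class exists only so that the example runs where scikit-bio is not installed; in a real format module you would import FileFormatError from skbio.io directly.

```python
try:
    from skbio.io import FileFormatError
except ImportError:
    class FileFormatError(Exception):
        """Local stand-in used only when scikit-bio is unavailable."""


class MyFormatError(FileFormatError):
    """Raised when a file does not conform to the hypothetical myformat."""


def parse_header(line):
    # Fail loudly, with a format-specific error, on a malformed header.
    if line.rstrip("\n") != "#myformat":
        raise MyFormatError(f"expected '#myformat' header, found {line!r}")
    return True
```

Callers can then catch MyFormatError for this format alone, or FileFormatError to handle every format at once.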
Once you are satisfied with the functionality, you will need to ensure that skbio/io/__init__.py contains an import of your new submodule so the decorators are executed. Add the call import_module('skbio.io.format.myformat'), using your module name, to the existing list.
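The reason the import matters is that a module's top-level code, decorators included, runs only when the module is imported. The sketch below demonstrates this with a throwaway module written to a temporary directory; the module name toyformat_demo and its contents are invented for the example and have nothing to do with scikit-bio's own package layout.

```python
import sys
import tempfile
from importlib import import_module
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical module whose top-level code runs once, at import time.
    Path(tmp_dir, "toyformat_demo.py").write_text("REGISTERED = ['toyformat']\n")
    sys.path.insert(0, tmp_dir)
    try:
        module = import_module("toyformat_demo")
    finally:
        sys.path.remove(tmp_dir)

# The assignment inside the module has now been executed.
registered = module.REGISTERED
```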
Note
Because scikit-bio handles all of the I/O boilerplate, you only need to unit-test the actual business logic of your readers, writers, and sniffers.
Reserved Keyword Arguments#
The following keyword args may not be used when defining new readers or writers as they already have special meaning to the registry system:
format
into
verify
mode
encoding
errors
newline
compression
compresslevel
The following are not yet used but should be avoided as well:
auth
user
password
buffering
buffer_size
closefd
exclusive
append
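A small helper can catch accidental use of these names before a reader or writer is registered. The sketch below copies both lists from above and inspects a function's signature with the standard inspect module; find_reserved_kwargs and the two sample functions are hypothetical and not part of scikit-bio.

```python
import inspect

# Names with special meaning to the registry system (from the first list).
RESERVED = frozenset({
    "format", "into", "verify", "mode", "encoding", "errors",
    "newline", "compression", "compresslevel",
})
# Names not yet used but best avoided (from the second list).
AVOID = frozenset({
    "auth", "user", "password", "buffering", "buffer_size",
    "closefd", "exclusive", "append",
})


def find_reserved_kwargs(func):
    """Return the sorted parameter names of `func` that should not be used."""
    names = set(inspect.signature(func).parameters)
    return sorted(names & (RESERVED | AVOID))


def bad_reader(fh, encoding="utf8", append=False, kwarg1=None):
    pass


def good_reader(fh, kwarg1="default"):
    pass


bad_names = find_reserved_kwargs(bad_reader)
good_names = find_reserved_kwargs(good_reader)
```

A check like this fits naturally into a unit test for a new format module.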