Format#
The binpickle.format module contains the data structures that define the BinPickle format.
Users will not need these classes. They are documented here in the interest of documenting the file format. The current format version is 2, first used in binpickle 0.4.0; this is not compatible with prior versions.
File Structure#
BinPickle uses Pickle 5’s out-of-band buffer serialization support, and thus stores the pickled object in two parts:
The contents of the out-of-band buffers.
The Protocol 5 pickled bytes.
The bytes are stored as another buffer, so pickling an object with n buffers stores n+1 buffers in the file, the last one of which contains the pickle bytes.
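The split described above can be reproduced with the standard library alone. The following is a minimal sketch, not BinPickle's own code: the payload, the variable names, and the plain Python list standing in for the file are illustrative assumptions.

```python
import pickle

# Two byte arrays stand in for large array data. Wrapping them in PickleBuffer
# lets protocol 5 hand them to the callback instead of embedding them.
payload = {
    "a": pickle.PickleBuffer(bytearray(b"x" * 16)),
    "b": pickle.PickleBuffer(bytearray(b"y" * 32)),
}

# The callback returns None (falsy), so each buffer is kept out-of-band.
oob_buffers = []
pickle_bytes = pickle.dumps(payload, protocol=5, buffer_callback=oob_buffers.append)

# n out-of-band buffers followed by the pickle bytes: n + 1 buffers in total.
stored = [memoryview(buf).tobytes() for buf in oob_buffers] + [pickle_bytes]

# Reading reverses the split: the last buffer is the pickle, and the others
# are supplied back to the unpickler in their original order.
restored = pickle.loads(stored[-1], buffers=stored[:-1])
```

Here `stored` plays the role of the buffer sequence that a BinPickle file would hold; the real format additionally records each buffer's position and length in the index.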
The BinPickle format is inspired by Zip, with an index at the end of the file that tells the reader where in the file to find the various contents.
A Version 2 BinPickle file is organized as follows:
16-byte header, beginning with magic bytes BPCK (see FileHeader).
The out-of-band buffers, in order. Padding may appear before or after any buffer’s contents.
The pickle bytes, as a buffer.
The file index, stored as a list of IndexEntry objects encoded in MsgPack.
44-byte trailer (see FileTrailer).
The position and length of each buffer is stored in the index, so buffers can have arbitrary
padding between them. They could even technically be out-of-order, but such a file should
not be generated. Uncompressed BinPickle files intended for memory-mapped use align each
buffer to the operating system page size (from mmap.PAGESIZE).
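Page alignment comes down to inserting enough padding before each buffer that its offset is a multiple of the page size. The sketch below shows one way to compute that padding; the helper name `padding_for` and the buffer sizes are hypothetical, and this is not necessarily how BinPickle's writer does it.

```python
import mmap


def padding_for(offset: int, alignment: int = mmap.PAGESIZE) -> int:
    """Bytes of padding needed so that `offset` lands on a multiple of `alignment`."""
    return -offset % alignment


# Lay out three buffers of made-up sizes, starting just past a 16-byte header.
pos = 16
layout = []  # (offset, size) pairs, as an index would record them
for size in (100, 5000, 1):
    pos += padding_for(pos)  # skip ahead to the next page boundary
    layout.append((pos, size))
    pos += size
```

Because the index stores every offset explicitly, a reader does not need to know which alignment the writer chose.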
Classes#
- class binpickle.format.FileHeader(version=2, flags=Flags.None, length=-1)#
File header for a BinPickle file. The header is a 16-byte sequence containing the magic (BPCK) followed by version and offset information:
File version (2 bytes, big-endian).
Flags (2 bytes), as defined in Flags.
File length (8 bytes, big-endian). Length is signed; if the file length is not known, this field is set to -1.
- encode()#
Encode the file header as bytes.
- classmethod decode(buf, *, verify=True)#
Decode a file header from bytes.
- Parameters:
buf (bytes | bytearray | memoryview) – Buffer containing the file header to decode.
verify (bool) – Whether to fail on invalid header data (such as mismatched magic or unsupported version).
- Return type:
- trailer_pos()#
Get the position of the start of the file trailer.
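The header layout described for this class maps directly onto a `struct` format string. The following is an illustrative sketch rather than the library's implementation: `encode_header` and `decode_header` are hypothetical helpers, and treating the version and flags fields as unsigned is an assumption (the text only states that the length is signed).

```python
import struct

MAGIC = b"BPCK"
# Network byte order: 4-byte magic, 16-bit version, 16-bit flags (both assumed
# unsigned), then a signed 64-bit file length. 4 + 2 + 2 + 8 = 16 bytes.
HEADER = struct.Struct("!4sHHq")


def encode_header(version: int = 2, flags: int = 0, length: int = -1) -> bytes:
    """Pack the header fields; a length of -1 means the length is not known."""
    return HEADER.pack(MAGIC, version, flags, length)


def decode_header(buf) -> tuple:
    """Unpack (version, flags, length), rejecting buffers with the wrong magic."""
    magic, version, flags, length = HEADER.unpack(bytes(buf[: HEADER.size]))
    if magic != MAGIC:
        raise ValueError("not a BinPickle header: bad magic")
    return version, flags, length
```

A full decoder would also check the version number, as the `verify` parameter of `decode` suggests.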
- class binpickle.format.Flags(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)#
Flags that can be set in the BinPickle header.
- BIG_ENDIAN = 1#
This file was created on a big-endian system; if absent, the data is little-endian.
Note that this affects only the serialized buffer data; it does not affect the lengths and offsets in the file format, which are always stored in network byte order (big-endian) or encoded with MsgPack.
- MAPPABLE = 2#
This file is designed to be memory-mapped.
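As a rough illustration of how these two flag values combine, the sketch below uses a stand-in enum. `SketchFlags` and `flags_for_host` are hypothetical names for this example only; only the two numeric values are taken from the documentation above.

```python
import enum
import sys


class SketchFlags(enum.Flag):
    """Stand-in for the documented flags, reusing the documented values."""

    BIG_ENDIAN = 1
    MAPPABLE = 2


def flags_for_host(mappable: bool) -> SketchFlags:
    """Choose flags for a file written on this machine."""
    flags = SketchFlags(0)
    if sys.byteorder == "big":
        flags |= SketchFlags.BIG_ENDIAN
    if mappable:
        flags |= SketchFlags.MAPPABLE
    return flags


# A header flags field of 3 has both bits set.
decoded = SketchFlags(3)
```

The integer stored in the header is the bitwise OR of the set flags, available here as `flags.value`.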
- class binpickle.format.FileTrailer(offset, length, hash, reserved=<factory>)#
File trailer for a BinPickle file. The trailer is a 44-byte sequence that tells the reader where to find the rest of the binpickle data. It consists of the following fields:
Index start (8 bytes, big-endian). Measured in bytes from the start of the file.
Index length (4 bytes, big-endian). The number of bytes in the index.
Index digest (32 bytes). The SHA256 digest of the index data.
Reserved digest (32 bytes). Currently set to all 0s; this is to leave space for future support of MAC authentication of binpickle files.
- encode()#
Encode the file trailer as bytes.
- classmethod decode(buf, *, verify=True)#
Decode a file trailer from bytes.
- Parameters:
buf (bytes | bytearray | memoryview) – Buffer containing the trailer to decode.
verify (bool) – Whether to fail on invalid trailer data.
- Return type:
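The following sketch packs the index start, index length, and index digest fields and uses them to check an index. It rests on several assumptions: the three fields alone account for the stated 44 bytes (8 + 4 + 32), so the reserved field is left out; both integers are treated as unsigned; and `encode_trailer` and `check_trailer` are hypothetical helpers, not the library's API.

```python
import hashlib
import struct

# Network byte order: 8-byte index start, 4-byte index length (both assumed
# unsigned), then the 32-byte SHA-256 digest of the index. 8 + 4 + 32 = 44.
TRAILER = struct.Struct("!QL32s")


def encode_trailer(index_offset: int, index_bytes: bytes) -> bytes:
    """Build a trailer describing an index stored at `index_offset`."""
    digest = hashlib.sha256(index_bytes).digest()
    return TRAILER.pack(index_offset, len(index_bytes), digest)


def check_trailer(trailer: bytes, index_bytes: bytes) -> int:
    """Return the index offset if the length and digest match the index."""
    offset, length, digest = TRAILER.unpack(trailer)
    if length != len(index_bytes) or digest != hashlib.sha256(index_bytes).digest():
        raise ValueError("index does not match trailer")
    return offset
```

A reader would typically seek to the end of the file, read the trailer, and then use the offset and length to locate the index before checking its digest.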
- class binpickle.format.IndexEntry(offset, enc_length, dec_length, hash, info, codecs=<factory>)#
Index entry for a buffer in the BinPickle index.
- Parameters:
- codecs: list[dict[str, str | bool | int | float | None]]#
The sequence of codecs used to encode the buffer.
- to_repr()#
Convert an index entry to its MsgPack-compatible representation.
Format History#
The current file format version is 2, introduced in BinPickle 0.4.0.
Version 2#
Version 2 introduced the following:
Replaced Adler32 checksums with SHA-256 digests.
Replaced the single codec field with a codecs list field. The new field directly specifies a list of numcodecs codec configurations in the order they were applied to encode the buffer. The old native codecs have been removed; all codecs now come from numcodecs.
Added the info field to IndexEntry to store information about the buffer’s data, when available (currently stores NumPy data type and shape when serializing a NumPy array).
Version 1#
Version 1 is the original BinPickle format, used through the 0.3 release series. It is no longer supported.