Format#

The binpickle.format module contains the data structures that define the BinPickle format.

Users will not need these classes directly; they are documented here to describe the file format. The current format version is 2, first used in binpickle 0.4.0; this is not compatible with prior versions.

File Structure#

BinPickle uses Pickle 5’s out-of-band buffer serialization support, and thus stores the pickled object in two parts:

  1. The contents of the out-of-band buffers.

  2. The Protocol 5 pickled bytes.

The pickle bytes are stored as one more buffer, so pickling an object with n buffers stores n+1 buffers in the file, the last of which contains the pickle bytes.
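The split into n out-of-band buffers plus one pickle payload can be sketched with the standard library alone. The helper names `split_pickle` and `join_pickle` are hypothetical and are not part of binpickle; the sketch only illustrates the Protocol 5 mechanism and does not write a file.

```python
import pickle


def split_pickle(obj):
    """Pickle obj with protocol 5; return the out-of-band buffers plus the pickle bytes."""
    buffers = []
    # buffer_callback receives each PickleBuffer; list.append returns None,
    # which tells pickle to keep that buffer out-of-band.
    payload = pickle.dumps(obj, protocol=5, buffer_callback=buffers.append)
    # Copy each buffer's raw bytes, then add the pickle bytes as the last part.
    return [bytes(buf.raw()) for buf in buffers] + [payload]


def join_pickle(parts):
    """Rebuild the object from the n+1 parts produced by split_pickle."""
    *buffers, payload = parts
    return pickle.loads(payload, buffers=buffers)


obj = {
    "a": pickle.PickleBuffer(bytearray(b"first")),
    "b": pickle.PickleBuffer(bytearray(b"second")),
    "n": 3,
}
parts = split_pickle(obj)      # two buffers + one pickle payload
restored = join_pickle(parts)
```

Here the object carries two buffers, so `parts` holds three entries, with the pickle bytes last.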

The BinPickle format is inspired by Zip, with an index at the end of the file that tells the reader where in the file to find the various contents.

A Version 2 BinPickle file is organized as follows:

  1. 16-byte header, beginning with magic bytes BPCK (see FileHeader).

  2. The out-of-band buffers, in order. Padding may appear before or after any buffer’s contents.

  3. The pickle bytes, as a buffer.

  4. The file index, stored as a list of IndexEntry objects encoded in MsgPack.

  5. 44-byte trailer (see FileTrailer).

The position and length of each buffer is stored in the index, so buffers can have arbitrary padding between them. They could even technically be out-of-order, but such a file should not be generated. Uncompressed BinPickle files intended for memory-mapped use align each buffer to the operating system page size (from mmap.PAGESIZE).
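Page alignment of this kind amounts to rounding each buffer's starting position up to a multiple of the page size. The following is a minimal sketch of that arithmetic; `aligned_offset` and `plan_layout` are hypothetical helpers, not binpickle functions, and the real writer may place padding differently.

```python
import mmap


def aligned_offset(pos, alignment=mmap.PAGESIZE):
    """Round pos up to the next multiple of alignment."""
    return pos + (-pos % alignment)


def plan_layout(buffer_lengths, start=16, alignment=mmap.PAGESIZE):
    """Return (offset, length) pairs with every buffer starting on an aligned boundary."""
    layout = []
    pos = start  # buffers begin after the 16-byte header
    for length in buffer_lengths:
        pos = aligned_offset(pos, alignment)  # skip padding up to the boundary
        layout.append((pos, length))
        pos += length
    return layout
```

Because each offset and length is recorded in the index, a reader never needs to know how much padding was inserted.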

Classes#

class binpickle.format.FileHeader(version=2, flags=Flags.None, length=-1)#

File header for a BinPickle file. The header is a 16-byte sequence containing the magic (BPCK) followed by version and offset information:

  1. File version (2 bytes, big-endian).

  2. Flags (2 bytes), as defined in Flags.

  3. File length (8 bytes, big-endian). Length is signed; if the file length is not known, this field is set to -1.
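The header layout described above can be sketched with the `struct` module. This is an illustration under assumptions, not the library's implementation: the names `HEADER`, `encode_header`, and `decode_header` are hypothetical, and the flags field is assumed to be big-endian like the other fields.

```python
import struct

# Assumed layout: 4-byte magic, 2-byte version, 2-byte flags, signed 8-byte length,
# all big-endian (">"). This adds up to 16 bytes.
HEADER = struct.Struct(">4sHHq")
MAGIC = b"BPCK"


def encode_header(version=2, flags=0, length=-1):
    """Pack the header fields; length=-1 means the file length is unknown."""
    return HEADER.pack(MAGIC, version, flags, length)


def decode_header(buf):
    """Unpack a header, rejecting data that does not start with the magic bytes."""
    magic, version, flags, length = HEADER.unpack(bytes(buf[:HEADER.size]))
    if magic != MAGIC:
        raise ValueError("not a BinPickle header")
    return version, flags, length
```

A signed length field is what allows -1 to serve as the "unknown" marker.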

Parameters:
version: int = 2#

The file format version.

length: int = -1#

The length of the file (-1 for unknown).

encode()#

Encode the file header as bytes.

classmethod decode(buf, *, verify=True)#

Decode a file header from bytes.

Parameters:
  • buf (bytes | bytearray | memoryview) – Buffer containing the file header to decode.

  • verify (bool) – Whether to fail on invalid header data (such as mismatched magic or unsupported version).

Return type:

FileHeader

trailer_pos()#

Get the position of the start of the file trailer.

class binpickle.format.Flags(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)#

Flags that can be set in the BinPickle header.

BIG_ENDIAN = 1#

This file was created on a big-endian system; if absent, the data is little-endian.

Note that this affects only the serialized buffer data; it does not affect the lengths and offsets in the file format, which are always stored in network byte order (big-endian) or encoded with MsgPack.

MAPPABLE = 2#

This file is designed to be memory-mapped.
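As a rough illustration of how these two flag values combine, the sketch below mirrors them in a stand-in enum. `SketchFlags`, `flags_for_writer`, and `buffer_byteorder` are hypothetical names chosen for this example; only the values 1 and 2 come from the documentation above.

```python
import enum
import sys


class SketchFlags(enum.IntFlag):
    """Stand-in for the documented flag values."""
    BIG_ENDIAN = 1
    MAPPABLE = 2


def flags_for_writer(mappable, byteorder=sys.byteorder):
    """Pick the flags a writer on this system might set."""
    flags = SketchFlags(0)
    if byteorder == "big":
        flags |= SketchFlags.BIG_ENDIAN
    if mappable:
        flags |= SketchFlags.MAPPABLE
    return flags


def buffer_byteorder(flags):
    """Byte order of the serialized buffer data implied by the flags."""
    return "big" if SketchFlags.BIG_ENDIAN in SketchFlags(flags) else "little"
```

A mappable file written on a little-endian system would carry the value 2; the same file written on a big-endian system would carry 3.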

class binpickle.format.FileTrailer(offset, length, hash, reserved=<factory>)#

File trailer for a BinPickle file. The trailer is a 44-byte sequence that tells the reader where to find the rest of the binpickle data. It consists of the following fields:

  1. Index start (8 bytes, big-endian). Measured in bytes from the start of the file.

  2. Index length (4 bytes, big-endian). The number of bytes in the index.

  3. Index digest (32 bytes). The SHA256 digest of the index data.

  4. Reserved digest (32 bytes). Currently set to all 0s; this is to leave space for future support of MAC authentication of binpickle files.
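The index digest lets a reader check the index before decoding it. The sketch below packs and checks only the first three fields, which account for 44 bytes (8 + 4 + 32), and leaves the reserved field out. The names `TRAILER`, `encode_trailer`, and `check_index` are hypothetical, and treating the offset and length as unsigned is an assumption.

```python
import hashlib
import struct

# Assumed layout: unsigned 8-byte offset, unsigned 4-byte length, 32-byte digest,
# all big-endian. The reserved field is not modeled here.
TRAILER = struct.Struct(">QI32s")


def encode_trailer(index_offset, index_bytes):
    """Pack the index position, length, and SHA-256 digest."""
    digest = hashlib.sha256(index_bytes).digest()
    return TRAILER.pack(index_offset, len(index_bytes), digest)


def check_index(trailer_bytes, index_bytes):
    """Verify index_bytes against the trailer and return the index offset."""
    offset, length, digest = TRAILER.unpack(trailer_bytes[:TRAILER.size])
    if length != len(index_bytes):
        raise ValueError("index length mismatch")
    if hashlib.sha256(index_bytes).digest() != digest:
        raise ValueError("index digest mismatch")
    return offset
```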

Parameters:
offset: int#

Position of the start of the file index.

length: int#

Length of the file index.

hash: bytes#

SHA-256 digest of the file index.

reserved: bytes#

Reserved for future MAC of the file contents.

encode()#

Encode the file trailer as bytes.

classmethod decode(buf, *, verify=True)#

Decode a file trailer from bytes.

Parameters:
  • buf (bytes | bytearray | memoryview) – Buffer containing the trailer to decode.

  • verify (bool) – Whether to fail on invalid trailer data.

Return type:

FileTrailer

class binpickle.format.IndexEntry(offset, enc_length, dec_length, hash, info, codecs=<factory>)#

Index entry for a buffer in the BinPickle index.

Parameters:
offset: int#

The position in the file where the buffer begins (bytes).

enc_length: int#

The encoded length of the buffer data in bytes.

dec_length: int#

The decoded length of the buffer in bytes.

hash: bytes#

The SHA-256 checksum of the encoded buffer data.

info: tuple[str, str, tuple[int, ...]] | None#

Type information for the buffer (if available).

codecs: list[dict[str, str | bool | int | float | None]]#

The sequence of codecs used to encode the buffer.

to_repr()#

Convert an index entry to its MsgPack-compatible representation.

classmethod from_repr(repr)#

Convert an index entry from its MsgPack-compatible representation.

Parameters:

repr (dict[str, Any]) –
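An index entry's offset, enc_length, and hash fields are enough to locate an encoded buffer and check its integrity. The following is a minimal sketch using a stand-in dataclass; `SketchEntry` and `read_encoded_buffer` are hypothetical and cover only the three fields needed here, not decoding through the codecs.

```python
import hashlib
import io
from dataclasses import dataclass


@dataclass
class SketchEntry:
    """Stand-in carrying only the fields needed to locate and verify a buffer."""
    offset: int
    enc_length: int
    hash: bytes


def read_encoded_buffer(file, entry):
    """Read the encoded buffer bytes and verify their SHA-256 digest."""
    file.seek(entry.offset)
    data = file.read(entry.enc_length)
    if len(data) != entry.enc_length:
        raise ValueError("truncated buffer")
    if hashlib.sha256(data).digest() != entry.hash:
        raise ValueError("buffer digest mismatch")
    return data


payload = b"buffer-data"
# 16 placeholder bytes stand in for the file header.
file = io.BytesIO(b"\0" * 16 + payload)
entry = SketchEntry(offset=16, enc_length=len(payload),
                    hash=hashlib.sha256(payload).digest())
data = read_encoded_buffer(file, entry)
```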

Format History#

The current file format version is 2, introduced in BinPickle 0.4.0.

Version 2#

Version 2 introduced the following:

  • Replaced Adler32 checksums with SHA-256 digests.

  • Replaced the single codec field with a codecs list field. The new field directly specifies a list of numcodecs codec configurations in the order they were applied to encode the buffer. The old native codecs have been removed; all codecs now come from numcodecs.

  • Added the info field to IndexEntry to store information about the buffer’s data, when available (currently stores NumPy data type and shape when serializing a NumPy array).
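Since the codecs list records the order in which codecs were applied during encoding, a reader has to undo them in reverse. The sketch below illustrates that ordering with standard-library stand-ins rather than numcodecs; the `CODECS` table, the "id" key, and the `encode`/`decode` helpers are all assumptions made for this example.

```python
import base64
import zlib

# Stand-in codecs keyed by a hypothetical "id" field: (encode, decode) pairs.
CODECS = {
    "zlib": (
        lambda data, cfg: zlib.compress(data, cfg.get("level", 6)),
        lambda data, cfg: zlib.decompress(data),
    ),
    "base64": (
        lambda data, cfg: base64.b64encode(data),
        lambda data, cfg: base64.b64decode(data),
    ),
}


def encode(data, codecs):
    """Apply each codec configuration in list order."""
    for cfg in codecs:
        data = CODECS[cfg["id"]][0](data, cfg)
    return data


def decode(data, codecs):
    """Undo the codecs in reverse order to recover the original bytes."""
    for cfg in reversed(codecs):
        data = CODECS[cfg["id"]][1](data, cfg)
    return data


codecs = [{"id": "zlib", "level": 1}, {"id": "base64"}]
raw = b"x" * 1000
encoded = encode(raw, codecs)
```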

Version 1#

Version 1 is the original BinPickle format, used through the 0.3 release series. It is no longer supported.