Format

The binpickle.format module contains the data structures that define the BinPickle format.

Most users will not need to use these classes directly. They are described here to document the file format.

File Structure

BinPickle uses the out-of-band buffer support of pickle Protocol 5, and therefore stores the pickled object in two parts:

  1. The contents of the out-of-band buffers.

  2. The Protocol 5 pickled bytes.

The pickle bytes are themselves stored as one more buffer, so pickling an object with n out-of-band buffers stores n+1 buffers in the file; the last one contains the pickle bytes.
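The two-part split can be illustrated with the standard library alone. The following is a minimal sketch, not BinPickle's own code: it wraps a bytearray in pickle.PickleBuffer so that Protocol 5 hands it off out-of-band, then assembles the n+1 buffers in the order described above. The names payload, oob_buffers, and stored are illustrative only.

```python
import pickle

# A payload wrapped in PickleBuffer so Protocol 5 can serialize it out-of-band.
payload = bytearray(b"\x01\x02\x03\x04" * 256)
obj = {"name": "example", "data": pickle.PickleBuffer(payload)}

# The callback receives each out-of-band buffer; list.append returns None,
# which tells pickle to keep the buffer out of the pickle stream.
oob_buffers = []
pickle_bytes = pickle.dumps(obj, protocol=5, buffer_callback=oob_buffers.append)

# n out-of-band buffers plus the pickle bytes give n + 1 stored buffers,
# with the pickle bytes last.
stored = [bytes(buf.raw()) for buf in oob_buffers] + [pickle_bytes]

# Reading reverses the split: the last buffer is the pickle,
# and the others are supplied through the `buffers` argument.
restored = pickle.loads(stored[-1], buffers=stored[:-1])
```

Here one out-of-band buffer yields two stored buffers, and the restored dictionary refers to the supplied buffer rather than to a copy embedded in the pickle stream.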

The BinPickle format is inspired by Zip, with an index at the end of the file that tells the reader where in the file to find the various contents.

A Version 1 BinPickle file is organized as follows:

  1. 16-byte header, beginning with magic bytes BPCK (see FileHeader).

  2. The out-of-band buffers, in order. Padding may appear before or after any buffer’s contents.

  3. The pickle bytes, as a buffer.

  4. The file index, stored as a list of IndexEntry objects encoded in MsgPack.

  5. 16-byte trailer (see FileTrailer).

The position and length of each buffer are stored in the index, so buffers can have arbitrary padding between them. Technically they could even be out of order, but such a file should not be generated. Uncompressed BinPickle files intended for memory-mapped use align each buffer to the operating system page size (from mmap.PAGESIZE).
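As a rough illustration of page alignment, the padding needed before a buffer can be computed from the current file position. This is a sketch under assumptions: padding_for is a hypothetical helper, not part of binpickle, and the real writer may compute its padding differently.

```python
import mmap


def padding_for(position: int, alignment: int = mmap.PAGESIZE) -> int:
    """Return the padding bytes needed to move `position` up to a multiple of `alignment`.

    Hypothetical helper for illustration; not part of binpickle.
    """
    return -position % alignment


pos = 16                 # first byte after the 16-byte header
pad = padding_for(pos)   # bytes of padding to write before the buffer
aligned = pos + pad      # offset that would be recorded in the index
```

Because the index records each buffer's offset explicitly, a reader never needs to recompute this padding; it only has to seek to the stored position.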

Classes

class binpickle.format.FileHeader

File header for a BinPickle file. The header is a 16-byte sequence containing the magic (BPCK) followed by version and offset information:

  1. File version (2 bytes, big-endian). Currently only version 1 exists.

  2. Reserved (2 bytes). Set to 0.

  3. File length (8 bytes, big-endian). Length is signed; if the file length is not known, this field is set to -1.
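The field list above maps naturally onto a struct format string. The sketch below is an assumption-laden illustration rather than the library's implementation: HEADER_FORMAT, encode_header, and decode_header are hypothetical names, and the version and reserved fields are assumed to be unsigned.

```python
import struct

# Assumed layout: 4-byte magic, 16-bit version, 16-bit reserved,
# signed 64-bit file length, all big-endian.
HEADER_FORMAT = ">4sHHq"
MAGIC = b"BPCK"


def encode_header(version: int = 1, length: int = -1) -> bytes:
    """Pack a header; a length of -1 marks the file length as unknown."""
    return struct.pack(HEADER_FORMAT, MAGIC, version, 0, length)


def decode_header(buf: bytes):
    """Unpack a 16-byte header and check the magic bytes."""
    magic, version, _reserved, length = struct.unpack(HEADER_FORMAT, buf)
    if magic != MAGIC:
        raise ValueError("not a BinPickle header")
    return version, length


raw = encode_header(length=4096)
version, length = decode_header(raw)
```

The format string accounts for all 16 bytes: 4 for the magic, 2 for the version, 2 reserved, and 8 for the length.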

property version

The BinPickle file version.

property length

The length of the file (-1 for unknown).

encode()

Encode the file header as bytes.

classmethod decode(buf, *, verify=True)

Decode a file header from bytes.

trailer_pos()

Get the position of the start of the file trailer.

class binpickle.format.FileTrailer

File trailer for a BinPickle file. The trailer is a 16-byte sequence that tells the reader where to find the file index. It consists of the following fields:

  1. Index start (8 bytes, big-endian). Measured in bytes from the start of the file.

  2. Index length (4 bytes, big-endian). The number of bytes in the index.

  3. Index checksum (4 bytes, big-endian). The Adler-32 checksum of the index data.
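A reader can locate and validate the index from the trailer alone. The following sketch assumes all three fields are unsigned and treats the index as opaque bytes (MsgPack decoding is left out); TRAILER_FORMAT, encode_trailer, and read_index are hypothetical names, not binpickle functions.

```python
import struct
import zlib

# Assumed layout: unsigned 64-bit index start, unsigned 32-bit index length,
# unsigned 32-bit Adler-32 checksum, all big-endian.
TRAILER_FORMAT = ">QLL"


def encode_trailer(offset: int, index_bytes: bytes) -> bytes:
    """Pack a trailer describing an index stored at `offset`."""
    return struct.pack(TRAILER_FORMAT, offset, len(index_bytes), zlib.adler32(index_bytes))


def read_index(data: bytes) -> bytes:
    """Read the trailer from the end of `data` and return the verified index bytes."""
    size = struct.calcsize(TRAILER_FORMAT)
    offset, length, checksum = struct.unpack(TRAILER_FORMAT, data[-size:])
    index_bytes = data[offset:offset + length]
    if zlib.adler32(index_bytes) != checksum:
        raise ValueError("index checksum mismatch")
    return index_bytes


body = b"\x00" * 16 + b"buffer data"   # stand-in for the header and buffers
index = b"opaque index bytes"          # stand-in for the MsgPack-encoded index
blob = body + index + encode_trailer(len(body), index)
recovered = read_index(blob)
```

Reading from the end of the file first is what lets a writer stream the buffers out before it knows where the index will land.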

property offset

The position of the start of the index, in bytes from the start of the file.

property length

The length of the index in bytes.

property checksum

The Adler-32 checksum of the index data.

encode()

Encode the file trailer as bytes.

classmethod decode(buf, *, verify=True)

Decode a file trailer from bytes.

class binpickle.format.IndexEntry

Index entry for a buffer in the BinPickle index.

property offset

The position in the file where the buffer begins (bytes).

property enc_length

The encoded length of the buffer data in bytes.

property dec_length

The decoded length of the buffer in bytes.

property checksum

The Adler-32 checksum of the encoded buffer data.

property codec

The codec used to encode the buffer, or None.

to_repr()

Convert an index entry to its MsgPack-compatible representation.

classmethod from_repr(repr)

Convert an index entry from its MsgPack-compatible representation.
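To show how the documented properties fit together when reading, the sketch below fetches one buffer and verifies it against its entry. Entry is a hypothetical stand-in that mirrors the property names above; it is not the real IndexEntry class, the type of codec is an assumption, and decoding of encoded buffers is deliberately left out.

```python
import zlib
from typing import NamedTuple


class Entry(NamedTuple):
    """Hypothetical stand-in mirroring the documented IndexEntry properties."""
    offset: int
    enc_length: int
    dec_length: int
    checksum: int
    codec: object = None


def read_buffer(data: bytes, entry: Entry) -> bytes:
    """Slice one buffer out of `data` and verify its length and checksum."""
    encoded = data[entry.offset:entry.offset + entry.enc_length]
    if len(encoded) != entry.enc_length:
        raise ValueError("buffer is truncated")
    if zlib.adler32(encoded) != entry.checksum:
        raise ValueError("buffer checksum mismatch")
    if entry.codec is not None:
        raise NotImplementedError("decoding is outside the scope of this sketch")
    return encoded


payload = b"hello buffer"
data = b"\x00" * 16 + payload   # 16 bytes standing in for the header
entry = Entry(
    offset=16,
    enc_length=len(payload),
    dec_length=len(payload),
    checksum=zlib.adler32(payload),
)
buffer = read_buffer(data, entry)
```

For an unencoded buffer the encoded and decoded lengths coincide, which is the case this sketch handles.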