BinPickle#

The BinPickle library provides an optimized file format for serializing Python objects in a scientific computing setting. It uses Pickle Protocol 5 (with the pickle5 library on older versions of Python) to efficiently serialize objects with large binary data blobs such as NumPy arrays; one of the primary use cases for BinPickle is efficiently serializing scikit-style statistical and machine learning models.

BinPickle supports a few useful features on top of standard pickling:

  • Optional per-buffer compression

  • Memory-mapped buffers (when uncompressed) for efficiently sharing

BinPickle wraps Python’s pickling functionality, so any object that can be pickled (including SciKit models) can be stored with BinPickle. If the object supports Pickle Protocol 5 (or stores most of its data in NumPy arrays, which in recent versions support Pickle 5), then large array data will be efficiently stored, either compressed (using any compressor supported by numcodecs) or page-aligned and ready for memory-mapping, possibly into multiple processes simultaneously.

Quick Start#

Save an object:

from binpickle import dump, load
dump(my_large_object, 'file.bpk')

Load an object:

model = load('file.bpk')

Contents#

Inspiriation#

BinPickle is inspired in part by joblib’s dump and load routines that support memory-mapping numpy buffers. By building on top of Pickle Protocol 5, we are able to obtain the same functionality without hacking the pickle serialization protocol.

Acknowledgements#

This material is based upon work supported by the National Science Foundation under Grant No. IIS 17-51278. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. This page has not been approved by Boise State University and does not reflect official university positions.