DEV Community

Ali Sherief

Python Pickling at Lightspeed ⚡

The lightspeed returns. ⚡

Today I'm going to cover pickling in Python. Very briefly, pickling is the act of serializing Python objects into a stream of bytes, which can then be sent to another Python program or saved to disk and read back later, for example by the same program after it is stopped and restarted.

There are 6 different pickling formats (protocols) in Python, each newer than the last: versions 0, 1, 2, 3, 4 and 5. This post will only cover the latest format, version 5, which was added in Python 3.8. In particular, Python 2 does not support this format.

There is also a simpler serialization module called marshal, but it should not be used for this: its format is not guaranteed to be portable across Python versions, since it mainly exists to support .pyc files.

Heads Up: the pickle module is not secure if it's used by itself. A pickle is basically a sequence of opcodes for a small virtual machine, so it is possible to craft a malicious pickled object that runs arbitrary code when it is loaded. Never unpickle data from a source you don't trust, and sign your pickled data with the hmac Python module so you can check that nobody tampered with it (roughly the difference between HTTP and HTTPS, to give you an analogy).

hmac and SSL in general will be saved for a future post and in this one I will only cover pickle.
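Just so the signing advice isn't left completely abstract, here is a rough preview of the idea, not a full treatment. The key, the helper names sign_and_pickle and verify_and_unpickle, and the "tag first, data after" layout are all assumptions made up for this sketch.

```python
import hashlib
import hmac
import pickle

SECRET_KEY = b"replace-me-with-a-real-secret"  # hypothetical key shared by both sides
DIGEST_SIZE = hashlib.sha256().digest_size     # length of an HMAC-SHA256 tag in bytes


def sign_and_pickle(obj):
    """Pickle obj and prepend an HMAC-SHA256 tag of the pickled bytes."""
    data = pickle.dumps(obj)
    tag = hmac.new(SECRET_KEY, data, hashlib.sha256).digest()
    return tag + data


def verify_and_unpickle(blob):
    """Check the tag first and only unpickle if it matches."""
    tag, data = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(SECRET_KEY, data, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch, refusing to unpickle")
    return pickle.loads(data)


blob = sign_and_pickle({"some": "dictionary"})
restored = verify_and_unpickle(blob)

# Flip the last byte to simulate tampering in transit
tampered = blob[:-1] + bytes([blob[-1] ^ 0xFF])
try:
    verify_and_unpickle(tampered)
    tamper_detected = False
except ValueError:
    tamper_detected = True
```

The important part is the order: the signature is checked before pickle.loads() ever sees the data.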

Why not just use JSON?

Hey, JSON is a great data format to use... for data. It can't help you if you're trying to send over a function or class because that's not what it's designed to do. The pickle module is designed to handle almost every single Python object in the language (functions and classes are pickled by name, so the receiving side must be able to import them). So while you can serialize lists, dictionaries (also called maps and hash tables), strings and numbers in JSON, and read the resulting file yourself since JSON is human-readable, that's all it can do.
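To make the difference concrete, here is a small sketch. The dictionary is made up for illustration; it holds a set and a complex number, two built-in types that JSON has no representation for.

```python
import json
import pickle

# A set and a complex number: fine for pickle, not for JSON
data = {"tags": {"a", "b"}, "point": complex(1, 2)}

try:
    json.dumps(data)
    json_ok = True
except TypeError:
    json_ok = False  # json refuses types it cannot represent

# pickle round-trips the same object without complaint
restored = pickle.loads(pickle.dumps(data))
```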

Dump

To serialize an object we call pickle.dump(obj, filehandle, protocol=None). This function has some other arguments you don't need to know. It dumps the object into a file handle (the file must already be open in binary write mode) - think of it as a time capsule. The protocol=None argument means pickle chooses the protocol itself (it uses pickle.DEFAULT_PROTOCOL, which is not necessarily the latest one), but you can set the protocol number yourself in this argument, for example protocol=5 or protocol=pickle.HIGHEST_PROTOCOL.

When using dump() make sure you pass an opened file handle. Don't give it the file name or it won't work.
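Here is a minimal sketch of both cases. The file name capsule.pickle and the temporary directory are just placeholders for this example.

```python
import os
import pickle
import tempfile

# Hypothetical file name inside a throwaway directory
path = os.path.join(tempfile.mkdtemp(), "capsule.pickle")

# Correct: pass an open file handle in binary write mode
with open(path, "wb") as f:
    pickle.dump({"some": "dictionary"}, f, protocol=pickle.HIGHEST_PROTOCOL)

# Incorrect: passing the file name instead of a handle
try:
    pickle.dump({"some": "dictionary"}, path)
    name_worked = True
except TypeError:
    name_worked = False  # a str has no write() method
```

Using with open(...) also closes the file for you, even if dump() raises.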

You want the actual bytes of the serialized object instead of dumping it into a file? You should use pickle.dumps(obj) instead. It returns the serialized object as a bytes object.
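A quick sketch of what comes back, assuming Python 3.8 or newer so that protocol 5 is available:

```python
import pickle

blob = pickle.dumps([1, 2, 3], protocol=5)

is_bytes = isinstance(blob, bytes)
# Pickles from protocol 2 onwards start with the PROTO opcode (0x80)
# followed by the protocol number
header = blob[:2]
```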

Load

To load an object from a file, we call pickle.load(filehandle), which returns the Python object that was serialized. Note that the pickle doesn't tell you up front what type it contains, so if you need to know, check after loading with something like obj = pickle.load(filehandle); type(obj).

Similarly, one could also load an object from the serialized bytes instead of a file. This is accomplished by calling pickle.loads(bytes_object).
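Both loading routes, sketched with an in-memory io.BytesIO standing in for a real file so the example doesn't touch the disk:

```python
import io
import pickle

buffer = io.BytesIO()          # stand-in for a file opened with 'wb'
pickle.dump(("a", 1), buffer)
buffer.seek(0)                 # rewind, as if the file was reopened with 'rb'

obj = pickle.load(buffer)      # load from a file-like object
kind = type(obj)               # find out what was stored

same = pickle.loads(pickle.dumps(obj))  # load straight from bytes
```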

Catching Errors

If for some reason your pickled data gets corrupted (or as I like to call it, spoiled), then load() and loads() will raise an UnpicklingError. This is an exception you can catch for unpickling failures, although depending on how the data is damaged other exceptions such as EOFError can show up too. Note that pickled objects don't actually expire. They get messed up if, among other things, you only copy part of the serialized data. Off the top of my head, incomplete downloads of the pickled data can do that too.

Also, dump() and dumps() will raise a PicklingError if the object can't be pickled.
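Here is a hedged sketch of catching both kinds of failure. The except clauses are deliberately wider than just the two pickle exceptions, on the assumption that the exact exception can vary with the Python version and with where the damage is.

```python
import pickle

good = pickle.dumps({"some": "dictionary"})
spoiled = good[: len(good) // 2]   # keep only the first half of the data

try:
    pickle.loads(spoiled)
    load_error = None
except (pickle.UnpicklingError, EOFError) as exc:
    load_error = type(exc).__name__

try:
    pickle.dumps(lambda x: x)      # lambdas cannot be pickled
    dump_error = None
except (pickle.PicklingError, AttributeError, TypeError) as exc:
    dump_error = type(exc).__name__
```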

Copied from the Python documentation (Don't worry if you don't understand some of these types, as long as the type you use can be pickled):

What can be pickled and unpickled?

The following types can be pickled:

  • None, True, and False

  • integers, floating point numbers, complex numbers

  • strings, bytes, bytearrays

  • tuples, lists, sets, and dictionaries containing only picklable objects

  • functions defined at the top level of a module (using def, not lambda)

  • built-in functions defined at the top level of a module

  • classes that are defined at the top level of a module

  • instances of such classes whose __dict__ or the result of calling __getstate__() is picklable (see section Pickling Class Instances for details).
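The last item in the list is the least obvious one, so here is a small sketch. Counter is a hypothetical class invented for this example; it holds a lock, which can't be pickled, and uses __getstate__() and __setstate__() to leave it out and rebuild it.

```python
import pickle
import threading


class Counter:
    """Hypothetical class, defined at the top level so pickle can find it by name."""

    def __init__(self, start):
        self.value = start
        self.lock = threading.Lock()   # locks cannot be pickled

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["lock"]              # leave the unpicklable part out
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.lock = threading.Lock()   # recreate it when loading


restored = pickle.loads(pickle.dumps(Counter(41)))
```

Without the two methods, pickle would try to serialize the whole __dict__, lock included, and fail.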

Do note that when a PicklingError is raised, the object may already have been partially written to the file! This can happen with dump() (but not dumps(), since no files are involved). So I advise you to always pickle your object with dumps(), take the resulting serialized data, open the file with open(file, 'wb') and write the serialized data into the file object that you got.
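That advice can be wrapped in a tiny helper. safe_dump and the file names are made up for this sketch, and the wide except clause is an assumption to cover whichever exception an unpicklable object ends up raising.

```python
import os
import pickle
import tempfile


def safe_dump(obj, path):
    """Serialize fully in memory first, then write the bytes out."""
    data = pickle.dumps(obj)       # fails here, before any file is opened
    with open(path, "wb") as f:
        f.write(data)


path = os.path.join(tempfile.mkdtemp(), "data.pickle")  # hypothetical file name
safe_dump({"some": "dictionary"}, path)

bad_path = path + ".bad"
try:
    safe_dump(lambda: None, bad_path)   # a lambda cannot be pickled
except (pickle.PicklingError, AttributeError, TypeError):
    pass

file_created = os.path.exists(bad_path)  # no half-written file left behind

with open(path, "rb") as f:
    restored = pickle.load(f)
```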

Want to catch both of these exceptions at the same time? Use pickle.PickleError instead.
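A short sketch of leaning on the shared base class. load_or_default is a hypothetical helper; it also catches EOFError, on the assumption that truncated data may raise that instead of a pickle exception.

```python
import pickle

# Both specific exceptions inherit from pickle.PickleError
both_covered = (
    issubclass(pickle.PicklingError, pickle.PickleError)
    and issubclass(pickle.UnpicklingError, pickle.PickleError)
)


def load_or_default(blob, default=None):
    """Return the unpickled object, or default if the data is spoiled."""
    try:
        return pickle.loads(blob)
    except (pickle.PickleError, EOFError):
        return default


good = pickle.dumps({"some": "dictionary"})
ok = load_or_default(good)
fallback = load_or_default(good[: len(good) // 2], default="fallback")
```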

Serializing multiple objects in a file

Python has a ready-made class for writing objects into a file one at a time, pickle.Pickler(filehandle, protocol=None). There is also a class for reading one object at a time from a file, pickle.Unpickler(filehandle); it takes no protocol argument because the protocol is detected from the data. Calling these classes gives you a Pickler and an Unpickler object respectively.

An example might help clear things up:

>>> import pickle
>>> writefile = open('somefile', 'wb')
>>> p = pickle.Pickler(writefile)
>>> p.dump([1, 2, 3])
>>> p.dump('string')
>>> p.dump(None)
>>> writefile.close()
>>> readfile = open('somefile', 'rb')
>>> u = pickle.Unpickler(readfile)
>>> u.load()
[1, 2, 3]
>>> u.load()
'string'
>>> u.load() # This loads None but the result is not shown.
>>> readfile.seek(0, 0) # Rewinds to beginning of file
0
>>> u.load()
[1, 2, 3]
>>> u.load()
'string'
>>> u.load() # This loads None again.
>>> u.load()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
EOFError: Ran out of input

These classes also have various advanced parameters which won't be discussed here. You probably don't need to know them either, as they are mainly useful for the Python maintainers.
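In the session above the last load() ends in an EOFError, and that is a handy signal when you don't know how many objects a file holds. A sketch of that idea follows; load_all is a hypothetical helper, and io.BytesIO stands in for a real file.

```python
import io
import pickle


def load_all(filehandle):
    """Yield every pickled object in the file until the data runs out."""
    unpickler = pickle.Unpickler(filehandle)
    while True:
        try:
            yield unpickler.load()
        except EOFError:
            return                 # nothing left to read


buffer = io.BytesIO()              # stand-in for a file opened with 'wb'
pickler = pickle.Pickler(buffer)
for item in ([1, 2, 3], "string", None):
    pickler.dump(item)

buffer.seek(0)                     # rewind before reading
items = list(load_all(buffer))
```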

TL;DR

Was this pickle post confusing? Here is how you get started with it at lightspeed:

>>> import pickle
>>> f = open('somefile', 'wb')
>>> pickle.dump({'some': 'dictionary'}, f)
>>> f.close()
>>> f = open('somefile', 'rb')
>>> pickle.load(f)
{'some': 'dictionary'}

If you're going to send the object across a network connection, try this instead:

>>> import pickle
>>> obj = pickle.dumps({'some': 'dictionary'})
>>> # Send obj somewhere
>>> # ...
>>> # Some other python instance which received obj
>>> import pickle
>>> pickle.loads(obj)
{'some': 'dictionary'}


And we're done

If you see any glaring mistakes in this post, be sure to notify me so I can fix them.
