There <i>is</i> a way to read pickles without running them, but it is Python-only and still requires you to know how pickles work. The `pickletools` module can disassemble a pickle into a listing of its opcodes, much like `dis` does for Python bytecode. Honestly, though, I wouldn't say this invalidates the point about unreadability; it just hammers home exactly how unreadable they really are.
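For anyone who hasn't tried it, here is a minimal sketch of inspecting a pickle without loading it. The payload and variable names are made up for illustration; `pickletools.dis` and `pickletools.genops` are the standard-library calls.

```python
import io
import pickle
import pickletools

# Hypothetical payload, just for illustration.
payload = pickle.dumps({"name": "spam", "count": 3}, protocol=2)

# dis() writes a human-readable opcode listing to `out`;
# it never executes the pickle.
buf = io.StringIO()
pickletools.dis(payload, out=buf)
listing = buf.getvalue()
print(listing)

# genops() yields (opcode, arg, pos) tuples, which is handy for scanning
# a pickle for opcodes such as GLOBAL or REDUCE that import or call things.
opcode_names = [op.name for op, arg, pos in pickletools.genops(payload)]
print(opcode_names)
```

The listing is readable only if you already know what the opcodes mean, which is rather the point being made above.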
To me it's interesting that <i>pickle</i> can be thought of as recording some of the implicit assumptions GvR made about the expected use of Python semantics.<p>Formally serialization/deserialization is very crunchy and precise. (And I remember how stoked I was to find out that Python included an implementation!) In practice, things get messy and we break the implicit assumptions.<p>Is it a flaw of the <i>pickle</i> module? Or are our designs too clever?<p>Patient: "It hurts when I do this."<p>Doctor: "Don't do that."<p>;-)
* Insecure: If you are unpickling insecure code, you have other problems. Deserializers should not be used as a protection against hacking.<p>* Old pickles look like old code: Again, convert your object into json and serialize that to your database. Oh no, you are missing an attribute. Pickle should not be used so you don't have to employ a release engineer.<p>* Implicit: No software works everywhere with defaults. So use copyreg.<p>* Over-serializes: USE copyreg.<p>* __init__ isn’t called: USE COPYREG.<p>* Python only: what's this for, then? <a href="http://www.picklingtools.com/" rel="nofollow">http://www.picklingtools.com/</a><p>* Unreadable: Great feature.<p>* Appears to pickle code: Another great feature.<p>* Slow: check again, it has been 8 years. I can't find any faster method.
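A minimal sketch of what "use copyreg" can look like for the over-serialization and `__init__` points. The `Connection` class and `reduce_connection` function are hypothetical names invented for this example; `copyreg.pickle` is the standard-library registration call.

```python
import copyreg
import pickle


class Connection:
    """Hypothetical class with state that should not be serialized."""

    def __init__(self, host, port=5432):
        self.host = host
        self.port = port
        self.cache = {}  # derived state, rebuilt on demand


def reduce_connection(conn):
    # Return (callable, args). On load, pickle calls Connection(host, port),
    # so __init__ runs and only host/port are stored in the pickle.
    return Connection, (conn.host, conn.port)


# Register the reducer for this type.
copyreg.pickle(Connection, reduce_connection)

original = Connection("db.example", 5433)
original.cache["big"] = list(range(1000))

restored = pickle.loads(pickle.dumps(original))
print(restored.host, restored.port, restored.cache)
```

Because the reducer decides what goes into the pickle, the cache is left out and the restored object starts from a fresh `__init__` call.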
I'm skeptical of the point about over-serialization. In my opinion, throwing an exception on an unserializable attribute is a good default. If an object is using a file, it will more often than not be unusable when deserialized without the file.<p>This is one of the few things Java gets right about its built-in serialization: if you have an object that can't be serialized, anything using that object has to declare it as transient, meaning it won't be serialized or deserialized. Hopefully you'll think about whether the result makes sense before using the keyword.<p>If you don't mark an unserializable field transient, you'll get an exception at runtime. It's not enforced by the compiler, which would be ideal, but linters will warn you.
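For comparison, a rough Python analogue of the behaviour described above, as a sketch rather than a recommendation. The `LogWriter` and `TransientLogWriter` classes are hypothetical. Pickling an object that holds an open file raises `TypeError` by default, and `__getstate__`/`__setstate__` play roughly the role of Java's `transient`.

```python
import os
import pickle
import tempfile


class LogWriter:
    """Hypothetical class that holds an open file handle."""

    def __init__(self, path):
        self.path = path
        self.handle = open(path, "a")


class TransientLogWriter(LogWriter):
    def __getstate__(self):
        state = self.__dict__.copy()
        del state["handle"]  # treat the handle as "transient"
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.handle = open(self.path, "a")  # reopen on load


# Create an empty temporary file to write to.
with tempfile.NamedTemporaryFile(suffix=".log", delete=False) as tmp:
    log_path = tmp.name

# Default behaviour: the open file handle cannot be pickled.
plain = LogWriter(log_path)
try:
    pickle.dumps(plain)
    default_raises = False
except TypeError:
    default_raises = True
finally:
    plain.handle.close()

# Opt-in behaviour: the handle is skipped and reopened on load.
writer = TransientLogWriter(log_path)
restored = pickle.loads(pickle.dumps(writer))
reopened = not restored.handle.closed
restored_path = restored.path

writer.handle.close()
restored.handle.close()
os.remove(log_path)
```

As with `transient`, whether reopening the file on load actually makes sense is something the author of the class has to decide.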
Hawking my own (incomplete) contribution to Pickle security/analysis <a href="https://github.com/moreati/pickle-fuzz#rehabilitating-pythons-pickle-module" rel="nofollow">https://github.com/moreati/pickle-fuzz#rehabilitating-python...</a>
This seems to seriously misunderstand the point of pickle. It's not for data interchange. It's for e.g. caching objects or debugging. That's it.<p>The fact it keeps "old code" is a feature. The object is exactly as it was at the time it was saved.
I think these flaws are fairly minor; at the least, you seem to be nudged towards use cases where you're not overly reliant on pickle for complex work.<p>If readability is an issue there's a JSON version that's quite useful.<p>Other than that, most of the other concerns are addressable. If security matters, perhaps use an encryption lib around the pickle rather than asking for it to be built in? As for speed, you're already using Python, and chances are you're not constantly pickling and unpickling?
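A sketch of the "wrap the pickle" idea. Note that this swaps encryption for signing: it uses the standard-library `hmac` module to authenticate the bytes before unpickling, which addresses tampering but not confidentiality. The key and the `dumps_signed`/`loads_signed` helpers are hypothetical names for this example.

```python
import hashlib
import hmac
import pickle

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical key
TAG_SIZE = hashlib.sha256().digest_size  # 32 bytes


def dumps_signed(obj, key=SECRET_KEY):
    """Pickle obj and prepend an HMAC-SHA256 tag over the payload."""
    payload = pickle.dumps(obj)
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def loads_signed(blob, key=SECRET_KEY):
    """Verify the tag first; only unpickle if it matches."""
    tag, payload = blob[:TAG_SIZE], blob[TAG_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch; refusing to unpickle")
    return pickle.loads(payload)


blob = dumps_signed({"user": "alice", "visits": 3})
print(loads_signed(blob))
```

This only helps if the key stays secret and the data was produced by code you trust; anyone holding the key can still craft a malicious pickle.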
I am surprised when people use pickle NOT as a last resort.<p>For numeric data, H5 is nice. For configs, JSON is pretty much a standard. For Python code... well, nothing beats Python code.
Pickle's greatest flaw is the complete lack of forward and backward compatibility. Compatibility is not guaranteed when upgrading any of the dependencies, so dependencies would have to stay the same across releases, which halts forward progress in development.
Data serialization is hard and the artifacts are much longer lived than executable code and even our API interfaces. YAML, JSON, XML are all flawed. There are many competing binary serialization frameworks. Beware. Dar be Dragons in Durable Data.
I find it interesting that everyone so far has suggested JSON as a pickle alternative. Depending on why you are serializing and deserializing the data, a lot of times the true replacement for pickle is a full-fledged database.