Pickle’s Nine Flaws

64 points by gilad almost 5 years ago

13 comments

Kednicma almost 5 years ago
There *is* a way to read pickles without running them, but it is Python-only and still requires one to know how pickles work. The module `pickletools` can be used to disassemble pickles to bytecode, just like `dis` for normal Python objects. Honestly, though, I wouldn't say that this invalidates the point about unreadability, but just hammers in exactly how unreadable they really are.
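A minimal sketch of the disassembly described above. The sample dictionary is a made-up payload, not something from the article; `pickletools.dis` only decodes the opcode stream and never executes it.

```python
import io
import pickle
import pickletools

# Hypothetical sample payload; any picklable object would do.
payload = pickle.dumps({"name": "example", "values": [1, 2, 3]}, protocol=2)

# pickletools.dis writes an opcode listing to `out` without running the pickle.
buffer = io.StringIO()
pickletools.dis(payload, out=buffer)
listing = buffer.getvalue()
print(listing)
```

The listing is a sequence of stack-machine opcodes (PROTO, MARK, STOP and so on), which supports the point that reading a pickle still requires knowing how pickles work.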
carapace almost 5 years ago
To me it's interesting that *pickle* can be thought of as recording some of the implicit assumptions GvR made about the expected use of Python semantics.

Formally serialization/deserialization is very crunchy and precise. (And I remember how stoked I was to find out that Python included an implementation!) In practice, things get messy and we break the implicit assumptions.

Is it a flaw of the *pickle* module? Or are our designs too clever?

Patient: "It hurts when I do this."

Doctor: "Don't do that."

;-)
nurettin almost 5 years ago
* Insecure: If you are unpickling insecure code, you have other problems. Deserializers should not be used as a protection against hacking.
* Old pickles look like old code: Again, convert your object into JSON and serialize that to your database. Oh no, you are missing an attribute. Pickle should not be used so you don't have to employ a release engineer.
* Implicit: No software works everywhere with defaults. So use copyreg.
* Over-serializes: USE copyreg.
* __init__ isn't called: USE COPYREG.
* Python only: what's this for, then? http://www.picklingtools.com/
* Unreadable: Great feature.
* Appears to pickle code: Another great feature.
* Slow: check again, it has been 8 years. I can't find any faster method.
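For readers who have not used `copyreg`, here is a rough sketch of how it can address the "over-serializes" and "`__init__` isn't called" points in the list above. The `Connection` class, its fields, and `reduce_connection` are hypothetical names invented for illustration.

```python
import copyreg
import pickle


class Connection:
    """Hypothetical class with state that should not end up in the pickle."""

    init_calls = 0

    def __init__(self, host, port):
        Connection.init_calls += 1
        self.host = host
        self.port = port
        self.cache = {}  # transient state, rebuilt by __init__


def reduce_connection(conn):
    # Return (callable, args): unpickling calls Connection(host, port),
    # so __init__ runs again and `cache` is never written to the pickle.
    return Connection, (conn.host, conn.port)


# Register the reducer; pickle consults this table before the default logic.
copyreg.pickle(Connection, reduce_connection)

original = Connection("example.org", 8080)
original.cache["big"] = "x" * 10_000

data = pickle.dumps(original)
restored = pickle.loads(data)
print(restored.host, restored.port, restored.cache, len(data))
```

The same effect can be had by defining `__reduce__` on the class; `copyreg` is mainly useful when the class cannot or should not be modified.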
hyperpape almost 5 years ago
I'm skeptical of the point about over-serialization. In my opinion, throwing an exception on an unserializable attribute is a good default. If an object is using a file, it will more often than not be unusable when deserialized without the file.

This is one of the few things Java gets right about its built-in serialization: if you have an object that can't be serialized, anything using that object has to declare it as transient, meaning it won't be serialized or deserialized. Hopefully you'll think about whether the result makes sense before using the keyword.

If you don't mark an unserializable field transient, you'll get an exception at runtime. It's not enforced by the compiler, which would be ideal, but linters will warn you.
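For comparison, a small sketch of the equivalent behavior in Python: pickling an object that holds an open file raises `TypeError`, and `__getstate__` serves as an explicit opt-out, loosely analogous to Java's `transient`. The `LogWriter` classes are hypothetical and exist only for this illustration.

```python
import os
import pickle
import tempfile


class LogWriter:
    """Hypothetical class holding an open file handle."""

    def __init__(self, path):
        self.path = path
        self.handle = open(path, "a")


class TransientLogWriter(LogWriter):
    def __getstate__(self):
        # Explicitly drop the handle, similar in spirit to `transient`.
        state = self.__dict__.copy()
        del state["handle"]
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.handle = open(self.path, "a")  # reopen the file on load


fd, path = tempfile.mkstemp()
os.close(fd)

plain = LogWriter(path)
try:
    pickle.dumps(plain)
    raised = False
except TypeError:
    raised = True  # file objects refuse to be pickled

writer = TransientLogWriter(path)
clone = pickle.loads(pickle.dumps(writer))
same_path = clone.path == writer.path

# Clean up the handles and the temporary file.
for obj in (plain, writer, clone):
    obj.handle.close()
os.remove(path)
print(raised, same_path)
```

Whether reopening the file in `__setstate__` makes sense depends on the object; the point is only that the decision is written down rather than implicit.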
moreati almost 5 years ago
Hawking my own (incomplete) contribution to Pickle security/analysis: https://github.com/moreati/pickle-fuzz#rehabilitating-pythons-pickle-module
ChrisSD almost 5 years ago
This seems to seriously misunderstand the point of pickle. It's not for data interchange. It's for e.g. caching objects or debugging. That's it.

The fact it keeps "old code" is a feature. The object is exactly as it was at the time it was saved.
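To make concrete what "old code" means in this discussion, here is a small sketch, assuming a hypothetical `Point` class that gains an attribute between saving and loading. The pickle stores the instance's attributes and a reference to the class by name, not the class body itself.

```python
import pickle


class Point:  # "old" version of a hypothetical class
    def __init__(self, x, y):
        self.x, self.y = x, y


old_bytes = pickle.dumps(Point(1, 2))


class Point:  # "new" version: same name, one extra attribute
    def __init__(self, x, y, z=0):
        self.x, self.y, self.z = x, y, z


# Loading resolves the name `Point` to the current class and restores the
# saved attributes without calling __init__, so `z` is never set.
loaded = pickle.loads(old_bytes)
is_new_class = isinstance(loaded, Point)
has_z = hasattr(loaded, "z")
print(is_new_class, has_z, loaded.x, loaded.y)
```

The saved data is exactly as it was, while the behavior comes from whatever class definition is importable at load time. Whether that is a feature or a flaw is the disagreement in this thread.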
lordnacho almost 5 years ago
I think these flaws are fairly minor; at least you seem to be nudged towards use cases where you're not overly reliant on pickle for complex work.

If readability is an issue there's a JSON version that's quite useful.

Other than that, most of the other concerns are addressable. If security matters, perhaps use an encryption lib around the pickle, rather than ask for it to be built into it? As for speed, you're already using Python and chances are you're not constantly pickling and unpickling?
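One way to read the "lib around the pickle" suggestion is to authenticate the bytes before loading them. The sketch below uses an HMAC signature from the standard library rather than encryption, which is a substitution on my part; `SECRET_KEY`, `sign_dumps`, and `verified_loads` are hypothetical names. It only helps when whoever produced the pickle is trusted and the key stays secret.

```python
import hashlib
import hmac
import pickle

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical key for the sketch
TAG_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def sign_dumps(obj, key=SECRET_KEY):
    """Pickle `obj` and prepend an HMAC-SHA256 tag over the pickle bytes."""
    payload = pickle.dumps(obj)
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def verified_loads(blob, key=SECRET_KEY):
    """Check the tag first; unpickle only if it matches."""
    tag, payload = blob[:TAG_SIZE], blob[TAG_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch; refusing to unpickle")
    return pickle.loads(payload)


blob = sign_dumps({"user": "alice", "roles": ["admin"]})
print(verified_loads(blob))
```

A blob that was modified in transit, or produced without the key, fails the check and is never passed to `pickle.loads`.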
stared almost 5 years ago
I am surprised when people use pickle NOT as a last resort.

For numeric data, H5 is nice. For configs, JSON is pretty much a standard. For Python code... well, nothing beats Python code.
edejong almost 5 years ago
Pickle's greatest flaw is the complete lack of forward and backward compatibility. Compatibility is not guaranteed when upgrading any of the dependencies. Dependencies would have to stay the same across releases, halting forward progress in the development process.
sradman almost 5 years ago
Data serialization is hard and the artifacts are much longer lived than executable code and even our API interfaces. YAML, JSON, XML are all flawed. There are many competing binary serialization frameworks. Beware. Dar be Dragons in Durable Data.
cmwelsh almost 5 years ago
I find it interesting that everyone so far has suggested JSON as a pickle alternative. Depending on why you are serializing and deserializing the data, a lot of times the true replacement for pickle is a full-fledged database.
varbhat almost 5 years ago
It must only be used for small programs and it serves the purpose well.
forgotmypw17 almost 5 years ago
Similar to PHP's serialize() and unserialize().