The article doesn't spell out its point, which I believe is: type your dicts if you want to give your downstream strict guarantees about data shape.<p>If you know precisely what the data is used for, great, go ahead: the type system is your friend.<p>If you don't know how the data should be used, it's often a different story. Wrapping data in hand-typed classes is a terrible idea in typical data engineering scenarios, where there might be hundreds of these API endpoints, which might also change whenever the upstream sees fit. A perfect way to piss off your downstream users is to keep telling them "sorry, the data is not available because I overspecified the data type and now it failed on a TypeError again". Usually the downstream is the domain expert: they know which fields should be used, but they don't know which ones until they start using the data. Typically the best way is to pass ALL the upstream data down, materialize extra fields and NOT modify any existing field names, even when you think you're super smart and know better than the domain experts. Too often a "smart" engineer thought he knew better and included only some fields, only for it to be realized later that the data source contained many more gold nuggets, and that dropping them was never documented.
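To make the "pass everything down, only add fields" idea concrete, here is a minimal sketch. The key names (`price`, `qty`, `undocumented_field`) and the `_derived_` prefix are made up for illustration and don't come from any real API.

```python
from typing import Any, Dict, Mapping


def enrich(record: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a copy of the upstream record with derived fields added.

    Every upstream key is kept under its original name. Derived values go
    under a prefixed key so they cannot shadow an upstream field.
    """
    out = dict(record)  # shallow copy: all upstream fields pass through
    # Hypothetical derived field, computed only when its inputs are present.
    if "price" in record and "qty" in record:
        out["_derived_total"] = record["price"] * record["qty"]
    return out


upstream = {"price": 2.5, "qty": 4, "undocumented_field": "gold nugget"}
row = enrich(upstream)
```

The function never raises on unknown or missing keys, so an upstream schema change degrades to "one derived field is absent" instead of "no data today".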
Python's strapped-on type annotations have been designed around traditional OOP, and they feel like a bad fit for the language. Duck typing is a tremendously powerful form of polymorphism, and none of the PEPs for type annotations do a great job of supporting it. Protocols don't work well with dataclasses and not at all with dicts. TypedDicts could have been perfect, but they explicitly disallow extra keys. Why even use a TypedDict instead of a dataclass? Why make yet another traditional OOP abstraction that was already well served by multiple other features of the language? Even more frustratingly, TypedDicts show that it could have been done. They just decided to break it on purpose.<p>TFA accidentally even brings up the reason why dicts are so powerful: they enable easy interoperability between libraries (like a wire format). Using two libraries together that each insist on their own bespoke class hierarchy is an exercise in data conversion pain. Further, if I want a point to be an object containing fields for "x" and "y", I'd much rather just use a dict than construct an object in some incompatible inheritance nightmare.
Interesting how Clojure takes the complete opposite approach by simply making dicts immutable.<p><a href="https://chasemerick.files.wordpress.com/2011/07/choosingtypeforms2.png" rel="nofollow">https://chasemerick.files.wordpress.com/2011/07/choosingtype...</a>
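For comparison, the closest thing in the Python standard library is probably `types.MappingProxyType`, which gives a read-only view of a dict. A small sketch with made-up data follows. It is not equivalent to Clojure's persistent maps: the view is shallow, and it still reflects changes made through the underlying dict.

```python
from types import MappingProxyType

response = {"id": 1, "tags": ["a"]}  # made-up API response
frozen = MappingProxyType(response)  # read-only view, not a copy

try:
    frozen["id"] = 2  # item assignment is rejected on the view
    mutated = True
except TypeError:
    mutated = False

# Caveat: nested values are still mutable through the view.
frozen["tags"].append("b")
```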
This is something I enforced in a big rewrite at a previous company.<p>People would take a full API response, and pass bits of it around with mutations. Understanding what the object looked like 5 functions deep was really hard. If the API changed... Oh boy.<p>I found many bugs just tracing the code like this. It made me a big proponent of strong typing, or at least strong type hinting.
This opinion gets at the heart of the reason to use typed languages or not. After all, what is a dict but an untyped struct?<p>Untyped languages are excellent for smaller code bases because they are more comfortable to program in, faster to write and more general. Some kinds of polymorphism that come naturally in these languages are impossible or much harder in typed languages. Also, as others have said, the problem domain may not be well explored yet.<p>Typed languages really start to shine as a code base gets huge. At that size, well maintained untyped code bases start collapsing under the weight of their own unit tests, while moderately or poorly maintained ones become a mess. Mostly this is because the code base gets worked on by so many people that it's hard for them all to communicate with each other. In these cases a typed language keeps everyone on the same page to some extent.<p>Both camps will hate me for saying this, I think, but it's what I've observed over the years.<p>It may also sound like I prefer typed languages, but in fact my favorite languages to work in are Clojure and Python. My code bases as a DevOps engineer rarely pass the 10,000 line mark and never pass the 100,000 line mark. It's much more comfortable for me in these untyped languages.<p>Untyped languages also really shine in microservices for the same reason.
* Don't let dicts spoil your <i>python</i> code<p>Maybe that was implied?<p>Anyway, a lot of languages take another stance. E.g. Elixir, where using dicts along with pattern matching makes for quite powerful abstractions.<p>As long as the dicts are kept shallow, and the amount of indirection in the code in general is kept low, it is alright to navigate and use.
Glad to see pydantic get mentioned here. It’s a great solution for this exact problem. I was introduced to it by FastAPI and have been using it in all my projects since.<p>At the end of the day you really can’t escape typing. It just makes life easier. We should stop letting languages try to remove it.
Took me a really long time to learn this lesson. IMO this is a variation of the primitive obsession code smell, although I'd say it's way more harmful. I was really reluctant to add data classes to my code when the good old PHP array could get the job done without holding me up with a bunch of bureaucracy. Of course arrays give no guarantees and enforce no structure, so inevitably you get slight variations depending on what you need, or maybe you happen to have a dict that's a superset of what you're feeding in, and it just becomes really hard to reason about things. And of course since it's not a named type, tracing things back becomes really hard.
Yes to the principle. But typed dict is useful for more than just "the wire".<p>There are places where you just don't need the overhead of a class. Yes, slotted classes make this much cheaper, but so do named tuples.<p>If the behavior of a thing is to map values then it should stay a dict.<p>If the behavior is a bag of attributes then yes, pick something better.
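A rough illustration of that split, using only the standard library. `Point` and `word_counts` are invented names for the sake of the example.

```python
from typing import NamedTuple


class Point(NamedTuple):
    """A bag of attributes: a fixed set of named fields."""
    x: float
    y: float


p = Point(x=1.0, y=2.0)
moved = p._replace(x=p.x + 1)  # named tuples are immutable, so this copies

# A mapping of values: arbitrary keys that are only known at runtime.
word_counts = {"dict": 3, "class": 1}
word_counts["tuple"] = 2
```

The named tuple gets attribute access and a fixed shape without a hand-written class body, while the dict stays a dict because its keys are data, not structure.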
I really liked the structure of this blog post, but it misses the positive aspects of using dictionaries. Like when you are the owner of the API you consume and just want the JSON to flow through your “application tier”.
When JavaScript added hash/object destructuring (both at the argument level and when assigning variables), I noticed code started using dict-like function arguments everywhere. It makes typing them a bit more of a pain in the ass (especially without default arguments).<p>I haven’t decided if I like it better than just breaking up objects into arguments in a simpler functional style.<p>On one hand it’s more predictable, but on the other most complex apps start passing around objects for everything. TypeScript of course helps with that, as does neatly modularized code (i.e. not passing full typed objects outside of the parent module which owns/generates them, unless the receiving functions uniquely operate on the full object).<p>These are the small decisions you end up making a hundred times.
I don't program professionally, and I struggle with dicts and classes. On one hand, I want to avoid the Java world of needing to learn 8 new classes to use any library. So dicts are lightweight and extensible and feel like the modern way of doing things. On the other hand, all the problems listed in the article are right. You really do need to document the different expected dicts somehow, which is basically structs/classes.<p>The other thing that always burns me is lists. Specifically lists of lists and lists of strings. Since Python allows you to index into strings the same way as lists, for some reason I always lose track of where I am in the unpacking stack. This is when I switch to type hints.
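A small sketch of how that list-of-lists vs. list-of-strings mix-up slips through at runtime. `first_items` is a made-up helper, and the claim about the type checker assumes something like mypy is being run over the code.

```python
from typing import Iterable, List


def first_items(rows: Iterable[List[str]]) -> List[str]:
    """Return the first element of every row."""
    return [row[0] for row in rows]


good = first_items([["alpha", "beta"], ["gamma", "delta"]])
# good == ["alpha", "gamma"]

# Passing a flat list of strings still "works" at runtime because strings
# are indexable too: each "row" is a string and row[0] is one character.
# With the hint above, a checker such as mypy should reject this call.
oops = first_items(["alpha", "gamma"])  # type: ignore[arg-type]
# oops == ["a", "g"]
```

Note that the hint has to say `List[str]` for this to help: a `str` is itself a valid `Sequence[str]`, so a looser annotation would let the mistake through.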
I would really like a language where you can swap simple data collections like dicts or arrays for others that are better defined and use better suited algorithms, without changing how you access them everywhere in your code.<p>So if getting a field from a simple structure is mycol[key], it should look exactly the same when mycol is no longer a flexible dict containing ad hoc objects but a complex, strongly typed, immutable trie or B-tree indexed array, because at some point in the evolution of your code it became apparent that this is exactly what you need.<p>The only language I know of that has a consistent interface between simple and complex (also custom) collections is Scala.
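For what it's worth, Python gets part of the way there for read access through `collections.abc.Mapping`: implement three methods and `mycol[key]`, `in`, `.get()`, `.items()` and friends all work. A minimal sketch, where `SortedRecord` is a hypothetical stand-in for a real trie or B-tree and simply does a binary search over sorted keys:

```python
import bisect
from collections.abc import Mapping
from typing import Any, Iterator


class SortedRecord(Mapping):
    """Read-only mapping backed by sorted tuples (illustrative only)."""

    def __init__(self, items: dict) -> None:
        self._keys = tuple(sorted(items))
        self._values = tuple(items[k] for k in self._keys)

    def __getitem__(self, key: Any) -> Any:
        # Binary search over the sorted keys instead of hashing.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        raise KeyError(key)

    def __iter__(self) -> Iterator:
        return iter(self._keys)

    def __len__(self) -> int:
        return len(self._keys)


def describe(mycol: Mapping, key: str) -> str:
    """Caller code is identical for a plain dict and the custom collection."""
    return f"{key}={mycol[key]}"


plain = describe({"b": 2, "a": 1}, "a")
custom = describe(SortedRecord({"b": 2, "a": 1}), "a")
```

This only covers the access syntax. It says nothing about strong typing of the stored values, which is the other half of the wish.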
> Don't let dicts spoil your code (2020) (roman.pt) * Conditions apply<p>* Applies only when parsing I/O. Do not substitute primitives with classes inside your code base for no good reason. Unless validation is needed, prefer a NamedTuple.
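A sketch of what "only at the I/O boundary, NamedTuple unless you need validation" could look like. The `User` fields and the payload are invented, and nothing here validates types: a missing key simply raises `KeyError`.

```python
import json
from typing import NamedTuple


class User(NamedTuple):
    id: int
    name: str


def parse_user(payload: str) -> User:
    """Convert raw JSON into a User once, where the data enters the program."""
    raw = json.loads(payload)
    # Only the fields this code needs are picked; other keys are ignored here.
    return User(id=raw["id"], name=raw["name"])


user = parse_user('{"id": 7, "name": "Ada", "extra": "ignored"}')
```

Past this point the rest of the code works with `user.id` and `user.name` and never touches the raw dict.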
<i>Functions that accept dicts are a nightmare to extend and modify.</i><p>Compared to what? I see the article's point about dicts being, like everything else in programming, a tradeoff with benefits and limitations. But the article's needless dramatization of a pretty mundane point (and the button-pushing title) are, to these jaded eyes, a definite turn-off.<p>Meanwhile I'll keep using dicts when the use case calls for them, thank you. As a sibling commenter put it:<p><i>If you don't know how the data should be used, it's often a different story.</i><p>Exactly. The whole point (and benefit) of dicts is that they're squishy. Sometimes you need squishy.
My take is that dicts are fine as long as your code is well tested. Yeah, dataclasses and frozen classes have much better typing support, but if your code is mostly reading and writing JSON, like many modern cloud apps, it can be easier to use plain dicts combined with decent tests to make sure you don't break downstream services.
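Roughly the kind of test this implies, as a sketch. `build_response` and `REQUIRED_KEYS` are hypothetical, standing in for whatever your service returns and whatever your downstream actually relies on.

```python
import json

REQUIRED_KEYS = {"id", "status"}  # hypothetical contract with downstream


def build_response(order_id: int) -> dict:
    """Made-up handler that returns a plain dict destined for JSON."""
    return {"id": order_id, "status": "shipped", "note": "extra keys are fine"}


def test_response_keeps_downstream_contract() -> None:
    body = build_response(42)
    missing = REQUIRED_KEYS - body.keys()
    assert not missing, f"missing keys: {sorted(missing)}"
    # Round-trip through JSON to catch values that don't serialize cleanly.
    assert json.loads(json.dumps(body)) == body


test_response_keeps_downstream_contract()
```

The test pins only the keys downstream depends on, so adding new fields never breaks it.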
I don’t usually write software in python, but when I do I try not to end up with a bag of dicts.<p>There are better ways to structure data for otherwise reusable functions.