
Data evolution with set-theoretic types

127 points | by josevalim | 4 months ago

6 comments

josevalim · 4 months ago
Author here. This is probably the article that took me the longest to write, roughly 15 months, and I may still not have explained all concepts with the clarity they deserve and I intended to. If there is any feedback or questions, I'd be glad to answer them!
beders · 4 months ago
These are great examples of difficulties people will encounter in all popular statically typed languages sooner or later.

I find the presented solution interesting, but it is limited to the three operations mentioned.

The alternative to this is runtime schema checks and treating your data as data - the data you are working with is coming from an external system and needs to be runtime-validated anyway.

Throw in some nil-punting and you can have a very flexible system that is guarded by runtime checks.
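A minimal sketch of the boundary-validation style described above, in plain Elixir; the module name, field names, and accepted shape are illustrative assumptions, not something from the article:

```elixir
# Illustrative only: validate external data at the boundary with a runtime
# check, then pass plain maps around. Accepting nil for "name" is the kind
# of flexibility the comment alludes to.
defmodule UserInput do
  def validate(%{"name" => name} = payload) when is_binary(name) or is_nil(name) do
    {:ok, payload}
  end

  def validate(_other), do: {:error, :invalid_shape}
end

# UserInput.validate(%{"name" => "Alice", "extra" => true}) #=> {:ok, %{...}}
# UserInput.validate(%{"name" => 42})                       #=> {:error, :invalid_shape}
```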
Moosieus · 4 months ago
This is some interesting shit, I love it! At least the parts I think I understand :P

> The goal of data versioning is to provide more mechanisms for library authors to evolve their schemas without imposing breaking changes often. Application developers will have limited use of this feature, as they would rather update their existing codebases and their types right away. Although this may find use cases around durable data and distributed systems.

"Hard cutover" is definitely a lot less laborious, but oftentimes incremental updates are necessary - large teams, existing tables and whatnot. To that end, I would foresee a lot of appeal in app dev when applied to database migrations.
airstrike · 4 months ago
That was an interesting read. Thanks for sharing and congrats on getting it out there.

Two bits stood out and made me think, somewhat in disagreement:

> If "data_schema" changes a user facing type, such as the Schema type above, in a backwards incompatible manner, its authors have to release a new major version. A new major version can be a fork on the road. Libraries that depend on "data_schema"'s new version won't accept the old ones. Old libraries that depend on "data_schema" have to be updated and potentially forced to also release a new major version, cascading the problem. If any package along the way is unmaintained, it can stall and complicate further downstream updates.

I wouldn't call it a fork on the road. More like growing pains. This is to be expected of any dependency you have, and it's on you to keep your own code up-to-date. If the original Schema was indeed incorrect, it warrants fixing, even if it's an edge case.

And then:

> In our v1 of Schema, we allowed name to only be a string(). In a statically typed language, we are effectively proving that all uses of Schema has a string() name. When we allow the name to also be nil in v2, that should not introduce bugs in our software yet because, by definition, there is no instance of Schema where the name is nil!

> In other words, all existing code remains correct, but most type systems would immediately flag them as wrong, on the possibility they may receive a nil value. There must be a better way and that's what we study in this article.

Personally I don't think that what is proposed here is strictly better. I'd rather have to explicitly change my code to match the new shape of the Schema than to have it magically work. If I don't make changes now, I'm just punting that complexity for later and it's going to come back to bite me in some other issue that will be twice as hard to debug.
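For readers who want the quoted v1 -> v2 change in concrete terms, here is a rough sketch using today's Elixir typespecs rather than the article's set-theoretic syntax; the module and field names are made up for illustration:

```elixir
defmodule SchemaV1 do
  defstruct [:name]
  @type t :: %__MODULE__{name: String.t()}
end

defmodule SchemaV2 do
  defstruct [:name]
  # The field widens: every value ever built as a V1 still satisfies this type,
  # but a checker must now assume `name` may be nil at every use site.
  @type t :: %__MODULE__{name: String.t() | nil}
end

# `String.upcase(schema.name)` is correct for all existing V1 data, yet a
# type system that only sees the V2 declaration will flag it as a nil risk.
```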
harrisi · 4 months ago
Some initial concerns when thinking about this:

- Understanding how it expands the surface area of both the API and discussion of a struct (and thus also libraries and dependencies of the librar{y,ies}). Now you not only need to ask someone if they're using v1.0 of some package, but also which revision of a struct they're using, which they may not even know. This compounds when a library releases a new breaking major version, presumably - `%SomeStruct{}` for v2 may be different from `%SomeStruct{}` v1r2.

- Documentation seems like it would be more complex, as well. If `%SomeStruct{}` in v1.0 has some shape, and you publish v1.1, then realize you want to update the revision of `%SomeStruct{}` with a modified (or added) field, would there be docs for v1.1 and v1.1r2? Or would the v1.1 docs be retroactively modified to include `%SomeStruct{}` v1.1r2?

- The first example is a common situation, where an author realizes after it's too late that the representation of some data isn't ideal, or maybe even isn't sufficient. Typically, this is solved with a single change, or rarely in a couple or a few changes. I'm not sure if the complexity is worth it. I understand the desire to not fragment users due to breaking changes, but I'm not sure if this is the appropriate solution.

- How does this interact with (de-)serialization? (A rough sketch follows after this comment.)

I happen to be working with the SQLite file format currently, and I generally really enjoy data formats. It's not exactly the same as this, since runtime data structures are technically ephemeral, but in reality they are not. The typical strategy for any blob format is to hoard a reasonable amount of extra space for when you realize you made a mistake. This is usually fine, since previous readers and writers ought to ignore any extra data in reserved space.

One of the things one quickly realizes when working with various blob formats is that the header usually has (as a guess) 25% of it allocated to reserved space. However, looking at many data formats over the last several decades, it's extraordinarily rare to ever see an update to them that uses it. Maybe once or twice, but it's not common. One solution is to have more variable-length data, but this has its own problems.

So, in general, I'm very interested in this problem (which is a real problem!), and I'm also skeptical of this solution. I am very willing to explore these ideas though, because they're an interesting approach that doesn't have prior art to look at, as far as I know.

EDIT: Also, thanks for all the work to everyone on the type system! I'm a huge fan of it!
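On the (de-)serialization question, one speculative answer (not the article's) is to decode old payloads into the newest shape by falling back to defaults for fields added in later revisions; `SomeStruct` and its `:bio` field below are hypothetical:

```elixir
defmodule SomeStruct do
  # Imagine :bio was added in "revision 2"; older serialized maps lack it.
  defstruct name: nil, bio: ""

  # Decode a plain map (e.g. parsed JSON) written by any revision; missing
  # newer fields simply take their defaults.
  def from_map(map) when is_map(map) do
    %__MODULE__{name: map["name"], bio: Map.get(map, "bio", "")}
  end
end

# SomeStruct.from_map(%{"name" => "old record"})
# #=> %SomeStruct{name: "old record", bio: ""}
```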
Groxx · 4 months ago
Yea, this is an issue rather near and dear to my heart (due to pain). I very much appreciate strong and safe types, but it tends to mean enormous pain when making small obvious fixes to past mistakes, and you *really* don't want to block those. It just makes everything harder in the long term.

As an up-front caveat for below: I don't know Elixir in detail, so I'm approaching this as a general type/lang-design issue. And this is a bit stream-of-consciousness-y and I'm not really seeking any goal beyond maybe discussion.

---

Structural sub-typing with inference (or similar, e.g. perhaps something fancier with dependent types) seems like kinda the only real option, as you *need* to be able to adapt to whatever bowl of Hyrum Slaw[1] has been created when you weren't looking. Allowing code that is *still provably correct* to continue to work without modification seems like a natural fit, and for changes I've made it would fairly often mean >90% of users would do absolutely nothing and simply get better safety and better behavior for free. It might even be an *ideal* end state.

I kinda like the *ergonomics* of `revision 2` here, it's clear what it's doing and can provide tooling hints in a rather important and complicated situation... but tbh I'm just not sure how much this offers vs *actual structural typing*, e.g. just having an implicit revision per field. With explicit revisions you can bundle interrelated changes (which is quite good, and doing this with types alone ~always requires some annoying ceremony), but it seems like you'll also be forcing code to accept all of v2..N-1 to get the change in vN because they're *not* independent.

The "you must accept all intermediate changes" part is in some ways natural, but you'll also be (potentially) forcing it on your users, and/or writing a lot of transitioning code to avoid constraining them.

I'm guessing this is mostly due to Elixir's type system, and explicit versions are a pragmatic tradeoff? A linear rather than combinatoric growth of generated types?

> It is unlikely - or shall we say, not advisable to - for a given application to depend on several revisions over a long period of time. They are meant to be transitory.

An *application*, yes - applications should migrate when they are able, and because they are usage-leaf-nodes they can do that without backwards-compatibility concerns. But any library that uses other libraries generally benefits from supporting as many versions as possible, to constrain the parent library's users as little as possible.

It's exactly the same situation as you see in normal SAT-like dependency management: applications should pin versions for stability, libraries should try to allow as broad a range as possible to avoid conflicts.

> Would downcasting actually be useful in practice? That is yet to be seen.

I would pretty strongly assume both "yes" and "it's complicated". For end users directly touching those fields: yes, absolutely, pick your level of risk and live with it! This kind of thing is great for isolated workarounds and "trust me, it's fine" scenarios; code has varying risk/goals and that's good. Those happen all the time, even if nobody really likes them afterward.

But if those choices apply to *all* libraries using it in a project... well, then it gets complicated. Unless you know how all of them use it, *and they all agree*, you can't safely make that decision. Ruby has refinements which can at least somewhat deal with this, by restricting *when* those decisions apply, and Lisps with continuations have another kind of tool, but most popular languages do not... and I have no idea how possible either would be in Elixir.

---

All that probably summarizes as: if we could boil the ocean, would this be meaningfully different than structural typing with type inference, and no versioning? It sounds like this might be a reasonable middle ground for Elixir, but what about in general, when trying to apply this strategy to other languages? And viewed through that lens, are there other structural typing tools worth looking at?

[1] https://en.wiktionary.org/wiki/Hyrum%27s_law
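To make the structural-typing comparison concrete, a tiny sketch in plain Elixir pattern matching (my own illustration, not the article's proposal): code that matches only the fields it uses keeps working as data grows, while a widened nil case still has to be handled somewhere:

```elixir
defmodule Greeter do
  # Matches only the field it needs, so extra or newer fields are ignored;
  # this is the "still provably correct" code that should not need changes.
  def greet(%{name: name}) when is_binary(name), do: "Hello, " <> name

  # The widened case (name may now be nil) must be handled explicitly; an
  # explicit revision would let tooling point at exactly this kind of gap.
  def greet(%{name: nil}), do: "Hello, stranger"
end

# Greeter.greet(%{name: "Groxx", joined: ~D[2015-01-01]}) #=> "Hello, Groxx"
# Greeter.greet(%{name: nil})                             #=> "Hello, stranger"
```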