Ending Dependency Chaos: A Proposal for Comprehensive Function Versioning

47 points by davibu about 2 years ago

18 comments

jmull about 2 years ago

It's good that people are thinking about this problem, but this proposal doesn't address some fundamental issues, e.g.:

- Why developers/maintainers choose the package granularity they currently do. You can have tiny, granular packages today (npm famously has single-simple-function packages, which are widely derided, BTW). Developers break down packages in a way that makes sense to them to best develop, test, maintain, and release the package. If you reduce the overhead of small "grains" of package, developers might choose to go a little more granular, but not a lot.
- Why people want or need to update. People want or need security updates. People want or need new features and functionality.

So even with this magically fully in place (there's some tooling implied here), I don't think there would be much impact on updating.

(And people who tried to implement it, or use packages that implemented it, would be getting burned by version update mistakes. This seems almost pathologically error-prone, and when something does go wrong, it will take some new class of tool to even diagnose what went wrong where. People will end up with issues triggered by their personal upgrade path.)

BTW, patch updates don't have to be done at a source or function level at all (e.g. an upgrade from version x to x+1 could be expressed as a delta, or x to x+2 for that matter). This has been popping up for decades, but the practical value must not be worth the trouble, because it doesn't seem to catch on in a big way.

ulrikrasmussen about 2 years ago

I think this is effectively achieved by the Unison language: https://www.unison-lang.org/learn/the-big-idea/

RcouF1uZ4gsC about 2 years ago

This will make the chaos worse. Instead of having to figure out compatible versions of dozens of packages, you will now have to figure out compatible versions of thousands of functions.

The solution to dependency chaos is grouping dependencies together and versioning the larger group, not splitting into even more dependencies.

quickthrower2 about 2 years ago

I don’t think the proposal helps, as it puts more burden on package maintainers (honouring semver for the whole package is burden enough!).

The problem is in NPM culture: how much churn there is in packages, and especially the unnecessary breaking changes.

Avoid that, and the problem is reduced from constantly fighting to play API keep-up to simply letting security updates flow through.

Let your patch version number go to the moon (which is no real problem in practice: computers handle big numbers, and it is automatable).

aconbere about 2 years ago

Just hash the whole function and be done with it.

Joe Armstrong made a proposal for this (I’m pretty sure half tongue in cheek).

https://joearms.github.io/published/2015-03-12-The_web_of_names.html
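
A minimal sketch of what "hash the whole function" could look like, in Python. The helper name `function_hash` and the choice to hash the parsed AST rather than the raw text are assumptions made for illustration; they are not taken from Armstrong's post. Hashing the AST dump means comments and whitespace do not change the identifier, while any change to the logic does.

```python
import ast
import hashlib


def function_hash(source: str) -> str:
    """Return a SHA-256 digest of the parsed structure of `source`.

    ast.dump() omits line/column attributes by default, and the parser
    drops comments, so purely cosmetic edits keep the same hash.
    """
    canonical = ast.dump(ast.parse(source))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical function bodies, used only to exercise the helper.
ADD_V1 = "def add(a, b):\n    return a + b\n"
ADD_V1_REFORMATTED = "def add(a, b):\n    # same logic, new comment\n    return a  +  b\n"
ADD_V2 = "def add(a, b):\n    return b + a\n"

h1 = function_hash(ADD_V1)
h1_reformatted = function_hash(ADD_V1_REFORMATTED)
h2 = function_hash(ADD_V2)

print(h1 == h1_reformatted)  # cosmetic edit: same identifier
print(h1 == h2)              # changed body: different identifier
```

Note that in this sketch the function's own name is part of what gets hashed, so a rename also produces a new hash. A real content-addressed scheme would have to decide whether names belong in the hash at all.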

rcme about 2 years ago

What if the functions modify some type of external state? E.g. in TypeScript, what if a module property is updated by one function and referenced in a different function? How would two functions share the same state if they were at different versions?

kazinator about 2 years ago

ELF shared libraries like Glibc do this at the binary level. If some function changes in a way that breaks backward binary compatibility, then it gets versioned, so that existing compiled programs use the compatibility version.

E.g. suppose that there is a new version of pthread_mutex_lock(&mutex) which relies on a larger structure with new members in it. The problem is that compiled programs have pthread_mutex_lock(&mutex) calls which pass a pointer to the older, smaller structure. If the library were to work with the structure using the new definition, it would access out of bounds. Versioning takes care of this; the old clients call a backwards-compatible function. It might work with the new definition, but avoids touching the new members that didn't exist in the old library.

But this is a very low-level motivation; this same problem of low-level layout information being baked into the contract shouldn't exist in a higher-level language.

rehevkor5 about 2 years ago

Seems like this person may have been inspired by Rich Hickey's talk https://youtu.be/oyLBGkS5ICk

Regarding "nothing stopping us from making this versioning system completely automated": it seems like that depends on whether your language's type system supports that, and whether programmers follow the rules. For example, if you're relying on varargs/kwargs too much, it's going to be difficult to tell before runtime whether you've broken something.
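
To make the varargs/kwargs point concrete, here is a rough Python sketch of an automated compatibility check based on `inspect.signature`. The checker `looks_compatible` and the `resize_*` / `render_*` functions are hypothetical, invented for this illustration. The check is deliberately conservative and only looks at parameter names, kinds and defaults; it says nothing about behaviour.

```python
import inspect

EMPTY = inspect.Parameter.empty
VARIADIC = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)


def looks_compatible(old, new) -> bool:
    """Signature-only check: can every call that worked on `old` bind to `new`?"""
    old_params = list(inspect.signature(old).parameters.values())
    new_params = list(inspect.signature(new).parameters.values())
    if len(new_params) < len(old_params):
        return False  # a parameter was removed
    for before, after in zip(old_params, new_params):
        if (before.name, before.kind) != (after.name, after.kind):
            return False  # renamed, reordered, or changed kind
        if before.default is not EMPTY and after.default is EMPTY:
            return False  # an optional parameter became required
    # Parameters appended by `new` must be optional, or old calls would fail.
    return all(
        p.default is not EMPTY or p.kind in VARIADIC
        for p in new_params[len(old_params):]
    )


def resize_v1(width, height):
    return (width, height)


def resize_v2(width, height, keep_aspect=False):
    return (width, height, keep_aspect)


def resize_v3(width):
    return (width,)


def render_v1(**options):
    return options["color"].upper()


def render_v2(**options):
    # The expected key was renamed, but the signature is unchanged.
    return options["colour"].upper()


print(looks_compatible(resize_v1, resize_v2))  # added optional parameter
print(looks_compatible(resize_v1, resize_v3))  # removed parameter
print(looks_compatible(render_v1, render_v2))  # kwargs hide the break
```

The last case is the problem described above: the two `render` signatures are identical, so the check passes, yet `render_v2(color="red")` fails with a `KeyError` at runtime.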

Aqueous about 2 years ago

Doesn't just stopping using version ranges also help with this? I've never understood why people would allow a package manager to update a piece of their code for them automatically. Using specifiers like ^1.5.3, which allow the package manager to go all the way up to version 1.999 automagically, is just asking for trouble.

Find a set of versions that is self-compatible and works, and pin all your dependencies to those specific versions, with a hash if possible. Upgrade on your schedule, not someone else's. Thoughts?
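
For readers unfamiliar with caret ranges, the difference between `^1.5.3` and an exact pin can be sketched in a few lines of Python. This is a simplified model written for this thread, not npm's implementation: it only handles plain `major.minor.patch` versions with major >= 1, and ignores prerelease tags and the special 0.x caret rules. The function names are hypothetical.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    """Parse 'major.minor.patch' into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def caret_allows(base: str, candidate: str) -> bool:
    """Simplified caret rule: same major version, and candidate >= base."""
    b, c = parse(base), parse(candidate)
    if b[0] == 0:
        raise ValueError("this sketch does not model the 0.x caret rules")
    return c[0] == b[0] and c >= b


def pinned_allows(base: str, candidate: str) -> bool:
    """Exact pin: only the very same version is accepted."""
    return parse(base) == parse(candidate)


print(caret_allows("1.5.3", "1.999.0"))   # True: any later 1.x is accepted
print(caret_allows("1.5.3", "2.0.0"))     # False: major bump is excluded
print(pinned_allows("1.5.3", "1.5.4"))    # False: even a patch bump is excluded
```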

nme01 about 2 years ago

Isn’t the proposal simply saying that making libs more granular will solve the problem?

I don’t know what everyone else’s experience is, but I was updating dependencies either due to bugs identified in old versions, because I wanted a new feature, or because the old version was not supported anymore. Setting a dependency to a fixed version was not an option. Using a function pinned to a given version in your code seems problematic.

During updates, the problem was having to update all the other dependencies as a result of the update. I can’t see how the proposed approach would solve it.

Another problem which I sometimes faced (less annoying) was API change, i.e. having to start using function B instead of function A, which requires slightly different parameters. Those kinds of automatic refactors could be supplied with library upgrades (some libs already come with automatic migration “scripts”).

js8 about 2 years ago

I agree, it would be nice to have a refactoring workflow that every program modification only creates new functions, never changes existing ones. Then we could get automated testing of new functions against old functions, or even, automated proof that the change doesn't affect the result.
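
A rough sketch of the "test new functions against old functions" idea, using plain randomized differential testing in Python. The helper `first_divergence` and the three `clamp_*` functions are made up for illustration, and random sampling only finds counterexamples; it is not a proof of equivalence.

```python
import random
from typing import Callable, Optional, Tuple


def first_divergence(
    old: Callable[[int, int], int],
    new: Callable[[int, int], int],
    trials: int = 1000,
    seed: int = 0,
) -> Optional[Tuple[int, int]]:
    """Return the first random input on which `old` and `new` disagree, or None."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    for _ in range(trials):
        args = (rng.randint(-100, 100), rng.randint(-100, 100))
        if old(*args) != new(*args):
            return args
    return None


def clamp_v1(value: int, limit: int) -> int:
    # Original implementation, never edited after release.
    if value > limit:
        return limit
    return value


def clamp_v2(value: int, limit: int) -> int:
    # Refactor published as a new function: same behaviour.
    return min(value, limit)


def clamp_v3(value: int, limit: int) -> int:
    # Faulty refactor: max() instead of min().
    return max(value, limit)


print(first_divergence(clamp_v1, clamp_v2))              # None: no disagreement found
print(first_divergence(clamp_v1, clamp_v3) is not None)  # True: a counterexample exists
```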

MontagFTB about 2 years ago

When a dependency changes versions, I need to update my own code to account for the dependency changes. Then I have to go through the (possibly arduous) process of reconciling those changes with other dependencies that have yet to go through this process.

Version information is essentially a lossy compression: all the changes that go into a given release are summarized into a handful of numbers. Whether this happens at the component level or the function level only changes how lossy the versioning step is. I am not convinced it improves the workflow described above.

th3iedkid about 2 years ago

What stops statement level versioning?

none_to_remain about 2 years ago

My system's .so files have had versioned symbols for ages.

(This problem is not a technical problem.)

chriswarbo about 2 years ago

Version numbers are just part of a name; we can't rely on them, any more than we can rely on package names (e.g. anyone can make a package with the name "aws-sdk"; that doesn't mean they can be trusted with our AWS credentials!)

To actually get dependencies for our software, we need two mechanisms:

- (a) Some way to precisely specify what we depend on
- (b) Some mechanism to fetch those dependencies

Many package managers (NPM, Maven, etc.) use a third-party server for both, e.g.

- (a) We depend on whatever npm.org returns when we ask for FOO
- (b) Fetch dependency FOO by attempting to HTTP GET https://npm.org/FOO; fail if it's not 200 OK

Delegating so much trust to a HTTP call isn't great; so there's an alternative approach based on "lock files":

- (a) We depend on the name FOO with this hash (usually 'trust on first use', where we find the hash by doing an initial HTTP GET, etc. and store the resulting hash)
- (b) Fetch dependency FOO by looking in these local folders, or checking out these git repos, or doing a HTTP GET against these caches, or against these mirrors, or leeching this torrent, etc. Fail if we can't find anything which matches our hash.

The interesting thing about using lock files and hashes is that our hash of dependency FOO depends on the contents of its lock file; and that content depends on the contents of FOO's dependencies, including their lock files; and so on.

Hence a lock file is a Merkle tree, which pins all of the transitive dependencies of a package: changing any of those dependencies (e.g. to update) requires altering all of the lock files in-between that dependency and our package. That, in turn, alters our lock file, and hence our package's hash.

The author is complaining that such dependency-cascades require a whole bunch of version numbers to get updated. I think it's better to keep track of these things separately: use your version number as documentation, of major/minor/patch changes; and keep track of dependency trees using a separate, cryptographically-secure hash. The thing is, we already have such hashes: they're called git commit IDs!

Other advantages of identifying transitive dependencies with hashes:

- They're not sequential. Our package isn't "out of date" just because we're using hash 1234 instead of 1235. All that matters are the version numbers. In other words, we're distinguishing between "real" updates (a version number changed) and "propagation" (version numbers stayed the same, but a dependency hash changed).
- They're unstructured; e.g. they give us no information about "major" versus "minor" changes, etc. (and hence no need to decide whether an update is one or the other!)
- They can be auto-generated; e.g. we might forget to update our version number, but there's no way we can forget to update our git commit ID!
- They're eventually-consistent: it doesn't matter how updates 'propagate' through each package; each sub-tree will converge to the same hash (NOTE: for this to work we must only take the content hash, not the full history like a git commit ID!).

For example, take the following ("diamond") dependency tree:

<pre><code>                    +--> B --+
                    |        |
Our package --> A --+        +--> D
                    |        |
                    +--> C --+
</code></pre>

When D publishes a new version, B and C should update their lock-files; then A should update its lock-file; then we should update our lock-file. However, this may happen in multiple ways:

- B and C update; A updates (getting new hashes from B and C)
- B updates; A updates; C updates; A updates
- C updates; A updates; B updates; A updates

Using version-numbers (or git commit IDs!) would result in different A packages (one increment versus two increments; or commit IDs with different histories). Using content hashes will give A the same hash/lock-file in all three cases. This also means we're free to propagate updates whenever we like, rather than waiting for things to 'stabilise'; and it's safe to use private forks/patches for propagating updates if we like, without fear of colliding version numbers.

Note that some of this propagation can be avoided if our build picks a single version of each dependency (e.g. Python requires this for entries in its site-packages directory; and Nixpkgs uses laziness and a fixed-point to defer choosing dependencies until the whole set of packages has been defined).
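
The convergence claim for the diamond example can be checked with a small Python model. Everything here is an assumption made for illustration: the `Registry` class, the source strings, and the rule that a package's hash is SHA-256 over its own content plus its lock file. It is a toy, not how any particular package manager computes hashes.

```python
import hashlib
from typing import Dict, List

# The diamond from the comment above: A depends on B and C; both depend on D.
DEPS: Dict[str, List[str]] = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}


class Registry:
    def __init__(self, sources: Dict[str, str]) -> None:
        self.sources = dict(sources)
        self.locks: Dict[str, Dict[str, str]] = {name: {} for name in DEPS}
        for name in ["D", "B", "C", "A"]:  # initial bottom-up lock
            self.relock(name)

    def package_hash(self, name: str) -> str:
        """Content hash: own source plus the (sorted) entries of the lock file."""
        h = hashlib.sha256(self.sources[name].encode("utf-8"))
        for dep, dep_hash in sorted(self.locks[name].items()):
            h.update(f"{dep}={dep_hash};".encode("utf-8"))
        return h.hexdigest()

    def relock(self, name: str) -> None:
        """Refresh `name`'s lock file from the current hashes of its dependencies."""
        self.locks[name] = {dep: self.package_hash(dep) for dep in DEPS[name]}


def propagate(order: List[str]) -> str:
    """Publish a new D, relock packages in the given order, return A's hash."""
    reg = Registry({"A": "a-src", "B": "b-src", "C": "c-src", "D": "d-v1"})
    reg.sources["D"] = "d-v2"
    for name in order:
        reg.relock(name)
    return reg.package_hash("A")


ORDERS = [["B", "C", "A"], ["B", "A", "C", "A"], ["C", "A", "B", "A"]]
final_hashes = {propagate(order) for order in ORDERS}
a_bumps = [order.count("A") for order in ORDERS]

print(len(final_hashes))  # 1: all three orders converge on the same hash for A
print(a_bumps)            # [1, 2, 2]: a per-update counter for A would not agree
```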

cryptonector about 2 years ago

Versioning every API element is not really scalable.

cwp about 2 years ago

I've done something like this for HTTP APIs. Instead of having versions of the entire API, e.g. with paths like `/v1/user/9893`, each endpoint had versions. The client would request the specific version using the Accept header.

For example:

<pre><code>GET /user/9893
Accept: application/json; charset=utf8; version=1
</code></pre>

No semantic versioning, just bumped the version number for each significant change. And yup, "significant" is in the eye of the caller, but it worked out well.

Now this is a bit different from TFA, because the server supported all the versions at the same time, so the caller could choose whatever mix of versions it wanted. This proposal is about assigning version numbers to individual functions rather than the library as a whole - essentially just a documentation/metadata change, with support from package managers.

Here's why this is relevant: the fact that the API was versioned this way had a big impact on how it evolved over time. At first it was pretty much the same as the usual `v1/user/9893` design. But as new versions of specific resources were added, it forced a decoupling of the underlying data model from the schemas that were exposed in the interface. Each endpoint-version became an adaptor layer between the contract it offered to the caller and the more generalized, more abstract functionality offered by the data layer. That had costs as well as benefits. New endpoint versions often required an update to the data layer, which in turn required refactoring of older versions to work with the new data layer while continuing to adhere to their contracts. It worked out well, but it did require a change in implementation strategy.

I think the lesson for this proposal is that changing the way package metadata is handled is just the first step. Adopting it could then create pressure for mix-and-match packaging of the interface functions - "Hey, can I get a version of this library with addFunction 1.2.16 and divFunction 2.0.1? I don't want to change all my addition code just to get ZeroDiv protection." That could be done with the right tooling and library design.

Or maybe it makes DLL hell worse, because now you have to solve semantic versioning compatibility for every function in a library, and that's slower and more sensitive to semantic versioning mistakes. You could get work-arounds like "only ever change one function when you release a new version of the library" or "just bump all the major versions even if they haven't changed."

Or maybe linkers would get built that can do the logic, like "when package A calls package B, use addFunction 1.2.16, but when package C calls package B, use 1.3.1".

Anyway, I don't think this proposal is sufficient on its own. It would either have ripple effects throughout the language ecosystem, or be ineffective because of developers working around it, or not be adopted at all.
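
A minimal Python sketch of the per-endpoint versioning described above, where each version is an adaptor over a shared data layer. The handler names, the shape of the `row` dictionary, and the fallback to version 1 are all assumptions made for illustration; the original service's code is not shown in the comment. The header parsing is also simplified: it handles a single media type and ignores comma-separated alternatives and q-values.

```python
from typing import Callable, Dict


def parse_version(accept: str, default: int = 1) -> int:
    """Extract the `version=N` parameter from a single Accept header value."""
    for part in accept.split(";")[1:]:  # skip the media type itself
        key, _, value = part.strip().partition("=")
        if key.lower() == "version":
            return int(value)
    return default


def user_v1(row: dict) -> dict:
    # Version 1 contract: one combined name field.
    return {"id": row["id"], "name": f"{row['first']} {row['last']}"}


def user_v2(row: dict) -> dict:
    # Version 2 contract: separate fields, same underlying data.
    return {"id": row["id"], "first_name": row["first"], "last_name": row["last"]}


# Every supported version of the /user endpoint stays available side by side.
USER_HANDLERS: Dict[int, Callable[[dict], dict]] = {1: user_v1, 2: user_v2}


def get_user(row: dict, accept: str) -> dict:
    version = parse_version(accept)
    handler = USER_HANDLERS.get(version)
    if handler is None:
        raise LookupError(f"unsupported version {version} for /user")
    return handler(row)


row = {"id": 9893, "first": "Ada", "last": "Lovelace"}  # stand-in for the data layer
print(get_user(row, "application/json; charset=utf8; version=1"))
print(get_user(row, "application/json; charset=utf8; version=2"))
```

Both handlers read the same `row`, which mirrors the adaptor-layer point: the contract offered to each caller is fixed per version, while the data underneath is free to change as long as every adaptor is kept working.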

jjgreen about 2 years ago

TL;DR (by ChatGPT)

Stopped reading there.
