Why do we need modules at all? (2011)

155 points, posted by thomas11 over 10 years ago

24 comments

thomas11, over 10 years ago

Armstrong's proposal reminds me a bit of Emacs extensions. Since Emacs Lisp doesn't have namespaces or modules, all functions must be uniquely named, which is done by prefixing them: foo-replace. This is not that different from having a module foo, as Armstrong notes: "managing a namespace with names like foo.bar.baz.z is just as complex as managing a namespace with names like foo_bar_baz_z".

But what it enabled is an Emacs community where single functions are freely shared, for example on the EmacsWiki (http://www.emacswiki.org/emacs/). People just copy them into their Emacs init file. Sometimes they modify them a little and post them again with their own prefix. This has obvious downsides, such as a lack of versioning and organization. But it provides a low barrier to entry and creates a dynamic community.

inflagranti, over 10 years ago

To me this is the same question as whether we need directories in a file system. Ideally, your file system is a flat database, and files are indexed by a vast array of automatically and manually added metadata that makes them easy to retrieve. Microsoft tried to go in this direction with WinFS, which was eventually cut from Vista, maybe because it wasn't practical (yet). Looking at how people use the Internet, though, where 90% of browsing will start at Google, this does seem a very reasonable approach for many things in the future. In the end, why should humans do manual indexing and retrieval if the computer can facilitate this part?

felixgallo, over 10 years ago

I think a lot of people are focusing on the implementation details here, which is fun and great, but the real deep insight here is the idea of a global registry of correct functions.

If you postulate for a minute that the (truly nontrivial) surface problems are all solved, and concentrate only on the idea of a universally accessible group of functions that accretes value over time -- like a stdlib that every language on every runtime could access -- that seems like a pretty exciting idea worth thinking about.

I had something like that idea almost two decades ago (http://www.gossamer-threads.com/lists/perl/porters/26139?do=post_view_threaded#26139), but at the time it was all in fun. These days, that sort of thing starts looking pretty possible, especially for the group of pure functions.

andrewstuart2, over 10 years ago

Because humans suck at serialized content.

7 ± 2 [1]. That's the number of things our prefrontal cortex/short-term memory can track at once. That's why we (humans) organize things into hierarchies. That's why the best team size is around that number. Et cetera.

Heck, everything in the world on a computer is serialized into memory or onto disk. Or addressed as some disk in a serial array of disks. Serialized as in, "there's some data somewhere in these 2TB that tells me where in the same 2TB the rest of the data is." Computers excel at this. Humans are terrible at this.

I guess my point is, humans are the reason we need modules.

[1] http://en.wikipedia.org/wiki/The_Magical_Number_Seven,_Plus_or_Minus_Two


shaurz, over 10 years ago

I quite like the idea. I think it would probably still make sense to have "collections" where a bunch of related functions can be grouped together, discovered and worked on as a unit (this would just be an optional extra layer on top of the global function database). There would be no exclusivity in collections, though, so a function might appear in more than one collection, or in none.

Another idea: unit tests could be stored as function metadata.
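
A minimal sketch of the two ideas above (non-exclusive collections layered over a flat function store, and unit tests kept as function metadata). Everything here is hypothetical: the in-memory `functions` and `collections` dictionaries stand in for the global database, and `register`, `run_tests` and `collections_of` are illustrative names, not part of any proposal in the thread.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the "global function database".
functions = {}                    # function name -> metadata record
collections = defaultdict(set)    # collection name -> set of function names


def register(name, func, tests=(), member_of=()):
    """Store a function with its unit tests as metadata and tag it with collections."""
    functions[name] = {"func": func, "tests": list(tests)}
    for collection in member_of:
        collections[collection].add(name)


def run_tests(name):
    """Run the tests stored alongside a function; each test is (args, expected)."""
    record = functions[name]
    return all(record["func"](*args) == expected for args, expected in record["tests"])


def collections_of(name):
    """A function may belong to several collections, or to none."""
    return sorted(c for c, members in collections.items() if name in members)


register("clamp", lambda x, lo, hi: max(lo, min(x, hi)),
         tests=[((5, 0, 3), 3), ((-1, 0, 3), 0)],
         member_of=["math", "ui_helpers"])
register("identity", lambda x: x, tests=[((7,), 7)])
```

Because a collection is only a set of names, deleting a collection never deletes a function, which is what makes the layer optional.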

cwmma, over 10 years ago

JavaScript works similarly to this, and apps/libraries that wrap themselves in a giant closure work almost exactly like this. The disadvantage of this compared with using modules lies in the dependencies between functions. When you don't have modules and you try to refactor, you get this annoying tendency for function a in file b to break when you change function y in file z. When you have modules, you can easily tell before changing a function whether it is exported or not and, if it is, check in the other file whether it is imported there.

Not saying this Erlang idea isn't good or won't work, just that these are the pitfalls besides the obvious namespacing and conflicts.


Verdex, over 10 years ago

I saw Joe's Strange Loop talk [1] a while ago, and I get the same vibe reading his post as I did when watching the video. It sounds very cool, but I can't shake the feeling that it only works for 85% of the code. That is to say, if you program in exactly the right way, you will be able to do everything you want and it will work with this system, but there are ways of programming that won't work with this system.

More specifically, I feel like there are two problems.

1) It feels suspiciously like there's a combination of halting problem and diagonalisation that shows there are an uncountably infinite number of functions that we want to write that can't be named (although I would want to have a better idea of how this is supposed to work before I try to hammer out a proof).

2) I don't understand how it's possible for any hashing scheme to encode necessary properties of a function such that the function with the necessary properties has a different hash than an otherwise identical function without these properties. For example, can we hash these functions such that stable sort looks different from unstable sort? Wouldn't we need dependent typing to encode all required properties? And if that's the case, couldn't I pull a Gödel and show that there's always one more property not encodable in your system?

[1] https://www.youtube.com/watch?v=lKXe3HUG2l4 [2]
[2] https://news.ycombinator.com/item?id=8572920 (thanks for the link)


derefr, over 10 years ago

A function's true name should be its content hash (where that content hash is calculated after canonicalizing all the call sites in the function into content-hash refs themselves). This way:

- functions are versioned by name
- a function will "pull in" its dependencies, transitively, at compile time; a function will never change behaviour just because a dependency has a new version available
- the global database can store all functions ever created, without worrying about anyone stepping on anyone else's toes
- magical zero-install (runtime reference of a function hash that doesn't exist -> the process blocks while it gets downloaded from the database). This is safe: presuming a currently-accepted cryptographic hash, if you ask for a function with hash X, you'll be running known code.
- you can still build "curation" schemes on top of this, with author versioning, using basically Freenet's Signed Subspace Key approach (sort of equivalent to a checkout of a git repo). The module author publishes a signed function which returns a function when passed an identifier (this is your "module"). Later, they publish a new function that maps identifiers to other functions. The whole stdlib could live in the DB and be dereferenced into cache on first run from a burned-in module-function ref.
- function unloading can be done automatically when nothing has called into (or is running in the context of) a function for a while. Basically, garbage collection.
- you can still do late binding if you want. In Erlang, "remote" (fully-qualified) calls don't usually mean to switch semantics on version change; they just get conflated with fully-qualified self-calls, which are explicitly for that. In a flat function namespace, you'd probably have to make late binding explicit for the compiler, since it would never be assumed otherwise. E.g. you'd call apply() with a function identifier, which would kick in the function metadata resolution mechanism (now normally just part of the linker) at runtime.

Plug: I am already working on a BEAM-compatible VM with exactly these semantics. (Also: 1. a container-like concept of security domains, allowing for multiple "virtual nodes" to share the same VM schedulers while keeping isolated heaps, atom tables, etc. [E.g. you set up a container for a given user's web requests to run under; if they crash the VM, no problem, it was just their virtual VM.] 2. Some logic with code signing such that calling a function written by X, where you haven't explicitly trusted X, sets up a domain for X and runs it in there. 3. Some PNaCl-like tricks where object files are simply code-signed binary ASTs, and final compilation happens at load time. But the cached compiled artifact can sit in the global database and can be checked by the compiler, and reused, as an optimization of actually doing compilation. Etc.) If you want to know more, please send me an email (levi@leviaul.com).
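
The content-hash naming at the top of this comment can be sketched roughly as follows. This is only an illustration under loud assumptions: SHA-256 is picked arbitrarily, `store` and `publish` are hypothetical names, and call sites are canonicalized by naive string replacement where a real system would rewrite the AST. It is not the VM described in the plug.

```python
import hashlib

store = {}  # content hash -> canonical source text (hypothetical global database)


def publish(source, deps=None):
    """Store a function under the SHA-256 of its canonicalized source.

    `deps` maps the local names used at call sites to the content hashes of the
    callees. Canonicalizing here is a naive string replacement of each name with
    "#<hash>"; a real system would rewrite the AST instead.
    """
    canonical = source.strip()
    for local_name, dep_hash in sorted((deps or {}).items()):
        canonical = canonical.replace(local_name + "(", "#" + dep_hash + "(")
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    store[digest] = canonical
    return digest


double_v1 = publish("def f(x): return x * 2")
double_v2 = publish("def f(x): return x + x")          # new content, new name
quad_v1 = publish("def f(x): return double(double(x))", {"double": double_v1})
quad_v2 = publish("def f(x): return double(double(x))", {"double": double_v2})
```

Since the callee's hash is baked into the caller's canonical text, `quad_v1` and `quad_v2` get different names even though their source is identical, which is the "never change behaviour just because a dependency has a new version" property from the list.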


protomyth, over 10 years ago

Lambda the Ultimate's discussion (http://lambda-the-ultimate.org/node/5079) is pretty interesting.

philbo, over 10 years ago

To answer the question in the title directly, I think modules are there to aid reading and discovery.

The fact that it is difficult to decide which module a function belongs in doesn't make modules pointless. People who have to read or debug your code use them to quickly zero in on areas of likely interest.

al2o3cr, over 10 years ago

In my experience, telling programmers "all functions must have unique names" means you get a half-assed module system tacked on via common prefixes. In other words, you get "foo_bar_function1", "foo_bar_function2", etc.

ryanisnan, over 10 years ago

While you're talking about Erlang specifically, the concepts you bring up can be applied to programming in general.

Why does Erlang (or any other language) have modules?

The biggest reason for me (and I think the one with the most merit) is clarity and usability.

Modules exist as ways of grouping units of code by the responsibilities of that code. If you removed this hierarchy, wouldn't things become a lot more difficult to navigate and understand as a developer?

brianshaler, over 10 years ago

Is the author's use of the term `module` specific to Erlang? To me, it sounds like he's advocating for modules that are comprised of a single function, rather than utility-belt modules that contain many functions. As I understand it, I agree with what the author proposes, and I feel like a subset of npm already provides what he's talking about. The best example is probably underscore.js versus lodash.js, which both have many functions and a wide API surface area. What's notable is that you can cherry-pick individual lodash functions and depend on a specific version [0]. (Admittedly, I lazily pull in the full lodash module instead of importing only the functions I'm using.)

Lately, I've been moving more toward the proposed design in my Node.js projects. It keeps individual files concise, makes code sharing trivial, encourages stateless methods, and makes writing tests a breeze.

[0] https://www.npmjs.org/browse/keyword/lodash-modularized

Alex3917, over 10 years ago

This is basically what Urbit is doing, among other things.


tel, over 10 years ago

The problem is that now you either have zero data abstraction or uncontrolled data abstraction, without even a convention like "these functions work together as a bundle" to save you.

That said, a nice SML module probably could work as the base abstraction here.

rymohr, over 10 years ago

The problem with this approach is that you need to consider every existing function name in order to define a new one.

The beauty of CommonJS modules is that they allow you to focus on implementation rather than identification. All functions can be anonymous, identified only by their path and named at the whims of the caller.

endergen, over 10 years ago

Related to this would be all the cool content-addressable third-party metadata. Services could automatically generate pre-compiled versions of things or alternate optimizations. Or autocomplete data, statistics, test suites, behavioral diffing, example code, documentation. The options are endless.

jbert, over 10 years ago

So, immutability and/or an API contract is important here.

If I'm pulling in a function, I want it to do what I think I want. Sometimes I want that to change (to get a bug fix), but sometimes I don't (someone introduces a bug, or makes the func more general and introduces slowdowns).

This feels like a job for a content-addressable git-like tool. How about this:

I can discover my function (via whatever means). The function is actually named 8804ea505fda087da53b799434c377f015933707 (the sha-something of its (normalised?) textual representation).

I then import it into my codebase as "useful_fun". My code reads like:

    useful_fun("do it", "to it")

but I have some kind of dependencies/import record which says that "useful_fun" is actually 8804ea505fda087da53b799434c377f015933707. That means one and only one thing across all time: the func with that hash.

So how do we handle updates? If we want a golang-like model, the developer could run something like "update deps". This would:

- go back to the central repository, looking for updates to 8804ea505fda087da53b799434c377f015933707. It might find 5. Local policy then determines what happens. Could be "always choose the original author's update" or "choose the one with the most votes" or "always ask the dev, showing diffs". Note that because the unique name is based on the function content, any change to it creates a new item in the db. (Content-addressability, the same way git and other systems do it.)
- stuff can be grouped and batched. If I pull in 10 functions tagged with the same project ('module') and they've all been updated, I can say "and do the same with all the others".
- This kind of metadata allows all kinds of good stuff. I can subscribe to alerts on the functions I've imported and get told about new versions, or security warnings. This kind of subscription information can be used as a popularity contest to solve the "which fork on github do I want to use" problem.
- people can still publish modules. They now look like a git directory or tree. A git tree is an object which lists the hashes of the files within it. A 'module' could be a blob which specifies which (immutable) functions are in it.

If we use normalised functions, we've now got a module representation which allows arbitrary functions to be pulled together. At fetch time, we can denormalise into the user's preferred coding style. At push time, we renormalise. We aren't grouping stuff into files, so a 'project' or a 'module' consists solely of the semantic contents, nothing to do with artificial grouping for the file system.

Seems like an interesting future.
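
The import record and the "update deps" step described in this comment could look something like the sketch below. It rests on several assumptions that are not in the comment: SHA-1 is chosen only because the example name is 40 hex characters long, normalisation is skipped, and `repo`, `push`, `candidates`, `update_deps` and the two policy functions are hypothetical names for an in-memory stand-in for the central repository.

```python
import hashlib


def content_name(source):
    """Name a function by the SHA-1 of its text (normalisation skipped here)."""
    return hashlib.sha1(source.encode("utf-8")).hexdigest()


# Hypothetical central repository: hash -> record with author, votes and parent.
repo = {}


def push(source, author, votes=0, updates=None):
    """Publish a function; `updates` is the hash it claims to supersede, if any."""
    name = content_name(source)
    repo[name] = {"source": source, "author": author, "votes": votes, "updates": updates}
    return name


def candidates(pinned):
    """All published functions that declare themselves an update of `pinned`."""
    return [h for h, rec in repo.items() if rec["updates"] == pinned]


def most_votes(pinned, options):
    """Policy: choose the update with the most votes."""
    return max(options, key=lambda h: repo[h]["votes"])


def original_author(pinned, options):
    """Policy: only accept an update from the original author, else stay pinned."""
    same = [h for h in options if repo[h]["author"] == repo[pinned]["author"]]
    return same[0] if same else pinned


def update_deps(imports, policy):
    """Return a new import record; the old hashes keep meaning what they always meant."""
    updated = {}
    for alias, pinned in imports.items():
        options = candidates(pinned)
        updated[alias] = policy(pinned, options) if options else pinned
    return updated


v1 = push("def f(a, b): return a + b", author="alice")
fix = push("def f(a, b): return b + a", author="alice", votes=2, updates=v1)
fork = push("def f(a, b): return (a + b)", author="bob", votes=9, updates=v1)

imports = {"useful_fun": v1}
```

Calling `update_deps(imports, most_votes)` repoints the alias at bob's fork, while `update_deps(imports, original_author)` picks alice's fix; in both cases the original record still maps `useful_fun` to `v1`, so nothing changes until the developer adopts the new record.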


the_cat_kittles, over 10 years ago

this talk about modules as a way to organize similar code makes me wonder: if you had all the functions in a global namespace, you could probably automatically generate some kind of organization by extracting relevant features from each function and doing some kind of clustering. maybe some features could be the function's dependencies, who depends on it, what it returns, its signature, and maybe even nlp, in the hope that people are actually using descriptive variable names.
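
As a rough illustration of the clustering idea, here is a sketch that uses only one of the suggested features (each function's set of dependencies) and a deliberately simple greedy grouping by Jaccard similarity. The function names, the feature sets and the 0.5 threshold are all made up for the example; a real attempt would likely combine more features and use a proper clustering algorithm.

```python
def jaccard(a, b):
    """Similarity of two feature sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster(features, threshold=0.5):
    """Greedy single-pass clustering: join the first cluster whose seed is similar enough."""
    clusters = []  # list of (seed_features, [function names])
    for name in sorted(features):
        for seed, members in clusters:
            if jaccard(seed, features[name]) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((features[name], [name]))
    return [members for _, members in clusters]


# Hypothetical features: the set of functions each function depends on.
features = {
    "http_get": {"socket_open", "parse_url", "read_all"},
    "http_post": {"socket_open", "parse_url", "write_all"},
    "matrix_mul": {"dot", "transpose"},
    "matrix_inv": {"dot", "transpose", "determinant"},
}
```

With these inputs, `cluster(features)` groups the two HTTP helpers together and the two matrix helpers together, which is roughly the "module" a human would have chosen by hand.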

fat0wl, over 10 years ago

isn't this issue sort of analogous to the expansion/contraction of a language core?

Except in this case the core is user-generated and ever-expanding.

I bet there are a lot of issues in Java history that could predict possible bumps in the road for such a system (since it was essentially concurrently designed by a bunch of actors -- except in that case they were corporate entities)

hyp0, over 10 years ago

reminds me of gmail: instead of hierarchical directories ("modules"), just search, and have multiple tags, so an email can be in more than one directory ("metadata").

Seems especially applicable to fp (like erlang), where code reuse is more often of small functions.

moron4hire, over 10 years ago

I think what you're discussing is really just namespacing à la C++, Java, or .NET. Especially with Java and .NET, you don't import a self-contained module directly from individual source files. The modules are technically all accessible at any time (or at least, the ones linked in to the build, which in the case of the Java and .NET standard libraries is quite a smorgasbord). You just reference the class/function you want in some way: either with using statements or with fully qualified names.

Because, really, if you start throwing everything into one store, you're going to run into the naming conflict issue, and any attempt at addressing the naming conflict issue is going to either look like importing modules or look like namespaces. You either have to explicitly state what your program has access to, or you explicitly state what function you mean when you have access to everything. Realistically, if you give every function a unique name and don't use namespaces, then there will start to be functions called system_event_fire() and game_gun_fire() and disasters_house_fire(), and you're right back to having namespaces, just not in name or with a syntax that makes things nice when you know you're dealing with specific things.

Though, it'd be nice if types weren't the only thing that could be placed into a namespace directly in .NET. I'd like to put free functions in there. The Math class in the System namespace only exists because of this. I'd have preferred there to be a System.Math namespace with Cos and Sin as members of it. Then I could write "using System.Math;" and call "Cos(angle)". Instead, I'm stuck in a limbo of half-qualified names.

And I like it. I like it a lot more than Python, Racket, Node.js, etc. and having to import this Thing X from that Module Y. I like the idea that linking modules together is defined at the build level, not at the individual source file level. These languages are supposed to be better for exploratory programming than Java and C#, but actually, you know, doing the exploring part is harder!

Sometimes, I really do just want to blap out the fully qualified name of a function, in place in my code. System.Web.HttpContext.Current.User. If I'm doing something like that, it's a hack, and I know it's a hack, and having the fully qualified name in there, uglying it up, makes it clearer that it's a hack. Though, I suppose I'm one of the rare people who actually do go back and clean up their hacks.

EDIT: I thought I wrote more, weird.

The network-accessible database of every library, ever, is definitely a great idea. I think it's where we're heading, with tools like NPM, NuGet, etc. It seems like a natural progression to move the package manager into the compiler (or linker, rather, but that's in the compiler in most languages now). Add support in an editor for code completion lists that include a search of the package repository, and you're there.

zo1, over 10 years ago

I don't know Erlang, so I might be missing something key here.

"I am thinking more and more that it would be nice to have all functions in a key_value database with unique names."

Yeah, sure... Sounds good, right. Until you have naming conflicts.

So then the patch is "oh, let's just add another column to make it more unique", without realizing that you've just, in essence, created a "module" of sorts, except it's stored in some sort of giant key/value database.

And then you've come full circle, back to the dilemma the author complains of, which is that he doesn't know where to put a function that seems to belong in two modules.

Ultimately, I'd say this is a general failing of modules that could potentially be solved by some sort of inheritance. Maybe even a tagging mechanism if you really want to be "patch-work joe" about it.
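
To make the "add another column" argument concrete, here is a small sketch under made-up assumptions: a plain dictionary plays the key/value database, and `put`, `names_in`, `NameConflict` and the example keys are hypothetical. It shows the collision in a flat store and how a composite key ends up behaving like a module listing.

```python
class NameConflict(Exception):
    """Raised when a key is already taken in the function database."""


def put(db, key, func):
    """Insert a function under a unique key, refusing to overwrite."""
    if key in db:
        raise NameConflict(key)
    db[key] = func


# Flat database: the bare name is the whole key, so the second "fire" collides.
flat = {}
put(flat, "fire", lambda: "event fired")
try:
    put(flat, "fire", lambda: "gun fired")
    collided = False
except NameConflict:
    collided = True

# The "patch": add a column. The key is now (namespace, name).
keyed = {}
put(keyed, ("system_event", "fire"), lambda: "event fired")
put(keyed, ("game_gun", "fire"), lambda: "gun fired")


def names_in(db, namespace):
    """Listing one value of the extra column looks exactly like listing a module."""
    return sorted(name for ns, name in db if ns == namespace)
```

The composite key also reproduces the original dilemma: each function still has exactly one value in the extra column, so a function that fits two namespaces has to pick one.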


tracker1, over 10 years ago

dibs on create_uuid_v4!!