Ask HN: Code should be stored in a database. Who has tried this?

22 points by vaughan about 2 months ago

To me it seems obvious that code should be stored in a database rather than a hierarchical, text-based format.

The main way we navigate and organize code is by folder hierarchies. Everyone has a different approach: by feature, by module, by file type (template, component, etc.), by environment (backend/frontend).

Rather than folders and file names, everything could just be tagged in different ways.

Who has tried this and what is the best tool for working like this today?
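For readers who want something concrete to poke at, the tagging idea in the post could be sketched roughly as follows. This is only an illustration, not an existing tool: the table names (`unit`, `tag`) and the helper functions are made up for the example, and SQLite is used simply because it ships with Python.

```python
import sqlite3


def open_store():
    """Create an in-memory store: code units plus free-form tags (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE unit (id INTEGER PRIMARY KEY, name TEXT UNIQUE, source TEXT);
        CREATE TABLE tag (unit_id INTEGER REFERENCES unit(id), label TEXT,
                          PRIMARY KEY (unit_id, label));
    """)
    return db


def add_unit(db, name, source, tags):
    """Store one piece of code and attach any number of tags to it."""
    cur = db.execute("INSERT INTO unit (name, source) VALUES (?, ?)", (name, source))
    db.executemany("INSERT INTO tag (unit_id, label) VALUES (?, ?)",
                   [(cur.lastrowid, t) for t in tags])


def find_by_tags(db, tags):
    """Return the names of units that carry every tag in `tags`, sorted by name."""
    tags = list(tags)
    marks = ",".join("?" for _ in tags)
    rows = db.execute(
        f"""SELECT u.name FROM unit u JOIN tag t ON t.unit_id = u.id
            WHERE t.label IN ({marks})
            GROUP BY u.id HAVING COUNT(DISTINCT t.label) = ?
            ORDER BY u.name""",
        (*tags, len(set(tags))))
    return [r[0] for r in rows]
```

The same unit can then show up under "frontend", "auth" and "template" at once, which is the part a single folder hierarchy cannot express directly.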

36 comments

igouy about 2 months ago

"ENVY/Manager augments this model by providing configuration management and version control facilities. All code is stored in a central database rather than in files associated with a particular image. Developers are continuously connected to this database; therefore changes are immediately visible to all developers."

https://www.google.com/books/edition/Mastering_ENVY_Developer/ld6E19QIMo4C?hl=en&gbpv=1&pg=PA1&printsec=frontcover

~pdf 1992 Product Review: Object Technology’s ENVY Developer

http://archive.esug.org/HistoricalDocuments/TheSmalltalkReport/91_95/SMAL0202.PDF

Lutger about 1 month ago

You don't have to store things in a database to do this. Code is almost always read from disk into some kind of in-memory data structure that is amenable to such analysis, maybe even more so than a generic database. Doesn't matter if you use vscode or vim, most developers have some kind of tool that does semantic analysis and which affords navigation and organization of code.

It's just that the main way code editors present navigation follows the path hierarchy, also because it's often intimately tied to how programming languages shape modules. Most editors have at least some alternative navigation however, and most people are using at least some of them: outlining by declaration symbols, search, changes, unit tests, open files, bookmarks, etc.

So in a way, this is already how it is done, except the 'database' part is really tied to the code editor and its storage component nicely decoupled (in the end, databases are usually also just a bunch of files).

I think any real improvements on this model can only come from a new programming language design, and as others have pointed out, this hasn't caught on in the past. The reason for this is probably not that file-oriented modularity is the best thing there is, but rather the escape velocity needed to get out of the vast ecosystem of tooling around files, like the OS, git and existing code editors and whatnot.

827a about 1 month ago

1. The biggest advantage of storing on the filesystem is that files are the core primitive on UNIX-based operating systems, and are core enough on Windows. Giving that up would require tremendously good reasons.

2. Everyone organizes projects into folders differently, but in most languages the only reason you organize things into folders is to make it easier for humans to find things. The computer doesn't care where the files are stored. So you're proposing to give up a feature that exists solely to make it easier for humans to find things; it's extremely difficult to envision a world where this results in better ergonomics.

3. The hierarchy is only one way of thinking about how you can browse a filesystem. There is nothing stopping editors from indexing files in different ways, allowing you to browse by, for example, files tagged with some comment at the top, or files which contain classes versus interfaces. In fact, more comprehensive IDEs like JetBrains already do this for some languages. You don't have to change the storage substrate to get 100% of the benefits you propose, with the extremely small cost of some indexing process when a project is first opened.

4. There are twenty billion programming languages out there, like four of them do what you're suggesting, yet no one uses any of them for anything meaningful. There is nothing, period, stopping any new language from doing something like this. Golang could have been designed like this; but it wasn't.

jrjsmrtn about 2 months ago

Hmmm... Smalltalk, a pure object-oriented language, stores everything in an image, and has tons of different browsers to inspect its "object soup". Install a Squeak Smalltalk if you're curious :-)

Userland Frontier was a wonderful scripting environment born on macOS and ported to Windows. It was a mix of an object database, storing code and data, an extensible scripting language called UserTalk, and very powerful inter-application capabilities, based on Apple's Open Scripting Architecture. Dave Winer, its author, worked on the XML-RPC standard afterwards.

CLPadvocate about 2 months ago

SAP did it - they store the code in the database: https://www.reddit.com/r/SAP/comments/jsgb1c/where_and_how_is_abap_code_actually_stored_in_an/

lutzh about 1 month ago

In the Unison language, code is stored in a database, with a hash code of its content as the key. Quoting https://www.unison-lang.org :

"A new approach to Storing code. Other tools try to recover structure from text; Unison stores code in a database. This eliminates builds, provides for instant nonbreaking renames, type-based search, and lots more."
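To make "a hash of its content as the key" a bit more tangible, here is a toy content-addressed store. It is a sketch under loud assumptions: it hashes the raw source text with SHA-256, which is a simplification chosen for the example and not a description of how Unison computes its hashes, and the `CodeStore` class and its table names are invented here.

```python
import hashlib
import sqlite3


class CodeStore:
    """Toy content-addressed code store (hypothetical; not Unison's implementation)."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        # Definitions are keyed by the hash of their content.
        self.db.execute("CREATE TABLE definition (hash TEXT PRIMARY KEY, source TEXT)")
        # Names are just pointers to hashes, kept in a separate table.
        self.db.execute("CREATE TABLE name (name TEXT PRIMARY KEY, hash TEXT)")

    def put(self, name, source):
        """Store `source` under its SHA-256 digest and point `name` at it."""
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        self.db.execute("INSERT OR IGNORE INTO definition VALUES (?, ?)", (digest, source))
        self.db.execute("INSERT OR REPLACE INTO name VALUES (?, ?)", (name, digest))
        return digest

    def rename(self, old, new):
        """Renaming touches only the name table; the stored definition is unchanged."""
        self.db.execute("UPDATE name SET name = ? WHERE name = ?", (new, old))

    def get(self, name):
        """Resolve a name to its source, or None if the name is unknown."""
        row = self.db.execute(
            "SELECT d.source FROM name n JOIN definition d ON d.hash = n.hash "
            "WHERE n.name = ?", (name,)).fetchone()
        return row[0] if row else None
```

Because anything that refers to a definition by hash is unaffected when a name changes, a rename in this model is a single-row update, which gives some intuition for the "nonbreaking renames" claim in the quote.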

cdirkx about 1 month ago

I don't think this is unique to code, but a limitation of filesystems in general. You could make the same argument for photos: I want them sorted by date, by tag, by person in the image, by location.

I can do this in Lightroom or my "Photo" app, but then you are always reliant on some third-party tool. It would be nice if there was some native way for files to not have to commit to a single hierarchy, but be able to switch views on the fly (without it being insanely slow for larger numbers of files).

unilynx about 1 month ago

We did this for a long time for our CMS - although we did simulate a filesystem structure. We also set up a git-like system to store versioning information and set up WebDAV to mount it all and allow direct source code editing. It worked pretty well for years.

We eventually stopped because we were relying much more on external tools (e.g. npm, webpack) which had all sorts of issues over WebDAV mounts. Maintaining all this code management infrastructure in parallel wasn't worth it in the end, and we moved the code back to disk, switched to git, etc.

And Photoshop silently ignoring WebDAV I/O errors when saving designs didn't help either.

You already have tagging by type on the filesystem - the file extension. That allows you to limit file searches. Add extra metadata to extensions if the same extensions have different roles (.backend.ts, .frontend.ts, .html.template, .text.template).

These days I prefer to structure for easy removal of code - everything for e.g. a widget (frontend, backend, css) goes into a folder and I only need to remove that folder when the widget is retired, and linting/validation will show me the few remaining path references I need to clean up.

andrewaylett about 1 month ago

I do store all my code in a database. It's got time-travel functionality, the ability to switch into parallel universes, and a nice hierarchical view that lets me find things easily if I don't want to use my language-specific indexes.

Yes, that's git, a filesystem, and an IDE -- and the physical layout of the code isn't the way I normally navigate it. It's useful structure for the tooling, though.

It's definitely true that "using git" or "putting our code on the filesystem" aren't ends in themselves, they are means to an end. If we found a way to meet our requirements that has fewer trade-offs than git then I'm sure we'd jump. Git and filesystems are possibly the worst options for organising code and history, except for all the other options out there :P.

r24y about 1 month ago

That's basically what an LSP is. It's true that it's built on top of the file system, and most IDE users will navigate using the folder hierarchy, but it still stores information about the name, type, and connectedness of the codebase, and allows querying. Your idea about arbitrary tags (feature, environment) would be useful but does not seem to be supported by the spec [^1] yet.

[^1]: https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#languageFeatures

codingdave about 1 month ago

Lotus Notes did that. The database held the code, the data, the UX, the security. There was a standard UX for accessing different types of code, design elements, and the data.

On the positive side, DevOps was a breeze - push a DB to a server and everything just worked. Pushing new code to all the DBs was a breeze. Any dev could immediately jump into an app and have a sense of where they would find elements of the app. All apps ran the same way, so it was realistic for small shops to deliver large products.

On the downside, source control was sub-optimal. That was a weakness in the platform even 25 years ago when it was modern, and never quite improved... although there are ways to import/export the code to make it work with modern source control like git. It also made each app heavier than it needed to be - instead of sharing centralized code, each app had its own copy. Your infrastructure footprint got big, fast.

For a modern take on it, I think other comments are hitting the key point - you might want to have fuzzier definitions of what a database and a file system are. At the end of the day, they are both ways of storing data to disk with different access methods. But it sounds like you are more concerned about DX. To get to your vision, I'd focus more on an IDE that lets you navigate code how you desire, while leaving the actual code storage as a DevOps exercise where they can focus on whatever solution optimizes delivery and reliability.

jFriedensreich about 1 month ago

I looked into four interesting incarnations of this over the years:

1. During the peak phase of CouchDB as application server (2006 - 2009) it was common to store not just the data but all the app assets and code in the database and replicate everything together. Plenty of the community tried to bring this to the extreme with every function being stored as a versioned document (I see it as a precursor to FaaS) and the whole application being editable with an integrated IDE. Also, functions in my incarnation of this system were not loaded by filename but with a content-addressed manifest. You would reference functions by name but the name would be resolved with a hash manifest.

2. There were several systems with Erlang/BEAM to take the hot code replacement to the extreme in a similar way, storing code in, I believe, Mnesia.

3. I think Bloomberg (I cannot find the HN post to confirm it was them, if someone has the link that would be great) has/had a bespoke code database with custom version control and fully integrated IDE. They leveraged this for some pretty interesting workflows.

4. Probably not exactly what you mean as it does not include the runtime integration, but Google and Sourcegraph are building code databases with indices on symbols and semantic understanding of references and more. I hear great things from people who worked with it especially

movpasd about 1 month ago

I can think of an argument for justifying the status quo.

The folder structure reflects the subdivision of code into modules. Each module may have submodules, and each module decides the visibility of its children to other modules at the same level as itself, and to its own supermodule. This is a naturally hierarchical structure, which file systems lend themselves well to. A code database would have to replicate this structure within it somehow anyway.

A non-hierarchical tag system would help model situations where you have multiple orthogonal axes along which to organise the code (as you point out). But in these cases, which axis gets the top-level hierarchy just doesn't matter. Pick one, maybe loosely informed by organisational factors or by your problem conceptualisation.

On the flipside, in situations where a stricter hierarchy would improve modularity, the tag system might _discourage_ clean crystallisation, and cause responsibilities to bleed into each other. IMO, it's more important for there to be modules at all than for their boundaries to be perfect.

sim7c00 about 2 months ago

What problem should it solve? You can store anything in a db, fetch it and run it: binaries, what not, parts of them, web components.

I'd ask, is fetching code really a bottleneck? Maybe in some systems or types of execution environments it could be worth it. Really don't know.

I'd assume data is stored in databases because it needs to be viewed from different angles (join statements etc.) and it has different performance and layout requirements.

Most code is 'read only' too, so there's no need to do stuff like synchronization / locking on writes and ordering stuff.

Then again, there might be systems that don't have this aspect, and somehow have very high load on fetching code, and maybe code is writable too, and could have queries to extract certain parts of code, or combined code from various files/tables.

I think, though, the main reason it works like it works in the general case is this difference between how code and data are fetched and used. It's not been needed to work differently, so no one looked for a solution. No big problems in the space. (My guess.)

tony-allan about 2 months ago

What benefits do you expect from this approach compared to the huge number of tools that work very well with folders and text files?

mickael-kerjean about 1 month ago

I have been playing with the exact opposite, representing a database as a file structure where databases show up as top-level folders, tables are subfolders, and each row appears as a form-like file automatically generated from the schema. You can see a screenshot of such a form in [1], which you can edit and save back, effectively enabling anyone familiar with Dropbox to edit data on a database, as it just looks like a form to fill in.

The project is oss [2] and the storage connector is "mysql". It even handles foreign keys by creating links to another folder with a search query to find the table row it's associated with.

[1] https://i.imgur.com/OBJGIeg.png

[2] https://github.com/mickael-kerjean/filestash
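The "database as a folder tree" mapping described in this comment can be approximated in a few lines. The sketch below is not how filestash implements it: it uses SQLite instead of MySQL so it runs anywhere, skips the database level (one connection, so tables are the top-level folders), ignores foreign-key links, and the `browse` function and the `.form` suffix are names invented for the example.

```python
import sqlite3


def tables(db):
    """List table names; these play the role of folders."""
    rows = db.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [r[0] for r in rows]


def browse(db, path):
    """Resolve a path: '/' lists tables, '/table' lists rows, '/table/N.form' returns one row."""
    parts = [p for p in path.split("/") if p]
    if not parts:
        return tables(db)
    table = parts[0]
    # Only interpolate table names that really exist in the schema.
    if table not in tables(db):
        raise FileNotFoundError(path)
    if len(parts) == 1:
        rows = db.execute(f'SELECT rowid FROM "{table}" ORDER BY rowid')
        return [f"{r[0]}.form" for r in rows]
    rowid = int(parts[1].split(".")[0])
    cur = db.execute(f'SELECT * FROM "{table}" WHERE rowid = ?', (rowid,))
    row = cur.fetchone()
    if row is None:
        raise FileNotFoundError(path)
    # The "form" is just column name -> value, derived from the schema.
    return dict(zip([c[0] for c in cur.description], row))
```

Saving an edited form back would be the reverse step, an `UPDATE` built from the same column names, which is left out here.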

Twey about 1 month ago

From two directions:

There are programming languages that store code in some kind of non-hierarchical format. For example, Unison (https://www.unison-lang.org/) stores code in a database just as you suggest, and projects it down to text for editing. A more established example is probably Smalltalk, which stores the code as part of an image that is edited live in the Smalltalk environment.

On the other side, you can have filesystems that are not hierarchical, for example semantic filesystems like Tagsistant for Linux; these can be used for more flexible relationships between any kind of file, not just code.

MarceColl about 1 month ago

I implemented something similar for Common Lisp: https://github.com/marcecoll/rekishi

The idea was that you don't have files, just functions that you can bring in and out of scope while editing. You have branches per-function. This all worked more or less transparently to the user using the normal emacs Sly Common Lisp flow.

It was implemented overriding the +DEFUN+ macro, so function re-definitions automatically serialized and created a new entry in the DB.

The Proof-of-Concept used SQLite, but I also envisioned a Postgres-backed version for jamming on programs with your friends in real time.
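A rough Python analogy of "every redefinition becomes a new DB entry" might look like the following. It is not a port of rekishi and says nothing about its actual schema or branching model: `FunctionDB`, `define`, `history` and `checkout` are hypothetical names, and an explicit `define` call stands in for the overridden macro.

```python
import sqlite3


class FunctionDB:
    """Keep every version of every function definition in SQLite (illustrative only)."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE version (id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "name TEXT, source TEXT)")
        self.namespace = {}  # the live "image" the functions are defined into

    def define(self, name, source):
        """Evaluate `source` and record it as the newest version of `name`."""
        exec(source, self.namespace)
        if name not in self.namespace:
            raise ValueError(f"source does not define {name!r}")
        self.db.execute("INSERT INTO version (name, source) VALUES (?, ?)", (name, source))
        return self.namespace[name]

    def history(self, name):
        """All recorded sources for `name`, oldest first."""
        rows = self.db.execute(
            "SELECT source FROM version WHERE name = ? ORDER BY id", (name,))
        return [r[0] for r in rows]

    def checkout(self, name, index):
        """Bring an earlier version back into scope without recording a new entry."""
        exec(self.history(name)[index], self.namespace)
        return self.namespace[name]
```

Per-function branches could be added as an extra column on `version`; that part is left out to keep the sketch short.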

beardyw about 2 months ago

At first I couldn't see the point of it. But perhaps one could visualise a quite granular approach to code with rows corresponding to short blocks or lines of code. Worth considering, but would need a revolution in coding practice!

9rx about 1 month ago

> To me it seems obvious that code should be stored in a database

Where are you storing code if not in a database?

> rather than a hierarchical, text-based format.

Okay, so you mean not a hierarchical database, but rather a... Relational database, I guess?

> The main way we navigate and organize code is by folder hierarchies.

Organize I can buy, I suppose. But I navigate by AST representation (as provided by an LSP in this day and age). It turns out code is a database too!

> Rather than folders and file names, everything could just be tagged in different ways.

So you are looking for WinFS? While it suffered from many technical issues, its biggest problem is that users really didn't gain much from it.

rl1987 about 1 month ago

There's CodeQL, but it seems to be mostly limited to security research (code review automation to find vulns). See: https://codeql.github.com/

AndrewDucker about 1 month ago

In order to uniquely reference one piece of code from another it needs to have a unique name/namespace/reference. Whatever organising principle you use for that will tend to become the hierarchy that your code is stored by.

This doesn't stop you from also accessing it in other ways. And with modern IDEs you can search across a fairly chunky codebase near-instantly, which would allow you to treat it as if it's in a database.

mamcx about 1 month ago

Yes.

And the keyword here, `database`, need not mean the typical one. In fact, most tools (git, IDEs, parsers, etc.) are databases over the code. POOR ONES, probably in the sense of 'any program contains a poorly implemented half of Lisp', but intentionally creating a database interface with a relational (enhanced!) view and deliberate CRUD + queries makes a lot of sense.

fergie about 1 month ago

You are talking about taxonomy, and specifically multifaceted classification. In practice a module system with unique names is sufficient, like for example npm.

However, it's worth noting that all of the systems that rely on databases to store code (SharePoint, SAP, Power Platform) suck haaaaaaard, mainly due to issues with versioning and configuration management.

alex7o about 1 month ago

Isn't the filesystem also a KV database? Depending on the filesystem, some even have versioning and deduplication. Although I agree a language focused on being stored in a SQL database would offer new capabilities for versioning not available in stuff like git or svn/hg. With that, new challenges will also arise that will need to be explored.

willdealtry about 1 month ago

Slang code at Goldman Sachs is all stored in a database, which is very useful 'cos if someone is having a problem with part of the infra that you're responsible for, you can access their scripts and dependent libraries (assuming they decide to make them public).

barotalomey about 1 month ago

> To me it seems obvious that code should be stored in a database rather than a hierarchical, text-based format.

So, granted a filesystem does exhibit CRUD, and hierarchical relations, it's already a relational database.

I take this as you are arguing about the utility of a text-based format?

kazinator about 1 month ago

Not only should code structure not be modeled in a database, even textual code shouldn't be stored in a database. We had version control systems that used databases, and they all sucked. A lot of that suckage had to do with the database.

oliviergg about 1 month ago

The code is meant for humans, and in my view, the database is not really for humans. There are plenty of tools in IDEs that transform code into a 'database,' and in 20 years I've never really needed to use them on a daily basis. Except for getting familiar with legacy code, and even then to create a mental map, the database seems like overkill to me. That said, I should note that I've primarily worked on startup or business codebases, not FAANG or equivalent ones.

Edit: Reformulation because I was voted down: Why change the storage format if the IDE already manages it? And I add, for storing in a database, you have to think about the granularity of your data, and it rapidly becomes the line, if not the character. Working daily with code stored in a database (Salesforce), where the granularity is the class, is really a nightmare from a content versioning point of view.

mikequinlan about 1 month ago

IBM Visual Age for Java was the first product I used that did this. The problem was that all utilities and other processes required access to the source code, which had to first be exported from the database then re-imported.

compsciphd about 1 month ago

https://en.wikipedia.org/wiki/Source_Code_in_Database

attila-lendvai about 1 month ago

the key idea is to admit/realize that a program is a graph, and that a flat string of characters is not an ideal way to store such complex structures.

once that leap is made, a whole lot of the complexities of namespaces, modules, source control, and parsing become much simpler/better. this comes at the cost of more complexity in the editor/infrastructure, but that is a singular place while in return it is simplifying every program written.

attila-lendvai about 1 month ago

there's this interesting proof of concept written in common lisp:

https://github.com/projectured/projectured

the depth work is almost done. IIRC there are only a couple of nontrivial issues left, but it's been abandoned.

octocop about 1 month ago

> text-based format

I'm sorry, I don't read binary files.

compressedgas about 2 months ago

Don't need to put the code in a database to do that. You can do that entirely with specially formatted comments and a projectional editor.

remyp about 1 month ago

Maybe I’m being pedantic, but isn’t a database ultimately a bunch of files on disk? Unless you’re using a pure in-memory DB?
