Codebase as Database: Turning the IDE Inside Out with Datalog (2020)

258 points by rohitpaulk over 2 years ago

12 comments

The biggest challenge with incrementality in the IDE (I'm not even talking about incremental parsing or whatever) is that you're subject to the whims of a given language's build system and how it behaves on multi-million LoC codebases. With sufficient human suffering thrown at the problem, you can approximate decent incrementality, but it ain't easy.

It's not written about anywhere, but there is a long and storied history of trying different takes on "IDE Incrementality" for the Visual Studio IDE, all in service of making it a more palatable experience for people working in utterly massive codebases. Most failed, and the one that stuck (which I worked on for a bit, yay!) still has some weird issues with the experience where an end-user "knows what they want" and gets bothered by the fact that stuff is still getting initialized, and they may or may not get what they want in a completion list. It's better than the legacy behavior, where the IDE loads context for all things up front before freeing up the UI (which would mean 10+ minute IDE loading screens for some users). But it's an inherently flawed system.

bjconlan over 2 years ago

This is a well-composed idea. It reminds me slightly of (Rich's?) Codeq (https://github.com/Datomic/codeq), although codeq only outlines code/scm relationships and not syntax trees etc. I think I was always hoping codeq would add something like this (for doing what you are doing to validate forms), but the input mechanism probably needed more hammock time.

it4rb over 2 years ago

Martin Odersky (lead designer of Scala) also pursued this approach some years ago; I'm not sure if any of it actually went into Scala 3. He has mentioned this in many talks, such as https://www.youtube.com/watch?v=WxyyJyB_Ssc (JVMLS 2015 - Compilers are Databases).

phyrex over 2 years ago

Meta uses something very much like this in production. It's open sourced at https://glean.software/

jillesvangurp over 2 years ago

It's not a new approach but actually a very old one. The author mentions IntelliJ but fails to note the history of Eclipse which, via IBM's Visual Age, you can actually trace back all the way to Smalltalk, which arguably had one of the first modern IDEs three decades ago.

Smalltalk programs were stored in a database. There were no .sm source files on disk anywhere. It had a refactoring browser, which was a tool that allowed you to restructure the code. This came straight out of the first research papers on refactoring. It came with a class hierarchy browser as well. It was pretty amazing for the time. Smalltalk was very clever about exposing its internals to developers. Refactoring, introspection, reflection etc. were all things that it supported out of the box. And metaprogramming, which is something that came out of the Lisp community, was also supported of course.

Visual Age was basically built around IBM's tooling for Java, which included an incremental compiler they built. A lot of the people involved with that had a Smalltalk background (it actually also was a Smalltalk IDE). It stored code in a database. Later, they created Eclipse, which dropped the database but kept a lot of the compiler internals that allowed it to be way faster at refactoring and working with Java code than IntelliJ ever was. I use IntelliJ every day and I still miss the two orders of magnitude difference in speed that Eclipse used to have. You must think I'm exaggerating. I'm not. Eclipse would be able to compile and be ready to run your code in between key presses. We're talking milliseconds here. In IntelliJ it's never less than 4-5 seconds and usually a lot more. Even on an M1. I have one. It helps. Faster is better. But it's still slow. It's just architected wrong to be that fast. It would need to internalize the compiler for it to be that fast and it just never did that. It relies on external build tools running via forked processes. It mitigates with a lot of (flaky) caching.
That's why it has a top-level menu option labeled "Invalidate Caches". Because cache coherence is a hard problem and they have plenty of bugs related to that.

So, it's an old idea and a very good idea. The ultimate version of this idea would be intentional programming (https://en.wikipedia.org/wiki/Intentional_programming), where the core idea is that programming is manipulating abstract syntax trees and that text is a mere serialization of that syntax tree. With intentional programming, you use tools (including editors) that do things with that syntax tree. Sadly, Simonyi, the person who came up with this, never really got any traction with the company he founded out of Microsoft for this.

However, modern compiler design is finally starting to acknowledge that there is value in being IDE friendly. A compiler necessarily has to do a lot of the same things an IDE does but with very different requirements. Simply running a compiler from an IDE is slow and leads to a lot of repeated work. A much better approach is a compiler that exposes its internals to the IDE directly and runs incrementally. Like IBM did with Java in the late nineties. The Rust community has also been working on this lately and it's a topic that you see discussed in the context of other compilers. And with VS Code gaining popularity, a lot of languages are now supported through language servers that expose (some) basic refactoring functionality and other things. Mostly this is limited by the underlying tools, i.e. compilers and interpreters. It's nowhere near what IntelliJ does but it's better than nothing. And it's creating some stimulus for compiler makers to do better.

JetBrains took their sweet time figuring that out (given they make IDEs and a major language) but their upcoming version of the Kotlin compiler frontend takes some steps in that direction as well. Java is slow in IntelliJ. Kotlin is slower.
Because the compiler is doing a lot of work and it can easily take something like half a minute to compile our code base, which isn't that large. I love Kotlin but I don't like how much time it sucks out of my day waiting for stuff to happen in IntelliJ. And it needs to re-compile things quite often. The flipside is that it's a very good IDE that provides a lot of time-saving functionality. I have a love / hate relationship with it. Love what it does, hate how slow it is doing it.

fithisux over 2 years ago

I had similarly thought of the codebase as a database, but the author makes a serious attempt at it. I'm pretty sure that Lisps fit the model.

samsquire over 2 years ago

This is very good work.

One of my ideas is "lazy invariant maintenance". It should be possible to layer the tip of execution (what should go on next) with lazy incremental materialized invariants. Let the computer decide what is the most efficient approach to solving the computational problem. In other words, everything is a query and queries are layered on top of each other in arbitrary directions. I think storage or source of truth should be decided by the software. It might be more efficient to arrange data a certain way to sustain multiple types of queries.

https://github.com/samsquire/ideas4#20-lazy-arrange-or-invariant-maintenance
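To make the "lazy incremental materialized invariant" idea concrete, here is a minimal sketch in Python. It is not taken from the linked repository: the class name `LazyReferenceIndex`, its methods, and the whitespace tokenizer (a stand-in for a real parser) are all assumptions made for illustration. Writes only mark a file as dirty; the derived per-file symbol sets are brought back in sync lazily, and only for the files that changed, when a query arrives.

```python
class LazyReferenceIndex:
    """Hypothetical lazily maintained index: which files mention a symbol?"""

    def __init__(self):
        self._sources = {}    # file path -> text (the source of truth)
        self._symbols = {}    # file path -> set of symbols (derived, materialized)
        self._dirty = set()   # files written since the last query
        self.recomputed = 0   # how many per-file recomputations have run

    def write(self, path, text):
        # Record the change but defer all derived work until someone asks.
        self._sources[path] = text
        self._dirty.add(path)

    def _refresh(self):
        # Restore the invariant "_symbols matches _sources" for dirty files only.
        for path in self._dirty:
            # Whitespace splitting stands in for a real tokenizer or parser.
            self._symbols[path] = set(self._sources[path].split())
            self.recomputed += 1
        self._dirty.clear()

    def files_mentioning(self, symbol):
        self._refresh()
        return sorted(p for p, syms in self._symbols.items() if symbol in syms)


index = LazyReferenceIndex()
index.write("a.py", "foo bar")
index.write("b.py", "bar baz")
first = index.files_mentioning("bar")    # both files are recomputed here
index.write("a.py", "foo")
second = index.files_mentioning("bar")   # only a.py is recomputed here
```

The design choice being illustrated is that the stored layout (`_symbols`) is an implementation detail chosen to serve the query, while the text remains the source of truth.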

wokwokwok over 2 years ago

I mean…

Fundamentally they've done the right thing for the wrong reason.

Yes, code as database instead of code as text files is an old idea that has been tried many times and never really works.

…buuuut, if you look at the places where "not as text" coding works, like Blueprint in Unreal, you'll see the successful projects that do this make a new language that takes non-text input to the compiler / runtime.

So, building your own language (which is what they did here) to run in your database-IDE is actually going to work. There are examples of other things in this category that have worked before.

Thus, the work probably seems quite promising when viewed naively.

…but, I think any conclusions about datalog or general applications of the approach to existing text-based programming languages are probably misguided.

okennedy over 2 years ago

The idea of code as data is something that's been churning around in the heads of my colleagues and me for a while now. Compiler optimization in particular is something that can really benefit from this view: the typical fixpoint-style approach of writing optimizers usually requires a ton of fine-tuning in the rules (e.g., manually triggering one rule after a second one fires), and you're still limited in how big a query you can run. On the other hand, start thinking of the optimizer like an IVM system... not only could we automate a lot of the hand-tuning, but we can start thinking of scalable cross-module optimizations.
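For readers unfamiliar with the "fixpoint-style approach", a minimal sketch in Python follows. It is an illustration under stated assumptions, not any real compiler's optimizer: the tiny expression type (`Num`, `Var`, `BinOp`) and the two rules are hypothetical. The driver simply re-runs every rule over the whole tree until nothing changes, which is the brute-force scheduling that hand-tuned rule triggering, or an IVM-style engine, would try to avoid.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str  # "+" or "*"
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Var, BinOp]


def fold_constants(e):
    # Rule 1: evaluate an operation whose operands are both literals.
    if isinstance(e, BinOp) and isinstance(e.left, Num) and isinstance(e.right, Num):
        if e.op == "+":
            return Num(e.left.value + e.right.value)
        if e.op == "*":
            return Num(e.left.value * e.right.value)
    return e


def drop_identities(e):
    # Rule 2: x + 0 -> x and x * 1 -> x (either operand order).
    if isinstance(e, BinOp):
        identity = Num(0) if e.op == "+" else Num(1) if e.op == "*" else None
        if identity is not None:
            if e.right == identity:
                return e.left
            if e.left == identity:
                return e.right
    return e


RULES = [fold_constants, drop_identities]


def rewrite_once(e):
    # One bottom-up pass: rewrite the children, then try every rule at this node.
    if isinstance(e, BinOp):
        e = BinOp(e.op, rewrite_once(e.left), rewrite_once(e.right))
    for rule in RULES:
        e = rule(e)
    return e


def optimize(e):
    # Naive fixpoint: repeat whole-tree passes until a pass changes nothing.
    while True:
        rewritten = rewrite_once(e)
        if rewritten == e:
            return e
        e = rewritten


# (x * (2 + -1)) + (0 + 0)
expr = BinOp("+",
             BinOp("*", Var("x"), BinOp("+", Num(2), Num(-1))),
             BinOp("+", Num(0), Num(0)))
result = optimize(expr)
```

Every pass revisits the entire tree even when only one subtree changed; treating rule matches as incrementally maintained queries is the alternative the comment points toward.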

abbeyj over 2 years ago

How does this compare/relate to things like https://en.wikipedia.org/wiki/Source_Code_in_Database ?

euroderf over 2 years ago

Is there a challenge here similar to the one posed by technical documentation? There, the current working methods use files and directory hierarchies. But what if we made a code pool (like a content pool, a pool of topics and other types of content entities) that is easily browsed?

openfuture over 2 years ago

I gotta say. Lately it feels like the internet is always trying to bait me into explaining something properly. My game has been a bit about not doing that because these ideas are old and others have already explored and explained the various tradeoffs involved in the choices I've been making.

However I'll do a bit more now than drop the datalisp.is link :P

What is metaprogramming? Do we metaprogram when we create feature branches in our version control system? Do we metaprogram when we give input to running programs? Do we metaprogram when we use type systems to prove things about our program? Do we metaprogram when we build machine learning models? Do we metaprogram when we package software? For me: "yes" to all of the above.

Isn't "metaprogramming" such a wide-reaching term as to be essentially meaningless? I believe that some words like "language" or "game" can be stretched to cover everything... so rather than words they are more like perspectives. Metaprogramming is such a thing, it is a perspective, and it happens to be the perspective that I'd like to be able to employ more effectively.

Originally I was deep in math world looking to learn what the future of programming would look like, but I got increasingly frustrated with all the amazing tech they'd developed (the mathematicians) because it wasn't manifested in the cyberworld... So I figured: I want to build distributed systems. In order to build distributed systems I need a concrete representation for data so that the peers can communicate.

Thanks to hn and lobsters I was aware of many possible representations but for various reasons I believe that canonical S-expressions are sufficient... For the sake of argument let's assume this is the case; then suddenly there is a concrete place to manifest all the ideas... okay so what ideas are most urgent? In my mind it is to explore the space of coordination-free programs and this space is spanned by datalog-ish expressivity. From there this name is born: datalisp...
I sincerely believe that this is the easiest thing to agree on as a somewhat eternal foundation for further developments in software engineering, otherwise I would still be looking for that foundation.

However there are too many people who are not ready to move on from our current way of keeping score in society and it is hindering our ability to respond to reality. Afaict the highest priority is climate change (at the moment), everything else is noise. We should be working to bring people useful tools for coordinating a response to that crisis but instead I'm shunned, broke, homeless, etc. All because I refuse to be a part of the problem (and I refuse to be a "hero" so I don't do anything well enough for others to admire, I've been trying to leave space for some ambitious academic type or business type but I'm reaching the end of my patience).

What is the goal? Why do we work towards it? For me the answer is clear: we need economics that are less vulnerable to the tragedy of the commons. The reason is we need a way to systematically curb pollution.

Now my approach is debatable but no one seems interested in even having the debate. This is perplexing.