Databases have failed the web

88 points, posted by ash, about 8 years ago

30 comments

agentultra, about 8 years ago

I serve many, many millions of rows, collect real-time statistics, push real-time updates, and maintain data integrity and consistency... all from a single database server. If our workloads require it we're prepared to scale out horizontally. I'm really looking forward to PostgreSQL 10's new parallel query features for some of our analytics work.

Stack Overflow runs everything across what, 4 MS SQL Servers in total?

How has that "failed the web?"

pbnjay, about 8 years ago

I really disagree with this - separation of concerns is incredibly important. Why does the database server need to do everything my application server does now?

I don't want to have to know about how my database works internally just to implement a new feature in my application. I don't want to worry about a junior dev corrupting data while building a login page.

My "simple old-school" database is reliable and consistent BECAUSE it's been tested for years. My application server is not, because it's solving a new problem (and that's OK, I won't lose customer data if someone can't log in).

thehardsphere, about 8 years ago

The conceit of this article seems to be that all web applications are merely CRUD apps that do nothing but talk to the database, thus databases are "failing the web" for... not being more than just a database.

I see a number of problems with this:

1. Not every web app is merely CRUD. Some of us work on apps that do quite a bit more than CRUD, and the CRUD part is relatively small and boring.

2. There's very little respect here for the idea that good utilities should do one thing and do that thing very well. Lots of people need relational databases. Not everybody needs a relational database that can automagically filter out XSS attacks and serve HTTP responses.

masklinn, about 8 years ago

> Access control on modern databases is too course. You want to whitelist which queries a user is allowed to make and you want fine-grained permissions around updates

Only allow direct access to stored procs, not queries. Or restrict access to specific views and use rules (https://www.postgresql.org/docs/current/static/sql-createrule.html), but intuitively that seems more dangerous (with CTEs, I believe SQL is Turing-complete) and completely unnecessary.

> Databases only talk custom binary TCP protocols, not HTTP. Not REST. Not websockets. So you need something to translate between how the server works and how the browser works.

https://postgrest.com/

> You want to write complex logic for user actions

https://www.postgresql.org/docs/current/static/plpgsql-structure.html

> with custom on-save triggers

https://www.postgresql.org/docs/current/static/plpgsql-trigger.html

> and data validation logic.

https://www.postgresql.org/docs/current/static/ddl-constraints.html
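
As a rough illustration of the last two points (on-save triggers and validation logic living inside the database), here is a minimal sketch. It uses Python's built-in sqlite3 module instead of PostgreSQL so that it runs anywhere, and SQLite's trigger syntax differs from PL/pgSQL. The accounts and audit_log tables and their columns are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Data validation expressed as declarative constraints.
    CREATE TABLE accounts (
        id      INTEGER PRIMARY KEY,
        email   TEXT NOT NULL UNIQUE,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    );
    CREATE TABLE audit_log (
        account_id  INTEGER NOT NULL,
        old_balance INTEGER NOT NULL,
        new_balance INTEGER NOT NULL
    );
    -- "On-save" logic as a trigger: every successful balance change is logged.
    CREATE TRIGGER log_balance_change
    AFTER UPDATE OF balance ON accounts
    BEGIN
        INSERT INTO audit_log VALUES (OLD.id, OLD.balance, NEW.balance);
    END;
""")

conn.execute(
    "INSERT INTO accounts (email, balance) VALUES (?, ?)", ("a@example.com", 100)
)
conn.execute(
    "UPDATE accounts SET balance = 40 WHERE email = ?", ("a@example.com",)
)

# The database itself refuses invalid data, whatever the application does.
try:
    conn.execute(
        "UPDATE accounts SET balance = -5 WHERE email = ?", ("a@example.com",)
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

audit_rows = conn.execute(
    "SELECT account_id, old_balance, new_balance FROM audit_log"
).fetchall()
balance = conn.execute("SELECT balance FROM accounts").fetchone()[0]
```

The point of the design is that the rule "balance is never negative" and the audit trail hold for every client of the database, not only for code paths that remember to enforce them.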

shanemhansen, about 8 years ago

I think databases have been so wildly successful because they were actually based on some reasonable research about data access models and set theory. I encourage anyone who wants to criticize the relational model to start with E.F. Codd's "A relational model of data for large shared data banks".

sbuttgereit, about 8 years ago

PDP-11s were considered minicomputers, not mainframes like the IBM System/360. The notion of midrange computing still exists with IBM's iSeries (AS/400), as well as mainframes like IBM's zSeries. One could argue that some of these definitions get mushy, but I haven't ever heard disagreements about where the PDPs, AS/400s, and System/360s of the world live in this hierarchy.

This seeming error was an early red flag in this article: it has some good facts but an incomplete picture. While some of the historical observations about what computing was are warranted, this same lack of a complete enough picture to draw correct conclusions shows up again in some of the main theses presented.

Consider the statement about the coarseness of permissions. In many modern database systems this simply isn't true. What is true is that the full set of security features offered by database systems is 1) not widely understood; 2) not bothered with by developers who choose to implement security elsewhere.

Clearly the author spent some time with the article and it is well structured; it's simply a matter of the author not having spent as much time getting to the heart of the matter as he/she should have.

jandrewrogers, about 8 years ago

In the mid-1990s, some of the most complex web applications were implemented inside the database! I was the technical lead for a production web app written in... 250,000 lines of PL/SQL. This model had some significant advantages and worked surprisingly well considering that Oracle was not designed with that (ab)use case in mind.

So why didn't this model become common given that it was relatively elegant and capable? A few reasons:

- It required developers to be sophisticated at both using the database and creating web front-ends, since they were inextricably mixed. Even today, most developers are strictly one or the other, not both.

- The tooling inside the database was not designed for this use case, so while the architectural model was elegant, the development environment was piggybacking on functionality designed for reporting systems to drive interactive websites. This got better with time but by then no one cared. For a minimal website, hacking together a couple of Perl scripts had a lower learning curve but was less capable.

- At the time, only a couple of databases had the level of sophistication and features to make this feasible. Like Oracle. The upfront licensing costs were outrageously high, so there was no cheap way to bootstrap or incrementally grow into your application.

A 2017 version of this would work very well; database engines have much more sophisticated capabilities than back then that would make the development and operations experience pretty efficient and nice. As a practical matter no one designs web apps this way any more, so there is no market for it.

Spooky23, about 8 years ago

I think the author failed to educate himself.

Databases are a miracle product. If you think of an application as a car, the database is the engine.

The idea that you have a platform that can do everything without the abstraction of a separate data storage/query platform, that exists too. I'd argue that FileMaker, Lisp, MUMPS, and a few others basically do this in different ways. I used to be a DBA at a company where the entire company ran on Informix 4GL code (which was sort of like the Informix version of PL/SQL) within the database. Also a similar approach.

But... they also have significant drawbacks. You're permanently married to that app/database stack. If any component of the system doesn't scale... you're fucked.

By chunking out the solutions to include databases, app tiers, etc, you gain complexity but lose a lot of risk. If you cannot afford Oracle anymore, you can invest in labor to move to Postgres. If you're hitting a limitation with MySQL, you can move to Oracle. If you wrote your app in PHP, it goes viral, and you cannot scale it, you can migrate to a Java Application Server layer.

ericHosick, about 8 years ago

> All because we're programming against a frozen database spec.

Relational databases, unlike XML, JSON, Key/Value stores and ORMs, do not pre-suppose document structure. On top of that, it is very easy to create new relations (entities) using Views. On top of that, you get a real algebra to play with: relational algebra.

SQL makes it crazy easy, in real time, to see your data in any hierarchical manner you like (via denormalized entities).

The one thing SQL 'lacks', and JSON shines at, is a way to return data in a hierarchical format (aka: to return JSON directly). I have 'lacks' in quotes because there are SQL solutions that can consume/spit out JSON.

An interesting idea then is to provide a way to easily convert between SQL and JSON. To that end, there is an open source project, https://github.com/erichosick/sql-json, that attempts this. The results are promising but there is a lot of room to grow.
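
To make the SQL-to-JSON idea concrete, here is a minimal hand-rolled sketch of turning flat relational rows into a hierarchical JSON document. It does not use the sql-json project linked above. It relies only on Python's built-in sqlite3 and json modules, and the authors/posts schema and the authors_with_posts helper are invented for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE posts (
        id        INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES authors(id),
        title     TEXT NOT NULL
    );
    INSERT INTO authors VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO posts VALUES (1, 1, 'first'), (2, 1, 'second'), (3, 2, 'hello');
""")


def authors_with_posts(conn):
    """Join the two relations, then fold the flat rows into nested objects."""
    rows = conn.execute("""
        SELECT a.id, a.name, p.title
        FROM authors AS a
        JOIN posts AS p ON p.author_id = a.id
        ORDER BY a.id, p.id
    """)
    nested = {}
    for author_id, name, title in rows:
        # One entry per author; each joined row contributes one post title.
        entry = nested.setdefault(author_id, {"name": name, "posts": []})
        entry["posts"].append(title)
    return list(nested.values())


document = json.dumps(authors_with_posts(conn))
```

The hierarchy here (posts nested under authors) is chosen at query time. The same tables could just as easily be folded the other way around, which is the flexibility the comment is describing.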

cookiecaper, about 8 years ago

I don't really understand his point. He says that SQL has been around for 40+ years, as if that's a bad thing, and doesn't really say anything else.

I guess he is complaining about the idea that the user's input doesn't get inserted directly into the database, i.e., the connection would be browser->db, and instead we need an application component that reads from the browser and shapes to the DB.

At this point, we must ask if the author has ever worked with one of the thousands of applications that implement their "API" through stored functions. I won't condemn this method wholesale, but people have mostly moved on to more flexible representations for good reason.

The whole article seems predicated on a belief that most work should be on the consumer workstation, which is why he starts by talking about how much better it is that people now have desktop computers instead of clients that connect to a mainframe in the basement.

The author apparently doesn't grasp that the web acts just like those old terminal clients he refers to at the beginning of the article. There's a big server running the application in someone's datacenter, and your browser is a thin client over the top of it, an interface into its inner workings.

He blames this on databases. (???) The browser doesn't provide the mechanisms to directly connect to arbitrary protocols.

I don't really think there's much substance here.

ElatedOwl, about 8 years ago

I'm admittedly biased, I love SQL.

That said, I think having the database server worry about being a database is a good thing. In my career I've had a few projects that I've been too ambitious with; by trying to do too many things they failed to do any of them well.

> Databases only talk custom binary TCP protocols, not HTTP. Not REST.

Let's pretend that the database can now talk via HTTP/REST. Is the database now responsible for handling business rules? Is it responsible for per-row authorization/authentication? How does this impact performance? What if we want to export the data in another format, say into an Excel spreadsheet, should it be responsible for that as well, and for the formatting? Where is the line drawn?

> protect against SQL injection attacks.

I mean, how would the database know the difference between a legitimate request that should be allowed and one that shouldn't? This is the point of parameterization.

> Check for XSS

I think it's plenty legitimate for a database to return some HTML data; how would the database know when it's malicious or not?

-------------------

In full I think the grievances the author raises are with middleware, not a problem of the database.
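
A small sketch of what "the point of parameterization" means in practice, using Python's built-in sqlite3 module. The users table and the hostile input string are made up for the example. The database cannot tell intent from SQL text alone, but with a placeholder the input is never parsed as SQL in the first place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT NOT NULL, is_admin INTEGER NOT NULL)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 1), ("bob", 0)])

hostile_input = "nobody' OR '1'='1"

# Unsafe: the input is spliced into the SQL text, so its quote characters
# change the structure of the query and the WHERE clause matches every row.
unsafe_sql = f"SELECT name FROM users WHERE name = '{hostile_input}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the input is bound to a placeholder and is only ever compared as a
# value, so it matches no user.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile_input,)
).fetchall()
```

In the first query the database receives a perfectly valid statement and has no way to know it was not the one the developer intended. Only the caller can keep data and query text separate.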

cr0sh, about 8 years ago

I haven't read all the comments here yet, but I'm going to throw this out anyway...

My confidence in the author took a hit at the point he called a PDP-11 a "mainframe". One would think a computer scientist would know what computers fall into what "generation". I guess "history of computation" is just not taught as part of such a degree anymore...?

But... I decided to read on, thinking maybe things would get better, and to give him the benefit of the doubt. I think somewhere in there were maybe a few points to think about, but ultimately it almost looks like he has some weird problems with "separation of concerns", and maybe doesn't understand why that would be a bad thing for scaling... which again, I find odd for a computer scientist.

Furthermore, he seems to ignore the great number of improvements and changes which have occurred in the database software/engine and server world; today's DBs and DB systems are -nothing- like they were back in the early 90s when I started my career (as a fresh high-school graduate). Yeah, we still used VT-100 terminals (then quickly transitioned to PCs - running VT-100 terminal emulators, of course), and things were starting to transition to PC apps communicating to the servers - and I am sure there were SQL injection issues (and no, we didn't think about that) - but things have, over the decades (yeesh - getting old here!), changed for the better!

Could they be better? Certainly! Are there things the DB server could be doing to make life easier for the app? Yes (and some of this has been implemented - i.e., when developed properly, your queries can be "sanitized" at the DB server level - but you know, you should still do this at the app and browser level too - just in case). Realtime updates and notifications? That's pretty much there as well - but ultimately, a lot still has to be done at other levels... and that's not a bad thing, imho.

PaulHoule, about 8 years ago

It isn't that "everybody emulates the VT-100 for some reason" but that the VT-100 was the first terminal to support the ANSI standard for control codes and that that standard has been evolving ever since.

obstinate, about 8 years ago

To be honest, this article seems pretty insubstantial. To the best of my ability to discern, the only concrete complaint is that access control is too coarse on modern databases, although it's not really specified in what way this is true.

kuschku, about 8 years ago

This is an interesting article.

And he gets a lot of things right – having a way to do queries against a database directly, being able to get changes directly is useful.

And GraphQL, for example, offers exactly that – there’s even a plugin to directly build a GraphQL API from your PostgreSQL database. Although this still can’t send changes over the net, so you can’t have an always up-to-date view of the database.

But while this and Firebase¹ solve the problem of offering an API directly for a database, he fails to address the other task frontend servers do: they can render stuff on the server, in case you actually do have just a dumb terminal.

And that’s something that’s very useful for websites, as usually servers are more powerful than smartphones, and you need to do your computations somewhere. A web service applying deep dream to an image can’t run it on the phone, nor in the database – it has to run it on a specialized server.

But it’s indeed a good question why there’s little academic research into changing the way web applications work. We’re already making databases directly open via GraphQL APIs, we’re separating statically hosted content into CDNs, so how can we combine this, and work truly "serverless"²?

[1]: Firebase is a great tool for prototyping your app, but if you want to run it it’s usually too expensive, and relying on proprietary Backend-as-a-Service technology has proven to be a bad idea already when parse.io shut down.

[2]: Serverless here meaning that you have no specialized application server – you have a general database able to handle all your applications, a general CDN, and all special code for the app is handled within the database, or triggers of it.

snuxoll, about 8 years ago

The only part of this I agree with is his comment on database permissions.

Every modern SQL database has a concept of users and permissions that are divorced from your application; you're left with three options, all of which are flawed.

1. Handle security inside your application. This is the worst choice if users need to get a LIST of records they have access to and it's determined by something more than a simple WHERE owner_id = :user_id. Think multi-tenant applications where records can belong to a tenant, and users have access based on an org hierarchy or other criteria. Suddenly you're having to filter a whole list of records out in your application code, and this makes implementing pagination awful (do you requery until you fill up a page, or present a partial page?). You are also taking full responsibility for security; if you modify your queries to filter records out you open yourself up to human error where someone forgets to filter this one query.

2. You implement a method to synchronize your application users with the database, and use the database engine's RLS support to handle access control. This is probably the best approach for web applications, but the caveats that come from it still suck. You have to make sure the connection is set as the user performing the action; this is doable with PostgreSQL, MSSQL and Oracle at the least (SET ROLE / EXECUTE AS) without destroying your ability to use a single connection pool - but for all the security you get out of this, your web application user still has all the keys, and if that account is compromised or there is a flaw that can cause your application to not switch security contexts, you just failed at protecting your data.

3. Just use the database directly: it handles authentication and you never change security contexts for a session - you can safely utilize the RLS functions of your database without any real risk since that database session is fixed to the user it is handling. Downside: you just lost connection pooling, and while pooling middleware like pgPool can help, you still have Y more connections since you need to maintain a separate one for each user or tenant at best.

Approach 2 is by far the best we have, and you can make it safer by doing things like utilizing pg_hba.conf to limit access to the application user to the servers hosting your applications - but maybe you're using Docker and IP addresses aren't fixed anymore (well, shit!). Also, how are you going to ensure the database connection is in the correct state when you make a query? Where are you going to plug that into your request pipeline?

I'd really like to see modern tooling around this problem. I don't know what exactly it would look like, but it would be nice to have SOME improvement in the area.
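
The pagination problem described under option 1 can be sketched in a few lines. This is a simplified illustration using Python's built-in sqlite3 module: the records table, the tenant names, and both helper functions are hypothetical, and a plain tenant equality stands in for the more complex access rules the comment has in mind. The second function approximates what pushing the rule into the database (for example via RLS) buys you. It is not RLS itself, which SQLite does not have.

```python
import sqlite3

PAGE_SIZE = 3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, tenant TEXT NOT NULL)")
# Ten rows, alternating between two hypothetical tenants.
conn.executemany(
    "INSERT INTO records (id, tenant) VALUES (?, ?)",
    [(i, "acme" if i % 2 else "globex") for i in range(1, 11)],
)


def page_filtered_in_app(conn, tenant, page):
    """Paginate first, then drop rows the caller may not see (option 1)."""
    rows = conn.execute(
        "SELECT id, tenant FROM records ORDER BY id LIMIT ? OFFSET ?",
        (PAGE_SIZE, page * PAGE_SIZE),
    ).fetchall()
    return [row_id for row_id, row_tenant in rows if row_tenant == tenant]


def page_filtered_in_query(conn, tenant, page):
    """Let the database apply the access rule before it paginates."""
    rows = conn.execute(
        "SELECT id FROM records WHERE tenant = ? ORDER BY id LIMIT ? OFFSET ?",
        (tenant, PAGE_SIZE, page * PAGE_SIZE),
    ).fetchall()
    return [row_id for (row_id,) in rows]


partial_page = page_filtered_in_app(conn, "acme", 0)   # short page
full_page = page_filtered_in_query(conn, "acme", 0)    # full page
```

With application-side filtering the first page comes back short, so the application has to either requery or show a partial page. When the filter runs before LIMIT/OFFSET, pages are always full.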

mybrid, about 8 years ago

Follow the money. The notion that databases have failed the web was put to Mike Stonebraker of Ingres fame at a seminar I attended back in the late 1990s. The actual question was this: "Why aren't database vendors building object oriented databases for the web?"

Object oriented databases store binary objects, as opposed to the object-relational databases Mike pioneered with Postgres. The plan was for CORBA (Common Object Request Broker Architecture) to shuffle versioned objects around the web, and object oriented databases would store binary code objects.

Mike's answer? "Because the money is in transaction processing. Banks pay millions of dollars for transaction systems. Transactions are the meat-and-potatoes of database sales."

In some sense database companies are wise for ignoring the web. Web companies are now hell bent on "move fast and break things." A transaction database is the antithesis of move fast and break. Transactions move deliberately with consistency. Durability says I can unplug my database at any time and bring it back up. This is the opposite mindset of move fast and break things.

Here in the valley whole QA departments are being disbanded in favor of LEAN and move fast and break things. All of this is antithetical to the database world of transaction processing.

As they say: you get what you pay for. What the enterprise companies are paying for is exactly what is being delivered. Open Source fits the ever-changing, move fast and break things of the web world.

To wit, you might see database vendors move into the web space some day if in fact companies are willing to pay millions of dollars for them.

carsongross, about 8 years ago

Look, kids. You are going to have to have an execution layer you can trust somewhere, you can't just expose your data store directly to the outside world or you run into an infinite number of security issues[1]. (Not that this is stopping people from doing exactly this these days.)

That somewhere is going to be a server side execution environment, and it will be separate from your data store so you can scale these concerns independently. Additionally, you are probably going to want a DSL for your data access as well as a highly tuned indexing system for your data store.

We have systems that do this, they are called databases. Are they perfect? No. Nothing is. But the idea that they have "failed the web" is so over-the-top childish that, like the author says at the end of his post, I'm not inclined to be charitable.

[1] - http://intercoolerjs.org/2016/02/17/api-churn-vs-security.html

mamcx, about 8 years ago

The traditional RDBMSs have failed because they are all-or-nothing.

A lot of people here (and elsewhere) think it is nonsense to build a full app with the full business logic inside the "db".

WHY?????

That is a VERY NARROW viewpoint.

But when we say "let's build a full app inside a virtual machine", yeah, that actually is OK!

And why is it OK to build a full app in Lisp? Or in an OO language (a graph of objects)? Or an array language (an array is a relation with 1 column!)

If you think that:

print([1, 2, 3])

is OK, then YOU MUST ACCEPT THAT:

print([Code = 1, 2, 3; Name= Miami, New York, Bogota])

is ALSO OK.

The relational model is just a move from 1-column arrays to N-column ones (in rows or columns as internal storage) plus some universal operations.

WHERE THE RDBMS FAILED EVERYONE IS:

Because they (the guys on the DB side) insist on adding: transactions, triggers, surrogate keys, inter-relation dependencies, storage, sub-query languages, catalogs, views, etc.

So in the end, you get a full half/big semi-OS virtual machine tailored to a specific niche.

------

It was only when the artificial divide between the RDBMS and the front-end language appeared (and MS killed off Fox/VB to focus only on .NET) that building database apps started to suck big time.

I have talked about this on HN before, and I instead consider that it makes even MORE sense to build the logic inside the DB; however, it is necessary to rethink how it looks to make it more useful. It is not a novel concept. The dBase family was almost that, and people like me who used it were very happy and productive.

Why does it make more sense? Because Program = Data + Algo.

Data is not to be treated as a third-class citizen. It must be first class. The relational model makes it first class (as with the Lisp model and arrays).

And what about separation of concerns and all that? That is pure architecture and is tangential to being inside a DB or not, the same as it is tangential inside a VM.

lstroud, about 8 years ago

I would argue that databases tried to do too much for too long. If you want a database that has a built-in web server, REST APIs, security, etc... well, Oracle has had that for years. Problem was, that wasn't what people wanted (or wanted to pay for).

The more recent trends have been to decompose the database into something that is great at storing and retrieving data... leaving all of the other stuff to products that do that well.

matheweis, about 8 years ago

It sounds like the author is unhappy with the fact that web apps have a middle layer that speaks with the database over a terminal emulation.

Ignoring the fact that it happens to make a very nice boundary for an abstraction layer...

> Databases only talk custom binary TCP protocols, not HTTP. Not REST

I take it the author hasn't heard of SCIM?

aug_aug, about 8 years ago

"Access control on modern databases is too course." - I think you mean "coarse" here.

brlewis, about 8 years ago

I agree with the article that modern middleware is too fat, and we could make more reliable systems by leaning on a full-featured database.

I disagree with the three reasons he gives for why we have fat middleware today. PostgreSQL solves 1 and 3, and PostgREST solves 2.

Animats, about 8 years ago

And, to fix it, he says, we need functional programming at the database level! What?

Today's databases are probably the best part of the server side stack. The parts that talk HTTP, JSON, and do business logic are usually worse.

mnm1, about 8 years ago

This rant basically asks: why don't databases do, out of the box, all the custom functionality we program into our web servers?

In other words, why don't databases program the app for me?

What a waste of bandwidth.

elchief, about 8 years ago

How is a db with row and column security "coarse"? I mean, row+column security means you can secure down to individual cells.

Sounds like the author doesn't know what he's talking about.

dan31, about 8 years ago

No plumbing is required in modern architectures such as those of SAP HANA, Starcounter, or Tarantool. When the application server and database are combined, the access control is arbitrary, databases talk whatever you want, and there is no need to tear the code apart into stored procedures, backend and frontend code. On top of this, lots of unnecessary moving parts are removed in such architectures, so that the whole thing runs on 2 servers instead of 20, while the code is simpler than ever.

The referred article needs clarification. What it really addresses are the flaws of conventional software architectures. Fortunately, this critique does not generalize.

z3t4, about 8 years ago

So you want to implement a new feature, and test it locally, then run tests to make sure you didn't break anything, and when you are done, upload the changes to the production server. This is sane devops, nothing fancy, yet impossible with today's databases. You basically have to manually make the same changes you made in development to the production database, then test if it works on live data in production. Then your finger might slip when writing an SQL query and you have to reset all production data from backup.

aykutcan, about 8 years ago

Interesting. Should humans interact with each other over HTTP or REST too?

fs111, about 8 years ago

Failed the web? They made it big in the first place!