Truth First, or Why You Should Mostly Implement Database-First Designs

205 pointsby MarkusWinandalmost 7 years ago

21 comments

ben509almost 7 years ago

This is a really important piece for developers to read.We get very used to the ability to blow away everything and start from scratch, and that's very often the right answer for build tools, but the database is fundamentally different from that, and you need to internalize this.A database, in most cases, is the business. It's essentially a collection of all the contracts made with customers, and your mode of thinking "I can blow this away and start over" that you've learned from all sorts of build tools is completely contrary to that.An infamous example of this is in Active Record migrations[1], and possibly some other migration tools.> Note that we define the change that we want to happen moving forward in time. Before this migration is run, there will be no table. After, the table will exist. Active Record knows how to reverse this migration as well: if we roll this migration back, it will remove the table.These are smart people, and indeed there are some rare cases where this is the right thing to do. But only our strongly ingrained belief that data is ephemeral would cause us to think this should be the normal behavior, that we'd want to blow away customer data to force the database to track what version X of the code thinks it should.[1] <a href="http://edgeguides.rubyonrails.org/active_record_migrations.html" rel="nofollow">http://edgeguides.rubyonrails.org/active_record_migrations.h...</a>

评论 #17251218 未加载

评论 #17249623 未加载

评论 #17253084 未加载

评论 #17250996 未加载

评论 #17249173 未加载

Diggseyalmost 7 years ago

No, you shouldn't use a code first, or a database first design.The article explains the problems with the code-first approach very well, but there are just as many problems with the database-first approach:- There is never just one database. There will be your production database, a staging database, one or more testing databases, local development databases, etc. This makes the database useless as a single source of truth.- When you write code against your development database, there's no guarantee that it will be the same as the production database, and you shouldn't be able to connect to your production database while developing locally to verify that. Usually you want to deploy identical build artefacts to staging and production, so there's a danger that your staging database gets out of sync with production (eg. after an issue was caught and improperly rolled back on staging) and then you deploy your fully tested code from staging to production and your entire system breaks down.- When you need to restore a backup of your database, you need to somehow find a compatible version of the code to deploy as well.My recommendation is to use your migration scripts as your source of truth. Each service should access separate schemas within the database, or at the very least separate tables. If you really need multiple services to share tables then those services should be versioned and deployed as one.The database should have a table which stores a list of the migrations which have been applied, and you should use a migration tool such that when you deploy, outstanding migrations are applied as required.If you want something easier to read than the history of migration scripts, then you should keep an up-to-date schema as well next to your code, and verify as part of CI that running all the migrations results in that exact schema.

评论 #17252946 未加载

评论 #17252229 未加载

nikisweetingalmost 7 years ago

From a startup perspective I strongly disagree, I think the UI should almost always be designed first. The HTML/CSS/JS frontend that the users see comes first, and have it hooked up to an empty backend that does nothing. Then once the UI requirements are fleshed out and the final specs nailed down, only then the database schema should be designed. The backend glue e.g. Django/Rails comes last.I find this approach has led to the fewest number of code rewrites and database migrations needed to get from initial mockup to finished product, because UI requirements can change much faster and are more fickle than DB schemas and backend code.

评论 #17253311 未加载

评论 #17252326 未加载

评论 #17258907 未加载

评论 #17252685 未加载

WorldMakeralmost 7 years ago

My two cents: always source control your databases.I don't care if you are code first or database first, but I do think database schema and migration scripts are code artifacts and need to be source controlled somewhere.You should be able to track who made a change and when and what for, and maintaining by hand a "commit list" in a comment at the top of a stored proc doesn't count.You shouldn't need a full backup of a Production database to spin up a new test server, and you shouldn't need to rifle through every developer's documents and desktop folders looking for random .sql script files to figure out if you are missing a migration/update/refactor script somewhere.(The argument that you rarely spin up a new Production server is I think tempered by how many developers and testing environments you need. Setup time for developer/testing instances indicates a part of how long onboarding new developers may take; and here I lean to the side of developer productivity and good backups.)Most code first methods give you source control "for free", which makes them easy to recommend to junior developers / small startup projects.Visual Studio has decent SQL Server Project support to help with the database-first crowd, as one option I've relied on heavily. Unfortunately, I've yet to see an open source, more database agnostic relative to it.

ams6110almost 7 years ago

First supervisor I had in my 2nd job out of school had a mantra: get the data model right and everything else falls into place.Has worked for me for 30+ years.

评论 #17248250 未加载

评论 #17249299 未加载

评论 #17250909 未加载

sixdimensionalalmost 7 years ago

One use case for when you may consider code-first - pure experimentation and fast iteration around the data model that supports your application.I have used both models, and did database-first extensively, for at least 15 years. However, having been working in .NET lately and having EF Code First migrations (painful, but read on), I did discover a sort of hybrid model that worked.Eventually, I believe you will always end up at database-first, even if you start code-first, unless your database is relatively small (by # of table/procs/views count) or limited in scope. But as applications grow, multiple databases become a reality, the database might grow truly huge(# of tables/procs/views), you might need customization to your database platform, etc. - limiting it by the application / code-first becomes impossible.Where code-first works well, for me, is limited scope OR, more importantly, when you need really fast iteration for prototyping and don't want to think about building the tables. I've found that use case, code-first can be really helpful to think in code, let the system generate the database, at least enough to get to an MVP or PoC type-level of work product. Then, once it's working enough, to really take it all the way, switching to a database-first approach to do all the customizations to a particular database platform's strengths, etc. can work - I've done it myself on several projects successfully.The tooling is almost there to do this... just start code first, and then once you have your system bootstrapped, generate the database first model and switch. But, you have to be careful with naming things so that names of things in your code match up (or can easily be updated) if you switch from code-first to database-first - something that not all code generators let you control completely (most template-driven ones do, though).

steve-chavezalmost 7 years ago

If you're using PostgreSQL, I invite you to try PostgREST <a href="https://postgrest.org/en/v5.0/intro.html" rel="nofollow">https://postgrest.org/en/v5.0/intro.html</a> which takes the database-first approach even further, with PostgREST you develop all your business logic in SQL, there's no code generation, no ORM, which no matter how elegant will always impose a performance and abstraction overhead(to do an aggregate you have to read the ORM docs, learn/understand the abstractions, add the code, all of this in addition to what you already know how to do in SQL).Check the API <a href="https://postgrest.org/en/v5.0/api.html" rel="nofollow">https://postgrest.org/en/v5.0/api.html</a> for more details on all the features that work out of the box, no ad-hoc code needed.

mnm1almost 7 years ago

I couldn't agree more. The code first design is what kept me away from Rails. I'm sure they have sensible approaches as well, but unfortunately I couldn't get over my prejudice that anyone who teaches the creation of the database through the app's code instead of actually designing the database is someone I don't want to follow or learn from and whose framework I don't want to use even if it provides alternatives (I hope it does). Database design in a CRUD app is by far the single most important task. Show me your code and I know nothing, show me your data structures and I understand the whole app never applied more.The consequences of not following this advice I see everyday at my current job where we are stuck with an ORM that needs at least 3-4 files modified for each entity change and that requires management using an external GUI tool. The ORM adds 10x overhead to already unoptimized SQL queries so optimizing the queries is never a problem because the system's overhead is so slow even if it was possible to make the queries run in zero time, the overhead would still kill us. 10x. Query is 10ms and overhead is 100ms. And the trade-off for this? Nothing but a shitty ORM that makes no sense (Doctrine 2/PHP) and slows down development a few times over as well. Changing even the name of a column requires editing multiple files. We have camelCase, snake_case, and kebab-case for all our entities and depending on where you are in the system, a different case is required. Using the ORM to generate entities after you're in production? Don't make me laugh. That's impossible. And this isn't even the worst ORM I've used (though it's close). Everything the author mentions and then a whole bunch of stuff that's a million times worse has resulted from this drive to abstract away the most important component, the database, and not have it be the single source of truth. Frankly, it's fucking stupid and beyond maddening, not to mention slow and due for a rewrite. Never again will I tolerate this kid of design that ORMs encourage or the use of ORMs at all, especially if they replace solid database design. I'll leave that kind of fuckery and stupidity to the rookies.

评论 #17250975 未加载

评论 #17255685 未加载

评论 #17251623 未加载

vinceguidryalmost 7 years ago

What I want is a doctrine that gives an ideal approach for data management, similar to how 12 factor gives an ideal approach for overall application architecture management. Data architecture, even though it's a subfield of application architecture, is important and vulnerable enough to need it's own doctrine.What I like about 12 factor is that you don't have to get it all done at once. The concepts start high level and you can dive into the details as they become necessary. Initially, you can get by with dotenv for configuration management, and slowly build out more tooling as you need it.But data management seems to need a new lexicon and set of organizing principles. How do I think about the varied aspects of database design in a way that I can be reasonably sure won't bite me in the ass before I can get around to scaling it? How do I pick a database, what are the relevant decision criteria? At what point do you want to introduce different kinds of databases, like say a graph database or ElasticSearch?

评论 #17253385 未加载

soulnothingalmost 7 years ago

Jooq has become my favorite ORM with how bare metal it is. I have it set up in gitlab pipelines to auto build my schema into an ORM. Just kick build on a sql change, and then inherit the new package.I saw an ancillary remark about prod connections, being bad. I generally agree. I usually build mine off of my staging environment. Assuming a staging database schema, was promoted to production. Then I have a sister build job, that kicks off my jooq build, trigger on the promotion task.The biggest thing is the professional license for several databases. But it's not that bad really.It also goes to one of my thoughts as well. The data model is crucial to the application. This is like the foundation for your home. I always try and hammer in talk about the data model first. Where did it come from, where next, and finally where do you want it to go.

评论 #17251670 未加载

srikualmost 7 years ago

Sometimes, I get the feeling that we're all in the story of the "blind ones and the matter of the elephant". Design UI first? Design data model first? Design code first? Write documentation first? Write tests first?How many things can we actually do "first"? All of them are important. Writing documentation tells you about what you might need implemented. So does designing UI, but that might still leave out what users need. Which means we probably don't know enough about the data model anyway. Or the code.So ... here is one more blind person's view - iterate as quickly as you can on everything.

TimJYoungalmost 7 years ago

I'm not sure that the statement "All generated DDL is only an approximation of that." is necessarily correct and is database engine/server-dependent.It's typically pretty easy to reverse-engineer the production database schema, including any vendor-specific extensions, into a SQL file that can be used to recreate an empty version of the same database schema. Furthermore, it's just as simple with most database engines/servers to use an automated tool to compare two schemas and generate a DDL script that will upgrade the old schema to the new schema. It helps immensely because such tools will avoid circular dependencies with foreign keys, etc. that may trip you up if you're trying to manually write such upgrades.Other than that, this article is spot-on. The database should be the source of all truth and yes, you need to become proficient in SQL to be a good general application developer.

exabrialalmost 7 years ago

I know in the HN comments they'll be a whole bunch of "you don't do design when you're trying to iterate fast". I wholly disagree. Design phases expose problems in your thinking that you'll be tripping over very shortly, except now you didn't waste the time implementing something that wouldn't work anyway. Skipping the design phase is "productivity theater", where it appears you're being productive because code is being written, but in reality, you're just being busy.

drblastalmost 7 years ago

The database really is an API, in that client code depends on it and its structure will be difficult to change once you go live, whether you want that to be true or not. Writing the API before code that depends on it seems like an inherently good idea to me.You can pretend like you don't need to figure these things out first and get away with it for a while, but at some point the lack of a solid foundation is going to bite you.I think all devs should answer the question, "What if they don't want to use my UI or interface, or they want to automate something?" That's a good scenario to support for any client, including yourself.

评论 #17258681 未加载

Fradowalmost 7 years ago

Perhaps I am too new to back-end development, but what strikes me in this article is the misconception that code-first is going to give you a bad DB schema, and that migrations will loose data.I think part of it could be blamed on the tools: from the code I see, the ORM doesn't seem well-built, if it needs that much code to generate a proper DB schema.For the record, I'm using Django (for about a year, I was developping apps before), and my workflow is:1) code, while thinking about what the DB schema will be as a result2) generate the migrations3) apply them and check if what's generated is good and usable when you need to directly query the database (mostly for dev purpose). If need be, either change your code, or even manually modify the migrations. Migrations are code too, they are there to be overriden when required.Sure, your rollbacks can loose data, but only if you don't pay attention to them and don't change them to avoid that. As a developer, it's your responsibility to think about that if it's of critical importance to your business.Feel free to enlighten me if something I say seems very wrong, after all I'm still quite junior at this, and have not experienced huge scales / large team.

skybrianalmost 7 years ago

The article seems to advocate generating code from a live connection to the production database.This seems very risky. Typically you don't want developers to even have access to the production database most of the time, let alone make it a hard dependency to do a build. Development work should be done using temporary databases.Also, database schema migrations need to be tested somewhere other than the prod database before going live. All the code changes needed (including to applications) should be written and tested before actually doing the migration.But this approach would be reasonable with a bit of indirection: one developer (DBA?) makes a database schema change, resulting in a change to a schema file that's checked in (after testing), and code gets generated from the schema file. That way developers all have access to a history of the prod database schema, without access to the database itself.(This is pretty close to how things are done using protocol buffers.)

评论 #17248807 未加载

评论 #17250481 未加载

geebeealmost 7 years ago

Fantastic.There's an earlier article from the same blog that addresses this at a deep level<a href="https://blog.jooq.org/2014/01/02/why-your-data-will-outlast-your-sexy-new-technology/" rel="nofollow">https://blog.jooq.org/2014/01/02/why-your-data-will-outlast-...</a>A couple of fundamental principles here: Systems tend to outlast their developers Data tends to outlast the systems operating on itThis is why I'd almost always go with database first design.As a general rule (and test), a well designed database is useful outside of the context of the application that accesses it for persistence.As an example from the kind of apps I've worked on: suppose you are writing a system for tracking and managing inventory, which will be used through a web interface. If you have direct access to the database, can you get useful information about orders, lead times, existing inventory levels, and so forth? If not, you don't have a useful database, and you are at risk. Migrating to a new system, which is probably inevitable down the road, will be very difficult as you no longer have control over your data.Now, apps are useful, and code is useful. I wouldn't want to try to code up various supply chain management algorithms in SQL directly. The estimated lead time can be quite a calculation, and yeah, you are probably going to want to write python or ruby or java code to do that. But in the ruby world (with rails in particular), I have noticed databases used almost as a kind of object serialization and persistence, where the information makes almost no sense and can't be queried in any meaningful manner through the DB. It is simply used to store and retrieve information that needs to be drawn into and reassembled through various bits of code and configuration files in a rails app before it takes a form that makes any sense.In short, the database has no meaning, almost, outside the rails app.Now, I'm not saying this to knock rails. I've used it, and I enjoyed the experience. Metaprogramming, in fact, makes it very easy to map an object to a database table without a ton of irritating configuration or extra code. You don't have to generate your schema through rails to map models to tables, you can create them independently of the app and use rails to do the mapping for you. Also, it's entirely possible to use rails generators to create a very sensible database design. The problem is probably that a lot of people using these generators never considered databases or SQL or schemas as something that had meaning outside the context of a rails app. My guess is that this anti-pattern happens in almost all frameworks that allow developers to quickly push objects with relationships off into a database schema that they never took the time to consider or design. It's almost no different than if they serialized the objects for persistence, and used a hash to find and retrieve them.I also think this emphasizes why clarity is so important in code and database design. Sometimes people talk about upgrading an existing code base, and say "well, it's already written in [java/ruby/python/...] , so it'll be easier if we stick with a [java/ruby/python/...] based framework". Nah. If the code lacks clarity, it'll be hell to port it from python to python. If it has clarity, it won't be hell to port it from python to ruby.If you can use your database to answer meaningful questions about your data outside the context of an app, your database is not only useful, it also probably has clarity of design and purpose. If you can easily read your code and know what it is intended to do and why, you have clarity.I'm amazed with how often developers are willing to sacrifice clarity to meet the demands of a testing framework, or a particular architecture, or to accommodate a cutting edge UI/javascript approach. These can be good things, but you know that shit ain't lasting, and then, without clarity, where will you be?Well, we all know, the developer who gained experience with the nifty new framework will be: at a new, better paid job.Beyond that... eh, I suppose a code generator from a database can be useful and save you some typing, but if it's all cluttered and clunky and lacks clarity I'd rather just do it myself. It's not really that hard.

ngrillyalmost 7 years ago

Really great post. Code generation is seen by many as old school, but this post shows it's still well alive. We are seeing a revival of code generation with tools like jooq, Protocol Buffers, static website generator, and some Go tools for example.

lucidguppyalmost 7 years ago

There are some who say that architectures that are able to postpone choice of front end and DB are flexible and well designed.You should design your app to make it desktop or web... and choose any DB you want.

nickpetersonalmost 7 years ago

My favorite version of this, is when a new version of something is greenlit by the business, and a bunch of enthusiastic devs look at the old system and decide to rewrite. Instead of starting with the existing data, and determining how it's going to fit in the new design, they just make a new design and figure, "we accounted for most business requirements, so the data should be able to fit...". Then they set a launch date with the business, and just need to do that data migration. What could go wrong?

crb002almost 7 years ago

This goes with any system. Get the data pipelines hot then drive in the business logic. Otherwise you are surprised with latency and have to do major refactors.