Which is better? Performing calculations in sql or in your application?

53 pointsby hartleybrodyover 11 years ago

20 comments

sitharusover 11 years ago

This is something I've been through before.The first version of an app I worked from was very SQL heavy. Almost every calculation was done in a stored proc and the app servers just formatted that.As the product got popular this became the bottleneck. It's far easier to get more app servers than DB servers.So we restructured it to do straight index reads and aggregations in the DB, but more complex calculations in the app itself.It all depends on the circumstances, but I'd still advocate pushing as much in to the DB as you can without making convoluted SQL - your average RDBMS has amazing optimisations for aggregation, sorting and filtering.

评论 #6346788 未加载

fiatmoneyover 11 years ago

From a straight performance perspective:The rule of thumb is that DBs are bound on IO, so any calculation that you can get with the same amount of IO as returning the records is likely to be "free", and any calculation that requires more IO is likely to be "expensive" (unless that allows you to avoid IO by restricting returned records, avoiding future queries, etc. - it gets complicated fast).So things like column-column calculations, simple aggregates, etc. are likely to be good ideas on the DB; for anything else It Depends.

评论 #6346608 未加载

评论 #6346800 未加载

sehropeover 11 years ago

The big advantages for doing things "on the DB" are centralizing business logic and eliminating serialization/transport/deserialization. Having a view or stored proc with business logic lets it be shared across different components of the same app or even across separate apps that share a database. An extreme (yet common) example of calls best done on the DB itself is a SUM of price X quantity or any other grouping operation. Rather than moving N rows to the client, the server can aggregate it all and just send back the result.If CPU usage on your DB really is your bottleneck (and seriously it's probably not) then you should look into federating logic out to your app. Otherwise the centralization of app logic alone is worth it.

评论 #6346317 未加载

评论 #6346799 未加载

danielweberover 11 years ago

I've written some pretty awesome SQL queries that give back exactly what the application wants with the application sometimes not even needing to do anything else afterwards.I've usually always regretted those "awesome" queries.I've known some very smart developers who've gotten lost in SQL queries, while show them the equivalent Ruby/Python/Javascript/C code that parses through the results and they can understand it less than a minute.

评论 #6346093 未加载

评论 #6346052 未加载

评论 #6346422 未加载

评论 #6346634 未加载

j45over 11 years ago

Premature optimization is a bigger enemy than performance issues.That being said, the DB server is what's optimized to do calculations.The tradeoff is you have a second stack to maintain and performance tune now beyond being a datastore. A positive is you can independently write and run tests.The question is, can you resist building the perfect empire on day 1? Move stored procs and functions into the DB as they are needed. Whatever you're working on (including who is working on it) isn't that important.

评论 #6346715 未加载

JulianMorrisonover 11 years ago

Move things to the DB if the calculation aggregates data (the input is much larger than the output), to avoid materializing data and shipping it around. (Map-reduce is a subtype of this.) Otherwise don't.

protomythover 11 years ago

If you are using numeric data (not floats) in the database, I would say do your calculations in the database. Different languages and tools have different arithmetic handling which can cause some very long and psychotic debugging sessions. A single point of calculation is a good thing.[edit] I should explain a bit. If you use are in a multi-language environment[1] and are doing financial or weight / volume calculations, be extremely careful if you decide to not do all the calculation on the database. Having results calculate differently in two different places will drive you mad. I have noticed some serious problems with number handling in different languages and some mistakes in calculation will get you sued.1) SQL counts as one of the languages

评论 #6346852 未加载

jkaover 11 years ago

If you find yourself returning very large numbers of rows to the client, you might be doing something wrong - SQL databases are at their most efficient when you select only the columns and rows you need.If you perform aggregation/calculations in the DB, you can potentially save on-the-wire data transfer time (and potentially CPU time on your clients.. though obviously that is shifting the CPU work to the database).Similarly if you find yourself making multiple trips to the database, and then using loops to combine different data sets, you're probably too far on the 'client-side' and should look at using some joins and combination logic on the DB side to get what you need in a single (and likely more efficient) round-trip.

georgemcbayover 11 years ago

The top answer on the page says as much but the answer to pretty much any "Which is a faster way to do this, ABC or XYZ?" is "try it both ways, measure the results (adjust for load if necessary), and see". Doesn't matter if you're talking different pure code algorithms to achieve the same result, stored procs vs app code, network caching or not, etc. If doing it the absolute fastest way matters to you, try multiple approaches and measure the results. Even in situations where there are accepted "best practices" there may be variables specific to your own project/tech stack that cause your results to be surprising!

sooraover 11 years ago

In practice, the answer is not always obvious.Benchmarking multiple queries / approaches is generally worthwhile if performance is important.

评论 #6345898 未加载

borlakover 11 years ago

The top answer in that thread is excellent.Basically, "it depends". having dealt with extremely DB-intensive applications, I have developed a personal motto of "be nice to the DB".Let the DB be a secure storage of your data, not a calculating part of your application. But like the top answer says, sometimes it is not practical to do a calculation within the application. In my case, we had a few database servers set aside just for reporting, so we could slam them with difficult queries and not worry about affecting data.

johnwatson11218over 11 years ago

I have seen sql queries that were 30 pages long if printed out. The developer showing me this was looking for some other examples that he said were up to 100 pages long. These queries had multiple levels of correlated subqueries and tons of decode statements ( oracle syntax). There looked to be tons of duplication just in what I saw.How are you going to write unit tests for that stuff? Refactor? etc. etc.The examples always start out simple like summing a bunch of rows that match a predicate but once you start doing that it is hard to rewrite that to use application code once it becomes too complex.Also most databases are 20+ year old technologies and often have weird systems in place for storing the code in the db or something else just as odd. No more grep, no more static code analysis.As far as I am concerned the db is a pile of facts or observations. I tell the db something and later it tells me what I told it. When I am thinking about what goes in the db I think about using the past perfect verb tense. On this day such and such happened. Thats it. Preferably that never changes, you might get new info in the future so just record that new info along with everything else.Ideally we should be getting to a point to where resources are so cheap that CRUD can become CR - no more updates or delete just new facts.

评论 #6346810 未加载

danielharanover 11 years ago

Crucially, "performance" isn't defined.The one that matters to me the most is developer time.Write a damn SQL query. If it's too slow or the DB becomes a bottleneck, then reconsider.

评论 #6346530 未加载

mzarate06over 11 years ago

I default to relying on SQL. I've found that helps ensure calculations have a single source of truth, which helps maintenance and reliable sharing throughout your app, or across multiple apps.SQL also offers powerful aggregate functions to assist. Much simpler to use something like AVG() or SUM() in a SQL query than having to worry about deriving the same calcs in application code.

mmaunderover 11 years ago

It comes down to performance and sometimes what your hardware constraints are e.g. your DB is a faster machine than your web server and you have no say in the matter.You may also benefit from precalculating stuff in the DB and storing it. I wrote this 6 years ago which illustrates the point:<a href="http://markmaunder.com/2007/07/20/how-to-create-a-zip-code-distance-lookup-table/" rel="nofollow">http://markmaunder.com/2007/07/20/how-to-create-a-zip-code-d...</a>

InclinedPlaneover 11 years ago

Performance wise? Maybe writing hand-tuned assembly code will be faster. Or using CUDA. Don't optimize performance where you don't have to. Are small calculations the long pole in your app that talks to a db? Almost certainly not. Pick a method that maximizes robustness, ease of understanding, servicing, openness to feature changes, and accountability. In my experience that typically means client side code, not queries or sprocs.

benjaminwoottonover 11 years ago

I would always lean on the database to perform simple sums and aggregations unless I was really concerned about turning the database into a performance bottleneck.If you need to add simple logic above and beyond this, stored procedures aren't sexy but they can be a good compromise that avoids shipping data around and re-implementing SQL in your application server.

adamconroyover 11 years ago

It is interesting that nobody has mentioned testability, as in unit tests. Some people (such as Thoughtworks) advocate removing all business logic from the db server into app code so that it can be unit tested.I buy that argument to some degree but in practice I am SQL junkie and always implement calculations in SQL.

评论 #6346842 未加载

mathattackover 11 years ago

Isn't there a size past which it's always good to do it on the DB? Or to use buzzwords, "At some points, shouldn't most Big Data calculations be done on the database?"

_random_over 11 years ago

The most important factor is whether the DBAs in your company are cooperative and how easy it is to do a SQL release to production.