Bullshit graph database performance benchmarks

378 points by maxdemarzi, over 2 years ago

23 comments

szarnyasg, over 2 years ago

A plug: if you are looking for TPC-style application-level benchmarks for database systems, check out the LDBC Social Network Benchmark [1]. It has workloads for both OLTP and OLAP systems. We designed both of these to prevent many of the common benchmarking mistakes. To ensure that implementations follow the specification and that their results are reproducible, we have a rigorous auditing process (similar to that of TPC's benchmarks) [2].

[1] https://ldbcouncil.org/docs/presentations/ldbc-snb-2022-11.pdf
[2] https://ldbcouncil.org/benchmarks/snb/

PreInternet01, over 2 years ago

While this seems to be a pretty egregious example of a vendor benchmark misleading through cherry-picked, unrealistic results, I'm not sure I share the author's pessimism about how these kinds of stunts will hold back the graph database market.

Why? Simple: pretty much any benchmark I've seen of anything, ever, was similar nonsense -- give people numbers to game and they'll do so, enthusiastically. Even supposedly gold-standard benchmarks like the TechEmpower framework benchmarks quickly devolve into "application server handling HTTP requests by responding with predefined strings", which is as fast as it is utterly useless in most people's version of the real world.

The only way to get usable benchmark data is to run your own workloads in your own environment: everything else is pretty much noise.

maxdemarzi, over 2 years ago

Author here to clear up a few questions: I did not run any benchmarks for Memgraph, just Neo4j on my machine, and compared my results to their numbers on their machine. That is my 8 faster cores against their 12 slower cores, so not apples to apples, but close enough to make the point that Memgraph is not 120x faster than Neo4j. I used to work at Neo4j, then at AWS on Neptune; I work on my own graph database, http://ragedb.com/, and work for another database company, https://relational.ai/

If you want to be my hero, find a way to fix this problem: https://maxdemarzi.com/2023/01/09/death-star-queries-in-graph-databases/

oxfordmale, over 2 years ago

Benchmarks are generally useless unless they test real-world scenarios. The Databricks data warehouse record cost $5,190,345 USD to run over a period of 3 years. If I spent that amount of money, I would get fired.

Such benchmarks also ignore the engineering expertise an organisation has. Do you need to be an expert to fine-tune 6000 parameters, or can you tune the system to an acceptable standard by reading a few blogs?

Some people pointed out that the actual query only cost $242. My counterargument is that this appears to be based on buying reserved instances from AWS for 3 years. In real life this query would also run daily, or at least you would need several iterations to get the results you want.

The costs also include a super-low-budget laptop ($279). It is more than fine for running the query; however, you wouldn't use it as a development machine. This shows these results have been heavily massaged.

alexchantavy, over 2 years ago

Thanks for digging and sharing, I enjoyed your snark.

> They decided to provide the data not in a CSV file like a normal human being would, but instead in a giant cypher file performing individual transactions for each node and each relationship created. Not batches of transactions… but rather painful, individual, one at a time transactions one point 8 million times. So instead of the import taking 2 minutes, it takes hours.

Yeahhh I noticed this too when I looked at the repo when their blog was posted a couple weeks back. Running a transaction for each object will of course be very slow, and real production code will (hopefully) not do this.

> Those are not “graphy” queries at all, why are they in a graph database benchmark? Ok, whatever.

I’m definitely interested in seeing more realistic scenarios of actual “graphy” queries with batched transactions comparing the two. Oh, and comparing against Neptune would be cool too, since that supposedly uses openCypher now (which I hear is kinda close to neo4j cypher?).
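
To make the batching point concrete, here is a minimal Python sketch of how many transactions the same import needs when rows are grouped. It does not talk to any database: the node label `User`, the `id` property, the batch size of 10,000, and the helper names are all assumptions for illustration, and the query string only shows the general `UNWIND` pattern a batched Cypher import might use.

```python
from itertools import islice
from typing import Iterable, Iterator

# Hypothetical label and property; one such query would be sent per batch,
# with the whole batch passed as the $rows parameter.
BATCH_QUERY = "UNWIND $rows AS row CREATE (:User {id: row.id})"


def batched(rows: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield consecutive lists of at most `size` rows."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


def count_transactions(total_rows: int, batch_size: int) -> int:
    """Number of transactions needed if each one carries a single batch."""
    rows = ({"id": i} for i in range(total_rows))
    return sum(1 for _ in batched(rows, batch_size))


one_at_a_time = count_transactions(1_800_000, 1)   # one transaction per row
in_batches = count_transactions(1_800_000, 10_000)  # 10,000 rows per transaction
print(one_at_a_time, in_batches)
```

With these assumed numbers, the per-row import issues 1,800,000 transactions while the batched one issues 180, which is the kind of gap that turns minutes into hours once per-transaction overhead is involved.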

dwroberts, over 2 years ago

The author works on RageDB (https://ragedb.com/) and this doesn't seem to be disclosed in the article.

taubek, over 2 years ago

For more context:

- blog post that sparked the discussion - https://memgraph.com/blog/memgraph-vs-neo4j-performance-benchmark-comparison
- earlier discussion about this Memgraph benchmark on Hacker News - https://news.ycombinator.com/item?id=33813781
- the benchmark results - https://memgraph.com/benchgraph/
- benchmark repo and methodology - https://github.com/memgraph/memgraph/tree/master/tests/mgbench

mhio, over 2 years ago

I get that this is trying to point out that neo4j shouldn't be that far behind, but why are the i7/gatling test numbers being directly compared to memgraph's g6 test results? The conclusion is a bit premature without the other half of the test... What performance does memgraph have on the newer, single-socket hardware?

mastermedo, over 2 years ago

I wouldn’t say the benchmarks put out by graph databases are bullshit. But there is a need to standardise how they’re produced.

The main problem is that when you’re comparing two products you’re bound to be comparing apples to oranges. Every product solves a slightly or majorly different challenge.

So when you run n tests on two different products, some tests are bound to perform better on one product and some on the other. Misleading marketing comes into the picture if you only publish the ones that went your way, or just partial results.

But that’s why, if you believe in your own product and want benchmarks, you hire a reputable third party to do them of their own accord.

LAC-Tech, over 2 years ago

What are people using graph databases for, and what do your queries look like?

I've read about them briefly but I have to admit my imagination fails me as to how it would look in the real world.

college_physics, over 2 years ago

my feeling is that graph databases face an uphill battle for mass adoption not because their architects or vendors are doing anything wrong but because of some intrinsic aspects of information exchange in most current situations and use cases

* information tends to be private and/or commercially sensitive; this severs the links that graph dbs are good at representing (and made the "node focused" SQL approach the ubiquitous model that it currently is)
* objects in typical schemas have many more attributes than relations. while you could model things RDF style where everything is a relation, it is not the most intuitive for people
* when the previous constraint does not apply (e.g. data from a centralized social network), it is typically not too hard to emulate an adequate graph structure on a Pareto 20/80 basis using an RDBMS

so graph dbs end up being optimal only for a niche of situations, and probably don't have the impact that the people / investors involved in their development would be happy with

on the other hand the genie is out of the bottle and the decades-long SQL monoculture seems to be coming to an end. but maybe what results is a relational database+ type thingy [0] rather than two disconnected paradigms

[0] https://postgresconf.org/conferences/2020/program/proposals/postgres-as-a-graph-database

alfiedotwtf, over 2 years ago

On a tangent, what Graph Database would people recommend in 2023? In particular, I would like something that's linked in like SQLite rather than a full blown service like MySQL etc

lbriner, over 2 years ago

There are lots of what I would call "grey" marketing/sales type articles like this across virtually every SaaS business; it's how they get people onto their site.

Unfortunately, an article that overstates benefits without any caveats is not illegal, so it will carry on.

Many of us would like a disclaimer, e.g. "I work for the company", but also a much more bounded discussion: "this performance test works for this particular scenario", perhaps "Please note, your scenario might be very different", and especially "Please contact me if you think I have missed something out".

I have worked for a business where you felt compelled to amplify the good and not talk about the bad, but the world keeps spinning...

0xbadcafebee, over 2 years ago

Completely useless tangent: the word "benchmark" comes from a mark that surveyors would make in rock so that they could place a leveling rod for surveying. Benchmarks are made relative to other benchmarks so that surveying can be done relative to the height of one known fundamental benchmark.

It could be argued that it isn't really a benchmark unless you can accurately calculate the result based off of a common fundamental benchmark.

AtlasBarfed, over 2 years ago

Bwahahaha, it's been against Oracle's terms of use to benchmark it forever.

Lies, damn lies, and benchmarks, people: take them all with a huge grain of salt.

chairmanwow1, over 2 years ago

This is a great article. Usually authors have snark and no substance but this author was able to back up his takes with excellent notes.

manv1, over 2 years ago

The real problem with these kinds of "benchmarks" is that either the company doesn't have anyone on staff who's calling "bullshit" on them, or the marketing people don't care that it's bullshit.

Either one is a bad sign if they're going to be a vendor. At that point how can you trust their SLAs and/or their presales team?

bjornsing, over 2 years ago

> Why would they do this? Because it’s a bullshit benchmark and they don’t actually want anybody looking too deeply at it.

Very unlikely, I would say. Most likely “the clowns” never considered how the license terms would impact use of benchmarks.

“Never attribute to malice what can be sufficiently explained by incompetence”, as the saying goes.

AtNightWeCode, over 2 years ago

Neo4J is bad at aggregate queries, but that is not what you use a graph DB for in the first place.

beastman82, over 2 years ago

Which graph database is actually fast and doesn't use deceptive marketing?

zwaps, over 2 years ago

Ah yes, memgraph.

I argued with them here about another bullshit benchmark: https://news.ycombinator.com/item?id=33717766

they did reply tho

jerf, over 2 years ago

One of my favorite things is "the thing that sounds obvious when I say it, but you didn't think of it before". Here's one related to benchmarking: For A to be 120x better than B in a comparable task, that has to mean that B is leaving that much performance on the table in the first place.

Now, let's combine this with the persistent tendency of developers to take one specific benchmark as indicative of overall performance, which is often preyed on by benchmarkers trying to sell things.

Is it really plausible that neo4j takes 120x longer than it needs to on all operations? A dedicated graph database that has been tuned and optimized for that task for quite a while now?

I'm not quite going to rate that a 0 probability, but it's definitely a very big claim. While the probability is not 0, it is comfortably below "someone's gaming the numbers" and "the benchmark is not as comparable as claimed". There's a faint chance the latter may match a production use case; for instance, certain comparisons of NoSQL DBs and SQL DBs are "not fair" in that they won't be doing remotely the same things for the queries and the performance landscape is very complicated, with one side winning handily for some tasks and the other side handily winning for others, but if your use case falls into one of those big wins you may not care about the "fairness". But it's still a pretty big chunk of probability mass that it's just plain not comparable; how many times have we seen a ludicrous benchmarking claim of relative superiority, just for the losing side to pop up and say something to the effect of "Hey, did you consider adding the correct index to the data, oh look if you do that we win by a factor of 4."

Tell me you're 1.2x or 1.5x faster or something, or that your clever compression means I can remove 1/3rd of my systems or something. Keep it in the range of plausible.

While I'm sure this won't affect the marketing of this company any, ludicrously large claims of 10x+ speed improvements actually turn me off, not attract me. You'd better have some sort of super compelling reason why you somehow managed to be that fast over your competitor, like, "we're the first to successfully leverage GPUs" or something like that. Otherwise I'm going to guess "Actually, you have an O(log n log log n) algorithm over their O(log n log n) algorithm and you cranked the data set up to the ludicrous sizes it takes to get an arbitrarily large X factor improvement over your competition" or something like that.

(Always gotta love people comparing two completely different O(...) algorithms against each other and declaring one is X times faster than the other. This is another major source of "10,000x faster!"... yeah, O(n log n) is "10,000x faster!" than O(n^2), sure. It's also 100,000 times faster, 10 times faster, and a billionkajillion times faster, all at the same time.)
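
The closing point about comparing different complexity classes can be sketched in a few lines of Python. This is only an illustration under an idealized assumption: it counts abstract operations with all constant factors set to 1 and a base-2 logarithm, which no real system matches exactly; the function name `speedup` is made up for the example.

```python
import math


def speedup(n: int) -> float:
    """Ratio of n^2 work to n*log2(n) work, ignoring constant factors."""
    return (n * n) / (n * math.log2(n))


# The "X times faster" figure is not a property of the algorithms alone:
# it grows without bound as the input size n grows.
for n in (10, 1_000, 1_000_000):
    print(f"n = {n:>9,}: about {speedup(n):,.0f}x")
```

Under these assumptions the ratio is roughly 3x at n = 10, about 100x at n = 1,000, and around 50,000x at n = 1,000,000, so a benchmark author can pick the headline multiplier simply by picking the data set size.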

ashvardanian, over 2 years ago

I am really stunned by this story. It made me check the MemGraph benchmarks section. Don't get me wrong, it may be 10-100x faster than Neo4J in even the most basic operations. Moreover, given the quality of Neo4J, it is hard not to be that much quicker. Even Postgres and MySQL are better at storing graphs than Neo4J.

---

Disclosure: I have worked on Graph Algorithms, Graph Databases, and Database Engines for years, and we are now preparing a commercial solution based on UKV [1]. I don't know anyone at MemGraph or Neo4J. Never used the first. As for the second, I am not a fan.

---

Aside from licensing, there are 3 primary complaints. I will address them individually, and I am open to a discussion.

A. Using Python for Benchmarks instead of Gatling. I don't entirely agree with this. Python still has the fastest-growing programming community while already being one of the 2 most popular languages. Gatling, however, I have never heard of. Choosing between the two, I would pick Python. But neither works if you want to design a High-Performance benchmark for a fast system. Such benchmarks have to avoid automatic memory management and expensive runtimes, so you can only implement them in C, C++, Rust, or another systems-programming language. We have seen too many times that the benchmark itself performs worse than the system it is trying to evaluate [2].

B. Using hardware from 2010 [3], weird datasets [4]. This shocked me. When I looked at the charts [5] and the benchmarking section, it seemed highly professional and good-looking. I wouldn't expect less from a startup with $20M VC funding. But the devil is in the details. I would have never expected anyone benchmarking a new DBMS to use CPUs that are now 13 years old and an unknown dataset. Assuming current developer salaries, hiring people to design a DBMS and then evaluating it on a $1000 machine is just financially irresponsible. We buy expensive servers; they cost as much as sports cars, or even apartments in poorer countries. They are hard to maintain, but they are essential to quality work. It is sad to see companies taking such shortcuts. But to play devil's advocate, there is no single graph benchmark or dataset that everyone agrees on. So I imagine people experimenting with multiple real datasets of different sizes or generating them systematically using one of the Random Generator algorithms. In UKV, we have used Twitter data to construct both document and graph collections. In the past, we have also used `ci-patent`, `bio-mouse-gene`, `human-Jung2015-M87102575`, and hundreds of other public datasets from the Network Repository and SNAP [6]. There are datasets of every shape and size, reaching around 1 Billion edges, in case someone is searching for data. For us the next step is the reconstruction of the Web from the 300 TB CommonCrawl dataset [7]. There is no such Graph benchmark in existence, but it is the biggest public dataset we could find.

C. Running queries a different number of times for different engines. This can be justified, and it is how current benchmarks are done. You are tracking not just the mean execution time but also variability, so if at some point results converge, you abort before hitting the expected number of iterations to save time.

---

LDBC [8] seems like a good candidate for a potential industry standard, but it needs to be completed. Its "Business Intelligence workload" and "Interactive workload" categories exclude any real "Graph Analytics". Running an All-Pairs-Shortest-Paths algorithm on a large external memory graph could have been a much more interesting integrated benchmark. Similarly, one can make large-scale community detection or personalized recommendations based on Graphs and evaluate the overall cost/performance. It, however, poses another big challenge. Almost all algorithm implementations for those problems are vertex-centric. They scale poorly with large sparse graphs that demand edge-centric algorithms, so a new implementation has to be written from scratch. We will try to allocate more resources towards that in 2023 and invite anyone curious to join.

---

[1] https://github.com/unum-cloud/ukv
[2] https://unum.cloud/post/2022-03-22-ucsb
[3] https://github.com/memgraph/memgraph/tree/master/tests/mgbench#intel---hp
[4] https://github.com/memgraph/memgraph/tree/master/tests/mgbench#pokec
[5] https://memgraph.com/benchgraph/base
[6] https://snap.stanford.edu/data
[7] https://commoncrawl.org
[8] https://ldbcouncil.org/benchmarks/snb
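
Point C above (stop early once results converge) can be sketched as follows. This is one possible stopping rule and is not taken from the Memgraph benchmark code: the function names, the 2% tolerance, the minimum of 5 runs, and the choice of "standard error of the mean relative to the mean" as the convergence test are all assumptions for illustration.

```python
import statistics
import time
from typing import Callable


def run_until_stable(sample: Callable[[], float], min_runs: int = 5,
                     max_runs: int = 100, rel_tol: float = 0.02) -> list[float]:
    """Collect samples until the standard error of the mean is within
    rel_tol of the mean, or until max_runs samples have been taken."""
    timings: list[float] = []
    while len(timings) < max_runs:
        timings.append(sample())
        if len(timings) >= min_runs:
            mean = statistics.fmean(timings)
            sem = statistics.stdev(timings) / len(timings) ** 0.5
            if sem <= rel_tol * mean:
                break  # results have converged; skip the remaining runs
    return timings


def timed(fn: Callable[[], object]) -> float:
    """Wall-clock seconds taken by one call to fn."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


# Usage: time a stand-in workload; a real benchmark would run a query here.
timings = run_until_stable(lambda: timed(lambda: sum(range(100_000))))
print(f"{len(timings)} runs, mean {statistics.fmean(timings):.6f}s")
```

A rule like this naturally yields different iteration counts for different engines, since a noisier engine needs more runs before its mean settles, which is why unequal run counts alone do not prove a benchmark is unfair.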
