Alas, there's not enough context or code here to even really know what's going on (or maybe I just don't know Django well enough to guess at what isn't said). It sounds like the author is comparing storing and retrieving a record with a JSON field in a PostgreSQL table versus storing and retrieving a raw string (the raw JSON) in Redis?<p>So it's no surprise that the Redis solution is faster. Besides parsing and indexing the data, PostgreSQL will also prioritize reliability and won't confirm your transaction until it's actually persisted to disk. Whereas Redis is prioritizing speed, and working straight to RAM--the disk serialization is an asynchronous process. If anything, I'm surprised that the Redis solution is _only_ 16-20x faster than PostgreSQL given what appears to be happening.<p>So, for this use case--storing a cache of Amazon API results for mostly-static data--the Redis solution is a no-brainer: durability of the data is not a serious concern; there's no real risk of data corruption because there's only one schema and no interrelations; you don't need the data store to know anything about the data; you don't need indexing beyond straight key-value. For other use cases, Redis might be entirely unsuitable because of durability or data integrity reasons, in which case the speed benefits wouldn't even matter.<p>Now, all that said, in the real world there are lots of other considerations: Are you using either of these databases already for some other purpose in the same app? Which data store is easier to set up? Which is easier to code against? Do you even care how fast it is if the slower speed is 11ms on a single-user system?
It's faster, but then, that's the point of it.<p>It's like saying that memcache is faster than Postgres.<p>They do different things for different purposes.<p>I still wouldn't trust Redis for anything other than ephemeral storage. Most of the use cases where we use Redis assume that the data will at some point go away. The places where we use Postgres assume the data is permanent.<p>In fact, we use both together, and it works really well -- best of both worlds.
First off:<p>1. If you are claiming to compare PostgreSQL to Redis, do NOT test through Django or SQLAlchemy or whatever; use the raw DBAPI driver, most likely psycopg2 -- though you might get much faster results with a more optimized driver like asyncpg.<p>2. Use the Python profiler to generate profiles and actually see how much time is spent waiting on the network to get the value.
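A minimal timing harness for that kind of raw-driver comparison might look like this. It's only a sketch: the query and key names in the comments are hypothetical, and in a real run you'd pass closures that hit live PostgreSQL and Redis servers.

```python
import statistics
import time

def median_latency_ms(op, runs=1000):
    """Call op() `runs` times and return the median latency in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        op()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)

# In a real comparison you'd time the raw drivers directly, e.g.:
#   median_latency_ms(lambda: cur.execute("SELECT blob FROM t WHERE key = %s", (k,)))
#   median_latency_ms(lambda: r.get(k))
# Here a no-op stands in so the harness itself is runnable as-is.
noop_ms = median_latency_ms(lambda: None, runs=100)
```

Median is a better headline number than mean here, since a single slow connection setup or GC pause won't skew it.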
Something is very wrong for the median read of what should be a moderate-sized blob with an indexed lookup to be 8ms.<p>The article doesn't give enough details to really understand what it is that's happening here, but I would recommend everyone take this 14/16x report and not expect it to be the general case.
By default Django closes PostgreSQL connections after handling each web request. I'm not certain, but most of the latency we're seeing here is probably the substantial overhead of setting up a PG connection. Redis will still be faster, but I'd guess the gap would be significantly tighter with persistent connections enabled.
A few problems:<p>- A comparison of Redis and Postgres for in-memory storage without use of TEMP or UNLOGGED[0] tables?<p>- SQL for table creation and queries not shown<p>- Using an ORM on one side and a more direct client on the other<p>While some data is better than none, the quality of this analysis is in question.<p>[0]: <a href="https://www.postgresql.org/docs/current/sql-createtable.html" rel="nofollow">https://www.postgresql.org/docs/current/sql-createtable.html</a>
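For an apples-to-apples in-memory comparison, the Postgres side could use an UNLOGGED table, which skips WAL writes and trades crash durability for speed -- much closer to Redis's default guarantees. A sketch (table and column names are made up):

```python
# UNLOGGED tables are not crash-safe: they are truncated after a crash
# and are not replicated, but writes skip the write-ahead log entirely.
CREATE_CACHE_TABLE = """
CREATE UNLOGGED TABLE api_cache (
    key  text PRIMARY KEY,
    blob jsonb NOT NULL
);
"""

# With psycopg2 against a live server you would run something like:
#   cur.execute(CREATE_CACHE_TABLE)
```

That puts both stores on roughly the same durability footing before you start timing anything.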
No index, different datastore sizes, the same entry queried in a loop, no cache config on the DB side, only one side hydrating an ORM... Don't get confused: just because data comes from a database doesn't mean it comes from disk when you query it.<p>I'm pretty sure the DB's gap can be brought down to 3x at most.
I use PostgreSQL regularly and pull (and even search) JSON blobs. We have something like 150 million JSON blobs (dataset is 500GB) that are searched on almost every query; response time is maybe 150-200ms... That's on an AWS t3.medium RDS instance.<p>I don't really know what the hell is going on here, but something isn't right.
Bad benchmark with not enough information to draw any relevant conclusion.<p>My bet: connection pooling is not enabled for PostgreSQL (Django's default value for CONN_MAX_AGE is 0). A new connection is established for each request, which probably explains most of the overhead. Moreover, the connections are probably TLS-secured by default, which increases the handshake time a lot.
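Enabling persistent connections is a one-line change in Django's settings. A sketch, assuming a standard settings.py (the database name and other values here are placeholders):

```python
# settings.py -- enable persistent database connections (Django >= 1.6).
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mydb",           # hypothetical database name
        "CONN_MAX_AGE": 600,      # reuse each connection for up to 10 minutes
        # CONN_MAX_AGE=0 (the default) tears down the connection per request;
        # None keeps connections open indefinitely.
    }
}
```

With this set, the per-request TCP + TLS + auth handshake disappears from the measurement.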
The problem with Redis isn't the speed of single-object storage and retrieval.<p>We know for a fact that Redis stores shit fast. And a blob of JSON is perfect for it.<p>The problem is multi-fold. Now you have schema-on-read. You may spend far more processing time querying and massaging the data than you would getting exactly what you want from a SQL query. Then there's concurrency, locking, and the constraints databases offer.<p>So it's important to evaluate on more than JUST read/write speed.<p>I worked on a team that used NoSQL for their DB needs. The performance was terribly slow -- not because of read/write speeds, but because of everything listed above. They basically had to hand-write a query optimizer to match what SQL gives you for free. They quickly swapped to Postgres and saw somewhere in the range of 10-100x performance improvements (depending on the part) with only a week of work.
Everyone should read <a href="https://redis.io/topics/persistence" rel="nofollow">https://redis.io/topics/persistence</a>, where the PostgreSQL vs Redis trade-off is explained in terms of fsync behavior.
My team built a heavy-duty load testing simulator in preparation for an upcoming launch. We kept dialing the simulated clients up until we finally saw timeouts talking to Redis (we used RedisLabs). The cause? We had saturated the 1-gig Ethernet link our AWS C4 instances had. Upgrading to the next tier of NICs solved the problem and Redis kept up the pace.<p>That said, Redis is hugely expensive for any meaningfully sized data set. Thankfully our use case allowed us to keep our data size bounded over time, or else I would've traded latency for drastically reduced cost (e.g. use Postgres).
Hey, original blog post author here.<p>What a lot of people don't understand is that you want both: the resilience of Postgres and the speed of fetching from Redis.<p>You have data stored. You want to retrieve it as painlessly (shorter time, less CPU usage, less complexity, etc.) as possible. What are the options? How much faster is Redis? How much RAM is that going to eat up? Different tools have different properties. Run some tests to answer some of those questions. Don't just guess. Sure, Redis is going to save some time, but how much? Do you know off the top of your head?
So reading from RAM in a single-threaded key/value store with a lightweight text protocol, is faster than reading from disk in a multi-threaded relational database with SQL parsing and ACID semantics. As measured by a heavy web framework running in a slow interpreted language that overshadows the speed differential so much that it's only 16x at the end.<p>This article is obvious in its conclusion, inaccurate in measurement, and useless for any discussion.
I saw something similar recently when reading JSON data from Postgres via Rails, which was unbelievably slow. I found the reason to be that Rails would automatically parse the JSON value into an in-memory Ruby object. Since this was for an API-like backend, the solution was simple: just read the column as a string ("SELECT payload::varchar...").<p>I don't know if Django is doing the same, but using an ORM might just do that to you.
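The cost difference is materializing objects versus keeping a cheap string. The same idea sketched in Python (the payload here is made up):

```python
import json

# A serialized blob, as the database would hand it back when cast to text.
payload = '{"title": "Widget", "price": 9.99, "tags": ["a", "b"]}'

parsed = json.loads(payload)  # what an ORM may do behind your back:
                              # allocates dicts, lists, and strings
raw = payload                 # what SELECT payload::varchar gives you:
                              # a single str, ready to ship to the client
```

If the backend only forwards the blob to an HTTP client, the parse/re-serialize round trip is pure waste.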
Seems like a really bad study: it doesn't mention the exact methodology (caching, exact queries, size of the JSON data, etc.), nor does it attempt to figure out why PostgreSQL takes 8 ms to do something that should take much less, why reads and writes take the same time, or why Redis takes 0.5 ms (i.e. millions of CPU cycles) for something that should essentially just be a memcpy.
These sort of benchmarks are both useful and useless, depending on what you want. The architectures of the two are very different and so speed depends very much on what you are trying to do.<p>Access patterns in Redis are very simple and it is a well optimized single threaded event loop. It is very fast but cannot scale up to multiple cores etc. So in this case, yes, storing and retrieving data will be fast, but uses will be narrower <i>and</i> if you need a high degree of concurrency, Redis will pose some issues for you.<p>At least in my experience, Redis's key problem is that while it is well optimized it is still a single threaded event loop and therefore with complex use comes a lot of headache. For simple things however, it is really hard to beat.
Curiosity strikes: I have to wonder how much these benchmarks would change on an actual Linux kernel? I don't know if the PostgreSQL team is necessarily tuning their project for macOS/Darwin. I'm sure Redis would still be faster (because memory vs disk), but I wonder by how much.
For a couple of similar use cases (write rarely, read many times, JSON data) I ended up using actual JSON files on disk. The time to read and parse them is very small compared to asking Redis or PG for the data, and this benefits from the Linux page cache, which makes reading those mostly-unchanged files almost the same as reading from memory.
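A minimal version of that file-based approach might look like this (the cache directory and key scheme are made up for illustration):

```python
import json
import tempfile
from pathlib import Path

# Hypothetical cache location; any fast local filesystem works.
CACHE_DIR = Path(tempfile.gettempdir()) / "json-cache"

def write_blob(key, data):
    """Serialize `data` to a per-key JSON file (the rare write path)."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    (CACHE_DIR / f"{key}.json").write_text(json.dumps(data))

def read_blob(key):
    """Read and parse one cached JSON file (the hot read path).

    Repeated reads of an unchanged file are served from the OS page
    cache, so after the first read this costs little more than a
    memory copy plus the JSON parse.
    """
    return json.loads((CACHE_DIR / f"{key}.json").read_text())
```

No server process, no connection handshake, and durability comes for free from the filesystem.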
FWIW there is an update to this article at - <a href="https://www.peterbe.com/plog/update-to-speed-comparison-for-redis-vs-postgresql-storing-blobs-of-json" rel="nofollow">https://www.peterbe.com/plog/update-to-speed-comparison-for-...</a><p>TLDR; this update is not about redis vs postgresql but about caching!
Is this really that surprising?<p>Postgres is an RDBMS, with transactional support and everything. While Redis is powerful, it is much simpler and more focused, not nearly as comprehensive as Postgres.<p>TL;DR: Postgres is doing a lot more heavy lifting than Redis; it feels slow because most of that work isn't necessary for a simple KV lookup.
Quote: "My current implementation uses PostgreSQL (via the Django ORM) to store this stuff."<p>The author lost all credibility with me at "ORM". You went from cloud to local cache as a performance improvement, and then chose to do it via an ORM? Why not build your own custom solution if you're going down this road anyway? It's like an electronics engineer replacing their security company's alarms with home-built ones, but assembling them from off-the-shelf hobby kits and then complaining that a different kit works better. It's your house, dude -- make it all custom for maximum efficiency if you're going this way.