Ways to shoot yourself in the foot with Redis

179 points by philbo almost 2 years ago

12 comments

stlava almost 2 years ago

My team manages a handful of clusters at work, and I wrote an internal Redis client proxy (it's on my to-do list to open-source). A few things I tell other teams to set them up for success (we use ElastiCache):

- Connection pooling / pipelining and circuit breaking are a must at scale. The clients are a lot better than they used to be, but it's important that developers understand the behavior of the client library they are using. Someone suggested using Envoy as a sidecar proxy; I personally wouldn't after our experience with it with Redis, but it's an easy option.
- Avoid changing the cluster topology if CPU load is over 40%. This is primarily in case of unplanned failures during a change.
- If something goes wrong, shed load on the application side as quickly as possible, because Redis won't recover if it's being hammered. You'll need to either have feature flags or be able to scale down your application.
- Having replicas won't protect you from data loss, so don't treat Redis as a source of truth. Also, don't rely on consistency in clustered mode.
- Remember Redis is single-threaded, so an 8xl isn't going to be super useful with all those unused cores.

Things we have alarms on by default:

- Engine utilization
- Anomalies in replication lag
- Network throughput (relative to the throughput of the underlying EC2 instance)
- Bytes used for cache
- Swap usage (this is the oh-shit alarm)
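
Circuit breaking, from the first bullet, fits in a few lines. This is a minimal, hypothetical Python sketch (the `CircuitBreaker` class, its thresholds, and the injectable clock are assumptions, not stlava's proxy or any client library's API): after `max_failures` consecutive errors the breaker opens and fails fast for `reset_after` seconds, which also sheds load from a struggling Redis, then lets a trial call through. It is not thread-safe as written.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised instead of calling Redis while the breaker is open."""


class CircuitBreaker:
    """Hypothetical consecutive-failure circuit breaker (illustrative only)."""

    def __init__(self, max_failures: int = 5, reset_after: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Open: fail fast without touching Redis at all.
                raise CircuitOpenError("circuit open; skipping Redis call")
            # Half-open: let a trial call through; one more failure re-opens.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit again
        return result
```

With redis-py, a call would look like `breaker.call(lambda: r.get("session:42"))`, where `r` is a client you already have. On `CircuitOpenError` the caller takes its degraded path (serve without the cache, or return an error) instead of piling more load onto Redis.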

kgeist almost 2 years ago

Another one: don't use distributed locks built on Redis (Redlock) as if they were just another mutex.

Someone on the team decided to use Redlock to guard a section of code that accessed a third-party API. The code was racy when accessed from several concurrently running app instances, so access to it had to be serialized. A property of distributed locking is that it has timeouts (based on Redis' TTL, if I remember correctly): other instances will assume the lock is released after N seconds, so that an app instance which died does not leave the lock acquired forever. So one day responses from the third-party API started taking longer than Redlock's timeout. Other app instances assumed the lock had been released and started accessing the API simultaneously without any synchronization. Data corruption ensued.
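
A common mitigation for this failure mode (not something kgeist describes) is a fencing token: every lock acquisition gets a strictly increasing number, and the protected resource rejects requests carrying a token older than one it has already accepted. Below is a minimal in-process Python sketch; `TokenIssuer` and `FencedResource` are hypothetical stand-ins. The catch in kgeist's case is that the token check has to happen inside the protected resource, and a third-party API usually won't do that for you.

```python
import itertools
import threading


class TokenIssuer:
    """Stand-in for the lock service: each acquisition gets a larger token."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self._lock = threading.Lock()

    def acquire(self) -> int:
        with self._lock:
            return next(self._counter)


class FencedResource:
    """Hypothetical resource that refuses writes from stale lock holders."""

    def __init__(self) -> None:
        self.highest_token = 0
        self.value = None

    def write(self, token: int, value: object) -> bool:
        # A real resource must do this check-and-update atomically.
        if token < self.highest_token:
            # An older holder whose lock already expired: reject it.
            return False
        self.highest_token = token
        self.value = value
        return True
```

In the scenario above: instance A acquires token 1 and stalls past the TTL, instance B acquires token 2 and writes, and A's late write with token 1 is rejected instead of silently corrupting data.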

koolba almost 2 years ago

> I wrote a basic session cache using GET, which fell back to a database query and SET to populate the cache in the event of a miss. Crucially, it held onto the Redis connection for the duration of that fallback condition and allowed errors from SET to fail the entire operation. Increased traffic, combined with a slow query in Postgres, caused this arrangement to effectively DOS our Redis connection pool for minutes at a time.

This has nothing to do with the Redis server. This is bad application code monopolizing a single connection while waiting on an unrelated operation. A stateless request/response interaction with Redis for each individual operation does not hold any such locks.
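
The shape koolba describes, where each Redis call is its own short request and a failed SET is not fatal, might look like the Python sketch below. The `Cache` protocol mirrors redis-py's `get(name)` / `set(name, value, ex=...)`, but `load_session`, `query_db`, and the TTL are hypothetical. A default redis-py client takes a connection from its pool per command and returns it afterwards, so nothing is held while the database query runs.

```python
import logging
from typing import Callable, Optional, Protocol

log = logging.getLogger(__name__)


class Cache(Protocol):
    # Same shape as redis-py's Redis.get / Redis.set(name, value, ex=...).
    def get(self, name: str) -> Optional[bytes]: ...
    def set(self, name: str, value: bytes, ex: Optional[int] = None) -> object: ...


SESSION_TTL_SECONDS = 3600  # hypothetical TTL


def load_session(cache: Cache, session_id: str,
                 query_db: Callable[[str], bytes]) -> bytes:
    key = f"session:{session_id}"
    try:
        cached = cache.get(key)  # one short round trip
    except Exception:
        log.warning("cache GET failed; falling back to the database")
        cached = None
    if cached is not None:
        return cached

    value = query_db(session_id)  # slow path: no Redis connection held here

    try:
        cache.set(key, value, ex=SESSION_TTL_SECONDS)
    except Exception:
        # Populating the cache is best-effort: never fail the request over it.
        log.warning("cache SET failed for %s", key)
    return value
```

Treating a failed GET as a miss is a design choice: it keeps sessions working while Redis is down, at the cost of sending that traffic to the database, which is exactly where a circuit breaker or load shedding would come in.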

badrabbit almost 2 years ago

Don't expose your Redis to the internet (please!). Don't whitelist large swathes of your cloud/hosting provider's subnets either. Of course Redis isn't special; the same goes for Mongo, Elastic, Docker, k8s, etc., even if it's a testing server and you will never put important data on it.
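
To put the subnet advice in numbers: a single /16 rule admits 65,536 addresses, and anything running in that part of the provider's range can reach your port. A minimal Python sketch using the standard `ipaddress` module to flag overly broad allowlist entries; the `audit_allowlist` helper and its default threshold of 256 addresses (one /24) are arbitrary assumptions for illustration.

```python
import ipaddress
from typing import Iterable, List


def audit_allowlist(cidrs: Iterable[str], max_addresses: int = 256) -> List[str]:
    """Return warnings for allowlist entries that admit too many hosts."""
    warnings = []
    for cidr in cidrs:
        # strict=False accepts entries like "10.1.2.3/16" with host bits set.
        net = ipaddress.ip_network(cidr, strict=False)
        if net.prefixlen == 0:
            warnings.append(f"{cidr}: matches every address (open to the internet)")
        elif net.num_addresses > max_addresses:
            warnings.append(f"{cidr}: admits {net.num_addresses:,} addresses")
    return warnings


for w in audit_allowlist(["10.0.12.7/32", "172.31.0.0/16", "0.0.0.0/0"]):
    print(w)
```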

resonious almost 2 years ago

> One common mistake is serialising objects to JSON strings before storing them in Redis. This works for reading and writing objects as atomic units but is inefficient for reading or updating individual properties within an object.

I would love to see some numbers on this. My intuition says there are probably some workloads where JSON strings are better and some where one key per property is better.
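
Not the benchmark resonious asks for, but a back-of-the-envelope way to see the trade-off. With a JSON string, changing one property means a GET of the whole blob and a SET of the whole new blob (plus a read-modify-write race unless you use WATCH or a Lua script); with a hash it is one HSET of one field. This hypothetical Python sketch counts payload bytes only, ignoring RESP framing, latency, and server CPU, so it is a rough proxy at best.

```python
import json
from typing import Any, Dict


def json_update_bytes(obj: Dict[str, Any], field: str, new_value: Any) -> int:
    """Payload bytes for GET of the whole blob + SET of the whole modified blob."""
    before = json.dumps(obj, separators=(",", ":"))
    updated = dict(obj, **{field: new_value})
    after = json.dumps(updated, separators=(",", ":"))
    return len(before.encode()) + len(after.encode())


def hash_update_bytes(field: str, new_value: Any) -> int:
    """Payload bytes for HSET key field value (hash values are plain strings)."""
    return len(field.encode()) + len(str(new_value).encode())


profile = {"name": "Ada", "email": "ada@example.com", "bio": "x" * 2000, "visits": 41}
print(json_update_bytes(profile, "visits", 42))  # grows with the whole object
print(hash_update_bytes("visits", 42))           # 8: "visits" + "42"
```

The comparison flips for workloads that always read and write the whole object, and hash fields can't nest, which is part of why JSON strings stay reasonable for some workloads.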

scrame almost 2 years ago

I had a jr dev connect and type 'flushall' because he thought it would refresh the dataset to disk. Thankfully it was on a staging env. I think he's at Google now.
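
For the record, FLUSHALL deletes every key in every database; writing the dataset to disk is SAVE or BGSAVE. On Redis 6 and later the server-side fix is an ACL user that excludes the `@dangerous` command category (older setups used `rename-command` in redis.conf). If ad-hoc commands also go through an internal tool, a client-side guard is a cheap extra layer; a hypothetical Python sketch, where the command set, `guard_command`, and the `confirm` hook are all assumptions:

```python
from typing import Callable, Sequence

# Hypothetical denylist of destructive or blocking commands;
# Redis' own @dangerous ACL category is broader.
DANGEROUS_COMMANDS = {"FLUSHALL", "FLUSHDB", "SHUTDOWN", "DEBUG", "CONFIG", "KEYS"}


class RefusedCommand(Exception):
    pass


def guard_command(args: Sequence[str], env: str,
                  confirm: Callable[[str], bool]) -> Sequence[str]:
    """Return args unchanged if allowed; raise RefusedCommand otherwise."""
    if not args:
        raise RefusedCommand("empty command")
    name = args[0].upper()  # Redis command names are case-insensitive
    if name in DANGEROUS_COMMANDS:
        if env == "production":
            raise RefusedCommand(f"{name} is blocked in production")
        if not confirm(f"Really run {name} against {env}?"):
            raise RefusedCommand(f"{name} not confirmed")
    return args
```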

ljm almost 2 years ago

I've found that your mileage will vary when using Redis in clustered mode: even if there is an official Redis driver in your language of choice that supports it, that support might not be exposed by the libraries that depend on it. In those cases you'll just be connecting to a single specific instance in the cluster while mistakenly believing that isn't the case.

I've noticed this particularly with Ruby, where the official gem has cluster and Sentinel support, but many other gems that depend on Redis expose their own abstraction for configuring it, and it isn't compatible with the official package.

Of course, I think running Redis in clustered mode is actually just another way to shoot yourself in the foot, especially if a standalone instance isn't causing you any trouble, as you can easily run into problems with resharding or with poorly distributing the keyspace. Maybe just try out Sentinel for HA and failover support if you want some resilience.
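
Keyspace distribution is easier to reason about once you see the mapping. Per the Redis Cluster specification, a key's slot is the CRC16 (XMODEM variant) of the key modulo 16384, and if the key contains a non-empty `{...}` hash tag, only the tag is hashed. The Python below is a sketch of that rule for offline analysis, such as checking whether a key naming scheme piles everything into a few slots; `key_hash_slot` is our name, not a library API, and `CLUSTER KEYSLOT` on a live cluster remains the authority.

```python
from typing import Union

CLUSTER_SLOTS = 16384


def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: Union[str, bytes]) -> int:
    """Slot for a key, following the Redis Cluster hash-tag rule."""
    raw = key.encode() if isinstance(key, str) else key
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only a non-empty {...} section counts as a hash tag.
        if end != -1 and end != start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % CLUSTER_SLOTS


print(key_hash_slot("foo"))  # 12182
# Keys sharing a hash tag land in the same slot, and therefore on the same node:
print(key_hash_slot("{user1000}.following") == key_hash_slot("{user1000}.followers"))  # True
```

The flip side of hash tags is ljm's keyspace point: if most traffic goes through keys with the same tag, it all lands on one slot and so on one node.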

jontonsoup almost 2 years ago

Has anyone seen max (p100) client latencies of 300 to 400 ms but a totally normal p99? We see this across almost all our Redis clusters on ElastiCache and have no idea why. CPU usage is tiny. SLOWLOG shows nothing.
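
Part of why this is easy to miss: p99 by definition ignores the slowest 1% of samples, so a handful of 300-400 ms stalls can sit entirely above it. A small Python illustration with made-up numbers (not jontonsoup's data), using the nearest-rank percentile definition; monitoring systems may interpolate differently.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical: 995 fast requests and 5 that hit a ~350 ms stall.
latencies_ms = [0.8] * 995 + [350.0] * 5
print(percentile(latencies_ms, 99))   # 0.8: the stalls are only 0.5% of requests
print(percentile(latencies_ms, 100))  # 350.0
```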

yawaramin almost 2 years ago

Redis is great, it's a great piece of software. One way I shot myself in the foot (kinda) with it: used it with a 1:N query fanout pattern. I.e., issued N queries to Redis for 1 incoming query to my service. My service by design needs to do N queries (it's a long story). But Redis is not really designed to be used like this and I was putting it under very high load. I swapped it out with an SQLite cache recently and got rid of the errors that would pop up from putting extreme stress on the Redis server.
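
yawaramin doesn't describe the SQLite cache, but one reason a local cache can hold up better under 1:N fanout is that the N lookups become a single in-process query instead of N network round trips. (Staying on Redis, batching the N keys into one MGET or a pipeline is the usual way to cut round trips.) A minimal Python sketch using the standard `sqlite3` module; the `SqliteCache` class, its schema, and the TTL handling are assumptions for illustration.

```python
import sqlite3
import time
from typing import Dict, Iterable, Optional


class SqliteCache:
    """Hypothetical local key/value cache with per-entry expiry."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            " k TEXT PRIMARY KEY, v BLOB NOT NULL, expires_at REAL NOT NULL)"
        )

    def set(self, key: str, value: bytes, ttl: float,
            now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self.db.execute(
            "INSERT OR REPLACE INTO cache (k, v, expires_at) VALUES (?, ?, ?)",
            (key, value, now + ttl),
        )
        self.db.commit()

    def get_many(self, keys: Iterable[str],
                 now: Optional[float] = None) -> Dict[str, bytes]:
        """Fetch all unexpired keys in one query (the fanout becomes one lookup)."""
        keys = list(keys)
        if not keys:
            return {}
        now = time.time() if now is None else now
        placeholders = ",".join("?" * len(keys))
        rows = self.db.execute(
            f"SELECT k, v FROM cache WHERE k IN ({placeholders}) AND expires_at > ?",
            (*keys, now),
        )
        return {k: v for k, v in rows}
```

SQLite caps the number of bound parameters per statement (999 in older builds), so very large key lists would need to be chunked.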

berkle4455 almost 2 years ago

> Crucially, it held onto the Redis connection for the duration of that fallback condition and allowed errors from SET to fail the entire operation.

What? Was this inside a MULTI (transaction) or something? This isn't a flaw of Redis being single-threaded. Honestly, all of these "footguns" sound like amateur programmer mistakes and have zero to do with Redis.

adventured almost 2 years ago

I have been using Redis for a long time, and one of the things I love about it is how difficult it is to shoot yourself in the foot with it. From the first use, after briefly reading some basic tips on what not to do, it was ridiculously simple to just get to work with it. I've never once run into a security or performance issue with it.

welder almost 2 years ago

Change the default `stop-writes-on-bgsave-error` to "no" or you're asking for trouble... a ticking time bomb.
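
For context: with the default (`yes`) and RDB snapshots enabled, a failed background save makes Redis reject writes until a save succeeds again; with `no`, writes keep flowing and the failure is only visible if you look for it. Either way it is worth alarming on. A hypothetical Python sketch that parses the text returned by `INFO persistence` and reports the `rdb_last_bgsave_status`, `aof_last_bgrewrite_status`, and `aof_last_write_status` fields (real INFO fields; the helper names are assumptions). redis-py's `info("persistence")` already returns a parsed dict, so with that client only the check is needed.

```python
from typing import Dict, List

# Fields in INFO persistence whose healthy value is "ok".
STATUS_FIELDS = ("rdb_last_bgsave_status", "aof_last_bgrewrite_status",
                 "aof_last_write_status")


def parse_info(text: str) -> Dict[str, str]:
    """Parse INFO output ("# Section" headers, "key:value" lines) into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def persistence_problems(info_text: str) -> List[str]:
    """Return "field=value" for every status field that isn't "ok"."""
    fields = parse_info(info_text)
    return [
        f"{name}={fields[name]}"
        for name in STATUS_FIELDS
        if name in fields and fields[name] != "ok"
    ]
```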
