AWS Kinesis with Lambdas: Lessons Learned

112 points by omn1 · almost 7 years ago

10 comments

stlava · almost 7 years ago
The post is good but just scratches the surface on running Kinesis Streams / Lambda at scale. Here are a few additional things I found while running Kinesis as a data ingestion pipeline:

- Only write out logs that matter. Searching logs in CloudWatch is already a major PITA. Half the time I just scan the logs manually because search never returns. Also, the fewer println statements you have, the quicker your function will be.

- Lambda is cheap; reporting function metrics to CloudWatch from a Lambda is not. Be very careful about using this.

- Having metrics from within your Lambda is very helpful. We keep track of spout lag (the delta between when an event got to Kinesis and when it was read by the Lambda), source lag (the delta between when the event was emitted and when it was read by the Lambda), and the number of events processed (were any dropped due to validation errors?).

- Avoid using the Kinesis auto-scaler tool. In theory it's a great idea, but in practice we found that scaling a stream with 60+ shards causes issues with API limits. (Maybe this is fixed now...)

- Have plenty of disk space on whatever is emitting logs. You don't want to run into the scenario where you can't push logs to Kinesis (e.g. throttling) and they start filling up your disks.

- Keep in mind that you have to balance your emitters, Lambda, and your downstream targets. You don't want too few or too many shards, and you don't want 100 Lambda instances hitting a service with 10 events each invocation.

- Lambda deployment tools are still young, but find one that works for you. All of them have tradeoffs in how they are configured and how they deploy.

There are some good tidbits in the Q&A section from my re:Invent talk [1]. Also, for anyone wanting to use Lambda without reinventing the wheel, check out Bender [2]. Note: I'm the author.

[1] https://www.youtube.com/watch?v=AaRawf9vcZ4
[2] https://github.com/Nextdoor/bender

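A minimal sketch of the spout-lag metric described above, assuming a Python Lambda handler triggered by Kinesis; the log format and the choice to emit it as a single print statement are illustrative assumptions, not the commenter's actual setup:

    import base64
    import json
    import time

    def handler(event, context):
        """Kinesis-triggered handler that tracks spout lag and processed-event counts."""
        now = time.time()
        spout_lags = []
        processed = 0
        for record in event["Records"]:
            kinesis = record["kinesis"]
            # Spout lag: time between the record landing in Kinesis and being read here.
            spout_lags.append(now - kinesis["approximateArrivalTimestamp"])
            payload = json.loads(base64.b64decode(kinesis["data"]))
            # ... validate / transform / write payload downstream ...
            processed += 1
        if spout_lags:
            # One structured log line per invocation keeps CloudWatch noise (and cost) down.
            print(json.dumps({"max_spout_lag_s": max(spout_lags), "events_processed": processed}))
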
dbllxr · almost 7 years ago
> For us, increasing the memory for a Lambda from 128 megabytes to 2.5 gigabytes gave us a huge boost.

> The number of Lambda invocations shot up almost 40x.

One thing I've learned from talking to AWS support is that increasing memory also gets you more vCPUs per container.

-----

Serverless is great at scaling and handling bursts, but you may find it VERY difficult in terms of testing and debugging.

A while back I started using an open-source tool called localstack [1] to mirror some AWS services locally. Despite some small discrepancies in certain APIs (which are totally expected), it's made testing a lot easier for me. Something worth looking into if testing serverless code is causing you headaches.

[1] https://github.com/localstack/localstack

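A minimal sketch of that localstack pattern, assuming localstack's default edge endpoint on localhost:4566; the stream name and dummy credentials are placeholders:

    import boto3

    # Point the Kinesis client at localstack instead of real AWS.
    kinesis = boto3.client(
        "kinesis",
        endpoint_url="http://localhost:4566",  # assumed localstack edge endpoint
        region_name="us-east-1",
        aws_access_key_id="test",              # dummy credentials accepted by localstack
        aws_secret_access_key="test",
    )

    kinesis.create_stream(StreamName="local-test-stream", ShardCount=1)
    kinesis.get_waiter("stream_exists").wait(StreamName="local-test-stream")
    kinesis.put_record(
        StreamName="local-test-stream",
        Data=b'{"event": "example"}',
        PartitionKey="example-key",
    )

Dropping the endpoint_url (and using real credentials) points the same code back at actual AWS, which is what makes the local mirror convenient for tests.
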
cagenut · almost 7 years ago
The article doesn't appear to get into the end destination of this data; it just says "to AWS".

My initial thought is: restore last night's backup to another MySQL instance on AWS and then let it catch up on the binlog?

But I guess the unstated assumption is that their goal is to also transform the data into some other datastore.

otterley · almost 7 years ago
I'm curious about the economics of this design. Real-time stream consumption implies that the consumer is always running, and if you need to run software 24x7, running it on EC2 instances is likely to be far cheaper than running Lambda functions continuously.

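A purely illustrative way to frame that comparison; every constant below is a placeholder to be swapped for current regional pricing and real traffic figures, not a quoted AWS price:

    # Back-of-envelope monthly cost of an always-busy consumer.
    LAMBDA_PER_GB_SECOND = 0.0000167   # placeholder $/GB-second
    LAMBDA_PER_REQUEST = 0.0000002     # placeholder $/invocation
    EC2_PER_HOUR = 0.10                # placeholder $/hour for one mid-size instance

    HOURS_PER_MONTH = 730
    memory_gb = 2.5                    # Lambda memory size mentioned in the article
    invocations = 10_000_000           # hypothetical monthly invocation count
    avg_duration_s = 0.5               # hypothetical average duration per invocation

    lambda_cost = (invocations * avg_duration_s * memory_gb * LAMBDA_PER_GB_SECOND
                   + invocations * LAMBDA_PER_REQUEST)
    ec2_cost = EC2_PER_HOUR * HOURS_PER_MONTH

    print(f"Lambda: ~${lambda_cost:,.0f}/month vs EC2: ~${ec2_cost:,.0f}/month")

With these made-up numbers the always-on Lambda path comes out several times more expensive than a single instance, which is the trade-off the comment is pointing at.
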
djhworld · almost 7 years ago
We used to run an architecture similar to this a few years ago. I work for a broadcaster, and unfortunately it failed badly during a big event.

The Kinesis stream was adequately scaled, but the poller between Kinesis -> Lambda just couldn't cope. This was discovered after lots of support calls with AWS.

It might be better these days, I don't know; we moved to using Apache Flink + Apache Beam, which have a lot more features and allow us to do stuff like grouping by a window, aggregation, etc.

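For a sense of what that windowed grouping looks like, a minimal Apache Beam sketch; the per-key counts and the 60-second window size are arbitrary examples, not the poster's pipeline:

    import apache_beam as beam
    from apache_beam.transforms.window import FixedWindows, TimestampedValue

    with beam.Pipeline() as p:
        (p
         | beam.Create([("clicks", 1, 0.0), ("clicks", 1, 30.0), ("clicks", 1, 90.0)])
         # Attach event-time timestamps so windowing has something to group on.
         | beam.Map(lambda e: TimestampedValue((e[0], e[1]), e[2]))
         # Group events into fixed 60-second windows, then sum counts per key.
         | beam.WindowInto(FixedWindows(60))
         | beam.CombinePerKey(sum)
         | beam.Map(print))
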
sixdimensional · almost 7 years ago
I'm not being negative about this; it's a cool setup.

But just to be clear, the pattern (not the marketing terms) is change data capture (essentially tailing the database transaction log) into a message queue, with message/job processors that can take any action, including writing the messages to other databases.

Kinesis is SQLStream underneath, which is probably why the lifetime of messages is limited - it's not originally intended to be Kafka or a durable message queue.

EDIT: Note above, when SQLStream first came out it didn't seem intended as a long-term store. That was really early on, when I saw it at Strata. It looks like they made the storage engine pluggable and Kafka is an option too, so my statement above is likely incorrect.

Lambda is being used as a distributed message/job processor, much like any worker process consuming a queue would be scaled up.

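A schematic sketch of such a processor, to make the pattern concrete: change events captured from the transaction log arrive on the stream, and a worker replays them into another store. The event fields ("op", "table", "row") and the in-memory target are purely illustrative, not anything from the article:

    import base64
    import json
    from collections import defaultdict

    class InMemoryTarget:
        """Stand-in for the downstream datastore; a real processor would use a DB client."""
        def __init__(self):
            self.tables = defaultdict(dict)
        def upsert(self, table, row):
            self.tables[table][row["id"]] = row
        def delete(self, table, row_id):
            self.tables[table].pop(row_id, None)

    target = InMemoryTarget()

    def handler(event, context):
        """Kinesis-triggered worker: each record is one change event captured upstream."""
        for record in event["Records"]:
            change = json.loads(base64.b64decode(record["kinesis"]["data"]))
            op, table, row = change["op"], change["table"], change["row"]  # hypothetical schema
            if op in ("insert", "update"):
                target.upsert(table, row)
            elif op == "delete":
                target.delete(table, row["id"])
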
dmlittle · almost 7 years ago
> For us, increasing the memory for a Lambda from 128 megabytes to 2.5 gigabytes gave us a huge boost.

I thought the maximum memory limit for Lambda was 3008 MB and that you couldn't bump this limit through a service request.

Does anyone know if you can request a bump to the memory limit or the uncompressed deployment package limit (250 MB)?

mpd · almost 7 years ago
With so much overlap in the functionality and use cases of Kafka and Kinesis, it's not clear why they increase their surface area by using both.

Is Kinesis' write latency better than it was? IIRC it wrote to 3 data centers synchronously, which led to some pretty bad performance. This was almost 2 years ago, though.

sheeshkebab · almost 7 years ago
Copying a billion records screams for streaming? Do you mean like a couple of hundred gigs that could be copied overnight (and maybe transformed with some script)?

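The rough arithmetic behind that "couple of hundred gigs overnight" intuition, with a hypothetical record size and copy throughput:

    records = 1_000_000_000
    avg_record_bytes = 200       # hypothetical average row size
    throughput_mb_s = 50         # hypothetical sustained copy throughput

    total_gb = records * avg_record_bytes / 1e9
    hours = records * avg_record_bytes / (throughput_mb_s * 1e6) / 3600

    print(f"~{total_gb:.0f} GB, ~{hours:.1f} hours at {throughput_mb_s} MB/s")
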
abledon · almost 7 years ago
Anyone else reading this post with the trivago repetitive jingle on loop in the back of their mind? Din dan din dan dan —