TechEcho

15 comments

nrmitchi over 5 years ago

If I'm reading this right, this approach takes away any real safety in terms of deployment. There would be no easy rollback mechanism, and no real assurance that the new code version will actually run.

I understand that the main goal here was to avoid the time spent in ECS rollouts, but this solution seems to sacrifice many of the guarantees that the rollout process is designed to provide.

The root problem is explicitly called out (slow ECS deployments) and is tied to rate limiting of the ECS `start-task` API call. The post mentions the hard cap on the number of tasks per call, but I'm curious whether the actual _rate limit_ could have been increased on the AWS side. That is, 400 calls would still be needed, but they could be pushed through much faster.
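
To make the "same 400 calls, just faster" point concrete: if the API limit behaves like a token bucket (an assumption; the post only says the call is rate-limited), the wall-clock floor for issuing the calls is set by the refill rate once the initial burst is spent. A minimal sketch; the burst of 20 and the rates of 1 and 10 calls per second are hypothetical, not AWS's real limits.

```typescript
// Lower bound on the seconds needed to issue `calls` requests through a
// token-bucket limiter that starts full. The bucket model is an assumption;
// the post does not say how ECS actually limits start-task.
function minSecondsForCalls(
  calls: number,
  burst: number, // tokens available immediately
  ratePerSec: number, // tokens refilled per second
): number {
  return Math.max(0, calls - burst) / ratePerSec;
}

// Hypothetical limits: raising the refill rate 10x cuts the floor 10x.
console.log(minSecondsForCalls(400, 20, 1)); // 380
console.log(minSecondsForCalls(400, 20, 10)); // 38
```

The call count stays at 400 in both cases; only the refill rate changes, which is exactly the lever the comment asks about.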

benologist over 5 years ago

Whenever I see these posts I feel like Heroku narrowly missed out on shaping the rest of the cloud just by staying proprietary and expensive.

marcinzm over 5 years ago

My read seems to be: don't use ECS at large scale or you'll need some really convoluted hacks.

testuser5559191 over 5 years ago

Slightly off topic: Does Plaid still operate via screen scraping? I'm a little perplexed as to why banks don't have easy-to-use APIs, especially given recent regulation. It seems against their best interests to allow a third party to screen scrape and provide a service that the banks themselves could easily reproduce.

What am I missing? Is offering an easy-to-use API not a sound business decision from the bank's perspective?

I know Monzo (a challenger bank in the UK) has/had an API, though I haven't heard of anyone using it.

sailfast over 5 years ago

Thanks for sharing these lessons! I don't use ECS at the moment, but this is a well-laid-out post on how to avoid some performance issues that could have a huge impact.

EDIT: Downvoted for expressing appreciation for someone taking the time to note lessons learned?.. OK.

fcolas over 5 years ago

- How did you guys scale that much without a bootloader before?

That's what I don't get. All the design patterns are those of Unix. You boot the kernel with a ... bootloader. Then you have the kernel with all the system's parameters (call it ECS). Each process is a child of the root process. When you learn, by whatever means, that your app's source code has changed, you pull that code and start running it while keeping the old version live. Once the fork running the new code returns a proper response code, you kill the old one and set the new app live; otherwise you stay live with the old version.
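
The swap described above (start the new version beside the old one, check it, then either retire the old one or discard the new one) can be sketched as follows. `Instance`, `startInstance`, and `isHealthy` are hypothetical placeholders for whatever actually launches a worker and probes it; a real supervisor would do this asynchronously.

```typescript
// Hypothetical handle to one running copy of the app.
interface Instance {
  readonly version: string;
  stop(): void;
}

// Start `nextVersion` beside `current` and return whichever should serve traffic.
function rollForward(
  current: Instance,
  nextVersion: string,
  startInstance: (version: string) => Instance,
  isHealthy: (candidate: Instance) => boolean,
): Instance {
  const candidate = startInstance(nextVersion); // the old version keeps serving meanwhile
  if (isHealthy(candidate)) {
    current.stop(); // new code answered correctly: retire the old version
    return candidate;
  }
  candidate.stop(); // new code failed its check: stay on the old version
  return current;
}
```

The design choice is that the old instance is only stopped after the new one passes its check, which is what preserves the rollback path the first comment worries about.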

swiftcoder over 5 years ago

> Engineers would spend at least 30 minutes building, deploying, and monitoring their changes through multiple staging and production environments, which consumed a lot of valuable engineering time

Man, startups have no idea how good they have it. It took a solid week to deploy a change at AWS.

maerF0x0 over 5 years ago

> The rate at which we can start tasks restricts the parallelism of our deploy. Despite us setting the MaximumPercent parameter to 200%, the ECS start-task API call has a hard limit of 10 tasks per call, and it is rate-limited. We need to call it 400 times to place all our containers in production.

From reading other comments, it makes me wonder whether you (Plaid) tried batching N containers into each task. For example, if a task held 50 containers, you'd need 50x fewer rate-limited start-task calls...
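
The arithmetic behind that suggestion, as a small sketch. The 10-tasks-per-call cap comes from the quoted post; the 4,000-container fleet is a hypothetical figure consistent with about 400 full calls, and packing 50 containers into one task assumes ECS would accept a task that large, which would need checking against its quotas.

```typescript
// Cap quoted in the post: start-task places at most 10 tasks per call.
const MAX_TASKS_PER_CALL = 10;

// start-task calls needed to place `containers` containers when each task
// bundles `containersPerTask` of them.
function startTaskCalls(containers: number, containersPerTask: number): number {
  const tasks = Math.ceil(containers / containersPerTask);
  return Math.ceil(tasks / MAX_TASKS_PER_CALL);
}

console.log(startTaskCalls(4000, 1)); // 400
console.log(startTaskCalls(4000, 50)); // 8, i.e. 50x fewer calls
```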

crb002 over 5 years ago

Google "checkpoint restart". The HPC community has had these tools for years, many of them in userspace. Can't wait to see a Java or C# shop doing the same hot boots.

bsaul over 5 years ago

Side question: what's the current best practice for ensuring that a server (Node or anything else) isn't processing some important information before you shut it down?

Is it a mix of waiting for request handlers to finish upon receiving a SIGTERM, then ending the current process (and timing out after a while)? Does Kubernetes handle this kind of thing (waiting for a given process to stop before trashing the VM), or is there another layer or tool for that?
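
One common answer is the pattern the question already outlines: on SIGTERM, stop accepting new work, wait for in-flight requests to finish, and give up after a deadline. Kubernetes applies the same contract from the outside: it sends SIGTERM to the container's main process and, if it is still running after the pod's termination grace period (30 seconds by default), sends SIGKILL. Below is a minimal Node sketch of the in-process side; `InFlightTracker` and `installGracefulShutdown` are hypothetical names, and the drain timeout should stay below the orchestrator's grace period.

```typescript
import * as http from "node:http";

type DrainResult = "drained" | "timeout";

// Counts requests that have started but not finished, so shutdown can wait for zero.
class InFlightTracker {
  private inFlight = 0;
  private waiters: Array<() => void> = [];

  get count(): number {
    return this.inFlight;
  }

  begin(): void {
    this.inFlight++;
  }

  end(): void {
    this.inFlight = Math.max(0, this.inFlight - 1);
    if (this.inFlight === 0) {
      for (const wake of this.waiters.splice(0)) wake();
    }
  }

  // Resolves "drained" once nothing is in flight, or "timeout" after timeoutMs.
  drain(timeoutMs: number): Promise<DrainResult> {
    if (this.inFlight === 0) return Promise.resolve<DrainResult>("drained");
    return new Promise<DrainResult>((resolve) => {
      const timer = setTimeout(() => resolve("timeout"), timeoutMs);
      this.waiters.push(() => {
        clearTimeout(timer);
        resolve("drained");
      });
    });
  }
}

// Track every request on `server` and exit once drained (or timed out) after SIGTERM.
function installGracefulShutdown(server: http.Server, tracker: InFlightTracker, timeoutMs: number): void {
  server.on("request", (_req: http.IncomingMessage, res: http.ServerResponse) => {
    tracker.begin();
    res.on("close", () => tracker.end()); // fires on completion or on a dropped connection
  });
  process.once("SIGTERM", () => {
    server.close(); // stop accepting new connections
    void tracker.drain(timeoutMs).then((outcome) => {
      process.exit(outcome === "drained" ? 0 : 1);
    });
  });
}
```

Note that `server.close()` only stops new connections; idle keep-alive connections may need separate handling depending on the Node version.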

cagataygurturk over 5 years ago

Going to EKS would take less time than exploring hacks.

evantahler over 5 years ago

Pretty cool! Actionhero uses the ‘require cache’ trick in development mode to hot-reload your changes as you go. It's risky in that even though you've changed the required file, you may not have recreated all your objects. For that reason, Actionhero doesn't allow this if NodeEnv is anything besides development.
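
For readers unfamiliar with the trick: Node keeps every loaded CommonJS module in `require.cache`, so deleting an entry forces the next `require()` to re-read the file from disk. A minimal sketch; `reloadModule` and `demo` are hypothetical helpers, not Actionhero's code. The demo also shows the caveat above: anything still holding the old exports object keeps seeing the old values.

```typescript
import { createRequire } from "node:module";
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// A CommonJS require() that works whether this file is compiled to CJS or ESM.
const req = createRequire(path.join(process.cwd(), "index.js"));

// Drop a module from the cache and load it again from disk.
function reloadModule<T>(modulePath: string): T {
  const resolved = req.resolve(modulePath);
  delete req.cache[resolved];
  return req(resolved) as T;
}

// Write a module, load it, change it on disk, then reload it.
function demo(): { before: number; cached: number; reloaded: number } {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "hot-"));
  const file = path.join(dir, "config.js");
  fs.writeFileSync(file, "module.exports = { value: 1 };");
  const first = req(file) as { value: number };

  fs.writeFileSync(file, "module.exports = { value: 2 };");
  const cached = req(file) as { value: number }; // still served from the cache
  const fresh = reloadModule<{ value: number }>(file);

  // `first` still points at the old exports object: the risk described above.
  return { before: first.value, cached: cached.value, reloaded: fresh.value };
}

console.log(demo()); // { before: 1, cached: 1, reloaded: 2 }
```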

evantahler over 5 years ago

Cool! I'm curious whether this is something that nodemon/pm2 could do as task runners. You could call “npm update” and then HUP your process... This is sort of how Capistrano handled deployments: changing a symlink to all project deps and then signaling the process to reload.
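
The Capistrano-style flow (unpack each release into its own directory, repoint a `current` symlink, then signal the running process) might look like this in Node. `activateRelease` and `signalReload` are hypothetical helpers; the target process must install its own SIGHUP handler that reloads from `current`, because SIGHUP's default action terminates the process.

```typescript
import * as fs from "node:fs";

// Point `currentLink` at `releaseDir` with no window where the link is missing:
// build a temporary symlink, then rename() it over the old one. On POSIX
// filesystems the rename replaces the destination atomically.
function activateRelease(currentLink: string, releaseDir: string): void {
  const tmpLink = `${currentLink}.tmp-${process.pid}`;
  fs.rmSync(tmpLink, { force: true }); // clear leftovers from an interrupted deploy
  fs.symlinkSync(releaseDir, tmpLink);
  fs.renameSync(tmpLink, currentLink);
}

// Ask the running server to reload. It must handle SIGHUP itself
// (e.g. via process.on("SIGHUP", ...)); otherwise SIGHUP simply kills it.
function signalReload(pid: number): void {
  process.kill(pid, "SIGHUP");
}
```

Keeping old release directories around also gives a cheap rollback: point `current` back at the previous release and signal again.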

shay_ker over 5 years ago

After all these years, how is deploying solely on AWS still worse than Heroku & Render?

mylampisawesome over 5 years ago

Just FYI, your "We're Hiring!" link is broken.

How we reduced deployment times by 95%
