I am working on a project. I have broken down the requirements into the following workloads:<p>- Data collection: a batch service that receives a large amount of REST data (expecting 20k requests over a few hours) when triggered. The trigger fires once or twice a day.<p>- Pricing algorithm: this service takes the above data as input and prices it. I'm expecting 100 req/sec at the initial stage.<p>- Transaction: when a customer buys a new product, a webhook is received from the payment provider and the database is updated.<p>- Admin: a back-office service that is updated less often. Here an admin can set application parameters and get insights about the customers.<p>My current plan for the infrastructure is the following:
- application servers to host the above 4 Node.js services using PM2
- routing using Nginx
- ELB to loadbalance the app servers
- ElastiCache Redis to queue the batch data
- RDS DB for all services (both read and write)
- GitHub Actions for CI/CD
- Monitoring using Datadog<p>Will the above infrastructure be able to handle my requirements? Can it be improved?<p>My other concern is: is there a way to reliably retry/reconnect if the data sync fails on the 1st service?<p>PS: I have a time constraint too, so my team can't be trained on K8s for now.<p>Thank you all in advance.
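For the reconnect concern, one minimal sketch of what the collection service could do around each sync call: a retry wrapper with exponential backoff. The helper name, attempt counts, and delays below are illustrative assumptions, not anything from your stack:

```javascript
// Sketch: retry an async task with exponential backoff between attempts.
// withRetry, attempts and baseDelayMs are illustrative names/values.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry(task, { attempts = 5, baseDelayMs = 1000 } = {}) {
  for (let attempt = 1; attempt <= attempts; attempt += 1) {
    try {
      return await task();
    } catch (err) {
      if (attempt === attempts) throw err; // out of retries: surface the error
      // Backoff grows 1x, 2x, 4x, ... so a struggling upstream gets breathing room.
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}
```

In practice, if the batch data already goes through Redis, a queue library such as BullMQ gives you the same behaviour declaratively via per-job `attempts` and `backoff` options, and failed jobs stay in a failed set you can inspect and re-run instead of being lost.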
<i>Will the above infrastructure be able to handle my requirements? Can it be improved?</i><p>Not necessarily an improvement, but a different perspective on the compute section. I assume you run the services in containers. Instead of managing EC2 instances yourself, you can run the workloads as ECS Services with the Fargate Spot capacity provider and autoscale based on the number of requests per target group, CPU load, or time intervals. GitHub Actions can continuously deploy such services after you have <i>terraformed</i> the infrastructure.
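To make that concrete, here is a hedged Terraform sketch of one such service (say, pricing) on Fargate Spot with request-based target tracking. All names, counts, and the target value are placeholders, and the cluster, task definition, and ALB resources are assumed to exist elsewhere in your config:

```hcl
resource "aws_ecs_service" "pricing" {
  name            = "pricing"                         # placeholder name
  cluster         = aws_ecs_cluster.main.id           # assumed to exist
  task_definition = aws_ecs_task_definition.pricing.arn
  desired_count   = 2

  capacity_provider_strategy {
    capacity_provider = "FARGATE_SPOT"
    weight            = 1
  }
}

resource "aws_appautoscaling_target" "pricing" {
  service_namespace  = "ecs"
  resource_id        = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.pricing.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  min_capacity       = 1
  max_capacity       = 10
}

resource "aws_appautoscaling_policy" "pricing_requests" {
  name               = "requests-per-target"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.pricing.resource_id
  scalable_dimension = aws_appautoscaling_target.pricing.scalable_dimension
  service_namespace  = aws_appautoscaling_target.pricing.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ALBRequestCountPerTarget"
      # Must be your ALB/target-group identifier suffix; placeholder here.
      resource_label = "app/my-alb/xxxx/targetgroup/pricing/yyyy"
    }
    target_value = 1000 # requests per target per minute; tune for your load
  }
}
```

With 100 req/sec at launch, target tracking on `ALBRequestCountPerTarget` lets the pricing service scale out without anyone touching instances, and Fargate Spot keeps the cost down (with the caveat that Spot tasks can be interrupted, so keep them stateless).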