General guidance when working as a cloud engineer

227 points by lockedinspace over 2 years ago

21 comments

nielsole over 2 years ago

Another random selection:
* When choosing internal names and identifiers (e.g. DNS), do not include the team's org hierarchy. Chances are the next reorg will arrive before the identifier is retired, and renaming is often hard.
* Industry-leading tools will contain bugs. From the Linux kernel to deploy tooling, there are bugs everywhere. Part of your job is to identify and work around them until upstream patches make it to you, if ever.
* Maintaining a patched fork is usually more expensive than setting up a workaround.
* Your hyperscaler cloud provider has plenty of scalability limitations, some of which are not documented. If you want to do something out of the ordinary, make sure to check with your account rep before wasting engineering time.
* Bought SaaS will break production in the middle of the night. Your own team will have the best context and motivation to fix or work around the breakage. When choosing a vendor, include visibility into their internal monitoring as a factor for disaster recovery (exported metrics and logs of their control plane, for example).

throwawaaarrgh over 2 years ago

Truth is an interesting concept. It's often subjective and has many forms. Within the context of the cloud, almost all cloud services are only mutable, so "truth" is whatever the current state of the cloud actually is. Whatever is in Git is merely idealism.

Whatever you are maintaining, read the docs completely first. And I mean cover to cover. Not just the one chapter you need to get a PoC up and running. You will wish you had later, and it will come in handy many times over your career. Consider it an investment in your future.

Read books on microservices before you implement them. Whatever two-line quip you read on a blog will not be as good as reading several whole books from experts.

Docker multi-stage builds won't work in some circumstances. Build optimization gets more complex the more you rely on your builds being "advanced".
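
To make the "Git is idealism, the cloud is truth" point concrete, here is a minimal sketch of a drift check, assuming both the declared state and the live state can be flattened into plain dictionaries. The function name `find_drift` and the sample settings are hypothetical and not tied to any provider's API.

```python
from typing import Any


def find_drift(desired: dict[str, Any], actual: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {key: (desired_value, actual_value)} for every setting that differs.

    A key absent on one side shows up as None, so an explicit None and a
    missing key are treated alike in this sketch.
    """
    drift = {}
    for key in sorted(desired.keys() | actual.keys()):
        want = desired.get(key)
        have = actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical example data: what the repository declares vs. what is running.
desired_state = {"instance_type": "m5.large", "min_replicas": 2, "public": False}
actual_state = {"instance_type": "m5.xlarge", "min_replicas": 2, "debug_port": 9229}

for key, (want, have) in find_drift(desired_state, actual_state).items():
    print(f"{key}: git says {want!r}, cloud says {have!r}")
```

Run on a schedule, a check like this at least tells you when the repository and reality have diverged, whichever of the two you decide to treat as authoritative.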

kator over 2 years ago

Don't forget "pets vs cattle", thinking of servers as ephemeral and working towards quickly being able to scale up/down based on demand. So often I see people "lift and shift" from a dedicated server model into the cloud and never convert their pets into cattle. This reduces flexibility later, not to mention makes it harder to respond to patching needs, scaling, and moving to optimize latency or costs.

birdymcbird over 2 years ago

> A good monitoring system, well-organized repository, fault-tolerance workloads and automation mechanisms are the basis of any architecture.

Monitoring/alarming, and knowing what to monitor. Also, properly instrument your services or whatever it is you have. Take time to reflect on which signals tell you about operational health. An error metric alone is useless if you don't know the denominator. Also be careful to avoid adding noisy metrics that cause panic for no reason.

I'm not sure what fault tolerance means in this context. Very handwavy statement. I think if you have dependencies, you should have a plan and an understanding of which ones will bring down your service when they tip over, and of how you can build resiliency. For example, some feature on your page requires talking to a recommendations service. If the service goes down, can you fall back to a generic list of hard-coded recommendations or some static asset?

As for automation: yeah, have test workflows built into your CI/CD harness. And avoid manual steps there that require human intervention. Use canaries to test that certain functions are up and running as expected, etc.
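
Two of the points above lend themselves to a short sketch: an error count only means something relative to its denominator, and a dependency failure can degrade to a static default. The helper names, the hard-coded list, and the simulated outage below are all assumptions for illustration.

```python
from typing import Callable, Optional

# Hypothetical static fallback, shipped with the service itself.
DEFAULT_RECOMMENDATIONS = ["bestseller-1", "bestseller-2", "bestseller-3"]


def error_rate(errors: int, total_requests: int) -> Optional[float]:
    """Errors as a fraction of all requests; None when there was no traffic."""
    if total_requests == 0:
        return None
    return errors / total_requests


def recommendations_with_fallback(fetch: Callable[[], list[str]]) -> list[str]:
    """Call the dependency; on any failure or empty result, serve the static list."""
    try:
        result = fetch()
    except Exception:
        # In a real service you would also log and count this failure.
        return list(DEFAULT_RECOMMENDATIONS)
    return result or list(DEFAULT_RECOMMENDATIONS)


def broken_service() -> list[str]:
    # Stand-in for a remote call that fails because the dependency is down.
    raise ConnectionError("recommendations service is down")


print(error_rate(5, 10))        # 5 errors in 10 requests
print(error_rate(5, 100_000))   # the same 5 errors in 100,000 requests
print(recommendations_with_fallback(broken_service))
```

The same five errors are an outage in the first case and background noise in the second, which is why the alert should be on the rate rather than the raw count.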

TrackerFF over 2 years ago

"Learn to say: I do not know about this/that. You cannot know everything that gets presented to you. The bad habit comes when the same technological asset appears for a second time and you still do not know how it works or what it does."

Absolutely. I've seen so many junior engineers / devs go about it like this:

Someone higher up: Could you please look at this problem? I need it fixed ASAP.

Jr. Engineer, presented with a problem he's never seen before: No problem, I will look into it!

Someone higher up (the next day): Did you fix the problem?

Jr. Engineer: Sorry, I still haven't gotten around to looking at it / I'm still working on it / etc.

Someone higher up: We really need it fixed today, please prioritize it and give me a call when it is fixed.

Jr. Engineer works on the problem all night, feeling stressed out, not wanting to let down his seniors.

WolfOliver over 2 years ago

"Microservices should only perform a single task." -> I guess this advice is the reason they are so widely misunderstood, see: https://linkedrecords.com/challenging-the-single-responsibility-principle-9800f39c186f

pondidum over 2 years ago

> Do not make production changes on Fridays

I ~hate~ dislike this advice. If you can't deploy on a Friday, you need to fix your deployment strategy. By removing Friday from the days you can deploy, you're wasting 1/5 of your available days.

Note: deploy != release [1]. Use flags, canaries, etc.

[1]: https://andydote.co.uk/2022/11/02/deploy-doesnt-mean-release/

Edit: hate is far too strong a word for this.
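
The "deploy != release" distinction can be sketched with a percentage-based feature flag: the new code path is deployed to production but only released to the share of users the flag allows. The flag name, the hashing scheme, and the `checkout` function below are assumptions for illustration, not any particular flag service's API.

```python
import hashlib


def bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag_name: str, user_id: str, rollout_percent: int) -> bool:
    """True when the user falls inside the released share of traffic."""
    return bucket(flag_name, user_id) < rollout_percent


def checkout(user_id: str, rollout_percent: int) -> str:
    # Both code paths are deployed; the flag decides which one is released.
    if is_enabled("new-checkout", user_id, rollout_percent):
        return "new checkout flow"
    return "old checkout flow"


print(checkout("user-42", rollout_percent=0))    # deployed, not released
print(checkout("user-42", rollout_percent=100))  # released to everyone
```

Because the bucket is derived from a hash rather than a random draw, a given user stays on the same side of the flag as the percentage is raised, and setting it back to 0 acts as the rollback.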

abledon over 2 years ago

> If you need to build an architecture which involves microservices, I am sure that your cloud provider has a solution that fits better than Kubernetes. E.g: ECS for AWS.

Thank you! So many people running unnecessary things on Kubernetes.

raydiatian over 2 years ago

> If you need to build an architecture which involves microservices, I am sure that your cloud provider has a solution that fits better than Kubernetes. E.g: ECS for AWS. Kubernetes is a fantastic toolkit, but only shines when all that it has to offer, gets used.

As far as FaaS goes, I think more people need to go check out Cloud Run as a Knative implementation. Having used it for some time now, it feels like a near-perfect FaaS solution. The only gripe I have is that versioning is a bit dopey. But hey, if I can have autoscaling services with absolute impunity over how my HTTP interface is shaped (looking at you, AWS Lambda) and without needing to worry about Kubernetes headaches, I'm perfectly happy to embed version names in service domains.

elric over 2 years ago

> Certify yourself with official courses.

Can anyone recommend some certifications that are worthwhile? I realize that this is a very broad ask, but the advice is also rather broad.

zikduruqe over 2 years ago

EVERYTHING costs money. Tag every resource. Come up with ways to show cost avoidance and cost savings. This will be appreciated more by management than any code you can bang out.
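
A rough sketch of what "tag every resource" can look like as an automated check, assuming you can export your inventory as a list of dictionaries with an `id` and a `tags` mapping. The required tag names and the sample inventory are hypothetical.

```python
# Hypothetical tagging policy; pick whatever keys your cost reports group by.
REQUIRED_TAGS = {"owner", "cost-center", "environment"}


def missing_tags(resource: dict) -> set[str]:
    """Required tags that are absent or empty on a single resource."""
    tags = resource.get("tags", {})
    return {name for name in REQUIRED_TAGS if not tags.get(name)}


def audit(resources: list[dict]) -> dict[str, list[str]]:
    """Map resource id -> sorted list of missing tags, for offenders only."""
    report = {}
    for resource in resources:
        missing = missing_tags(resource)
        if missing:
            report[resource["id"]] = sorted(missing)
    return report


# Hypothetical inventory export.
inventory = [
    {"id": "vm-1", "tags": {"owner": "team-a", "cost-center": "1234", "environment": "prod"}},
    {"id": "bucket-7", "tags": {"owner": "team-b"}},
    {"id": "db-3"},
]

for resource_id, missing in audit(inventory).items():
    print(f"{resource_id} is missing: {', '.join(missing)}")
```

Anything that shows up in this report is spend that nobody can attribute, which is usually the first number management asks about.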

rr808 over 2 years ago

I love monitoring, but after a few decades of working I still haven't found a good way to monitor everything. It's still a mix of email, PagerDuty, Prometheus, CloudWatch, websites, and Kibana consoles. Surely there is a good way to do this? I figure some of the new BI dashboards would be good, but I haven't seen much usage.

nijave over 2 years ago

> Before jumping straight into a new technology, read and understand their docs

The number of issues I've seen that turn out to be documented features... (or, more accurately, things just being configured incorrectly)

virgilp over 2 years ago

> Microservices should only perform a single task. If you are not able to achieve that isolation, maybe you should switch back to a monolithic architecture. Do not get fooled by the current trends, microservices are not meant for everything.

I feel like this is spectacularly bad advice. "Do not get fooled by shades of grey, things are meant to be either black or white!"

mustafabisic1 over 2 years ago

Some solid career advice in there as well.

I feel like this could be used as one of those "How to 10x career" articles, and be better than all of them.

myfirstproject over 2 years ago

> Git should be your only source of truth. Discard any local files or changes, what's not pushed into the repository, does not exist.

Completely agree with that.

bobismyuncle over 2 years ago

Some of these are lessons you only really learn once you make the mistake yourself.

lockedinspace over 2 years ago

A helpful list of things to have in mind when working with anything tech related.

raxits over 2 years ago

One more: have a good logging & rollback strategy, well communicated across stakeholders.

martynvandijke over 2 years ago

Nice guide. Just curious, are there more of these guides?

qaq over 2 years ago

Don't just read docs, try things: make a POC. The number of times we hit something that "should work" according to the docs but doesn't is very high.