Reflections on 10k Hours of DevOps

84 points by polyrand about 2 years ago

13 comments

dboreham about 2 years ago

Going to dispute this one:

> The value of a CI/CD Pipeline is inversely proportional to how long the pipeline takes to run.

Priority #1 is: test your product properly. This means "if it is supposed to do something, then you should have a test to check that it does that thing". Sometimes it just takes some time to put together and run such tests.

In my experience some people (usually, in my perception, younger and more impatient people) who have read somewhere that tests should run really quickly will argue that Priority #1 above doesn't matter, in the interest of assuaging their impatience.

Long ago we had product builds that took hours and tests that often ran overnight, and sometimes over days. And we shipped sh*t that worked!

justin_oaks about 2 years ago

> Reproducibility matters
> Immutable infrastructure removes a whole class of bugs.

Story time. Back around 2010 I was tasked with figuring out why a server application written in Java was running out of memory. It happened often enough that the application had to be restarted daily. I wasn't the one who maintained the application, nor the one maintaining the servers it ran on. I was just someone good at solving these kinds of problems.

The odd thing was that the problem only existed on one server in production, even though the other servers should have been running the exact same thing. I ran load tests on a staging server, also running the same code, and was unable to reproduce the problem.

With VisualVM and some heap dumps handy, I narrowed the problem down to database-related objects: objects from the Oracle JDBC driver were taking up the most memory. I tried checking what was different between the servers that would cause one to have the issue and not the others. Was one getting more load than the others? Were the processors different? Memory? Disk speed?

To make a long story short, I spent entirely too long debugging the problem and eventually found out that one of the servers had a different version of the Oracle JDBC driver than the others, and it seemed that particular version of the driver had a memory leak bug. It took me so long because I assumed that nobody would have been dumb enough to have different library files on different computers.

I could kill whoever put that file on one server and not the others! It must have been done manually. I never did find out who did it or why.

Immutable infrastructure would have prevented this problem, but this was back in 2010, when we ran our own servers and immutable infrastructure wasn't a possibility.

If we had had sufficiently reproducible deployments, we could have just redeployed the application and the problem would have gone away. Alas, the Oracle library wasn't part of the application code; it was part of the library code that was installed on the computer and never changed (or so I thought).

It's a godsend to be able to deploy containers that contain not only your application code but also your server software and its dependent libraries. Reproducibility is so much better that way.
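On servers that are not (yet) immutable, one way to catch the kind of drift described above is to fingerprint the shared library directory on every host and compare the results. The Python sketch below assumes the libraries are .jar files under a single directory; the function names and that layout are illustrative assumptions, not part of the original story.

```python
import hashlib
from pathlib import Path
from typing import Dict


def library_manifest(lib_dir: str) -> Dict[str, str]:
    """Map each .jar under lib_dir (as a relative path) to the SHA-256 of its bytes."""
    root = Path(lib_dir)
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*.jar"))
    }


def diff_manifests(reference: Dict[str, str], other: Dict[str, str]) -> Dict[str, str]:
    """Report files that are missing, extra, or different on `other` compared to `reference`."""
    problems = {}
    for name in sorted(reference.keys() | other.keys()):
        if name not in other:
            problems[name] = "missing"
        elif name not in reference:
            problems[name] = "extra"
        elif reference[name] != other[name]:
            problems[name] = "different"
    return problems
```

Collecting `library_manifest(...)` on each server and diffing each result against one reference host should flag a swapped-in driver as "different". Note that this only detects drift after the fact; rebuilding from an immutable image or container is what prevents it.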

ilovecaching about 2 years ago

The Makefile one is important. Make is still the best tool for managing a dependency graph, adding project commands, and building artifacts of any kind, whether that's building Docker images or binary objects, or spinning up integration environments. You can use make anywhere and everywhere; it's powerful but also simple on the surface, and battle tested.

For DevOps in general I have learned to KISS, always know how your tools work, and focus on observability and low-hanging automation. If you can't observe or understand how your systems work, you're screwed. You need metrics, logs, and statistics in one place that you can easily build queries against. You should be able to see everything from the fleet level down to the innards of each machine without trouble.
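The core of what make does with a dependency graph is simple to state: bring prerequisites up to date first (depth-first), then rerun a target's recipe only if the target is missing or older than one of its prerequisites. The Python sketch below models just that rule. It is not how GNU Make is implemented; the rule table, the `make` and `file_mtime` names, and the injectable `mtime` function are assumptions made so the logic can be shown (and tested) without touching the filesystem.

```python
import os
from typing import Callable, Dict, List, Optional, Set, Tuple

# target -> (prerequisites, recipe); the recipe is any zero-argument callable.
Rules = Dict[str, Tuple[List[str], Callable[[], None]]]


def file_mtime(path: str) -> Optional[float]:
    """Modification time of path, or None if the file does not exist."""
    return os.path.getmtime(path) if os.path.exists(path) else None


def make(target: str, rules: Rules,
         mtime: Callable[[str], Optional[float]] = file_mtime,
         in_progress: Optional[Set[str]] = None) -> None:
    """Bring `target` up to date, updating its prerequisites first."""
    if in_progress is None:
        in_progress = set()
    if target in in_progress:
        raise ValueError(f"circular dependency involving {target!r}")
    if target not in rules:
        # A plain source file: there is nothing to build, but it must exist.
        if mtime(target) is None:
            raise FileNotFoundError(f"no rule to make {target!r}")
        return
    prereqs, recipe = rules[target]
    in_progress.add(target)
    for dep in prereqs:
        make(dep, rules, mtime, in_progress)
    in_progress.discard(target)
    own = mtime(target)

    def newer(dep: str) -> bool:
        # A prerequisite with no file (e.g. a recipe that made nothing) counts as newer.
        dep_time = mtime(dep)
        return dep_time is None or dep_time > own

    # Stale if the target is missing or any prerequisite is newer than it.
    if own is None or any(newer(dep) for dep in prereqs):
        recipe()
```

Real make layers much more on top (pattern rules, .PHONY, parallel jobs, variables), but this timestamp comparison is what lets it drive any artifact that can be represented as a file.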

pydry about 2 years ago

> Code is better than YAML.

This is too broad a backlash against it. Where YAML is used in problem domains where the configuration complexity is strictly limited and "etched in stone", it works very well. Nobody tends to champion YAML in this use case because it becomes somewhat invisible: it Just Works and nobody complains, but nobody raves either.

In domains where you have to start templating YAML, it fails horribly, of course. This is YAML being used in a domain where it either never should have been used or where APIs of some kind should have been provided alongside it to accommodate dynamic use cases.

What "works" as configuration is often very hard to know up front, but the above attitude throws the baby out with the bathwater rather than seeking the correct balance. Turing-complete code is inherently more difficult to parse, understand, and modify cleanly than a config file, and much more susceptible to technical debt.

> Your Integration Tests are Too Long

This is also the wrong attitude, I think. Faster feedback loops are great, but an obsession with speed typically leads to lots of very quick tests that test very unrealistically, often missing entire classes of bugs.

I've worked on a bunch of projects whose test suites completed in under 10 seconds and never caught any bugs. Meanwhile, the team would end up leaning on manual QA to replace "slower" integration tests, and the feedback loop on that made 5-hour CI runs look entirely worthwhile.

Aperocky about 2 years ago

What is DevOps?

I've always thought DevOps was where one is both dev and ops: formally a dev, but also responsible for operations.

Apparently that is not the case. There is dev, and then there is DevOps. Apparently the ops part is more important for DevOps, and the dev work is geared toward the ops end rather than toward developing features.

I was confused because I design and develop features, I build ops tools, I build and deploy CI/CD pipelines, and I go on call for bugs that happen. I thought I was DevOps, but apparently this is not how DevOps is defined in the broader industry. And apparently everyone else has much stricter role separation.

0xbadcafebee about 2 years ago

I agree with most of his points, but disagree on some:

- The value of a CI/CD pipeline is the value of its output. It doesn't matter whether it took 5 seconds or 5 days. I have worked on pipelines that were shitty and long but delivered enormous value, and quick ones that were noisy and useless.
- The "code is better than YAML" post misunderstands what YAML is, and then says you should use a general-purpose programming language for configuration. But if you're programming, you're not configuring, you're programming. My point is, both code and YAML are completely wrong solutions to the problem. Look at the configuration file formats of 20+ year old programs as examples of good configuration. (Hint: none of them are either YAML or code.)
- "Release early, release often" only works for certain products/services. It's not a good idea to deploy 10x a day to an airplane in flight. (No, contrarian who's about to reply, it's not a good idea, just shut up.)
- Declarative configuration isn't a thing. It's just configuration. Non-declarative configuration isn't a thing either; that's just instructions. "Declarative" doesn't mean anything. It's a red herring people repeat because they heard someone describe it and it sounded smart.
- Do not 'set -o pipefail' by default. It will only waste your time when your scripts start failing because some command in the middle of a pipe returned non-zero while still outputting the result you wanted, and you spend 3 hours adding debugging to figure out where the error was coming from and writing extra error-handling code. Just check whether the result of the pipe looks accurate. Same result, fewer failures.
- If you have to do a simple task more than 3 times, check whether (T*N)>A, or whether it's a value-chain bottleneck. If neither is true, don't automate yet.
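Two of the points above can be pinned down concretely. Bash documents that a pipeline's status is normally the exit status of its last command, while with pipefail it is the status of the rightmost command that exited non-zero (or zero if all succeeded); that is how a harmless SIGPIPE in an upstream command can fail a script. The (T*N)>A test is not defined in the comment, so the sketch below assumes T is the time one manual run takes, N the number of times you expect to do it, and A the time it would take to automate. Both functions are illustrative Python models, not part of any tool.

```python
from typing import List


def pipeline_status(exit_codes: List[int], pipefail: bool = False) -> int:
    """Exit status Bash reports for a pipeline, given each command's code (left to right).

    Without pipefail only the last command counts. With pipefail the status is that
    of the rightmost command that failed, or 0 if every command succeeded.
    """
    if not pipefail:
        return exit_codes[-1]
    for code in reversed(exit_codes):
        if code != 0:
            return code
    return 0


def worth_automating(t_manual: float, n_runs: int, automation_cost: float,
                     is_bottleneck: bool = False) -> bool:
    """(T*N) > A rule of thumb; T, N and A are interpreted as assumed above.

    t_manual and automation_cost must use the same unit (e.g. minutes).
    """
    return t_manual * n_runs > automation_cost or is_bottleneck


# `grep ... | head -n 1`: head exits after one line, so grep may die of SIGPIPE (128 + 13).
print(pipeline_status([141, 0]))                 # 0: the script carries on
print(pipeline_status([141, 0], pipefail=True))  # 141: the script stops under set -e
print(worth_automating(t_manual=10, n_runs=5, automation_cost=120))  # False: 50 < 120
```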

theden about 2 years ago

> Makefiles are unreasonably effective.

I'm torn. I like Makefiles, but I've seen a lot of unreadable Makefiles with lots of interpolation and ugly escaping, including cases where what should have been a proper shell script (his point 25) was crammed into a Makefile. I think Makefiles should be dead simple, so you can see at a glance what a target does.

Overall I agree, but IMO people ought to drop the 10k hours Gladwell term; to be frank, it's bullshit.

justin_oaks about 2 years ago

> Linear git history makes rollbacks easier.

Not only that, but it makes the history easier to read. I haven't yet been convinced of the benefit of merging pull requests instead of rebasing them. The history ends up so convoluted.

lta about 2 years ago

I'm likely to share this doc with a few of my customers. I think it's well summarized and reasonably balanced. I very much like that it explains well the trade-offs I tend to make naturally and sometimes don't have the energy/patience to explain.

I'd be a bit harsher on lock-ins, which are rarely worth the loss of control in the long run, but otherwise <3

d_sem about 2 years ago

I think the comment "The value of a CI/CD Pipeline is inversely proportional to how long the pipeline takes to run." is a good starting position, which is then supplanted by expertise in what actually adds value in a given project's workflow.

brodouevencode about 2 years ago

> Immutable infrastructure removes a whole class of bugs.

I'd like to see some examples of this.

esjeon about 2 years ago

This is gold. Really. Well balanced and very detailed.

dudefaux about 2 years ago

Really good read. Well balanced. // someone in faang
