What Is ‘Site Reliability Engineering’?

186 points by peterkshultz about 8 years ago

20 comments

StreamBright about 8 years ago
One thing I was not aware of prior to working as an SRE is how much they rely on statistics. The idea that you can use stats to determine what is a normal or abnormal level for a particular metric (packet loss, for example) has been pretty useful throughout my career.
A quick example: I was called into a meeting at a company where I worked in a non-SRE role, and the team explained that they could not identify what was wrong, but their cluster was misbehaving and one node kept getting kicked out of it. I pulled up a console and started comparing OS-level metrics across the cluster. The sysadmin team thought I was stupid because they had explicitly told me which node was in trouble. By the third metric I checked, I found that the node in question had 5000x more packet loss than the second worst in the cluster. It turned out to be a faulty NIC. The sysadmin team had been checking all of the metrics on the broken node but never compared them to the healthy ones.
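A minimal sketch of that kind of cross-node comparison, assuming you can already pull one number per node for a given metric. The node names, the counter values, the `find_outliers` helper, and the threshold of 10 are all made up for illustration; the comment does not say which tool or statistic was actually used. This version uses the median absolute deviation, one reasonable choice because a single broken node cannot drag the baseline the way it would drag a mean.

```python
from statistics import median


def find_outliers(metric_by_node, threshold=10.0):
    """Return the nodes whose metric sits far from the cluster median.

    Distance is measured in units of the median absolute deviation (MAD).
    """
    values = list(metric_by_node.values())
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # Most nodes report the same value, so anything different stands out.
        return [node for node, v in metric_by_node.items() if v != center]
    return [
        node
        for node, v in metric_by_node.items()
        if abs(v - center) / mad > threshold
    ]


# Hypothetical packet-loss counters per node (illustrative numbers only).
packet_loss = {"node-a": 2, "node-b": 3, "node-c": 1, "node-d": 4, "node-e": 20000}
print(find_outliers(packet_loss))
```

The point is the same one the comment makes: the broken node only looks broken once its numbers sit next to the healthy ones.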
SwellJoe about 8 years ago
I have a theory that Google invented the SRE because they didn't know how to hire system administrators, but they knew how to hire software engineers. So, they just hired software engineers and told them to figure out the systems.
I say this only partially in jest.
nailer about 8 years ago
> We care deeply about keeping SRE an engineering function, so our rule of thumb is that an SRE team must spend at least 50% of its time doing development.
Prior to SRE, any good system administrator was already doing this: "if it's worth doing, it's worth automating". But there was another half who were cutting and pasting shit from Word files into Solaris boxes. The sysadmin -> SRE shift seems to have cleaned out the chaff.
imesh about 8 years ago
I work at a web host, and my SRE title means being a developer who gets constantly interrupted by alerts and customer chats.
atsaloli about 8 years ago
https://www.usenix.org/conference/lisa16/conference-program/presentation/closing-plenary is a video of Niall Murphy's excellent presentation with Todd Underwood on how smaller organizations can implement SRE basics (USENIX LISA, Boston, December 2016). I had the privilege of attending it.
raz32dust about 8 years ago
With more automation and containerization, I see the SRE role and the dev role coming together, eventually merging into "devops". Today, these roles are separate because they require slightly different skill sets, and maintaining production systems takes up about as much time as developing new features. As operating production gets easier and easier, dev will be the ops, even in big companies.
NickNameNick about 8 years ago
IEEE Software Engineering Radio did a good episode on site reliability engineering.
http://www.se-radio.net/2016/12/se-radio-episode-276-bjorn-rabenstein-on-site-reliability-engineering/
NotQuantum about 8 years ago
I've fallen in love with the SRE field. I'm a Computer Engineering senior currently. I'm used to two kinds of classes: CS ones, where you learn a lot of theory and apply it on a test, and CprE ones, where you also learn theory but then have to make it work in labs. I've always liked lab-based classes where you have to take a concept to fruition.
I've been interested in all aspects of CprE, and I taught myself how to run a Linux box along with DNS, VPN, and other services. Last year around this time, I was contacted by a recruiter for an SRE internship. At the time, I had no idea what SRE was and I thought it was just a glorified IT job. Boy, was I wrong.
I got through a few interviews and got the position for the summer. About a week or two into the internship I fell in love. This job was all about designing and implementing systems that have to be resilient and must scale. The idea of building automation to make my job easier was and is great. It was just like the labs I enjoyed in college.
Fast forward to now, and I'm accepting a full-time SRE position at the same company. I couldn't be happier with my choice of specialization. The need for resilient, distributed systems will only grow in the coming years, and I'm looking forward to being an SRE.
zatkin about 8 years ago
> We've held that hiring bar constant through the years, even at times when it's been very hard to find people, and there's been a lot of pressure to relax that bar in order to increase hiring volume. We've never changed our standards in this respect. That has, I think, been incredibly important for the group. Because what you end up with is, a team of people who fundamentally will not accept doing things over and over by hand, but also a team that has a lot of the same academic and intellectual background as the rest of the development organization. This ensures that mutual respect and mutual vocabulary pertains between SRE and SWE.
It seems like changing their hiring process is a double-edged sword. If they change it to allow more hiring volume, then existing employees might become frustrated with how easy it has become to get a job at Google. On the other hand, keeping an old hiring process where false negatives continue to occur seems very bad.
robhirschfeld about 8 years ago
SRE is a job function. By design, it's intended to be equivalent in pay and status to developers (SWE), to overcome the bias against operators and sysadmins in organizations. This is an important recognition because cloud-first operations requires a lot of automation and coding expertise that previous operations roles did not demand.
DevOps is really a process definition with Lean system thinking and code workflow priorities. Many people will tell you that it is NOT a job function but a culture or approach. DevOps for developers generally means CI/CD pipelines and owning code into production. DevOps for operators generally means building configuration automation and integrated monitoring tools. In this way, DevOps is highly complementary to the SRE job function.
I've been writing a lot about this on my personal (robhirschfeld.com) and company (rackn.com/sre) blogs. I'd be happy to discuss this in more detail here.
dogecoinbase about 8 years ago
SREs are a tool to turn N ops engineers paid X each into 1 SRE paid 2X and N manual laborers paid X/4 each.
This doesn't make the role bad. But it's important to remember that the role exists as a cost savings to the org, not because it's an inherently better way to run a technical infrastructure.
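Taken at face value, the arithmetic in that comment works out as below. N and X are the comment's own symbols; the cost labels are added here for illustration, and none of this is real salary data.

```latex
\begin{align*}
C_{\text{before}} &= N X \\
C_{\text{after}}  &= 2X + N \cdot \tfrac{X}{4} \\
C_{\text{after}} < C_{\text{before}}
  &\iff 2 + \tfrac{N}{4} < N
   \iff N > \tfrac{8}{3}
\end{align*}
```

So under the comment's own numbers the swap is cheaper for any team of three or more, and the saving approaches 75% of the original payroll as N grows.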
rodionos about 8 years ago
It's a euphemism for a system administrator with responsibilities to test, integrate, and automate systems with code.
HeavenBanned about 8 years ago
An "SRE" is what happens when you want to pronounce the word "SWE" but can't. For some reason you keep saying "SRE" over and over and over again.
They were overcompensating so hard for the fact that SREs aren't SWEs. It's like "we get it, SREs are wannabe SWEs, stop trying to sugar coat it". 50% development? What a disaster. If half your job is the job that you want and the other half is administrative bullshit, why in the living fuck would you try to make a puff piece about that?
From what everyone has said in this thread, it seems that SRE is basically a scam, along with DevOps, and that the real job people want is SWE.
I don't like internal memo propaganda pieces by big companies. It's not intellectually stimulating: it's hogwash. Let the truth reign always.
sigi45 about 8 years ago
Yep, that's how I always wanted to do software engineering: understanding / controlling the full stack and taking responsibility for it.
zeckalpha about 8 years ago
Is this interview new or was it released as part of the book?
burntrelish1273 about 8 years ago
Here's a script to fetch an offline copy: https://gist.github.com/steakknife/76214a4bb378592669655e3bbc30a1cc/
grabcocque about 8 years ago
SRE: because DevOps isn't buzzwordy enough these days.
traf68 about 8 years ago
It is stupidity, hubris and a disposition to chaos.
deckardb26354 about 8 years ago
SRE? Apparently the only 'software' job Google has in Dublin. It doesn't matter if you have a PhD or wrote your own kernel: if you want to write code for Google, move to Mountain View. Oh, and the seven interviews. Complete waste of time.
awkbug about 8 years ago
I recently interviewed at LinkedIn and Atlassian, and my experience was very bad. The first round was an online exam, and I answered all the questions. Atlassian rejected me even though I got 100% of the test cases right. No response from LinkedIn. They told me I could use any language to solve the problems, and I chose bash. I think they didn't like me using bash. The guy who interviewed me at LinkedIn is a system administrator with an SRE title. Funny thing is, he said he doesn't do programming. Companies are just misusing these titles. They need software engineers who can do system administration. The types who run apt-get on CentOS :p