A Product Engineer’s Guide to Platform Engineers (2019)

174 points by absolute100 about 4 years ago

13 comments

juancn about 4 years ago

I work in platform. A quick project is one or two quarters; a decent-sized one is multi-year, will likely take several deployments to roll out, and needs at least a quarter for testing and stabilization. But when we deliver it to the several thousand clusters it will serve, the chance of a serious incident is close to zero.

For this to work, we need to anticipate future requirements. We monitor usage patterns, make projections, study the market our company serves and try to see trends and shifts. We care about what competitors are pushing and what complementary industries are doing. We are aware of the business, but not in the same way product is.

When our plans work (and they mostly do), we usually deliver at the right time, when the company needs it. A year ago we saw a shift in market patterns that would affect load, so we started working on scaling certain parts of the system. It took us a year to get there, but when it was needed, we had it ready.

Sometimes people ask for things that come out of the blue. We were unable to anticipate them, and the work to get them done will take at least several quarters.

These scenarios are the hardest, because you need to push back. But it's not that we're not doing anything: we're reshuffling several quarters of planning to adjust to the shift in reality and correct the blind spot, so that the next time you ask for something, it's ready, because we started a year ago.


dblooman about 4 years ago

As a recovering platform engineer, nearly 2 years out of the game, I identify with a lot here. My experience is that platform engineers are mostly people who never actually write code for the platform they have built, other than toy services to test it, which are usually written in a language that isn't the company default.

Communication is difficult because platform engineers, like SWEs, are supporting people, but the support is direct, so platform often has a torrent of people asking for things. This is a problem when the SWE doesn't know what they want because they are a newbie out of university, or worse, when they know what they want but don't know how to work the insane platform monorepo that requires 9 items on a checklist to get an IAM permission added that a platform team member told them about in a support Slack channel.

I think a lot of platform teams also make their own lives harder by making too many abstractions. If it is impossible for an SWE to add a new resource to the cloud without a platform engineer helping, then platform is always going to be at a disadvantage.

Control is also a big issue. Platform wants all the control and believes giving control away is going to cause a disaster. In reality, SWEs want 5% control, just to do the thing they have to ask you as a platform engineer to do anyway. But maybe platform engineers like that hero complex.


yarg about 4 years ago

The problem with product is that they care about individual features and not features in general.

This leads to an inability to see the overlap across distinct product features that all seem largely the same from a platform perspective.

The goal for platform is to take the overlap and push it down the stack, behind an API, hiding the subset of complexity that product does not need to be exposed to.

Platform gets to hide the dragons, and product gets an API that allows features to be implemented quickly and cleanly (it also provides improved constraints from a testing perspective).

However, when there's a feature that cannot be implemented against the constraints and features currently imposed and available, things slow down until platform figures out what new abstractions should be presented to allow the new feature to be implemented.

It does not matter much whether platform's initial implementation is well built or hacked into place (fix it later), but try to do a good job on the abstractions. This allows product to implement the new feature cleanly and lets the platform-side technical debt be handled later on, without major impact to product.


chaosphere2112 about 4 years ago

This is very on point. We have platform and product in entirely different reporting structures that meet just below Sundar; this leads to hysterically different cultures. My team is trying to straddle the product/platform divide right now and it’s a constant learning experience.

A recent example: someone asked me for my team’s planning schedule, specifically when we figure out what we will be doing for the next six months.

My only response is “we don’t do that here”; my team has to constantly juggle projects to load balance downtime on projects (most of our work is very sinusoidal in how much time it demands in a week), and we pick up new ones that are important enough to add to the load from time to time.

The platform folks set out in December last year with a mission for all of 2021; the design is flexible, but they know exactly what the goals for their year are.

When one team has projects that take an arbitrary amount of time to “get right”, and the other has more reliably predicted cycles that just depend on coding output, things get a little hairy. Mostly it just takes a lot of empathy for the folks on the other side, though.


tayo42 about 4 years ago

As someone who works on an infra or platform team at work, this is a nice thing to read. I wish more people at work had some empathy for the teams that support them. I really hate having to do customer service tasks at work. Surprisingly, there are a lot of entitled engineers to deal with, despite the fact I only support a few hundred. In a company we are all coworkers; we work together. I'm not here to respond to demands from anyone. I'll stop before this turns into a rant, hah.


fidrelity about 4 years ago

I believe the differentiation between product and platform engineers is creating an unnecessary divide.

A platform engineer builds something for a customer; it's just that in this case the customer is a developer, usually within the same organisation.

If anything, applying product thinking to the problem helps the platform engineer: They have a clearer persona for their customers. They can more easily speak to their customers (if internal). They want to deliver value to their customers and fix their problems.

Why create this artificial divide and silo thinking? If your customers are not happy with your work, you're doing a bad job and your product sucks.


fooblat about 4 years ago

> I can’t stop everything and clean up. As much as I would like to, I have a deadline.

In my experience, if this way of thinking has become institutionalized you have bad leadership. It is literally unsustainable to constantly develop without any aftercare or clean up. It doesn't matter if it is on the Product or Platform side.

The best solution I have found starts with leadership. Leadership understands the idea of building codes and does understand this concept as applied to software.

I have found that when quality standards are agreed on by all teams and supported by leadership, a lot of the inter-team friction falls away.


a-dub about 4 years ago

she leaves out a third category: ai/ml/ds, which also has huge cultural differences with traditional product and infra/platform teams. check out hilary mason's recent interview in the twiml podcast. she confirms something i've suspected for a very long time: agile is a bad fit for ml enhanced product/project development.


kazen44 about 4 years ago

Is this "friction" just the same as the traditional orthogonal relationship between developers and operations teams?

One wants to introduce new features into the system; the other has the job of maintaining stability so the service is actually available to its users.

In my opinion, a good metric for how fast to move is hard to define. The "move fast and break things" approach that is popular in Silicon Valley does not work when your services are used by critical systems (think water supply, power supply, medical, etc.). But moving too slowly has the disadvantage of being too late to market. It's a very fine balance to strike, if not impossible to reach.


NiceWayToDoIT about 4 years ago

Somehow I find this highly related to many cases I have experienced: https://www.youtube.com/watch?v=BKorP55Aqvg

So the question is: then what?

hayst4ck about 4 years ago

This perspective is wrong because it assumes that there are platform engineers and product engineers and that they need to negotiate with each other as equals, in the absence of an arbiter who can set priorities and explain the long-term consequences of unreleased product to platform or the consequences of unmaintainable code to product. The absence of the big picture creates agitation rather than alignment.

This blog post is an indicator of extremely weak leadership. Lack of a leader with both authority (from the org chart) and legitimacy (because people believe in their technical expertise) is exactly the situation that results in this kind of blog post being written.

Full disclosure: I am a platform engineer.

My take:

* Product engineers overestimate their ability and expertise
* Product engineers underestimate the damage they cause with several classes of errors, especially dependency errors
* Organizations generally prioritize product engineers' concerns over platform engineers' concerns: product = money, platform = cost
* Organizations under-invest in platform headcount, even when all product teams complain about the dev experience
* Platform engineers fail to create pleasant frameworks
* Platform teams repeatedly fail to make the right choice the easiest/only choice
* Product wants more from platform, but won't give up their headcount
* Product is more than happy to spend platform's money (they should be oncall/do maintenance), but isn't happy when platform spends product's money (we don't have time to refactor/migrate)

Here is a list of things product teams might do that would make platform engineers' lives hell:

* add new dependencies, especially without consulting anyone
* create circular dependencies/inline imports/other dependency abominations
* use the global scope
* change how the server works in the wrong location
* fail to use or completely ignore dependency injection
* implement a solution to a problem they haven't yet understood
* HOPE a shortcut solution will work because they don't want to solve the problem the hard, direct way
* try to solve a performance problem by adding complexity, rather than reducing complexity
* demand specific infrastructure changes in an area they don't maintain
* make significant changes for a service they are not on call for
* write crazy complicated tests that take forever and completely fail to unit test
* contract out features to other companies without consulting platform
* ask for new languages/paradigms to be supported (which also means maintained)
* not do napkin math to determine how much space their new data will take
* not do napkin math to determine how much a new feature will increase utilization
* not warn platform teams of new/impending load
* not consult platform teams about complex new features BEFORE implementation
* abandon services/code they don't want to maintain, but can still break
* fail to migrate or finish a migration
* fail to actively or passively monitor the capacity/health of their own features
* create product that someone else will be on the hook to maintain
* implement the same behavior another team needs in their own way because coordinating is too hard
* implement the same behavior another team is responsible for providing because they need it now
* create a new system without first understanding what was bad or refactoring the old system
* fail to quarantine business logic, especially from infra logic

All the product, platform, and infra engineers in these toxic environments sit down trying to diagnose the key problems behind why the dev experience sucks. Then they ask themselves "how are we failing so bad? Why does product do such insane things? Why does oncall suck so much? Why is the platform so hard to use? What is infra even doing? Who is solving these problems?" It's clear something is wrong, but nobody knows exactly what it is.

The problem is a lack of leadership. When people write posts about product and platform being at each other's throats, that is an abject failure of senior leadership at a company.

A leadership that fails to interface with or solicit feedback from line workers is a failed leadership. A leadership that fails to see the bigger picture is a failed leadership. A leadership that fails to see long-term costs is a failed leadership. A leadership that promotes short-term, prolific product devs who squirt out product, rather than foundational product devs who do the hard thing because it's necessary even if it takes more time, is doomed.

If you go around and all the people in your technical leadership positions can't tell you what the real architectural problems are because they are too busy plopping out features, that means leadership has failed.

And guess what! All the senior engineers are avoiding these problems like the plague because sane solutions require a lot of really unpleasant, really unsexy work that is not guaranteed to succeed. Why would a senior engineer work on a hard problem when they can write a horribly complex feature, get a promotion and a 20% bonus, and quit the instant there are any repercussions for their bad decisions, with a resume that talks about the amazing features they created? I'm not asking that facetiously. Is hard, grunty refactoring work valued by the company's reward system/authority system? Do platform engineers get to write performance reviews for the senior engineers that cause them pain? Where/how does quality/architecture feedback even get considered systematically?

How many kids out of college even know what dependency injection or the global scope is? How many of them can mock up a canonical, easily understood teaching example on the whiteboard of a global scope violation or a dependency that was not injected? How many senior engineers can do that? How many onboardings, documentations, or classes exist that explain these concepts and the long-term consequences of ignoring them? What do you do if your senior engineers are leading by example by taking those shortcuts if it gets their feature out faster?

What every product engineer needs to understand is that a platform is a commons, as in the tragedy of the commons. A grassy meadow with a bunch of cattle ranchers might let the cattle eat their fill, but if all the cattle eat all the grass, the cattle will starve or the ranchers will fight. In the same way, if a bunch of devs are allowed to create as much complexity as they want, the server will starve (tests will take too long, the build will always be broken, pages will always be sent) or engineers will start to fight.

Every individual dev is incentivized to create as much complexity as quickly as they can because they want to meet their goals. With no force regulating the complexity, it is not a stable system and will definitely collapse. So the platform team comes in and says "no" because that's the only way to really limit complexity. Leadership's job is to regulate this relationship and ensure that everyone is aligned and properly incentivized, so that "the cattle ranchers can have as many cattle as the fields can sustain."
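For readers wondering what the whiteboard example asked about above might look like, here is a minimal sketch in Python that puts a global-scope dependency next to the same logic with the dependency injected. Every name in it (`USERS`, `InMemoryStore`, `register`, `register_global`) is hypothetical and chosen only for illustration; it is not taken from the blog post or from any particular codebase.

```python
# Hypothetical teaching example; all names are illustrative assumptions.

# Global scope: the store lives at module level and every caller shares it.
USERS = {}


def register_global(name):
    # Hidden dependency: this function silently reads and mutates USERS,
    # so its result depends on whatever ran before it.
    USERS[name] = True
    return len(USERS)


class InMemoryStore:
    """Tiny stand-in for a real database client."""

    def __init__(self):
        self.rows = {}

    def save(self, key):
        self.rows[key] = True

    def count(self):
        return len(self.rows)


def register(name, store):
    # Injected dependency: the caller decides which store is used,
    # so a test can pass a fresh one and production can pass the real one.
    store.save(name)
    return store.count()


if __name__ == "__main__":
    fresh = InMemoryStore()
    print(register("ann", fresh))   # depends only on the store passed in
    print(register_global("ann"))   # depends on shared module-level state
```

The injected version makes the dependency visible in the function signature, which is roughly what makes it easy to test in isolation; the global version behaves differently depending on which other code has touched `USERS` first.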

barnaclejive about 4 years ago

TBH, I stopped reading when I saw the first meme/gif.

Syzygies about 4 years ago

> 25 × 10^6 is, roughly, what separates us from orangutans: 12 million years to our common ancestor on the phylogenetic tree and then 12 million years back by another branch of the tree to the present day orangutans.

> But are there topologists among orangutans?

> Yes, there definitely are: many orangutans are good at “proving” the triviality of elaborate knots, e.g. they fast master the art of untying boats from their mooring when they fancy taking rides downstream in a river, much to the annoyance of people making these knots with a different purpose in mind.

https://www.ihes.fr/~gromov/wp-content/uploads/2018/08/manifolds-Poincare.pdf

Whenever anyone refers to another sentient being as stupid, the assessment tends to say more about the speaker.