
We don’t use a staging environment

409 points by Chris86 about 3 years ago

100 comments

rock_hard about 3 years ago
This is pretty common actually.

At Facebook, too, there was no staging environment. Engineers had their dev VM, and after PR review things went straight into prod.

That said, features and bug fixes were often gated by feature flags and rolled out slowly to understand the product/perf impact better.

This is how we do it on my current team too, for all the same reasons that OP states.
nickelpro about 3 years ago
This article has some very weird trade-offs.

They can't spin up test environments quickly, so they have windows when they cannot merge code due to release timing. They can't maintain parity between their staging environments and prod, so they forswear staging environments. These seem like infrastructure problems, not problems with staging environments eo ipso.

They're not arguing that testing or staging environments are bad, they're just saying their organization couldn't manage to get them working. If they hadn't hit those roadblocks in managing their staging environments, presumably they would be using them.
WYepQ4dNnG about 3 years ago
I don't see how this can scale beyond a single service.

Complex systems are made of several services and pieces of infrastructure, all interconnected. Things that are impossible to run locally. And even if you can run them locally, the setup is most likely very different from production. The fact that things work locally gives little to no guarantee that they will work in prod.

If you have a fully automated infrastructure setup (e.g. Terraform and friends), then it is not that hard to maintain a staging environment that is identical to production.

Create a new feature branch from main; run unit tests and integration tests. Changes are automatically merged into the main branch.

From there a release is cut and deployed to staging. Run tests in staging; if all is good, promote the release to production.
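A rough sketch of that cut-and-promote flow in TypeScript. This is not from the comment or the article; cutRelease, deploy, and runTests are hypothetical placeholders for whatever your CI/CD system actually provides:

    // Sketch of a promote-through-staging pipeline (all callbacks are
    // placeholders for real CI/CD primitives).
    async function shipRelease(
      cutRelease: () => Promise<string>, // tags a release from a green main
      deploy: (version: string, env: "staging" | "production") => Promise<void>,
      runTests: (env: string) => Promise<boolean>,
    ): Promise<void> {
      const version = await cutRelease();
      await deploy(version, "staging");    // staging mirrors prod via IaC
      if (!(await runTests("staging"))) {
        throw new Error(`release ${version} failed staging validation`);
      }
      await deploy(version, "production"); // promote the same artifact
    }

The key property is that the artifact that reaches production is exactly the one that passed staging, not a rebuild.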
jasonhansel about 3 years ago
This is good insofar as it forces you to make local development possible. In my experience, it's a big red flag if your systems are so complex or interdependent that it's impossible to run or test any of them locally.

That leads to people *only* testing in staging envs, causing staging to constantly break and discouraging the automated tests that prevent regression bugs. It also leads to increasing complexity and interconnectedness over time, since people are never encouraged to get code running in isolation.
jb3689 about 3 years ago
I don't think you can take infrastructure seriously without a staging environment. For many companies that is fine: they don't have significant infrastructure to maintain (or just don't maintain the infrastructure they have).

I work on a team that maintains our database layer, and the lack of a staging environment is incredibly painful. Every test has to be done in production, and massive effort is needed to proceed safely. With a staging environment you can be more aggressive, and can build a solid benchmark and test suite to gain confidence rather than having to collect data in prod.
otterley about 3 years ago
The short answer appears to be "we are cheap and nobody cares yet."

It's easy to damn the torpedoes and deploy straight into production if there's nobody to care about it, or your paying customers (to the extent you have any) don't care either.

Once you start gaining paying customers who really care about your service being reliable, your tune changes pretty quickly. If your customers rely on data fidelity, they're going to get pretty steamed when your deployment irreversibly alters or irrevocably loses it.

Also, "staging never looks like production" is a cost tradeoff that the author made, not a Fundamental Law of DevOps. If you want it to look like production, you can do the work and develop the discipline to make it so. The cloud makes this easier than ever, if you're willing to pay for it.
fishtoaster about 3 years ago
This is a pretty weird article. Their "how we do it" section lists:

- "We only merge code that is ready to go live"

- "We have a flat branching strategy"

- "High risk features are always feature flagged"

- "Hands-on deployments" (which, from their description, seems to be just a weird way of saying "we have good monitoring and observability tooling")

...absolutely none of which conflict with or replace having a staging environment. Three of my last four gigs have had all four of those *and* found value in a staging environment. In fact, they often *help* make staging useful: having feature-flagged features and ready-to-merge code means that multiple people can validate their features on staging without stepping on each other's toes.
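The per-user override that enables that kind of parallel validation can be tiny. A minimal sketch in TypeScript; the flag name, users, and in-memory storage are illustrative, not from the comment:

    type FlagConfig = {
      enabled: boolean;          // global switch for everyone
      allowUsers: Set<string>;   // per-user overrides for validation
    };

    const flags: Record<string, FlagConfig> = {
      "new-checkout": { enabled: false, allowUsers: new Set(["alice", "bob"]) },
    };

    function isEnabled(flag: string, userId: string): boolean {
      const cfg = flags[flag];
      if (!cfg) return false; // unknown flags default to off
      return cfg.enabled || cfg.allowUsers.has(userId);
    }

    // Two engineers can validate different flagged features on the same
    // staging deploy without seeing each other's unfinished work.
    if (isEnabled("new-checkout", "alice")) {
      // render the new checkout flow
    }

In practice you would back this with a flag service or database rather than a hard-coded map, so flags can change without a deploy.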
myth2018 about 3 years ago
*I'm assuming this is not an April Fools' joke, and my comments are targeted at the discussion it sparked here anyway.*

A flat branching model simplifies things, and the strategy they describe surely enables them to ship features to production faster. But the risks I see there:

- Who decides when a feature is ready to go to production? The programmer who developed it? The automated tests?

- Features toggleable by a flag must, at least ideally, be tested twice: both when turned on and when turned off. Being in a hurry to deploy to production wouldn't help with that.

- OK, staging environments aren't at parity with production. But wouldn't they still be better than the CI/CD pipeline, or the developer's laptop, for testing new features in isolation?

- Talking about features in isolation: what about bugs caused by spurious interaction between two or more features? No amount of testing would find them if you only test features in isolation.
zimbatm about 3 years ago
If you can, provide on-demand environments for PRs. It's mostly helpful for testing frontend changes, but also for database migrations and just demoing changes to colleagues.

If you have that, you will see people's behaviour change. We have a CTO who creates "demo" PRs with features they want to show to customers. All the contention around staging identified in the article is mostly gone.
klabb3 about 3 years ago
Not endorsing this point blank, but one positive side effect is that it becomes much easier to rally folks into improving the fidelity of the dev environment, which has a compound positive impact on productivity (and the mental health of your engineers).

In my experience at Big Tech Corp, dev environments were reduced to low unit-test fidelity over the years, and as a result you need to *iterate* (i.e. develop) in a staging environment that is orders of magnitude slower (and more expensive, if you're paying for it). It isn't unusual for waiting on integration tests to be the majority of your day.

Now, you might say that it's too complex so there's no other way, and yes, sometimes that's the case, but there's nuance! Engineers have no incentive to fix dev if staging/integration works at all (even if super slow), so it's impossible to tell. If you think slow is a mild annoyance, I will tell you that I had senior engineers on my team who committed around 2-3 (often small) PRs per month.
productceo about 3 years ago
> We only merge code that is ready to go live.

In their perception, is the rest of the tech industry gambling in every pull request that some untested code will work in production?

I work at a large company. We extensively test code on local machines. Then dev test environments. Then a small rollout to just a few data centers in the prod bed. Then small-scale online flight experiments. Then roll out to the rest of the prod bed.

And I've seen code fail at every one of these stages, no matter how extensively we tested and how robustly the code ran in prior stages.
midrus about 3 years ago
Good monitoring, logs, metrics, feature flagging (allowing you to open a branch of code to a % of users), blue/green deployment (allowing a release to handle a % of the user traffic), and good tooling for quick builds/releases/rollbacks are, in my experience, far better tools than intermediate staging environments.

I've had great success in the past with a custom feature-flag system plus Google App Engine's %-based traffic shifting, where you can send just a small % of traffic to a new service and roll back to your previous version quickly without even needing to redeploy.

Now, not having those tools as a minimum, and not having a staging environment either, is just reckless. No unit/integration/whatever tests are going to make me feel safe about a deploy.
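The %-of-users piece usually comes down to deterministic bucketing, along these lines. A sketch; the hash choice and flag names are illustrative:

    import { createHash } from "crypto";

    // Deterministically map a user to a bucket in [0, 100). The same user
    // always lands in the same bucket, so raising the rollout percentage
    // only ever adds users; nobody flip-flops between code paths.
    function bucket(flag: string, userId: string): number {
      const digest = createHash("sha256").update(`${flag}:${userId}`).digest();
      return digest.readUInt32BE(0) % 100;
    }

    function inRollout(flag: string, userId: string, percent: number): boolean {
      return bucket(flag, userId) < percent;
    }

    // Send 5% of users down the new code path.
    if (inRollout("new-billing", "user-42", 5)) {
      // new path
    } else {
      // old path
    }

Hashing the flag name together with the user id keeps buckets independent across flags, so the same unlucky users aren't first in line for every rollout.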
debarshri about 3 years ago
We used to believe staging environments were not important. If you believe that, I would argue that you have not crossed the threshold as an org where your product is critical enough for your consumers. A staging environment, or any environment for that matter, just acts as a gating mechanism so you don't ship crappy stuff to customers. With too many gates you end up shipping late, but with too few gates you end up shipping a low-quality product.

A staging environment saves unnecessary midnight alerts and catches easy-to-find issues that might have a huge impact when a customer has to face them. I wouldn't be surprised if in a few quarters, or a year or so, they publish an article about why they decided to introduce a staging environment.
parksy about 3 years ago
This sounds like something I would write if a hypothetical gun were pointed at my head in a company where the most prominent customer complaint was that time spent in QA and testing was too expensive.

I have zero trust in any company that deploys directly from a developer's laptop to production, not least because of how much you have to trust that developer. There has to be some process, right?
user3939382 about 3 years ago
> Pre-live environments are never at parity with production

Then you fix that particular problem. Infrastructure as code is one idea, just off the top of my head.
MetaWhirledPeas about 3 years ago
I don't have experience with the true CI he describes, but I do have experience with pre-production environments.

> "People mistakenly let process replace accountability"

I find this to be mostly true. When the code goes somewhere else *before* it goes to prod, much of the burden of responsibility goes along with it. *Other* people find the bugs and spoon-feed them back to the developers. I'm sure as a developer this is nice, but as a process I hate it.
jeffbee about 3 years ago
What I infer from the article is that this company does not handle sensitive private data, or they do but are unaware of it, or they are aware of it and just handle it sloppily. I infer that because one of the biggest advantages of a pre-prod environment is that you can let your devs play around in a quasi-production environment that gets real traffic, but no traffic from outside customers. This is helpful because when you take privacy seriously there is no way for devs to just look at the production database, gain interactive shells in prod, or attach debuggers to production services without invoking glass-breaking emergency procedures. In the pre-prod environment they can do whatever they want.

Most of the rest of the article is not about the disadvantages of pre-prod, but about the drawbacks of the "git flow" branching model compared to "trunk-based development". The latter is clearly superior, and I agree with those parts of the article.
mkl95 about 3 years ago
> People mistakenly let process replace accountability

> We only merge code that is ready to go live.

This is one of the most off-putting things I have read on HN lately. Having worked on several large SaaS products where leadership claimed similar things, I simply refuse to believe it.
blorenz about 3 years ago
We duplicate the production environment and sanitize all the data to be anonymous. We run our automated tests on this production-like data to smoke test. Our tests are driven by pytest and Playwright. God bless, I have to say how much I love Playwright. It just makes sense.
DevKoala about 3 years ago
> We only merge code that is ready to go live

That's a cool April Fools' joke, squeaky.ai
KaiserPro about 3 years ago
> We only merge code that is ready to go live

Cool story, but you don't _know_ if it's ready until after.

Look, staging environments are not great, for the reasons described. But just killing staging and having done with it isn't the answer either. You need to _know_ when your service is fucked or not performing correctly.

The only way this kind of deployment is practical _at scale_ is to have comprehensive end-to-end testing constantly running in prod. This was the only real way we could be sure that our service was fully working within acceptable parameters. We ran captured real-life queries constantly, in a random order, at a random time (caching can give you a false sense of security; go on, ask me how I know).

At no point is monitoring strategy discussed.

Unless you know how your service is supposed to behave, and you can describe that state using metrics, your system isn't monitored. Logging is too shit, slow, and expensive to give meaningful near-realtime results. Some companies spend billions taming logs into metrics. Don't do that; make metrics first.

> You'll reduce cost and complexity in your infrastructure

Possibly, but you'll need to spend a lot more on making sure that your backups work. I have had a rule for a while that all instances in prod must be younger than a month. This means you should be able to rebuild _from scratch_ all instances *and datastores*. Instances are trivial to rebuild; databases should be too, but often aren't. If you're going to fuck around and find out in prod, then you need good, well-practised recovery procedures.

> If we ever have an issue in production, we always roll forward.

That's cute and all, but not being able to back out means that you're fucked. You might not think you're fucked, but that's because you've not been fucked yet.

It's like the old adage: there are two states of system admin, those who are about to have data loss, and those who have had data loss.
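A concrete reading of "describe that state using metrics", sketched with Node's prom-client. The metric names, buckets, and toy server are illustrative, not anything the commenter specified:

    import http from "http";
    import client from "prom-client";

    // Describe expected behaviour as metrics, not log lines.
    const latency = new client.Histogram({
      name: "http_request_duration_seconds",
      help: "Request latency in seconds",
      buckets: [0.05, 0.1, 0.25, 0.5, 1, 2.5],
    });
    const errors = new client.Counter({
      name: "http_request_errors_total",
      help: "Responses with status >= 500",
    });

    http.createServer((req, res) => {
      if (req.url === "/metrics") {
        // Scraped by Prometheus; alert rules fire on latency/error rates.
        client.register.metrics().then((body) => res.end(body));
        return;
      }
      const done = latency.startTimer();
      res.on("finish", () => {
        done();
        if (res.statusCode >= 500) errors.inc();
      });
      res.end("ok");
    }).listen(3000);

Alerting on these two series ("p99 latency above X", "error rate above Y%") is what lets you judge a fresh deploy in near real time, which logs rarely give you cheaply.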
nunez about 3 years ago
This makes sense. With a high enough release velocity to trunk, a super-safe release pipeline with lots of automated checks, a well-tested rolling update/rollback process in production, and aggressive observability, it is totally possible to remove staging in many environments. This is one of the popular talking points touted by advocates of trunk-based development.

(Note that you can do a lot of exploratory testing in disposable environments that get spun up during CI. Since the code in prod is the same as the code in main, there's no reason to keep them around. That's probably how they get around what's traditionally called UAT.)

The problem for larger companies, which tend to have lots of staging environments, is that the risk of testing in production vastly exceeds the benefits gained from this approach. Between the learning curve required to make this happen, the investment required to get people off of dev, the significantly larger amounts of money at stake and, in many cases, stockholder responsibilities, it is an uphill battle to get companies to this point.

Also, many (MANY) development teams at BigCos don't even "own" their code once it leaves staging.

I've found it easier to employ a more grassroots approach to moving people towards laptop-to-production. Every dev wants to work like Squeaky does (many hate dev/staging environments for the reasons they've outlined); they just don't feel empowered to do so. Work with a single team that ships something important but won't blow up the company if they push a bad build into prod. Let them be advocates internally to promote (hopefully) pseudo-viral spread.
throwaway787544 about 3 years ago
This is good practice, except that blue/green is not exactly what you want. You want a smart load balancer that can shuffle an *exact* amount of traffic to a new service with your new deploy version. It must then evaluate the new service for errors and metrics, then increase the shuffled traffic, and so on, until you reach 100% shuffled traffic, at which point the old services can be decommissioned.

If at any time the monitoring of logs or metrics becomes unusual, it must shuffle all traffic away from the new service, alert devs, and halt all deploys (because someone needs to identify the bad code and unmerge it, requiring rework for all the subsequent work about to be merged). This is called "pulling the andon cord".

It is sad that there are all these comments saying this doesn't work. This has been the best practice established by Etsy, Martin Fowler, and others in the DevOps community for... 10 years? I guess until you see it for yourself it seems unbelievable. It requires a radical shift in design, development, and operation, but it works great.
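The controller the comment describes reduces to a small loop. A sketch only; setTrafficWeight and canaryIsHealthy are stand-ins for whatever your load balancer and metrics stack actually expose:

    // Hypothetical progressive-rollout controller.
    const STEPS = [1, 5, 25, 50, 100]; // % of traffic on the new version
    const SOAK_MS = 5 * 60 * 1000;     // let each step bake before judging

    async function progressiveRollout(
      setTrafficWeight: (percent: number) => Promise<void>,
      canaryIsHealthy: () => Promise<boolean>,
    ): Promise<boolean> {
      for (const percent of STEPS) {
        await setTrafficWeight(percent);
        await new Promise((resolve) => setTimeout(resolve, SOAK_MS));
        if (!(await canaryIsHealthy())) {
          await setTrafficWeight(0); // shuffle all traffic away
          // ...alert devs and halt the deploy queue here:
          return false;              // the "andon cord" has been pulled
        }
      }
      return true; // fully shifted; the old version can be decommissioned
    }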
craigmcnamara about 3 years ago
We have review environments, so there is an easy way to have a fairly persistent config to QA features, but the environment we call staging is more of a historical artifact. It's basically the same as a review environment, because we recognize that after testing that a feature works as intended, it's going out, and we may still be surprised by real production use. Our test suite, which kicks off after you hit the merge button, takes about 10-15 minutes, and build/deploy to Amazon ECS is 8 to 10 minutes, so there is pretty quick feedback. We also use feature flags when possible, but most deploys are very granular and we generally don't worry if something passes our test suite, which is currently about 6k tests. Once we decided that merges to main get deployed automatically, our staging environment became just another environment, our velocity increased, security patches are deployed almost immediately, and we mostly don't worry about launches.
lapser about 3 years ago
Disclaimer: I worked for a major feature-flagging company, but these opinions are my own.

This article makes a lot of valid points regarding staging environments, but their reasoning for not using them is dubious. None of their reasons are good enough to take staging environments out of the equation.

I'd be willing to bet that the likelihood of anyone merging code that isn't ready to go live is close to zero. You still need to validate the code. Their branching strategy is (in my opinion) the ideal branching strategy, but again, that isn't good enough to take staging away.

Using feature flags is probably the only reason they give that comes close to justifying getting rid of staging, but even then, you can't always be sure that the code you've built works as expected. So you still need a staging environment to validate some things.

Having hands-on deployments should always be happening anyway. It's not a reason to not have a staging environment.

If you truly want to get rid of your staging environment, the minimum you need is feature flagging of _everything_, and I do mean everything. That is honestly near impossible. You also need live preview environments for each PR/branch. This somewhat eliminates the need for staging, because reviewers can test the changes in a live environment. But even these two things aren't a good enough reason to get rid of your staging environment. There are still many things that can go wrong.

The reason we have layered deployment systems (CI, staging, etc.) is to increase confidence that your deployment will be good. You can never be 100% sure, but I'll bet you that removing a staging environment lowers that confidence further.

Having said all of this, if it works for you, then great. But the reasons I've read in this post don't feel good enough to me to get rid of any staging environments.
Negitivefrags about 3 years ago
If you are saying you don't have a staging environment, what you are really saying is that your company doesn't have any QA process.

If your QA process is just developers testing their own shit on their local machines, then you are not going to get as much value out of staging.
sergiotapia about 3 years ago
> We only merge code that is ready to go live

> If we're not confident that changes are ready to be in production, then we don't merge them. This usually means we've written sufficient tests and have validated our changes in development.

Yeah, I don't trust even myself with this one. Your database migration can fuck up your data big time in ways you didn't even predict. Just use staging with a copy of prod. https://render.com/docs/pull-request-previews

Sounds like OP could benefit from review apps; he's at the point where one staging environment for the entire tech org slows everybody down.
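One cheap habit in the same spirit, with or without staging: dry-run the migration inside a transaction against a restored prod snapshot and always roll back. A sketch with node-postgres; the connection string, DDL, and invariant check are all made up for illustration:

    import { Client } from "pg";

    async function dryRunMigration(): Promise<void> {
      const db = new Client({ connectionString: process.env.SNAPSHOT_DB_URL });
      await db.connect();
      try {
        await db.query("BEGIN");
        await db.query(
          "ALTER TABLE orders ALTER COLUMN total TYPE numeric(12,2)"
        );
        // Invariant check: the migration must not mangle existing rows.
        const { rows } = await db.query(
          "SELECT count(*)::int AS bad FROM orders WHERE total IS NULL"
        );
        if (rows[0].bad > 0) {
          throw new Error(`${rows[0].bad} rows lost their total`);
        }
      } finally {
        await db.query("ROLLBACK"); // never commit against the snapshot
        await db.end();
      }
    }

This catches the "fucked up in ways you didn't predict" class on real data shapes, though not load-dependent problems like lock contention at production write volume.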
rileymat2 about 3 years ago
I think a lot of these process articles would be well served by linking to some other post about team and project structure, size, and scope.
higeorge13 about 3 years ago
They mention the database as a factor in not having a staging env, due to the difference in size, but they don't mention how they test schema migrations or any feature that touches the data, which usually produce multiple issues, or even data loss.
cortesoft about 3 years ago
This makes some sense for a single-application environment. In our system, however, there are dozens of interacting systems, and we need an integration environment to ensure that new code works with all the other systems.
mianos about 3 years ago
This probably also depends on your core business. If your product does not deal with real money, crypto, or other financial instruments, and it is not serious if something goes wrong for a small number of people in production, this may work for you. It is probably cheaper and simpler. Lots of products are not like that. I built a bank and work on stock exchanges. Probably not a good idea to save money by not testing, as people get quite annoyed when their money goes missing.
benjaminwai about 3 years ago
I think we are missing some context here. I have been trying to find more information about them. From what I found [1] (hopefully accurate), it looks like they are a new team: beta in August 2021 and just incorporated this February. The founder/CTO is a full-stack developer. I speculate that they are a very small team (1-2 developers at most) with a relatively straightforward architecture. In that context I suspect it is quite feasible to go from local to production without going through staging: they likely have a self-sustained stack that can be packaged; they don't have a huge database or collection of edge cases; they have few customers and low expectations in terms of service level; and they don't have stakeholders to review and approve finished features (they are their own bosses). I empathize with where they are; I have been in the same place at some point. It will be interesting to see whether this is sustainable without staging, or for how long, as they grow in team and offering.

[1] https://www.indiehackers.com/product/squeaky
richardfey about 3 years ago
Let's talk about this again after the next postmortem?
mattm about 3 years ago
An important piece of context missing from the article is the size of their team. LinkedIn shows 0 employees and their about page lists the two cofounders, so I assume they have a team of 2. It's odd that the article talks about the problems of large codebases with multiple people working on them when it doesn't look like they have those problems. With only 2 people, of course they can ship like that.
kuon about 3 years ago
How do you do QA? I mean, staging in our case is accessible to a lot of non-technical people who test things that automated tests cannot test (did I say test?).
donohoe about 3 years ago
It seems like an April 1st troll (based on the publication date), but I'm assuming it's not.

I can only say that this is a fairly poor decision from someone who appears knowledgeable enough to know better.

They could do everything they are doing as-is in terms of process, and just add a rudimentary test on a staging environment as changes pass to production.

Over a long enough timeline it will catch enough critical issues to justify itself.
andersco about 3 years ago
Isn't the concept of a single staging environment becoming a bit dated? Every recent project I've worked on uses preview branches or deploy previews, e.g. what Netlify offers: https://docs.netlify.com/site-deploys/deploy-previews/

Or am I missing something?
tezza about 3 years ago
This reads like a pre-mortem.

When they lose all their most important customers' data because the feature flags got too confusing... they can take this same article and say: "BECAUSE WE xxxx, that led to YYYY. In future we will use a Staging or UAT environment to mitigate against YYYY and avoid xxxx."

Saving time on authoring a post-mortem by pre-describing your folly seems like an odd way to spend precious dev time.
marvinblum about 3 years ago
I use a somewhat similar approach for Pirsch [0]. It's built so that I can run it locally, basically as a fully fledged staging environment. Databases run in Docker; everything else is started using modd [1]. This has proven to be a good setup for quick iterations and testing. I can quickly run all tests on my laptop (Go and TypeScript) and even import data from production to see if the statistics are correct for real data. Of course, there are some things that need to be mocked, like automated backups, but so far it has turned out to work really well.

You can find more on our blog [2] if you would like to know more.

[0] https://pirsch.io

[1] https://github.com/cortesi/modd

[2] https://pirsch.io/blog/techstack/
briandilley about 3 years ago
> Pre-live environments are never at parity with production

Same with your laptops... and this is only true if you make it that way. Using things like Docker containers eliminates some of the problem too.

> There's always a queue

This has never been a problem for any of the teams I've been on (teams as large as ~80 people). Almost never do they "not want your code on there too". Eventually it all has to run together anyway.

> Releases are too large

This has nothing to do with how many environments you have, and everything to do with your release practices. We try to do a release per week at a minimum, but have done multiple releases in a single day as well.

> Poor ownership of changes

Code ownership is a bad practice anyway. It allows people to throw their hands up and claim they're not responsible for a given part of the system. A down system is everyone's problem.

> People mistakenly let process replace accountability

Again: nothing to do with your environments here, just bad development practices.
koffiezet about 3 years ago
This sounds like an organisational issue, not a technical one, and I predict that this simply won't scale organisation-wise. It sounds like they have given no thought to their platform architecture, deploy pipelines, testing strategies... It's probably not yet causing issues because they're working in a small team, but rectifying this later will be an absolute pita.

That said, at scale, having one big staging/test/... environment can be impossible, but then things are split up organisationally, with each team managing its own service group and environments, and being responsible for their reliability, stability, and availability towards other teams.

Also, with service meshes it has become feasible to actually test in production, since you can route select users to specific (test) versions of a certain backend service.
konaraddi about 3 years ago
This works for services that are growing slowly in features or have few other services integrating with them. I'm not sure how this scales when there are multiple services across multiple teams with dependencies on one another, all being rapidly developed with new features. At work, we have staging/pre-prod environments across most of the teams my team works with, so new features can be tested in staging and other teams can test integrating with them. This is also possible with just a production environment, but it requires some engineering effort to add feature flags and special headers indicating that a request is from a team looking to try a new API.
quickthrower2 about 3 years ago
All of their "problems" with staging are fixable bathwater that doesn't require ejecting the baby.

I avoid staging for solo projects, but it does feel a bit dirty.

For team work or complex solo projects (such as anything commercial) I would never!

On the cloud it is too easy to stage. To the point where I have torn down and recreated the staging environment at times to save a bit of money, because it is so easy to bring back.

The article says to me they're not using modern devops practices.

It is rare that a tech-practice "hot take" post is on the money, and this post follows the rule, not the exception.

Have a staging environment!

Just the work / thinking / tech-debt payoff of making one is worth it for other reasons, including streamlining your deployment processes, both human and in code.
kayodelycaon about 3 years ago
I don't see how this works when you have multiple external services you don't control in critical code paths that you can't fully test in CI.

The cost of maintaining a staging environment is peanuts compared to 30 minutes of downtime or data corruption.
issa about 3 years ago
I have a lot of questions, but one above all the others. How do you preview changes to non-technical stakeholders in the company? Do you make salespeople and CEOs and everyone else boot up a local development environment?
fmakunbound about 3 years ago
I'm working at a megacorp at the moment as a contractor. The local dev, cloud dev, cloud stage, cloud prod pipeline is truly glacial in velocity, even with automation like Jenkins, Kubernetes, etc. It takes weeks to move from dev-complete to production. It's a middle manager's wet dream.

I used to wonder why megacorp isn't being murdered by competitors delivering features faster, but actually everyone is moving glacially for the same reason, so it doesn't matter.

I'm kind of reminded of pg's essay on which competitors to worry about. I might be a worried competitor if these guys are pulling off merging to master as production.
jokethrowaway about 3 years ago
A previous client was paying roughly 50% of their AWS budget (more than a million per year) just to keep up development and staging.

They ran roughly 3x machines for live, 2x for staging, and 1x for development.

Trying to get rid of it didn't work politically, because we had a cyclical contract with AWS where we committed to spending X amount in exchange for discounts. Also, a healthy amount of ego and managers-of-managers BS.

In terms of what that company was doing, I'm pretty sure I could have exceeded their environment for 2k per month on Hetzner (using the auction).
ygouzerh about 3 years ago
It might be enough for this company, but if you are a big corporate, it's definitely not something to do. You cannot expect millions of consumers to just be OK with the mobile app being down because it's too hard to keep staging and prod in sync.

I maintain the infra for a big mobile app, and our staging environment allowed us to have only two production incidents in the last year, and they were not due to source code (networking).

I really recommend any serious business at least try it and see for themselves the advantages.
sulam about 3 years ago
When I started at Twitter, a guy was proposing building a staging environment. At that point it would have cost about $2M, so it was within the realm of conceivable. I was a pretty immediate “no”, for all sorts of reasons. Keeping the data in reasonable shape would have been a big project all by itself, getting reasonable load on it would have been another, and then of course it's a large environment you need to be on call for but that doesn't have the priority production has. It's just an all-around bad idea.
lazyant about 3 years ago
No disrespect, but you can do this for an analytics dashboard or a content website with canaries (Facebook). It may not be the best for high-liability sites like financial systems.
mannykannot about 3 years ago
This appears to be just a naming-convention issue. All the potential problems of staging environments can occur, for the same underlying reasons, in the approach advocated here; they just don't happen in staging, merely because nothing is called that.

Personally, I think the approach advocated here is feasible, and even necessary if you are operating at global scale, but I am skeptical of tendentious stories about how it makes a number of problems just disappear.
karmasimida about 3 years ago
It really depends.

Without a staging environment, your chances of finding critical bugs rely on offline testing. Not all bugs can be found in unit tests; you need load tests to detect bugs that don't break your program from a correctness perspective, but do on the latency/memory-leak front. And such tests might take a long time to run.

Staging slows things down, but that is intended: it creates a buffer in which to observe behavior. Depending on the nature of your service, it can be quite critical.
rurban about 3 years ago
In one of my very first jobs, in the mid-'90s, there was an incoming team we took over from a major competitor, who made the processes I introduced much simpler by removing dev testing, staging, and CVS. They preferred to work as root on the live servers with 80,000 customers. Development was apparently so much easier with immediate feedback.

I liked that so much that I resigned, and found a much better job 2 years later. I guess you could say bad cultural fit.
bob1029 about 3 years ago
> Pre-live environments are never at parity with production

As a B2B vendor, this is a conclusion we have been forced to reach across the board. We have since learned how to convince our customers to test in production.

Testing in prod is usually really easy *if* you are willing to have a conversation with the other non-technical humans in the business. Simple measures like a restricted prod test group are about 80% of the solution for us.
drexlspivey about 3 years ago
> Last updated: April 1, 2022
anothernewdude about 3 years ago
If they're not at parity, then you are doing CI/CD wrong and aren't forcing deploys to staging before production. If you set the pipelines up correctly, then you *can't* get to production without being at parity with pre-production.

> they don't want your changes to interfere with their validation.

Almost like those are issues you want to catch. That's the whole point of continuous integration!
Traster about 3 years ago
The only way you can create stable and safe systems is by introducing processes to ensure that your systems are stable and safe. It doesn't matter how much personal responsibility you claim to take; you are going to make mistakes, and processes are the mechanism for limiting the damage of those mistakes. This is core to the best practices behind any safety-critical industry, and it is embedded in functional safety. The logic of this article appears to be "we just concentrate very hard to make up for not having a decent staging environment". Which is fine if no one cares whether your stuff breaks.

> When there is no buffer for changes before they go live, you need to be confident that your changes are fit for production.

This is just completely wrong-headed. It's like saying you should learn to tightrope-walk 100 metres from the ground because it will make you concentrate harder on not falling. The solution for making mistakes isn't to increase the fallout of those mistakes. You can absolutely build a culture where you put the onus on the developer to feel responsible for keeping master clean and working, without abandoning the processes that help mitigate it when you fail to do that.

The funny thing is that when you see articles saying the opposite of this, almost *always* they will also say "over the course of X months, our new staging environment caught Y additional bugs that would have impacted production". I'd love to see the same here: some actual data on how much "we're just going to concentrate harder" impacts production.
pigbearpig about 3 years ago
> "Last updated: April 1, 2022"

April Fools joke? It is the only post on their blog. Or maybe they don't have any customers yet?
cinbun8 about 3 years ago
This strategy won't scale beyond a very small team and codebase. The reasons mentioned, such as parity, are worth fixing.
devmunchies about 3 years ago
One approach I'm experimenting with is having all services communicate via a message channel (e.g. NATS or Pub/Sub).

By doing this, I can run a service locally but connect it to the production pub/sub server, and see how it affects the system if I publish events to it locally.

I could also subscribe to events and see real production events hitting my local machine.
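With the nats.js client, that local-against-production wiring looks roughly like this. A sketch only: the server URL and subject are invented, and whether it is safe depends on the subjects you touch.

    import { connect, JSONCodec } from "nats";

    const jc = JSONCodec();

    async function main() {
      // Connect a locally running service to the production message bus.
      // NATS_URL and the subject below are placeholders.
      const nc = await connect({ servers: process.env.NATS_URL });

      // Watch real production events arrive on the local machine.
      const sub = nc.subscribe("orders.created");
      for await (const msg of sub) {
        console.log("prod event:", jc.decode(msg.data));
      }
    }

    main();

One caution with this pattern: if the local service joins the same queue group as the production consumers, it will steal a share of real traffic, so read-only subscriptions on separate subjects are the safer experiment.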
coldcode about 3 years ago
At my previous job we had a single staging environment, which was used by dozens of teams to test independent releases, as well as to test our public mobile app before release. That said, it never matched production, so releases were always a crapshoot: things no one had ever tested suddenly happened. Yes, it was dumb.
winrid about 3 years ago
This is how we work at FastComments. Soon we will have a shard on each major continent, and we will just deploy changes to one shard, run e2e tests, and then roll out to the rest of the shards.

But if you have a high-risk system or a business that values absolute quality over iteration speed, then yeah, you want dev/staging envs.
smokey_circles about 3 years ago
I dunno if I'm getting older or if this is as silly as it seems.

You don't like pre-live because it doesn't have parity with production, so you use a developer's laptop? What???

I stopped reading at that point, because that's pretty indicative of either a specific niche or a poorly thought-out problem/solution set.
skerit about 3 years ago
I currently have this with a client. When I was the only backend developer, the staging-server setup worked perfectly, because it was basically the master branch plus 1 or 2 queued changes. Now that other people have joined the mix, it's become more of a nuisance.
shoo about 3 years ago
Different business or organisational contexts have different deployment patterns and different negative impacts of failure.

In some contexts, failures can be isolated to small numbers of users, the negative impacts of failures are low, and rollback is quick and easy. In this kind of environment, provided you have good observability and deployment, it might be reasonable to eliminate staging and focus more on being able to run experiments safely and efficiently in production.

In other contexts, the negative impacts of failure are very high, e.g. medical devices, Mars landers, software governing large singular systems (markets, industrial machinery). In these situations you might prefer to put more emphasis on QA before production.
pharmakom about 3 years ago
What are some useful tools for running a development environment for each dev?

I have a pretty common setup of AWS services, Terraform, Docker, etc. Deploying this to a fresh AWS account is largely automatic, but it takes about 20 minutes and it's also expensive.
makach about 3 years ago
What works for you works for you. If you can't have a staging environment, you obviously found a workaround. There are many ways to deploy. Basically, you decide what risk you want to accept when you define a lifecycle.
pmarreck about 3 years ago
I use my staging environment to let prospective clients or colleagues create and play with accounts without touching "real" data, and in the past I used it to let a pentester test the non-prod site.
awill about 3 years ago
>>> If we're not confident that changes are ready to be in production, then we don't merge them. This usually means we've written sufficient tests and have validated our changes in development.

This made me laugh.
krm01 about 3 years ago
This isn't very uncommon. In fact, it is exactly what the article claims it's not: a staging/pre-live environment. Only instead of having it deployed online, you keep it local.
Dave3of5 about 3 years ago
> Most companies are not prepared to pay for a staging environment identical to production

That's basically it; no need for anything else. You are doubling, tripling, etc. your costs. No company will OK that.
wdb about 3 years ago
How do people test in production? It's difficult to get test accounts in production because it contains consumer data. Think financial data.

Does anyone know any good resources on testing in production?
simonw about 3 years ago
Without a staging environment, how do you test that large-scale database migrations work as intended?

I wouldn't feel at all comfortable shipping changes like that which have only been tested on laptops.
bombcar about 3 years ago
They have a staging environment - they just run production on it.
shaneprrlt about 3 years ago
Does QA just pull and test against a dev instance? Do they test against prod? Do engineers get prod API keys if they have to test an integration with a 3rd party?
ohmanjjj about 3 years ago
I’ve been shipping software for over two decades, built multiple successful SaaS companies, and have never in my life written a single unit test.
funfunfunction about 3 years ago
Infra as code plus good modern automation solves the parity issue. I empathize with wanting to stay lean, but this seems extreme.
sedatk about 3 years ago
Problem TL;DR:

"With staging:

- There could be differences from production

- Multiple people can't test at the same time

- Devs don't test their code."

Solution TL;DR: "Test your code, and push to production."

They completely misunderstood the problem, and their solution literally changed nothing other than making devs test their code now. Staging could stay as-is and would provide significant risk mitigation with zero additional effort.

"Whenever we deploy changes, we monitor the situation continuously until we are certain there are no issues."

I'm sure customers would stay on the site, monitoring the situation too. Good luck with that strategy.
hutrdvnj about 3 years ago
It's about risk acceptance. What could go wrong without a staging environment, seriously?
chrisshroba about 3 years ago
Just wondering, what does this phrase mean?

> If we ever have an issue in production, we always roll forward.
kingcharles about 3 years ago
Some places don't even have dev. It's all on production.

"Fuck it, we'll do it live!"
chagaif about 3 years ago
At our company we each have a few servers, so we can run tests on those servers ourselves.
hpen about 3 years ago
Well, of course you can ship faster. But that's not the point of a staging environment!
epolanski about 3 years ago
> If we ever have an issue in production, we always roll forward.

What does it mean to roll forward?
victor_e about 3 years ago
This can pass for a startup, but not for a proper business with customers.

Do you get audited?
drcongo about 3 years ago
I don't recognise any of those "problems" with staging.
iLoveOncall about 3 years ago
Well, I guess Squeaky.ai goes on the list of companies never to use.
meow_mix about 3 years ago
It's called Squeaky because it's always breaking.
cosmiccatnap about 3 years ago
This is currently how my job works, and it's hell.
joshxyz about 3 years ago
You know what they say: test in prod, haha.
nvader about 3 years ago
Published April 1st. Ooh, nice try.
pkrumins about 3 years ago
Staging is bloat. You want to get to production ASAP. Always deploy to production only.
_ZeD_ about 3 years ago
(yet)
okamiueru about 3 years ago
My experience with their list of suppositions:

> Pre-live environments are never at parity with production

My experience is that it is fairly trivial to have feature parity with production. Whatever you do for production, just do it again for staging. That's what staging is meant to be.

> Most companies are not prepared to pay for a staging environment identical to production

Au contraire. All the companies I've been at are more than willing to pay this. And secondly, it is pennies compared to production environment costs, because staging isn't expected to handle any significant load. And yes, the article does mention being able to handle load as one of the things that differ; I have not yet found the need to use changes to staging to verify load-scaling capabilities.

> There's always a queue

I don't understand this paragraph at all. It seems like an artificial problem created by how they handle repository changes, and it has little to do with the purpose of a staging environment. It smells fishy to have local changes rely on a staging environment. The infrastructure I set up had a development environment spun up and used for a development testing pipeline. It doesn't, and shouldn't, need to rely on staging.

> Releases are too large

Well... one of the main benefits of having a staging environment is to safely do frequent small deployments. So this just seems like exactly the wrong conclusion.

> Poor ownership of changes

This, again, is not at all how I understand code should be shipped to a staging environment. "I've seen people merge, and then forget that their changes are on staging." What does this even mean? Surely staging is only ever deployed to from the latest release branch, which also surely comes from main/master? The follow-up, "and now there are multiple sets of changes waiting to be released", also suggests some fundamental misunderstanding. *Releases* are what are meant to end up in staging. A set of multiple changes should be *a* release.

> People mistakenly let process replace accountability

> "By utilising a pre-production environment, you're creating a situation where developers often merge code and 'throw it over the fence'"

Again: a staging environment isn't a place where you dump your shit. Staging is a place where releases are verified in an as-much-as-possible-the-same-environment-as-production. So, again, this seems like entirely missing the point.

----

It seems to me that they don't use a staging environment because they don't understand what such a thing should be used for. I'd be completely OK with someone rationalizing this as "too much of a hassle". But to try and justify it so poorly...

From their conclusion:

> Dropping your staging environment in favour of true continuous integration and deployment can create a different mindset for shipping software. When there is no buffer for changes before they go live, you need to be confident that your changes are fit for production. You also need to be alert and take full ownership of any changes you make.

Well... of course there is a shift in mindset when you'll be shitting your pants every time you make a change in production, since that's when you'll get to see if you broke something. The whole point of a staging environment is to have a buffer... so that you don't have to be "confident". So that you don't have to be on high alert, because you have alerts that can trigger without anything important going offline. So that ownership isn't crucial in a post-fuck-up blame game.
rio517 about 3 years ago
I struggle with a lot of the arguments made here. One key thing is that staging can mean different things. In the author's case, they say you "can't merge your code because someone else is testing code on staging." It is important to differentiate between that type of staging, used for testing development branches, and a staging to which only code that is already merged and ready for deployment is automatically deployed.

Many of the problems are organizational/infrastructure challenges, not inherent to staging environments/setups. Straightening out dev processes and investing in infrastructure solves most of the challenges discussed.

Their points:

What's wrong with staging environments?

* "Pre-live environments are never at parity with production" - resolved with proper investment in infrastructure.

* "There's always a queue [for staging]" - is staging the only place to test pre-production code? If you need a place to test code that isn't in master, consider investing in disposable staging environments or better infrastructure, so your team has more confidence in what they merge.

* "Releases are too large" - reducing queues reduces deployment times. Manage releases so they're smaller.

* "Poor ownership of changes" - of course this happens with all that queued code. Address the earlier challenges and this will be massively mitigated. Once there, it's a good manager's job to ensure it doesn't happen.

* "People mistakenly let process replace accountability" - this is a management problem.

Solving some of the above challenges with the right investments creates a virtuous cycle of improvements.

How we ship changes at Squeaky:

* "We only merge code that is ready to go live" - this is quite arbitrary. How do you define and ensure it?

* "We have a flat branching strategy" - great. It then surprises me that they had so much queued code and such large releases. I find it surprising that they say "we always roll forward"; I wonder how this impacts their recovery time.

* "High risk features are always feature flagged" - do low-risk features never cause problems?

* "Hands-on deployments" - I'm not sure this is good practice. How much focus does it take away from your team? Would a hands-off deployment work better, with high confidence pre-deploy, automated deployment, and automated monitoring and alerting, while ensuring the team is available to respond and recover quickly?

* "Allows a subset of users to receive traffic from the new services while we validate" is fantastic. I'm surprised they don't break this out into its own thing.
pmoriarty about 3 years ago
This sounds horrible unless they have a super reliable way to roll back changes to a consistent working state, both in their deployments and their databases.
NorwegianDude about 3 years ago
Staging, tests, previews, and even running code locally are for people who make mistakes. They're dumb and a total waste of time if you don't make any mistakes.

No testing at all: that's what I call optimizing for success!

On a more serious note: sometimes staging is the same as local, and in those situations there is very limited use for staging.
kafrofrite about 3 years ago
- I don't always test my code, but when I do, it's in production.

- Everyone has a testing environment. Some people are lucky enough to have a separate one for running production.

[INSERT ADDITIONAL JOKES HERE]
mdoms about 3 years ago
> "We only merge code that is ready to go live"

I like to go even further: I advocate only merging code that won't break anything. If you're feature-flagging as many changes as possible, then you can merge code that doesn't even work, as long as you can gate users away from it using feature flags. The sooner and more often you can (safely) integrate unfinished code into master, the better.
teen about 3 years ago
Imagine writing this entire blog post and being completely wrong about every topic you discuss. This is the most amateur content I've seen make it to the front page, let alone the top post.