I don't understand why people think this is a great response. They know how their routing works; just say so. It can't be that hard to give a basic overview before they release a more comprehensive post.<p>As for the line "Improving our documentation and website to accurately reflect our product": that is a very roundabout way of saying "our website indicates our service does things that it does not," which is a VERY bad thing. People are paying for this service based on what Heroku claims it does.<p>If the website has been inaccurate for years, that is false advertising, and a bigger problem than they are giving it credit for.<p>If anything, I am more disappointed now that I have read this response; it hasn't reassured me at all.
That's actually a pretty impressive response as far as it goes. Obviously there are no details at this point, but he absolutely takes responsibility, doesn't try to deflect or sugarcoat it, and manages to find a tone that is both professional/serious and down-to-earth and earnest. I guess the real impact will be how they go about "making it right," but as a first response to the situation the tone is near perfect.
It's a good response in that they <i>are</i> taking responsibility, but it is pretty obvious that they are reluctant to say anything about a fix. In my mind, "it's hard" isn't a valid excuse in this case, especially when there are relatively straightforward solutions that would solve this at a practical level. For example, you could imagine a naive form of intelligent routing that works simply by keeping a counter per dyno (see the sketch below):<p>- a request comes in and gets routed to the dyno with the lowest count; increment the count.<p>- the response goes out; decrement the count.<p>Since they control the flow both in and out, this requires at most a sorted collection of counters and would solve the problem at a "practical" level. Is it still possible to end up with one request backed up behind another one or two? Sure. Is it likely? No. While this isn't as ideal as true intelligent routing, I think it's likely the best solution in a scenario where they have incomplete information about what a random process on a dyno can reliably handle (which is the case on the Cedar stack).<p>Alternatively, they could just add some configuration that allows you to set the request density per dyno, and then you <i>could</i> bring intelligent routing back. The couple of milliseconds that lookup/comparison would take is far better than the scenario they're in now.<p>EDIT: I realized my comment could be read as though I'm suggesting this naive solution is "easy". At scale it certainly isn't, but I do believe it's possible, and as <i>this</i> is their business, "it's hard" is not a valid reason to do what they're doing.
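To make that concrete, here's a minimal single-process sketch of the counter idea in Python (my own toy, not Heroku's actual mesh; all names are made up, and the distributed version is where the real work lives):<p>

    import heapq

    class LeastBusyRouter:
        """Naive least-connections routing: one in-flight counter per dyno."""

        def __init__(self, dyno_ids):
            self.counts = {d: 0 for d in dyno_ids}   # current in-flight requests
            self.heap = [(0, d) for d in dyno_ids]   # (count, dyno), lazily updated
            heapq.heapify(self.heap)

        def dispatch(self):
            # Pop until we find an entry whose count is still current
            # (lazy deletion keeps every operation O(log n)).
            while True:
                count, dyno = heapq.heappop(self.heap)
                if count == self.counts[dyno]:
                    break
            self.counts[dyno] += 1                   # request comes in: increment
            heapq.heappush(self.heap, (self.counts[dyno], dyno))
            return dyno

        def complete(self, dyno):
            self.counts[dyno] -= 1                   # response goes out: decrement
            heapq.heappush(self.heap, (self.counts[dyno], dyno))

    router = LeastBusyRouter(["web.1", "web.2", "web.3"])
    d = router.dispatch()   # routed to the least-loaded dyno
    router.complete(d)      # request finished

Doing this across a fleet of routing nodes without a shared-state bottleneck is the hard part, but that's exactly the business they're in.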
There is a perverse conflict of interest with platform service providers: the worse your scheduler performs, the more profitable your service is.<p>You replace intelligent request scheduling with more hardware and instances, which you charge the user for.<p>How much investment is there among platform service providers in developing better schedulers that would reduce the number of instances required to serve an application? The answer, in this case, is "not a lot".<p>The incentives of provider and user are not aligned, which is why I am more inclined to buy and manage one layer lower, with virtual machines.<p>Edit: AppEngine went through a similar issue. Here is an interesting response from an engineer on their team:<p><a href="https://groups.google.com/forum/#!msg/google-appengine/y-LnZ2WYJ5Q/j_w13F4oSSkJ" rel="nofollow">https://groups.google.com/forum/#!msg/google-appengine/y-LnZ...</a>
Wow, I feel like Heroku is really dropping the ball here. Like, they are acting punch-drunk or something. Basically all this says is "we hear you and we are sorry". They could have posted that a day ago. It still says nothing about what is wrong or what they are doing to fix it.<p>Also, I'm not sure exactly where the line is, but somewhere around $3-5k a month (100+ dynos) you really should rethink using Heroku. At that point and higher, you really ought to know enough about your infrastructure to optimize for scale. The "just add more dynos" approach is stupid because adding more web frontends is often the lazy/expensive approach. Add a few queues or some smarter caching and you'll need fewer web servers; throw in something like Varnish where you can and you need even fewer. Point being, at some point scaling is no longer "free": it takes work, and Heroku isn't magic.
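To put rough numbers on the caching point (every figure here is a made-up illustration, not anyone's benchmark):<p>

    import math

    peak_rps = 1000.0   # assumed peak traffic
    dyno_rps = 10.0     # assumed per-dyno dynamic capacity

    for hit_rate in (0.0, 0.5, 0.8, 0.9):
        origin_rps = peak_rps * (1 - hit_rate)   # cache hits never reach a dyno
        dynos = math.ceil(origin_rps / dyno_rps)
        print(f"cache hit rate {hit_rate:.0%}: ~{dynos} dynos")

Under those assumptions, an 80% hit rate in front of the app is the difference between 100 dynos and 20.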
This is a horribly inadequate response. Hardware prices have dropped 30% over the last 3 years, and Heroku is admitting their performance has degraded by many orders of magnitude. It's completely unacceptable to simply say, "yeah, there's a problem, we'll give you some metrics to understand it better."<p>Sure, it's great they responded. But the response should be "you're right, we are fixing it," plus credits for the revenue gained from fraudulent claims about the performance of their product and a credibility-straining bait-and-switch.
Most people are going to come here and complain that Heroku isn't planning to fix the problem.<p>Put it in context: Heroku made this change 3 years ago and has had no trouble admitting it to users. Their documentation lagged far behind, and I believe they will be more transparent in the future. This is an engineering decision they made a long time ago that happened to get a lot of PR in the past 24 hours. Until there is a business reason (losing customers), I don't see them "fixing" the problem.
What the hell? It's good he owned up... I guess. But the response basically sounds like "yeah, we've been charging the same prices over the last few years for increasingly degraded performance, and we would have continued to do so, but someone finally caught on, so now I guess we have to do something about it."
I think this is really a fine response, considering the pretty terrible way the original post was written and the way the community responded. The simulation was a bit of a stretch because the supposed number of servers you need to achieve "equivalent" performance is highly dependent on how slow your worst-case performance is; if your worst case isn't that bad, the numbers look a lot better. I don't remember the precise math, but back when I studied random processes we covered this problem, and the conclusion was that randomly routing requests is generally not that much worse than doing the intelligent thing, and doing the intelligent thing is nowhere near as trivial as RapGenius and random HN posters would have you believe. Given generally well-behaved requests, the random solution should be maybe 2-3x worse, but nothing near 50x worse.<p>And besides, I really don't see why someone who needs that many dynos is still on Heroku.
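If you want to convince yourself, here's a toy simulation (my own sketch with made-up parameters; the point is the relative gap between the two policies, not the absolute numbers):<p>

    import random

    def simulate(route, dynos=50, n_requests=200_000, seed=1):
        """Toy model: each dyno is a FIFO queue serving one request at a time."""
        rng = random.Random(seed)
        free_at = [0.0] * dynos                 # when each dyno next goes idle
        t = total_wait = worst_wait = 0.0
        for _ in range(n_requests):
            t += rng.expovariate(20.0 * dynos)  # ~80% utilization at ~0.04s mean service
            # Mostly fast requests with an occasional slow one (heavy tail).
            service = 0.02 if rng.random() < 0.99 else 2.0
            d = route(rng, free_at)
            wait = max(0.0, free_at[d] - t)
            free_at[d] = max(free_at[d], t) + service
            total_wait += wait
            worst_wait = max(worst_wait, wait)
        return total_wait / n_requests, worst_wait

    naive = lambda rng, free_at: rng.randrange(len(free_at))   # random routing
    smart = lambda rng, free_at: free_at.index(min(free_at))   # least work left

    print("random    :", simulate(naive))
    print("least-busy:", simulate(smart))

With well-behaved (near-uniform) service times the gap between the two policies mostly disappears; it's the heavy tail that makes random routing hurt.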
What I find incredibly irritating about this blog response by Heroku is that it took a very visible post on Hacker News for them to act and reconsider their way of doing business.<p>They saw the potential loss of customers, and then they acted. What this means is that before this news broke, they never intended to provide the best support and product they could for their customers.<p>Sad.
Credit to them for owning the scope of the problem (allowing serious discrepancies to persist for 3 years), which is sure to cost them trust in the community. But the skeptic in me notes that there was likely no way out of admitting it.<p>What disheartens me is that the documentation discrepancy caused real, extremely substantial aggregate monetary impact on customers, yet there is no mention of refunds. Perhaps that will come, but in my opinion anything short of that is just damage control.<p>This is a time to demonstrate integrity to excess, for them to go above and beyond. It's in their interest not to just paper over the whole thing.
It's so refreshing to see this kind of communication. I don't use Heroku, and don't know much about this specific issue, but their responses to downtime and complaints have been so direct and BS-free that I'll definitely consider them when I need a PaaS.
I feel like there is an answer for this, but why are two companies in the "YC family" at odds so publicly? If RapGenius is "starting beef" as is done in the music industry, I find it odd that it would happen with someone on their own "label".<p>Perhaps this is ignorance on my part about how companies that have already been sold (Heroku) fit into the picture, but some explanation would be appreciated.
Well, let's put it like this. Those of us who know our programming shit and aren't afraid of a little math know exactly what has been going on here, and this answer is pretty much BS (what else is he supposed to say? Basically he makes the minimal concessions the facts allow).
<i>Working closely with our customers to develop long-term solutions</i><p>Of the five action items they listed, it seems that only the last one is about actually solving the problem. I hope they are committed to it; better visibility into the problem can help, but I'd rather not have the problem in the first place.
It seems strange to me to read in Heroku's response how forthcoming they are in accepting blame and responsibility for the "degradation in performance over the past 3 years".<p>Yet the action plan they state to "fix" this issue is to update their DOCUMENTATION, with no mention of fixing the DEGRADATION itself.<p>Just bizarre.
I'm very curious to see what the technical review turns up tomorrow.<p>This feels like something connected to the Salesforce acquisition 3 years ago: making the service less efficient in order to hit profit or revenue targets on paid accounts, not to mention saving money on the free ones.<p>It would be a little like Tesla not only selling you the Model S but also selling you the electricity you charge the vehicle with. At some point they make the car less efficient, forcing you to charge more often, and then claim they didn't document this very well. Frankly, only so many people are capable enough electrical engineers (or, in Heroku's case, sysadmins) to catch the difference and measure it.<p>The apology should be: "We misled you and betrayed your trust. Here's how we're planning to resolve that and to rebuild our relationship with our customers over the next year. [Insert specific, sweeping measures...]"
Seems to me like a classy response to a real problem from Heroku.<p>We all need to remember that there are no magic bullets. The fact that Heroku can get a startup to, say, 5M uniques per day by dragging some sliders on a web panel and running up a bill on a corporate AMEX is pretty impressive.<p>At some point scaling a web business becomes a core competency and one needs to deal with it. I'm guessing by the time scaling an app on Heroku becomes an issue, if better understanding your scaling needs and handling them directly isn't going to save you a TON of money, your business model is probably broken.
So do the issues in the RapGenius post only affect those on the Bamboo stack? I've been procrastinating on migrating to Cedar, but this could be a very good reason to do it.<p>Also, I really love seeing a company take responsibility like this. I know the situations (and the stakes) are not comparable, but this is a lot better than what Musk did when Tesla got a bad review. As a company, just take the blame and say you can and will fix it; that's good enough for most people.
Honest question: why would RapGenius still be on Heroku if they needed 100 dynos? Why not go directly to AWS at that scale? The cost savings would be pretty significant. Am I missing something?
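Rough 2013 list-price math, from memory, so treat every number here as an assumption:<p>

    # Heroku dynos billed at roughly $0.05/dyno-hour (2013 list price, from memory).
    heroku = 0.05 * 100 * 730   # 100 dynos, ~730 hrs/month
    # Guessing ~10 large EC2 instances at ~$0.25/hr on-demand could carry
    # a similar load if you manage the whole stack yourself.
    ec2 = 0.25 * 10 * 730
    print(f"Heroku: ~${heroku:,.0f}/mo, EC2: ~${ec2:,.0f}/mo")   # ~$3,650 vs ~$1,825

The flip side is that you're now paying in ops time instead, which is presumably why people stay.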
Wait, so those guys were on Bamboo, and complaining?
Fuck, that is so not cool.<p>We've been on Cedar ever since it launched, and have been running Puma threads or Unicorn workers.
The idea of one dyno per request is bullshit, and I wasn't sure if they were on Cedar or not. A dyno is an allocated resource (512MB, not counting the DB, K/V store, etc.).<p>How ballsy of them to complain when they are doing it wrong.