I don't understand why people think this is a great response. They know how their routing works; just say so. It can't be that hard to give a basic overview before they release a more comprehensive post.<p>As for the line "Improving our documentation and website to accurately reflect our product": that is a very roundabout way of saying "our website indicates our service does things that it does not," which is a VERY bad thing. People are paying for this service based on what Heroku claims it does.<p>If the website has been inaccurate for years, that is false advertising, and a bigger problem than they are giving it credit for.<p>If anything, I am more disappointed now that I have read this response; it hasn't reassured me at all.
That's actually a pretty impressive response as far as it goes. Obviously there are no details at this point, but he absolutely takes responsibility, doesn't try to deflect or sugarcoat it, and manages to find a tone that is both professional/serious and down-to-earth and earnest. I guess the real impact will be how they go about "making it right," but as a first response to the situation the tone is near perfect.
It's a good response in that they <i>are</i> taking responsibility, but it is pretty obvious that they are reluctant to say anything about a fix. In my mind, "it's hard" isn't a valid excuse in this case, especially when there are relatively straightforward solutions that would solve this at a practical level. For example, you could imagine a naive form of intelligent routing that works simply by keeping a counter per dyno (see the sketch below):<p>- a request comes in and gets routed to the dyno with the lowest count; increment the count.<p>- the response goes out; decrement the count.<p>Since they control the flow both in and out, this requires at most a sorted collection of counters and would solve the problem at a "practical" level. Is it still possible to end up with one request backed up behind another one or two? Sure. Is it likely? No. While this isn't as ideal as true intelligent routing, I think it's likely the best solution in a scenario where they have incomplete information about what a random process on a dyno can reliably handle (which is the case on the Cedar stack).<p>Alternatively, they could just add some configuration that allows you to set the request density per dyno, and then you <i>could</i> bring intelligent routing back. The couple of milliseconds that lookup/comparison would take is far better than the scenario they're in now.<p>EDIT: I realized my comment could be read as though I'm suggesting this naive solution is "easy". At scale it certainly isn't, but I do believe it's possible, and as <i>this</i> is their business, "it's hard" is not a valid reason to do what they're doing.
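To make that concrete, here's a minimal single-process sketch of the counter idea in Python (my own toy, not Heroku's actual mesh; all names are made up, and the distributed version is where the real work lives):<p>

    import heapq

    class LeastBusyRouter:
        """Naive least-connections routing: one in-flight counter per dyno."""

        def __init__(self, dyno_ids):
            self.counts = {d: 0 for d in dyno_ids}   # current in-flight requests
            self.heap = [(0, d) for d in dyno_ids]   # (count, dyno), lazily updated
            heapq.heapify(self.heap)

        def dispatch(self):
            # Pop until we find an entry whose count is still current
            # (lazy deletion keeps every operation O(log n)).
            while True:
                count, dyno = heapq.heappop(self.heap)
                if count == self.counts[dyno]:
                    break
            self.counts[dyno] += 1                   # request comes in: increment
            heapq.heappush(self.heap, (self.counts[dyno], dyno))
            return dyno

        def complete(self, dyno):
            self.counts[dyno] -= 1                   # response goes out: decrement
            heapq.heappush(self.heap, (self.counts[dyno], dyno))

    router = LeastBusyRouter(["web.1", "web.2", "web.3"])
    d = router.dispatch()   # routed to the least-loaded dyno
    router.complete(d)      # request finished

Doing this across a fleet of routing nodes without a shared-state bottleneck is the hard part, but that's exactly the business they're in.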
There is a perverse conflict of interest with platform service providers: the worse your scheduler performs, the more profitable your service is.<p>You replace intelligent request scheduling with more hardware and instances, which you charge the user for.<p>How much investment is there among platform service providers in developing better schedulers that would reduce the number of instances required to serve an application? The answer, in this case, is "not a lot".<p>The incentives of provider and user are not aligned, which is why I am more inclined to buy and manage one layer lower, with virtual machines.<p>Edit: AppEngine went through a similar issue. Here is an interesting response from an engineer on their team:<p><a href="https://groups.google.com/forum/#!msg/google-appengine/y-LnZ2WYJ5Q/j_w13F4oSSkJ" rel="nofollow">https://groups.google.com/forum/#!msg/google-appengine/y-LnZ...</a>
Wow, I feel like Heroku is really dropping the ball here. Like, they are acting punch-drunk or something. Basically all this says is "we hear you and we are sorry". They could have posted that a day ago. It still says nothing about what is wrong or what they are doing to fix it.<p>Also, I'm not sure exactly where the line is, but somewhere around $3-5k a month (100+ dynos) you really should rethink using Heroku. At that point and higher, you really ought to know enough about your infrastructure to optimize for scale. The "just add more dynos" approach is stupid because adding more web frontends is often the lazy/expensive approach. Add a few queues or some smarter caching and you'll need fewer web servers; throw in something like Varnish where you can and you need even fewer. Point being, at some point scaling is no longer "free": it takes work, and Heroku isn't magic.
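To put rough numbers on the caching point (every figure here is a made-up illustration, not anyone's benchmark):<p>

    import math

    peak_rps = 1000.0   # assumed peak traffic
    dyno_rps = 10.0     # assumed per-dyno dynamic capacity

    for hit_rate in (0.0, 0.5, 0.8, 0.9):
        origin_rps = peak_rps * (1 - hit_rate)   # cache hits never reach a dyno
        dynos = math.ceil(origin_rps / dyno_rps)
        print(f"cache hit rate {hit_rate:.0%}: ~{dynos} dynos")

Under those assumptions, an 80% hit rate in front of the app is the difference between 100 dynos and 20.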
This is a horribly inadequate response. Hardware prices have dropped 30% over the last 3 years, and Heroku is admitting their performance has degraded by many orders of magnitude. It's completely unacceptable to simply say, "yeah, there's a problem, we'll give you some metrics to understand it better."<p>Sure, it's great they responded. But the response should be "you're right, we are fixing it," plus credits for the revenue gained from fraudulent claims about the performance of their product and a credibility-straining bait-and-switch.
Most people are going to come here and complain that Heroku isn't planning to fix the problem.<p>Put it in context: Heroku made this change 3 years ago and has had no trouble admitting it to users. Their documentation lagged far behind, and I believe they will be more transparent in the future. This is an engineering decision they made a long time ago that happened to get a lot of PR in the past 24 hours. Until there is a business reason (losing customers), I don't see them "fixing" the problem.
What the hell? It's good he owned up... I guess. But the response basically sounds like "yeah, we've been charging the same prices over the last few years for increasingly degraded performance, and we would have continued to do so, but someone finally caught on, so now I guess we have to do something about it."
I think this is really a fine response, considering the pretty terrible way the original post was written and the way the community responded. The simulation was a bit of a stretch because the supposed number of servers you need to achieve "equivalent" performance is highly dependent on how slow your worst-case performance is; if your worst case isn't that bad, the numbers look a lot better. I don't remember the precise math, but back when I studied random processes we covered this problem, and the conclusion was that randomly routing requests is generally not that much worse than doing the intelligent thing, and doing the intelligent thing is nowhere near as trivial as RapGenius and random HN posters would have you believe. Given generally well-behaved requests, the random solution should be maybe 2-3x worse, but nothing near 50x worse.<p>And besides, I really don't see why someone who needs that many dynos is still on Heroku.
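If you want to convince yourself, here's a toy simulation (my own sketch with made-up parameters; the point is the relative gap between the two policies, not the absolute numbers):<p>

    import random

    def simulate(route, dynos=50, n_requests=200_000, seed=1):
        """Toy model: each dyno is a FIFO queue serving one request at a time."""
        rng = random.Random(seed)
        free_at = [0.0] * dynos                 # when each dyno next goes idle
        t = total_wait = worst_wait = 0.0
        for _ in range(n_requests):
            t += rng.expovariate(20.0 * dynos)  # ~80% utilization at ~0.04s mean service
            # Mostly fast requests with an occasional slow one (heavy tail).
            service = 0.02 if rng.random() < 0.99 else 2.0
            d = route(rng, free_at)
            wait = max(0.0, free_at[d] - t)
            free_at[d] = max(free_at[d], t) + service
            total_wait += wait
            worst_wait = max(worst_wait, wait)
        return total_wait / n_requests, worst_wait

    naive = lambda rng, free_at: rng.randrange(len(free_at))   # random routing
    smart = lambda rng, free_at: free_at.index(min(free_at))   # least work left

    print("random    :", simulate(naive))
    print("least-busy:", simulate(smart))

With well-behaved (near-uniform) service times the gap between the two policies mostly disappears; it's the heavy tail that makes random routing hurt.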
What I find incredibly irritating about this blog response by Heroku is that it took a very visible post on Hacker News for them to act and reconsider their way of doing business.<p>They saw the potential loss of customers, and then they acted. What this means is that before this news broke, they never intended to provide the best support and product they could for their customers.<p>Sad.
Credit to them for owning the scope of the problem (allowing serious discrepancies to persist for 3 years), which is sure to cost them trust in the community. But the skeptic in me notes that there was likely no way out of admitting it.<p>What disheartens me is that the documentation discrepancy caused real, extremely substantial aggregate monetary impact on customers, yet there is no mention of refunds. Perhaps that will come, but in my opinion anything short of that is just damage control.<p>This is a time to demonstrate integrity to excess, for them to go above and beyond. It's in their interest not to just paper over the whole thing.
It's so refreshing to see this kind of communication. I don't use Heroku, and don't know much about this specific issue, but their responses to downtime and complaints have been so direct and BS-free that I'll definitely consider them when I need a PaaS.
I feel like there is an answer for this, but why are two companies in the "YC family" at odds so publicly? If RapGenius is "starting beef" as is done in the music industry, I find it odd that it would happen with someone on their own "label".<p>Perhaps this is ignorance on my part about how companies that have already been sold (Heroku) fit into the picture, but some explanation would be appreciated.
Well, let's put it like this. Those of us who know our programming shit and aren't afraid of a little math know exactly what has been going on here, and this answer is pretty much BS (what else is he supposed to say? Basically he makes the minimal concessions the facts allow).
<i>Working closely with our customers to develop long-term solutions</i><p>Of the five action items they listed, it seems that only the last one is about actually solving the problem. I hope they are committed to it; better visibility into the problem can help, but I'd rather not have the problem in the first place.
It seems strange to me to read in Heroku's response how forthcoming they are in accepting blame and responsibility for the "degradation in performance over the past 3 years".<p>Yet the action plan they state to "fix" this issue is to update their DOCUMENTATION, with no mention of fixing the DEGRADATION itself.<p>Just bizarre.
I'm very curious to see what the technical review turns up tomorrow.<p>This feels like something connected to the Salesforce acquisition 3 years ago: making the service less efficient in order to hit profit or revenue targets on paid accounts, not to mention saving money on the free ones.<p>It would be a little like Tesla not only selling you the Model S but also selling you the electricity you charge the vehicle with. At some point they make the car less efficient, forcing you to charge more often, and then claim they didn't document this very well. Frankly, only so many people are capable enough electrical engineers (or, in Heroku's case, sysadmins) to catch the difference and measure it.<p>The apology should be: "We misled you and betrayed your trust. Here's how we're planning to resolve that and to rebuild our relationship with our customers over the next year. [Insert specific, sweeping measures...]"
Seems to me like a classy response to a real problem from Heroku.<p>We all need to remember that there are no magic bullets. The fact that Heroku can get a startup to, say, 5M uniques per day by dragging some sliders on a web panel and running up a bill on a corporate AMEX is pretty impressive.<p>At some point scaling a web business becomes a core competency and one needs to deal with it. I'm guessing by the time scaling an app on Heroku becomes an issue, if better understanding your scaling needs and handling them directly isn't going to save you a TON of money, your business model is probably broken.
So do the issues in the RapGenius post only affect those on the Bamboo stack? I've been procrastinating on migrating to Cedar, but this could be a very good reason to do it.<p>Also, I really love seeing a company take responsibility like this. I know the situations (and the stakes) are not comparable, but this is a lot better than what Musk did when Tesla got a bad review. As a company, just take the blame and say you can and will fix it; that's good enough for most people.
Honest question: why would RapGenius still be on Heroku if they needed 100 dynos? Why not go directly to AWS at that scale? The cost savings would be pretty significant. Am I missing something?
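Rough 2013 list-price math, from memory, so treat every number here as an assumption:<p>

    # Heroku dynos billed at roughly $0.05/dyno-hour (2013 list price, from memory).
    heroku = 0.05 * 100 * 730   # 100 dynos, ~730 hrs/month
    # Guessing ~10 large EC2 instances at ~$0.25/hr on-demand could carry
    # a similar load if you manage the whole stack yourself.
    ec2 = 0.25 * 10 * 730
    print(f"Heroku: ~${heroku:,.0f}/mo, EC2: ~${ec2:,.0f}/mo")   # ~$3,650 vs ~$1,825

The flip side is that you're now paying in ops time instead, which is presumably why people stay.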
Wait, so those guys were on Bamboo, and complaining?
Fuck, that is so not cool.<p>We've been on Cedar ever since it launched, and have been running Puma threads or Unicorn workers.
The idea of one dyno per request is bullshit, and I wasn't sure if they were on Cedar or not. A dyno is an allocated resource (512MB, not counting the DB, K/V store, etc.).<p>How ballsy of them to complain when they are doing it wrong.