Money Trees – Rap Genius Response to Heroku

283 points by tomlemon over 12 years ago

23 comments

> This works for future customers, since once Heroku makes these documentation changes, everyone who signs up will understand exactly how routing works. But it does nothing to address the time and money that existing customers have spent over the past few years. What does Heroku owe them?

Lawyers -- well, judges really -- are good at coming up with answers for this exact sort of question.

I am not being facetious. There are legal rules for assessing losses in even very complex, very entangled situations. If you feel Heroku has dudded you, find a torts lawyer.

Heck, Salesforce.com have deep pockets. Round up a few other $20k/month customers and start a class action.

Web companies need to realise that boring old-fashioned rules like "your claims should not be misleading" apply to them too.

(IANAL, TINLA)

tomlemon over 12 years ago

If you use Heroku and New Relic, make sure you install the gem we wrote to make New Relic report correct queue times: https://github.com/RapGenius/heroku-true-relic

homosaur over 12 years ago

Man, there is a LOT of expertise over there at Rap Genius just to have a website where you can figure out what "hollatickin" means.

gojomo over 12 years ago

My guess as an armchair observer (and tiny-scale Heroku user) would be that Heroku will offer some affected customers refunds, especially if those customers "threw dynos" at latency problems that were aggravated by the drift in Bamboo routing behavior and hidden by the misleading NewRelic monitoring.

I don't think Adam@Heroku's response on the 11th is that bad. He accepts the feedback and also wants Heroku to help RapGenius 'modernize their stack'. That's not a full and proper solution, nor a remedy for the lost cost/effort so far, but it would have offered a lot of performance and cost relief.

In fact, I think that's why this problem festered: many customers managed to soften the pain by going to Cedar, multiple-workers, app-optimizations, and more dynos... so deeper investigations kept getting backburnered, both inside and outside Heroku, until now.

RapGenius has done us a mitzvah by finally digging deeper, but I'm still eager to see what Heroku thinks the right remedies are, beyond RapGenius's 'must do' ultimatums.

jlouis over 12 years ago

There are still some important points missing from the discussion:

1. Operating at scale with parallel routing.
2. Handling faults while operating at scale with parallel routing.
3. Providing correct statistical models for the situation. The one we have right now is a crude approximation.
4. Measuring on the real system for problems.

The optimum routing is to have each dyno with 0 or 1 job at a time and a global queue of all incoming requests. But this is then a latency problem, since it takes time for a dyno to report that it is "ready". The net result is very bad performance, and the global queue is a single point of failure. The solution is to queue because this removes the latency -- but at the price you see RG paying if a dyno can only serve one request at a time.

If a dyno does not report "ready" to the routing mesh, then you can't route optimally:

Queue length doesn't work, since a queue of length 1 may hold a request that takes 7000ms. Another queue with length 5 consisting of five 70ms requests is better to route to.

The time the last request spent in queue is not useful either, because the very next message may be a 7000ms one.

So to solve this problem, you must do something else. You cannot use "intelligent routing" unless you can describe how it will work distributed with, say, 8 routing machines while avoiding latency. And while you are at it, you had better measure your solution in a real-world scenario.
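
A toy illustration of the queue-length point above, sketched in Python. The dyno names and the helper functions are hypothetical, and the sketch assumes the router somehow knows each queued request's service time, which a real router does not know in advance. That gap is part of why the problem is hard.

```python
# Each queue is a list of pending request service times in milliseconds.
# Assumption: service times are known up front (not true for a real router).
queues = {
    "dyno_a": [7000],    # one slow request
    "dyno_b": [70] * 5,  # five fast requests
}


def pick_by_length(queues):
    """Route to the dyno with the fewest queued requests."""
    return min(queues, key=lambda name: len(queues[name]))


def pick_by_pending_work(queues):
    """Route to the dyno with the least total queued work."""
    return min(queues, key=lambda name: sum(queues[name]))


def wait_ms(queues, name):
    """Time a new request would wait behind the work already queued."""
    return sum(queues[name])


if __name__ == "__main__":
    by_length = pick_by_length(queues)
    by_work = pick_by_pending_work(queues)
    print(by_length, wait_ms(queues, by_length))
    print(by_work, wait_ms(queues, by_work))
```

Shortest-queue routing picks the dyno holding the single slow request, while the longer queue of fast requests would drain sooner.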

goronbjorn over 12 years ago

This incident has done wonders for RapGenius's technical brand. I don't know how many people would've identified them as a 'tech company' before, but that number has surely gone up.

pseut over 12 years ago

Guys, you've made a lot more money than me, so you don't need my advice. But if you want money back, you should probably be communicating in private through your lawyers. Posts like this look like you're trying to get (more) attention.

socialist_coder over 12 years ago

Heroku's suggestion: "modernize and optimize your web stack."

I don't have any experience with Ruby web stacks so I'm curious if this is actually an option for you guys? What would it take to do that? Would the performance increase on Heroku be worth it?

It also seems like if you wanted to self host you would probably need to do those same improvements, right?

Please don't take my comment the wrong way, I'm not trying to say Heroku is somehow excused from their mistakes here. I'm just trying to understand that suggestion from Heroku.

instakill over 12 years ago

I've lost a lot of faith in Heroku this last week. Going to be doing a lot of investigating Cloud66/Elastic Beanstalk + EC2 for my Rails app. Good excuse to up my sysadmin abilities a bit.

bradleyjg over 12 years ago

Why does Adam Wiggins repeatedly use the word 'evolve' as a transitive verb in an awkward fashion? Is this some sort of start-up usage that I managed to avoid thus far?

"We're working on evolving away from the global backlog concept in order to provide better support for different concurrency models, and the docs are no longer accurate."

"Getting user perspective is very helpful and I'll apply your feedback as we continue to evolve our product."

"You're correct that we've made some product decisions over the past few years that have evolved our HTTP routing layer away from the 'intelligent routing' approach that we used in 2009."

Evolve to me connotes natural selection -- which is rather more haphazard than I would hope for from an engineering process.

tibbon over 12 years ago

Maybe this is offtopic, but I really don't like the way Rap Genius does links. It makes it so I essentially have to click on each link twice to get to what it actually goes to...

zmitri over 12 years ago

I'm sorry, but I don't understand any of this hating on Rap Genius.

There's a reason they are the fastest growing YC company ever, and got a16z in for 15M -- because they are straight killers. They have quietly created an internet empire until this point, and are building something that people love and use every day.

A lot of folks wouldn't have the chutzpah to call out Heroku like that, or are just too small to attract this kind of attention. To me it seems as though they are helping Ruby devs save money and time. 8 dynos vs 4 dynos is a hell of a big difference when you're starting out. Also, seems like something that would be pretty fun to do if you worked there.

friendstock over 12 years ago

Thank you so much for forcing Heroku to confront this issue!

We've been seeing strange delays and optimizing based on New Relic for a long time... and whenever we reported this to Heroku, they would not admit to an issue.

We ended up using threads (on cedar stack) to get more concurrency per dyno.

WillieBKevin over 12 years ago

Wonder how all of these people are feeling right now: http://success.heroku.com/

porker over 12 years ago

"Explain Now, as Rap Genius is widely known for its expertise in queuing theory" Is this true, or are they being sarcastic that if they could do it Heroku really should've?

erichocean over 12 years ago

Ironically, it's possible to get a huge gain over purely random load balancing by examining just two queues at random -- essentially, you should always be doing this since the cost is O(1) and the improvement is large.[0] This doesn't require any distributed locking and at least would qualify as "intelligent" routing -- probably the bare minimum needed to justify that marketing label.

Oh, and it also scales incredibly well. Like I said, there's no reason not to use it over purely random load balancing.

[0] http://www.eecs.harvard.edu/~michaelm/postscripts/mythesis.pdf
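
A minimal Python sketch of the "two random choices" idea described above, compared against purely random assignment. It is a simplification under stated assumptions: requests are only ever added and never complete (a balls-into-bins model), queue lengths are visible to the router at no cost, and the function names, dyno count, and seed are all hypothetical choices made for illustration.

```python
import random


def assign_random(num_dynos, num_requests, rng):
    """Send each request to one dyno chosen uniformly at random."""
    queues = [0] * num_dynos
    for _ in range(num_requests):
        queues[rng.randrange(num_dynos)] += 1
    return queues


def assign_two_choices(num_dynos, num_requests, rng):
    """Sample two dynos at random and send the request to the shorter queue."""
    queues = [0] * num_dynos
    for _ in range(num_requests):
        a = rng.randrange(num_dynos)
        b = rng.randrange(num_dynos)
        target = a if queues[a] <= queues[b] else b
        queues[target] += 1
    return queues


if __name__ == "__main__":
    n = 10_000  # assumed: as many requests as dynos
    purely_random = assign_random(n, n, random.Random(0))
    two_choices = assign_two_choices(n, n, random.Random(0))
    print("longest queue, purely random:", max(purely_random))
    print("longest queue, two choices:  ", max(two_choices))
```

Each routing decision inspects only two queues, so the per-request cost stays constant regardless of the number of dynos.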

bshanks over 12 years ago

Heroku should have done something about the issue earlier, but it seems like the problem was just poor prioritization/time management on their end. Yes, these posts got them to finally get moving, but I wonder if perhaps RapGenius could have had the same effect by continuing to bug them privately in the same unyielding manner, instead of going public with it so quickly. That would have allowed Heroku to focus their energy on fixing the problem, rather than on worrying about PR and class action lawsuits.

Also, on the topic of lawsuits, how many small startups will go out of business if they get hit with a class action lawsuit every time their documentation accidentally diverges from reality? In this case, RapGenius is small and Salesforce is big, but the legal system will apply the same standard when the plaintiff is big and the defendant is poor. If this becomes precedent, then soon we will have lawyers trying to treat any public post by company employees as 'documentation', forcing startups to have a policy of not allowing their employees to freely help others with their product in public forums. Also, any small startup with a large competitor will have the large competitor paying people to sign up for the product with the sole intent of finding a bug in the documentation so that the small startup can be sued out of business.

STRML over 12 years ago

I agree that Heroku's response is pretty unbelievable and their engineering choices very suspect. Reading the email chain between Tom & Adam really drives home how badly this has been handled by Heroku.

Heroku is massively crippling its own product with random routing. Other cloud providers have been able to get this right, and Heroku very obviously knows what kind of applications are running on its server (e.g. deploy a Rails application, Heroku says "Rails" in the console). It would not be difficult to apply different routing schemes for each type of application.

Given that this has been going on for years now, Heroku is either acting with pronounced malice or incompetence. Any competent engineer would not be satisfied with switching the routers over to random and calling it a day. How could that have possibly been approved, then remained for years? They must not have realized what a grave mistake it is.

The #1 thing they should be doing right now (aside from damage control) is to move the routers over to round-robin routing. Random is the most naive scheme possible and is laughably inappropriate for this situation.

See for yourself using this simulator: http://ukautz.github.com/pages/routing-simulator.html
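
For readers without access to the simulator, a rough Python sketch of the random versus round-robin comparison. It is not the linked simulator and makes strong assumptions: a single router, one request at a time per dyno, evenly spaced arrivals, and a constant service time. All names and numbers are hypothetical. With highly variable service times (such as the occasional multi-second request), round-robin would also queue requests behind slow ones, so this shows only the simplest case.

```python
import random


def total_wait_ms(num_dynos, num_requests, interval_ms, service_ms, choose_dyno):
    """Sum the time requests spend queued behind earlier work on their dyno."""
    free_at = [0] * num_dynos  # time at which each dyno next becomes idle
    total_wait = 0
    for i in range(num_requests):
        arrival = i * interval_ms
        dyno = choose_dyno(i, num_dynos)
        start = max(arrival, free_at[dyno])  # wait if the dyno is still busy
        total_wait += start - arrival
        free_at[dyno] = start + service_ms
    return total_wait


def round_robin(i, num_dynos):
    """Cycle through the dynos in order."""
    return i % num_dynos


def make_random_chooser(seed):
    """Build a chooser that picks a dyno uniformly at random."""
    rng = random.Random(seed)
    return lambda i, num_dynos: rng.randrange(num_dynos)


if __name__ == "__main__":
    # Assumed load: 10 dynos, a request every 10 ms, 50 ms of work each.
    args = (10, 1000, 10, 50)
    print("round-robin total wait (ms):", total_wait_ms(*args, round_robin))
    print("random total wait (ms):     ", total_wait_ms(*args, make_random_chooser(0)))
```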

jxf over 12 years ago

To what extent would using something like Amazon's ELB mitigate this sort of issue in a bring-your-own-cloud approach? Completely?

I've been looking at using something like Cloud66 and an ELB to move off of Heroku.

MrGando over 12 years ago

This is big stuff...

Sorry to see Rap Genius investing all that money in New Relic, I can't really imagine being in their shoes.

I would be so pissed.

PS: Heroku user here

Giszmo over 12 years ago

Am I the only one who read "crap genius"?

drudru11 over 12 years ago

Holy crap - over $60k to get app performance graphs! Wow - that is super expensive!!

signed0 over 12 years ago

This is unrelated, but every time I see that domain my mind thinks it's either rapegenius.com or ragepenius.com. Surely I can't be the only one?
