Issue 8788 - Every day around 9 AM Brussels time, huge drop in GAE performance

258 points by thijser about 12 years ago

32 comments

hosay123 about 12 years ago

> I'm going to assume everyone experiencing this issue is using M/S. Upgrading to HRD will solve your issue.

This is the reason I abandoned AE and part of why adopting a platform that isn't standardized is incredibly dangerous. The problem is technical debt constantly accrues even when you aren't making changes.

Even though the API was unchanged, HRD differs subtly enough that breakage can occur on any non-trivial project. Edge cases (how indices behave within transactions comes to mind, but there are plenty more examples) will see new semantics compared to M/S, and so this "upgrade" involves not only thorough testing and auditing, but likely also code changes and potentially significant engineering hours.

http://goo.gl/HVuaC: These techniques are not needed with the (now deprecated) Master/Slave Datastore, which always returns strongly consistent results for all queries.

This means a project written and signed off circa 2011 requires mandatory engineering costs just to continue running in a functioning and supported fashion. An AE app will never quite resemble that ancient perl5 behemoth running uninterrupted since 1997, as the underlying implementation and recommended APIs are constantly modified and replaced (Datastore, NDB, Python major version).

"A strong test suite will save your soul!" I hear you say, tests that a small project might have survived without if targeting any other platform, and testing on AppEngine is also yet another moving target (for example, testing nested subrequests was all but impossible using the SDK until relatively recently).

The promise was a carefree life for a project willing to code against their proprietary APIs; the reality is a constantly moving target, "not quite free" autoscaling and the threat that while you're asleep an unannounced change will take down your app (I could name a few, but as many will attest this has happened regularly since launch).

rachelbythebay about 12 years ago

Remember the old "thundering herd" problem with Apache children and things of that nature? You'd basically have a whole bunch of processes which had a listening fd from an earlier call to listen(). When a new connection would come in, the kernel would wake all of them, even though only one of them would actually have something to get. The others would go through the process for nothing. It caused a big performance hit back in the day.

Well, imagine now that you have a directory or lock service where you can store things and perform atomic updates. When you do a write to something in it, it fans out to all of its clients, and they all wake up (nearly) simultaneously and receive the update. They then have to do whatever processing you do with new data of that type.

If they all do this at the same time, then you have no processes left to service incoming requests. They're all identically busy with whatever mutexes held in order to apply those config changes safely, so no other work happens on those clients while they load in the new data.

It's not so much that it's taking a mutex and is getting stuck for a little bit, since that's going to happen no matter what. It's that all of the children do it at the same time, so there's nobody to service your hit, and you're guaranteed to get stuck. If it was spread out, then only some percentage of incoming requests would get stuck behind this. The others would get lucky and would hit another instance which either had already run it or hadn't yet run it.

I'm not saying this is what's going on here, but it sure sounds familiar.
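The "spread it out" point in the comment above can be illustrated with a small simulation. This is only a sketch under made-up assumptions (100 workers, a 2-second reload, a 60-second spread, and the hypothetical helper names `reload_delay` and `busy_fraction`); it says nothing about how App Engine actually distributes updates.

```python
import random


def reload_delay(max_spread_seconds, rng):
    """Pick a random delay between 0 and max_spread_seconds before a worker
    starts applying an update (hypothetical helper for this sketch)."""
    return rng.uniform(0.0, max_spread_seconds)


def busy_fraction(start_times, reload_seconds, at_time):
    """Fraction of workers that are in the middle of reloading at `at_time`."""
    busy = sum(1 for start in start_times if start <= at_time < start + reload_seconds)
    return busy / len(start_times)


rng = random.Random(42)   # fixed seed so the sketch is repeatable
workers = 100             # assumed worker count
reload_seconds = 2.0      # assumed time each worker is blocked while reloading

# Thundering herd: every worker starts reloading the moment the update arrives.
no_jitter = [0.0] * workers

# Staggered: each worker waits a random delay of up to 60 seconds first.
jittered = [reload_delay(60.0, rng) for _ in range(workers)]

herd_busy = busy_fraction(no_jitter, reload_seconds, at_time=1.0)
spread_busy = busy_fraction(jittered, reload_seconds, at_time=1.0)
print("busy without jitter:", herd_busy)
print("busy with jitter:", spread_busy)
```

Without jitter every worker is blocked one second after the update; with jitter only the few workers whose delay happens to fall in that window are blocked, so most incoming requests still find a free worker.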

Confusion about 12 years ago

Well, the bug report doesn't really invite quick attention. Simply reporting your observations is not enough: you should position yourself as a competent customer, by explaining what you have done to ensure the problem isn't on your side. Mention the code hasn't changed, that you have no database cleanup cronjobs or similar running that could be interfering, etc.

My first instinct when I see a report like this is: he probably has some cronjob running he forgot about; perhaps one whose performance decreased with O(n^2).

By which I'm not saying that Google is right in not replying for days, but by which I am saying that as a customer, there are easy ways to get attention beyond shouting and threatening. Show it's an interesting problem and you're bound to get some techie's attention.

benjaminwootton about 12 years ago

Google support is an absolute disgrace.

I had a Nexus 7 go AWOL at Christmas and I've never had such a shambolic customer service experience.

They have absolutely no respect or customer service ethos when it comes to people who are actually paying them real cash money.

Not in a million years would I sign off on hosting a production project on App Engine.

davedx about 12 years ago

Why would anyone put anything production critical on Google these days, knowing that they provide 0 support across most of their business?

afhof about 12 years ago

A GAE user sees a problem of his service being slow, writes a frantic bug report with caps and exclamation marks and threatens to leave GAE. As a GAE user myself, two questions come to mind:

1. Is GAE outside of their .9995 SLA* uptime? If they aren't, then it probably isn't important enough to spend time looking into it. Customers cannot expect better than the agreed-upon uptime percentage, and hosting companies are obligated to reimburse customers if they go below SLA. Both of these are covered in the SLA doc.

2. Is it reproducible? So far, the bug report mentions 2 people out of all GAE users. Is 2 people enough to say it's a problem with GAE? One person is panicked, and the other provides few details for the bug report.

* https://developers.google.com/appengine/sla
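For a sense of scale, the .9995 figure can be turned into a downtime budget. The sketch below assumes a 30-day measurement window and counts every minute equally; both are assumptions made for illustration, and the SLA document itself defines the real window and what counts as downtime. The function name `allowed_downtime_minutes` is hypothetical.

```python
def allowed_downtime_minutes(uptime_fraction, days):
    """Minutes of downtime that still meet the given uptime fraction
    over a window of `days` days (simple arithmetic, no SLA fine print)."""
    total_minutes = days * 24 * 60
    return (1 - uptime_fraction) * total_minutes


# Assumed 30-day window: 43,200 minutes in total.
budget = round(allowed_downtime_minutes(0.9995, 30), 1)
print(budget)  # 21.6
```

Under those assumptions the budget is about 21.6 minutes per 30 days, so a daily outage lasting more than roughly 43 seconds would exceed it if each slow period were counted as downtime.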

nivla about 12 years ago

Having never used GAE, it would be nice if someone could expand M/S and HRD for me.

It looks like the OP of the bug report is using a deprecated feature/program which, according to the Project Member, is causing latency issues at a specific time daily. But that could not be the real issue, since another commenter who is using the new HRD is also having the same problem. It is even frustrating for people who are reading this. All it implies is the lack of communication from Google when something goes awry. Come on Google, stop reinforcing my stereotypes about your customer support!

Selling to a customer is different than selling to a business: you may have a great product at a great price, but if you offer terrible CS, in the B2B world everyone is going to avoid you. It is a place where support is valued more than the product itself.

Therefore, unless you start offering decent CS, you can lower your price all you want; I will be sticking with AWS.

zwischenzug about 12 years ago

I manage the 3rd line support of some of the busiest websites in the world (we provide back-end e-commerce software).

I can't say I think much of google's response here. Nearly two weeks before the first comment, and then shut down after 2 days and a question directed at who knows who, and no explanation?

The analysis elsewhere on here suggests they're violating SLA, so this should get more attention. I'm guessing support is under-resourced @ google, and the culture of support is a bit shabby (no acknowledgement of inconvenience or indication or evidence of work undertaken in the background) - hardly surprising for a large-scale software business based on free services.

neya about 12 years ago

I'm sorry, but this is the price you pay for running your business that is dependent TOTALLY on a 3rd party service. Forget Google, everyone out there is most likely the same, that's why it's important for you to run your 'apps' on something you have control over - like Linode, AWS, Rackspace, Openshift, etc. - and also have back-up nodes from other providers for redundancy, for emergency situations, in case of storms, etc.

I would recommend trying your apps on OpenStack (Openshift in particular), which doesn't have the vendor lock-in which you face right now.

lucb1e about 12 years ago

To their credit, people are apparently using something that's been deprecated and should be changed regardless. At least, that was their conclusion when it was changed to wontfix. The replies are very rare and curt though; I can't really say it's quality service when you're paying for a product.

Customer support from Google has always been like this as far as I've experienced and heard. There is no way to actually reach and converse with anyone, regardless of whether you are paying them for the service or what kind of request it is.

Once a Google employee randomly replied to a complaint of mine about Google+ (I didn't even +mention them). After a few comments and him confirming that it was added to the bugs list, I asked if it was okay to +mention him in the future with similar issues. It was okay. I did. He never showed his face again. (His profile still says "Works at Google+".)

Another Google employee I know online also never replies to anything concerning Google. I know he works on the Google+ project, but I can only hope he passes on any bugs I +mentioned him in.

For Youtube, you can post in their forums but merely hope for a reply. Copyright complaint disputes are no priority, either.

I haven't used many paid products, but I have read about their customer support being one of the very worst and also have never been able to find a single e-mail address or phone number to get support at for any service.

Edit: By the way, I would have moved away from Google App Engine a long time ago if my app went down every morning during rush hour for 10 days straight.

sgift about 12 years ago

It is interesting that basically no one (including the news poster) noticed that there is a comment (#12) which states that this problem happens on HRD too. This statement may be false and/or a completely different issue, but at least it should be considered here for HN comments which state "M/S is deprecated, Google is right, just use HRD."

brown9-2 about 12 years ago

A bug tracker seems like a horrible way to report production (or non-production) support issues. This is the same bug tracker OSS projects on Google Code use.

Is it really helpful for the public to comment on my support request? Seems like the signal to noise ratio would be quite low, and then you get inane comments like:

> I got here from HackerNews, but after seeing the original poster spam the forums in multiple places and have a bad attitude, I can't blame Google for not fixing what looks to me like a non-issue.

> Fuck 'em.

You have to believe that the choice of tools has some bearing on the quality of the response from Google. Seems like there is very little incentive for any "Project members" to trawl through open bug reports when no one is ever responsible.

Al-Khwarizmi about 12 years ago

Not surprising... the second most-voted bug in Google Code, reported exactly a year ago (http://code.google.com/p/support/issues/detail?id=24324), deplores the removal of a feature that was already there (the Updates page) and was the single most useful feature in Google Code for many of us. After one year and more than 800 people registering their interest on the issue, they haven't even explained why they removed it or whether there are any plans of bringing it back.

afhof about 12 years ago

Comment from the WontFix mark:

"M/S is deprecated and there is a clear and straightforward path to migrating to HRD."

M/S was deprecated April 4, 2012, so it has been some time since the notice has been out there. High replication data store has been available for over 2 years now. Whether or not less than a year is too short a deprecation period is another issue.

pyalot2 about 12 years ago

Ok, so here's the deal. If your app runs exclusively on GAE you've essentially tied yourself to one cloud vendor. Now disregarding the respective benefits and drawbacks of google as a hosting company for your app (I would never do that), being dependent on one cloud provider is a very bad idea. No matter if you run on EC2, Azure or GAE, if you can't seamlessly switch to another provider, you're screwed. These all go down regularly and have issues. They're big companies, you're a small company, you have no such thing as "recourse". The court of public opinion will not save your company.

killermonkeys about 12 years ago

Many on the thread say the reporters are over-reacting. They are not. What would amazon do? They would not consider this an issue, would respond in less than 24 hours, and would take complete responsibility. GAE is a pay service. I think this level of service is pathetic.

As noted the only attempt at diagnosis is completely wrong (even the reporter is not on MS) and very late.

timme about 12 years ago

The headline blows this out of proportion.

Few people (who act obnoxious as hell) report a problem that can be solved by moving away from a deprecated system, yet they fail to even read the note because they're busy smashing exclamation marks into the issue tracker.

edent about 12 years ago

Why would anyone expect customer support from Google? They have made it clear time and time again that they don't provide it. http://shkspr.mobi/blog/2013/02/googles-customer-contempt-conundrum/

mos about 12 years ago

Customer support of Google really sucks! Currently the GAE cloud has a reliability problem (also for new customers). Instances are restarted like crazy. This leads to downtimes. But that's not enough. Customers even have to pay for more(!) instance hours because of this. There is the running gag on the mailing-list: "Whenever GAE is unreliable for weeks Google needed to make revenue targets ;-)"

References

Current Issue: http://code.google.com/p/googleappengine/issues/detail?id=8844

Same issue from last year that took weeks to be resolved (check last comments!): http://code.google.com/p/googleappengine/issues/detail?id=8004

Some Pros and Cons of Google App Engine in this blog-post: http://www.mosbase.com/

kelvin0 about 12 years ago

BTW, this issue is not simply due to M/S, it also happens on HRD. So any google support apologists here, please read the BUG thread submitted by this poor customer before dismissing it simply as a 'migration issue'.

I have had some issues with Google Docs (paid for premier commercial account). Some documents we had stored simply vanished from our account. After getting the runaround for 3-4 days, finally a google engineer told us they can't help us recover the documents THEY 'lost' unless we have the URL to the document ... Thankfully someone on our team had kept the URL when I first shared that document with them (1+ year after the document had been created).

Nightmare ...

bromley about 12 years ago

Quick tip for anyone making a system with high load and daily or hourly quotas: When an account is created, assign a random start time (e.g. 05:43 for daily quotas or minute 12 for hourly) to measure that account's quotas against. Then you can avoid this issue of the system getting a huge spike in load when everyone's quota refreshes at the same time.
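The tip above could look roughly like this for daily quotas. It is a minimal sketch, not a description of any real quota system: the helper names `assign_reset_offset` and `current_window_start` are hypothetical, and all times are kept in UTC to sidestep daylight-saving changes.

```python
import random
from datetime import datetime, timedelta, timezone

MINUTES_PER_DAY = 24 * 60


def assign_reset_offset(rng):
    """Called once at account creation: pick a random minute of the day
    (0..1439) at which this account's daily quota resets."""
    return rng.randrange(MINUTES_PER_DAY)


def current_window_start(now, offset_minutes):
    """Start of the quota window that contains `now` for an account whose
    quota resets `offset_minutes` after midnight UTC."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    start = midnight + timedelta(minutes=offset_minutes)
    if start > now:
        # Today's reset has not happened yet, so the window began yesterday.
        start -= timedelta(days=1)
    return start


# Example with the 05:43 start time mentioned in the comment.
offset = 5 * 60 + 43
before_reset = datetime(2013, 2, 27, 4, 0, tzinfo=timezone.utc)
after_reset = datetime(2013, 2, 27, 6, 0, tzinfo=timezone.utc)

print(current_window_start(before_reset, offset))  # 2013-02-26 05:43:00+00:00
print(current_window_start(after_reset, offset))   # 2013-02-27 05:43:00+00:00

# New accounts get their own offset, so resets are spread across the day.
new_account_offset = assign_reset_offset(random.Random())
```

Usage is then a matter of counting an account's consumption since `current_window_start(now, offset)` instead of since a global midnight, so no single minute sees every account's quota refresh at once.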

mrerrormessage about 12 years ago

It happens that 9 AM Brussels time is midnight Pacific time. I'm sure Google is running some maintenance cron at midnight thinking "This is a low demand time," and it is, across the US, but not in Brussels. These are old instances, and Google probably doesn't want to re-time or rewrite the cron job to be more efficient.
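The time-zone arithmetic in the comment above is easy to check. The sketch below uses fixed winter offsets (PST = UTC-8, CET = UTC+1) and an arbitrary winter date picked for illustration; both are assumptions, and they would not hold during the weeks when US and European daylight-saving rules are out of step.

```python
from datetime import datetime, timedelta, timezone

# Fixed winter-time offsets, assumed for this illustration only.
PST = timezone(timedelta(hours=-8), "PST")  # US Pacific, standard time
CET = timezone(timedelta(hours=1), "CET")   # Brussels, standard time

# Midnight Pacific on an arbitrary winter date.
midnight_pacific = datetime(2013, 2, 27, 0, 0, tzinfo=PST)

# The same instant expressed in Brussels time.
in_brussels = midnight_pacific.astimezone(CET)
print(in_brussels.strftime("%H:%M %Z"))  # 09:00 CET
```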

logn about 12 years ago

Ironically, the guy who closed this issue owns this project: https://code.google.com/p/sentimentally/

"sentimentally is a tool that determines sentiment of your emails. Once determined, it helps you gauge your relationships with co-workers, customers, friends, or other individuals based on the tone of your conversations with these people."

kushti about 12 years ago

Never pay Google. It has terrible support for all products

raverbashing about 12 years ago

"Upgrading to HRD will solve your issue. M/S is deprecated and there is a clear and straightforward path to migrating to HRD."

Can anyone explain why this is not possible for them?

Wonderful Google support apart, there are a lot of alternatives out there.

okku about 12 years ago

I am also a GAE user, and I have had no problems like the OP's. But I am starting to miss a fundamental feature: sockets. I have worked around it by using other services and polling.

Maybe wrong forum, but are there any infrastructure templates for setting up a scalable web/db/loadbalancer/memcached stack for a simple traditional web service, in my case a game?

I want to be able to sleep at night, and easily scale up by adding some more machines in case of higher load.

I could use denormalized MySQL/Postgres or MongoDB for speed. Preferred language is Python (or maybe C# or Java).

Any ideas?

petersmagnusson about 12 years ago

Hi folks. We are fully aware of this issue. We've added it to external issue tracker (https://code.google.com/p/googleappengine/issues/detail?id=8901), please follow up there.

Response from us was initially muted because it looked like it only affected M/S apps, but it turns out (a) it can impact HRD as well, and (b) we're pretty unhappy about the level of impact for many M/S apps so we're looking at ways to resolve. It's a high priority and we're looking at a number of ways to address it. It's also a pretty interesting issue, because indirectly it's caused by (a) the large scale that App Engine is running, and (b) the large extent with which GAE is running free applications.

Regardless, apologies to those who felt support was unresponsive. We are working very hard to improve support. For the sophisticated audience that comes to these pages, please link to me on Google+ to get my attention if we are failing you (https://plus.sandbox.google.com/110401818717224273095).

mnml_ about 12 years ago

GAE is cool but it's not worth the money. It shouldn't be out of beta.

lnanek2 about 12 years ago

I've heard a lot of people saying this is why Google can't get a lot of businesses to sign on. There's no one for the CEO to call and complain to directly when their stuff is down.

chris_wot about 12 years ago

Well, there's a good reason not to use this service.

xsace about 12 years ago

Jesus, so glad I switched to node when the hosting rates increased back in Sept 2011 with the "GAE out of preview" move

saosebastiao about 12 years ago

Ahh the perils of being a customer of Google.
