TechEcho (科技回声)

Operations magic cure: nightly server restarts

16 points by samaparicio, more than 15 years ago

9 comments

donw, more than 15 years ago
This could be better titled as 'Operations Magic Cure: Test Like A Madman'.

It's not the act of restarting the servers that solves problems. Rather, their nightly reboot policy forces them to have all of the infrastructure in place to be able to rotate servers in and out of service quickly. It forces them to centrally manage configuration. It forces them to architect their software and networks to handle restarts.

They use server restarts as a way to force all of these things to be in place, which isn't such a bad idea. More importantly, it doesn't sound like they use the nightly restarts as a substitute for actual troubleshooting (e.g., rebooting a server to 'fix' a problem, rather than figuring out what's really going on).
djcapelis, more than 15 years ago
I do this with webservers running trac all the time. Except I just restart the webserver; I don't bounce the entire machine.

I also find it relatively unbelievable that the linked post claims that restarting servers requires a 24/7 operations team. He trusts his servers to do a bunch of other complex operations unwatched but doesn't trust them to do a restart? Obviously you need to make sure the next server waits until the one before it in the schedule comes back up before it dives down, and that you get an alert if one dies and doesn't come back, but otherwise this isn't something someone should have to run by hand.

These aren't mainframes, you don't need an operator anymore.

Have one if you want but you sure shouldn't need one.
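
For illustration, the unattended restart schedule described in this comment (each server waits for the previous one to come back, and an alert fires if one does not) might look roughly like the Python sketch below. It is not the commenter's actual setup: `rolling_restart` and the `restart`, `is_healthy`, and `alert` callables are hypothetical names, and how a server is really restarted or health-checked is left to the caller.

```python
import time
from typing import Callable, Iterable, List


def rolling_restart(
    servers: Iterable[str],
    restart: Callable[[str], None],      # hypothetical: triggers the restart
    is_healthy: Callable[[str], bool],   # hypothetical: True once the server is back
    alert: Callable[[str], None],        # hypothetical: pages or emails someone
    timeout_s: float = 300.0,
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Restart servers one at a time; return those that came back healthy.

    Stops at the first server that fails to return, so the servers later
    in the schedule are never taken down.
    """
    done: List[str] = []
    for server in servers:
        restart(server)
        deadline = clock() + timeout_s
        # Poll until the server reports healthy or the deadline passes.
        while not is_healthy(server):
            if clock() >= deadline:
                alert(f"{server} did not come back within {timeout_s:.0f}s")
                return done
            sleep(poll_s)
        done.append(server)
    return done
```

Halting at the first failure is one possible design choice: it leaves the remaining servers in service while someone investigates, at the cost of skipping their restart for that night.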
rm-rf, more than 15 years ago
Of course, re-booting servers or re-starting processes forces app servers to re-cache data and database servers to re-parse, re-cache and re-optimize queries. It also throws away good query plans & replaces them with new versions, some of which will be worse than what you had.

On apps that cache database objects in app server memory, we see a noticeable degradation in response time while the app server re-caches database objects.
junklight, more than 15 years ago
Sometimes it *is* ok to have less than optimal solutions to get you past sticky patches. Is your goal to build a perfect beautiful engineering solution or to deliver a service to end users? However - you *must* recognise that it's a kludge and that it needs to be sorted out as soon as possible.

The opposite of this is the "server that must not be rebooted EVER" (and if it has to be, it will require the whole engineering team; I've seen one or two of these in my time). Likewise machines that are rebooted every night soon turn into "that's the way we've always done it".

So - it can be ok to fix things like this sometimes but watch it doesn't become a cargo cult fix.
ghshephard, more than 15 years ago
Anyone who hasn't restarted their server has:
o Old versions of Firmware (Hard Disks, Mother Boards, BIOS, SCSI Controllers, etc...)
o Old kernels
o In for a surprise when their data center eventually loses power. :-)
dazzawazza, more than 15 years ago
Nope, sorry. I just can't accept this. I've thought about it for 10 minutes trying to convince myself that this is a pragmatic approach to a difficult problem but I just can't see it.

It's just poor management of servers. If you have a process that routinely leaks memory or stalls then manage that by restarting it & balancing it, but rebooting the entire machine! This implies that you don't trust your OS to NOT leak resources (pipes, files, sockets, whatever), thus you need a new OS.

The 'excuse' that it forces you to build a scalable redundant system is bogus, it's folly.
ojbyrne, more than 15 years ago
logrotate and "apachectl graceful" at 4am. Don't push code without a restart during the day, then you don't have to have someone awake for the restart in the middle of the night.
kristianp, more than 15 years ago
Nightly restarts: Implies a Windows server platform. And/or bad software.
plaes, more than 15 years ago
Windows - Reboot, Unix - Be root!