Is anybody else loading their database by tailing log files?

5 points by petewarden, over 15 years ago
I recently had a brief chat with a couple of developers working on different data-heavy websites, who were both using an interesting pattern for filling their databases.

Their data-gathering components (pulling from external sources like crawlers and APIs) would append new data to the bottom of a log file.

Another process sat doing something like a 'tail -f' on the same file, and parsed and added the updates to the database.

This seems like it might solve some problems for my case:

- Very easy to recreate the database if the schema changes or things blow up, just reread the log files
- Good history for debugging

What worries me is that it feels funky using files for IPC, and I can't find any examples of this being used elsewhere.

So, is anyone else using this pattern, or have any references to it that I'm missing?
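For concreteness, here is a minimal sketch of the pattern described above, not the implementation those developers used. It assumes one JSON object per line, a SQLite target, and a hypothetical `items` table with `source` and `payload` columns; the function names are made up for illustration.

```python
import json
import sqlite3


def append_record(log_path, record):
    """Writer side: append one JSON object per line to the log file."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")


def open_db(db_path=":memory:"):
    """Create the (hypothetical) target table if it does not exist yet."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS items (source TEXT, payload TEXT)")
    return conn


def load_new_lines(log_path, conn, offset=0):
    """Reader side: insert every complete line found after `offset`.

    Returns the byte offset to resume from on the next poll.
    """
    with open(log_path, "rb") as log:
        log.seek(offset)
        while True:
            line = log.readline()
            if not line.endswith(b"\n"):
                break  # end of file, or a line the writer has not finished yet
            record = json.loads(line)
            conn.execute(
                "INSERT INTO items (source, payload) VALUES (?, ?)",
                (record["source"], json.dumps(record["payload"])),
            )
            offset = log.tell()
    conn.commit()
    return offset
```

Calling `load_new_lines` in a loop with a short sleep, passing back the returned offset each time, approximates the 'tail -f' process. Recreating the database after a schema change is then a matter of opening a fresh database and calling it once with `offset=0`.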

5 comments

gstar, over 15 years ago
It's certainly an interesting approach! Depending on the volume, it could have the benefits you described, but it may get a bit old when the volumes increase.

I'd engineer the crawler to talk to a persistent message queue, and load the database from there. That gives you a lot of flexibility to move loads around and instrument the queue, and you're not reinventing things, either.
brown9-2, over 15 years ago
"Very easy to recreate the database if the schema changes or things blow up, just reread the log files"

This would be nasty though if your log files wrapped, the disk they were on ran out of space, etc.
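One possible guard for the wrapped-log case, sketched under assumptions: the reader remembers the inode and byte offset of the file it was following (POSIX semantics for `st_ino` are assumed), and the helper name `resume_offset` is hypothetical. It does nothing for the disk-full case.

```python
import os


def resume_offset(log_path, saved_inode, saved_offset):
    """Return the offset to resume reading from, or 0 if the log was replaced.

    The log is treated as replaced when its inode differs from the one seen
    earlier (rotation) or when it is now shorter than the saved offset
    (truncation).
    """
    stat = os.stat(log_path)
    if stat.st_ino != saved_inode or stat.st_size < saved_offset:
        return 0
    return saved_offset
```

This check cannot notice a file that was truncated and has since grown back past the saved offset, and anything left unread in a rotated-away file still has to be picked up from the rotated copy.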
petewarden, over 15 years ago
As an update, I did some groundwork to see how this might work in PHP, by creating a small example that tails the Apache error log: http://petewarden.typepad.com/searchbrowser/2009/09/how-to-follow-your-apache-error-logs-in-a-browser.html

Still feels kinda sketchy...
fsniper, over 15 years ago
RDBMSs have logging mechanisms for this kind of database recreation. For example, PostgreSQL has the WAL (write-ahead log), which can be used to rebuild the database or for asynchronous replication. Likewise, MySQL has binary logging.
skwiddor, over 15 years ago
cat data | tee log | data_processor

It's called "the Unix philosophy". Invented by Doug McIlroy, probably before you were born.
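The same shape can be sketched in-process, assuming the data arrives as an iterable of text lines; `tee_lines` is a hypothetical helper and only mimics what tee(1) does in the pipeline above.

```python
def tee_lines(source, log):
    """Yield each line from `source` after copying it to `log`, like tee(1)."""
    for line in source:
        log.write(line)
        log.flush()  # make the line visible to other readers of the log promptly
        yield line
```

A processor would then iterate over `tee_lines(sys.stdin, open("log", "a"))` and handle each line as it arrives, while the log keeps a replayable copy.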