I recently had a brief chat with a couple of developers working on different data-heavy websites, who were both using an interesting pattern for filling their databases.

Their data-gathering components (pulling from external sources like crawlers and APIs) would append new data to the bottom of a log file.

Another process sat doing something like a 'tail -f' on the same file, and parsed and added the updates to the database.

This seems like it might solve some problems for my case:

- Very easy to recreate the database if the schema changes or things blow up, just reread the log files

- Good history for debugging

What worries me is that it feels funky using files for IPC, and I can't find any examples of this being used elsewhere.

So, is anyone else using this pattern, or have any references to it that I'm missing?
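To be concrete about the pattern, here is a minimal sketch of the consumer side in PHP. It assumes the gatherers append one JSON record per line; the log path is made up and the actual database insert is left as a comment:

    <?php
    // Follow the log file roughly the way `tail -f` does, and hand each
    // appended record to the database. Path and record format are
    // assumptions for the sake of the example.
    $logPath = '/var/data/crawler-updates.log';
    $fh = fopen($logPath, 'r');
    fseek($fh, 0, SEEK_END);                // start at the current end of the file

    while (true) {
        $line = fgets($fh);
        if ($line === false) {
            sleep(1);                       // nothing new yet; wait and retry
            fseek($fh, ftell($fh));         // clear the EOF flag so new writes are seen
            continue;
        }
        $record = json_decode(trim($line), true);  // one JSON object per line
        if ($record === null) {
            continue;                       // skip partial or malformed lines
        }
        // Insert $record into the database here, e.g. with PDO.
    }

Note this sketch does nothing about log rotation; in practice the consumer would need to reopen the file when it gets rotated.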
It's certainly an interesting approach! Depending on the volume, it could have the benefits you describe, but it may get unwieldy as the volume increases.

I'd engineer the crawler to talk to a persistent message queue and load the database from there. That gives you a lot of flexibility to move loads around and instrument the queue, and you're not reinventing things, either.
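To make the queue suggestion concrete, a rough sketch using a Redis list as the queue via the phpredis extension. Redis is just one possible choice here (with its append-only persistence turned on if you want durability), and the key name and record shape are invented for the example:

    <?php
    // Producer (crawler) side: push each new record onto a list instead of
    // appending it to a file.
    $redis = new Redis();
    $redis->connect('127.0.0.1', 6379);
    $redis->lPush('crawl_updates', json_encode([
        'url'        => 'http://example.com/page',
        'fetched_at' => time(),
    ]));

    // Consumer (loader) side: a separate worker blocks on the same list and
    // writes whatever arrives into the database.
    while (true) {
        $item = $redis->brPop(['crawl_updates'], 30);  // [key, value], or [] on timeout
        if (empty($item)) {
            continue;
        }
        $record = json_decode($item[1], true);
        // Insert $record into the database here.
    }

Beanstalkd or an AMQP broker would slot into the same shape; the point is that the queue, rather than a raw file, carries the hand-off.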
"Very easy to recreate the database if the schema changes or things blow up, just reread the log files"

This would be nasty, though, if your log files wrapped, the disk they were on ran out of space, and so on.
As an update, I did some groundwork to see how this might work in PHP by creating a small example that tails the Apache error log:
http://petewarden.typepad.com/searchbrowser/2009/09/how-to-follow-your-apache-error-logs-in-a-browser.html

Still feels kinda sketchy...
An RDBMS already has logging mechanisms for this kind of database rebuild. For example, PostgreSQL has the WAL (write-ahead log), which can be used to rebuild the database or to drive asynchronous replication. Likewise, MySQL has binary logging.
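For reference, this is roughly what switching on WAL archiving looks like in postgresql.conf on a recent PostgreSQL; the archive path is a placeholder. The MySQL equivalent is enabling the binary log with log_bin in my.cnf:

    # postgresql.conf: keep WAL segments around so the database can be
    # rebuilt or replicated from them
    wal_level = replica
    archive_mode = on
    archive_command = 'cp %p /var/lib/postgresql/wal_archive/%f'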