Data Warehousing 101

67 points by corbet almost 14 years ago

11 comments

eneveu almost 14 years ago
If you are interested in Data Warehousing, you should read Ralph Kimball's "The Data Warehouse Toolkit": http://www.amazon.com/Data-Warehouse-Toolkit-Complete-Dimensional/dp/0471200247

When I started learning about BI (Business Intelligence), a few members of the Pentaho community advised me to read this book. I'm glad I did. Kimball is one of the "fathers" of data warehousing, and his book had a lot of great insights for dimensional modeling. It helped me avoid many design mistakes while building my DWH, and gave me insight I might have taken years to discover.

It's a "theoretical" book, in the sense that it does not focus on any specific technology; it's also a "practical book", because he uses real-world scenarios (inventory management, e-commerce, CRM...) to demonstrate the various dimensional modeling techniques. I also liked the part about BI project management and encouraging BI in a company (= how to engage users and how to "sell" a BI project to management).

He also has a newsletter with many DWH design tips (archives here: http://www.kimballgroup.com/html/07dt.html ).
thibaut_barrere almost 14 years ago
Oldies but goodies

http://philip.greenspun.com/wtr/data-warehousing.html

(data warehousing for cavemen)
xal almost 14 years ago
Shopify is pondering open-sourcing our internal tool called Tiller. It runs all the reporting for our considerable data warehouse efforts, yet it's lightweight and super fast to get running.

Watch this space.
billswift almost 14 years ago
> If you're building an archive, your only requirements are to minimize storage cost and to make sure the archive can keep up with the generation of data.

And in some of the cases he mentions, be really, really certain you don't lose data, since some of the laws impose criminal penalties for data loss, and not necessarily even on the most responsible parties (legislatures have been getting increasingly psychotic this way).
pratikpatel almost 14 years ago
The last stage of enterprise integration with the DW is through Data Marts, which are organized into Dimensions and Facts and give business users dynamic interfaces for mining their data. My current project is using Informatica CDC (Change Data Capture) to read multiple source databases through their logs and aggregate in real time. It's really incredible and supports reporting requirements at any level of intricacy.
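
To make the Dimensions and Facts idea concrete, here is a minimal sketch of a star schema using Python's built-in `sqlite3` module. The table names (`dim_date`, `dim_product`, `fact_sales`), the columns, and the sample rows are all hypothetical and chosen only for illustration; they are not taken from the project described above, and no Informatica functionality is shown.

```python
import sqlite3


def build_sales_mart(conn):
    """Create a tiny star schema: one fact table keyed to two dimensions.

    All table and column names here are illustrative assumptions.
    """
    conn.executescript(
        """
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY,
            year INTEGER,
            month INTEGER
        );
        CREATE TABLE dim_product (
            product_key INTEGER PRIMARY KEY,
            category TEXT
        );
        CREATE TABLE fact_sales (
            date_key INTEGER REFERENCES dim_date(date_key),
            product_key INTEGER REFERENCES dim_product(product_key),
            amount REAL
        );
        """
    )
    # Dimensions hold descriptive attributes; facts hold measurements.
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?, ?)",
        [(1, 2011, 6), (2, 2011, 7)],
    )
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?)",
        [(10, "books"), (20, "games")],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        [(1, 10, 5.0), (1, 20, 7.5), (2, 10, 2.5), (2, 10, 4.0)],
    )


def sales_by_category_and_month(conn):
    """Aggregate the fact table, sliced by attributes of both dimensions."""
    return conn.execute(
        """
        SELECT p.category, d.year, d.month, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_date AS d ON d.date_key = f.date_key
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category, d.year, d.month
        ORDER BY p.category, d.year, d.month
        """
    ).fetchall()
```

Calling `build_sales_mart` on an in-memory connection and then `sales_by_category_and_month` returns one row per category and month. The point of the layout is that a report only ever joins the fact table to whichever dimensions it wants to slice by.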
dgudkov almost 14 years ago
This isn't a good article about data warehousing 101. I've been working in data warehousing since 2004. The core thing in DW is the DWH data model, because it's actually an abstraction layer that converts raw transactional data into a meaningful, consistent, correct and persistent representation of an organization's activity. Tools (including those mentioned in the article) are just a means to achieve that goal.
mumrah almost 14 years ago
Hive is a really slick DW tool built on top of Hadoop. It has a SQL-like language and supports typical DW techniques like table partitioning, key clustering, etc.
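
For readers new to table partitioning, the following is a rough Python sketch of the general idea, not of Hive itself: rows are grouped by the value of one column, so a query that filters on that column only reads the matching group. The class `PartitionedTable`, its methods, and the `rows_scanned` counter are hypothetical names invented for this illustration. Hive applies the same principle at a much larger scale by keeping each partition in its own directory.

```python
from collections import defaultdict


class PartitionedTable:
    """Toy in-memory table partitioned on a single column (illustrative only)."""

    def __init__(self, partition_column):
        self.partition_column = partition_column
        # Maps each partition value to the list of rows stored under it.
        self.partitions = defaultdict(list)
        # Counts rows actually read, to show how much work pruning avoids.
        self.rows_scanned = 0

    def insert(self, row):
        """Store a row (a dict) under the partition named by its column value."""
        self.partitions[row[self.partition_column]].append(row)

    def select(self, partition_value, predicate=lambda row: True):
        """Read one partition only, then apply an optional row filter."""
        # .get avoids creating an empty partition for unknown values.
        rows = self.partitions.get(partition_value, [])
        self.rows_scanned += len(rows)
        return [row for row in rows if predicate(row)]
```

With rows spread across several dates, selecting a single date touches only that date's rows, which `rows_scanned` makes visible. A full table scan would have to read every row regardless of the filter.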
ajtaylor almost 14 years ago
As a data warehouse newbie, this was an excellent introduction. I've heard a few of the names mentioned, but there are lots of new faces to explore.
orenmazor almost 14 years ago
I'd like to learn more about this. Can anybody recommend some cool projects (open source, or even just ideas) for me to explore?
T_S_ almost 14 years ago
This article seems a little out of date. It's missing most of the things I have looked at over the past year. E.g. Riak, MongoDB, Redis and so on.
gaius almost 14 years ago
Can't mention MapReduce without Oracle Coherence (nee Tangosol)