
Show HN: Hands-On Data Engineering with a Real-Estate Project Guide

4 points by articsputnik about 1 year ago
Hey HN community,

I recently pushed an update to my GitHub repo titled "Practical Data Engineering: A Hands-On Real-Estate Project Guide". This open-source project aims to tackle real-world data engineering challenges while exploring various technologies. It guides you through building a data application that collects, enriches, and visualizes real-estate data, potentially helping you find your dream property.

This project covers web scraping with Beautiful Soup, processing data with Spark and Delta Lake, visualizing with Apache Superset, and much more, all orchestrated on Kubernetes for scalability.

I started this project back in November 2020, mainly to learn and teach data engineering. Three years on, I'm fascinated by the fact that despite the data engineering space moving extremely fast, the core of my project, powered by carefully chosen tools from the Open Data Stack, remains relevant to this day. This project is my most searched blog post on Google, which motivated me to update it.

I updated to the latest versions of tools like Dagster while exploring new additions like delta-rs, which allows direct interactions with Delta Tables in Python.

https://github.com/sspaeti-com/practical-data-engineering

I look forward to your thoughts and seeing what you would build differently. My future plans are to add Rill Developer as a code-first BI tool and add DuckDB or Polars to the mix.
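For readers who want a feel for the collect-and-enrich flow described in the post before opening the repo, here is a minimal sketch. It is not taken from the project: the listing markup, the `Listing` fields, and the helper names are all hypothetical, and it swaps in the standard library's `html.parser` where the project uses Beautiful Soup, and a plain dictionary where the project uses Spark and Delta Lake, so that it runs without any dependencies.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from statistics import median

# Hypothetical listing markup and made-up values, for illustration only.
# The real project scrapes a live site with Beautiful Soup.
SAMPLE_HTML = """
<div class="listing" data-id="a1" data-city="Bern" data-price="850000" data-area="100"></div>
<div class="listing" data-id="b2" data-city="Bern" data-price="600000" data-area="80"></div>
<div class="listing" data-id="c3" data-city="Zurich" data-price="1200000" data-area="100"></div>
"""


@dataclass
class Listing:
    listing_id: str
    city: str
    price: float
    area_m2: float

    @property
    def price_per_m2(self) -> float:
        # Enrichment step: derive a comparable metric from the raw fields.
        return self.price / self.area_m2


class ListingParser(HTMLParser):
    """Collects one Listing per <div class="listing"> start tag."""

    def __init__(self) -> None:
        super().__init__()
        self.listings: list[Listing] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "div" and attributes.get("class") == "listing":
            self.listings.append(
                Listing(
                    listing_id=attributes["data-id"],
                    city=attributes["data-city"],
                    price=float(attributes["data-price"]),
                    area_m2=float(attributes["data-area"]),
                )
            )


def collect(html: str) -> list[Listing]:
    """Collection step: parse raw HTML into typed records."""
    parser = ListingParser()
    parser.feed(html)
    return parser.listings


def median_price_per_m2(listings: list[Listing]) -> dict[str, float]:
    """Aggregation step: median price per square meter, grouped by city."""
    by_city: dict[str, list[float]] = {}
    for listing in listings:
        by_city.setdefault(listing.city, []).append(listing.price_per_m2)
    return {city: median(values) for city, values in by_city.items()}


if __name__ == "__main__":
    print(median_price_per_m2(collect(SAMPLE_HTML)))
```

In the actual stack, the parsed records would presumably be written to a Delta table and the aggregation would feed a dashboard; the sketch only shows the shape of the steps, not how the repo implements them.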

1 comment

airstrike about 1 year ago
Congrats on building this and on putting yourself out there.

Without having looked into it in detail, from the outside looking in, it strikes me as if you're highlighting the technologies you're using more than the actual insights you're getting from the data. Even on this very post you're saying you have plans to add X, Y and Z to the stack, without considering why you're doing it.

This is perfectly fine if your goal is just to learn all these technologies. But in that case, chances are the project isn't really interesting to anyone but you.

I would encourage you to take a step back and reconsider what you can do next. Now that you possess the knowledge of using all these tools, how can you best use the best tool to answer the best question that can be asked about some interesting problem?

Usually, interesting problems are those that remove a major source of pain for someone else. Very often, that someone else soon becomes your first customer.

Good luck!