I built a minimalist AI news aggregator while teaching myself NLP. It's a single static HTML page that updates every 4-6 hours with clustered AI news stories: no accounts, no cookies, no JS frameworks.

Technical approach:
- Python scraper for AI/tech sources
- Custom NLP pipeline with BERT embeddings to filter for AI-specific content
- Hierarchical clustering to group related stories
- ChatGPT API for generating cluster titles and short summaries
- Served as a static HTML page via Cloudflare Pages
- Lightweight analytics with GoatCounter and Umami (running both for now while I evaluate which one to keep)
- Experimental JSON-based search (considering a proper search backend if this scales)

The project started when I realized that, as a PM trying to track AI developments, I was wasting hours every day checking multiple sources. I built this over 3 months between work commitments.

Interesting challenges:
- Finding the right threshold for story similarity (still tuning this)
- Balancing comprehensive coverage with noise filtering
- Keeping the page lightweight while maintaining content density

Would appreciate feedback on clustering accuracy, false positive/negative rates, and overall UX.

Link: https://currentai.news
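For the BERT-embedding filter in the list above, one simple approach (not necessarily the one this pipeline uses) is nearest-centroid filtering: average the embeddings of a few hand-labelled AI stories and keep any new story whose cosine similarity to that centroid clears a cutoff. A minimal pure-Python sketch, assuming the embeddings are already computed as plain float lists; the function names, toy vectors, and the 0.7 cutoff are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity in [-1, 1]; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors: List[Sequence[float]]) -> List[float]:
    """Element-wise mean of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def filter_ai_stories(
    story_vectors: List[Sequence[float]],
    ai_examples: List[Sequence[float]],
    min_similarity: float,
) -> List[int]:
    """Indices of stories whose embedding is close enough to the AI centroid."""
    ai_center = centroid(ai_examples)
    return [
        i
        for i, vec in enumerate(story_vectors)
        if cosine_similarity(vec, ai_center) >= min_similarity
    ]


if __name__ == "__main__":
    # Toy 3-d vectors standing in for real BERT embeddings (hypothetical).
    labelled_ai = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
    incoming = [[0.8, 0.2, 0.0], [0.0, 0.0, 1.0]]
    print(filter_ai_stories(incoming, labelled_ai, min_similarity=0.7))  # [0]
```

In a setup like this, the single `min_similarity` knob is where the coverage-versus-noise trade-off shows up: lowering it admits more borderline stories (fewer false negatives, more false positives), and raising it does the opposite.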
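On the story-similarity threshold: with hierarchical clustering, the threshold typically acts as a distance cutoff at which merging stops. A minimal pure-Python sketch of average-linkage agglomerative clustering on cosine distance, assuming precomputed story embeddings; the names, toy vectors, and 0.35 cutoff are hypothetical and not taken from the actual pipeline.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0.0 for identical directions, up to 2.0 for opposite ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0  # treat all-zero vectors as unrelated


def cluster_stories(embeddings: List[Sequence[float]], threshold: float) -> List[List[int]]:
    """Average-linkage agglomerative clustering with a distance cutoff.

    Starts with one cluster per story and repeatedly merges the closest pair of
    clusters, stopping once even the closest pair is farther apart than `threshold`.
    """
    n = len(embeddings)
    # Pairwise story distances, computed once up front.
    dist = [[cosine_distance(embeddings[i], embeddings[j]) for j in range(n)] for i in range(n)]
    clusters: List[List[int]] = [[i] for i in range(n)]

    def average_linkage(c1: List[int], c2: List[int]) -> float:
        # Mean distance over every cross-cluster pair of stories.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters (a < b).
        best_d, best_a, best_b = min(
            (average_linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if best_d > threshold:
            break  # nothing left is similar enough to merge
        clusters[best_a] = clusters[best_a] + clusters[best_b]
        del clusters[best_b]
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Toy 3-d vectors standing in for real story embeddings (hypothetical).
    stories = [
        [0.9, 0.1, 0.0],     # story about a model release
        [0.85, 0.15, 0.05],  # another outlet covering the same release
        [0.0, 0.2, 0.95],    # unrelated story
    ]
    print(cluster_stories(stories, threshold=0.35))  # [[0, 1], [2]]
```

The naive loop is roughly cubic in the number of stories, which should be acceptable for a few hundred items per update. For larger batches, SciPy's `scipy.cluster.hierarchy.linkage(X, method="average", metric="cosine")` followed by `fcluster(Z, t, criterion="distance")` produces the same kind of threshold cut more efficiently, and sweeping `t` over a small labelled sample is one way to tune it.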
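For the experimental JSON-based search, one lightweight option on a static page is to build a small inverted index at deploy time and ship it as a JSON file that the page fetches and queries in the browser. A minimal sketch of the build step plus the query logic (which the page would mirror in plain JS), assuming each cluster is a dict with hypothetical `id`, `title`, and `summary` keys; the tokenizer is deliberately naive (no stemming, no stop-word removal).

```python
import json
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    # Lowercase and keep runs of ASCII letters/digits; "open-weights" -> ["open", "weights"].
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(clusters: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each token to the sorted ids of clusters whose title or summary contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for cluster in clusters:
        for token in tokenize(cluster["title"] + " " + cluster["summary"]):
            index[token].add(cluster["id"])
    return {token: sorted(ids) for token, ids in sorted(index.items())}


def search(index: Dict[str, List[str]], query: str) -> List[str]:
    """AND query: ids of clusters that contain every query token."""
    tokens = tokenize(query)
    if not tokens:
        return []
    result = set(index.get(tokens[0], []))
    for token in tokens[1:]:
        result &= set(index.get(token, []))
    return sorted(result)


if __name__ == "__main__":
    # Hypothetical clusters; real ones would come from the clustering step.
    clusters = [
        {"id": "c1", "title": "New open-weights model released", "summary": "Benchmarks and licensing details."},
        {"id": "c2", "title": "Chip export rules updated", "summary": "Impact on model training."},
    ]
    index = build_index(clusters)
    # Compact JSON payload the build would write to a file (name is up to you).
    payload = json.dumps(index, separators=(",", ":"))
    print(search(index, "model"))           # ['c1', 'c2']
    print(search(index, "model training"))  # ['c2']
```

Precomputing the index keeps the page free of any search dependency; once the JSON grows too large to download comfortably in one request, that would be a natural point to move to a proper search backend.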