Hi HN! I’m back with another “what they don’t teach you in school” style course that I’d love to share with the community. (A couple of years ago, I was part of the team that taught Missing Semester, an IAP class on programmer tools that weren’t covered in any CS courses at MIT: <a href="https://news.ycombinator.com/item?id=22226380" rel="nofollow">https://news.ycombinator.com/item?id=22226380</a>.)<p>MIT, like most universities, has many courses on machine learning (6.036, 6.867, and many others). Those classes teach techniques for producing effective models for a given dataset, and they focus heavily on the mathematical details of models rather than practical applications. However, in real-world applications of ML, the dataset is not fixed, and improving the data often gives better results than improving the model. We’ve personally seen this time and time again in our applied ML work as well as our research.<p>Data-Centric AI (DCAI) is an emerging science that studies techniques to improve datasets in a systematic/algorithmic way. Since this topic wasn’t covered in the standard curriculum, we (a group of PhD candidates and grads) thought that we should put together a new class! We taught this intensive 2-week course in January during MIT’s IAP term, and we’ve just published all the course material, including lecture videos, lecture notes, hands-on lab assignments, and lab solutions, in hopes that people outside the MIT community will find these resources useful.<p>We’d be happy to answer any questions related to the class or DCAI in general, and we’d love to hear any feedback on how we can improve the course material. Introduction to Data-Centric AI is open-source opencourseware, so feel free to make improvements directly: <a href="https://github.com/dcai-course/dcai-course">https://github.com/dcai-course/dcai-course</a>.