It's open-ended but needs to use some of the following methods: KNN, LDA, QDA, Lasso, Ridge Regression, PCA, tree-based methods (Random Forest).<p>Topic ideas I've had so far: (1) predicting which HN posts will become popular, (2) predicting how many stars a GitHub repo will get, (3) predicting whether a Stack Overflow answer will become the top answer.<p>I've also thought of doing some type of code quality project, but there doesn't seem to be a dataset with code quality metrics.