As someone who has played with writing trading bots but never traded them with real money, some advice: if your results seem too good to be true, they probably are. Your trading bot may be doing unrealistic things, or its results may not be reliable, if any of the following are true:<p>- You are trading in a market with low liquidity or one that is controlled by a small number of market participants. I'm not an expert, but I think this would apply more to markets like penny stocks and less to big markets like forex for major currency pairs<p>- You are not taking transaction costs into account or not doing so properly<p>- Your bot makes a low number of trades, making the results close or equivalent to lucky coin flips<p>- Your bot is simply making trades that cannot be executed, or may be doing simulated trades of something that is not actually tradable. This applies to a large number of research papers that assume you can just buy and trade the S&P 500 itself. You can trade ETFs that are tied to an index, but an index is not a tradable instrument in and of itself. Once you realize this, a lot of papers seem very weird<p>- You are not modelling other aspects of the trading process realistically, such as assuming the bot has infinite funds to trade, allowing it to take unlimited losses and continue trading when in reality you'd be hit with a margin call and your trading would be stopped<p>- Your code is committing any number of data snooping errors where the bot is asked to trade at time A (say the open of a trading session) but has access to future data (say the closing price of that day, future data that would not actually exist in a live environment)<p>- Depending on what you believe about how market conditions change over time, your bot may have worked in the past but would not work if used today.
I.e., the market may have adapted to whatever edge your bot may have discovered<p>There are probably lots more pitfalls I don't even know about since I'm not an actual trader.<p>I'm not discouraging anyone from playing around or trying things, of course. I think it's great fun, which is why I do it.<p>Here's the good news: if you realize you don't actually have an edge and avoid risking your hard-earned money, you come out ahead of almost all people who ever trade.
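To make the data snooping and transaction cost pitfalls concrete, here is a minimal toy backtest sketch. Everything in it is hypothetical: the `Bar` type, the "go long if the reference day closed up" rule, and the proportional `cost`. The `peek` flag reproduces the bug from the list: deciding at today's open using today's close.

```python
from dataclasses import dataclass


@dataclass
class Bar:
    open: float
    close: float


def backtest(bars, cost=0.0, peek=False):
    """Toy long-or-flat backtest that decides at each day's open.

    Hypothetical rule: go long from open to close if the reference bar closed
    above its open. With peek=False the reference is yesterday's bar, which is
    known at today's open. With peek=True it is today's bar, whose close does
    not exist yet at the open: a look-ahead (data snooping) bug.
    `cost` is a proportional transaction cost charged on entry and on exit.
    """
    equity = 1.0
    for i in range(1, len(bars)):
        ref = bars[i] if peek else bars[i - 1]
        if ref.close > ref.open:
            equity *= bars[i].close / bars[i].open  # hold from open to close
            equity *= (1 - cost) ** 2               # pay to get in and to get out
    return equity


bars = [Bar(100, 101), Bar(101, 100), Bar(100, 102), Bar(102, 101), Bar(101, 103)]
print(backtest(bars, peek=True))             # only ever long on up days: too good to be true
print(backtest(bars, peek=False))            # the same rule without future data
print(backtest(bars, cost=0.01, peek=True))  # costs eat into even the cheating version
```

The only difference between the two runs is which bar the signal reads, which is exactly why this kind of bug is so easy to miss in real code.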
I would add "build a toy regex engine" to the list.<p>A couple of years ago I implemented a toy regex engine from scratch (building NFAs, then turning them into DFAs). I found it an enlightening experience because it showed me that the core principles behind regular languages are fairly simple, although you could spend years optimizing and improving your implementation. How do you deal with Unicode? How do you modify your implementation so that, after a failed match, it knows how many characters it can skip instead of testing every single position in a file?<p>It demystified the concept of a regex engine for me, while at the same time making me realize how impressive the advanced, ultra-optimized engines we use and take for granted are.
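A minimal sketch of the NFA-then-DFA route described above: Thompson's construction builds the NFA, and the subset construction turns it into a DFA. It only supports literals, concatenation, `|`, `*` and parentheses (no escapes, character classes, Unicode handling, or skip tables). It also builds the whole DFA eagerly, which can blow up exponentially on some patterns; engines such as RE2 build DFA states lazily instead.

```python
def compile_nfa(pattern):
    """Thompson construction by recursive descent. Returns (edges, start, accept),
    where edges[state] lists (symbol, target) pairs; symbol None means epsilon."""
    edges, pos = [], 0
    def state():
        edges.append([])
        return len(edges) - 1
    def peek():
        return pattern[pos] if pos < len(pattern) else None
    def alt():  # alt := concat ('|' concat)*
        nonlocal pos
        start, end = concat()
        while peek() == "|":
            pos += 1
            s2, e2 = concat()
            s, e = state(), state()
            edges[s] += [(None, start), (None, s2)]
            edges[end].append((None, e))
            edges[e2].append((None, e))
            start, end = s, e
        return start, end
    def concat():  # concat := starred*
        start = end = state()
        while peek() is not None and peek() not in "|)":
            s, e = starred()
            edges[end].append((None, s))
            end = e
        return start, end
    def starred():  # starred := atom '*'*
        nonlocal pos
        start, end = atom()
        while peek() == "*":
            pos += 1
            s, e = state(), state()
            edges[s] += [(None, start), (None, e)]
            edges[end] += [(None, start), (None, e)]
            start, end = s, e
        return start, end
    def atom():  # atom := literal | '(' alt ')'
        nonlocal pos
        c = peek()
        if c == "(":
            pos += 1
            frag = alt()
            if peek() != ")":
                raise ValueError("missing ')'")
            pos += 1
            return frag
        if c is None or c in "|)*":
            raise ValueError(f"unexpected {c!r} at position {pos}")
        pos += 1
        s, e = state(), state()
        edges[s].append((c, e))
        return s, e
    start, accept = alt()
    if pos != len(pattern):
        raise ValueError(f"unexpected {pattern[pos]!r} at position {pos}")
    return edges, start, accept

def eps_closure(edges, states):
    """All NFA states reachable from `states` through epsilon edges."""
    stack, seen = list(states), set(states)
    while stack:
        for sym, dst in edges[stack.pop()]:
            if sym is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return frozenset(seen)

def to_dfa(edges, start, accept):
    """Subset construction: every DFA state is a frozenset of NFA states."""
    d0 = eps_closure(edges, [start])
    trans, seen, todo = {}, {d0}, [d0]
    while todo:
        cur = todo.pop()
        moves = {}
        for s in cur:
            for sym, dst in edges[s]:
                if sym is not None:
                    moves.setdefault(sym, set()).add(dst)
        for sym, dsts in moves.items():
            nxt = eps_closure(edges, dsts)
            trans[cur, sym] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return d0, trans, {d for d in seen if accept in d}

def full_match(pattern, text):
    state, trans, accepting = to_dfa(*compile_nfa(pattern))
    for ch in text:
        state = trans.get((state, ch))
        if state is None:  # no transition: the DFA's implicit dead state
            return False
    return state in accepting
```

For example, `full_match("a(b|c)*d", "abcbd")` walks one DFA transition per character, with no backtracking.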
This article is aimed towards students. It's great advice for students who are in college, know very little, and want to improve their CS skills.<p>It's poor advice for someone who already has a STEM degree and wants to build something useful and profitable. If you already know how these things work, your time is better spent on the "edge of the circle": <a href="http://matt.might.net/articles/phd-school-in-pictures/" rel="nofollow">http://matt.might.net/articles/phd-school-in-pictures/</a> which applies to businesses and startups as well.<p>If you're in the latter group -- you've already got the skills to build real shit. Don't waste your time on homework problems. Find a problem you have and build a solution for it. Don't listen to people who tell you to work on homework problems that have already been solved; it's a complete waste of your time if you already know the fundamentals.<p>As for stock trading bots -- if you don't have a mathematics degree or equivalent (e.g. having incredible math skills), don't even bother. You won't be profitable, and you will learn nothing useful in the process, because you will approach the problem as a naive CS student would. Smarter people than you have made trading bots and have failed miserably. Without having an extremely strong foundation in mathematics, your trading bot will amount to nothing more than a futile exercise in gluing APIs together.
I would strongly recommend building something that you yourself think is cool, and not feeling that you have to conform to what other people tell you to do.
I would also recommend Tetris to anyone interested in games. It's a simple concept that turns out to be trickier than expected once you have to work out the details of how it all comes together.
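Here is a minimal sketch of the fiddly core: rotation, collision checks, hard drop, and line clearing. The board representation (a list of rows of 0/1), the piece offsets, and the one-column "kick" in `try_rotate` are all my own simplifications. Official Tetris games use a defined rotation system (SRS) with per-piece kick tables, which this does not implement.

```python
WIDTH, HEIGHT = 10, 20

# A piece is a list of (row, col) offsets from a pivot; rows grow downward.
T_PIECE = [(0, -1), (0, 0), (0, 1), (1, 0)]


def rotate_cw(cells):
    """Rotate 90 degrees clockwise around the pivot: (r, c) -> (c, -r)."""
    return [(c, -r) for r, c in cells]


def fits(board, cells, row, col):
    """True if the piece at (row, col) stays inside the well and overlaps nothing.
    Cells above the top edge (negative rows) are allowed while spawning."""
    for r, c in cells:
        rr, cc = row + r, col + c
        if not (0 <= cc < WIDTH and rr < HEIGHT):
            return False
        if rr >= 0 and board[rr][cc]:
            return False
    return True


def try_rotate(board, cells, row, col):
    """Rotate if it fits, else try nudging one column left or right.
    A crude stand-in for a real kick table."""
    rotated = rotate_cw(cells)
    for dc in (0, -1, 1):
        if fits(board, rotated, row, col + dc):
            return rotated, col + dc
    return cells, col


def hard_drop(board, cells, row, col):
    """Move the piece down until the next step would collide."""
    while fits(board, cells, row + 1, col):
        row += 1
    return row


def lock_and_clear(board, cells, row, col):
    """Write the piece into the board, remove full rows, and add empty rows on top.
    Returns (new_board, number_of_cleared_lines)."""
    for r, c in cells:
        if row + r < 0:
            raise RuntimeError("piece locked above the top: game over")
        board[row + r][col + c] = 1
    kept = [line for line in board if not all(line)]
    cleared = HEIGHT - len(kept)
    return [[0] * WIDTH for _ in range(cleared)] + kept, cleared


board = [[0] * WIDTH for _ in range(HEIGHT)]
board[HEIGHT - 1] = [1] * WIDTH
board[HEIGHT - 1][4] = 0                   # a one-cell gap in the bottom row
row = hard_drop(board, T_PIECE, 0, 4)      # the T's stem drops into the gap
board, cleared = lock_and_clear(board, T_PIECE, row, 4)
```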
Here's an open-ended programming project which, in a certain formal sense, spans the entire range of all difficulty levels: write an "intuitive ordinal notation" for as large an ordinal number as you can.<p>What is an "intuitive ordinal notation"? Definition: The set of intuitive ordinal notations is the smallest set P of computer programs with the following property. For every computer program p, if, when p is run, all of p's outputs are elements of P, then p is in P.<p>So "End.", the program which immediately ends with no outputs, is vacuously in P (all of its outputs are in P, because it has no outputs). It notates the ordinal 0. Likewise, "Print(`End.')" is in P, because its sole output, "End.", is in P; it notates the ordinal 1. Likewise, "Print(`Print(`End.')')" is in P, notating the ordinal 2. And so on.<p>The above can be short-circuited:
"Let X=`End.'; While(True){Print(X); X=`Print(\`'+X+`\')'}".
This program outputs "End.", "Print(`End.')", "Print(`Print(`End.')')", and so on forever, all of which are in P, so this program itself is in P. It notates omega, the smallest infinite ordinal.<p>Here's a library of examples in Python, currently going up to a notation for the ordinal omega^omega: <a href="https://github.com/semitrivial/IONs" rel="nofollow">https://github.com/semitrivial/IONs</a>
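A minimal Python sketch of the same construction. It is not the API of the linked IONs library, which I haven't reproduced. Here a "program" is modelled as a zero-argument function returning an iterable of programs (its outputs), instead of source text that prints source text. The rule in `finite_value`, that a program notates the least ordinal greater than the ordinals of all its outputs, is my reading; it agrees with the examples above (End. is 0, Print(End.) is 1, the looping program is omega).

```python
from itertools import islice


def end():
    """'End.': no outputs, so it is vacuously in P. Notates 0."""
    return iter(())


def succ(p):
    """'Print(p)': its only output is p. Notates (ordinal of p) + 1."""
    return lambda: iter([p])


def plus_omega(p):
    """Outputs p, succ(p), succ(succ(p)), ... forever. Notates (ordinal of p) + omega."""
    def program():
        q = p
        while True:
            yield q
            q = succ(q)
    return program


omega = plus_omega(end)            # End., Print(End.), ...: notates omega
omega_times_2 = plus_omega(omega)  # omega, omega + 1, ...: notates omega * 2


def finite_value(p):
    """Ordinal notated by p, assuming p and everything it outputs halt after finitely
    many outputs: the least ordinal above every output's ordinal (0 if none).
    It never returns for programs like omega that print forever."""
    return max((finite_value(q) + 1 for q in p()), default=0)


print(finite_value(succ(succ(end))))                  # Print(`Print(`End.')')
print([finite_value(q) for q in islice(omega(), 5)])  # the first outputs of omega
```

Going further than `plus_omega` means writing programs whose outputs are themselves infinite notations, which is where the difficulty ramps up.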
Building a distributed key-value store is a fun project and teaches you tons of real-world stuff. It's a great excuse to survey the design decisions required to build a distributed system, and it will truly help you understand why NoSQL DBs scale more easily than relational ones and what kinds of tradeoffs they make to achieve that.
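One of the first design decisions is how to split keys across nodes. Below is a minimal sketch of consistent hashing with virtual nodes, one common answer to that question (Amazon's Dynamo popularized it). The node names, the 100 virtual nodes per server, and the use of MD5 purely for spreading keys are illustrative assumptions. Replication, membership, and failure detection are left out.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # MD5 is used only to spread keys evenly around the ring, not for security
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    """Nodes own arcs of a hash ring, so adding or removing a node only moves
    the keys on the arcs that node takes over or gives up."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted hashes of virtual nodes
        self._owner = {}   # virtual node hash -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            h = _hash(f"{node}#{i}")
            bisect.insort(self._points, h)
            self._owner[h] = node

    def remove(self, node):
        for i in range(self.vnodes):
            h = _hash(f"{node}#{i}")
            self._points.remove(h)
            del self._owner[h]

    def node_for(self, key):
        # first virtual node clockwise from the key's hash, wrapping around the ring
        i = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[i]]


ring = HashRing(["a", "b", "c"])
keys = [f"k{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add("d")
moved = [k for k in keys if ring.node_for(k) != before[k]]
```

With naive `hash(key) % num_nodes`, adding a fourth node would move roughly three quarters of the keys. Here only about a quarter move, and all of them move to the new node.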
Instead of a stock trading bot, go for daily fantasy sports contests. They can cover pretty much every part of programming.<p>Web scraping to gather data, databases for storing it, ML for analyzing it, and front-end and back-end web dev to show the daily information and adjust.<p>And instead of having to deal with trading regulations, you can enter contests that are really small and easy to get into. There are daily contests for 5 cents an entry, and you can enter 150 optimized lineups from an uploaded CSV for $7.50 a day. You can really learn a ton.
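The "optimized lineups" part is essentially a knapsack problem. Below is a minimal dynamic programming sketch, assuming integer salaries, a single salary cap, and a fixed number of roster slots. Real contests also impose positional requirements, which this ignores, and the players and projections in the example are made up.

```python
def best_lineup(players, slots, cap):
    """players: list of (name, salary, projected_points) with integer salaries.
    Pick exactly `slots` players with total salary <= cap, maximizing projected
    points. Returns (points, names) or None if no valid lineup exists."""
    # best[(players_picked, salary_used)] = (points, names)
    best = {(0, 0): (0.0, ())}
    for name, salary, pts in players:
        # iterate over a snapshot so each player is used at most once (0/1 knapsack)
        for (n, s), (p, names) in list(best.items()):
            n2, s2 = n + 1, s + salary
            if n2 > slots or s2 > cap:
                continue
            if (n2, s2) not in best or p + pts > best[(n2, s2)][0]:
                best[(n2, s2)] = (p + pts, names + (name,))
    options = [v for (n, _), v in best.items() if n == slots]
    return max(options, key=lambda v: v[0]) if options else None


players = [("A", 50, 30.0), ("B", 40, 25.0), ("C", 30, 20.0), ("D", 20, 10.0)]
print(best_lineup(players, slots=2, cap=70))
```

Entering 150 lineups means generating 150 *different* good lineups, for example by re-solving with randomized projections or with exposure limits per player, which is where most of the interesting work is.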
The database project is quite the rabbit hole if you start chasing performance. On this journey I have learned some amazing things about just how fast a 3-4 GHz CPU core actually is.
Fantastic list!<p>By the way, adventofcode.com is currently ongoing. Though the challenges are easy compared to the projects in this list, I highly recommend it. It covers problems you might face in big projects. With these small puzzles it's easy to experiment. It prepares you for bigger things.
IMHO a text-based browser isn't exactly in the "challenging" category, as it basically amounts to stripping all the HTML tags out and doing some very simple transformations (like replacing <br>'s with newlines.) Then again, one of the things I've been working on intermittently for the past few years is a graphical (CSS2+) browser, which is definitely in the challenging category. There are some other public efforts too:<p><a href="https://github.com/lexborisov/Modest" rel="nofollow">https://github.com/lexborisov/Modest</a><p><a href="https://github.com/litehtml" rel="nofollow">https://github.com/litehtml</a><p><a href="https://github.com/ArthurHub/HTML-Renderer" rel="nofollow">https://github.com/ArthurHub/HTML-Renderer</a><p>Along the same lines, some other challenging projects I recommend are to write decoders/renderers for existing formats like MP3, MP4, PDF, etc.
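For reference, the "strip the tags and do simple transformations" approach really is short. Below is a minimal sketch built on Python's standard `html.parser`. The set of block-level tags, the `* ` bullet for list items, and the choice to skip `<script>`/`<style>` contents are my own assumptions. There is no CSS, no tables, and no layout.

```python
import re
from html.parser import HTMLParser

BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li", "tr", "br"}
SKIP_TAGS = {"script", "style"}


class TextRenderer(HTMLParser):
    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as &
        self.parts = []
        self.skip = 0       # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")  # block elements (and <br>) start a new line
        if tag == "li":
            self.parts.append("* ")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip = max(0, self.skip - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip:
            self.parts.append(re.sub(r"\s+", " ", data))  # HTML collapses whitespace

    def text(self):
        lines = (line.strip() for line in "".join(self.parts).splitlines())
        return "\n".join(line for line in lines if line)


def render(html):
    renderer = TextRenderer()
    renderer.feed(html)
    renderer.close()
    return renderer.text()


page = ("<html><head><style>p{}</style></head><body><h1>Title</h1>"
        "<p>Hello <b>world</b> &amp; more</p><ul><li>one</li><li>two</li></ul>"
        "a<br>b<script>x()</script></body></html>")
print(render(page))
```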
I wrote a raytracer in 1996, and a year later used Intel's VTune to speed it up. Just removing unused "return" statements gave me a 3x speed increase. Apparently Borland C/C++ wasn't very smart back then.<p>A fun project I did after that was writing an AI frame language to do goal-stack problem solving, specifically path finding. I connected it to the ray tracer and made movies of spheres having wars. (I used an unlicensed DivX encoder to stitch together thousands of GIFs.)
For some simpler projects, I can highly recommend doing some digital signal processing. For example, an audio signal is just a list of values, so you can do things like:<p>- Count the number of zero crossings
- Find out where they are
- Create any shape of wave by adding together multiple sine waves
- Hard clip the signal
- Stretch a signal and interpolate it with new samples
- Invert and reverse a signal<p>For level 2, you can start processing "live":<p>- Create a sine synthesizer
- Create a small ring buffer of samples
- Find out how to output that audio (system audio, soundcard)
- Add MIDI support
- Add polyphony support<p>DSP gets hard once it has to be in real time and the latency has to be minimal. It's great exercise to mess around with it.
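Since an audio signal is just a list of values, most of the level 1 exercises are a few lines each. A minimal sketch on plain Python lists follows, plus the ring buffer from level 2. The 8000 Hz sample rate is an arbitrary assumption, and real-time playback, MIDI, and polyphony are not covered.

```python
import math

RATE = 8000  # assumed sample rate in samples per second


def sine(freq, seconds, amp=1.0, rate=RATE):
    """A sampled sine wave: just a list of floats."""
    return [amp * math.sin(2 * math.pi * freq * i / rate) for i in range(int(seconds * rate))]


def mix(*signals):
    """Additive synthesis: sum signals sample by sample (truncates to the shortest)."""
    return [sum(samples) for samples in zip(*signals)]


def zero_crossings(x):
    """Indices i where the sign flips between x[i-1] and x[i] (0 counts as positive)."""
    return [i for i in range(1, len(x)) if (x[i - 1] < 0) != (x[i] < 0)]


def hard_clip(x, limit):
    return [max(-limit, min(limit, v)) for v in x]


def invert(x):
    return [-v for v in x]


def reverse(x):
    return x[::-1]


def stretch(x, factor):
    """Resample to int(len(x) * factor) samples, linearly interpolating the new ones."""
    out = []
    for j in range(int(len(x) * factor)):
        pos = j / factor
        i, frac = int(pos), pos - int(pos)
        nxt = x[i + 1] if i + 1 < len(x) else x[i]  # hold the last sample at the end
        out.append(x[i] * (1 - frac) + nxt * frac)
    return out


class RingBuffer:
    """Fixed-size sample buffer for live processing: new samples overwrite the oldest."""

    def __init__(self, size):
        self.buf = [0.0] * size
        self.pos = 0  # next write position

    def push(self, sample):
        self.buf[self.pos] = sample
        self.pos = (self.pos + 1) % len(self.buf)

    def latest(self, n):
        """The last n samples written, oldest first (n <= size)."""
        return [self.buf[(self.pos - n + k) % len(self.buf)] for k in range(n)]
```

Linear interpolation is the simplest choice for `stretch` and is audibly imperfect; trying better interpolation is a good next exercise.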
> it is really simple to create the basic "database". You can start by using the dictionary data structure that comes with whatever programming language you're using and slap a web API on top of it.<p>Better yet: do it in C. There's no "dictionary" type, so you have to build it yourself. You'll soon discover a whole bunch of misconceptions you had about how those "dictionaries" actually work. After you've spent a good deal of time doing that, you can move on to authentication/authorization, logging, storage, tracing, API management, resource quotas, and a raft of distributed computing issues.<p>I recommend basing it on Consul; it has a better general model than etcd.
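A minimal sketch of what "make the dictionary yourself" involves. It is written in Python to match the other examples here, but it only uses what you'd have in C: a hand-written hash function (64-bit FNV-1a, masked to emulate `uint64_t` overflow) and an array of buckets with separate chaining. Restricting keys to strings, the initial 8 buckets, and the 0.75 load factor are illustrative choices. A C version would also have to manage the bucket memory by hand.

```python
def fnv1a_64(key: str) -> int:
    """64-bit FNV-1a, a simple hash you might write by hand in C."""
    h = 0xCBF29CE484222325  # FNV offset basis
    for byte in key.encode("utf-8"):
        h ^= byte
        h = (h * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF  # FNV prime; wrap like uint64_t
    return h


class ChainedHashMap:
    """Array of buckets, each a list of [key, value] pairs (separate chaining).
    The bucket array doubles when size exceeds 0.75 * number of buckets."""

    def __init__(self, capacity=8):
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0

    def _bucket(self, key):
        return self._buckets[fnv1a_64(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for pair in bucket:
            if pair[0] == key:  # existing key: overwrite in place
                pair[1] = value
                return
        bucket.append([key, value])
        self._size += 1
        if self._size > 0.75 * len(self._buckets):
            self._resize(2 * len(self._buckets))

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

    def delete(self, key):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket.pop(i)
                self._size -= 1
                return True
        return False

    def _resize(self, capacity):
        # every key must be rehashed, because its bucket index depends on the capacity
        old = self._buckets
        self._buckets = [[] for _ in range(capacity)]
        for bucket in old:
            for k, v in bucket:
                self._bucket(k).append([k, v])

    def __len__(self):
        return self._size
```

The resize step is a good example of the misconceptions mentioned above: an insert is usually cheap, but occasionally it rehashes everything.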
Writing a Game Boy emulator has been the most fulfilling and interesting programming project in my life.<p>I love, most of all, how modular the project is. I can do an hour here or there and make meaningful progress.<p>I'm really eager to discover other very large programming projects that break down into sensible bites so well.
I would recommend committing a decent amount of time (e.g. 3 months) to contributing to an open source project you use, especially if you are not familiar with its domain. I learned a lot about modern compilers by contributing to rust-analyzer.
A great compilation:
<a href="https://github.com/danistefanovic/build-your-own-x" rel="nofollow">https://github.com/danistefanovic/build-your-own-x</a>
I recently went through the process of creating a ray tracer project from zero for learning purposes. It was a humbling and eye-opening experience. I've written an article[0] to explain my process in detail if you're interested.<p>[0] <a href="https://alessandrocuzzocrea.com/how-i-made-a-ray-tracer/" rel="nofollow">https://alessandrocuzzocrea.com/how-i-made-a-ray-tracer/</a>
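At the heart of a from-scratch ray tracer is the ray-sphere intersection test: substitute the ray o + t·d into |p − c|² = r² and solve the resulting quadratic for t. A minimal sketch follows, plus a tiny ASCII "renderer" that shoots one ray per character. The camera setup and scene are made-up illustrations; shading, lights, and reflections are where the real project begins.

```python
import math


def ray_sphere(origin, direction, center, radius):
    """Distance t along the ray to the nearest hit in front of the origin, or None.
    Solves |o + t*d - c|^2 = r^2, i.e. a*t^2 + b*t + c = 0."""
    ox, oy, oz = (origin[k] - center[k] for k in range(3))
    dx, dy, dz = direction
    a = dx * dx + dy * dy + dz * dz
    b = 2 * (ox * dx + oy * dy + oz * dz)
    c = ox * ox + oy * oy + oz * oz - radius * radius
    disc = b * b - 4 * a * c
    if disc < 0:
        return None  # the ray misses the sphere
    sq = math.sqrt(disc)
    for t in ((-b - sq) / (2 * a), (-b + sq) / (2 * a)):  # nearer root first
        if t > 1e-9:  # small epsilon avoids re-hitting the surface we start on
            return t
    return None


def render_ascii(width=40, height=20):
    """One ray per character from a pinhole camera at the origin looking down +z."""
    rows = []
    for j in range(height):
        row = ""
        for i in range(width):
            d = ((i - width / 2) / width * 2, (j - height / 2) / height * 2, 1.0)
            row += "#" if ray_sphere((0, 0, 0), d, (0, 0, 3), 1.0) else "."
        rows.append(row)
    return "\n".join(rows)


print(render_ascii())
```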
I'd also recommend writing an emulator for real or fake (e.g. CHIP-8) hardware. It seems complicated but the core loop gets pretty simple. It ends up giving you a much better view of both assembly and pointer semantics (useful for better understanding C).
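To show how simple the core loop is, here is a minimal CHIP-8 sketch covering fetch, decode and execute for a handful of opcodes (00E0, 00EE, 1NNN, 2NNN, 3XNN, 6XNN, 7XNN, 8XY4, ANNN). A working emulator also needs the remaining opcodes (CHIP-8 has 35), DXYN sprite drawing, the 60 Hz delay and sound timers, keypad input, and the built-in font. Raising on unknown opcodes is just my choice here.

```python
class Chip8:
    def __init__(self, rom: bytes):
        self.mem = bytearray(4096)
        self.mem[0x200:0x200 + len(rom)] = rom  # programs are loaded at 0x200
        self.v = [0] * 16  # registers V0..VF (VF doubles as the flag register)
        self.i = 0         # index register
        self.pc = 0x200
        self.stack = []
        self.screen = [[0] * 64 for _ in range(32)]

    def step(self):
        # fetch: each opcode is two bytes, big-endian
        op = (self.mem[self.pc] << 8) | self.mem[self.pc + 1]
        self.pc += 2
        # decode: pull out the fields the opcode table refers to
        x, y = (op >> 8) & 0xF, (op >> 4) & 0xF
        n, nn, nnn = op & 0xF, op & 0xFF, op & 0xFFF
        top = op >> 12
        # execute
        if op == 0x00E0:                 # clear the screen
            self.screen = [[0] * 64 for _ in range(32)]
        elif op == 0x00EE:               # return from subroutine
            self.pc = self.stack.pop()
        elif top == 0x1:                 # jump to NNN
            self.pc = nnn
        elif top == 0x2:                 # call subroutine at NNN
            self.stack.append(self.pc)
            self.pc = nnn
        elif top == 0x3:                 # skip next instruction if VX == NN
            if self.v[x] == nn:
                self.pc += 2
        elif top == 0x6:                 # VX = NN
            self.v[x] = nn
        elif top == 0x7:                 # VX += NN, no carry flag
            self.v[x] = (self.v[x] + nn) & 0xFF
        elif top == 0x8 and n == 0x4:    # VX += VY, VF = carry
            total = self.v[x] + self.v[y]
            self.v[x] = total & 0xFF
            self.v[0xF] = 1 if total > 0xFF else 0
        elif top == 0xA:                 # I = NNN
            self.i = nnn
        else:
            raise NotImplementedError(f"opcode {op:04X} at {self.pc - 2:03X}")


# V0 = 0xFF; V1 = 0x02; V0 += V1 (sets the carry); I = 0x123
c = Chip8(bytes([0x60, 0xFF, 0x61, 0x02, 0x80, 0x14, 0xA1, 0x23]))
for _ in range(4):
    c.step()
```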
Related discussion from the original suggestions a year ago fwiw:<p><a href="https://news.ycombinator.com/item?id=21790779" rel="nofollow">https://news.ycombinator.com/item?id=21790779</a>
I would add a basic feature-complete website which works on every mainstream browser, going all the way back to Mosaic.<p>It's much easier than it may seem, architecting it is interesting, and there is a lot of "last 10%" stuff which keeps it fun as long as you keep going.<p>In the demystifying area, it demystified HTML and JS history for me, forced me to work with a minimal toolkit, and taught me how to build "modern" JS features in ways that don't break browsers which don't support them or have them disabled.
Writing even an extremely simple game without using a game engine or dedicated game library is quite an eye opener.<p>Make a small 2D platform game, and it covers so many areas (and it is a lot of fun!).
Being able to do challenging projects is cold comfort when by far the most challenging project I've ever faced was trying to build something people would pay for.
What I am really missing is some kind of real-time AI. A decade ago, I coded a bot for a first-person shooter with RTS elements and learned so much from it (while having a lot of fun).<p>It starts with basic things like waypoint systems vs. area awareness systems plus the relevant routing algorithms like A*, but goes on to organizing a group of players and finding good strategies. And all of that with a limited time budget and a changing environment around you. Last but not least, you want to emulate human behavior, which is probably the hardest part, as it includes changing your behavior according to your situation (don't run straight into a wall for 10 seconds) but also reproducing human weaknesses, e.g. humans can't aim perfectly.<p>Granted, what I did spans a huge field of challenges, but even with a 2D engine I think you can learn a lot from the experience.
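As a starting point for the routing part, here is a minimal A* sketch on a 4-connected grid. A bot would more likely run it over a waypoint or area graph, but the algorithm is the same. The grid encoding (1 means wall) and the Manhattan heuristic are assumptions that fit this grid. Manhattan distance never overestimates the remaining cost when moves are 4-directional with cost 1, so the returned path is a shortest one.

```python
import heapq


def astar(grid, start, goal):
    """Shortest path on a 4-connected grid where grid[r][c] == 1 is a wall.
    Returns the list of cells from start to goal, or None if unreachable."""
    def h(cell):  # Manhattan distance: admissible for unit-cost 4-way moves
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    rows, cols = len(grid), len(grid[0])
    open_heap = [(h(start), 0, start)]  # entries are (f = g + h, g, cell)
    came_from = {start: None}
    g = {start: 0}
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:  # walk the parent links back to the start
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if cost > g[cell]:
            continue  # stale entry: a cheaper route to this cell was found later
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and not grid[nr][nc]:
                new_cost = cost + 1
                if new_cost < g.get(nxt, float("inf")):
                    g[nxt] = new_cost
                    came_from[nxt] = cell
                    heapq.heappush(open_heap, (new_cost + h(nxt), new_cost, nxt))
    return None


grid = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
print(astar(grid, (0, 0), (2, 0)))
```

In a real-time game you would also cap the number of expanded nodes per frame, which ties into the limited time budget mentioned above.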
I would add an emulator to the list. I've always struggled to figure out how these are built from scratch.<p>A long time ago I wanted to code a Neo Geo emulator but gave up before I even started; I didn't have a clue where to begin.<p>I am amazed at anyone who can code an emulator from scratch.
another one related to stock trading, but perhaps more interesting: build a simulator for a sport. both baseball and darts lend themselves to Markov models, and are simple enough to simulate in some detail. with darts, you can get <i>very</i> close to as accurate as possible. baseball has more weird complications because of the rules. but it's fun to do, and to compare against old games to see how well your model does.
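A minimal sketch of the baseball version: the state is (outs, bases occupied), each plate appearance is a random transition, and a Monte Carlo run estimates expected runs per half-inning. The outcome probabilities are made up, not real league data, and the advancement rules are deliberately simplified (no double plays, sacrifice flies, errors, or steals). Those missing rules are exactly the "weird complications" you'd add when comparing against real games.

```python
import random

# Hypothetical per-plate-appearance probabilities, NOT real league data.
OUTCOMES = {"out": 0.68, "walk": 0.09, "single": 0.15,
            "double": 0.05, "triple": 0.005, "homer": 0.025}
HIT_BASES = {"single": 1, "double": 2, "triple": 3, "homer": 4}


def advance(bases, outcome):
    """bases = (first, second, third) occupancy. Returns (new_bases, runs_scored).

    Simplified rules: on a hit every runner moves exactly as many bases as the
    batter, a walk moves only forced runners, and outs never advance anyone."""
    first, second, third = bases
    if outcome == "out":
        return bases, 0
    if outcome == "walk":
        runs = int(first and second and third)
        return (True, first or second, third or (first and second)), runs
    k = HIT_BASES[outcome]
    runners = [0] + [b + 1 for b, on in enumerate(bases) if on]  # 0 = the batter
    moved = [r + k for r in runners]
    runs = sum(1 for r in moved if r >= 4)  # base 4 = home plate
    return (1 in moved, 2 in moved, 3 in moved), runs


def simulate_half_inning(rng):
    """Walk the (outs, bases) Markov chain until the third out."""
    names, weights = list(OUTCOMES), list(OUTCOMES.values())
    outs, runs, bases = 0, 0, (False, False, False)
    while outs < 3:
        outcome = rng.choices(names, weights)[0]
        if outcome == "out":
            outs += 1
        bases, scored = advance(bases, outcome)
        runs += scored
    return runs


def expected_runs(innings=100_000, seed=0):
    rng = random.Random(seed)
    return sum(simulate_half_inning(rng) for _ in range(innings)) / innings


print(expected_runs())
```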
Thanks for following up with this list after your previous one. I spent a great deal of my time (including some office time) on writing a Chip-8 emulator thanks to your previous list :D
My own favorite is an equation parser.<p>Before attempting it, I thought it was implemented as a simple scan over the string, maybe with a bunch of regex stuff. I guess it can be done that way, at the cost of growing complexity; but the proper solution (with a stack, etc.) is so elegant (making it easy to add functions, operators, parentheses, variables, etc.) that it really makes one appreciate the value of good, thoughtful engineering.
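The stack-based approach is usually Dijkstra's shunting-yard algorithm: convert infix to reverse Polish notation with an operator stack, then evaluate the RPN with a value stack. A minimal sketch follows. The operator table, the three functions, and treating `^` as right-associative are my choices, and unary minus is not supported. Adding an operator or function is a one-line change to a table, which is the elegance described above.

```python
import math
import operator
import re

BINARY = {"+": (1, operator.add), "-": (1, operator.sub),
          "*": (2, operator.mul), "/": (2, operator.truediv),
          "^": (3, operator.pow)}  # operator -> (precedence, function)
RIGHT_ASSOC = {"^"}
FUNCTIONS = {"sqrt": math.sqrt, "sin": math.sin, "cos": math.cos}
TOKEN = re.compile(r"\d+(?:\.\d*)?|[A-Za-z_]\w*|\S")


def to_rpn(expr):
    """Shunting-yard: infix tokens -> reverse Polish notation via an operator stack."""
    out, stack = [], []
    for tok in TOKEN.findall(expr):
        if tok[0].isdigit():
            out.append(float(tok))
        elif tok in FUNCTIONS:
            stack.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            out.append(tok)  # variable name, looked up during evaluation
        elif tok in BINARY:
            prec = BINARY[tok][0]
            # pop operators that bind tighter (or equally, for left-associative ones)
            while stack and stack[-1] in BINARY:
                top_prec = BINARY[stack[-1]][0]
                if top_prec > prec or (top_prec == prec and tok not in RIGHT_ASSOC):
                    out.append(stack.pop())
                else:
                    break
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack and stack[-1] != "(":
                out.append(stack.pop())
            if not stack:
                raise ValueError("mismatched ')'")
            stack.pop()  # discard the "("
            if stack and stack[-1] in FUNCTIONS:
                out.append(stack.pop())  # the function applies to the parenthesized value
        else:
            raise ValueError(f"unexpected token {tok!r}")
    while stack:
        if stack[-1] == "(":
            raise ValueError("mismatched '('")
        out.append(stack.pop())
    return out


def evaluate(expr, variables=None):
    variables = variables or {}
    stack = []
    for tok in to_rpn(expr):
        if isinstance(tok, float):
            stack.append(tok)
        elif tok in BINARY:
            b, a = stack.pop(), stack.pop()  # right operand is on top
            stack.append(BINARY[tok][1](a, b))
        elif tok in FUNCTIONS:
            stack.append(FUNCTIONS[tok](stack.pop()))
        else:
            stack.append(variables[tok])
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


print(evaluate("sqrt(x * 4) + 2 ^ 3", {"x": 9}))
```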
Interesting projects.
I might try ray tracing in Python, as I've also been exploring CGI a lot lately.<p>Has anyone tried CGI? If so, how has your experience been so far?
What do folks think about implementing a web crawler that you can send to a website and it indexes every internal url on the site. I remember sitting down to write one 100 years ago now, and finding it to be much trickier than I thought it would be.
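A minimal breadth-first crawler sketch using only the standard library. The page fetcher is passed in as a function (`fetch(url)` returning HTML or None), which is my own design choice so it can be swapped for a real HTTP client or a test stub. "Internal" here means same scheme family (http/https) and exact same host. Links are resolved with `urljoin` and stripped of `#fragments`. The tricky parts this leaves out include robots.txt, rate limiting, redirects, URL normalization (trailing slashes, query parameters), and non-HTML responses.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, limit=1000):
    """Breadth-first crawl of internal URLs reachable from start_url.
    fetch(url) must return the page's HTML as a string, or None on failure."""
    host = urlparse(start_url).netloc
    start = urldefrag(start_url).url
    seen, queue = {start}, deque([start])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urldefrag(urljoin(url, href)).url  # resolve relative links, drop #fragment
            parsed = urlparse(link)
            internal = parsed.scheme in ("http", "https") and parsed.netloc == host
            if internal and link not in seen and len(seen) < limit:
                seen.add(link)
                queue.append(link)
    return seen


pages = {  # a fake three-page site standing in for real HTTP requests
    "http://example.com/": '<a href="/a">A</a> <a href="http://other.com/x">external</a>',
    "http://example.com/a": '<a href="b#top">B</a><a href="/">home</a>'
                            '<a href="mailto:me@example.com">mail</a>',
    "http://example.com/b": '<a href="/a">A again</a>',
}
print(sorted(crawl("http://example.com/", pages.get)))
```

For a real site, `fetch` could wrap `urllib.request.urlopen` with error handling and a delay between requests.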
>> automate testing on historical data over long periods of time<p>I want to try this. Where can you get historical pricing data that includes intraday price changes, not just end-of-day prices?