If you feel like you've finally grokked massively parallel software programming on GPUs and want a new challenge, I highly recommend playing around with digital circuits! The level of parallelism available to you in hardware is truly unmatched, and it's incredibly fun, especially once you start really pushing your designs on FPGAs. Granted, an FPGA implementation is frequently less useful than what you could do on a GPU, because ASICs like GPUs run at much higher clock speeds (if your GPU core clock is 3GHz and your FPGA design maxes out at 500MHz [which would be admirable!], the GPU gets 6x as many cycles in which to match or beat your implementation!).