I'm curious about the opportunities for co-design of hardware and algorithms, especially now that we're building more ASICs for neural networks - they're an increasingly common workload.

Things like grouped convolutions were invented for AlexNet as a practical engineering step around limited GPU memory, but ended up offering nice cost/accuracy trade-offs (a rough sketch of the parameter savings is below).

Perhaps algorithms will move too fast for dedicated hardware to be worth it, but some primitives should stay relevant long enough to be worth baking into whatever hardware we end up using - see Nvidia Tensor Cores, which also include things like sparsity support.
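To make the trade-off concrete, here's a minimal sketch (assuming PyTorch; the channel counts and group sizes are illustrative, not taken from any particular paper) showing how the same layer shape with groups > 1 uses 1/groups as many weights:

    import torch.nn as nn

    def conv_param_count(groups: int) -> int:
        # 3x3 conv, 256 -> 256 channels; grouped convs split the channels
        # into `groups` independent slices, so weights shrink by 1/groups.
        conv = nn.Conv2d(256, 256, kernel_size=3, groups=groups, bias=False)
        return sum(p.numel() for p in conv.parameters())

    # g=2 mirrors AlexNet's two-GPU split; g=32 is ResNeXt-style cardinality.
    for g in (1, 2, 32):
        print(f"groups={g:>2}: {conv_param_count(g):,} weights")

That prints roughly 590K, 295K, and 18K weights respectively - the knob you can turn to trade compute/memory against accuracy.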