This is so great! Frankly, I believe that this kind of low-parameter-count, high-complexity optimization task is the kind least suited to SGD: there are bad local optima everywhere. But I didn't let that opinion spoil the fun:<p>I replaced the Chamfer distance with the unbiased Sinkhorn divergence (via GeomLoss), bumped the arity to 4, moved the randomness out of the training loop (with the goal of making training more stable), and added an LR scheduler.<p>Here's my notebook:
<a href="https://colab.research.google.com/drive/154ffvEWpD7tTW_AIqTDWq6kkeGgmhf7P" rel="nofollow">https://colab.research.google.com/drive/154ffvEWpD7tTW_AIqTD...</a><p>This tree parameter set is quite nice and interpretable:
<a href="https://users.renyi.hu/~daniel/tmp/ifs-christmas-tree-arity-4.gif" rel="nofollow">https://users.renyi.hu/~daniel/tmp/ifs-christmas-tree-arity-...</a>
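<p>For anyone wondering what the loss swap amounts to, here is a minimal pure-Python sketch comparing the Chamfer distance with a debiased Sinkhorn divergence on two tiny point clouds. It is not the notebook's code: the notebook goes through GeomLoss, while the function names, the toy points, the uniform weights, the cost 1/2 |x - y|^2, and the values of `eps` and `iters` below are all assumptions made for illustration.

```python
import math

def half_sq_dist(p, q):
    # Cost 0.5 * |p - q|^2 (an assumed convention for this sketch).
    return 0.5 * sum((a - b) ** 2 for a, b in zip(p, q))

def chamfer(xs, ys):
    # Mean squared distance to the nearest neighbour, in both directions.
    d = lambda p, q: 2.0 * half_sq_dist(p, q)
    fwd = sum(min(d(x, y) for y in ys) for x in xs) / len(xs)
    bwd = sum(min(d(y, x) for x in xs) for y in ys) / len(ys)
    return fwd + bwd

def softmin(eps, costs, potentials, log_w):
    # -eps * log sum_k exp(log_w[k] + (potentials[k] - costs[k]) / eps),
    # computed with the usual max-shift for numerical stability.
    vals = [lw + (p - c) / eps for c, p, lw in zip(costs, potentials, log_w)]
    m = max(vals)
    return -eps * (m + math.log(sum(math.exp(v - m) for v in vals)))

def ot_eps(xs, ys, eps, iters=500):
    # Entropic OT cost between uniform point clouds, via log-domain
    # Sinkhorn iterations on the dual potentials f and g.
    n, m = len(xs), len(ys)
    cost = [[half_sq_dist(x, y) for y in ys] for x in xs]
    log_a = [-math.log(n)] * n
    log_b = [-math.log(m)] * m
    f, g = [0.0] * n, [0.0] * m
    for _ in range(iters):
        f = [softmin(eps, cost[i], g, log_b) for i in range(n)]
        g = [softmin(eps, [cost[i][j] for i in range(n)], f, log_a)
             for j in range(m)]
    # Dual objective <a, f> + <b, g>; the exponential term vanishes
    # because g was updated last.
    return sum(f) / n + sum(g) / m

def sinkhorn_divergence(xs, ys, eps=0.25):
    # Debiasing: subtract the two self-transport terms so that S(x, x) = 0.
    return (ot_eps(xs, ys, eps)
            - 0.5 * ot_eps(xs, xs, eps)
            - 0.5 * ot_eps(ys, ys, eps))

# Toy data (hypothetical): a triangle and the same triangle shifted by 0.5.
tree = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]
shifted = [(x + 0.5, y) for x, y in tree]

print("chamfer :", chamfer(tree, shifted))
print("sinkhorn:", sinkhorn_divergence(tree, shifted))
```

Chamfer only looks at nearest neighbours, whereas the Sinkhorn divergence transports all of the mass, so every point contributes to the gradient. With this quadratic cost, a pure translation by t should give a debiased value close to 1/2 |t|^2 once the iterations have converged. In GeomLoss the corresponding entry point is, as far as I know, `SamplesLoss("sinkhorn", p=2, blur=...)`; the exact settings used in the notebook are not reproduced here.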