We published a notebook and a GitHub repo that help you train synthetic models on high-dimensional datasets (e.g. thousands of columns and millions of records). The approach uses Gretel's open source header clustering to group correlated columns, then parallelizes training across multiple GPUs.
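To make the idea concrete, here is a minimal, standard-library sketch of the general pattern: group columns by correlation, then train one model per group in parallel. It is not Gretel's header clustering implementation or API. The function names (`cluster_columns`, `train_cluster`, `train_all`), the greedy grouping rule, the `threshold` and `max_cluster_size` parameters, and the placeholder training step are all assumptions made for illustration.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        return 0.0
    return cov / (std_x * std_y)


def cluster_columns(columns, max_cluster_size=2, threshold=0.8):
    """Greedy grouping (hypothetical rule): each unassigned column seeds a cluster
    and pulls in the unassigned columns most correlated with it."""
    remaining = list(columns)
    clusters = []
    while remaining:
        seed = remaining.pop(0)
        cluster = [seed]
        # Rank the other columns by absolute correlation with the seed.
        ranked = sorted(
            remaining,
            key=lambda name: abs(pearson(columns[seed], columns[name])),
            reverse=True,
        )
        for name in ranked:
            if len(cluster) >= max_cluster_size:
                break
            if abs(pearson(columns[seed], columns[name])) >= threshold:
                cluster.append(name)
                remaining.remove(name)
        clusters.append(cluster)
    return clusters


def train_cluster(job):
    """Placeholder for training one model on one column group.
    A real version would pin the job to the GPU identified by worker_id."""
    worker_id, cluster = job
    return {"worker": worker_id, "columns": cluster}


def train_all(clusters, num_workers=2):
    """Assign clusters to workers round-robin and run the jobs concurrently."""
    jobs = [(i % num_workers, cluster) for i, cluster in enumerate(clusters)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(train_cluster, jobs))
```

In practice each worker would be a separate process bound to its own GPU, and the per-cluster synthetic outputs would be joined column-wise to rebuild the full-width dataset. The thread pool here only stands in for that scheduling step.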