As someone who has spent a <i>lot</i> of time working with text-generating neural networks (<a href="https://github.com/minimaxir/textgenrnn" rel="nofollow">https://github.com/minimaxir/textgenrnn</a>), I have a few quick comments.<p>1) The input dataset from Memegenerator is a bit weird. More importantly, <i>it does not distinctly identify top and bottom texts</i> (some examples use a capital letter to signify the start of the bottom text, but that convention isn't consistent). A good technique when encoding text for this kind of task is to use a control token (e.g. a newline) to mark that boundary. (The conclusion notes this problem: "One example would be to train on a dataset that includes the break point in the text between upper and lower for the image. These were chosen manually here and are important for the humor impact of the meme.")<p>2) The use of GloVe embeddings doesn't make as much sense here, even as a base. Pretrained embeddings generally work best on text that follows real-world word usage, which memes do not; in this case, it's better to let the network train the embeddings from scratch.<p>3) A 512-cell LSTM might be too big for a word-level model of that size; since the text follows fairly rigid rules, a 256-cell Bidirectional LSTM might work better.
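<p>To make point 1 concrete, here's a rough sketch of the control-token idea (the function name and token choice are just illustrative, not how the paper encodes its data):

  # Join the two caption halves with an explicit separator so the model
  # can learn where the bottom text starts.
  BREAK_TOKEN = "\n"  # any reserved token works, e.g. "<break>"

  def encode_caption(top_text, bottom_text):
      return top_text.lower() + BREAK_TOKEN + bottom_text.lower()

  encoded = encode_caption("ONE DOES NOT SIMPLY", "GENERATE MEMES WITH AN LSTM")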
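<p>And for points 2 and 3, a minimal Keras-style sketch of the kind of architecture I mean (vocab size, embedding dimension, and sequence length are placeholder assumptions, not values from the paper):

  from tensorflow.keras.models import Sequential
  from tensorflow.keras.layers import Input, Embedding, Bidirectional, LSTM, Dense

  vocab_size = 20000  # assumption: vocabulary of the caption corpus
  seq_len = 40        # assumption: tokens of context fed to the model

  model = Sequential([
      Input(shape=(seq_len,)),
      Embedding(vocab_size, 100),        # embeddings learned from scratch, no GloVe init
      Bidirectional(LSTM(256)),          # 256 cells per direction
      Dense(vocab_size, activation="softmax"),  # next-word prediction
  ])
  model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")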