As usual (projects like this seem to pop up every couple of months), the result sounds awfully unstructured and unenjoyable, and could just as well have been achieved by a random walk through a musical scale; if you put some basic music theory into program form, you get these small harmonic structures pretty easily.
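To make that concrete, here is a minimal sketch of such a scale walk. The note numbers, step weights, and stepwise-motion bias are all my own illustrative choices, not anything from the article:

    import random

    # C major scale as MIDI note numbers across one octave (illustrative).
    SCALE = [60, 62, 64, 65, 67, 69, 71, 72]

    def random_walk_melody(length=16, seed=None):
        """Generate a melody by a weighted random walk over scale degrees.

        Favouring small steps over leaps is already enough 'music theory'
        to produce locally plausible little phrases.
        """
        rng = random.Random(seed)
        degree = rng.randrange(len(SCALE))
        melody = [SCALE[degree]]
        for _ in range(length - 1):
            # Prefer stepwise motion: steps of +/-1 are more likely than leaps.
            step = rng.choices([-2, -1, 1, 2], weights=[1, 4, 4, 1])[0]
            degree = min(max(degree + step, 0), len(SCALE) - 1)
            melody.append(SCALE[degree])
        return melody

    print(random_walk_melody(seed=42))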
Like other people have already mentioned, the part that usually gets neglected is the overarching dramatic structure of a musical piece. Compare a complete Shakespeare play to a pile of randomly thrown-together half-sentences.

I don't fully understand the fascination with generating music via some AI-neural-learning buzzword-bingo technique that always gets kickstarted by brute-force analysis of a human-made music corpus.

What is much more interesting in a musical sense is to generate _new_ music that cannot be composed by a human and cannot be played by a human. That's playing to the strengths of the machines: sonification of large datasets, sonification of function behaviour, sonification of the binary world, which is so different from ours. This is much more interesting than the tenth failed emulation of a simple folk song.

Nevertheless, as a student's piece about programming neural networks it's certainly fine, and the presentation is nice, but the result is uninspiring, like building a car tire out of bananas just because it's possible. Just let the folk songs belong to the actual folk.

As a side note: what would happen if the result were millions of super-nice catchy folk tunes at a button press? Would it be the end of pop music as we know it? Maybe I'd retract my opinion.
I'm quite sure I've seen other attempts to generate music with RNNs recently, though I don't remember exactly where anymore. You don't cite many references to other approaches, only the one from Boulanger-Lewandowski from 2012.

I did a quick search, and I probably missed a lot, but I found these:

http://papers.nips.cc/paper/5655-deep-temporal-sigmoid-belief-networks-for-sequence-modeling

http://dl.acm.org/citation.cfm?id=2806383

http://gitxiv.com/posts/WEoQCj8hxHz6vPxe6/gruv-algorithmic-music-generation-using-recurrent-neural

https://github.com/MattVitelli/GRUV

http://www.hexahedria.com/2015/08/03/composing-music-with-recurrent-neural-networks/

https://news.ycombinator.com/item?id=10028878

I haven't really looked into any of these, so I'm not sure about the differences. But it would be good if you cited some relevant related work and pointed out how your approach differs.
In my opinion, anyone who works on music generation should take a look at Karma (http://karma-lab.com/) for a baseline of what can be achieved by simple math and plain old programming. It's probably not particularly interesting from a programmer's perspective (it's closed-source and, to the best of my knowledge, doesn't use anything fancy), but the end results are spectacular and are used in real music.
I would love to see a real-time bebop improvisation generator in the style of Charlie Parker, Sonny Rollins, Bill Evans, et al. Bebop is definitely a musical (jazz) language that I bet would be well suited to RNNs.
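Not the article's setup, but a minimal sketch of the starting point, assuming the solos were already transcribed into integer note tokens; the corpus below is random stand-in data, and VOCAB and SEQ_LEN are arbitrary choices:

    import numpy as np
    from tensorflow import keras

    VOCAB = 128   # hypothetical: one token per MIDI pitch
    SEQ_LEN = 32  # context length for next-note prediction

    # Stand-in for a real corpus of transcribed solos, flattened to tokens.
    corpus = np.random.randint(0, VOCAB, size=10_000)

    # Build (context window, next note) training pairs.
    X = np.stack([corpus[i:i + SEQ_LEN] for i in range(len(corpus) - SEQ_LEN)])
    y = corpus[SEQ_LEN:]

    model = keras.Sequential([
        keras.layers.Embedding(VOCAB, 64),
        keras.layers.LSTM(128),
        keras.layers.Dense(VOCAB, activation="softmax"),
    ])
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
    model.fit(X, y, epochs=5, batch_size=64)

Real-time improvisation would then mean sampling from the softmax output one note at a time; conditioning that sampling on the chord changes is presumably the hard part.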
You could feed Lilypond [0] files into the network. You might gain more long-term structure that way; a data-preparation sketch follows after the footnote.

It looks like this, and you can do repeats and everything else:

    \new Voice \with {
      \consists "Ambitus_engraver"
    } \relative c' {
      \voiceTwo
      es4 f g as
      b1
    }
[0] http://lilypond.org/
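A minimal sketch of that data preparation, assuming a directory of .ly files and a character-level encoding; the path and encoding scheme are my assumptions, not the article's setup:

    from pathlib import Path

    # Hypothetical location of a LilyPond corpus.
    CORPUS_DIR = Path("corpus/lilypond")

    # Concatenate all .ly files into one text stream.
    text = "\n".join(p.read_text() for p in sorted(CORPUS_DIR.glob("*.ly")))

    # Character-level vocabulary: structural tokens like '\relative', braces,
    # and repeat markers are preserved verbatim, which is the point of the idea.
    chars = sorted(set(text))
    char_to_id = {c: i for i, c in enumerate(chars)}
    encoded = [char_to_id[c] for c in text]

    print(f"{len(text)} characters, vocabulary size {len(chars)}")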
Both of the pieces at the top of the article sound to me like off-key, broken-record renditions of the main refrain of "Jesu, Joy of Man's Desiring". It's as if the RNN cannot hold enough state to express the structure of a real musical piece, so it just emits riffs here and there of main themes from its training set.

What would be somewhat impressive is if it spontaneously figured out that note sequence from observing its re-expression in bits and pieces across the various jigs and folk pieces in its training set, kind of like this:

https://www.youtube.com/watch?v=XPLp_gInC-o
Cool. Anyone have an opinion on the "state of the art" for music generation? I realize this is entirely subjective. This one sounds pretty interesting! It'd be awesome to get something like this onto a top-10 list and start influencing man-made music. We can't be that far off from it. Kids these days love techno, which is easy to synthesize relative to music with original lyrics and voices.
Fascinating piece of research, and most of the details in the write-up clicked for me, even though I know it's a level far above my head. Well done, and I'm glad to have come across it.