Deepjazz: AI-generated 'jazz'

189 points by mattdennewitz about 9 years ago

31 comments

rryan about 9 years ago

This is really neat! But I think it's a stretch to call it AI-generated jazz music.

As I understand it, the author has trained an LSTM on a single MIDI file -- "And Then I Knew" by Pat Metheny. The network is then asked to generate MIDI notes in sequence.

What this network has been asked to do is to produce an output stream that is statistically similar to the single MIDI input file it has been trained on. It would be more accurate to call this an "And Then I Knew" generator. Its "cost function" -- the function the network is trying to minimize during training -- is exactly how well it reproduced the target song.

Neural networks are "universal function approximators". It's not surprising that given a single input, a network can produce outputs that are statistically similar to it.

A network that could compose novel MIDI jazz would look like this:

* Train a network on a corpus of thousands to hundreds of thousands of MIDI jazz files.
* Add significant regularization and model capacity limits to prevent the network from "memorizing" its inputs.
* Generate music somehow -- the char-RNN approach described here is fine. There are other methods.

You want the network to build representations that capture the patterns of jazz music necessary to pastiche them, but not representations so high-level that the network is exactly humming the tune "And Then I Knew". This is so much of a problem that any paper presenting a novel result in generative modeling pretty much must include a section presenting evidence that the model is not memorizing its inputs.

I can hum a few classic jazz tunes from memory, but that mental process is not jazz music composition -- it's reproducing something from memory. If we're going to call a model "AI-generated jazz", we need some way to tell the network to not hum a tune it knows and instead compose a new tune with the principles/patterns it knows.

Since we can't speak to our models and tell them to think one way and not the other, part of the trick in this field is to come up with models that can only do one thing and not the other.
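One rough way to look for the memorization described above is to measure how many short windows of the generated note sequence also appear verbatim in the training sequence. The sketch below is only an illustration under assumptions: the function names (`ngrams`, `copied_fraction`), the window length, and the toy pitch lists are all hypothetical, and this is not taken from the deepjazz code or from any particular paper's evaluation.

```python
def ngrams(seq, n):
    """Return the set of all length-n windows found in seq."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def copied_fraction(generated, training, n=8):
    """Fraction of length-n windows in `generated` that occur verbatim in `training`.

    A value near 1.0 suggests the output is mostly replaying the training
    material; a value near 0.0 suggests it is not copying at this window size.
    """
    windows = [tuple(generated[i:i + n]) for i in range(len(generated) - n + 1)]
    if not windows:
        return 0.0  # generated sequence is shorter than one window
    seen = ngrams(training, n)
    return sum(w in seen for w in windows) / len(windows)


# Toy data (hypothetical): MIDI pitch numbers standing in for a real tune.
training = [60, 62, 64, 65, 67, 69, 71, 72, 71, 69, 67, 65]
parroted = training[2:12]                            # a verbatim excerpt
novel = [60, 64, 62, 67, 65, 71, 69, 72, 60, 67]     # same pitches, new order

print(copied_fraction(parroted, training, n=4))
print(copied_fraction(novel, training, n=4))
```

With a single training file, a high score at long window sizes would be consistent with the "And Then I Knew" generator description; a real evaluation would also need to account for transposition and rhythm, which this sketch ignores.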

JamilD about 9 years ago

This sounds to me like the "uncanny valley" of music. It's close to being pleasant, but it's very discordant and hard to listen to…

neurobuddha about 9 years ago

Coming from an avid Jazz listener, this is awful. Not even close.

I don't mean this as a slight at all, but definitely raise the bar on your experiments.

daviddaviddavid about 9 years ago

One of the central features of jazz (or any music) is rhythm. In the case of swing-based jazz, including bebop, you have the upbeats of 2 and 4 emphasized. It's the opposite of rock. The Metheny track here has a typical rock beat, so it's a very odd target.

Also, unless I missed something, the clips just play the network's attempt at duplicating the "head" of the track, not the soloing.

As a jazz musician I find this cool, but I also feel safe that it won't be stealing gigs from me anytime soon.

devin about 9 years ago

As a card-carrying jazz nerd, I am impressed. If there were more dynamics, some of these SoundCloud examples would sound significantly better.

ETA: The default MIDI sound font doesn't do it any favors, either. I have some software instruments I could throw at this that would make it sound a whole lot better.

brandonmenc about 9 years ago

Anyone interested in algorithmic jazz should check out Al Biles: <a href="http://igm.rit.edu/~jabics/" rel="nofollow">http://igm.rit.edu/~jabics/</a>

newobj about 9 years ago

The best part is that the resultant "jazz" sounds more like vaporwave[1].

[1] <a href="https://www.youtube.com/watch?v=PdpP0mXOlWM" rel="nofollow">https://www.youtube.com/watch?v=PdpP0mXOlWM</a>

alexc05 about 9 years ago

That's funny, I was just researching this last week.

I stumbled across some music generators. A downloadable one: <a href="http://duion.com/link/cgmusic-computer-generated-music" rel="nofollow">http://duion.com/link/cgmusic-computer-generated-music</a>

And <a href="http://www.abundant-music.com/" rel="nofollow">http://www.abundant-music.com/</a>

Both are "procedurally generated music", so I'm not sure where that falls on the AI spectrum.

I found the results interesting, and there was some potential there, but at least in these cases there were some issues with the quality of the MIDI instruments, and the song structure was very "same-y".

Anyway, looking forward to poking around in the DeepJazz code.

mpdehaan2 about 9 years ago

Always good to see more computer music projects.

I started one recently, and need to do more work on it. It does some things in a more object-oriented way, trying to model music theory concepts (like scales) as objects. It's not so much about analyzing existing files as about making the primitives you might need to build a sequencer (and eventually some generative stuff).

If people are interested, check out: <a href="https://github.com/mpdehaan/camp" rel="nofollow">https://github.com/mpdehaan/camp</a> (in the README, there is mailing list info).

The next thing for me is to make an ASCII sequencer, so it's a program that can also be used by people who can't code, and then I'll get back more into the generative parts.

shams93 about 9 years ago

George Lewis wrote a realtime improv AI in Forth back in the 90s. It used MIDI, so the sounds were like General MIDI at the time, but the interplay between the human trombone and the machine listening to his playing on the fly was amazing given the limitations of the machines at the time. To be AI jazz, it has to be able to jam with humans or other machines. <a href="https://en.wikipedia.org/wiki/George_Lewis_(trombonist)" rel="nofollow">https://en.wikipedia.org/wiki/George_Lewis_(trombonist)</a>

ARothfusz about 9 years ago

I'd be more impressed if they had trained it on Pat Metheny and then given it "Mary Had a Little Lamb" and said "jazz this up"

trsohmers about 9 years ago

Serious question: Who is the copyright holder on generated works? The program author? The person who wrote it? Do you have to give any sort of authorship credit to those who created the works in the mined data set? Copyright law in the 21st century is just getting more and more complicated...

twic about 9 years ago

There's an enjoyable summary of some other efforts in neural network music synthesis here: <a href="https://highnoongmt.wordpress.com/2015/08/11/deep-learning-for-assisting-the-process-of-music-composition-part-1/" rel="nofollow">https://highnoongmt.wordpress.com/2015/08/11/deep-learning-f...</a>

The same author's Endless Traditional Music Session supplies all the Irish session music you could ever need, by mechanical means: <a href="http://www.eecs.qmul.ac.uk/~sturm/research/RNNIrishTrad/index.html" rel="nofollow">http://www.eecs.qmul.ac.uk/~sturm/research/RNNIrishTrad/inde...</a>

phatbyte about 9 years ago

Awesome work, and this is quite interesting, something worth exploring in more depth than a hackathon can provide.

Having said that, and as a Jazz fan, the generated music is horrible. Keep feeding it more jazz tunes :P

gluelogic about 9 years ago

One thing that comes to mind is that, to me, it sounds like all of the notes' velocities are equal. It would sound a lot more natural if volume differences were incorporated.
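As a rough illustration of the velocity point, the sketch below assigns each note a slightly randomized MIDI velocity instead of a constant one, with an optional accent on beats 2 and 4. Everything in it is an assumption made for the example: the `(pitch, beat)` note representation, the function name `humanize_velocities`, and the default numbers are hypothetical and do not come from the deepjazz code.

```python
import random


def humanize_velocities(notes, base=80, spread=12, accent=15,
                        beats_per_bar=4, seed=0):
    """Attach a varied MIDI velocity to each (pitch, beat) pair.

    notes: list of (pitch, beat) tuples, where beat is a float position
           counted from 0.0 (so 1.0 is beat 2 of the first bar).
    Returns a list of (pitch, beat, velocity) tuples, with velocity
    clamped to the valid MIDI note-on range 1..127.
    """
    rng = random.Random(seed)  # seeded so the result is repeatable
    out = []
    for pitch, beat in notes:
        velocity = base + rng.randint(-spread, spread)
        # Zero-indexed beats 1 and 3 are beats 2 and 4 of the bar.
        if int(beat) % beats_per_bar in (1, 3):
            velocity += accent
        out.append((pitch, beat, max(1, min(127, velocity))))
    return out


# Hypothetical four-note phrase, one note per beat.
phrase = [(60, 0.0), (62, 1.0), (64, 2.0), (65, 3.0)]
for pitch, beat, velocity in humanize_velocities(phrase):
    print(pitch, beat, velocity)
```

A learned model could instead predict velocity alongside pitch and duration; the random jitter here is only a stand-in for that.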

granttimmerman about 9 years ago

I built a very similar project for classical music using Theano and MusicXML for a Sound Capstone Project at UW.

Blogpost + music: <a href="https://medium.com/@granttimmerman/algo-rhythm-music-composition-using-neural-networks-f89897ff2df7" rel="nofollow">https://medium.com/@granttimmerman/algo-rhythm-music-composi...</a>

GitHub: <a href="https://github.com/grant/algo-rhythm" rel="nofollow">https://github.com/grant/algo-rhythm</a>

desireco42 about 9 years ago

I respect the criticism of people who love and listen to jazz quite a bit.

As someone who maybe is not as sophisticated in his taste for jazz, this sounds good enough for me. In particular, it could pass as elevator music.

On the other hand, it would be more valuable if more than a single file were used for seeding. As it stands, the output is a listenable theme, but it will always have the same style as its seed.

I intend to play with it and see if I can get more interesting melodies.

imaginenore about 9 years ago

It's rendered with some really shitty sounding instruments. Run it through Ableton Live at least. Or even better, a specialized piano engine.

pjdorrell about 9 years ago

When human composers attempt to compose original music, they have immediate access to their own subjective judgement of the quality of the music.

Until such time as we discover an algorithm that replicates human taste in music, any AI-based approach to composing music will fail because it will not have any feedback about the quality of the music.

return0 about 9 years ago

It sounds like with a few epochs it captured some rhythmicity. The notes still sound random, but overall it's promising. This is only a hackathon project; I'm pretty sure we'll see more elaborate networks in the future that make acceptable jazz. It's gonna be a bit more difficult for other kinds of music, I guess.

I_HALF_CATS about 9 years ago

Can someone explain to me the difference between this and the computer-generated music of David Cope from the early 1990s? <a href="https://youtu.be/yFImmDsNGdE?t=44s" rel="nofollow">https://youtu.be/yFImmDsNGdE?t=44s</a>

It seems like the word 'AI' is getting thrown around.

jbmorgado about 9 years ago

An improvement that should be quite straightforward, and take you no more than a couple of hours, is to use sampled sounds for rendering the playback.

It would massively improve the quality of the output and make it sound more "human" IMO.

You can use the samples from www.freesound.org, for instance.

ryanmarsh about 9 years ago

Was expecting to hear some Blue Note, got frantic muzak. Humans are safe... for now it seems.

genolilie about 9 years ago

<a href="https://www.youtube.com/watch?v=Fq6lypuUPeg" rel="nofollow">https://www.youtube.com/watch?v=Fq6lypuUPeg</a>

squeaky-clean about 9 years ago

Even if it is a very limited model and the tracks get boring quickly like everyone is saying, this is still extremely cool. I really need to buy a new GPU that I can run Theano on.

KON_Air about 9 years ago

Knowing next to nothing about musical terms I couldn't figure out the workflow of the AI. Does it generate note after note trying to follow the learned "structure"?
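The top comment in this thread describes the network as being "asked to generate MIDI notes in sequence", which matches the note-after-note picture. The toy sketch below shows that kind of loop using a first-order Markov chain as a deliberately simple stand-in for the LSTM: it is not the deepjazz code, and the function names (`fit_transitions`, `generate`) and the sample melody are hypothetical. An LSTM would condition on a longer history than just the previous note, but the generate-one-note-then-feed-it-back structure is the same idea.

```python
import random
from collections import defaultdict


def fit_transitions(notes):
    """Record, for each note, every note that followed it in the training tune."""
    table = defaultdict(list)
    for current, following in zip(notes, notes[1:]):
        table[current].append(following)
    return table


def generate(table, start, length, seed=0):
    """Generate up to `length` notes, one at a time.

    Each new note is sampled from the notes that followed the previous
    note in training, then appended and used as context for the next step.
    """
    rng = random.Random(seed)  # seeded so the result is repeatable
    out = [start]
    while len(out) < length:
        choices = table.get(out[-1])
        if not choices:
            break  # the last note never had a successor in training
        out.append(rng.choice(choices))
    return out


# Hypothetical training melody as MIDI pitch numbers.
melody = [60, 62, 64, 62, 60, 64, 67, 64, 60]
table = fit_transitions(melody)
tune = generate(table, start=60, length=16)
print(tune)
```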

sengork about 9 years ago

This reminds me of AWK Music <a href="http://kmkeen.com/awk-music/" rel="nofollow">http://kmkeen.com/awk-music/</a>

fiatjaf about 9 years ago

I like this because I don't like jazz.

DonHopkins about 9 years ago

Hook it up to a speech synthesizer, to make Deep Scat!

I played around with looping different speech synthesizers back into different speech recognizers, kind of like audio or video feedback, but with chaotic noise injected, like quirks of the synthesizer, the voice, speech speed and pitch, and the audio environment around the microphone (you could talk over it to interfere with the words it was speaking and lay down new words in the loop), working against the lawful pattern matching and error correction behavior of the speech recognizer and the HMM language model it was trained with.

It was a lot like beat poetry, in that it tended to rhyme and have the same number of syllables and use plausible-sounding sequences of words that didn't actually make any sense, like Sarah Palin.

You can start it out with a sensible sentence, and it will play the telephone game, distorting it again and again. If you slow down the speech rate, words will split into more words or syllables, and if you speed it up, words will collapse into fewer words or syllables, or you can tune the speech rate to maintain the same number of syllables. It's analogous to zooming the video camera in and out with video feedback.

It would wander aimlessly and randomly around poetic landscapes, sometimes falling into strange attractors in the speech recognizer's hidden Markov model and repeating itself with little or no variation.

At any time you can join in with your own voice and add words during the pause at the end of the loop, or talk over its voice, much the way you can hold things in front of the camera during video feedback to mix them in.

Different speech recognizers are better at recognizing different vocabularies, and therefore like to babble about different topics, depending on which data they were trained on, which we could guess by attempting to psychoanalyze their incoherent babbling.

IBM's ViaVoice was apparently trained on a lot of newspaper articles about the Watergate hearings, as it was quite paranoid, but businesslike, as if it were dictating a memo, and would start chanting and fixating on phrases like "congressional investigation," and "burglary and wiretapping," and "convicted of conspiracy".

Microsoft's speech recognizer had obviously been trained on newspaper articles about the Clinton Lewinsky scandal, since it was quite obsessed with repeatedly chanting about blow jobs (just like the news of the time), and whenever you mentioned Clinton this or Clinton that, it would rapidly converge on Clinton Lewinsky, Clinton presidency, Clinton impeachment, etc.

What I'd love to have would be a speech recognizer that returns a pitch envelope and timing that you could apply back to the synthesized words. Then it could sing to you!

aaronlevin about 9 years ago

If you're interested in making deep-jazz more discoverable, consider applying to our Search team! :) <a href="https://soundcloud.com/jobs/2016-02-19-search-engineer-berlin-germany" rel="nofollow">https://soundcloud.com/jobs/2016-02-19-search-engineer-berli...</a>

SubiculumCode about 9 years ago

Sorry. Not impressed.
