Music Composition with Deep Learning: A Review

125 points by pramodbiligiri over 3 years ago

16 comments

bravura over 3 years ago

I'm working on audio AI, both in academic research and as founder of Spooky Labs. We don't have a webpage yet, but we do have clients. We are using deep learning to create rich new synthesizers that sound like they were designed by aliens, as well as novel vocal manipulation techniques.

We've spoken to musicians and producers who are excited about new tools, new sounds, and assistants that automate boring parts of the workflow.

But when the problem is framed as "music composition", it just leaves me scratching my head. Like, who's clamoring for that? I'm unaware of any automatically generated music in the history of music that isn't seen as an oddity. Even if techniques improve, it's not a really sexy sell. People simply want to listen to music created by people, even if AI music were perfect. Only in commercial applications like stock music or jingles is AI composition in demand.

I understand that you can move the goalposts and say: "This isn't about total AI composition, it's about co-composition!" But honestly, I think it's just framing the problem wrong to talk about composition, and it's led to some really strange solution-in-search-of-a-problem research agendas. People should be thinking about it through the lens of: How do you use AI to create tools that musicians want?

YeGoblynQueenne over 3 years ago

This is an interesting and informative article but, to be a bit meta, I'm concerned when I see articles like this on HN because they don't usually do a great job of summarising the work that was done in a given area before the advent of deep neural nets. This is important because very often, especially when it comes to generative art, the standard approaches used before deep neural nets could do things that modern deep neural nets cannot do, in particular when it comes to structured generation.

For example, algorithmic music is the subject area of generating music with algorithmic approaches, not necessarily using a computer. The Wikipedia page seems to be a bit poor in detail but it lists a number of different approaches, most of which are not machine learning approaches:

https://en.wikipedia.org/wiki/Algorithmic_composition

I'm by no means an expert but that's the point. When a non-expert reads an article like the one above, I fear they may get the impression that neural nets are the first approach ever to generate music, or that they are the best approach ever to generate music, or come away with some other misunderstanding that is natural to draw from incomplete information.

The thing to try and keep in mind is that computer scientists, and other scientists and creative people, had been able to do amazing things with the tools they had at their disposal long before the advent of deep neural nets. And there are many such tools that are not deep neural nets. Somehow these amazing things flew under the radar of techies, until deep neural nets came along and suddenly everyone is amazed that "wow, neural nets can do X!". Well, what else can do X? That's something worth trying to find out.

shannifin over 3 years ago

Decent review! I've been fascinated by this topic for a while. I think the real magic is of course in the specifications of the details, but I do think it's inevitable that AI-generated (or assisted) music will come to dominate the field as it democratizes what is currently a more specialized skill set.

I've always found the resistance to the idea of AI-generated music a bit odd, but I think it stems from a more philosophical idea about where beauty comes from. For example, it's tempting to imagine that the beauty of a Beethoven symphony comes from Beethoven himself, thus his music facilitates an intense personal emotional connection across time and space. The idea of AI music challenges this notion a bit; if AI "composed" something beautiful, and its decisions were not founded on emotions, where is the beauty coming from? Of course, I'd say the beauty is in the natural phenomenon of music itself; the beauty is in the human brain's ability to have a sense of "music" at all. In this way, a connection to Beethoven through his music is a shared recognition of the beauty of certain natural musical phenomena. The beauty did not come from Beethoven, rather it was "captured" and shared by Beethoven.

As far as the business side goes, I've definitely found that there's interest in at least AI-assisted music composition, both from a melody-generating app I used to sell some years ago (limited as it was), and just collecting emails for a new web app I'm working on at https://www.tunesage.com/ ... I know there are already other services out there too for AI music, and I expect the space will continue to grow. I think it's gonna be awesome. :)

bjourne over 3 years ago

It all comes down to generating tokens. The dominant paradigm is to compose the music by generating the next token based on the last few seen tokens, then take that and generate the next token, and so on. No one has been able to significantly improve upon that. All development in the last ten years or so has been about creating ever larger models with more data so that emitted sequences will stay coherent for longer. The problem with this paradigm is that it works amazingly well for short sequences, but often fails to stay coherent for longer sequences. So composing interesting ten-second-long instrumental music is a solved problem, while composing full songs three to five minutes long is far beyond the reach of the current state of the art.

Text-generating models suffer from the same problem. E.g. GPT-3 will generate two to three hundred tokens or so and then it will begin to spout nonsense. That it still works so well is because it's trained on a massive corpus and because its tokens are richer. It operates on the word level, while most music composition models operate on the note level, where tokens are more analogous to characters than words. If someone were able to figure out how to represent multi-instrumental compositions using "music vectors" (analogous to word vectors), that would probably lead to a major breakthrough.
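
As an editorial illustration of the generate-append-repeat loop described in this comment, here is a minimal sketch. It assumes a toy count-based n-gram table in place of a neural network, and the note-name corpus, the `train_ngram` and `generate` helpers, and the context size of 2 are all hypothetical choices made for the example, not anything taken from the reviewed paper.

```python
import random
from collections import defaultdict


def train_ngram(sequence, context_size=2):
    """Record which token follows each run of `context_size` tokens."""
    table = defaultdict(list)
    for i in range(len(sequence) - context_size):
        context = tuple(sequence[i:i + context_size])
        table[context].append(sequence[i + context_size])
    return table


def generate(table, seed, length, rng):
    """Autoregressive loop: sample the next token from the last few
    tokens, append it, and repeat until `length` tokens exist."""
    context_size = len(seed)
    out = list(seed)
    while len(out) < length:
        candidates = table.get(tuple(out[-context_size:]))
        if not candidates:
            break  # context never seen in training: stop early
        out.append(rng.choice(candidates))
    return out


# Hypothetical toy corpus of note-level tokens (assumption for illustration).
corpus = "C D E C C D E C E F G E F G".split()
table = train_ngram(corpus, context_size=2)
melody = generate(table, seed=("C", "D"), length=12, rng=random.Random(0))
print(" ".join(melody))
```

Because each step only sees the last two tokens, the output is locally plausible but has no notion of phrase or song structure, which is a small-scale version of the coherence problem the comment describes.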

streamofdigits over 3 years ago

A key question imho is this: will algorithmic music composition ever produce anything other than more or less sophisticated muzak, effectively recycling a particular musical corpus of a particular music tradition? Producing a memorable, emotionally moving piece is not something that is generally reducible to prescription. If the DL-based pattern matching is sufficiently intelligent and produces something worthwhile, will it actually feel different from an existing piece?

Music perception and appreciation is a deeply human feature (is there a "purpose" to it or is it actually a piece of redundant brain code? https://brianjump.net/2020/11/02/why-does-music-exist/) that has both a biological and a cultural basis. There is a feedback loop between the contemporary musical sounds a young brain is exposed to and the perception machinery and associated innate pleasurable responses.

Think of the countless cultural innovations that underpin human music: tunings, scales, rhythms, genres and the deep link to voice and song. None of those has any (ex-ante) algorithmic feel (to the chagrin of Pythagoreans and other ex-post numerologists). Coming up with music feels literally like plucking beautiful sounds from thin air.

In any case it's a fascinating research domain...

thebricksta over 3 years ago

The best definition I've seen for the success of a piece of music is this: "What emotion is the artist trying to convey and how well does it convey it?"

Throughout the composing, arranging, recording, mixing, and mastering process, there are thousands of choices to be made, and the correctness of each choice is entirely linked back to that goal: Does the choice help to convey the emotion, or does it detract from it?

To that end, there is no correct choice, no correct or optimal harmony, no correct note, no correct rhythm, no correct timbre. It's all contextual in relation to conveying the desired emotion.

I'm really not sure how you could ever train a NN to make choices in that regard without first trying to teach it how to understand the impact of its choices on the emotions conveyed.

At best, you may be able to train a NN to reproduce emotionally void works in a particular style, and perhaps assign some emotion through the timbres selected (ambient music comes to mind here). Still, this isn't much of an achievement. You could easily codify the rules taught in Music 101 about harmonization and melody composition for a computer and have it spit out bland but pleasant excerpts, no deep learning required.
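
To make the last point concrete, here is a rough editorial sketch of what "codifying Music 101 rules" could look like. It assumes a melody in C major given as note names and harmonizes it using only the three primary triads (I, IV, V). The `CHORDS` table, the `PREFERENCE` tie-break order, and the `harmonize` function are hypothetical simplifications for illustration, not a complete or authoritative harmonization method.

```python
# Primary triads in C major: tonic (I), subdominant (IV), dominant (V).
CHORDS = {
    "C": {"C", "E", "G"},
    "F": {"F", "A", "C"},
    "G": {"G", "B", "D"},
}
# Assumed tie-break order when several chords contain the melody note.
PREFERENCE = ["C", "G", "F"]


def harmonize(melody):
    """Pick one chord per melody note using two hand-written rules:
    1. the chord must contain the melody note;
    2. keep the previous chord if it still fits, otherwise take the
       first fitting chord in PREFERENCE order."""
    chords = []
    for note in melody:
        options = [name for name in PREFERENCE if note in CHORDS[name]]
        if not options:
            raise ValueError(f"{note!r} is not in the C major scale")
        if chords and chords[-1] in options:
            chords.append(chords[-1])
        else:
            chords.append(options[0])
    return chords


melody = "E E F G G F E D C C D E E D D".split()
for note, chord in zip(melody, harmonize(melody)):
    print(f"{note} -> {chord}")
```

The result is consonant by construction, since every melody note belongs to its chord, but the rules know nothing about the emotion being conveyed, which is exactly the limitation the comment is pointing at.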

adamnemecek over 3 years ago

I’ve been working on an IDE for music composition: http://ngrid.io. Launching soon.

TomSwirly over 3 years ago

The article starts really badly with "Music is generally defined as a succession of pitches or rhythms, or both, in some definite patterns."

The big three in music are pitch, rhythm and TIMBRE, guys, TIMBRE (the "sound" or "tone" of the piece).

Why is Beethoven's Ode To Joy somewhat annoying on the recorder, but able to drive a room of hardened Germans to tears when played by a top-notch symphony orchestra? Timbre!

In the construction of classical musical pieces, this is controlled with orchestration. It's why a Hans Zimmer score sounds punchy or ethereal or whatever is called for by the script.

I basically stopped there, because, well, if you don't care how your music _sounds_ it isn't going to sound very good.

PennRobotics over 3 years ago

The paper begins with David Cope. He evidently has a YouTube channel with algorithmic music.

https://www.youtube.com/user/davidhcope/videos

sharikous over 3 years ago

I wonder why they don't mention the problem of the audio quality of the output. As far as I know, the best models work on magnitude spectrograms and have issues with recreating the phase information. Sub-par algorithms like Griffin-Lim are used instead.

geekamongus over 3 years ago

It will only be a matter of time until the AI learns that what sounds pleasing to most humans is often some form of, or extract from, Pachelbel's Canon in D.

inferense over 3 years ago

Ever since I saw the VAE learn to synthesize simple tunes, and then listened to Hello World by Skygge, I've been contemplating the inflection point of the music industry [1][2]. At first some of the music would be a close collaboration between a machine and an artist, but at some point, I wonder if we’d be primarily listening to tunes generated to our specific taste, similar to Spotify’s Discover Weekly, though it would all be much closer to what we want to listen to (exploit & explore) at a given time.

[1] https://www.youtube.com/watch?v=G5JT16flZwM

[2] https://blog.nameshield.com/blog/2018/01/23/hello-world-musical-creation-artificial-intelligence/

spywaregorilla over 3 years ago

I'm not particularly familiar with music creation, but it always seemed to me neural nets would be better at handling music production than composition. That is, changing the sound to be, say, a Trent Reznor style output given the original.

Does that make any sense?

DrNuke over 3 years ago

Automated music (art) generation + NFTs = new printing money aka new trading bubble?

qwerty456127 over 3 years ago

I am eager to be able to choose a classical music piece (e.g. Bach Cello Suite #1 in G) and ask an AI to generate an infinite sequence very close to its style but varying slightly.

gwern over 3 years ago

Not a single mention of Jukebox.