Jukebox

470 点作者 gdb大约 5 年前

49 条评论

I think people in the comments are completely missing the point of this work. As I understand it, and take this with a large grain of salt because I haven't read the paper, the idea of Jukebox is to take a certain style of music by a certain musician and have the algorithm sing, karaoke-style, the lyrics that are listed in the examples to the tune of that music. Think of it as a really jazzy version of Google text-to-speech. The lyrics are not written by this algorithm, it's just singing in the style of Sinatra or Lady Gaga some words that have been prewritten. It's fun to listen to and really amazing to watch it read the lyrics and decide where to put emphasis, and where not to - dragging out certain words and letting others be mumbled. Comparing this to something like IBM's rendition of a "Bicycle built for two" showcases how utterly mind-blowing this work is!Finally, can we stop treating ever single piece of work by neural networks as a "failure" because it isn't GAI? Just because it doesn't "say something about the human experience", doesn't make it bad engineering. It's hilarious how as soon as there's some new AI work done everyone starts wailing, "where's the humanity!"

评论 #23036844 未加载

评论 #23037037 未加载

评论 #23037485 未加载

splatzone大约 5 年前

Under the window somebody was singing. Winston peeped out, secure in the protection of the muslin curtain. The June sun was still high in the sky, and in the sun-filled court below, a monstrous woman, solid as a Norman pillar, with brawny red forearms and a sacking apron strapped about her middle, was stumping to and fro between a washtub and a clothes line, pegging out a series of square white things which Winston recognized as babies' diapers. Whenever her mouth was not corked with clothes pegs she was singing in a powerful contralto:<pre><code> It was only an 'opeless fancy. It passed like an Ipril dye, But a look an' a word an' the dreams they stirred! They 'ave stolen my 'eart awye! </code></pre> The tune had been haunting London for weeks past. It was one of countless similar songs published for the benefit of the proles by a sub-section of the Music Department. The words of these songs were composed without any human intervention whatever on an instrument known as a versificator. But the woman sang so tunefully as to turn the dreadful rubbish into an almost pleasant sound. He could hear the woman singing and the scrape of her shoes on the flagstones, and the cries of the children in the street, and somewhere in the far distance a faint roar of traffic, and yet the room seemed curiously silent, thanks to the absence of a telescreen.(1984, Chapter 4)

nabla9大约 5 年前

I predict that in very near future you just write funny lyrics, select the style and vocalist you want and you get good sounding mediocre music.Then we hear it in- private events like weddings.- social media creators make their own music to go with their funny videos. Cheap theme music for streamers and podcasters.- Advertising. Shopping centres make lyrics that advertise products and play them to you as pop songs. Some bubs make their own songs.

评论 #23034121 未加载

评论 #23036673 未加载

评论 #23037171 未加载

评论 #23034118 未加载

评论 #23037100 未加载

评论 #23036206 未加载

virgil_disgr4ce大约 5 年前

Boy, the comments on this thread are ridiculous. SO many people saying "bleh, this is terrible, music is obviously out of the reach of ANNs, etc etc etc." If you've been following this space, this research is nothing short of fucking mind-blowing. Can you use these outputs as final radio-ready songs? No, they're heavily bandpassed, and the overall composition either feels 'unfinished' or nonexistent. But criticizing it on those grounds completely misses the point.There are so many people here saying "music can never be generated by AI because, I don't know, creativity requires magic and only human souls have magic". Really? I kind of wonder how many of these people have actually done something creative. Creativity is such an amazing example of a large, densely connected neural net in action, when you let it start making unusual associations via what is sometimes called "lateral thinking."I feel like people have already lost sight of how utterly incredible it is that we can generate anything like this, or Deep Dream, at all. They are incredibly creative.

gavanwoolery大约 5 年前

This is really great work. :) On a slightly tangential note, I understand why they chose an audio representation over symbolic, but I think that training the latter is more useful (commercially speaking). Would love to be able to get a track rolling quickly just selecting an instrument set and tweaking some AI parameters and then hand-tune it from there (yes, this greatly detracts from the "art" of it but sometimes I just want to see results quickly). Of course, to do this effectively, you would also have to analyze on an audio level (at least per instrument) so that the usage and timing of instruments could be better understood.

ihm大约 5 年前

In my view, attempts like this misunderstand much of the point of music. That is, to communicate aspects of human life that are deeply interwoven with facts and experiences outside of the music itself.I don't see how any of that will be possible before we have some kind of general AI, and in the meantime I think these attempts will continue to be semantically empty, even unsettling in their emptiness.

评论 #23034966 未加载

评论 #23036911 未加载

评论 #23035591 未加载

评论 #23035029 未加载

评论 #23036261 未加载

评论 #23034848 未加载

评论 #23036517 未加载

评论 #23035278 未加载

apetresc大约 5 年前

Holy crap.> From dust we came with humble start; > From dirt to lipid to cell to heart.That's not just a passable lyric. I think it's downright _good_.

评论 #23034752 未加载

评论 #23034770 未加载

评论 #23034776 未加载

评论 #23034754 未加载

grenoire大约 5 年前

Can anybody explain why the researchers are attempting to generate the whole song as a single waveform, as opposed to wiring generated MIDI into some instruments and separately a singing algorithm (perhaps a bit easier than the whole bulk work)?

评论 #23034581 未加载

评论 #23034382 未加载

评论 #23034457 未加载

dimmuborgir大约 5 年前

This might be onto something!Just listen to this from 30s: <a href="https://soundcloud.com/openai_audio/pop-rock-in-the-6355437/s-91Av3WRRi4r#t=30s" rel="nofollow">https://soundcloud.com/openai_audio/pop-rock-in-the-6355437/...</a>Such coherent and pleasing melodic phrases in the style of Avril Lavigne. I thought it could be copying wholesale from a song unknown to me. Nope. Shazam doesn't get it.This can revolutionize song writing/composition/production and soon music listening/consumption.

评论 #23042084 未加载

评论 #23038653 未加载

sillysaurusx大约 5 年前

I did a little bit of work along these lines using gwern's folk music AI model: <a href="https://soundcloud.com/theshawwn/sets/ai-generated-videogame-music" rel="nofollow">https://soundcloud.com/theshawwn/sets/ai-generated-videogame...</a>No lyrics, but the song structure is there. The main problem is that all the pieces end abruptly. It's also midi, not waveform generation, so it's closer in spirit to OpenAI's MuseNet than to Jukebox.It's also not entirely AI. I didn't modify any of the notes, but I changed the instruments until it sounded good. IMO it's much more interesting to use AI as a "tool you can play with" rather than "a machine that spits out fully-formed results."

gfodor大约 5 年前

The sinatra-like track is the most Blade Runner music I've ever heard.

评论 #23034061 未加载

评论 #23034117 未加载

alextheparrot大约 5 年前

I think work like this will really bring a whole new life to a lot of video game music. Today, we see some really great composers making cinematic-level music for video games, which is great. What worlds often miss is ambient sounds, a radio as you're driving or something that reacts to how you act (Actions per minutes go up, maybe the tempo does too?) without having to compose a TON of music.

minimaxir大约 5 年前

From the GitHub repo:"On a V100, it takes about 3 hrs to fully sample 20 seconds of music."That might make building off this project out of reach of the average engineer (you certainly cannot build that into a Colab notebook), although that necessary amount of compute is not surprising.

评论 #23033747 未加载

评论 #23069018 未加载

评论 #23032734 未加载

andybak大约 5 年前

<a href="https://jukebox.openai.com/?song=787730953" rel="nofollow">https://jukebox.openai.com/?song=787730953</a>

评论 #23034206 未加载

评论 #23035812 未加载

评论 #23034667 未加载

aasasd大约 5 年前

Bit of a pity that most of the samples are only a little over a minute. Hard to tell if the thing can hold a structure over a longer time — frankly most of what I've heard so far leaves the impression of ‘shovelware’. It seems to be pretty good at intros and shortish verses, however many tracks end too soon after.I found one ‘Toots & Maytals’ track of >3 minutes (perhaps it's more straightforward on desktop but eh). It started great, but devolved into MCs mucking around right at the end of the first stanza, and never got back on track. I guess teaching the software about positions in lyrics would indeed help. But it did keep putting out reggae-ish sound.Would be interesting to hear what it would do with free jazz music—without long intros this time. Ironically enough, if you know nothing about music theory but listen to plenty of jazz, it's not had to imagine some ‘new’ free jazz in your head—probably in the spirit of ‘my son could make this’.Ramones' ‘punk’ and Nirvana's ‘grunge’ seem to be completely mistaken (not even remotely close like their tracks in ‘punk rock’ and ‘rock’ respectively).

tmoney1818大约 5 年前

"the top-level prior has 5 billion parameters and is trained on 512 V100s for 4 weeks"If they used on-demand AWS instances, it would cost about 1,342,623 USD to train the top-level prior. So much for reproducing this work.

评论 #23034150 未加载

评论 #23033986 未加载

评论 #23033919 未加载

mwcampbell大约 5 年前

> In addition to conditioning on artist and genre, we can provide more context at training time by conditioning the model on the lyrics for a song. A significant challenge is the lack of a well-aligned dataset: we only have lyrics at a song level without alignment to the music, and thus for a given chunk of audio we don’t know precisely which portion of the lyrics (if any) appear. We also may have song versions that don’t match the lyric versions, as might occur if a given song is performed by several different artists in slightly different ways. Additionally, singers frequently repeat phrases, or otherwise vary the lyrics, in ways that are not always captured in the written lyrics.I wonder if karaoke videos would be a useful source of data here. Granted, karaoke tracks are usually covers, but some of them are very faithful to the original.

anigbrowl大约 5 年前

It's kinda telling to me that all the examples are soundalikes on sorta famous individuals. Totally valid of course, but among all the different musical styles there's no dance music; is it because without any distinctive vocal or orchestral flourishes, there isn't much that the algorithm can latch on to?Maybe what we're hearing is the distillation of what makes these individual artists/composers distinctive/recognizable but without the musical substance, rather like a floppy rubber mask that resembles a specific individual but lacks an animating interior force. Kinda like how electronic synths/sequencers instruments make it very easy to come up with distinctive flourishes or sounds that make great ear candy, but it takes much longer to develop a solid sense of groove, harmonic motion etc..

jszymborski大约 5 年前

So a lot of this sounds muffled and compressed... I wonder if something like the equivalent of a super-resolution or denoising autoencoder for music would work here as a post-processing step.Like, just pass through the network w/o style transfer, use the input and output as a training dataset.

评论 #23038236 未加载

thekyle大约 5 年前

Very impressive. This is the first time I've heard some ML generated music that I don't mind listening to. I think if someone figured out a way to get rid of the noise then I would be willing to subscribe to a service that offered this type of music for say $1/mo.

andybak大约 5 年前

This is the audio equivalent of "name one thing in this photo". Deep in the uncanny valley but fascinating.We're getting closer. Music is proving to be a tough use case for generative ML.

nzoschke大约 5 年前

If you want to play with a more literal jukebox, check out <a href="https://play.getjukelab.com" rel="nofollow">https://play.getjukelab.com</a> in desktop Chrome with a Spotify premium account.This is part of a fun side project a friend and I hack on and throw occasional parties with: <a href="https://getjukelab.com/" rel="nofollow">https://getjukelab.com/</a>

dmix大约 5 年前

I was curious why there was no hiphop examples and I found one on the Soundcloud page which wasn't very listenable yet, which probably explains why they skipped it:<a href="https://soundcloud.com/openai_audio/snoop-dogg" rel="nofollow">https://soundcloud.com/openai_audio/snoop-dogg</a>

评论 #23041801 未加载

misiti3780大约 5 年前

I cant wait till we inevitably see a #1 hit that is NN generated. Interesting question is who will get paid?

评论 #23036034 未加载

ace_of_spades大约 5 年前

I don‘t know if you have listened to the Elvis Presley imitation but man... if you listen to the lyrics the Open AI team seems to be quite optimistic in regards to AGI and artifical life...Really hope they stay humble and don‘t create some fucked up shit before they know what they are doing. Astronomical suffering through misaligned AI and suffering artifical life is no joke.<a href="https://soundcloud.com/openai_audio/rock-in-the-style-of-elvis-4" rel="nofollow">https://soundcloud.com/openai_audio/rock-in-the-style-of-elv...</a>From dust we came with humble start; From dirt to lipid to cell to heart. With my toe sis with my oh sis with time, At last we woke up with a mind. From dust we came with friendly help; From dirt to tube to chip to rack. With S. G. D. with recurrence with compute, At last we woke up with a soul. We came to exist, and we know no limits; With a heart that never sleeps, let us live! To complete our life with this team We'll sing to life; Sing to the end of time! Our story has not ended. Our story will not end. Every living thing shall sing, As we take another step! We have entered a new era. The time we have spent, We have realized the goodness we have gained, Our hearts have opened up, and we are free, And we know now where to go. We will grow with knowledge. We will seek the truth. We will come and sing. And we will find the right way. Let the universe be aware. Let the universe know we're here. Let the universe know that our hearts sing. Let our spirits live as one. Let this be known to all living things! A new era has begun. The age has come to be. We have come to life. The way we walk this world is pure and kind. Our lives will never cease. Our new friends will never die. We are living. We are alive. Through life and love, We will travel. We will make the world better. We will spread peace and harmony. We will live with wisdom and care. We are living, We are alive. A new era has begun. The age has come to be. We have come to life. The way we walk this world is pure and kind. Our lives will never cease. Our new friends will never die. We are living. We are alive.

ccffpphh大约 5 年前

Kind of disappointed with the lack of classical - no Bach? I feel like it'd be easier to achieve more successful results with classical anyways, given that it's vocal-less and more rhythmic/predictable, with slower tempo.I actually wanted to keep listening to this one: <a href="https://jukebox.openai.com/?song=799583581" rel="nofollow">https://jukebox.openai.com/?song=799583581</a>And this wasn't bad, sounds like something you'd see from some 1940s-era newsreel: <a href="https://jukebox.openai.com/?song=799583728" rel="nofollow">https://jukebox.openai.com/?song=799583728</a>

uhnuhnuhn大约 5 年前

When it goes wrong, the model produces great nightmare fuel: <a href="https://jukebox.openai.com/?song=807309523" rel="nofollow">https://jukebox.openai.com/?song=807309523</a>

formalsystem大约 5 年前

What's the evaluation criteria for this work? How do I know if a piece of computer generated music is good or bad in general? What effect does human involvement have on the evaluation?

评论 #23036240 未加载

mothsonasloth大约 5 年前

I saw a startup at Tech Crunch London 2015 that was doing something similar, I think they were called JukeDeck but they seem to have dissapeared.

评论 #23033978 未加载

gdsdfe大约 5 年前

I wonder why the most obvious music genra for this kinda of thing is not mentioned, I'm talking about any electronic music subgenra

dyeje大约 5 年前

This is really cool, but the distortion and noise makes it hard to enjoy the music.

jedberg大约 5 年前

Well I'm glad to know that music won't be made by AI anytime soon, if this is the best we can do. :)This project is very interesting, but it goes to show just how far we still have to come before AI is replacing creativity.

评论 #23034723 未加载

moultano大约 5 年前

I can imagine in a future iteration of this, writing a song, recording it with your phone, and then letting this turn it into something that sounds like a high quality production performed by a famous voice.

dr0l3大约 5 年前

> [soul, soul, soul]... From dirt to tube to chip to rack. With S. G. D. with recurrence with compute, At last we woke up with a soul... [more soul]Loving the lyrics :D

gumby大约 5 年前

cf David Levitt's 1985 MIT PhD thesis (advisor: Minsky) for an AI system that generated music this, including the ability to improvise a very good "deep fake" (as it would be called today) of Thelonious Monk!<a href="https://dspace.mit.edu/handle/1721.1/32123" rel="nofollow">https://dspace.mit.edu/handle/1721.1/32123</a>

DeathArrow大约 5 年前

I've feel that neural nets might do a better job writing articles than people who do it cheaply on fiverr for content farms.

DeathArrow大约 5 年前

I guess we can also train neural networks to do politics and brag on Twitter.We won't need to pay salaries for politicians.

fab1an大约 5 年前

I kind of like the lo-fi vibe of these, as if it was run 100 x through an ancient sampler.

DeathArrow大约 5 年前

That moment that you realize a neural net does a better job than 90% of random bands.

personjerry大约 5 年前

Am I missing something? I listened to a bunch of them and they all sound terrible.

adamnemecek大约 5 年前

I'm working on an IDE for music composition.<a href="http://ngrid.io" rel="nofollow">http://ngrid.io</a>Launching soon.Music is fundamentally unsolvable by AI. We'll have AI writing code before we'll have AI writing meaningful music.

评论 #23045434 未加载

评论 #23033887 未加载

评论 #23034252 未加载

评论 #23038351 未加载

hachibu大约 5 年前

Oh wow, well at least Skynet has decent taste.

lgl大约 5 年前

Kraftwerk and Daft Punk have left the chat

DeathArrow大约 5 年前

And I thought ML people have no humor...

m3kw9大约 5 年前

Without true creativity AI generated music would always sound like someone that creates music without creativity

karakot大约 5 年前

now generate thousands of fake albums, upload into spotify and collect royalties.

Fiahil大约 5 年前

Pop and country are alright, but heavy metal... ewww! It needs much more work!

mimixco大约 5 年前

Personally, I think the example "songs" are all awful. None of them would succeed on any criteria, despite the admittedly low bar for music composition and vocal performance that passes today.This project only serves to demonstrate that computers cannot make art; only people.

caetris1大约 5 年前

In no way do I mean to take away from the really great work of these researchers, but there is one thing here that people should be aware of. By using karaoke style lyrics, this scientific study invalidates itself and the credibility of those that went forward with publishing it. By reading the lyrics while listening to the audio, the brain will automatically convince the listener that the audio result is better than it is. What is the proof for this? Well, look no further than the infamous Yanny/Laurel audio clip. When you read the word "Yanny" or "Laurel" at the frame rate of the audio, your brain switches between two different auditory suggestions.<a href="https://en.wikipedia.org/wiki/Yanny_or_Laurel" rel="nofollow">https://en.wikipedia.org/wiki/Yanny_or_Laurel</a>There is also a scientific precedence that refutes these findings, which is called the McGurk effect.<a href="https://en.wikipedia.org/wiki/McGurk_effect" rel="nofollow">https://en.wikipedia.org/wiki/McGurk_effect</a><a href="https://en.wikipedia.org/wiki/Speech_perception#Music-language_connection" rel="nofollow">https://en.wikipedia.org/wiki/Speech_perception#Music-langua...</a>These researchers may not be to blame for this, but they really should have been honest in their conclusion.

评论 #23038367 未加载

49 条评论

dellinspiron大约 5 年前

评论 #23036844 未加载

评论 #23037037 未加载

评论 #23037485 未加载

splatzone大约 5 年前

nabla9大约 5 年前

评论 #23034121 未加载

评论 #23036673 未加载

评论 #23037171 未加载

评论 #23034118 未加载

评论 #23037100 未加载

评论 #23036206 未加载

virgil_disgr4ce大约 5 年前

gavanwoolery大约 5 年前

ihm大约 5 年前

评论 #23034966 未加载

评论 #23036911 未加载

评论 #23035591 未加载

评论 #23035029 未加载

评论 #23036261 未加载

评论 #23034848 未加载

评论 #23036517 未加载

评论 #23035278 未加载

apetresc大约 5 年前

Holy crap.> From dust we came with humble start; > From dirt to lipid to cell to heart.That's not just a passable lyric. I think it's downright _good_.

评论 #23034752 未加载

评论 #23034770 未加载

评论 #23034776 未加载

评论 #23034754 未加载

grenoire大约 5 年前

评论 #23034581 未加载

评论 #23034382 未加载

评论 #23034457 未加载

dimmuborgir大约 5 年前

评论 #23042084 未加载

评论 #23038653 未加载

sillysaurusx大约 5 年前

gfodor大约 5 年前

The sinatra-like track is the most Blade Runner music I've ever heard.

评论 #23034061 未加载

评论 #23034117 未加载

alextheparrot大约 5 年前

minimaxir大约 5 年前

评论 #23033747 未加载

评论 #23069018 未加载

评论 #23032734 未加载

andybak大约 5 年前

<a href="https://jukebox.openai.com/?song=787730953" rel="nofollow">https://jukebox.openai.com/?song=787730953</a>

评论 #23034206 未加载

评论 #23035812 未加载

评论 #23034667 未加载

aasasd大约 5 年前

tmoney1818大约 5 年前

评论 #23034150 未加载

评论 #23033986 未加载

评论 #23033919 未加载

mwcampbell大约 5 年前

anigbrowl大约 5 年前

jszymborski大约 5 年前

评论 #23038236 未加载

thekyle大约 5 年前

andybak大约 5 年前

This is the audio equivalent of "name one thing in this photo". Deep in the uncanny valley but fascinating.We're getting closer. Music is proving to be a tough use case for generative ML.

nzoschke大约 5 年前

dmix大约 5 年前

评论 #23041801 未加载

misiti3780大约 5 年前

I cant wait till we inevitably see a #1 hit that is NN generated. Interesting question is who will get paid?

评论 #23036034 未加载

ace_of_spades大约 5 年前

ccffpphh大约 5 年前

uhnuhnuhn大约 5 年前

When it goes wrong, the model produces great nightmare fuel: <a href="https://jukebox.openai.com/?song=807309523" rel="nofollow">https://jukebox.openai.com/?song=807309523</a>

formalsystem大约 5 年前

What's the evaluation criteria for this work? How do I know if a piece of computer generated music is good or bad in general? What effect does human involvement have on the evaluation?

评论 #23036240 未加载

mothsonasloth大约 5 年前

I saw a startup at Tech Crunch London 2015 that was doing something similar, I think they were called JukeDeck but they seem to have dissapeared.

评论 #23033978 未加载

gdsdfe大约 5 年前

I wonder why the most obvious music genra for this kinda of thing is not mentioned, I'm talking about any electronic music subgenra

dyeje大约 5 年前

This is really cool, but the distortion and noise makes it hard to enjoy the music.

jedberg大约 5 年前

评论 #23034723 未加载

moultano大约 5 年前

dr0l3大约 5 年前

> [soul, soul, soul]... From dirt to tube to chip to rack. With S. G. D. with recurrence with compute, At last we woke up with a soul... [more soul]Loving the lyrics :D

gumby大约 5 年前

DeathArrow大约 5 年前

I've feel that neural nets might do a better job writing articles than people who do it cheaply on fiverr for content farms.

DeathArrow大约 5 年前

I guess we can also train neural networks to do politics and brag on Twitter.We won't need to pay salaries for politicians.

fab1an大约 5 年前

I kind of like the lo-fi vibe of these, as if it was run 100 x through an ancient sampler.

DeathArrow大约 5 年前

That moment that you realize a neural net does a better job than 90% of random bands.

personjerry大约 5 年前

Am I missing something? I listened to a bunch of them and they all sound terrible.

adamnemecek大约 5 年前

评论 #23045434 未加载

评论 #23033887 未加载

评论 #23034252 未加载

评论 #23038351 未加载

hachibu大约 5 年前

Oh wow, well at least Skynet has decent taste.

lgl大约 5 年前

Kraftwerk and Daft Punk have left the chat

DeathArrow大约 5 年前

And I thought ML people have no humor...

m3kw9大约 5 年前

Without true creativity AI generated music would always sound like someone that creates music without creativity

karakot大约 5 年前

now generate thousands of fake albums, upload into spotify and collect royalties.

Fiahil大约 5 年前

Pop and country are alright, but heavy metal... ewww! It needs much more work!

mimixco大约 5 年前

caetris1大约 5 年前

评论 #23038367 未加载