I can't comment on the merits of the technical aspects, but of all the AI-generated content out there, AI-generated music strikes me as about as interesting as AI-generated memoirs: sort of pointless. It lacks the human element that makes it relatable on an emotional level.
Do you think you're perhaps overdoing the self-promotion?

> Please don't use HN primarily for promotion. It's ok to post your own stuff part of the time, but the primary use of the site should be for curiosity.

https://news.ycombinator.com/newsguidelines.html

(4 subs, 2 weeks) https://news.ycombinator.com/from?site=musicot.github.io

(3 subs, 1 week / 5 subs, 1 month) https://news.ycombinator.com/from?site=github.com%2Finclusionai

(3 subs, 1 week) https://news.ycombinator.com/from?site=mainfunc.ai

(2 subs, 2 weeks) https://news.ycombinator.com/from?site=mureka.ai

(4 subs, 3 months) https://news.ycombinator.com/from?site=trae.ai

(approaching ∞ subs) https://news.ycombinator.com/from?site=pingcap.com
I work on music models, and this is a very cool paper! There are no papers that go into depth on how token-based AR music models (ones that aren't absurdly inefficient like Yue) are trained. I'm particularly interested in your semantic tokens. I tried reproducing the CTC loss part, but my curve was very spiky and never seemed to actually learn any characters. The semantic tokens gave great acoustic info but gibberish lyrics. What did your CTC loss curves look like, and did you see anything similar at any point?

As a semi-aside, I feel like semantic tokens in general may end up being a bottleneck on how interesting model outputs can be.
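For concreteness, this is the kind of auxiliary CTC head I mean (a minimal PyTorch sketch, not your implementation; the encoder hidden size, frame rate, and character vocabulary here are just placeholders):

    import torch
    import torch.nn as nn

    vocab_size = 40   # characters + CTC blank at index 0 (placeholder)
    feat_dim = 768    # semantic-encoder hidden size (placeholder)

    head = nn.Linear(feat_dim, vocab_size)
    ctc = nn.CTCLoss(blank=0, zero_infinity=True)

    def ctc_aux_loss(frames, frame_lens, targets, target_lens):
        # frames: (batch, time, feat_dim) output of the semantic encoder
        # targets: character IDs of the lyrics, lengths in target_lens
        log_probs = head(frames).log_softmax(-1)   # (batch, time, vocab)
        log_probs = log_probs.transpose(0, 1)      # CTCLoss expects (time, batch, vocab)
        return ctc(log_probs, targets, frame_lens, target_lens)

With roughly this setup the loss would drop, then oscillate wildly, and greedy decoding of the head never produced recognizable lyrics, which is why I'm curious whether your curves went through a similar phase before converging.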
Were there any examples of "de novo" music generation using this? The only one I could find on the website was translating the vocals of an existing song; I couldn't find any AI compositions.