Thanks for sharing - I've been wondering how the newer generation of AI stuff would do on chorales.<p>I don't think this would have gotten great marks in my collegiate music theory IV class (iirc covered special cases of voice leading and rarer harmonic progressions like the augmented 6th and Neapolitan chords), but honestly neither did I. In my high school classes this probably would've gotten passing grades, at least on short exercises.<p>Is there an easy way to get it to "true up" the intonation on the longer chords? IMO part of the magic of this style of music is that a cappella performances aren't constrained to equal temperament, and get really nice resonances on anything they hold.
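"Truing up" held chords amounts to nudging tones from equal temperament toward just-intonation frequency ratios. A minimal sketch of the offsets involved, assuming common 5-limit ratios (the specific intervals and tolerances here are illustrative, not from the project):

```python
import math

def cents(ratio: float) -> float:
    """Size of an interval with the given frequency ratio, in cents
    (equal temperament spaces each semitone at exactly 100 cents)."""
    return 1200 * math.log2(ratio)

# Just-intonation ratios vs their equal-tempered sizes in semitones.
just_intervals = {
    "major third (5/4)":   (5 / 4, 400),
    "minor third (6/5)":   (6 / 5, 300),
    "perfect fifth (3/2)": (3 / 2, 700),
}

for name, (ratio, et_cents) in just_intervals.items():
    offset = cents(ratio) - et_cents
    print(f"{name}: {offset:+.1f} cents relative to equal temperament")
```

The just major third comes out roughly 14 cents flat of the equal-tempered one, which is why a choir locking onto the pure ratio on a long chord sounds noticeably more resonant than a fixed-pitch rendering.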
It seems it gets the best results when you give it a melody.
Pretty impressive nonetheless.
I wrote a terrible script [1] a few years ago to do similar chorale harmonizations. I guess this is the year when most of my old projects get an AI version that completely outclasses them ;-(<p>[1] <a href="https://github.com/bibanez/harmonizer">https://github.com/bibanez/harmonizer</a>
Listening for any span of a few seconds, it sounds quite nice! But beyond the length of a phrase or so, it just does not make any sense.<p>It reminds me a bit of that project to "create Beethoven's tenth" using an AI that I heard about a year ago. It was amazing in many ways, but the music didn't go anywhere and wasn't saying anything. I know that description is nebulous - the sort of feeling you might imagine or trick yourself into having, or a perception you invent out of defensiveness, or say you have just to seem cultured.<p>And perhaps in a "blind" comparison, cantable diffuguesion wouldn't stand out as much. But with all that said, it definitely sounds not quite human after a moment.<p>I wonder how we can teach machines the larger-scale structures of (common practice) music. At the scale of an entire movement, structures can be merely formulaic and the music still turns out alright. At the level of phrases and themes, though, it's harder to articulate, and requires good taste. But it's the sort of "intuitive" thing that I'd expect AI to be good at, so I'm always surprised that it seems to be the thing AIs are <i>worst</i> at.<p>(My background: although I've not studied Bach's chorales in any depth, I've studied a few years of composition and used to be a church organist.)
>Four-part chorales are presented to the network as 4-channel images. As in Stable Diffusion, a U-Net is trained to predict the noise residual.<p>>After training the generative model we add 12 channels to the inputs, with the middle four channels representing a mask, and the last four channels are masked chorales. We mask the four channels individually, as opposed to Stable Diffusion Inpainting that use a one-channel mask.<p>How were they encoded, specifically? Anyway, it's fairly easy to break; for example, try "c'4 c'#4 d'4 d'#4 e'4 f'4 f'#4" as the melody.
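One plausible reading of the channel layout quoted above: each voice is a one-hot piano roll (time x pitch), giving a 4-channel "image", with four per-voice mask channels and four masked-chorale channels appended for inpainting. A sketch under those assumptions - the shapes, pitch range, and exact ordering are guesses, not confirmed by the project:

```python
import numpy as np

TIME_STEPS, PITCHES = 256, 64  # assumed grid; the real encoding may differ

def encode_chorale(voices):
    """voices: list of 4 arrays of pitch indices, one per time step.
    Returns a (4, time, pitch) one-hot piano roll, one channel per voice."""
    roll = np.zeros((4, TIME_STEPS, PITCHES), dtype=np.float32)
    for ch, line in enumerate(voices):
        for t, pitch in enumerate(line):
            roll[ch, t, pitch] = 1.0
    return roll

def add_inpainting_channels(roll, keep):
    """keep: four bools, True where a voice is given (e.g. the melody).
    Masking each voice with its own channel (rather than one shared mask)
    lets the model regenerate some voices while conditioning on others."""
    mask = np.zeros_like(roll)
    for ch, k in enumerate(keep):
        mask[ch] = float(k)          # middle four channels: per-voice mask
    masked = roll * mask             # last four channels: masked chorale
    return np.concatenate([roll, mask, masked], axis=0)  # 12 channels total

rng = np.random.default_rng(0)
voices = [rng.integers(0, PITCHES, TIME_STEPS) for _ in range(4)]
x = add_inpainting_channels(encode_chorale(voices),
                            keep=[True, False, False, False])  # melody given
print(x.shape)
```

With only the soprano channel kept, the mask and masked-chorale channels zero out the other three voices, which matches the "melody in, harmonization out" usage that seems to work best.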
My not very substantive response is that I like the project name. Maybe lots of text generators can come up with wordplay involving Subject 1 and Subject 2, but I think you've already got one running.