This looks to be an interesting piece on a very interesting paper! Somewhat tangentially (I'm afraid), I just wanted to comment on this paragraph from the article's intro:

> Language is made of discrete structures, yet neural networks operate on continuous data: vectors in high-dimensional space. A successful language-processing network must translate this symbolic information into some kind of geometric representation

I was a bit surprised by another article linked here recently [1] that discusses "direct speech-to-speech translation without relying on intermediate text representation", which (if I read it correctly) works by taking frequency-domain representations of speech as input and producing frequency-domain representations of the translated speech as output. That is about as close as you can get to "continuous" input and output data in the digital domain, and it calls into question (in my mind, anyhow) the assumption that discrete structures are fundamental to language processing (in humans too, for that matter).

I don't mean to detract from the paper, which looks highly interesting; it's just that this business of treating discrete structures in language as a given has been a bugbear of mine for some time now :)

1: https://ai.googleblog.com/2019/05/introducing-translatotron-end-to-end.html
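To make the "continuous" point above concrete: a frequency-domain representation of speech is just a dense array of real numbers, with no symbolic units anywhere. A rough sketch (synthetic audio and librosa for illustration; this is not Translatotron's actual pipeline):

```python
# Rough illustration: the "continuous" frequency-domain representation of
# speech is just a dense float array, e.g. a log-mel spectrogram.
import numpy as np
import librosa

sr = 16000
t = np.linspace(0, 1.0, sr, endpoint=False)
audio = 0.5 * np.sin(2 * np.pi * 220 * t).astype(np.float32)  # synthetic tone standing in for speech

mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=80)
log_mel = librosa.power_to_db(mel)

print(log_mel.shape)  # (80 mel bands, ~32 frames): real-valued, no discrete symbols
# A spectrogram-to-spectrogram model maps one such array directly to another,
# with no intermediate text representation in between.
```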
I'm impressed by the method of mapping higher-dimensional vectors to a consistent tree representation, but I'm not sure what the take-home point is after that. The BERT embeddings are (possibly randomly) branching structures? I'm only eyeballing Figure 5 here, but the BERT embeddings seem to approximate the dependency parse tree only to about the same extent that the random trees do.
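For anyone who wants to poke at the tree-recovery step: my (possibly imperfect) reading is that the probe projects token vectors linearly and then reads a tree off the pairwise distances between the projected vectors. A rough sketch of just those mechanics, with random embeddings and a random projection standing in for the real BERT vectors and the learned probe matrix (so not the authors' code):

```python
# Hypothetical sketch of tree recovery from token embeddings via pairwise
# distances. The projection B would normally be learned against gold parse
# distances; here everything is random purely to keep the example runnable.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

rng = np.random.default_rng(0)
n_tokens, dim, probe_rank = 8, 768, 64

embeddings = rng.normal(size=(n_tokens, dim))   # stand-in for BERT token vectors
B = rng.normal(size=(dim, probe_rank))          # stand-in for a learned probe matrix

projected = embeddings @ B
# Squared L2 distance between every pair of projected tokens.
diff = projected[:, None, :] - projected[None, :, :]
dists = (diff ** 2).sum(-1)

# The minimum spanning tree over these distances gives one candidate "parse":
# n_tokens - 1 undirected edges connecting all tokens.
mst = minimum_spanning_tree(dists)
edges = list(zip(*mst.nonzero()))
print(edges)  # these pairs would be compared against gold dependency edges
```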