Tech Echo

Odd that the page doesn't seem to link to either:

paper: https://arxiv.org/abs/2502.04128

github: https://github.com/zhenye234/LLaSA_training

LLaSA is a simple framework for speech synthesis that employs a single-layer vector quantizer (VQ) codec and a single Transformer architecture to fully align with standard LLMs such as LLaMA.
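Here is a minimal plain-Python sketch of the idea in that sentence, under stated assumptions. A single-layer VQ maps each frame of continuous codec features to one discrete ID by nearest-codebook lookup. Those IDs are then offset into their own range of a shared vocabulary, so an ordinary LLaMA-style LM can model them like text tokens. The toy codebook, `TEXT_VOCAB_SIZE`, and the helpers `quantize`, `to_lm_tokens`, and `dequantize` are all hypothetical. This is not LLaSA's actual codec or tokenizer.

```python
import math

# Hypothetical toy codebook: 4 code vectors of dimension 2 (NOT LLaSA's codebook).
CODEBOOK = [
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]

# Hypothetical text vocabulary size; speech token IDs are shifted past it.
TEXT_VOCAB_SIZE = 100


def quantize(frames, codebook=CODEBOOK):
    """Single-layer VQ: map each frame to the index of its nearest code vector."""
    ids = []
    for frame in frames:
        dists = [math.dist(frame, code) for code in codebook]
        ids.append(min(range(len(codebook)), key=dists.__getitem__))
    return ids


def to_lm_tokens(code_ids, offset=TEXT_VOCAB_SIZE):
    """Shift codec IDs into a separate range of one shared LM vocabulary."""
    return [offset + i for i in code_ids]


def dequantize(code_ids, codebook=CODEBOOK):
    """Look the code vectors back up (a real codec decoder would turn these into audio)."""
    return [codebook[i] for i in code_ids]


if __name__ == "__main__":
    frames = [[0.1, 0.2], [0.9, 0.1], [0.8, 0.9]]
    ids = quantize(frames)
    print(ids)                # [0, 1, 3]
    print(to_lm_tokens(ids))  # [100, 101, 103]
```

With a single codebook there is exactly one token per frame, so the LM predicts one flat token stream. A multi-layer (residual) quantizer would emit several IDs per frame and typically needs extra machinery to order or predict them.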

> employs a single-layer vector quantizer (VQ) codec and a single Transformer architecture to fully align

I really wish that when new models are released, the authors would draw a diagram of all the layers with the tensor input and output sizes at each layer, zoomable if needed, using D3.js or whatever visualization framework. Every single layer should be on there with its input and output sizes.

These one-sentence descriptions and approximate block diagrams with arrows pointing at each other are never enough to understand how something is actually implemented.
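As a rough illustration of the per-layer listing this comment asks for, the sketch below produces input and output tensor shapes for a generic LLaMA-style decoder (RMSNorm, multi-head attention, gated MLP, LM head). The `Config` sizes are made up. Grouped-query attention is ignored (q, k, and v are assumed to have the same number of heads), and residual additions are omitted because they don't change shapes. It is not LLaSA's actual architecture.

```python
from dataclasses import dataclass


@dataclass
class Config:
    # Hypothetical sizes for illustration only; not LLaSA's real configuration.
    vocab_size: int = 1000
    d_model: int = 64
    n_heads: int = 4
    d_ff: int = 256
    n_layers: int = 2


def shape_trace(cfg, batch, seq_len):
    """Return (layer name, input shape, output shape) for each step of a decoder-only Transformer."""
    B, T, D, H, F = batch, seq_len, cfg.d_model, cfg.n_heads, cfg.d_ff
    hd = D // H  # per-head dimension
    rows = [("embed", (B, T), (B, T, D))]
    for i in range(cfg.n_layers):
        p = f"layer{i}"
        rows += [
            (f"{p}.attn_norm", (B, T, D), (B, T, D)),          # RMSNorm
            (f"{p}.q/k/v_proj", (B, T, D), (B, H, T, hd)),     # project, split into heads
            (f"{p}.attn_weights", (B, H, T, hd), (B, H, T, T)),  # softmax(Q K^T / sqrt(hd))
            (f"{p}.attn_mix", (B, H, T, T), (B, H, T, hd)),    # weights @ V
            (f"{p}.o_proj", (B, H, T, hd), (B, T, D)),         # merge heads, project back
            (f"{p}.mlp_norm", (B, T, D), (B, T, D)),           # RMSNorm
            (f"{p}.mlp_gate/up", (B, T, D), (B, T, F)),        # gated MLP expansion
            (f"{p}.mlp_down", (B, T, F), (B, T, D)),           # project back to d_model
        ]
    rows += [
        ("final_norm", (B, T, D), (B, T, D)),
        ("lm_head", (B, T, D), (B, T, cfg.vocab_size)),  # logits over the vocabulary
    ]
    return rows


if __name__ == "__main__":
    for name, shape_in, shape_out in shape_trace(Config(), batch=1, seq_len=8):
        print(f"{name:<22}{shape_in} -> {shape_out}")
```

A flat table like this is the raw material for the kind of zoomable diagram described above; a visualization layer could group the rows by layer prefix.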

I can't wait to see this integrated into Open WebUI! These sound amazing.

the long 'uuuuhhhhhhh' from some of the lesser models is killing me.

Llasa: Llama-Based Speech Synthesis

5 comments