Just to make sure I'm understanding this correctly.

This paper signals that the authors have found a way to run Llama 2 70B with roughly 1/8th the VRAM requirements of the original model, right?

And the output is on par with the original on some metrics (ArcE/PiQA), within 25% of it on others (Wiki/C4 perplexity), and the trajectory of their progress hints that there's even more ground to gain in the future?
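For my own sanity, here's the back-of-the-envelope arithmetic I did for the 1/8th figure. The fp16 baseline is my assumption, not something I saw stated in the paper:

    # Rough VRAM math (my assumption: the baseline is fp16 weights).
    params = 70e9
    fp16_gb = params * 2 / 1e9   # 2 bytes/param -> ~140 GB for weights alone
    quant_gb = fp16_gb / 8       # 1/8th -> ~17.5 GB, i.e. ~2 bits per parameter
    print(f"fp16: {fp16_gb:.0f} GB, 1/8th: {quant_gb:.1f} GB")

If that's right, the quantized weights would fit on a single 24 GB consumer GPU (ignoring KV cache and activations), which is what makes the result interesting to me.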