I’m struggling to understand from this paper whether the approach is better in general (all cases, with wider models seeing greater benefits) or only for wider models (with narrower models seeing a detriment).

If it’s the former, this could effectively halve fine-tuning cost overnight, which would go a long way towards enabling a wider array of use cases for LoRA.