Between criticism of the transformer architecture's inadequacies, the depletion of training data, improvements to the attention mechanism, and claims that AGI is supposedly miles away: where do the new results put the field? Which avenues need more focus, and which need less, even to the point of being cut off entirely?