> More specifically, the branch predictor probably keeps an internal stack of function return addresses, which is pushed to whenever a bl is executed. When the branch predictor sees a ret coming down the pipeline, it assumes that you're returning to the address associated with the most recent bl (and begins prefetching / speculative execution / whatever), then pops that top address from its internal stack.<p>There's no need for "probably" here. The micro-architectural mechanism is known as a return stack buffer[1] and is generally separate from the branch predictor unit, though the processor may make use of indirect branch prediction entries for returns as well.<p>[1] It is, indeed, a tiny little stack of return addresses and indeed, the article hit performance issues by misaligning it. The (Intel chips') RSB is behind the Retbleed vulnerabilities.