2013-11-17 Update #2: Tried another new trick. This time I updated the leading bit position function, when using the gccgo compiler, to use libgcc’s __clzdi2 routine. This had the same effect as the update #1 except it’s for when gccgo is used. Performance increase ranged from 0% to 20% for encoding only. Thanks dgryski on reddit and minux on the golang-nuts mailing list.