MIPS32 support - kudos to Vladimir Stefanovic and Imagination Technologies for making this happen. Many people from the embedded world will also greatly appreciate support for soft-float MIPS32 hardware.
There are scores of other optimizations [0] as well:<p><pre><code> Optimizations:
bytes, strings: optimize for ASCII sets (CL 31593)
bytes, strings: optimize multi-byte index operations on s390x (CL 32447)
bytes,strings: use IndexByte more often in Index on AMD64 (CL 31690)
bytes: Use the same algorithm as strings for Index (CL 22550)
bytes: improve WriteRune performance (CL 28816)
bytes: improve performance for bytes.Compare on ppc64x (CL 30949)
bytes: make IndexRune faster (CL 28537)
cmd/asm, go/build: invoke cmd/asm only once per package (CL 27636)
cmd/compile, cmd/link: more efficient typelink generation (CL 31772)
cmd/compile, cmd/link: stop generating unused go.string.hdr symbols. (CL 31030)
cmd/compile,runtime: redo how map assignments work (CL 30815)
cmd/compile/internal/obj/x86: eliminate some function prologues (CL 24814)
cmd/compile/internal/ssa: generate bswap on AMD64 (CL 32222)
cmd/compile: accept literals in samesafeexpr (CL 26666)
cmd/compile: add more non-returning runtime calls (CL 28965)
cmd/compile: add size hint to map literal allocations (CL 23558)
cmd/compile: be more aggressive in tighten pass for booleans (CL 28390)
cmd/compile: directly construct Fields instead of ODCLFIELD nodes (CL 31670)
cmd/compile: don't reserve X15 for float sub/div any more (CL 28272)
cmd/compile: don’t generate pointless gotos during inlining (CL 27461)
cmd/compile: fold negation into comparison operators (CL 28232)
cmd/compile: generate makeslice calls with int arguments (CL 27851)
cmd/compile: handle e == T comparison more efficiently (CL 26660)
cmd/compile: improve s390x SSA rules for logical ops (CL 31754)
cmd/compile: improve s390x rules for folding ADDconst into loads/stores (CL 30616)
cmd/compile: improve string iteration performance (CL 27853)
cmd/compile: improve tighten pass (CL 28712)
cmd/compile: inline _, ok = i.(T) (CL 26658)
cmd/compile: inline atomics from runtime/internal/atomic on amd64 (CL 27641, CL 27813)
cmd/compile: inline convT2{I,E} when result doesn't escape (CL 29373)
cmd/compile: inline x, ok := y.(T) where T is a scalar (CL 26659)
cmd/compile: intrinsify atomic operations on s390x (CL 31614)
cmd/compile: intrinsify math/big.mulWW, divWW on AMD64 (CL 30542)
cmd/compile: intrinsify runtime/internal/atomic.Xaddint64 (CL 29274)
cmd/compile: intrinsify slicebytetostringtmp when not instrumenting (CL 29017)
cmd/compile: intrinsify sync/atomic for amd64 (CL 28076)
cmd/compile: make [0]T and [1]T SSAable types (CL 32416)
cmd/compile: make link register allocatable in non-leaf functions (CL 30597)
cmd/compile: missing float indexed loads/stores on amd64 (CL 28273)
cmd/compile: move stringtoslicebytetmp to the backend (CL 32158)
cmd/compile: only generate ·f symbols when necessary (CL 31031)
cmd/compile: optimize bool to int conversion (CL 22711)
cmd/compile: optimize integer "in range" expressions (CL 27652)
cmd/compile: remove Zero and NilCheck for newobject (CL 27930)
cmd/compile: remove duplicate nilchecks (CL 29952)
cmd/compile: remove some write barriers for stack writes (CL 30290)
cmd/compile: simplify div/mod on ARM (CL 29390)
cmd/compile: statically initialize some interface values (CL 26668)
cmd/compile: unroll comparisons to short constant strings (CL 26758)
cmd/compile: use 2-result divide op (CL 25004)
cmd/compile: use masks instead of branches for slicing (CL 32022)
cmd/compile: when inlining ==, don’t take the address of the values (CL 22277)
container/heap: remove one unnecessary comparison in Fix (CL 24273)
crypto/elliptic: add s390x assembly implementation of NIST P-256 Curve (CL 31231)
crypto/sha256: improve performance for sha256.block on ppc64le (CL 32318)
crypto/sha512: improve performance for sha512.block on ppc64le (CL 32320)
crypto/{aes,cipher}: add optimized implementation of AES-GCM for s390x (CL 30361)
encoding/asn1: reduce allocations in Marshal (CL 27030)
encoding/csv: avoid allocations when reading records (CL 24723)
encoding/hex: change lookup table from string to array (CL 27254)
encoding/json: Use a lookup table for safe characters (CL 24466)
hash/crc32: improve the AMD64 implementation using SSE4.2 (CL 24471)
hash/crc32: improve the AMD64 implementation using SSE4.2 (CL 27931)
hash/crc32: improve the processing of the last bytes in the SSE4.2 code for AMD64 (CL 24470)
image/color: improve speed of RGBA methods (CL 31773)
image/draw: optimize drawFillOver as drawFillSrc for opaque fills (CL 28790)
math/big: 10%-20% faster float->decimal conversion (CL 31250, CL 31275)
math/big: avoid allocation in float.{Add, Sub} when there's no aliasing (CL 23568)
math/big: make division faster (CL 30613)
math/big: use array instead of slice for deBruijn lookups (CL 26663)
math/big: uses SIMD for some math big functions on s390x (CL 32211)
math: speed up Gamma(+Inf) (CL 31370)
math: speed up bessel functions on AMD64 (CL 28086)
math: use SIMD to accelerate some scalar math functions on s390x (CL 32352)
reflect: avoid zeroing memory that will be overwritten (CL 28011)
regexp: avoid alloc in QuoteMeta when not quoting (CL 31395)
regexp: reduce mallocs in Regexp.Find* and Regexp.ReplaceAll* (CL 23030)
runtime: cgo calls are about 100ns faster (CL 29656, CL 30080)
runtime: defer is now 2X faster (CL 29656)
runtime: implement getcallersp in Go (CL 29655)
runtime: improve memmove for amd64 (CL 22515, CL 29590)
runtime: increase malloc size classes (CL 24493)
runtime: large objects no longer cause significant goroutine pauses (CL 23540)
runtime: make append only clear uncopied memory (CL 30192)
runtime: make assists perform root jobs (CL 32432)
runtime: memclr perf improvements on ppc64x (CL 30373)
runtime: minor string/rune optimizations (CL 27460)
runtime: optimize defer code (CL 29656)
runtime: remove a load and shift from scanobject (CL 22712)
runtime: remove defer from standard cgo call (CL 30080)
runtime: speed up StartTrace with lots of blocked goroutines (CL 25573)
runtime: speed up non-ASCII rune decoding (CL 28490)
strconv: make FormatFloat slowpath a little faster (CL 30099)
strings: add special cases for Join of 2 and 3 strings (CL 25005)
strings: make IndexRune faster (CL 28546)
strings: use AVX2 for Index if available (CL 22551)
strings: use Index in Count (CL 28586)
syscall: avoid convT2I allocs for common Windows error values (CL 28484, CL 28990)
text/template: improve lexer performance in finding left delimiters (CL 24863)
unicode/utf8: optimize ValidRune (CL 32122)
unicode/utf8: reduce bounds checks in EncodeRune (CL 28492)
</code></pre>
[0] <a href="https://github.com/golang/go/blob/master/doc/go1.8.txt" rel="nofollow">https://github.com/golang/go/blob/master/doc/go1.8.txt</a>