Hah, funny to see my blog referenced. Yeah, cache misses are a huge part of SAT solving (modern solvers have prefetch code specifically for this loop, some seemingly following the code I wrote), and SAT solving is 95+% of SMT solving. So, yeah, cache is where it's at.<p>I once had an old but "enthusiast" i7 with a triple-channel memory setup, and "upgraded" to a consumer dual-channel i7 that was 2-3 generations ahead. It had the same performance(!) for SAT solving. Pretty eye-opening.
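For the curious: the loop in question is watched-literal unit propagation, where each watch entry points at a clause and the dependent load is a near-guaranteed cache miss. A minimal sketch of the prefetch trick, with invented types rather than Z3's (or any real solver's) internals, looks something like:

```c
#include <stddef.h>

/* Illustrative sketch of a watched-literal propagation loop with
 * software prefetch. Types and names are made up for illustration. */

typedef struct { int size; int lits[4]; } Clause;
typedef struct { Clause *clause; int blocker; } Watch;

/* Walk a watch list, prefetching clause headers a few entries ahead
 * so the dependent load is already in cache when we reach it. */
int propagate_count(const Watch *ws, size_t n, int lit) {
    int hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 4 < n)
            __builtin_prefetch(ws[i + 4].clause); /* hide the miss */
        if (ws[i].clause->lits[0] == lit)
            hits++;
    }
    return hits;
}
```

How far ahead to prefetch depends on line size and memory latency, which is part of why SAT performance tracks memory-subsystem details so closely.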
Apple Silicon has been pretty impressive, outperforming many laptops and desktops in benchmarks.<p>And the author used the A12 chip from 2018. The A13 is even faster and ships in every iPhone 11 and even the $399 second-generation iPhone SE.<p>I am eager to see how Apple Silicon does in the upcoming MacBooks.
Note that this sort of software can be extremely sensitive to cache arrangement.<p>Not only to raw performance, but also to small details such as data layout, line size, replacement policy, and coherency model.<p>Source: I worked on performance improvements for a related piece of software.
SMT solvers and other "old AI" workloads depend mainly on memory performance under unpredictably branching workloads, which the iPhone chips seem to be strong at. Compare that to "new AI" workloads, which eliminate branch misprediction almost entirely.
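To make "unpredictably branching" concrete, here is an illustrative pair of loops (not taken from any solver): the first branches on data every iteration, the control-flow pattern SAT/SMT search is full of; the second is the branch-free shape that dense ML kernels mostly reduce to:

```c
#include <stddef.h>

/* Branchy: the outcome of each iteration depends on the data, so on
 * random inputs the branch predictor mispredicts frequently. */
int count_above(const int *xs, size_t n, int t) {
    int c = 0;
    for (size_t i = 0; i < n; i++)
        if (xs[i] > t)        /* data-dependent branch */
            c++;
    return c;
}

/* Branch-free equivalent: the comparison becomes straight-line
 * arithmetic (a compare plus an add), with nothing to mispredict. */
int count_above_branchless(const int *xs, size_t n, int t) {
    int c = 0;
    for (size_t i = 0; i < n; i++)
        c += (xs[i] > t);
    return c;
}
```

Solver inner loops generally can't be rewritten into the second shape, which is why they stress the predictor and memory system rather than the ALUs.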
I am always amazed at how much better the user experience of an iPad mini (~USD 400) is than that of a top-end 13" Windows-based notebook (~USD 2700). I know I am comparing apples with oranges in terms of running processes and multitasking, but the foreground app experience is much better.
I really want to see Apple Silicon under some real-world load: running a current-gen AAA video game at 4K ultra settings while streaming with OBS, playing music, and having Chrome open with 100+ tabs. That's what my current PC does, for example, so I'm really curious about everything (performance, temps, etc.)<p>To some extent it's just hard to believe that an iPhone CPU is better at everything than the current top-of-the-line Intel and AMD desktop CPUs.
> Both systems ran Z3 4.8.1, compiled by me <i>using Clang</i> with the same optimization settings.<p>Hypothesis: LLVM's AArch64 backend has had more work put into it (by Apple, at least) than LLVM's x86_64 backend has, specifically for "finishing one-off tasks quickly" (as opposed to "achieving high throughput on long-running tasks").<p>To me, this would make sense: until recently, AArch64 devices were almost always mobile, and so needed to be optimized for performing compute-intensive workloads <i>on battery</i>, and so have had more thought put into the <i>efficiency</i> of their single-threaded burst performance (the whole "race to sleep" thing). I'd expect, for example, AArch64-optimized code-gen to favor small-code-size serial loops over large-code-size SIMD vector ops, a la GCC -Os, in order to both 1. keep the vector units powered down, and 2. keep cache-line contention lower and thereby keep now-unneeded DMA channels powered down; both keep the chip further from the TDP ceiling, and thus keep the rest of the core that <i>is</i> powered on able to burst-clock longer. In such a setup, the simple serial-processing loop may actually outperform the SIMD ops (presuming the loop has a variable number of iterations, which may be long or short, and is itself run frequently).<p>x86_64 devices, meanwhile, are generally only expected to perform compute-intensive tasks <i>while connected to power</i>, so the optimizations contributed to compilers like LLVM that specifically impact x86_64 likely come more from the HPC and OLTP crowds, who favor squeezing out continuous aggregate <i>throughput</i> at the expense of per-task time-to-completion (i.e. holding onto Turbo Boost-like features at maximum duty cycle to increase <i>mean</i> throughput, even as overheat conditions and license-switching overhead lower <i>modal</i> task performance).
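One way to probe a hypothesis like this is to compile a trivial reduction for both targets at -O2 and -Os and diff the emitted assembly, e.g. `clang -S -O2 --target=aarch64-linux-gnu sum.c` vs `--target=x86_64-linux-gnu` (file and target triples here are just examples):

```c
#include <stddef.h>

/* A trivial reduction loop. At -O2, LLVM's auto-vectorizer typically
 * turns this into SIMD code (NEON on AArch64, SSE/AVX on x86_64);
 * at -Os it tends to stay a compact scalar loop. Comparing the .s
 * output across targets and optimization levels shows the
 * code-size-vs-vectorization trade-off described above. */
long sum_array(const int *xs, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += xs[i];
    return total;
}
```

Whether the scalar or vector version wins for short, bursty runs is exactly the kind of thing that would differ between the two tuning philosophies.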
> So, in a fit of procrastination, I decided to cross-compile Z3 to iOS, and see just how fast my new phone (or hypothetical future Mac) is.<p>> I bet the new iPad Pro’s A12X is even faster thanks to the larger thermal envelope a tablet affords.<p>iirc Apple intends to complete the transition to Apple Silicon within three years (and they often tend to be conservative with their estimates), so I'd imagine the 2023 or 2024 Mac Pro will be interesting, given that (I'd assume) it won't have the thermal constraints an iPhone has. Thoughts?
> Indeed, after benchmarking I checked the iPhone’s battery usage report, which said Slack had used 4 times more energy than the Z3 app despite less time on screen.
The A12 has about 3-4 times as many transistors as the i7-7700K. I know a lot of the A12's SoC budget goes to things other than the CPU cores, but the cores still get a pretty large share of it. It was also made with 7nm transistors vs the 14nm Intel was using when the 7700K came out. Apple did a good job designing these chips, but TSMC is a huge part of why they are so good. For the longest time Intel's manufacturing process was way ahead of everyone else's, which was reflected in their chips' performance. Give me way more transistors and I can probably make a better chip.<p>Give me more cache, decoders, execution units, etc. and I can make a faster chip.<p>Also, if you used all cores for a sustained amount of time, the Intel chip would win out just because it can dissipate all that heat.
On the topic of the portability of Z3, does anyone know why this example runs fine on native amd64 but fails to solve when built with emscripten/wasm?<p><a href="https://github.com/sielicki/z3wasm" rel="nofollow">https://github.com/sielicki/z3wasm</a><p>Do a build by running “mkdir bld && cd bld && emcmake cmake ../ -GNinja && emmake ninja”.
Not particularly apropos of the thrust of the article, but I want to apply SMT to some tricky infosec use cases. Is there a good place to start for someone without an academic background in math/logic/etc.?
For anyone trying to reproduce this on an Android phone: you don't need to compile anything. Just "apt install z3" in Termux. It will probably be pretty slow, but it will work.
When Apple releases ARM Macs I think we are all going to be really impressed. Take these phone chips, beef them up a bit with more cores and maybe more cache, clock them a bit higher, and give them better cooling so they can sustain those higher speeds.<p>If it goes the way I think it will, x86 is done. People will really start wanting ARM on servers too.
Discussed at the time: <a href="https://news.ycombinator.com/item?id=18383851" rel="nofollow">https://news.ycombinator.com/item?id=18383851</a>