> Both systems ran Z3 4.8.1, compiled by me *using Clang* with the same optimization settings.

Hypothesis: LLVM's AArch64 backend has had more work put into it (by Apple, at least) than LLVM's x86_64 backend has, specifically for "finishing one-off tasks quickly" (as opposed to "achieving high throughput on long-running tasks").

To me, this would make sense: until recently, AArch64 devices were almost always mobile, and so needed to be optimized for performing compute-intensive workloads *on battery*, and so have had more thought put into the *efficiency* of their single-threaded burst performance (the whole "race to sleep" thing). I'd expect, for example, AArch64-optimized code-gen to favor low-code-size serial loops over large-code-size SIMD vector ops, à la GCC -Os, in order to both (1) keep the vector units powered down, and (2) keep cache-line contention lower and thereby keep now-unneeded DMA channels powered down; both of these keep the chip further from its TDP ceiling, and thus let the parts of the core that *are* powered on hold their burst clocks longer. In such a setup, the simple serial loop can end up outperforming the SIMD ops (presuming the loop's trip count varies between short and long, and the loop itself runs frequently). There's a minimal sketch of this trade-off below.

x86_64 devices, meanwhile, are generally only expected to perform compute-intensive tasks *while connected to power*, so the optimizations contributed to compilers like LLVM that specifically affect x86_64 likely come more from the HPC and OLTP crowds, who favor squeezing out continuous aggregate *throughput* at the expense of per-task time-to-completion (i.e. holding onto Turbo Boost-like features at maximum duty cycle to increase *mean* throughput, even as overheat conditions and license-switching overhead lower *modal* task performance).
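To make the scalar-vs-SIMD trade-off concrete, here's a minimal sketch: a plain reduction loop that an optimizer can either keep as a compact scalar loop (smaller code, vector units stay idle) or auto-vectorize into wider SIMD ops (larger code, higher peak throughput). The function name, file name, and flag behavior are illustrative assumptions; whether a given Clang version actually vectorizes this under a particular flag depends on the target, the cost model, and the version.

```c
/* sum.c — illustrative only; the exact code-gen is target- and
 * version-dependent, but as a rough tendency:
 *
 *   clang -Os -S sum.c   # favors a small scalar loop (code size)
 *   clang -O3 -S sum.c   # tends to emit NEON/SSE/AVX vector code
 */
#include <stddef.h>
#include <stdint.h>

/* Simple reduction: sums n 32-bit ints into a 64-bit accumulator. */
int64_t sum(const int32_t *xs, size_t n) {
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += xs[i];
    }
    return acc;
}
```

Inspecting the emitted assembly for the loop body at each optimization level is enough to see which strategy the backend picked for a given target.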