Exactly what the title sounds like. At least one person has gone very far down this rabbit hole, possibly two people. A joy to read through, thank you for posting.<p>This is the sort of comparison that would be great to do for a compiler runtime implementation. Integer division comes to mind. Hard to justify the time.
Although this isn't about the 6502, this reminds me of how I had to write x86_64 assembly for class, and was trying to use LEA to avoid a multiplication. After I wrote it, I wondered if the compiler did the same thing I did, but it didn't; the compiler's version used a register unnecessarily. Compare:<p><pre><code> ;; Goal: r8 := 10 * r9
;; Mine:
;; 10x = x + 9x
;; 9x = x + 8x
lea r8, [r9 + 8 * r9]
add r8, r9
;; Compiler's code:
;; 10x = 2 * 5x = 5x + 5x
;; 5x = x + 4x
lea rdi, [r9 + 4 * r9]
lea r8, [rdi + rdi]
</code></pre>
At first, I wondered why the compiler introduced a temporary.<p>However, I had just done the exercises from <i>Structure and Interpretation of Computer Programs</i> that ask you to write recursive and iterative routines that multiply integers through successive doubling and halving, taking advantage of the fact that Cx = x + (C - 1)x when C is odd, and Cx = 2 * ((C / 2)x) when C is even (that's not exactly how the book explained it; I don't perfectly remember and I don't feel like looking it up, even though this wasn't that long ago).<p>I realized: the compiler must be using a variant of the Russian peasant method of multiplication (which the book named as the source of the technique) as a form of strength reduction. I was impressed, but I also realized that this makes it suboptimal for minimizing temporary registers. I couldn't think of an algorithm that would produce the decomposition I used, though, since I hadn't really had an algorithm in mind when I wrote that assembly.
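The odd/even recurrence described above can be sketched iteratively in a few lines of Python (a hypothetical illustration of the method, not the book's actual code):

```python
def peasant_mult(a, b):
    """Russian peasant multiplication by doubling and halving.
    Uses b*a = a + (b - 1)*a when b is odd,
    and  b*a = (b // 2) * (2*a) when b is even."""
    result = 0
    while b > 0:
        if b & 1:       # b odd: peel off one copy of a
            result += a
        a <<= 1         # double a
        b >>= 1         # halve b
    return result
```

For a constant multiplier the loop unrolls into exactly the shift-and-add sequence the compiler emitted: 10 = binary 1010, so the adds land on the 2x and 8x doublings.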
I don't miss those early days of programming. It's fun to optimize and it's amazing what you can do with minimal resources, but it took an incredible amount of effort to get a 70s-era computer to do elementary school arithmetic fast enough to be useful. People spent years working on analyses like this, only to find, a scant few years later, that the next chip had it built in as a single instruction.
Just for fun... I think this should be on the efficient frontier of 8 x 8 -> 16 bit multipliers, at an average cost of around 43 cycles and roughly 1330 bytes of memory. Compare 45.5 cycles and 1580 bytes for mult66.a, the fastest in the linked collection.<p><pre><code> lmul0 = $06 ; pointer into square table low
lmul1 = $08 ; pointer into square table high
labs0 = $0a ; pointer into table of abs(i-1)
prod_low = $0c
* = $0200
; Align tables to start of page
abstable
!for i, 0, 255 {
!byte <(abs(i-1))
}
squaretable1_lsb
!for i, 0, 511 {
!byte <((i*i)/4)
}
squaretable1_msb
!for i, 0, 511 {
!byte >((i*i)/4)
}
; multiply
;
; f(x) = x*x/4
;
; return f(X+Y) - f(|Y-X|)
;
; 8 bit x 8bit unsigned multiply, 16 bit result
;
; On Entry:
; A: multiplier
; Y: multiplicand
; On Exit:
; (prod_low, A): product
mult
sta lmul0
sta lmul1
eor #$ff
sta labs0
lda (labs0),Y ; (ldx has no (zp),Y mode)
tax ;X = abs(Y-A)
lda (lmul0),Y
sec
sbc squaretable1_lsb,X
sta prod_low
lda (lmul1),Y
sbc squaretable1_msb,X
rts
;call this once to initialise high bytes of pointers to table
mult_init
lda #>abstable
sta labs0+1
lda #>squaretable1_lsb
; high byte ($03 with the layout above)
sta lmul0+1
lda #>squaretable1_msb
; high byte ($05 with the layout above)
sta lmul1+1
rts</code></pre>
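The identity the routine leans on, f(x) = floor(x*x/4) with a*b = f(a+b) - f(|a-b|), is easy to check with a small Python sketch (mirroring the 511-entry square-table layout, not the 6502 code itself):

```python
# Table of f(i) = floor(i*i / 4) for i in 0..510, covering the maximum
# index a + b = 255 + 255, as in the routine's square tables.
f = [i * i // 4 for i in range(511)]

def mult8(a, b):
    """Unsigned 8-bit multiply via table of squares: a*b = f(a+b) - f(|a-b|).
    Works because (a+b)^2 - (a-b)^2 = 4ab, and a+b and |a-b| have the same
    parity, so both floor divisions by 4 drop the same fraction."""
    return f[a + b] - f[abs(a - b)]
```

Exhaustively checking all 65,536 input pairs confirms the identity is exact, not an approximation.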
The customization section is interesting, though it's missing things such as multiply & conquer, table wrapping, and the mask & jump techniques; but then, you aren't going to find every custom technique covered in any one text. Sine, cosine, and atan usually used dual interleaved tables that wrapped, and for distance, you could use a short table and a long table that used Manhattan distance to do the lookup, depending on whether X > Y or Y > X.<p>Edit: I see a link on that page to another GitHub repo with sqrt implementations. I suspect that will make for interesting reading too.
Yet again I am struck by how incredible it is that Braben and Bell, two undergraduate students, could produce something as ground-breaking as Elite—all the way down to writing their own multiplication routines.
My Apple II "Mode 7" demo relies heavily on a "table of squares" multiply algorithm, though it's signed (vs unsigned), which is even more of a pain on the 6502.
<a href="https://www.youtube.com/watch?v=o8427JsfGwk" rel="nofollow noreferrer">https://www.youtube.com/watch?v=o8427JsfGwk</a><p>So many 16-bit demo effects assume you have a decently fast multiply or divide instruction, which makes it really hard to do the same effects on the 6502 (as I found with the Second Reality remake).
He is missing the state-of-the-art result for 16 x 16 -> 32 bit. That would be Repose in this CSDB thread : <a href="https://csdb.dk/forums/?roomid=11&topicid=91766" rel="nofollow noreferrer">https://csdb.dk/forums/?roomid=11&topicid=91766</a> at 190 cycles (depending on how exactly you count).
I’m writing a pinball game in assembly for the original Game Boy, and recently had to hand-craft multiply, division, and cosine routines to trade off accuracy for spare cycles. Something like this would have come in handy!
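A common way to make that accuracy-for-cycles trade on such hardware is a precomputed fixed-point lookup table. A minimal Python sketch of generating one (the 256-step angle and signed 1.7-style fixed point are illustrative assumptions, not the actual Game Boy routine):

```python
import math

# Cosine lookup table: 256 angle steps per full turn, values scaled to
# -127..127 (roughly signed 1.7 fixed point, so 127 represents ~1.0).
COS_TABLE = [round(math.cos(2 * math.pi * i / 256) * 127) for i in range(256)]

def cos_fixed(angle):
    """angle: integer, 256 units per full turn; returns scaled cosine."""
    return COS_TABLE[angle & 0xFF]  # mask wraps the angle for free
```

On the real hardware the table would be assembled into ROM and indexed directly, so the per-call cost is just the masked load; the only accuracy lost is the 8-bit quantization of both angle and value.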
The graphical error plots for the approximating versions are a joy to behold. You can clearly see the ringing, and how the errors tend to grow with larger inputs.