TechEcho

9 comments

GeertB over 6 years ago

While CORDIC is great for fixed point, it has limitations for floating point. The original 8087 fsin and fcos instructions used CORDIC, but later versions of the architecture switched to polynomial approximations (see https://software.intel.com/sites/default/files/managed/f8/9c/x87TrigonometricInstructionsVsMathFunctions.pdf). Today it's possible to develop implementations of these elementary functions on x86 CPUs, using regular multiply/add/fused multiply-add instructions, that are more precise and more performant than even the current improved post-CORDIC fsin and fcos instructions.

The main issue is that an instruction executing a fixed-function block with a given (high) latency and little if any pipelining tends to be far worse than many more fully pipelined multiply/add instructions.

The other issue is that argument reduction and approximation over the reduced domain are not independent. For some parts of the domain, such as computing the sine of a number very close to a multiple of pi, you may need to spend more cycles reducing the argument accurately to counter cancellation effects. However, as the reduced argument is then very close to zero, a simple polynomial suffices.

So, for most modern systems, I'd put the effort into an efficient pipelined fused multiply-add and use that for all elementary functions. Fixed-function hardware for elementary functions has generally proved suboptimal.
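
As a rough illustration of the polynomial approach described above, here is a minimal Python sketch that evaluates sine on an already-reduced argument with Horner's scheme. The name `sin_poly`, the range |r| <= pi/4, and the use of Taylor coefficients are assumptions made for this example; real math libraries typically use minimax coefficients and a careful argument-reduction step, neither of which is shown here.

```python
import math

# Taylor coefficients of sin(x)/x in powers of x*x. Illustrative only:
# production libraries use minimax coefficients instead.
SIN_COEFFS = [(-1) ** k / math.factorial(2 * k + 1) for k in range(7)]

def sin_poly(r):
    """Approximate sin(r) for an already-reduced argument, |r| <= pi/4."""
    r2 = r * r
    acc = 0.0
    for c in reversed(SIN_COEFFS):
        # One multiply-add per coefficient; in hardware each step
        # maps onto a single fused multiply-add.
        acc = acc * r2 + c
    return r * acc
```

Every loop iteration is one multiply-add, which is why a well-pipelined FMA unit serves this kind of evaluation well.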

Y_Y over 6 years ago

What a well explained and simple article. Though it would be nice to know why each optimisation is made wrt the eventual hardware.

hatsunearu over 6 years ago

Oh wow, I literally just finished my quadrature sinusoid DDS generator using MyHDL last night. I didn't use CORDIC but rather a LUT. I found out I can optimize generating quadrature sinusoids by having two separate LUTs, where one stores the sine from 0 to pi/2 and the other from pi/2 to pi. This has an advantage because when the sine output reads from the first LUT, the cos output reads from the second LUT and vice versa, so nothing is stored twice.

I'm still cleaning up the testbench code and I plan to put out a blog post here if y'all are interested: hatsunearu.github.io
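
One possible reading of the two-LUT scheme, sketched in plain Python rather than MyHDL. The table size `M`, the names `LUT_A`, `LUT_B` and `quadrature`, and the phase encoding are all assumptions for illustration; this is not the commenter's actual code.

```python
import math

M = 256                      # entries per quarter-wave table (assumed size)
STEP = (math.pi / 2) / M     # phase increment per table entry

LUT_A = [math.sin(i * STEP) for i in range(M)]                # 0 .. pi/2
LUT_B = [math.sin(math.pi / 2 + i * STEP) for i in range(M)]  # pi/2 .. pi

def quadrature(phase):
    """Return (sin, cos) for an integer phase in 0 .. 4*M-1.

    Each call reads each table exactly once; sin and cos never
    need the same table in the same quadrant.
    """
    q, i = divmod(phase % (4 * M), M)   # quadrant and index within it
    a, b = LUT_A[i], LUT_B[i]
    return [(a, b), (b, -a), (-a, -b), (-b, a)][q]
```

In every quadrant one output comes from `LUT_A` and the other from `LUT_B`, with only sign changes applied, which matches the "no duplicates" observation.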

kkaranth over 6 years ago

Nice read! I have 2 questions:

When calculating K, the author says “It can be shown through the use of trigonometric identities that:” and proceeds to show a formula. How exactly does this happen?

After calculating K, the author assigns it to c in the cordic function, but not to s. Why?
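
A minimal floating-point sketch of standard rotation-mode CORDIC may help with both questions. It assumes the article follows the textbook formulation; the iteration count `N` and the names `ANGLES`, `K` and `cordic` are chosen for this example and are not taken from the article. The identity involved is cos(atan(t)) = 1/sqrt(1 + t*t): each micro-rotation by atan(2^-i) done with shifts and adds stretches the vector by sqrt(1 + 2^-2i), so K is the product of the inverse stretches. The start vector is the unit vector at angle 0 scaled by K, that is (K, 0), so s presumably starts at 0 because K times sin(0) is 0.

```python
import math

N = 32  # number of iterations (assumed)
ANGLES = [math.atan(2.0 ** -i) for i in range(N)]

# Each shift-and-add micro-rotation stretches the vector by
# sqrt(1 + 2^-2i), since cos(atan(t)) = 1 / sqrt(1 + t*t).
# K cancels the accumulated stretch of all N steps.
K = 1.0
for i in range(N):
    K /= math.sqrt(1.0 + 2.0 ** (-2 * i))

def cordic(theta):
    """Return (cos(theta), sin(theta)) for |theta| <= pi/2."""
    c, s = K, 0.0   # unit vector at angle 0, pre-scaled by K
    z = theta       # angle still left to rotate through
    for i in range(N):
        d = 1.0 if z >= 0 else -1.0
        # Multiplying by 2.0 ** -i is a right shift in fixed-point hardware.
        c, s = c - d * s * 2.0 ** -i, s + d * c * 2.0 ** -i
        z -= d * ANGLES[i]
    return c, s
```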

toolslive over 6 years ago

Isn't it simpler (and more efficient) to build an interpolating polynomial approximation for sin(x) on the range [0, pi/8), using Chebyshev instead of Lagrange interpolation, for example?
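
For concreteness, a small Python sketch of what such an approximation could look like: interpolate sin at Chebyshev nodes on [0, pi/8] and evaluate the result in Newton form. The node count of 6 and all function names are assumptions for this example, and it says nothing about whether this beats CORDIC in hardware cost.

```python
import math

def chebyshev_nodes(a, b, n):
    """n Chebyshev points of the first kind, mapped to [a, b]."""
    mid, half = 0.5 * (a + b), 0.5 * (b - a)
    return [mid + half * math.cos((2 * k + 1) * math.pi / (2 * n))
            for k in range(n)]

def newton_coeffs(xs, ys):
    """Divided-difference coefficients of the interpolating polynomial."""
    coeffs = list(ys)
    for j in range(1, len(xs)):
        for i in range(len(xs) - 1, j - 1, -1):
            coeffs[i] = (coeffs[i] - coeffs[i - 1]) / (xs[i] - xs[i - j])
    return coeffs

def newton_eval(xs, coeffs, x):
    """Evaluate the Newton-form polynomial with a Horner-like loop."""
    acc = coeffs[-1]
    for xi, ci in zip(reversed(xs[:-1]), reversed(coeffs[:-1])):
        acc = acc * (x - xi) + ci
    return acc

XS = chebyshev_nodes(0.0, math.pi / 8, 6)            # 6 nodes: degree 5
COEFFS = newton_coeffs(XS, [math.sin(x) for x in XS])

def sin_approx(x):
    """Approximate sin(x) for x in [0, pi/8]."""
    return newton_eval(XS, COEFFS, x)
```

Evaluation costs one multiply and two adds per coefficient, so the trade-off against CORDIC comes down to whether multipliers are available.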

JoeAltmaier over 6 years ago

Is this just an example? Because if I wanted to rotate a vector, I'd do it with vectors, not trig. Which requires multiplication and addition, right? What am I missing?

man-and-laptop over 6 years ago

How is CORDIC different from e^x ~= (1+x/N)^N where N is a power of 2?
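
To make the comparison concrete, here is a hedged Python sketch of the formula in the question applied to x = i*theta, with N = 2^n so that the power becomes n complex squarings. The name `exp_i_theta` and the choice of n are assumptions for this example.

```python
def exp_i_theta(theta, n):
    """Approximate cos(theta) + i*sin(theta) as (1 + i*theta/2**n) ** (2**n)."""
    z = complex(1.0, theta / 2 ** n)
    for _ in range(n):
        # Each squaring needs real multiplications; a CORDIC step
        # needs only shifts and adds.
        z = z * z
    return z
```

Two differences stand out in this sketch: every step is a full multiplication rather than a shift-and-add, and the magnitude error shrinks only in proportion to 1/N, whereas CORDIC gains roughly one bit per iteration once the constant gain K has been corrected.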

gravypod over 6 years ago

How would one build an asynchronous implementation of this circuit in an HDL?

andrewflnr over 6 years ago

This is incredibly slick. Is it used in real hardware?

Computing sin and cos in hardware with synthesisable Verilog
