I'm working on an SSE/AVX library and have almost all the basics working, but now I'm trying to improve the accuracy and speed of certain functions.
Essentially, I'm computing sin with a minimax polynomial whose error, with infinite-precision arithmetic, should be about 2E-9, which is plenty for floats. The polynomial is evaluated after reducing the argument x/(2*pi) (i.e. x measured in turns) from (-inf, inf) to [0, 0.25]. This is the common technique I've been reading about: a smaller reduced range lets you use a smaller minimax polynomial while still getting better accuracy. However, the range reduction in this section
x = x / (2*pi);
x = x - round(x);
which gives me the range [-0.5, 0.5] before it is reduced further, is causing me to lose a lot of precision as x moves away from zero. For example, 1E6f/(2*pi) is about 159155, so the integer part already uses about 18 of the float's 24 significand bits, leaving only about 6 bits for the fraction. After subtracting round(x), I'm left with just 1 or 2 significant figures for the rest of the calculation. The built-in sinf implementation doesn't seem to have this issue: its results match double-precision sin(x) to about 7 significant figures. How does one reduce a float argument from (-inf, inf) to a small range such as [-1, 1] while keeping close to the 6-7 significant figures a float can hold?