Tuesday, January 12, 2016

Found a nice optimization that should be useful for games




            


Instead of the usual dist = sqrt (distx * distx + disty * disty + distz * distz), computed one element at a time:

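#include <xmmintrin.h> // SSE intrinsics (__m128, _mm_*_ps)

// Computes pResult[i] = sqrt (pArray1[i]^2 + pArray2[i]^2 + pArray3[i]^2), four floats per iteration.
// All four arrays must be 16-byte aligned, because they are accessed through __m128 pointers.
// Only nSize / 4 full groups are processed; any leftover elements are skipped.
void CSSETestDlg (
    float * pArray1, // [in] source array 1
    float * pArray2, // [in] source array 2
    float * pArray3, // [in] source array 3
    float * pResult, // [out] array that receives the results
    int nSize)       // [in] number of elements in each array
{
    int nLoop = nSize / 4;

    __m128 m1, m2, m3, m4;

    __m128 * pSrc1 = (__m128 *) pArray1;
    __m128 * pSrc2 = (__m128 *) pArray2;
    __m128 * pSrc3 = (__m128 *) pArray3;
    __m128 * pDest = (__m128 *) pResult;

    __m128 m0_5 = _mm_set_ps1 (0.5f); // m0_5 [0, 1, 2, 3] = 0.5 (not used in this version)

    for (int i = 0; i < nLoop; i ++)
    {
        m1 = _mm_mul_ps (* pSrc1, * pSrc1); // m1 = * pSrc1 * * pSrc1
        m2 = _mm_mul_ps (* pSrc2, * pSrc2); // m2 = * pSrc2 * * pSrc2
        m3 = _mm_mul_ps (* pSrc3, * pSrc3); // m3 = * pSrc3 * * pSrc3
        m4 = _mm_add_ps (m1, m2);
        m4 = _mm_add_ps (m4, m3);           // m4 = m1 + m2 + m3
        * pDest = _mm_sqrt_ps (m4);         // * pDest = sqrt (m4)

        pSrc1 ++;
        pSrc2 ++;
        pSrc3 ++;
        pDest ++;
    }
}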
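
The routine above only works when every array is 16-byte aligned and it silently skips the last nSize % 4 elements. The following is a minimal sketch of one way around both limits, assuming unaligned loads and stores are acceptable. The name VectorLengthsSSE and its signature are hypothetical and not part of the original post.

```cpp
#include <xmmintrin.h> // SSE intrinsics
#include <cmath>
#include <cstddef>

// Hypothetical helper (not from the original post):
// out[i] = sqrt(x[i]^2 + y[i]^2 + z[i]^2) for i in [0, n).
// Uses unaligned loads/stores, so the arrays need no special alignment.
void VectorLengthsSSE(const float* x, const float* y, const float* z,
                      float* out, std::size_t n)
{
    std::size_t i = 0;

    // Main loop: four floats per iteration.
    for (; i + 4 <= n; i += 4)
    {
        __m128 vx = _mm_loadu_ps(x + i);
        __m128 vy = _mm_loadu_ps(y + i);
        __m128 vz = _mm_loadu_ps(z + i);

        __m128 sum = _mm_add_ps(_mm_add_ps(_mm_mul_ps(vx, vx),
                                           _mm_mul_ps(vy, vy)),
                                _mm_mul_ps(vz, vz));

        _mm_storeu_ps(out + i, _mm_sqrt_ps(sum));
    }

    // Scalar tail: the remaining 0 to 3 elements.
    for (; i < n; ++i)
    {
        out[i] = std::sqrt(x[i] * x[i] + y[i] * y[i] + z[i] * z[i]);
    }
}
```

If the arrays are known to be 16-byte aligned, _mm_load_ps and _mm_store_ps can replace the unaligned variants in the main loop.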

Reply:
Using pd doesn't seem to lose any precision, though.
Reply:
isline (edge clear)

Oh, finally we get to see your work.
Reply:
Not double
Reply:
I compiled the code above and compared it against plain C code that computes sqrt (* pSrc1 * * pSrc1 + * pSrc2 * * pSrc2 + * pSrc3 * * pSrc3).
The SSE version is actually slower. How can that be?
Each array is 400K in size. I have tried this repeatedly, with both the VC compiler and the Intel C++ Compiler, with all optimizations turned on, and the result is always the same. I could hardly believe it.
Please advise.
Reply:
Also, I am using a Celeron 2.6, VS2003, and Intel C++ 9.1.
Reply:
Can anyone help answer this?
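
For anyone who wants to repeat the comparison described in the thread, here is a rough timing sketch. Everything in it (LengthsScalar, LengthsSSE, TimeMs, BenchResult, RunBenchmark, and the fill values) is hypothetical and not taken from the original posts. It uses std::chrono, so it needs a C++11 compiler rather than VS2003. Timings depend heavily on the compiler, the CPU, and the array size. With arrays this large the loop may be limited by memory bandwidth rather than arithmetic, but that is an assumption the thread does not confirm.

```cpp
#include <xmmintrin.h> // SSE intrinsics
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Plain C++ baseline: one element per iteration.
void LengthsScalar(const float* x, const float* y, const float* z,
                   float* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = std::sqrt(x[i] * x[i] + y[i] * y[i] + z[i] * z[i]);
}

// SSE version: four elements per iteration, scalar tail for the rest.
void LengthsSSE(const float* x, const float* y, const float* z,
                float* out, std::size_t n)
{
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
    {
        __m128 vx = _mm_loadu_ps(x + i);
        __m128 vy = _mm_loadu_ps(y + i);
        __m128 vz = _mm_loadu_ps(z + i);
        __m128 sum = _mm_add_ps(_mm_add_ps(_mm_mul_ps(vx, vx),
                                           _mm_mul_ps(vy, vy)),
                                _mm_mul_ps(vz, vz));
        _mm_storeu_ps(out + i, _mm_sqrt_ps(sum));
    }
    for (; i < n; ++i)
        out[i] = std::sqrt(x[i] * x[i] + y[i] * y[i] + z[i] * z[i]);
}

// Average wall-clock time of fn() in milliseconds over `repeats` runs.
template <typename Fn>
double TimeMs(Fn fn, int repeats)
{
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r)
        fn();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count() / repeats;
}

struct BenchResult
{
    double scalarMs;  // average time of the scalar loop
    double sseMs;     // average time of the SSE loop
    float maxAbsDiff; // largest difference between the two outputs
};

// Fills three arrays of n floats with arbitrary test data, times both
// versions, and checks that they produce matching results.
BenchResult RunBenchmark(std::size_t n, int repeats)
{
    std::vector<float> x(n), y(n, 0.5f), z(n, 2.0f), a(n), b(n);
    for (std::size_t i = 0; i < n; ++i)
        x[i] = 0.001f * static_cast<float>(i);

    BenchResult r;
    r.scalarMs = TimeMs([&] { LengthsScalar(x.data(), y.data(), z.data(), a.data(), n); }, repeats);
    r.sseMs = TimeMs([&] { LengthsSSE(x.data(), y.data(), z.data(), b.data(), n); }, repeats);

    r.maxAbsDiff = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
    {
        float d = std::fabs(a[i] - b[i]);
        if (d > r.maxAbsDiff)
            r.maxAbsDiff = d;
    }
    return r;
}
```

Calling RunBenchmark (400000, 100) and printing scalarMs and sseMs gives a comparison roughly like the one described above. It is also worth trying a size that fits in cache, since an optimizing compiler may vectorize the scalar loop on its own and narrow the gap.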
