Mobile Developer: Found some good optimization material that should be useful for games

Tuesday, January 12, 2016

Instead of the usual scalar dist = sqrt(x*x + y*y + z*z) computation, here is an SSE version that handles four floats per iteration:

#include <xmmintrin.h> // SSE intrinsics (__m128, _mm_*_ps)

// Computes pResult[i] = sqrt(pArray1[i]^2 + pArray2[i]^2 + pArray3[i]^2).
// All four arrays must be 16-byte aligned; elements past a multiple of 4 are skipped.
void CSSETestDlg(
    float* pArray1, // [in] source array 1
    float* pArray2, // [in] source array 2
    float* pArray3, // [in] source array 3
    float* pResult, // [out] array that receives the results
    int nSize)      // [in] number of elements in each array
{
    int nLoop = nSize / 4; // four floats per __m128

    __m128 m1, m2, m3, m4;

    __m128* pSrc1 = (__m128*) pArray1;
    __m128* pSrc2 = (__m128*) pArray2;
    __m128* pSrc3 = (__m128*) pArray3;
    __m128* pDest = (__m128*) pResult;

    __m128 m0_5 = _mm_set_ps1(0.5f); // m0_5[0, 1, 2, 3] = 0.5 (not used below)

    for (int i = 0; i < nLoop; i++)
    {
        m1 = _mm_mul_ps(*pSrc1, *pSrc1); // m1 = *pSrc1 * *pSrc1
        m2 = _mm_mul_ps(*pSrc2, *pSrc2); // m2 = *pSrc2 * *pSrc2
        m3 = _mm_mul_ps(*pSrc3, *pSrc3); // m3 = *pSrc3 * *pSrc3
        m4 = _mm_add_ps(m1, m2);
        m4 = _mm_add_ps(m4, m3);         // m4 = m1 + m2 + m3
        *pDest = _mm_sqrt_ps(m4);        // *pDest = sqrt(m4)

        pSrc1++;
        pSrc2++;
        pSrc3++;
        pDest++;
    }
}
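The listing casts the float pointers straight to __m128 pointers, which only works if every array is 16-byte aligned, and it ignores any elements left over when the size is not a multiple of 4. As a minimal sketch of one way around both limits, the variant below uses unaligned loads and stores and finishes the tail with scalar code. The name VectorLengthSSE and its signature are hypothetical and not part of the original post.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <xmmintrin.h> // SSE intrinsics

// Hypothetical helper (not from the original post):
// out[i] = sqrt(x[i]^2 + y[i]^2 + z[i]^2) for any n and any alignment.
void VectorLengthSSE(const float* x, const float* y, const float* z,
                     float* out, std::size_t n)
{
    assert(n == 0 || (x && y && z && out));

    std::size_t i = 0;

    // Main loop: four floats per iteration, unaligned loads and stores.
    for (; i + 4 <= n; i += 4)
    {
        __m128 vx = _mm_loadu_ps(x + i);
        __m128 vy = _mm_loadu_ps(y + i);
        __m128 vz = _mm_loadu_ps(z + i);

        __m128 sum = _mm_add_ps(_mm_mul_ps(vx, vx), _mm_mul_ps(vy, vy));
        sum = _mm_add_ps(sum, _mm_mul_ps(vz, vz));

        _mm_storeu_ps(out + i, _mm_sqrt_ps(sum));
    }

    // Scalar tail: the last n % 4 elements.
    for (; i < n; ++i)
    {
        out[i] = std::sqrt(x[i] * x[i] + y[i] * y[i] + z[i] * z[i]);
    }
}
```

If the arrays are known to be 16-byte aligned, the aligned _mm_load_ps and _mm_store_ps can be used instead; on older CPUs the unaligned forms may cost extra time.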

Reply:
PS: it does not seem to reduce precision.
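For context on the precision remark: _mm_sqrt_ps is a full-precision single-precision square root, so no accuracy loss relative to a float sqrt is expected from it. Precision only becomes a trade-off if one switches to the approximate reciprocal square root, _mm_rsqrt_ps. The sketch below contrasts the two; SqrtExact and SqrtApprox are hypothetical names, and the single Newton-Raphson refinement step is an assumption about how such an approximation is typically tightened, not something the post describes.

```cpp
#include <xmmintrin.h> // SSE intrinsics

// Full-precision square root of four floats.
inline __m128 SqrtExact(__m128 v)
{
    return _mm_sqrt_ps(v);
}

// Approximate square root: v * (1 / sqrt(v)), where the reciprocal square
// root starts from the low-precision _mm_rsqrt_ps estimate and is refined
// by one Newton-Raphson step: r = 0.5 * r * (3 - v * r * r).
// Caveat: for v == 0 the estimate is infinite and the result is NaN.
inline __m128 SqrtApprox(__m128 v)
{
    const __m128 half  = _mm_set1_ps(0.5f);
    const __m128 three = _mm_set1_ps(3.0f);

    __m128 r   = _mm_rsqrt_ps(v);                 // rough 1 / sqrt(v)
    __m128 vrr = _mm_mul_ps(_mm_mul_ps(v, r), r); // v * r * r
    r = _mm_mul_ps(_mm_mul_ps(half, r), _mm_sub_ps(three, vrr));

    return _mm_mul_ps(v, r);                      // v / sqrt(v) = sqrt(v)
}
```

Whether the approximate form is actually faster depends on the CPU, so it is worth measuring before adopting it.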
Reply:
isline (edge clear)

Oh, I finally get to see your work.
Reply:
It is not double precision.
Reply:
I compiled the code above and compared it with plain C code that computes sqrt(*pSrc1 * *pSrc1 + *pSrc2 * *pSrc2 + *pSrc3 * *pSrc3).
The SSE version turned out to be slower. How can that be?
Each array is 400K in size. I tried repeatedly, with both the VC compiler and the Intel C++ Compiler and with all optimizations turned on, and the result was always the same. I could hardly believe it.
Please advise.
Reply:
Also, I am using a Celeron 2.6, VS2003, and Intel C++ 9.1.
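The thread does not say how the two versions were timed. As a rough sketch of how such a comparison could be set up, the harness below runs a scalar loop and an SSE loop over the same data, averages the time over several repeats, and checks that both produce the same values. All names here (LengthScalar, LengthSSE, TimeMs, RunBench, BenchResult) are hypothetical. One possible explanation for the reported slowdown, not confirmed anywhere in the thread, is that with arrays this large the loop is limited by memory bandwidth rather than arithmetic, in which case SIMD has little to gain.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>
#include <xmmintrin.h> // SSE intrinsics

// Plain scalar reference.
void LengthScalar(const float* x, const float* y, const float* z,
                  float* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = std::sqrt(x[i] * x[i] + y[i] * y[i] + z[i] * z[i]);
}

// SSE version with unaligned loads and a scalar tail.
void LengthSSE(const float* x, const float* y, const float* z,
               float* out, std::size_t n)
{
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
    {
        __m128 vx = _mm_loadu_ps(x + i);
        __m128 vy = _mm_loadu_ps(y + i);
        __m128 vz = _mm_loadu_ps(z + i);
        __m128 s = _mm_add_ps(_mm_mul_ps(vx, vx), _mm_mul_ps(vy, vy));
        s = _mm_add_ps(s, _mm_mul_ps(vz, vz));
        _mm_storeu_ps(out + i, _mm_sqrt_ps(s));
    }
    for (; i < n; ++i)
        out[i] = std::sqrt(x[i] * x[i] + y[i] * y[i] + z[i] * z[i]);
}

// Average wall-clock milliseconds per call of fn over `repeats` runs.
template <typename Fn>
double TimeMs(Fn fn, int repeats)
{
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) fn();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count() / repeats;
}

struct BenchResult
{
    double scalarMs; // average time of the scalar loop
    double sseMs;    // average time of the SSE loop
    float maxDiff;   // largest absolute difference between the two outputs
};

BenchResult RunBench(std::size_t n, int repeats)
{
    std::vector<float> x(n), y(n), z(n), a(n), b(n);
    for (std::size_t i = 0; i < n; ++i)
    {
        x[i] = 0.001f * static_cast<float>(i);
        y[i] = 0.002f * static_cast<float>(i);
        z[i] = 0.003f * static_cast<float>(i);
    }

    BenchResult r;
    r.scalarMs = TimeMs([&] { LengthScalar(x.data(), y.data(), z.data(), a.data(), n); }, repeats);
    r.sseMs    = TimeMs([&] { LengthSSE(x.data(), y.data(), z.data(), b.data(), n); }, repeats);

    r.maxDiff = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
    {
        float d = std::fabs(a[i] - b[i]);
        if (d > r.maxDiff) r.maxDiff = d;
    }
    return r;
}
```

Running RunBench with a small size that fits in cache and again with a size matching the 400K arrays would show whether the gap depends on the working set. An optimizing compiler may also vectorize or simplify the scalar loop on its own, so inspecting the generated assembly is a sensible cross-check.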
Reply:
Can anyone help answer this?
