Tuesday, January 12, 2016

Help with a real parallel optimization problem

During an FDTD calculation, the simulation software saves some data in real time. I read the data out, apply Fourier-transform processing to it, and then delete the data files.
The core processing is as follows:

for (fi = 0; fi < 6; fi++)
{
    fp[fi] = fopen(fnc[fi], "r");
    if (fp[fi] == NULL)
    {
        printf("Error open fnc!\n");
        break;
    }
    else
    {
        for (j = 1; j <= jmax[fi]; j++)
            for (i = 1; i <= imax[fi]; i++)
            {
                fscanf(fp[fi], "%lg %lg %lg %lg %lg %lg %lg %lg %lg", &Ex, &Ey, &Ez, &Hx, &Hy, &Hz, &Jx, &Jy, &Jz);
                omp_set_num_threads(2);
                #pragma omp parallel for schedule(static) private(k)
                for (k = 0; k <= freqnum; k++)
                {
                    Exf[fi][j][i][k] += Ex * efac[k];
                    Eyf[fi][j][i][k] += Ey * efac[k];
                    Ezf[fi][j][i][k] += Ez * efac[k];
                    Hxf[fi][j][i][k] += Hx * efac[k];
                    Hyf[fi][j][i][k] += Hy * efac[k];
                    Hzf[fi][j][i][k] += Hz * efac[k];
                }
            }
        fclose(fp[fi]);

        if (fi >= 5)
        {
            for (i = 0; i <= 5; i++) remove(fnc[i]);
            Dealed++;
        }
    }
}

But after adding the OpenMP parallelization, the program runs slower than it did without it. Could you help me see where the problem is?


Reply:
I do not know what FDTD is, but from a quick look at your code, I suspect the problem lies in these places:

1. Is your machine dual-core or dual-CPU? If not, running two threads may just add context-switching overhead.

2. The biggest bottleneck in your program should be reading the files, so that is where you should optimize. For example, have one thread read a file, then have another thread convert and process the data from that file while the first thread goes on to read the next file.
Reply:
Do not give each thread only a tiny amount of work; that just increases the number of synchronization operations.
I think you should consider parallelizing the outermost loop instead.
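To make the point above concrete, here is a minimal sketch, not the original poster's program. It assumes a whole file fits in memory, reads it sequentially first, and then opens a single parallel region over the grid points, so each thread gets many points of work instead of a share of one short k loop. The names accumulate_ex and npoints and the flattened Exf layout are hypothetical, only Ex is accumulated to keep it short, and efac is treated as a plain double array as in the posted code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read npoints records of nine values from fp,
   keep the first value of each record (Ex), and add Ex * efac[k] into
   Exf[p * nfreq + k]. Returns 0 on success, -1 on failure. */
int accumulate_ex(FILE *fp, int npoints, int nfreq,
                  const double *efac, double *Exf)
{
    assert(npoints > 0 && nfreq > 0);

    double *ex = malloc((size_t)npoints * sizeof *ex);
    if (ex == NULL)
        return -1;

    /* Phase 1: sequential read, no threads involved. */
    for (int p = 0; p < npoints; p++) {
        double v[9];
        for (int c = 0; c < 9; c++) {
            if (fscanf(fp, "%lg", &v[c]) != 1) {
                free(ex);
                return -1;
            }
        }
        ex[p] = v[0];
    }

    /* Phase 2: one parallel region for the whole file. Each thread
       owns a contiguous range of points, so no two threads write
       the same element of Exf. */
    #pragma omp parallel for schedule(static)
    for (int p = 0; p < npoints; p++) {
        for (int k = 0; k < nfreq; k++)
            Exf[(size_t)p * nfreq + k] += ex[p] * efac[k];
    }

    free(ex);
    return 0;
}
```

In the posted code the parallel region is entered once per grid point, and omp_set_num_threads is called each time as well. With this layout the region is entered once per file. Compiled without OpenMP support, the pragma is ignored and the function simply runs serially.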

Reply:
Why doesn't the original poster use VTune to see where the bottleneck is? Sampling and Counter Monitor should meet your needs.
Disk access is generally buffered, so if the amount of data is not large, the bottleneck is not necessarily in fscanf. The size of freqnum should affect the performance of the code; if freqnum is large, the bottleneck may well be there. In short, it is still better to analyze this with VTune.
Also, why use schedule(static)? The guided policy is the more common general-purpose choice.
Reply:
omp_set_num_threads(2);

Seeing this line, I suspect the original poster is not running on a dual-CPU or dual-core machine.
As for static, I guess the reason is that the parallel region is very simple, so the computation is unlikely to be badly unbalanced.
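One way to settle the static versus guided question by measurement rather than by guessing is to write the loop with schedule(runtime), which is a standard OpenMP clause that takes the policy from the OMP_SCHEDULE environment variable. The sketch below is only an illustration: the function name accumulate_runtime and the flattened array layout are hypothetical and not taken from the posted program.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: add ex[p] * efac[k] into Exf[p * nfreq + k]
   for every grid point p and frequency k. The scheduling policy is
   chosen at run time through OMP_SCHEDULE. */
void accumulate_runtime(double *Exf, const double *ex,
                        const double *efac, int npoints, int nfreq)
{
    assert(npoints > 0 && nfreq > 0);

    #pragma omp parallel for schedule(runtime)
    for (int p = 0; p < npoints; p++) {
        for (int k = 0; k < nfreq; k++)
            Exf[(size_t)p * nfreq + k] += ex[p] * efac[k];
    }
}
```

The same binary can then be timed with OMP_SCHEDULE set to static and again with it set to guided. Since every iteration here does the same amount of work, the two policies would be expected to perform similarly, which is consistent with the guess above.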
Reply:
I am running on 2 × P4 Xeon, four cores in total, but I can use 2 to 3 of them for this processing, with the rest running the FDTD calculation.

I use a RAMDisk as the cache for the files being read, which eliminates most of the hard disk bottleneck.

m2213231 is quite right: freqnum is the real bottleneck. Its value generally varies between 30 and 200, and I would like it to be as large as possible.

I added static more or less at random; in fact I still do not understand how schedule is used. But in any case it should not be slower than a single process, should it?

t_xz says not to parallelize the small loop. I do not know how much the synchronization costs. Only two loops here can be split up: one is k, the other is fi. The i and j loops cannot be parallelized because the file has to be read sequentially. If the large number of synchronizations has a big impact, it seems I will have to read each file separately.

I have only recently started programming under Linux. I will go and look at how to use VTune.

Variables such as i, j, fi, and freqnum do not change inside the parallel loop, so I should not need to do anything about them, right?
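The idea of splitting over fi and reading each file separately might look like the sketch below. It assumes every file holds the same number of points (the posted code uses imax[fi] and jmax[fi], which may differ per file) and that the files are already open. Only Ex is accumulated, and the names accumulate_one_file, accumulate_files, and npoints are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Read npoints records of nine values from one file and add
   Ex * efac[k] into out[p * nfreq + k]. Returns 0 on success. */
static int accumulate_one_file(FILE *fp, int npoints, int nfreq,
                               const double *efac, double *out)
{
    for (int p = 0; p < npoints; p++) {
        double v[9];
        for (int c = 0; c < 9; c++) {
            if (fscanf(fp, "%lg", &v[c]) != 1)
                return -1;
        }
        for (int k = 0; k < nfreq; k++)
            out[(size_t)p * nfreq + k] += v[0] * efac[k];
    }
    return 0;
}

/* Process nfiles files in parallel, one whole file per loop
   iteration. Exf is laid out as [file][point][frequency].
   Returns the number of files that could not be read completely. */
int accumulate_files(FILE *fps[], int nfiles, int npoints, int nfreq,
                     const double *efac, double *Exf)
{
    assert(nfiles > 0 && npoints > 0 && nfreq > 0);

    int failed = 0;
    #pragma omp parallel for schedule(dynamic) reduction(+:failed)
    for (int fi = 0; fi < nfiles; fi++) {
        /* Each iteration touches only its own file and its own
           slice of Exf, so no locking is needed. */
        double *out = Exf + (size_t)fi * npoints * nfreq;
        if (accumulate_one_file(fps[fi], npoints, nfreq, efac, out) != 0)
            failed += 1;
    }
    return failed;
}
```

With six files and two or three threads, each thread reads and processes whole files on its own, so the only synchronization is at the end of the loop. On the variables question: in a parallel for, the loop variable is private automatically, and read-only values such as efac or the loop bounds can stay shared.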
