Wednesday, February 3, 2016

Worth considering ~~

I heard that Microsoft has a new technology: WCCS (Microsoft's high-performance computing platform). CCS is Microsoft's first software for running parallel high-performance computing (HPC) applications. I do not know whether this tool would make developing parallel programs any more convenient for us.
In addition, I would like to ask: in traditional .NET multithreaded programming, are the different ways of writing a parallel program (creating multiple threads yourself, using the thread pool, asynchronous calls, and so on) equivalent to each other, or are there differences among them? Let me give an example. Suppose a program can support 100-way parallel execution, but I only have a four-core CPU, which can run at most four operations truly in parallel. Of course, by the parallel folding principle, we can still run the program on these four cores. But if we create 100 threads to execute it, that is very wasteful, because four threads are enough to get the maximum performance out of the machine, and a lot of time goes into the overhead of creating threads. Is there a flexible way to adjust the number of threads to the underlying hardware resources (for example, open eight threads on an 8-core machine)?
Reply:
Yes, there is. Query how many CPUs there are (the kernel seems to have a function for this), and then open that many threads.
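For C#, a minimal sketch of this idea might look like the following. It assumes the 100-way work can be split into independent items; `DoWork`, `RunAll`, and the class name are hypothetical stand-ins, not anything from the original program. `Environment.ProcessorCount` reports the number of logical processors available to the process.

```csharp
using System;
using System.Threading;

public static class CoreSizedWorkers
{
    // Hypothetical stand-in for one of the 100 independent work items.
    static long DoWork(int item)
    {
        long sum = 0;
        for (int i = 0; i < 1000; i++) sum += (long)item * i;
        return sum;
    }

    // Runs itemCount work items on workerCount threads and returns the combined result.
    public static long RunAll(int itemCount, int workerCount)
    {
        long[] partial = new long[workerCount];   // one slot per thread, so no slot is shared
        Thread[] threads = new Thread[workerCount];

        for (int w = 0; w < workerCount; w++)
        {
            int worker = w;                        // per-iteration copy for the closure
            threads[w] = new Thread(() =>
            {
                // Worker k handles items k, k + workerCount, k + 2 * workerCount, ...
                for (int item = worker; item < itemCount; item += workerCount)
                    partial[worker] += DoWork(item);
            });
            threads[w].Start();
        }

        long total = 0;
        for (int w = 0; w < workerCount; w++)
        {
            threads[w].Join();                     // wait for the thread, then read its slot
            total += partial[w];
        }
        return total;
    }

    // Sizes the worker count to the machine instead of hard-coding 100 threads.
    public static long RunOnAllCores(int itemCount)
    {
        return RunAll(itemCount, Environment.ProcessorCount);
    }
}
```

Calling `CoreSizedWorkers.RunOnAllCores(100)` would then use four threads on a four-core machine and eight on an 8-core one, with each thread working through its share of the 100 items.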
Reply:
OpenMP opens a different number of threads depending on the CPU.
But multithreading is not only used for computing on a multi-core CPU; it is also used to improve responsiveness. In those areas, opening 100 threads at once may not be wasteful.
Reply:
In addition, while using the .NET thread pool recently, I ran into a very strange problem:
I queue a lot of work to thread pool threads, but sometimes one thread (every time the problem occurs it is always exactly one, never several) does not run to completion. These threads all access the same memory area, yet only one thread fails to finish, so it is not caused by a deadlock (with a deadlock, at least two threads would fail to finish). Can anyone guess what causes this? Also, is there a way to find out which threads in the thread pool have not finished? And is there a more intuitive way to debug threads in VS2005, such as seeing where each thread is currently executing?


Source attached, by the way:

private int completeThreadNum;

public void Preprocessing(IPositionSet positionSet)
{
    completeThreadNum = 0;

    positionSet.InitToTraverseSet();
    while (positionSet.NextPosition())
    {
        ThreadPool.QueueUserWorkItem(new WaitCallback(ThreadProc), positionSet.GetPosition());
        // Insert(positionSet.GetPosition());
    }

    // Poll until every queued work item has reported completion.
    while (completeThreadNum < positionSet.GetNum())
    {
        Thread.CurrentThread.Join(1);
    }
}

private void ThreadProc(object point)
{
    Insert((IPosition)point);
    completeThreadNum++;
}

public void Insert(IPosition point)
{
    IPosition NewPosition = point;

    for (int levelSequrence = LevelList.Count - 2; levelSequrence >= -1; levelSequrence--)
    {
        Part currentPart = LevelList[levelSequrence + 1].GOCPartRefByPoint(point);

        currentPart.AddToSubPositionList(NewPosition);

        if (currentPart.GetSubPositionNum() == 1)
        {
            // If the part contains only one point, the part is new, so the larger block it belongs to must also include it.
            NewPosition = currentPart;
        }
        else
        {
            break;
        }
    }

    // Update the point count of the part that contains this point at each level.
    for (int levelSequrence = LevelList.Count - 2; levelSequrence >= -1; levelSequrence--)
    {
        ((Part)LevelList[levelSequrence + 1].GetPartRefByPoint(point)).SubPointNumIncrease(1);
    }
}


I use completeThreadNum to know how many threads have not finished executing.
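One possible cause, offered as a guess rather than a diagnosis: `completeThreadNum++` is a read-modify-write and is not atomic, so two pool threads finishing at the same moment can lose an increment. The counter then stays one short and the waiting loop never exits, which would look exactly like a single thread that never completed. The sketch below shows the counting part with `Interlocked.Increment`; the class and method names are hypothetical, and the real work is left as a comment. `Insert` in the post also modifies shared `Part` objects from several threads, which may need its own synchronization and is not addressed here.

```csharp
using System;
using System.Threading;

public static class CompletionCounter
{
    static int completed;   // plays the role of completeThreadNum

    static void Worker(object state)
    {
        // ... the real work (Insert in the post) would go here ...

        // Atomic increment: concurrent finishers cannot overwrite each other's update.
        Interlocked.Increment(ref completed);
    }

    // Queues workItems callbacks on the thread pool, waits for all of them,
    // and returns the final value of the counter.
    public static int RunAndCount(int workItems)
    {
        completed = 0;
        for (int i = 0; i < workItems; i++)
        {
            ThreadPool.QueueUserWorkItem(new WaitCallback(Worker), i);
        }

        // CompareExchange(ref x, 0, 0) leaves x unchanged and returns its current value
        // with a full memory barrier, so the loop always sees fresh data.
        while (Interlocked.CompareExchange(ref completed, 0, 0) < workItems)
        {
            Thread.Sleep(1);
        }
        return completed;
    }
}
```

With the atomic increment, `CompletionCounter.RunAndCount(1000)` should always come back with 1000. Replacing the increment with a plain `completed++` is a quick way to check whether lost updates are what makes the original loop hang.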
Reply:
How many threads to start is not decided only by the number of CPUs. More important is what your application does: if it performs a lot of I/O and you open only 4 threads on a 4-core machine, the CPU will sit idle much of the time.
Reply:
Does reading memory count as I/O? By the way, can several locations in memory be read at the same time?
Reply:
How many threads to open depends on the specific application. If your application is compute-intensive and many of its calculations can be performed in parallel (for example, a scientific computing program written with OpenMP), the number of threads is generally the number of CPUs + 1 or + 2.
If the program is I/O-intensive, such as a network server, database server, or file server, you usually open more threads.

To get the most out of the machine, we need to use performance tests to decide on the number of threads.
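Such a test could be sketched in C# roughly as follows. It times one CPU-bound workload under several thread counts with `Stopwatch`. The workload (`SumRange`), the candidate counts, and all the names are assumptions chosen for illustration; a real test should run the application's own work.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class ThreadCountBenchmark
{
    // Sums the integers in [start, end); a stand-in for any CPU-bound task.
    static long SumRange(long start, long end)
    {
        long sum = 0;
        for (long i = start; i < end; i++) sum += i;
        return sum;
    }

    // Splits [0, n) across threadCount threads and returns the elapsed milliseconds.
    public static long Measure(int threadCount, long n, out long result)
    {
        long[] partial = new long[threadCount];
        Thread[] threads = new Thread[threadCount];
        long chunk = n / threadCount;

        Stopwatch watch = Stopwatch.StartNew();
        for (int t = 0; t < threadCount; t++)
        {
            int index = t;
            long start = index * chunk;
            // The last thread also takes the remainder of the range.
            long end = (index == threadCount - 1) ? n : start + chunk;
            threads[t] = new Thread(() => { partial[index] = SumRange(start, end); });
            threads[t].Start();
        }
        foreach (Thread thread in threads) thread.Join();
        watch.Stop();

        result = 0;
        foreach (long p in partial) result += p;
        return watch.ElapsedMilliseconds;
    }

    // Prints one timing per candidate thread count.
    public static void Report(long n)
    {
        int cores = Environment.ProcessorCount;
        int[] candidates = { 1, cores, cores + 2, cores * 4 };
        foreach (int count in candidates)
        {
            long result;
            long ms = Measure(count, n, out result);
            Console.WriteLine(count + " threads: " + ms + " ms (sum = " + result + ")");
        }
    }
}
```

Something like `ThreadCountBenchmark.Report(2000000000L)` gives one line per thread count. The timings vary between runs and machines, so it is worth repeating the test several times before settling on a number.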
Reply:
But for .NET, is there a good way to do performance testing?
Reply:
.NET itself has many performance counters that reflect the performance of a program.
Reply:
By the way, I would like to ask whether memory allocation can be processed in parallel. Suppose I am on a multi-core platform, say 4 cores, and I create many objects of one class in a loop (to give a sense of scale, suppose 100,000 objects occupying 200 MB of memory), and for every creation (or every n creations) I open a separate thread, so that multiple threads share the bulk of the object-creation work. In this case, is there a significant performance gain compared with a single core?

My own analysis is this: creating an object first requires finding available space in memory, which takes CPU computation, so a multi-core architecture should improve performance through parallelism. But at the same time I think memory allocation is an I/O-intensive process that requires a lot of reading and writing of memory. Can memory be read and written in parallel? Or is memory read/write speed not the bottleneck (because CPU processing is much slower than memory reads and writes)? Or does the writing complete in the cache, so it is fast? (This seems unlikely, because allocating large chunks of memory touches new memory rather than repeatedly reading and writing the same memory, which should lead to frequent cache misses.) Also, it seems that architectures can now use MDA to assist with memory writes, which frees the CPU to some extent, so is the CPU no longer the bottleneck for reading and writing memory?

I would like a friend who has a multi-core machine to run some experiments and see whether multiple cores bring a substantial performance gain when allocating a lot of memory. If so, how many times faster can it get? Can the speedup come close to the number of cores?
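A rough sketch of such an experiment in C# might look like this. The `Item` class with its 2 KB payload is a made-up stand-in sized so that 100,000 objects come to about 200 MB, and all the names are hypothetical. Each thread fills its own slice of the array, so no two threads write the same slot.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class AllocationExperiment
{
    // Hypothetical payload: about 2 KB per object.
    sealed class Item
    {
        public readonly byte[] Data = new byte[2048];
    }

    // Creates count objects using threadCount threads and returns the elapsed milliseconds.
    // created receives the number of array slots that were actually filled.
    public static long Run(int count, int threadCount, out int created)
    {
        Item[] items = new Item[count];
        Thread[] threads = new Thread[threadCount];
        int chunk = count / threadCount;

        Stopwatch watch = Stopwatch.StartNew();
        for (int t = 0; t < threadCount; t++)
        {
            int start = t * chunk;
            // The last thread also takes the remainder.
            int end = (t == threadCount - 1) ? count : start + chunk;
            threads[t] = new Thread(() =>
            {
                for (int i = start; i < end; i++) items[i] = new Item();
            });
            threads[t].Start();
        }
        foreach (Thread thread in threads) thread.Join();
        watch.Stop();

        created = 0;
        foreach (Item item in items)
        {
            if (item != null) created++;
        }
        return watch.ElapsedMilliseconds;
    }
}
```

Comparing `Run(100000, 1, out created)` with `Run(100000, Environment.ProcessorCount, out created)` over several repetitions would give a first impression of the speedup. The numbers depend heavily on the runtime version and garbage collector settings, so they should be read as a rough indication only.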
Reply:
At the operating system level, memory allocation can be done in parallel, since each CPU has its own virtual space. However, the malloc implementation is not designed for parallel use, so the Hoard memory allocator is recommended.
For specific tests, see http://www.cs.umass.edu/~emery/heaplayers.html
However, on consumer-grade Intel multi-core systems like this, memory access is indeed a bottleneck because of bus bandwidth.
DMA (not MDA) is a design that lets devices communicate without going through the CPU.
Reply:
I use C#. Also, on what kind of machine can memory accesses be executed concurrently? I feel very strongly that parallel memory access is critical to really achieving parallel computing, because no computing task is pure calculation without storing its results somewhere.
Reply:
If you read and write different memory, the accesses can run in parallel.
Reply:
Then you would have to abandon C# and the .NET Framework ...... that thing uses the CPU even less efficiently ......
Reply:
As for memory allocation, the current allocation mechanism may not support parallelism at the underlying level. However, I have not run a specific experiment.

When the same memory address is read in parallel at the same instant, the hardware resolves the conflict, but in software you need to handle synchronization yourself. In fact, even simultaneous parallel reads are limited by memory bandwidth. Therefore, calculations that demand a lot of memory bandwidth need to make good use of the cache.
Reply:
I just wanted to be able to read and write different memory addresses in parallel. Now that I think about it, it turns out to be very simple to do: you can create a series of objects and assign their initial values in parallel!
