2014/01/15

I see threads everywhere...

A very long time ago, when I was a cluster scheduler developer, I set my record for concurrent threads. In a stress test of the system, the scheduler had up to 480 threads consuming about 1% of the CPU. That was good, and it was the goal, because a scheduler should not be doing anything other than planning (at different levels of abstraction, managing restrictions and complaints). The code was full of semaphores, events and signals. The tests passed without deadlocks. Since then, that has been my record for number of threads.

I use threading very often in my code. In an object-oriented design many threads can be doing different things with those objects, and it is fun when many of them interact and the application works as expected with good performance. I've played with OpenMP and MPI, but I had never beaten my record of concurrent threads, until now.

I'm following a course on Coursera about heterogeneous parallel programming with CUDA, and it has smashed the record into very, very small pieces. In one of the exercises, a matrix multiplication, the data set is A_{200,100}*B_{100,256}, where the result is C_{200,256}. That is 200*256 = 51200 cells, with one thread each to do the underlying basic operations.

With CUDA blocks and grids the data often doesn't fit exactly, so you overshoot the matrix a bit. In my case I used 16*16 blocks on a 17*12 grid of blocks. This raises my record of parallel threads to 52224 threads. More than a hundred times the old one...
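A quick sketch of the arithmetic behind those numbers, in plain Python rather than CUDA so it can be run anywhere. The variable names are only illustrative. The last part uses the common ceiling-division rule for sizing a grid, which is an assumption and not necessarily how the grid above was chosen; for a 200*256 result it gives a 16*13 grid rather than 17*12, so take it as an illustration of the overshoot only.

```python
from math import ceil

BLOCK_X, BLOCK_Y = 16, 16      # threads per block, as in the post
ROWS, COLS = 200, 256          # shape of the result matrix C

# One useful thread per output cell.
cells = ROWS * COLS

# Threads launched with the grid reported in the post (17 * 12 blocks).
post_threads = 17 * 12 * BLOCK_X * BLOCK_Y

# Assumption: ceiling-division sizing, rounding each dimension up
# to a whole number of blocks so that every cell is covered.
grid_x = ceil(COLS / BLOCK_X)
grid_y = ceil(ROWS / BLOCK_Y)
ceil_threads = grid_x * grid_y * BLOCK_X * BLOCK_Y

print(cells, post_threads, (grid_x, grid_y), ceil_threads)
```

Whenever the launch is bigger than the matrix, the extra threads fall outside C, which is why a kernel of this kind normally checks its row and column against the matrix bounds before writing anything.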

Even with these numbers, imho there are categories in this thread counting. In this record, the threads were all doing almost the same thing on different data. In my previous record, only a few of them were doing the same thing. That splits the thread count into two categories...

Update 20140202: Did I set my threading record at 52,224? Doing another exercise in this CUDA course, I've broken it again: in a convolution of a 2k*2k colour image (3 channels, with a 5*5 mask), 12,582,912 parallel threads did the job in 19 ms...
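The same kind of back-of-the-envelope check for the convolution, again as a plain Python sketch. It assumes that "2k" means 2048 pixels per side, which is consistent with the reported count, and that there is one thread per pixel per channel.

```python
# Assumption: "2k" means 2048 pixels per side.
WIDTH = HEIGHT = 2048
CHANNELS = 3

# Assumption: one thread per pixel per channel.
conv_threads = WIDTH * HEIGHT * CHANNELS

# Ratios against the two earlier records mentioned in the post.
vs_matrix = conv_threads / 52224
vs_scheduler = conv_threads / 480

print(conv_threads)
print(round(vs_matrix), round(vs_scheduler))
```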