Joe Duffy is building a custom threadpool on his blog, exploring the different trade-offs:
The first threadpool, designed in part 1, is a very simple one that uses a naive algorithm. The second, in part 2, is more interesting: it allows threads to 'steal' work from each other, and it uses a LIFO (Last In, First Out) order to take advantage of recently queued work still being in the memory cache.
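To make the work-stealing idea concrete, here is a minimal Python sketch. It is not Joe Duffy's code: the names `WorkStealingQueue`, `pop_local`, `steal` and `next_item` are made up for illustration, and a single coarse lock stands in for whatever synchronization a tuned implementation would use. It also assumes the common design in which the owning thread pushes and pops at the same end (LIFO) while thieves take from the opposite end.

```python
import threading
from collections import deque


class WorkStealingQueue:
    """Hypothetical per-thread queue: owner works LIFO, thieves take the oldest item."""

    def __init__(self):
        self._items = deque()
        self._lock = threading.Lock()  # coarse lock, chosen only to keep the sketch short

    def push(self, item):
        # The owner adds new work at the right-hand end.
        with self._lock:
            self._items.append(item)

    def pop_local(self):
        # The owner takes the most recently pushed item (LIFO),
        # which is the one most likely to still be warm in the cache.
        with self._lock:
            return self._items.pop() if self._items else None

    def steal(self):
        # Another thread takes the oldest item from the opposite end.
        with self._lock:
            return self._items.popleft() if self._items else None


def next_item(own, others):
    """Try the thread's own queue first, then try to steal from the others.

    None means "no work found", so None itself cannot be queued as an item.
    """
    item = own.pop_local()
    if item is not None:
        return item
    for victim in others:
        item = victim.steal()
        if item is not None:
            return item
    return None
```

With items 1, 2 and 3 pushed onto one queue, `pop_local()` returns 3 and `steal()` returns 1, so the owner and a thief work on opposite ends of the queue.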
Update:
Part 3 was posted some time ago; it integrates the queue from part 2 into the pool from part 1.