Thanks a lot for your help.
We have a case open, so hopefully we'll have more information from them. It's not actually affecting production at the moment, as we're throttling the load, but we'd like to run at full capacity. From what we can see, the capacity is a function of time since reboot.
We've been able to induce this on a test env by running hundreds of procs simultaneously.
[On production we managed to see the issue with ~100 procs]
The one consistent factor is that it always happens at about 80% usage of the procedure cache.
Running your SQL shows the top function of the stack trace to be either upyield or upsleepgeneric.
Not sure what I should be looking for, but I'll have a look through the hundreds of spids.
We'll try TF 7790; I suspect it will help, but mostly by delaying the onset of the problem.
We're going to increase the amount of procedure cache as well.
Thanks again for your help - it's most appreciated.
Mike