I have two bugs I'm very proud of finding.
The first was a weird bug in which a piece of cross-platform proprietary boxed software my company wrote would hang, seemingly at random.
I asked someone to freeze it and get a stack trace the next time it hung. The person still couldn't figure out what was going on, so I looked at it and figured out that it was grabbing a mutex inside a malloc call inside a signal handler that had fired during a malloc call. The interrupted malloc already held the allocator's mutex, so the malloc in the handler waited forever for a lock its own thread owned. The main thread was deadlocking with itself.
I fixed it by changing the signal handler to simply poke a byte into a pipe, so the main process's event loop would wake up and run the real handler code from within the event loop, outside signal context.
The second was a similarly weird bug in a specialized web server running on Linux that was originally chalked up to 'poorly implemented consumer routers'.
The main event loop was using the select system call, and they had also fiddled the ulimit to allow 8192 open file descriptors per process. It turned out that the kernel they were using had a compiled-in limit of 1024 on the select bit-vector size (FD_SETSIZE, which is the standard limit on Linux). So basically any connection assigned a file descriptor of 1024 or higher was going to be ignored.
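To make the failure mode concrete: an fd_set only has room for descriptors 0 through FD_SETSIZE - 1, and passing anything outside that range to FD_SET is undefined behavior. A guard along these lines would at least have made the problem visible. This is only a sketch; fits_in_fd_set and fd_set_checked are hypothetical helpers, not something from the server in question.

```c
#include <stdbool.h>
#include <sys/select.h>

/* True if fd can legally be passed to FD_SET / FD_ISSET. */
static bool fits_in_fd_set(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Add fd to the set only if it is representable. Returns false for
 * descriptors select() cannot watch, so the caller can log or reject
 * the connection instead of silently losing it. */
static bool fd_set_checked(int fd, fd_set *set)
{
    if (!fits_in_fd_set(fd))
        return false;
    FD_SET(fd, set);
    return true;
}
```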
The best solution would've been to use epoll, because poll exhibits poor scaling behavior (it requires a linear scan of the whole descriptor array every time an event happens). But that would've required significant re-architecting of the main event loop, and that was out of scope.
So I told the person to fix it by using poll, which takes an array of pollfd structs of arbitrary size (the size is passed in) and from a design standpoint is basically a drop-in replacement for select. That solved the problem.
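Roughly, the poll version of "wait until something is readable" looks like the sketch below. It is an illustration, not the code that shipped: wait_readable is a made-up helper, and it rebuilds the pollfd array on every call for simplicity, where a real event loop would keep the array around. The point is that each descriptor is stored as a plain int in its own struct, so its numeric value is not capped by FD_SETSIZE.

```c
#include <poll.h>
#include <stdlib.h>

/* Wait up to timeout_ms for any of fds[0..nfds-1] to need attention
 * (readable, hung up, in error, or invalid). Writes those descriptors
 * into ready[], which must have room for nfds entries. Returns how many
 * were written, 0 on timeout, or -1 on error. */
static int wait_readable(const int *fds, size_t nfds, int timeout_ms,
                         int *ready)
{
    struct pollfd *pfds = calloc(nfds ? nfds : 1, sizeof *pfds);
    if (pfds == NULL)
        return -1;

    for (size_t i = 0; i < nfds; i++) {
        pfds[i].fd = fds[i];        /* any descriptor value is fine here */
        pfds[i].events = POLLIN;
    }

    int rc = poll(pfds, (nfds_t)nfds, timeout_ms);
    int count = 0;
    if (rc > 0) {
        /* This is the linear scan that epoll avoids. */
        for (size_t i = 0; i < nfds; i++) {
            if (pfds[i].revents & (POLLIN | POLLHUP | POLLERR | POLLNVAL))
                ready[count++] = pfds[i].fd;
        }
    }

    free(pfds);
    return rc < 0 ? -1 : count;
}
```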