Also, surprisingly, our SPSC queue keeps on par with the JCTools queues, which is not something I was expecting.
It depends on the state of the queue. If the consumer has nothing to do, it will keep polling, invalidating the cache line the producer is about to offer against and slowing it down. In such cases all queues perform (nearly) the same if the bottleneck is the sharing hit. I suggest practicing active benchmarking to find out! Disclaimer: I am one of the JCTools authors.
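To make the empty-queue case concrete, here is a minimal sketch in Java. It assumes a hand-rolled ring buffer (`SpscRing`, a hypothetical name, not a JCTools class and not the queue from the post) and simply counts how often the consumer polls an empty queue. Every one of those empty polls reads the producer's index, which is the sharing hit described above. It is not a benchmark, and the counts will vary from run to run and machine to machine.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReferenceArray;

public class EmptyPollSketch {

    /** Minimal bounded SPSC ring buffer. Hypothetical, for illustration only. */
    static final class SpscRing<E> {
        private final AtomicReferenceArray<E> buffer;
        private final int mask;
        private final AtomicLong head = new AtomicLong(); // written by the consumer
        private final AtomicLong tail = new AtomicLong(); // written by the producer

        SpscRing(int capacityPowerOfTwo) {
            buffer = new AtomicReferenceArray<>(capacityPowerOfTwo);
            mask = capacityPowerOfTwo - 1;
        }

        boolean offer(E e) {
            long t = tail.get();
            if (t - head.get() > mask) {
                return false; // full
            }
            buffer.lazySet((int) (t & mask), e);
            tail.lazySet(t + 1); // publish the element to the consumer
            return true;
        }

        E poll() {
            long h = head.get();
            // On an empty queue this read of the producer's index is all the
            // consumer does, over and over, while the producer keeps writing it.
            if (h == tail.get()) {
                return null;
            }
            int idx = (int) (h & mask);
            E e = buffer.get(idx);
            buffer.lazySet(idx, null);
            head.lazySet(h + 1); // free the slot for the producer
            return e;
        }
    }

    /** Returns {consumed, emptyPolls, sum} after passing 0..items-1 through the queue. */
    static long[] run(int items) throws InterruptedException {
        SpscRing<Integer> queue = new SpscRing<>(1024);
        long[] stats = new long[3];
        Thread consumer = new Thread(() -> {
            long consumed = 0, emptyPolls = 0, sum = 0;
            while (consumed < items) {
                Integer v = queue.poll();
                if (v == null) {
                    emptyPolls++; // consumer had nothing to do and polled anyway
                    Thread.onSpinWait();
                } else {
                    consumed++;
                    sum += v;
                }
            }
            stats[0] = consumed;
            stats[1] = emptyPolls;
            stats[2] = sum;
        });
        consumer.start();
        for (int i = 0; i < items; i++) {
            while (!queue.offer(i)) {
                Thread.onSpinWait(); // queue full, wait for the consumer
            }
        }
        consumer.join(); // join makes the consumer's writes to stats visible
        return stats;
    }

    public static void main(String[] args) throws InterruptedException {
        long[] stats = run(1_000_000);
        System.out.println("consumed=" + stats[0] + " emptyPolls=" + stats[1]);
    }
}
```

If `emptyPolls` is large relative to `consumed`, the consumer is mostly spinning on an empty queue, and a throughput number from that run probably says more about cache-line traffic than about the queue implementation. For actual measurements a harness such as JMH is a better fit than a hand-written loop like this one.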