Elimination can be enhanced by using a combination strategy, such as DECS [1]. Using a combiner approach should also work on a queue structure, though interestingly enough there is a paper on elimination-based fifo queues. I wrote an elimination stack in Java [2], but I haven't gotten around to adding combining to the arena yet. The performance is pretty good compared to other shared concurrent structures.
However, in my specific case I was writing a toy thread pool. This meant that a failed pend could just result in an attempt to pend it to a different worker thread, instead of trying the same stack over again.
[1] http://www.cs.bgu.ac.il/~hendlerd/papers/DECS.pdf [2] https://github.com/ben-manes/caffeine