When I first learned about userfaultfd my initial reaction was - wow, finally! - this is a very useful feature, and I can see some interesting applications of it outside of QEMU/KVM kinds of scenarios.
So, I finally tried it to see if it works for my case. Functionally - yes, it's perfect, and it trivially solves a tricky problem that otherwise would require kernel changes.
However, it seems very slow. A quick benchmark shows that in my test program it takes ~50 microseconds to handle a page fault. That is way longer than what the kernel needs for, say, private anonymous mapping updates - those consistently stay below ~2 microseconds per page fault or so.
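For reference, the anonymous-mapping baseline can be measured with something like the sketch below. It is only an illustration, not the actual test program: the helper names `now_ns` and `anon_fault_ns_per_page` are made up, it is Linux-specific, and it assumes that touching one byte per page of a fresh `MAP_PRIVATE | MAP_ANONYMOUS` mapping triggers one fault per page (hence the `MADV_NOHUGEPAGE` hint, so that transparent huge pages do not merge faults). The number it reports also includes the loop and the store itself, so treat it as a rough upper bound.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Hypothetical helper (not from the original test program).
 * Maps npages of private anonymous memory, writes one byte to every page
 * so that each write takes a first-touch page fault, and returns the
 * average cost in nanoseconds per page. Returns a negative value on error.
 */
static double anon_fault_ns_per_page(size_t npages)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || npages == 0)
        return -1.0;

    size_t len = npages * (size_t)page;
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1.0;

    /* Best effort: ask for 4K pages so every touch is a separate fault.
     * The call may fail on kernels without THP support; that is fine. */
    (void)madvise(mem, len, MADV_NOHUGEPAGE);

    /* volatile keeps the compiler from dropping the stores. */
    volatile unsigned char *p = mem;

    uint64_t start = now_ns();
    for (size_t i = 0; i < npages; i++)
        p[i * (size_t)page] = 1;   /* first touch of page i */
    uint64_t end = now_ns();

    munmap(mem, len);
    return (double)(end - start) / (double)npages;
}
```

Calling `anon_fault_ns_per_page(1 << 16)` from `main` and printing the result gives an average over 65536 first-touch faults, which is the number to compare against the per-fault cost seen with userfaultfd.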
I wonder if I am missing something very obvious here and am somehow not setting it up correctly?