Nick Luparev
It's one of the basic graph transformations, and it trades "memory" for "time": a DFA usually needs more states (so more memory), but each state has exactly one possible move per input symbol, so there are fewer options to explore while matching.
In some cases it's better to keep the NFA, but in compiler design you usually move towards an optimized DFA instead.
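To make the transformation concrete, here is a rough sketch of the subset construction in Python. It assumes an NFA without epsilon moves, and the function names (`nfa_to_dfa`, `dfa_accepts`) and the example automaton are made up for illustration; real lexer generators do more (epsilon closures, minimization).

```python
from collections import deque


def nfa_to_dfa(nfa, start, accepting, alphabet):
    """Subset construction for an NFA without epsilon moves.

    nfa maps (state, symbol) -> set of possible next states.
    Each DFA state is a frozenset of NFA states.
    """
    dfa_start = frozenset([start])
    dfa = {}                # (dfa_state, symbol) -> dfa_state
    dfa_accepting = set()
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        current = queue.popleft()
        # A DFA state accepts if it contains any accepting NFA state.
        if current & accepting:
            dfa_accepting.add(current)
        for symbol in alphabet:
            # Union of all NFA moves from the states in this subset.
            target = frozenset(
                t for s in current for t in nfa.get((s, symbol), ())
            )
            dfa[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return dfa, dfa_start, dfa_accepting


def dfa_accepts(dfa, dfa_start, dfa_accepting, text):
    """Run the DFA: exactly one table lookup per input character."""
    state = dfa_start
    for ch in text:
        state = dfa[(state, ch)]
    return state in dfa_accepting


# Hypothetical example: strings over {a, b} whose second-to-last symbol is 'a'.
example_nfa = {
    (0, "a"): {0, 1},   # the nondeterministic choice: stay, or guess "this is it"
    (0, "b"): {0},
    (1, "a"): {2},
    (1, "b"): {2},
}
dfa, dfa_start, dfa_accepting = nfa_to_dfa(example_nfa, 0, {2}, "ab")
dfa_states = {state for (state, _symbol) in dfa}
```

Here the 3-state NFA becomes a 4-state DFA. In the worst case the number of DFA states grows exponentially with the number of NFA states, which is the "memory" side of the trade.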
I like Basics of Compiler Design by Torben Æ Mogensen, but there are a lot of others :) They don't focus on parsers specifically, but the first chapters, up to the AST and the optimizations, might help you get a better grip.