That's not a great algorithm in your handler. You should have ZERO ifs. No decisions.
Store your AB state, e.g., 00 or 01, then append the next state: 0001 means AB went from 00 to 01, so B changed from 0 to 1. Make this a +1. If you start from 00 and change to 10, call this a -1. Build a 16-element array of all possible transitions, holding the number that must be added to your count when each one occurs, noting that some transitions are illegal and need to be handled.
Transition  Index  Add  Notes
0000        0       0   no transition
0001        1      +1
0010        2      -1
0011        3       0   illegal, two transitions
...and so on.
Index into this array on every transition, watching for illegal events and dealing with them as you see fit. Add the result to the count. Shift the new values into the old-value spot in the index number, and repeat forever.
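Only the first four of the sixteen entries are spelled out here. As a rough sketch, the rest can be derived instead of typed by hand, assuming the encoder steps through the usual Gray-code sequence 00 -> 01 -> 11 -> 10 in the +1 direction (this agrees with 0001 being +1 and 0010 being -1). The names `forward_seq` and `build_transition_table` are invented for this illustration.

```c
#include <stdint.h>

/* Assumed forward (+1) sequence of AB states: 00 -> 01 -> 11 -> 10 -> 00.
   This is an assumption consistent with 00 -> 01 counting as +1. */
static const uint8_t forward_seq[4] = { 0x0, 0x1, 0x3, 0x2 };

/* Fill table[16]; index = (old AB << 2) | new AB, value = count delta. */
void build_transition_table(int8_t table[16])
{
    /* Default 0 covers both "no transition" and the illegal double changes. */
    for (int i = 0; i < 16; i++) {
        table[i] = 0;
    }
    for (int k = 0; k < 4; k++) {
        uint8_t from = forward_seq[k];
        uint8_t to   = forward_seq[(k + 1) % 4];
        table[(from << 2) | to] = +1;   /* one step forward */
        table[(to << 2) | from] = -1;   /* the same edge taken backward */
    }
}
```

Calling `build_transition_table` once at startup leaves +1 at indices 1, 7, 8 and 14, -1 at indices 2, 4, 11 and 13, and 0 everywhere else. The four illegal codes (3, 6, 9, 12) also end up as 0, so they still need separate handling if you care about missed steps.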
In pseudocode:
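signed int8 add_subt[16] = { 0, 1, -1, ....};  // one entry per 4-bit transition code
unsigned int8 idx;
signed int32 pos_count;
main() {
    // initialize idx: current AB state in both the old and the new position
    idx = (readA << 3) | (readB << 2) | (readA << 1) | readB;
    while (1) {}
}
interrupt_on_any_change() {
    // move the previous AB into the old slot, then append the fresh reading;
    // the parentheses matter, since + binds tighter than << and & in C
    idx = ((idx << 2) & 0x0F) | (readA << 1) | readB;
    pos_count = pos_count + add_subt[idx];
}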
You could maintain err_idx to help you flag bad transitions.
Doesn't that look a hair simpler??
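For anyone who wants to try this off-target, here is a minimal C sketch of the same idea with the table written out in full. It assumes the Gray-code order 00 -> 01 -> 11 -> 10 for the +1 direction, and it interprets `err_idx` as a second lookup table marking the four transitions where A and B change together; that interpretation, the `quad_decoder` struct, and the `quad_init`/`quad_update` names are my own assumptions, not something stated in the answer.

```c
#include <stdint.h>

/* Count delta per transition; index = (old AB << 2) | new AB. */
static const int8_t add_subt[16] = {
     0, +1, -1,  0,
    -1,  0,  0, +1,
    +1,  0,  0, -1,
     0, -1, +1,  0
};

/* Hypothetical error table: 1 where both A and B changed at once. */
static const uint8_t err_idx[16] = {
    0, 0, 0, 1,
    0, 0, 1, 0,
    0, 1, 0, 0,
    1, 0, 0, 0
};

typedef struct {
    uint8_t  idx;        /* last 4-bit transition code */
    int32_t  pos_count;  /* accumulated position */
    uint32_t err_count;  /* number of illegal transitions seen */
} quad_decoder;

/* Seed the index with the current AB state in both halves. */
void quad_init(quad_decoder *d, uint8_t a, uint8_t b)
{
    uint8_t ab = (uint8_t)(((a & 1u) << 1) | (b & 1u));
    d->idx = (uint8_t)((ab << 2) | ab);
    d->pos_count = 0;
    d->err_count = 0;
}

/* Call on every pin change: no ifs, just two table lookups. */
void quad_update(quad_decoder *d, uint8_t a, uint8_t b)
{
    d->idx = (uint8_t)(((d->idx << 2) & 0x0F) | ((a & 1u) << 1) | (b & 1u));
    d->pos_count += add_subt[d->idx];
    d->err_count += err_idx[d->idx];
}
```

On real hardware `quad_update` would be the body of the pin-change interrupt, with `a` and `b` read from the port, and the shared counters would need to be declared `volatile` and read atomically from the main loop. Feeding it 01, 11, 10, 00 after starting at 00 counts up by four; a jump straight from 00 to 11 leaves the position alone and bumps `err_count` instead.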
Yes! It's possible for mechanical encoders with detents. It's going to limit you to a fairly low number of steps per revolution because the mechanical tolerances get troublesome.
Consider this CUI part (photo from Digikey):
The detents are such that the encoder outputs are guaranteed to be 'open' at the detent positions, so your pullups will draw no power (the vertical dashed lines indicate possible detent positions).
Best Answer
If you're trying to use 64 encoders as "frob knobs", the more typical approach is to use each encoder for multiple purposes and have some way of selecting which purpose a knob serves at any given moment. Otherwise, I'd urge you to throw a microcontroller at each encoder, or at least use more than one microcontroller, each servicing as many encoders as it can without multiplexing. You're already throwing more money at the problem than is typical, so just keep going down the same path to make the device you want.
Alternatively, you might consider absolute encoders, so you don't need to worry about missed pulses.
That's the best I can offer without knowing more about what you're trying to accomplish (hence, "XY problem", as my comment says).