On AMD, operand size is never forced to 64 bit - instead, it only defaults to 64 bit, which means that 0x66 can be used to encode 16 bit version of the instructions.
Fixed VEX decoding in 32 bit mode - vex.vvvv bit 3 is simply ignored.
Fixed several FMA instructions decoding (L/W flag should be ignored).
Print the 64 bit immediate value in disassembly, instead of the raw immediate (note that the operand always contains the sign-extended, full immediate).
XBEGIN always uses 32/64 bit RIP size (0x66 does not affect its size).
Decode WBINVD even if it's preceded by 0x66/0xF2 prefixes.
Several mnemonic fixes (FXSAVE64, FXRSTOR64, PUSHA/PUSHAD...).
Properly decode VPERMIL2* instructions.
Fixed SSE register decoding when it is encoded in immediate.
Decode SCATTER instructions even though they use the VSIB index as source.
Some disp8 fixes (t1s -> t1s8/t1s16).
SYSCALL/SYSRET are decoded and executed in 32 bit compat modem, even though SDM states they are invalid.
RDPID uses 32/64 bit reg size, never 16.
Various other minor tweaks & fixes.
Re-generated the test files, and added some more, new tests.