On AMD, operand size is never forced to 64 bit - instead, it only defaults to 64 bit, which means that 0x66 can be used to encode 16 bit version of the instructions.
New capability - bddisasm can now be instructed whether to decode some instructions as NOPs are as MPX/CET/CLDEMOTE. This is the case for instructions that are mapped onto the wide NOP space: in that case, an encoding might be NOP if the feature is off, but might be something else (even #UD) if the feature is on.
Added NdDecodeWithContext API - this becomes the base decode API; it received the input information filled in a ND_CONTEXT structure, whih has to be initialized only once, and can be reused across calls. The NdInitContext function must be used to initialize the context, as it ensures backwards compatibility by filling new options with default values.
Improvements to the README file.
Fixed VEX decoding in 32 bit mode - vex.vvvv bit 3 is simply ignored.
Fixed several FMA instructions decoding (L/W flag should be ignored).
Print the 64 bit immediate value in disassembly, instead of the raw immediate (note that the operand always contains the sign-extended, full immediate).
XBEGIN always uses 32/64 bit RIP size (0x66 does not affect its size).
Decode WBINVD even if it's preceded by 0x66/0xF2 prefixes.
Several mnemonic fixes (FXSAVE64, FXRSTOR64, PUSHA/PUSHAD...).
Properly decode VPERMIL2* instructions.
Fixed SSE register decoding when it is encoded in immediate.
Decode SCATTER instructions even though they use the VSIB index as source.
Some disp8 fixes (t1s -> t1s8/t1s16).
SYSCALL/SYSRET are decoded and executed in 32 bit compat modem, even though SDM states they are invalid.
RDPID uses 32/64 bit reg size, never 16.
Various other minor tweaks & fixes.
Re-generated the test files, and added some more, new tests.