From febb6692e097ba5915dd2167ec77829591be4665 Mon Sep 17 00:00:00 2001
From: philsmd <921533+philsmd@users.noreply.github.com>
Date: Fri, 3 Jan 2020 11:41:10 +0100
Subject: [PATCH] fixes #2121: explain the utf16-le / utf16-be limitation in docs/limits.txt

---
 docs/limits.txt | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/docs/limits.txt b/docs/limits.txt
index 884a781be..0ee928080 100644
--- a/docs/limits.txt
+++ b/docs/limits.txt
@@ -22,6 +22,17 @@ Important:
 That does not mean UTF-16 file content, which is fully supported.
 It only means the filename itself.
 
+##
+## Hashing algorithms that internally use UTF-16 characters could in special cases lead to false negatives
+##
+
+The UTF-16 conversion implementation used within the kernel code is very elementary and for performance
+reasons does not respect all complicated encoding rules required to correctly convert, for instance, ASCII
+or UTF-8 to UTF-16LE (or UTF-16BE).
+
+The implementation most likely fails with multi-byte characters, because we basically add a zero byte every
+second byte within the kernel conversion code.
+
 ##
 ## The use of --keep-guessing eventually skips reporting duplicate passwords
 ##
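
Review note: the limitation documented by this patch can be illustrated with a minimal sketch. The Python below is not hashcat's kernel code; `naive_utf16le` and `matches_real_utf16le` are hypothetical names, and the sketch assumes the candidate password reaches the conversion step as UTF-8 bytes. It mimics the "add a zero byte after each input byte" approach and compares the result with a standards-conforming UTF-16LE encoding.

```python
def naive_utf16le(data: bytes) -> bytes:
    """Widen each input byte to a 16-bit little-endian unit by appending 0x00.

    Hypothetical stand-in for the elementary conversion described in the
    patch; it is NOT the actual kernel implementation.
    """
    out = bytearray()
    for b in data:
        out.append(b)     # low byte: the input byte, unchanged
        out.append(0x00)  # high byte: always zero
    return bytes(out)


def matches_real_utf16le(password: str) -> bool:
    """Return True when the naive widening equals a correct UTF-16LE encoding.

    Assumption: the candidate arrives as UTF-8 bytes before conversion.
    """
    naive = naive_utf16le(password.encode("utf-8"))
    correct = password.encode("utf-16-le")
    return naive == correct
```

Under these assumptions, `matches_real_utf16le("hashcat")` returns `True`, because every ASCII character is a single byte whose UTF-16LE form is that byte followed by `0x00`. `matches_real_utf16le("p\u00e4ssword")` returns `False`: the character U+00E4 is the two bytes `C3 A4` in UTF-8, which the naive widening turns into `C3 00 A4 00`, while the correct UTF-16LE encoding is `E4 00`. A hash computed over the naive bytes therefore differs from the target hash, which is the kind of false negative the new section describes.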