1
0
mirror of https://github.com/hashcat/hashcat.git synced 2025-07-03 05:12:41 +00:00

The Assimilation Bridge (comprehensive python plugin documentation)

This commit is contained in:
Jens Steube 2025-06-01 08:40:26 +02:00
parent ed71e41ae1
commit 1a49b6c81e
3 changed files with 335 additions and 86 deletions

View File

@ -0,0 +1,258 @@
# Hashcat Python Plugin Development Guide
This document is a comprehensive guide for writing custom hash modes in Python via Hashcat's Assimilation Bridge plugin.
## 1. Introduction
The The Assimilation Bridge enables developers to implement complete hash mode logic in languages other than C, most notably Python. Traditionally, customizing Hashcat required writing a module in C and a kernel in OpenCL/CUDA. With the bridge, you can now implement a complete hash mode in Python.
The bridge supports two hash modes to run python code:
- `-m 72000`: Uses single-threaded Python interpreter and hashcat controlling the multi-threading.
- `-m 73000`: Uses classic multiprocessing module controlling multi-threaded support.
Having two hash modes is currently a workaround; future Python developments toward a fully GIL-free mode should eventually resolve this. The single-threaded Python route is the way to go, and when Python will be totally GIL-free, we will remove the multiprocessing support completely. For now, we must work around platform-specific behavior (see `hashcat-python-plugin-requirements.md`).
## 2. Requirements
Ideally, start by walking through `hashcat-python-plugin-quickstart.md`, or read `hashcat-python-plugin-requirements.md`.
## 3. Python Bridge basics
Hashcat implements the CPython interface, loading the embedded interpreter via dynamic loading mechanisms (`dlopen()`, `LoadLibrary()`, etc). This enables runtime flexibility, allowing Hashcat to use whatever Python version is installed on the system. Users of the precompiled Hashcat binaries dont need Python headers and just a working Python interpreter. Compatibility is checked at runtime, not compile time. If hashcat detect an invalid python version it will stop and print informative instructions on what to do next.
In general, when using any assimilation bridge "application" (such as the Python bridge), the hash mode determines which bridge plugin is loaded (this is a 1:1 relationship). From there, the bridge decides how to proceed. In the case of the Python bridge, it loads the Python library, sets up the interpreter, and finally selects which Python script to execute. Understanding this flow is essential, especially if you plan to contribute to upstream Hashcat on GitHub or want to register a dedicated hash mode number.
You can decide to use the generic hash mode and only contribute the .py file itself with your implementation, or you copy the bridge code and hardcode the path to you .py implementation in there. The advantage in the generic mode is that it's super simple, but it will run a little slower and you have less control about workload tuning. Also having a dedicated mode allows you to implement a unit test, because you have a dedicated hash mode that you can refer to.
Hashcat includes a top-level `Python/` directory with standard helpers and bridge modules:
The following three files are relevant regardless of whether you plan to create a generic or a dedicated hash mode. These modules do the heavy lifting to interact with Hashcat internals from Python. Basically you should not need to change them, and instead you import them from your implementation:
```text
- hcsp.py: Helper for single-threaded mode. Manages queue handling, function invocation, and context propagation.
- hcmp.py: Extends `hcsp.py` to support multiprocessing. It spawns worker processes and routes password batches via queues.
- hcshared.py: Shared utility functions between SP and MP, for instance some getter function for salt data retrieval.
```
There are two additional files, and they are mainly relevant in case you plan to make use of the generic hash mode. But even if you plan to make a dedicated hash mode have a look into them, most likely they will be a very good template for your non-generic mode
```text
- generic_hash_mp.py
- generic_hash_sp.py
```
We will discuss these two in more detail in the `generic hash mode` section.
## 4. Required Functions in a Python Module
Both `-m 72000` and `-m 73000` follow the same requirements, with the idea that your python code will run in both hash-modes, whatever your user decides to use.
The requirements are to implement the following three functions:
```python
def init(ctx):
def term(ctx):
def kernel_loop(ctx, passwords, salt_id, is_selftest):
```
- `init(ctx)`: Called once during plugin startup. All salts and esalts are copied at this stage. You use it to wire up callbacks to helper modules.
- `term(ctx)`: Called once at shutdown. Use it to clean up resources like file handles or sockets if you use them.
- `kernel_loop(...)`: Main function for processing password batches. This is called many times during cracking.
A typical `init()` might look like this:
```python
def init(ctx):
hcsp.init(ctx, calc_hash, extract_esalts)
```
Here:
- `calc_hash()` is your main implementation that processes one password (with one specific salt). You return the result in the format that hashcat requires. In generic hash mode that would be just the same format as in your hashlist. Instead, if you write your own decoder and encoder in the module, this can also be in binary for better performance.
- `extract_esalts()` is an optional function to deserialize binary esalt blobs. Depends on your hash, if esalts (such as binary blobs for decryption) are required.
- `hcsp.init()` stores these so that `handle_queue()` (described later) can call `calc_hash()` for each password in a batch.
Note the the ctx will hold your salt data from all hashes. Whenever calc_hash() is called, this context is given. If you have multiple hashes with multiple salts, the context will have all of them. The helper module hcsp.init() will deserialze the static salt and the dynamic salt and store the data in your context.
A typical `term()` might look like this:
```python
def term(ctx):
hcsp.term(ctx)
```
This should be used in case you had open files, open networking connection, or similar. We are good citizen!
Here's our main function `kernel_loop()` where we spend almost all our time:
```python
def kernel_loop(ctx,passwords,salt_id,is_selftest):
return hcsp.handle_queue(ctx,passwords,salt_id,is_selftest)
```
Hashcat optimizes performance by sending password candidates in batches. The `passwords` parameter in `kernel_loop()` is a list. Instead of manually looping over them, the helper module will queue them, and call your callback function which you had specified in the `init()` function before. The idea is that whenever your calc_hash() is called, it will always be only about one password and one salt (and optional some binary blobs), and you do not have to deal with queuing, whatever it is threaded or not threaded.
Of course, you can also fully control this yourself:
```python
def calc_hash(ctx, password, salt_id, is_selftest):
# Your custom logic here
return encoded_guess
```
If you want to control all by youself, here's what's important to know:
- salt_id: Basically a index number which tells you about which salt your calculation is about. When you initially receive the context, it will hold all salts at once, and you need to store them in the context. The helper scripts do that for your, but just for you to know, its the salt_id which tells the handle_queue() which salt data to pick before it calls your hash_calc() function.
- is_selftest: Historically hashcat keeps two parallel structures for the the selftest hash and real hash. As such they arrive in the context buffer, and you need to make a decision on that `is_selftest` flag which salt buffer to pick.
## 5. Esalts and Structured Binary Blobs, and fixed Salts
One of the most confusing parts for developers new to hashcat is salt handling. While simple hash modes may work out-of-the-box with default helpers, dealing with salts in real-world formats requires deeper understanding.
For complex formats, you may need a structured binary blob ("esalt") passed from the C plugin to Python. Since only you as the developer know the structures of your hash mode, structures vary. For that reason you can optionally write Python code to unpack it.
### Some C Structure
Let's say you need to transfer a salt value to python. You can specify an exact structure in the module to do so. Or, as in this example, this is how we had designed a generic hash mode:
```c
typedef struct {
u32 hash_buf[16384];
u32 hash_len;
u32 salt_buf[16384];
u32 salt_len;
} generic_io_t;
```
### Unpacking esalts
To access the data, we typically want to unpack it so it's easier to access from python:
```python
def extract_esalts(esalts_buf):
esalts = []
for hash_buf, hash_len, salt_buf, salt_len in struct.iter_unpack("65536s I 65536s I", esalts_buf):
hash_buf = hash_buf[0:hash_len]
salt_buf = salt_buf[0:salt_len]
esalts.append({ "hash_buf": hash_buf, "salt_buf": salt_buf })
return esalts
```
Remember, the extract_esalts() was given as function pointer to hcsp.init(). That's how the helper can include your code from outside the helper code. The esalt format is based on what is defined in the module struct.
### Salts Appear as Binary Blobs using 32 bit datatypes
Hashcat is optimized for performance, especially on GPUs. To improve performance, it mostly works on 32 bit datatypes instead of 8 bit datatypes. In python the helper scripts convert these binary blobs into byte[] objects that are easier to work with. As you can see from the above example: `16384 * 4 = 65536`
### Fixed salt datatypes
In general, in all hashcat hash modes:
- The `salt_t` structure is **fixed and consistent**.
- The esalt (extra salt) is **custom and plugin-specific**.
Since `salt_t` is a fixed structure, the helper mode come with a salt unpacker code and in addition, it provides getter functions:
```python
def get_salt_buf(salt: dict) -> bytes:
def get_salt_buf_pc(salt: dict) -> bytes:
def get_salt_iter(salt: dict) -> int:
def get_salt_iter2(salt: dict) -> int:
def get_salt_sign(salt: dict) -> bytes:
def get_salt_repeats(salt: dict) -> int:
def get_orig_pos(salt: dict) -> int:
def get_digests_cnt(salt: dict) -> int:
def get_digests_done(salt: dict) -> int:
def get_digests_offset(salt: dict) -> int:
def get_scrypt_N(salt: dict) -> int:
def get_scrypt_r(salt: dict) -> int:
```
These go back to the `salt_t` fixed structure you can find in `OpenCL/inc_types.h`. As an example on how to use these, here's a snippet from the `yescrypt`:
```python
settings=hcshared.get_salt_buf(salt)
```
The `salt` variable is one of the parameters from the calc_hash():
```python
def calc_hash(password: bytes, salt: dict) -> str:
```
Note that if fully exhaust the Hashcat keyspace, your function has been called X times Y. X is the number of candidates, and Y is all the salts (except if a salt has been cracked). What's important to realize that within your function, you implement hashing logic only for precisely that situation where you have one password and one salt.
### Merging Salts and Esalts into a Single Object
Finally, after unpacking both salts and esalts from their binary blob form, they are explicitly combined into a single dictionary object to simplify access:
```python
for salt, esalt in zip(salts, esalts):
salt["esalt"] = esalt
```
Initially, salts and esalts are unpacked separately from their respective binary structures. Each salt entry contains standardized fields defined by the fixed `salt_t` structure and each esalt is dynamically structured and plugin-specific. Merging the esalt dictionary into the salt dictionary makes accessing all related data straightforward and intuitive within Python.
## 6. Python generic hash mode `-m 72000` and `-m 73000`
The "generic hash" support in hashcat is using python. The main idea behind "generic" is the goal is to write freely ideal for rapid prototyping.
The most straight-forward way is to edit the following files directly:
- `generic_hash_sp.py` for single-threaded (SP), typically when the user is using `-m 72000`.
- `generic_hash_mp.py` for multiprocessing (MP), typically when the user is using `-m 73000`.
Notes:
- Even though `-m 72000` uses single-threaded Python, the bridge plugin above it manages multiple Python interpreters (one per thread) making it effectively multi-threaded.
- On Windows, if `-m 73000` is selected, it silently falls back to `generic_hash_sp.py` due to limitations with multiprocessing. This behavior is important to understand and you might otherwise wonder why your code changes have no effect.
If you modify one of these plugin files, there's a trade-off: you wont be able to contribute that code directly to the upstream Hashcat repository, since those files are meant to remain clean for demonstration purposes.
To address this, the assimilation bridge provides a generic parameter that users can specify via the command line. In the case of the Python bridge, only the first parameter is used. You can override the Python script to be loaded using `--bridge-parameter1`:
```
$ ./hashcat -m 73000 --bridge-parameter1 myimplementation.py hash.txt wordlist.txt ...
```
This tells the Python bridge plugin to load `myimplementation.py` instead of the default `generic_hash_mp.py`. This approach is especially useful if you plan to contribute `myimplementation.py` to the upstream Hashcat repository. If you choose to stay within the generic mode, your Python code wont have a dedicated hash mode, and you'll need to instruct users to use the `--bridge-parameter1` flag to load your implementation.
### Design Tradeoffs and Format Considerations
In the generic hash mode, we are using a generic binary esalt to avoid writing complex C encode/decode logic. However, guesses returned from Python must match the **original encoded format** exactly. This can be inefficient if encoding is complex. The hash lines are intentionally not decoded and re-encoded in a structured way. Instead, a simple trick such as appending the salt after an asterisk (`*`) is used:
```
hash-with-embedded-salt*salt
```
This technique makes each hash appear unique, especially when multiple salts are involved, and simplifies initial parsing and processing.
However, it is crucial to highlight:
- You are **not obligated to follow this generic approach**. In fact, it's generally preferable to implement proper hash line decoding and encoding logic.
- For instance, a proper Yescrypt implementation (unlike the quickstart document) would ideally decode hash lines into clear, separate components (digest, salt, parameters) and encode them accordingly upon successful cracking.
The reason the generic hash mode provided by Hashcat employs a simplified approach is to:
- Demonstrate a flexible, format-agnostic solution suitable for initial prototyping or unfamiliar hash formats.
- Avoid complexity and make it easy for plugin developers to get started quickly without deep understanding of specific hash format parsing logic.
In summary, while the generic mode is quick and easy, robust real-world plugins **should implement proper hash decoding and encoding logic** to ensure accuracy, efficiency, and maintainability.
## 7. Debugging Without Hashcat
You can run your plugin as a standalone script:
```
python3 generic_hash.py
```
It reads passwords from stdin and prints the result of `calc_hash()`:
```
echo "password" | python3 generic_hash_mp.py
```
Note that you probably want to inline the correct salt value, see the `main` section in the code. TBD: Add some sample

View File

@ -1,31 +1,29 @@
# Hashcat Python Plugin Developer Guide
# Hashcat Python Plugin Quickstart
## Introduction
This guide walks you through building custom hash modes in **pure Python** using Hashcat v7's Python plugin from the new assimilation bridge. Whether you're experimenting with a new algorithm, supporting a proprietary format, hacking a prototype, or just writing hashing logic in a high-level language, this plugin interface makes it fast and easy.
This guide walks you through building custom hash modes in **pure Python** using Hashcat v7's Python plugin interface via the new assimilation bridge.
No C required. No recompilation. Just write your logic in pure python code in `calc_hash()`.
Whether you're experimenting with a new algorithm, supporting a proprietary format, prototyping a new feature, or simply prefer writing in a high-level language, this plugin interface makes development fast and straightforward.
You can use any python modules you want.
No C required. No recompilation. Just write your logic in Python in the `calc_hash()` function.
---
You can use any Python modules you like.
## Quick Start
Hashcat mode `73000` is preconfigured to load a generic Python plugin source file from `Python/generic_hash_mp.py`:
A benchmark is a good way to verify that your setup is working correctly.
Hashcat mode `73000` is preconfigured to load a generic Python plugin from the source file `Python/generic_hash_mp.py`:
```
hashcat -m 73000 -b
```
You can edit the `Python/generic_hash_mp.py`, or override the plugin source file with:
If you encounter issues with your Python installation, refer to `hashcat-python-plugin-requirements.md`.
```
hashcat -m 73000 --bridge-parameter1=my_hash.py hash.txt wordlist.txt ...
```
---
To learn how to modify the plugin source, see `hashcat-python-plugin-development-guide.md`.
## Yescrypt in One Line
@ -41,104 +39,48 @@ Example output:
$y$j9T$uxVFACnNnGBakt9MLrpFf0$SmbSZAge5oa1BfHPBxYGq3mITgHeO/iG2Mdfgo93UN0
```
### Prepare Hash Line for Hashcat
### Prepare the Hash Line for Hashcat
```
$y$j9T$uxVFACnNnGBakt9MLrpFf0$SmbSZAge5oa1BfHPBxYGq3mITgHeO/iG2Mdfgo93UN0*$y$j9T$uxVFACnNnGBakt9MLrpFf0$
```
(Use the full hash before the `*`, and the salt portion after the `*`.)
(Use the full hash before the `*` and the salt portion after the `*`.)
Hashcat modes `73000` and `72000` are generic modes that do not parse the hash, which can lead to redundancy.
Refer to `hashcat-python-plugin-development-guide.md` to learn how to develop plugins for the generic hash mode.
### Plugin Code
Install the module:
Install the required module:
```
pip install pyescrypt
```
Then in your plugin (either `generic_hash_mp.py` for -m 73000 or `generic_hash_sp.py` for -m 72000)
Then in your plugin (either `generic_hash_mp.py` for `-m 73000` or `generic_hash_sp.py` for `-m 72000`):
**Note for Windows users:** Mode `73000` automatically switches to `generic_hash_sp.py`, so be sure to edit that file.
```python
from pyescrypt import Yescrypt,Mode
from pyescrypt import Yescrypt, Mode
# Self-Test pair
# Self-test pair
ST_HASH = "$y$j9T$uxVFACnNnGBakt9MLrpFf0$SmbSZAge5oa1BfHPBxYGq3mITgHeO/iG2Mdfgo93UN0*$y$j9T$uxVFACnNnGBakt9MLrpFf0$"
ST_PASS = "password"
def calc_hash(password: bytes, salt: dict) -> str:
return Yescrypt(n=4096, r=32, p=1, mode=Mode.MCF).digest(password=password, settings=hcshared.get_salt_buf(salt)).decode('utf8')
return Yescrypt(n=4096, r=32, p=1, mode=Mode.MCF).digest(
password=password,
settings=hcshared.get_salt_buf(salt)
).decode("utf-8")
```
That's it - full Yescrypt support in Hashcat with a single line of code.
Thats it.
### Run
### Run Regularly
```
hashcat -m 73000 yescrypt.hash wordlist.txt
```
---
## Debugging Without Hashcat
You can run your plugin as a standalone script:
```
python3 generic_hash.py
```
It reads passwords from stdin and prints the result of `calc_hash()`:
```
echo "password" | python3 generic_hash_mp.py
```
Note that you probably want to inline the correct salt value, see the `main` section in the code.
---
## Windows and Linux/macOS
There are significant differences between Windows and Linux/macOS when embedding Python as done here. It's a complex issue, and we hope future Python developments toward a fully GIL-free mode will resolve it. For now, we must work around platform-specific behavior.
### On Windows
The `multiprocessing` module is not fully supported in this embedded setup. As a result, only one thread can run effectively. While the `threading` module does work, most cryptographic functions like `sha256()` block the GIL. CPU-intensive algorithms such as 10,000 iterations of `sha256()` will monopolize the GIL, making the program effectively single-threaded.
### On Linux/macOS
The `multiprocessing` module works correctly, enabling full CPU utilization through parallel worker processes.
### Free-threaded Python (3.13+)
Python 3.13 introduces optional GIL-free support. This allows multithreading to work even in embedded Python, both on Linux and Windows. Hashcat leverages this through two modes:
- `-m 72000`: Uses free-threaded Python (no multiprocessing)
- `-m 73000`: Uses GIL-bound Python with multiprocessing
Multiprocessing (73000) supports most modules and is generally better for real-world workloads, but it works only on Linux. Developers may use `-m 73000` on Linux for performance and `-m 72000` on Windows for development, provided their code does not rely on modules that require `cffi` because this as of now lacks support for running with Python free-treaded ABI.
---
## Python 3.13 Requirement
The `-m 72000` mode requires Python 3.13 due to its reliance on the new free-threading feature. This feature is not available in earlier versions.
### Why Python 3.13 Isn't Preinstalled
Several Linux distributions, including Ubuntu 24.04, do not ship with Python 3.13 because it was released after the distros feature freeze. You will likely need to install it manually.
### Installing Python 3.13
**On Windows**: Use the official installer and ensure you check the "Install free-threaded" option - it's disabled by default.
**On Linux/macOS**: Use `pyenv`. It's the easiest way to install and manage Python versions:
```
pyenv install 3.13t
pyenv local 3.13t
```
This makes it easy to manage `pip` packages without global installs or virtual environments.

View File

@ -0,0 +1,49 @@
# Hashcat Python Plugin Requirements
## Windows and Linux/macOS
There are significant differences between Windows and Linux/macOS when embedding Python as done here.
### On Windows
The `multiprocessing` module is not fully supported in this embedded environment, so only a single process can run effectively. In contrast, the `threading` module does work correctly on Windows for starting threads and enabling parallelism. However, most cryptographic functions like `sha256()` block the Global Interpreter Lock (GIL). Since we often run CPU-intensive algorithms (e.g., 10,000 iterations of `sha256()`), this monopolizes the GIL, making the program effectively single-threaded. To achieve true multithreading on Windows, we need to move to a free-threaded Python runtime.
**On Windows**: Use the official installer from https://www.python.org/downloads/windows/ and ensure you check the "Install free-threaded" option - it's disabled by default.
Do not use python from Microsoft Store it's too old.
### On Linux/macOS
The `multiprocessing` module functions correctly, allowing full CPU utilization through parallel worker processes. However, since threading is managed by Python, it relies on `fork()` and inter-process communication (IPC). This adds complexity and code bloat to Hashcat, effectively duplicating modules and bridge plugins, making the codebase harder to understand for those exploring how it all works. We could switch to a free-threaded Python runtime, but it's still unstable at the time of writing even on Linux (see the `cffi` problem below). For now, weve chosen to use the `multiprocessing` module as a more practical solution.
**On Linux/macOS**: Use `pyenv`. It's the easiest way to install and manage Python versions, see below section
### Free-threaded Python (3.13+)
In order to have multithreading on Windows, we were looking into Python 3.13 which introduces optional GIL-free support. This allows multithreading to work even in embedded Python. However, it has a major downside. Most relevant modules such as `cffi` still lacks support for running with the Python free-threaded ABI. But if your hash-mode does not rely on modules with `cffi` you should be fine using `-m 72000` no matter the OS.
At the time of writing, several Linux distributions, including Ubuntu 24.04, do not ship with Python 3.13 because it was released after the distros feature freeze. You will likely need to install it manually, which is one of the reason we are refering to use `pyenv`.
### Real-world best practice
For now, multiprocessing (73000) supports most modules and is generally better for real-world workloads, but it works only on Linux. Developers may use `-m 73000` on Linux for performance and `-m 72000` on Windows for development.
### Pyenv
Pyenv is great for managing local python versions, and also frees us from using virtual environments while at the same time to not break global system installs when using `pip` to install new modules.
Check out https://github.com/pyenv/pyenv in order how to install `pyenv`.
After install, if you are fine with `-m 73000`
```
pyenv install 3.13
pyenv local 3.13
```
In order to use `-m 72000`
```
pyenv install 3.13t
pyenv local 3.13t
```