preview

azoth

Azoth answers one question: is this file malicious? It reads cleave's output — MITRE ATT&CK techniques and MBC behaviors extracted from the sample — and returns a label (benign, suspicious, hostile), a calibrated probability, and the specific capabilities that drove the call.
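As a rough illustration of that output, the sketch below models a verdict as a label, a probability, and a short list of contributing capabilities. Everything in it is an assumption made for illustration: the `Verdict` type, the `to_verdict` helper, and the two cutoffs are hypothetical and are not azoth's actual API or thresholds.

```python
from dataclasses import dataclass

# Hypothetical cutoffs for illustration only; azoth's real thresholds are not published.
SUSPICIOUS_AT = 0.5
HOSTILE_AT = 0.9


@dataclass(frozen=True)
class Verdict:
    label: str                      # "benign", "suspicious", or "hostile"
    probability: float              # calibrated probability that the sample is malicious
    contributors: tuple[str, ...]   # capability IDs that drove the call


def to_verdict(probability: float,
               contributions: dict[str, float],
               top_n: int = 3) -> Verdict:
    """Map a probability and per-capability contributions to a Verdict."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if probability >= HOSTILE_AT:
        label = "hostile"
    elif probability >= SUSPICIOUS_AT:
        label = "suspicious"
    else:
        label = "benign"
    # Keep the capabilities with the largest positive contribution.
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    top = tuple(name for name, weight in ranked[:top_n] if weight > 0)
    return Verdict(label, probability, top)


# T1055 (Process Injection), T1027 (Obfuscated Files or Information) and
# T1082 (System Information Discovery) are ATT&CK technique IDs; the weights are made up.
verdict = to_verdict(0.93, {"T1055": 0.4, "T1027": 0.3, "T1082": -0.1})
print(verdict.label, verdict.contributors)
```

Whatever the real shape turns out to be, the useful part for a consumer is the contributor list: it is what lets a reader check the call against cleave's output.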

Anything cleave can decompose, azoth can classify: binaries in six formats (ELF, PE, Mach-O, Java .class, Python .pyc, compiled AppleScript), source in 20+ languages, and packaged forms (Python wheels, npm modules, VS Code extensions).

Everything runs locally. CPU only, no network, no telemetry, no per-call cost. Weights, the training pipeline, and the capability schema are Apache 2.0; litmus is the reference scanner.

What azoth does not do

It does not execute samples. It does not read bytes. If cleave cannot decompose a sample (an unrecognized packer, a dynamic loader it cannot follow, control flow it cannot reconstruct), azoth returns benign. The classifier is bounded by the feature pipeline beneath it: the failure modes are inspectable, and what fools azoth would also fool a human reading cleave's output.

The reverse holds too. Recompiling, repacking, or shuffling strings does not change azoth's input, because azoth scores capabilities, not surface.

Design

No prompt injection. There is no instruction channel. A malicious sample cannot talk the classifier into ignoring its training the way it can with an LLM. The attack surface is the feature pipeline, not the model.

Hierarchy-aware. MBC is a tree; azoth learns combinations across leaves, objectives, and intent. A set of individually unremarkable capabilities can still trip the model when the combination matches what real families do.
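One way to picture this is to expand each leaf into all of its ancestors, so that a model can see combinations at any level of the tree. The sketch below is only an illustration under stated assumptions: the slash-separated path format, the capability names, and both helpers are hypothetical, and azoth learns combinations from data rather than matching a hand-written pattern like the one shown here.

```python
def expand_hierarchy(leaf_paths: set[str]) -> set[str]:
    """Return every leaf plus all of its ancestors in the tree."""
    features: set[str] = set()
    for path in leaf_paths:
        parts = path.split("/")
        # Add the node at every depth: objective, behavior, method, ...
        for depth in range(1, len(parts) + 1):
            features.add("/".join(parts[:depth]))
    return features


def covers(features: set[str], pattern: set[str]) -> bool:
    """True when every node in the pattern appears in the expanded features."""
    return pattern <= features


# Hypothetical capability paths; each one is unremarkable on its own.
sample = {
    "discovery/system-info",
    "collection/keylogging",
    "exfiltration/http-upload",
}
# A combination expressed at the objective level rather than the leaf level.
stealer_like = {"discovery", "collection", "exfiltration"}

print(covers(expand_hierarchy(sample), stealer_like))
```

The point of the expansion is that two samples with different leaves can still share the same higher-level combination.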

Local and fast. Distilled for CPU inference; fast enough to scan a full disk on commodity hardware. Same weights on a laptop, a CI runner, or an endpoint agent.

Status and numbers

Preview. The model is a weighted ensemble: classifiers trained on different slices of the capability space, blended by calibrated weights. TPR, FPR, and per-family breakdowns will be published once the holdout corpus is locked and the thresholds stop moving. Until then, read the contributing capabilities, not just the verdict. Weights, training pipeline, and evaluation harness are on Codeberg.
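For readers unfamiliar with weighted ensembles, the sketch below shows the simplest form of blending: a weighted arithmetic mean of each member's probability. This is an assumption made for illustration. The text above does not say how azoth combines its members (a blend in logit space, for instance, is equally plausible), and the `blend` function and its numbers are hypothetical.

```python
def blend(probabilities: list[float], weights: list[float]) -> float:
    """Weighted arithmetic mean of per-classifier probabilities."""
    if not probabilities or len(probabilities) != len(weights):
        raise ValueError("need one weight per probability, and at least one of each")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not all be zero")
    # Normalizing by the total keeps the result in [0, 1].
    return sum(p * w for p, w in zip(probabilities, weights)) / total


# Three hypothetical member classifiers; the first is trusted twice as much.
score = blend([0.9, 0.5, 0.1], [2.0, 1.0, 1.0])
print(round(score, 3))
```

With these made-up inputs the blended score comes out at about 0.6: the heavily weighted member pulls the result up, but cannot override the other two on its own.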

Dataset providers

This project wouldn't have gone anywhere without the service and support of the researchers and curators who collect, label, and publish malware samples. The training corpus draws from: