cleave
cleave answers one question — what can this program do? It extracts capabilities from binaries, source, and archives, scoring each against 60,000+ behavior rules aligned to MBC and MITRE ATT&CK. Apache-2.0, no telemetry.
- Supply-chain & malware triage. Run on a release, a suspicious sample, or a directory of dropped files. cleave diff old/ new/ highlights new capabilities, tampered headers, and provenance anomalies between versions.
- Feature extraction for ML/AI pipelines. Stable JSON schema, deterministic output, SHA256-keyed cache. litmus is the reference downstream classifier.
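To illustrate what a SHA256-keyed cache buys a downstream pipeline, here is a minimal Python sketch. It is not cleave's own cache implementation: the function names, the one-JSON-file-per-digest layout, and the `analyze` callable are all hypothetical stand-ins. The idea is only that when output is deterministic, the content hash of a sample is a safe cache key.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large samples do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cached_features(
    sample: Path,
    cache_dir: Path,
    analyze: Callable[[Path], dict],  # hypothetical: whatever produces the features
) -> dict:
    """Return features for `sample`, calling `analyze` only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Assumed layout: one JSON file per content digest.
    entry = cache_dir / f"{sha256_of_file(sample)}.json"
    if entry.exists():
        return json.loads(entry.read_text(encoding="utf-8"))
    features = analyze(sample)
    # sort_keys keeps the stored bytes stable from run to run.
    entry.write_text(json.dumps(features, sort_keys=True), encoding="utf-8")
    return features
```

Because the key is the content hash rather than the path, renamed or duplicated samples hit the same cache entry.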


What cleave analyzes
- Binaries — Mach-O, ELF, PE, MSI, CHM, PyInstaller, Java .class, Python .pyc, Python pickle, compiled AppleScript
- Source (~22 langs, tree-sitter) — Python, JS/TS, Go, Rust, C/C++, Java, Kotlin, C#, Swift, ObjC, Ruby, PHP, Perl, Lua, Shell, PowerShell, Groovy, Scala, Zig, Elixir, Batch, VBScript, Makefile
- Archives (recursive) — zip, tar (gz/bz2/xz/zst), 7z, rar, cab, jar/war, deb, rpm, pkg, apk, gem, crate, whl, nupkg, phar, vsix, xpi, crx, ipa, epub
- Documents & data — PDF structure, RTF, LNK shortcut metadata, Office (OLE2 + OOXML), OpenDocument, plist, HTML, XML, Markdown, PNG/JPEG, package manifests, GitHub Actions, systemd units, XDG .desktop
AI-derived rulesets
The 60,000+ rules are a mix. The hand-written ones are precise for patterns humans understand well. The AI-derived ones come from a training corpus of about a million samples and cover the long tail across 20+ languages — territory no team is going to keep up with by hand.
The AI runs at build time, not at runtime. The cleave you ship is deterministic: same input, same output, every run. No model weights loaded, no GPU, no network, no telemetry. If you don't trust that, the rules are in the source tree.
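The "same input, same output" claim can be spot-checked from the outside. The Python sketch below is a generic harness and not part of cleave: it runs any command several times and compares the SHA-256 of its stdout. The helper names are hypothetical.

```python
import hashlib
import subprocess
from typing import Sequence


def output_digest(cmd: Sequence[str]) -> str:
    """Run `cmd` once and return the SHA-256 of its stdout."""
    result = subprocess.run(cmd, capture_output=True, check=True)
    return hashlib.sha256(result.stdout).hexdigest()


def is_deterministic(cmd: Sequence[str], runs: int = 3) -> bool:
    """True if every run of `cmd` produced byte-identical stdout."""
    digests = {output_digest(cmd) for _ in range(runs)}
    return len(digests) == 1
```

Assuming the JSONL flag and a sample path can be combined on one command line, a check might look like `is_deterministic(["cleave", "--format", "jsonl", "suspect.bin"])`.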
Install
    brew install atomdrift/tap/cleave   # macOS / Linux
    make install                        # via cargo
Usage
    cleave suspect.bin                           # single sample
    cleave /tmp/box-o-malware                    # recursive, unpacks archives
    cleave diff v1.2.0/ v1.3.0/                  # release-to-release diff
    cleave --format jsonl --min-crit suspicious  # streaming JSON for pipelines
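For pipeline use, the streaming JSON mode can be read one record at a time. The Python sketch below rests on assumptions: that the target path follows the flags shown above, and that each output line is one JSON object. It makes no assumption about field names, since the schema is not described here. `parse_jsonl` and `stream_cleave` are hypothetical helper names.

```python
import json
import subprocess
from typing import Iterable, Iterator


def parse_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded record per non-blank line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def stream_cleave(target: str, min_crit: str = "suspicious") -> Iterator[dict]:
    """Run cleave on `target` and yield records as they arrive.

    Assumption: the target path is passed after the flags.
    """
    cmd = ["cleave", "--format", "jsonl", "--min-crit", min_crit, target]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        assert proc.stdout is not None
        yield from parse_jsonl(proc.stdout)
    # Leaving the `with` block waits for the process, so returncode is set.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
```

Reading line by line keeps memory flat on large directories: `for record in stream_cleave("/tmp/box-o-malware"): ...` handles each record before the next one is parsed.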