
cleave

cleave answers one question: what can this program do? It extracts capabilities from binaries, source code, and archives, scoring each sample against 60,000+ behavior rules aligned to MBC (Malware Behavior Catalog) and MITRE ATT&CK. Apache-2.0, no telemetry.
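To make "scoring against behavior rules" concrete, here is a minimal sketch of the general idea: a rule fires when all of its required tokens appear in what was extracted from a sample, and hits are ranked by criticality. This is not cleave's rule format. The `Rule` shape, the token lists, and every criticality name except `suspicious` (which appears in the Usage section) are hypothetical; the ATT&CK technique IDs are real.

```python
from dataclasses import dataclass

# Hypothetical criticality ladder, lowest to highest. Only "suspicious"
# appears in cleave's own documentation (--min-crit suspicious).
CRIT_ORDER = ["notable", "suspicious", "hostile"]


@dataclass(frozen=True)
class Rule:
    name: str       # capability the rule detects
    attack_id: str  # MITRE ATT&CK technique ID
    crit: str       # one of CRIT_ORDER
    all_of: tuple   # every token must be present for the rule to fire


def match_rules(tokens: set, rules: list) -> list:
    """Return the rules whose required tokens all occur in `tokens`,
    most critical first, ties broken by name for a stable order."""
    hits = [r for r in rules if all(t in tokens for t in r.all_of)]
    return sorted(hits, key=lambda r: (-CRIT_ORDER.index(r.crit), r.name))


# Illustrative rules; the token lists are made up for this sketch.
RULES = [
    Rule("download and execute", "T1105", "hostile", ("curl", "chmod +x")),
    Rule("spawn shell", "T1059", "suspicious", ("/bin/sh",)),
    Rule("decode base64", "T1140", "notable", ("base64",)),
]
```

With tokens `{"curl", "chmod +x", "/bin/sh", "printf"}`, `match_rules` returns the T1105 rule followed by the T1059 rule; the base64 rule does not fire because its token is absent.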

  • Supply-chain & malware triage. Run on a release, a suspicious sample, or a directory of dropped files. cleave diff old/ new/ highlights new capabilities, tampered headers, and provenance anomalies between versions.
  • Feature extraction for ML/AI pipelines. Stable JSON schema, deterministic output, SHA256-keyed cache. litmus is the reference downstream classifier.
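A SHA256-keyed cache pays off because output is deterministic: the same bytes always produce the same result, so a renamed or re-downloaded copy of a sample can reuse a prior run. The sketch below shows the general pattern on the consumer side. `cached_analyze` and the `analyze` callable are hypothetical stand-ins, not cleave's API or its internal cache layout.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash the file in 1 MiB chunks so large samples aren't loaded at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def cached_analyze(path: Path, cache_dir: Path, analyze) -> dict:
    """Return `analyze(path)`, reusing a stored result keyed by content hash.

    `analyze` is a hypothetical callable standing in for the real extractor.
    Keying on content rather than filename means identical files share
    one cache entry regardless of where they live."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    entry = cache_dir / f"{sha256_of(path)}.json"
    if entry.exists():
        return json.loads(entry.read_text())
    result = analyze(path)
    entry.write_text(json.dumps(result, sort_keys=True))
    return result
```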

cleave analyze — capabilities of a single sample

cleave diff — what changed between two releases
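At its core, the capability side of a release diff is set arithmetic over per-file results. The sketch below assumes a simplified input shape, a map from file path to a set of capability names, which is a hypothetical stand-in for cleave's real data; it does not attempt the header-tampering or provenance checks `cleave diff` also reports.

```python
def diff_capabilities(old: dict, new: dict) -> dict:
    """Compare {file path: set of capability names} maps from two releases.

    Reports files added or removed, and for files present in both, the
    capabilities gained or lost. All lists are sorted for stable output."""
    report = {
        "added_files": sorted(new.keys() - old.keys()),
        "removed_files": sorted(old.keys() - new.keys()),
        "changed": {},
    }
    for f in sorted(old.keys() & new.keys()):
        gained = sorted(new[f] - old[f])
        lost = sorted(old[f] - new[f])
        if gained or lost:
            report["changed"][f] = {"gained": gained, "lost": lost}
    return report
```

A newly gained capability in an existing file (say, a shell-exec capability appearing in a binary that previously only made HTTP requests) is usually the most interesting line in such a report for supply-chain review.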

What cleave analyzes

  • Binaries — Mach-O, ELF, PE, MSI, CHM, PyInstaller, Java .class, Python .pyc, Python pickle, compiled AppleScript
  • Source (~22 langs, tree-sitter) — Python, JS/TS, Go, Rust, C/C++, Java, Kotlin, C#, Swift, ObjC, Ruby, PHP, Perl, Lua, Shell, PowerShell, Groovy, Scala, Zig, Elixir, Batch, VBScript, Makefile
  • Archives (recursive) — zip, tar (gz/bz2/xz/zst), 7z, rar, cab, jar/war, deb, rpm, pkg, apk, gem, crate, whl, nupkg, phar, vsix, xpi, crx, ipa, epub
  • Documents & data — PDF structure, RTF, LNK shortcut metadata, Office (OLE2 + OOXML), OpenDocument, plist, HTML, XML, Markdown, PNG/JPEG, package manifests, GitHub Actions, systemd units, XDG .desktop
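Handling this many formats starts with identifying them, typically by magic bytes rather than file extension. The sketch below covers a handful of the binary formats listed above and shows one classic ambiguity: fat Mach-O and Java `.class` files share the magic `0xCAFEBABE`. It is a simplified illustration of magic-number sniffing, not cleave's detection logic; the threshold of 45 is a heuristic based on Java major versions starting at 45.

```python
import struct


def sniff_format(head: bytes) -> str:
    """Guess a container format from a file's leading bytes (pass >= 4 KiB)."""
    if head.startswith(b"\x7fELF"):
        return "elf"
    # Thin Mach-O: 32/64-bit magic in big- or little-endian byte order.
    if head[:4] in (b"\xfe\xed\xfa\xce", b"\xfe\xed\xfa\xcf",
                    b"\xce\xfa\xed\xfe", b"\xcf\xfa\xed\xfe"):
        return "macho"
    if head.startswith(b"\xca\xfe\xba\xbe") and len(head) >= 8:
        # Fat Mach-O: next big-endian u32 is the architecture count (small).
        # Java class: those bytes hold minor/major version; major >= 45.
        (n,) = struct.unpack(">I", head[4:8])
        return "macho-fat" if n < 45 else "java-class"
    if head.startswith(b"MZ") and len(head) >= 0x40:
        # DOS header field e_lfanew (offset 0x3C) points at "PE\0\0".
        (pe_off,) = struct.unpack("<I", head[0x3C:0x40])
        if head[pe_off:pe_off + 4] == b"PE\x00\x00":
            return "pe"
        return "dos-mz"
    if head.startswith(b"PK\x03\x04"):
        return "zip"  # also jar, apk, whl, OOXML, and other zip-based formats
    return "unknown"
```

Zip-based containers all share `PK\x03\x04`, so telling a `.jar` from a `.whl` or a `.docx` requires looking inside the archive, which is one reason recursive unpacking matters.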

AI-derived rulesets

The 60,000+ rules are a mix of hand-written and AI-derived. The hand-written rules are precise for patterns humans understand well. The AI-derived rules were generated from a training corpus of roughly a million samples and cover the long tail across 20+ languages, territory no team could keep up with by hand.

The AI runs at build time, not at runtime. The cleave you ship is deterministic: same input, same output, every run. No model weights loaded, no GPU, no network, no telemetry. If you don't trust that, the rules are in the source tree.
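"Same input, same output" usually comes down to removing ordering nondeterminism (hash-map iteration, parallel workers finishing in different orders) before serializing. The sketch below shows one common way to do that, and how a skeptical user could compare two runs by digest. It is not a description of how cleave serializes its output; the finding fields are hypothetical.

```python
import hashlib
import json


def canonical_report(findings: list) -> bytes:
    """Serialize findings so equal content always yields identical bytes.

    Findings are sorted by their own canonical JSON, keys are sorted inside
    each object, and separators are fixed, so input order can't leak out."""
    ordered = sorted(findings, key=lambda f: json.dumps(f, sort_keys=True))
    return json.dumps(ordered, sort_keys=True, separators=(",", ":")).encode()


def report_digest(findings: list) -> str:
    """SHA256 of the canonical bytes; equal digests mean equal reports."""
    return hashlib.sha256(canonical_report(findings)).hexdigest()
```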

Install

brew install atomdrift/tap/cleave             # macOS / Linux
make install                                  # via cargo

Usage

cleave suspect.bin                            # single sample
cleave /tmp/box-o-malware                     # recursive, unpacks archives
cleave diff v1.2.0/ v1.3.0/                   # release-to-release diff
cleave --format jsonl --min-crit suspicious   # streaming JSON for pipelines
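JSONL emits one JSON object per line, so a downstream consumer can process results as they stream in without buffering the whole run. A minimal sketch of such a consumer follows. The `crit` field name is an assumption for illustration; check cleave's actual JSON schema before relying on any field.

```python
import json
from collections import Counter


def count_by_crit(lines) -> Counter:
    """Tally JSONL records by a criticality field, skipping blank lines.

    Records without the (assumed) "crit" field are counted as "unknown"."""
    tally = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        tally[record.get("crit", "unknown")] += 1
    return tally
```

Saved as a hypothetical `tally.py` that calls `count_by_crit(sys.stdin)`, it could sit at the end of a pipe such as `cleave --format jsonl --min-crit suspicious /tmp/box-o-malware | python tally.py`.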

Optional: rizin for disassembly, upx for unpacking UPX-packed executables.
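Because rizin and upx are optional, a wrapper script might check for them up front and flag samples that would benefit from unpacking. The sketch below is not part of cleave: the function names are hypothetical, and the `UPX!` marker check is a common heuristic, not something cleave is documented to use.

```python
import shutil

# Optional helpers named above, with what they are used for.
OPTIONAL_TOOLS = {"rizin": "disassembly", "upx": "unpacking UPX-packed executables"}


def missing_tools() -> list:
    """Sorted names of optional helpers not found on PATH."""
    return sorted(t for t in OPTIONAL_TOOLS if shutil.which(t) is None)


def looks_upx_packed(data: bytes) -> bool:
    """Cheap heuristic: UPX writes the ASCII marker b"UPX!" into files it
    packs. The marker can be stripped or forged, so treat a hit as a hint."""
    return b"UPX!" in data
```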