Anthropic-Cybersecurity-Skills performing-malware-triage-with-yara
'Performs rapid malware triage and classification using YARA rules to match file patterns, strings, byte sequences,
git clone https://github.com/mukul975/Anthropic-Cybersecurity-Skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/mukul975/Anthropic-Cybersecurity-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/performing-malware-triage-with-yara" ~/.claude/skills/mukul975-anthropic-cybersecurity-skills-performing-malware-triage-with-yara && rm -rf "$T"
skills/performing-malware-triage-with-yara/SKILL.mdPerforming Malware Triage with YARA
When to Use
- Rapidly classifying a large batch of malware samples against known family signatures
- Writing detection rules for a newly analyzed malware family based on unique byte patterns
- Scanning file shares, endpoints, or memory dumps for indicators of a specific threat
- Building automated triage pipelines that classify samples before manual analysis
- Hunting for variants of a known threat across an enterprise using YARA scans
Do not use as the sole analysis method; YARA triage identifies known patterns but does not reveal new or unknown malware behaviors.
Prerequisites
- YARA 4.x installed (
orapt install yara
)pip install yara-python - YARA rule repositories (YARA-Rules, awesome-yara, Malpedia rules, Florian Roth's signature-base)
- Python 3.8+ with
for scripted scanningyara-python - Sample collection organized in a directory structure for batch scanning
- Understanding of PE file format, hex patterns, and regular expressions for rule writing
Workflow
Step 1: Scan Samples with Existing Rule Sets
Apply community and commercial YARA rules to classify samples:
# Scan a single file yara -s malware_rules.yar suspect.exe # Scan a directory of samples yara -r malware_rules.yar /path/to/samples/ # Scan with multiple rule files yara -r rules/apt_rules.yar rules/ransomware_rules.yar rules/trojan_rules.yar suspect.exe # Scan with timeout (prevent hanging on large files) yara -t 30 malware_rules.yar suspect.exe # Scan and show matching strings yara -s -r malware_rules.yar suspect.exe # Scan with compiled rules (faster for repeated scans) yarac malware_rules.yar compiled_rules.yarc yara compiled_rules.yarc suspect.exe
# Download community rule sets git clone https://github.com/Yara-Rules/rules.git yara-community-rules git clone https://github.com/Neo23x0/signature-base.git signature-base # Scan with signature-base yara -r signature-base/yara/*.yar suspect.exe
Step 2: Write Rules for Unique String Patterns
Create YARA rules based on strings extracted during malware analysis:
rule MalwareX_Strings { meta: description = "Detects MalwareX based on unique strings" author = "analyst" date = "2025-09-15" reference = "Internal Analysis Report #1547" hash = "e3b0c44298fc1c149afbf4c8996fb924" tlp = "WHITE" strings: // C2 URL pattern $url1 = "/gate.php?id=" ascii $url2 = "/panel/connect.php" ascii // Unique mutex name $mutex = "Global\\CryptLocker_2025" ascii wide // User-Agent string $ua = "Mozilla/5.0 (compatible; MSIE 10.0)" ascii // Registry persistence path $reg = "Software\\Microsoft\\Windows\\CurrentVersion\\Run\\WindowsUpdate" ascii // Campaign identifier $campaign = "campaign_2025_q3" ascii condition: uint16(0) == 0x5A4D and // PE file (MZ header) filesize < 500KB and // Size constraint ($url1 or $url2) and // At least one C2 URL ($mutex or $campaign) and // Campaign identifier $ua // Specific User-Agent }
Step 3: Write Rules for Byte Patterns
Create rules matching specific code sequences:
rule MalwareX_Decryptor { meta: description = "Detects MalwareX XOR decryption routine" author = "analyst" date = "2025-09-15" strings: // XOR decryption loop (x86 assembly) // mov al, [esi+ecx] // xor al, [edi+ecx] // mov [esi+ecx], al // inc ecx // cmp ecx, edx // jl loop $xor_loop = { 8A 04 0E 32 04 0F 88 04 0E 41 3B CA 7C F3 } // RC4 KSA initialization (256-byte loop) $rc4_ksa = { 33 C0 88 04 ?8 40 3D 00 01 00 00 7? } // Embedded RSA public key marker $rsa_key = { 06 02 00 00 00 A4 00 00 52 53 41 31 } // PUBLICKEYBLOB condition: uint16(0) == 0x5A4D and ($xor_loop or $rc4_ksa) and $rsa_key }
Step 4: Write Rules with PE Module
Leverage YARA's PE module for structural detection:
import "pe" import "hash" import "math" rule MalwareX_PE_Characteristics { meta: description = "Detects MalwareX by PE structure and imports" author = "analyst" condition: pe.is_pe and // Compiled within specific timeframe pe.timestamp > 1693526400 and // After 2023-09-01 pe.timestamp < 1727740800 and // Before 2024-10-01 // Specific import hash pe.imphash() == "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6" or // Suspicious import combination ( pe.imports("kernel32.dll", "VirtualAllocEx") and pe.imports("kernel32.dll", "WriteProcessMemory") and pe.imports("kernel32.dll", "CreateRemoteThread") and pe.imports("wininet.dll", "InternetOpenA") ) or // High entropy .text section (packed) ( for any section in pe.sections : ( section.name == ".text" and math.entropy(section.raw_data_offset, section.raw_data_size) > 7.0 ) ) } rule MalwareX_Rich_Header { meta: description = "Detects MalwareX by Rich header hash" condition: pe.is_pe and hash.md5(pe.rich_signature.clear_data) == "abc123def456abc123def456abc123de" }
Step 5: Batch Triage with Python
Automate scanning of sample collections:
import yara import os import json import hashlib from datetime import datetime # Compile all rule files rule_files = { "apt": "rules/apt_rules.yar", "ransomware": "rules/ransomware_rules.yar", "trojan": "rules/trojan_rules.yar", "custom": "rules/custom_rules.yar", } rules = yara.compile(filepaths=rule_files) # Scan sample directory results = [] sample_dir = "/path/to/samples" for filename in os.listdir(sample_dir): filepath = os.path.join(sample_dir, filename) if not os.path.isfile(filepath): continue with open(filepath, "rb") as f: data = f.read() sha256 = hashlib.sha256(data).hexdigest() matches = rules.match(filepath) result = { "filename": filename, "sha256": sha256, "size": len(data), "matches": [], "classification": "UNKNOWN", } for match in matches: result["matches"].append({ "rule": match.rule, "namespace": match.namespace, "tags": match.tags, "strings": [(hex(s[0]), s[1], s[2].decode("utf-8", errors="replace")[:100]) for s in match.strings] if match.strings else [] }) if result["matches"]: result["classification"] = result["matches"][0]["namespace"].upper() results.append(result) # Summary classified = sum(1 for r in results if r["classification"] != "UNKNOWN") print(f"Scanned: {len(results)} samples") print(f"Classified: {classified} ({classified/len(results)*100:.1f}%)") print(f"Unknown: {len(results)-classified}") # Export results with open("triage_results.json", "w") as f: json.dump(results, f, indent=2)
Step 6: Validate and Optimize Rules
Test rules for false positives and performance:
# Test rule syntax yara -C custom_rules.yar # Scan known-clean directory to check false positives yara -r custom_rules.yar /path/to/clean_files/ > false_positives.txt wc -l false_positives.txt # Benchmark rule performance time yara -r custom_rules.yar /path/to/large_sample_collection/ # Profile individual rule performance yara -p custom_rules.yar suspect.exe
Key Concepts
| Term | Definition |
|---|---|
| YARA Rule | Pattern matching rule defining strings, byte sequences, and conditions that identify a specific file or malware family |
| Condition | Boolean expression combining string matches, file properties, and module functions to determine if a rule matches |
| Hex String | Byte pattern with optional wildcards (??) and jumps ([N-M]) for matching machine code or binary data |
| PE Module | YARA module providing access to PE file properties (imports, sections, timestamps, resources) for structural matching |
| Imphash | MD5 hash of a PE file's import table; samples from the same family often share import hashes |
| Rich Header | Undocumented PE structure containing compiler/linker metadata; consistent within malware build environments |
| YARA-C | Compiled YARA rule format enabling faster scanning by pre-compiling rules for repeated use |
Tools & Systems
- YARA: Pattern matching engine for identifying and classifying malware based on text, hex, and structural patterns
- yara-python: Python bindings for YARA enabling scripted scanning, rule compilation, and integration with analysis pipelines
- yarGen: Automatic YARA rule generator that creates rules from malware samples by identifying unique strings and opcodes
- YARA-Rules (GitHub): Community-maintained repository of YARA rules covering malware families, exploits, and suspicious indicators
- Malpedia YARA: Curated YARA rules from the Malpedia malware encyclopedia with high-quality family-specific rules
Common Scenarios
Scenario: Creating Detection Rules for a New Malware Family
Context: Reverse engineering of a new malware sample has identified unique strings, byte patterns, and PE characteristics. YARA rules are needed for enterprise-wide hunting and ongoing detection.
Approach:
- Extract unique strings from the unpacked binary (C2 URLs, mutex names, registry paths)
- Identify unique byte sequences from the encryption routine or C2 protocol (from Ghidra analysis)
- Record PE characteristics (imphash, Rich header hash, section names, compilation timestamp range)
- Write a YARA rule combining string, byte pattern, and PE module conditions
- Test against the known malware samples to confirm true positive detection
- Test against a clean file corpus (Windows system files, common applications) to verify zero false positives
- Deploy to enterprise scanning infrastructure and threat intelligence platform
Pitfalls:
- Writing rules too specific to a single sample (will not detect variants with minor changes)
- Writing rules too generic (matching legitimate software, causing false positives)
- Using strings that appear in common libraries or frameworks (e.g., OpenSSL strings)
- Not testing on a sufficiently large clean corpus before deployment
Output Format
YARA TRIAGE RESULTS ===================== Scan Date: 2025-09-15 Rule Sets: apt_rules (847 rules), ransomware_rules (312 rules), trojan_rules (1,204 rules), custom_rules (45 rules) Samples Scanned: 2,500 Processing Time: 47 seconds CLASSIFICATION SUMMARY APT: 12 samples (0.5%) Ransomware: 187 samples (7.5%) Trojan: 423 samples (16.9%) Unknown: 1,878 samples (75.1%) TOP MATCHING RULES Rule Matches Family MalwareX_C2_Beacon 45 MalwareX LockBit3_Ransom_Note 38 LockBit 3.0 Emotet_Epoch5_Loader 32 Emotet CobaltStrike_Beacon_Config 28 Cobalt Strike QakBot_DLL_Loader 25 QakBot SAMPLE DETAIL File: suspect.exe SHA-256: e3b0c44298fc1c149afbf4c8996fb924... Matches: [1] MalwareX_Strings (custom) - $url1 at 0x4A20: "/gate.php?id=" - $mutex at 0x5100: "Global\\CryptLocker_2025" [2] MalwareX_Decryptor (custom) - $xor_loop at 0x401200: { 8A 04 0E 32 04 0F ... } [3] MalwareX_PE_Characteristics (custom) - PE import combination matched Classification: MALWAREX (HIGH CONFIDENCE)