Awesome-omni-skill seatunnel-skill
Apache SeaTunnel - A multimodal, high-performance, distributed data integration tool for massive data synchronization across 100+ connectors
```bash
# Clone the full skills repository
git clone https://github.com/diegosouzapw/awesome-omni-skill

# Or install just this skill into ~/.claude/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/development/seatunnel-skill" ~/.claude/skills/diegosouzapw-awesome-omni-skill-seatunnel-skill && rm -rf "$T"
```

Source: skills/development/seatunnel-skill/SKILL.md

Apache SeaTunnel OpenCode Skill
Apache SeaTunnel is a multimodal, high-performance, distributed data integration tool capable of synchronizing vast amounts of data daily. It connects hundreds of evolving data sources with support for real-time, CDC (Change Data Capture), and full database synchronization.
Quick Start
Prerequisites
- Java: JDK 8 or higher
- Maven: 3.6.0 or higher (for building from source)
- Python: 3.7+ (optional, for Python API)
- Git: For cloning the repository
Installation
1. Clone Repository
```bash
git clone https://github.com/apache/seatunnel.git
cd seatunnel
```
2. Build from Source
```bash
# Build entire project
mvn clean install -DskipTests

# Build specific module
mvn clean install -pl seatunnel-core -DskipTests
```
3. Download Pre-built Binary (Recommended)
Visit the official download page and select your version:
```bash
VERSION=2.3.12
wget https://archive.apache.org/dist/seatunnel/${VERSION}/apache-seatunnel-${VERSION}-bin.tar.gz
tar -xzf apache-seatunnel-${VERSION}-bin.tar.gz
cd apache-seatunnel-${VERSION}
```
4. Basic Configuration
```bash
# Set JAVA_HOME
export JAVA_HOME=/path/to/java

# Add to PATH
export PATH=$PATH:/path/to/seatunnel/bin
```
5. Verify Installation
```bash
seatunnel.sh --version
```
Hello World Example
Create a simple data integration job:
config/hello_world.conf
env { job.mode = "BATCH" job.name = "Hello World" } source { FakeSource { row.num = 100 schema = { fields { id = "bigint" name = "string" age = "int" } } } } sink { Console { format = "json" } }
Run the job:
```bash
seatunnel.sh -c config/hello_world.conf -e spark

# or with flink
seatunnel.sh -c config/hello_world.conf -e flink
```
Overview
Purpose and Target Users
SeaTunnel is designed for:
- Data Engineers: Building large-scale data pipelines with minimal complexity
- DevOps Teams: Managing data integration infrastructure
- Enterprise Platforms: Handling 100+ billion data records daily
- Real-time Analytics: Supporting streaming data synchronization
- Legacy System Migration: CDC-based incremental sync from transactional databases
Core Capabilities
- Multimodal Support
  - Structured data (databases, data warehouses)
  - Unstructured data (video, images, binaries)
  - Semi-structured data (JSON, logs, binlog streams)
- Multiple Synchronization Methods
  - Batch: Full historical data transfer
  - Streaming: Real-time data pipeline
  - CDC: Incremental capture from databases
  - Full + Incremental: Combined approach
- 100+ Pre-built Connectors
  - Databases: MySQL, PostgreSQL, Oracle, SQL Server, MongoDB, DB2, OceanBase
  - Data Warehouses: Snowflake, BigQuery, Redshift, Iceberg
  - Data Lakes: Hive, Iceberg, Hudi, Paimon
  - Cloud SaaS: Salesforce, Shopify, Google Sheets
  - Message Queues: Kafka, RabbitMQ, Pulsar, RocketMQ, ActiveMQ
  - Search Engines: Elasticsearch, OpenSearch, Easysearch
  - OLAP Engines: ClickHouse, StarRocks, Doris, Druid
  - Time-series Databases: IoTDB, TDengine, InfluxDB
  - Vector Databases: Milvus, Qdrant
  - Graph Databases: Neo4j
  - Object Storage: S3, GCS, HDFS, OssFile, CosFile
- Multi-Engine Support
  - Zeta Engine: Lightweight, standalone deployment (no Spark/Flink required)
  - Apache Flink: Distributed streaming engine
  - Apache Spark: Distributed batch and batch-stream processing
Features
High-Performance
- Distributed Snapshot Algorithm: Ensures data consistency without locks
- JDBC Multiplexing: Minimizes database connections for real-time sync
- Log Parsing: Efficient CDC implementation with binary log analysis
- Resource Optimization: Reduces computing resources and I/O overhead
Data Quality & Reliability
- Real-time Monitoring: Track synchronization progress and data metrics
- Data Loss Prevention: Transactional guarantees (exactly-once semantics)
- Deduplication: Prevents duplicate records during reprocessing
- Error Handling: Graceful failure recovery and retry logic
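Of these guarantees, exactly-once is the one most often wired up explicitly. A minimal sketch pairing checkpointing with a transactional sink, reusing the option names from the Kafka and env examples later in this document:

```hocon
# Sketch: checkpointing plus an exactly-once sink (option names match
# the Kafka CDC and env examples elsewhere in this document)
env {
  job.mode = "STREAMING"
  checkpoint.interval = 60000   # snapshot state every 60s so failed jobs can recover
}

sink {
  Kafka {
    bootstrap.servers = "kafka-host:9092"
    topic = "events"
    semantic = "EXACTLY_ONCE"   # transactional writes; replays do not duplicate records
  }
}
```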
Developer-Friendly
- SQL-like Configuration: Intuitive HOCON job definition syntax
- Visual Web UI: Drag-and-drop job builder (SeaTunnel Web Project)
- Extensive Documentation: Comprehensive guides with i18n (English + Chinese)
- Community Support: Active community via Slack and mailing lists
Production Ready
- Proven at Scale: Used in enterprises processing billions of records daily
- Version Stability: Regular releases with backward compatibility
- Enterprise Features: Multi-tenancy, RBAC, audit logging
- Cloud Native: Kubernetes-ready deployment
Installation
Installation Methods
Method 1: Binary Download (Recommended for Quick Start)
```bash
# 1. Download binary
VERSION=2.3.12
wget https://archive.apache.org/dist/seatunnel/${VERSION}/apache-seatunnel-${VERSION}-bin.tar.gz

# 2. Extract
tar -xzf apache-seatunnel-${VERSION}-bin.tar.gz
cd apache-seatunnel-${VERSION}

# 3. Verify
./bin/seatunnel.sh --version
```
Method 2: Build from Source
```bash
# 1. Clone repository
git clone https://github.com/apache/seatunnel.git
cd seatunnel

# 2. Build with Maven
mvn clean install -DskipTests

# 3. Navigate to distribution
cd seatunnel-dist/target/apache-seatunnel-*-bin/apache-seatunnel-*

# 4. Verify
./bin/seatunnel.sh --version
```
Method 3: Docker
```bash
# Pull official Docker image
docker pull apache/seatunnel:latest

# Run container
docker run -it apache/seatunnel:latest /bin/bash

# Or run a job directly
docker run -v /path/to/config:/config \
  apache/seatunnel:latest \
  seatunnel.sh -c /config/job.conf -e spark
```
Environment Setup
Set Java Home
```bash
# Bash/Zsh
export JAVA_HOME=/path/to/java
export PATH=$JAVA_HOME/bin:$PATH

# Verify
java -version
```
Configure for Spark Engine
```bash
# Set Spark Home (if using the Spark engine)
export SPARK_HOME=/path/to/spark

# Verify Spark installation
$SPARK_HOME/bin/spark-submit --version
```
Configure for Flink Engine
```bash
# Set Flink Home (if using the Flink engine)
export FLINK_HOME=/path/to/flink

# Verify Flink installation
$FLINK_HOME/bin/flink --version
```
System Requirements
| Requirement | Version/Spec |
|---|---|
| Java | JDK 1.8+ |
| Memory | 2GB+ (minimum), 8GB+ (recommended) |
| Disk | 500MB (binary) + job storage |
| Network | Connectivity to source/sink systems |
| Scala | 2.12.15 (for Spark/Flink integration) |
Usage
1. Job Configuration (HOCON Format)
SeaTunnel uses HOCON (Human-Optimized Config Object Notation) for job configuration.
Basic Structure:
env { job.mode = "BATCH" # or STREAMING job.name = "My Job" parallelism = 4 } source { SourceConnector { option1 = value1 option2 = value2 schema = { fields { column1 = "type" column2 = "type" } } } } # Optional: Transform data transform { TransformName { option = value } } sink { SinkConnector { option1 = value1 option2 = value2 } }
2. Common Use Cases
Use Case 1: MySQL to PostgreSQL (Batch)
config/mysql_to_postgres.conf
env { job.mode = "BATCH" job.name = "MySQL to PostgreSQL" } source { Jdbc { driver = "com.mysql.cj.jdbc.Driver" url = "jdbc:mysql://mysql-host:3306/mydb" user = "root" password = "password" query = "SELECT * FROM users" connection_check_timeout_sec = 100 } } sink { Jdbc { driver = "org.postgresql.Driver" url = "jdbc:postgresql://pg-host:5432/mydb" user = "postgres" password = "password" database = "mydb" table = "users" primary_keys = ["id"] connection_check_timeout_sec = 100 } }
Run:
seatunnel.sh -c config/mysql_to_postgres.conf -e spark
Use Case 2: Kafka Streaming to Elasticsearch
config/kafka_to_es.conf
env { job.mode = "STREAMING" job.name = "Kafka to Elasticsearch" parallelism = 2 } source { Kafka { bootstrap.servers = "kafka-host:9092" topic = "events" patterns = "event.*" consumer.group = "seatunnel-group" format = "json" schema = { fields { event_id = "bigint" event_name = "string" timestamp = "bigint" payload = "string" } } } } transform { Sql { sql = "SELECT event_id, event_name, FROM_UNIXTIME(timestamp/1000) as ts, payload FROM source" } } sink { Elasticsearch { hosts = ["es-host:9200"] index = "events" index_type = "_doc" username = "elastic" password = "password" } }
Run:
seatunnel.sh -c config/kafka_to_es.conf -e flink
Use Case 3: CDC from MySQL to Kafka
config/mysql_cdc_kafka.conf
env { job.mode = "STREAMING" job.name = "MySQL CDC to Kafka" } source { Mysql { server_id = 5400 hostname = "mysql-host" port = 3306 username = "root" password = "password" database = ["mydb"] table = ["users", "orders"] startup.mode = "initial" snapshot.split.size = 8096 incremental.snapshot.chunk.size = 1024 snapshot_fetch_size = 1024 snapshot_lock_timeout_sec = 10 server_time_zone = "UTC" } } sink { Kafka { bootstrap.servers = "kafka-host:9092" topic = "mysql_cdc" format = "canal_json" semantic = "EXACTLY_ONCE" } }
Run:
seatunnel.sh -c config/mysql_cdc_kafka.conf -e flink
3. Running Jobs
Local Mode (Single Machine)
```bash
# Using Spark (default)
seatunnel.sh -c config/job.conf -e spark

# Using Flink
seatunnel.sh -c config/job.conf -e flink

# Using Zeta (lightweight)
seatunnel.sh -c config/job.conf -e zeta
```
Cluster Mode
```bash
# Submit to a Spark cluster
seatunnel.sh -c config/job.conf -e spark -m cluster -n hadoop-master:7077

# Submit to a Flink cluster
seatunnel.sh -c config/job.conf -e flink -m remote -s localhost 8081
```
Verbose Output
```bash
seatunnel.sh -c config/job.conf -e spark -l DEBUG
```
Check Status
```bash
# View running jobs (Spark cluster)
spark-submit --status <driver-id>

# View running jobs (Flink cluster)
$FLINK_HOME/bin/flink list
```
4. SQL API (Advanced)
Use SQL for complex transformations:
```hocon
source {
  MySQL {
    # Source config...
  }
}

transform {
  Sql {
    # Multiple SQL statements
    sql = """
      SELECT
        id,
        UPPER(name) as name,
        age + 10 as age_plus_10,
        CURRENT_TIMESTAMP as created_at
      FROM source
      WHERE age > 18
    """
  }
}

sink {
  PostgreSQL {
    # Sink config...
  }
}
```
API Reference
Core Connector Types
Source Connectors
- Jdbc: Generic JDBC databases (MySQL, PostgreSQL, Oracle, SQL Server, DB2, OceanBase)
- Kafka: Apache Kafka topics
- MySQL-CDC: MySQL with CDC support
- MongoDB: MongoDB collections
- PostgreSQL: PostgreSQL with CDC
- S3/OssFile/CosFile/HdfsFile: Object storage / file systems
- Http: HTTP/HTTPS endpoints
- Hive/Iceberg/Hudi/Paimon: Data lake formats
- Elasticsearch/Easysearch: Search engines
- Redis/HBase/Cassandra: NoSQL databases
- Pulsar/RocketMQ/RabbitMQ/ActiveMQ: Message queues
- ClickHouse/StarRocks/Doris/Druid: OLAP engines
- IoTDB/TDengine/InfluxDB: Time-series databases
- Milvus/Qdrant: Vector databases
- Neo4j: Graph databases
- FakeSource: For testing and development
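As a concrete illustration, the Http source above can poll a REST endpoint. A minimal sketch; the url, method, and format option names are assumptions based on typical SeaTunnel connector conventions, so verify them against the connector docs:

```hocon
# Sketch: REST endpoint as a source (option names are assumptions)
source {
  Http {
    url    = "https://api.example.com/users"   # hypothetical endpoint
    method = "GET"
    format = "json"
    schema = {
      fields {
        id   = "bigint"
        name = "string"
      }
    }
  }
}
```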
Transform Connectors
- Sql: Execute SQL transformations
- Dummy: Pass-through (testing)
- FieldMapper: Rename/map columns
- JsonPath: Extract data from JSON
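For instance, FieldMapper can rename and project columns on the way from source to sink. A minimal sketch; the field_mapper option name is an assumption, so confirm it against the transform-v2 docs:

```hocon
# Sketch: rename/project columns with FieldMapper (option name assumed)
transform {
  FieldMapper {
    field_mapper = {
      id        = id        # keep the column as-is
      user_name = name      # hypothetical rename: user_name -> name
    }
  }
}
```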
Sink Connectors
- Jdbc: Write to JDBC-compatible databases
- Kafka: Publish to Kafka topics
- Elasticsearch: Write to Elasticsearch indices
- S3: Write to S3 buckets
- Redis: Write to Redis
- HBase: Write to HBase tables
- StarRocks: Write to StarRocks tables
- ClickHouse/Doris/Druid: Write to OLAP engines
- Hive/Iceberg/Hudi/Paimon: Write to data lakes
- Console: Output to console (testing)
All source connectors above also have corresponding sink implementations where applicable.
Configuration Options
Common Source Options
```hocon
source {
  ConnectorName {
    # Connection
    hostname = "host"
    port = 3306
    username = "user"
    password = "pass"

    # Data selection
    database = "db_name"
    table = "table_name"

    # Performance
    fetch_size = 1000
    connection_check_timeout_sec = 100
    split_size = 10000

    # Schema
    schema = {
      fields {
        id = "bigint"
        name = "string"
      }
    }
  }
}
```
Common Sink Options
```hocon
sink {
  ConnectorName {
    # Connection
    hostname = "host"
    port = 3306
    username = "user"
    password = "pass"

    # Write behavior
    database = "db_name"
    table = "table_name"
    primary_keys = ["id"]
    batch_size = 500

    # Error handling
    max_retries = 3
    retry_wait_time_ms = 1000
    on_duplicate_key_update_column_names = ["field1"]
  }
}
```
Engine Options
```hocon
env {
  # Execution mode
  job.mode = "BATCH"   # or STREAMING

  # Job identity
  job.name = "My Job"

  # Parallelism
  parallelism = 4

  # Checkpoint (streaming)
  checkpoint.interval = 60000

  # Restart strategy
  restart_strategy = "fixed_delay"
  restart_strategy.fixed_delay.attempts = 3
  restart_strategy.fixed_delay.delay = 10000
}
```
Debugging and Monitoring
View Job Metrics
```bash
# During execution, monitor logs
tail -f logs/seatunnel.log

# Check a specific log level
grep ERROR logs/seatunnel.log
```
Enable Debug Logging
```bash
# Set log level to DEBUG
seatunnel.sh -c config/job.conf -e spark -l DEBUG
```
Use Test Sources
```hocon
source {
  FakeSource {
    row.num = 1000
    schema = {
      fields {
        id = "bigint"
        name = "string"
      }
    }
  }
}

sink {
  Console {
    format = "json"
  }
}
```
Configuration
Project Structure
```
seatunnel/
├── bin/                      # Executable scripts
│   ├── seatunnel.sh          # Main entry point
│   └── seatunnel-submit.sh   # Spark submission script
├── config/                   # Configuration examples
│   ├── flink-conf.yaml       # Flink configuration
│   └── spark-conf.yaml       # Spark configuration
├── connectors/               # Pre-built connectors
├── lib/                      # JAR dependencies
├── logs/                     # Runtime logs
└── plugin/                   # Plugin directory
```
Environment Variables
```bash
# Java configuration
export JAVA_HOME=/path/to/java
export JVM_OPTS="-Xms1G -Xmx4G"

# Spark configuration
export SPARK_HOME=/path/to/spark
export SPARK_MASTER=spark://master:7077

# Flink configuration
export FLINK_HOME=/path/to/flink

# SeaTunnel configuration
export SEATUNNEL_HOME=/path/to/seatunnel
```
Key Configuration Files
seatunnel-env.sh
Configure the runtime environment:

```bash
JAVA_HOME=/path/to/java
JVM_OPTS="-Xms1G -Xmx4G -XX:+UseG1GC"
SPARK_HOME=/path/to/spark
FLINK_HOME=/path/to/flink
```
flink-conf.yaml
(for Flink engine)
```yaml
taskmanager.memory.process.size: 2g
taskmanager.memory.jvm-overhead.fraction: 0.1
parallelism.default: 4
```
spark-conf.yaml
(for Spark engine)
```yaml
driver-memory: 2g
executor-memory: 4g
num-executors: 4
```
Performance Tuning
For Batch Jobs
env { job.mode = "BATCH" parallelism = 8 # Increase for larger clusters } source { Jdbc { # Split large tables for parallel reads split_size = 100000 fetch_size = 5000 } } sink { Jdbc { # Batch inserts for better throughput batch_size = 1000 # Use connection pooling max_retries = 3 } }
For Streaming Jobs
env { job.mode = "STREAMING" parallelism = 4 checkpoint.interval = 30000 # 30 seconds } source { Kafka { # Consumer group for parallel reads consumer.group = "seatunnel-consumer" # Batch reading max_poll_records = 500 } }
Website & Documentation System
The official documentation site (https://seatunnel.apache.org) is managed through the apache/seatunnel-website repository, built with Docusaurus 2.4.3.
How Documentation Works
Documentation content lives in the main codebase (the apache/seatunnel repo, `docs/` directory). The website repo fetches docs via `npm run sync` (or `tools/build-docs.js`) before building.
The sync process:
- Creates a `swap/` directory and clones the apache/seatunnel codebase
- Copies `docs/en` to the website's `docs/` directory
- Copies `docs/zh` to `i18n/zh-CN/` for Chinese translations
- Copies `docs/images` to `static/image_en` and `static/image_zh`
- Copies `docs/sidebars.js` to the root level
Website Repository Structure
```
seatunnel-website/
├── blog/                           # Blog posts (EN)
├── community/                      # Community docs
│   ├── contribution_guide/         # Contribution guidelines
│   │   ├── contribute.md
│   │   ├── committer.md
│   │   ├── code-review.md
│   │   ├── release.md
│   │   └── subscribe.md
│   └── submit_guide/               # Submission guidelines
│       ├── document.md
│       ├── license.md
│       └── submit-code.md
├── docs/                           # Current version docs (synced from main repo)
├── docusaurus.config.js            # Site configuration
├── i18n/zh-CN/                     # Chinese translations
│   ├── docusaurus-plugin-content-blog/
│   ├── docusaurus-plugin-content-docs/
│   └── docusaurus-plugin-content-docs-community/
├── package.json
├── sidebars.js                     # Doc sidebar config (synced)
├── src/
│   ├── components/
│   ├── css/
│   ├── pages/
│   │   ├── home/                   # Homepage
│   │   ├── team/                   # Team page (with Base64 avatars)
│   │   ├── user/                   # User showcase
│   │   └── versions/               # Version selector
│   └── styles/
├── static/                         # Static assets
│   ├── doc/image/, image_en/, image_zh/
│   ├── home/, image/, user/
│   └── js/google_translate_init.js
├── tools/
│   ├── build-docs.js               # Doc sync script
│   ├── common.js                   # Shared constants
│   ├── version.js                  # Version management
│   ├── image-copy.js               # Image processing
│   └── fetch-team-avatars.js       # Team avatar updater
├── versioned_docs/                 # Historical version docs (20 versions: 1.x ~ 2.3.12)
├── versioned_sidebars/             # Historical sidebars
├── versions.json                   # Version registry
└── seatunnel_web_versions.json     # Web UI version registry
```
Documentation Categories (per version)
Each version's docs contain:
- `about.md`: Project overview
- `command/`: CLI commands (e.g., connector-check)
- `concept/`: Core concepts (config, schema-evolution, speed-limit, SQL config, event-listener)
- `connector-v2/`: Connector documentation
  - `source/`: Source connector docs (100+ connectors)
  - `sink/`: Sink connector docs
  - `changelog/`: Per-connector changelogs
- `formats/`: Data formats (Avro, Canal JSON, Debezium JSON, OGG JSON, Protobuf)
- `contribution/`: Contributor guides
- `faq.md`: Frequently asked questions
- `other-engine/`: Flink/Spark engine specifics
- `seatunnel-engine/`: Zeta engine docs + telemetry
- `start-v2/`: Getting started guides
  - `docker/`: Docker deployment
  - `kubernetes/`: K8s deployment
  - `locally/`: Local setup
- `transform-v2/`: Transform documentation (SQL, etc.)
Versioned Documentation
20 historical versions maintained:
1.x, 2.1.0 through 2.1.3, 2.2.0-beta, and 2.3.0-beta through 2.3.12
Total: ~3691 versioned doc files + ~1652 i18n files
Website Development
```bash
# Clone
git clone git@github.com:apache/seatunnel-website.git
cd seatunnel-website

# Sync docs from main repo
npm run sync
# Or with SSH:
export PROTOCOL_MODE=ssh && npm run sync

# Install dependencies
npm install

# Dev server (English)
npm run start

# Dev server (Chinese)
npm run start-zh

# Production build (needs ~10GB heap)
npm run build

# Or faster parallel build
npm run build:fast

# Serve built site
npm run serve
```
Adding a New Version
```bash
# 1. Create versioned snapshot
npm run version <target_version>

# 2. Update download links
#    Edit src/pages/download/st_data.json
```
Branching Strategy
| Branch | Purpose |
|---|---|
| `main` | Default development branch |
| `asf-site` | Production (https://seatunnel.apache.org) |
| `asf-staging` | Staging (https://seatunnel.staged.apache.org) |
CI/CD
GitHub Actions workflow (`.github/workflows/deploy.yml`):
- Triggers: push to `main`, PRs to `main`, daily cron (5:00 AM UTC)
- Node.js 18.20.7
- Steps: `npm install` -> `npm run sync` -> `npm run build` -> deploy to the `asf-site` branch
Key Conventions
- Directory names: lowercase, underscore-separated, plural (e.g., `scripts`, `components`)
- JS/static files: lowercase, dash-separated (e.g., `render-dom.js`)
- Images: stored under `static/{module_name}`
- Styles: placed in `src/css/`
- Most pages have an "Edit this page" link pointing to the GitHub source
Development
Project Architecture
SeaTunnel follows a modular architecture:
```
seatunnel/
├── seatunnel-api/               # Core APIs
├── seatunnel-core/              # Execution engine
├── seatunnel-engines/           # Engine implementations
│   ├── seatunnel-engine-flink/
│   ├── seatunnel-engine-spark/
│   └── seatunnel-engine-zeta/
├── seatunnel-connectors/        # 100+ connector implementations
│   └── seatunnel-connectors-*/  # One per connector type
└── seatunnel-dist/              # Distribution package
```
Building from Source
Full Build
```bash
# Clone repository
git clone https://github.com/apache/seatunnel.git
cd seatunnel

# Build all modules
mvn clean install -DskipTests

# Build with tests (slower)
mvn clean install
```
Build Specific Module
```bash
# Build only the Kafka connector
mvn clean install -pl seatunnel-connectors/seatunnel-connectors-seatunnel-kafka -DskipTests

# Build only the Flink engine
mvn clean install -pl seatunnel-engines/seatunnel-engine-flink -DskipTests
```
Build Distribution
```bash
# Create binary distribution
mvn clean install -DskipTests
cd seatunnel-dist

# List the contents of the resulting archive
tar -tzf target/apache-seatunnel-*-bin.tar.gz | head
```
Running Tests
Unit Tests
```bash
# Run all tests
mvn test

# Run a specific test class
mvn test -Dtest=MySqlConnectorTest

# Run with coverage
mvn test jacoco:report
```
Integration Tests
```bash
# Run integration tests
mvn verify

# Run with Docker containers
mvn verify -Pintegration-tests
```
Development Setup
IDE Configuration (IntelliJ IDEA)
- Import Project
  - File → Open → Select the seatunnel directory
  - Choose Maven as the build system
- Configure JDK
  - Project Settings → Project → JDK
  - Select JDK 1.8 or higher
- Enable Annotation Processing
  - Project Settings → Build → Compiler → Annotation Processors
  - Enable annotation processing
- Run Configuration
  - Run → Edit Configurations
  - Add a new "Application" configuration
  - Set main class: org.apache.seatunnel.core.starter.command.CommandExecuteRunner
Code Style
SeaTunnel follows Apache project conventions:
- 4-space indentation (not tabs)
- Line length max 120 characters
- Standard Java naming conventions
- Organize imports alphabetically
Use the provided .editorconfig:

```bash
# Install the EditorConfig plugin (bundled with IntelliJ)
# Your IDE will then apply the formatting rules automatically
```
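If your checkout lacks the file, an illustrative .editorconfig matching the conventions above would look like this (a sketch, not the project's actual file):

```ini
# Illustrative only: reflects the 4-space / 120-column conventions above
root = true

[*.java]
charset = utf-8
indent_style = space
indent_size = 4
max_line_length = 120
insert_final_newline = true
```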
Creating Custom Connectors
1. Implement the SeaTunnelSource Interface

```java
import org.apache.seatunnel.api.source.SeaTunnelSource;

// Simplified sketch: the real SeaTunnelSource interface is generic and also
// requires reader/enumerator factory methods; see the existing connector
// modules for complete examples.
public class CustomSource implements SeaTunnelSource {

    @Override
    public String getPluginName() {
        // Name used to reference this connector in job configs
        return "Custom";
    }

    // ... implement the remaining SeaTunnelSource methods:
    // getBoundedness(), createReader(), createEnumerator(), etc.
}
```
2. Create Configuration Class
```java
import org.apache.seatunnel.api.configuration.Option;
import org.apache.seatunnel.api.configuration.Options;

public class CustomSourceOptions {

    public static final Option<String> HOST = Options.key("host")
            .stringType()
            .noDefaultValue()
            .withDescription("Source hostname");

    public static final Option<Integer> PORT = Options.key("port")
            .intType()
            .defaultValue(9200)
            .withDescription("Source port");
}
```
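The option keys declared above map one-to-one onto HOCON keys in a job file, nested under the plugin name returned by getPluginName():

```hocon
# How CustomSourceOptions surface in a job config
source {
  Custom {               # matches getPluginName()
    host = "my-host"     # required: declared with noDefaultValue()
    port = 9200          # optional: falls back to defaultValue(9200)
  }
}
```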
3. Register Connector
Create a service registration file so the connector is discoverable via Java's ServiceLoader: META-INF/services/org.apache.seatunnel.api.source.SeaTunnelSource
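The file's content is just the fully qualified class name of your implementation, one per line (the package below is hypothetical):

```
# META-INF/services/org.apache.seatunnel.api.source.SeaTunnelSource
com.example.seatunnel.CustomSource
```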
Contributing Guide
1. Fork and Clone

```bash
git clone https://github.com/YOUR_USERNAME/seatunnel.git
cd seatunnel
git remote add upstream https://github.com/apache/seatunnel.git
```

2. Create a Feature Branch

```bash
git checkout -b feature/my-feature
```

3. Make Changes
- Follow the code style guide
- Add tests for new features
- Update documentation

4. Commit and Push

```bash
git add .
git commit -m "feat: add new feature"
git push origin feature/my-feature
```

5. Create a Pull Request
- Go to the GitHub repository
- Create a PR with a clear description
- Link any related issues
- Wait for review

6. Code Review Process
- Address feedback from maintainers
- Update the PR with changes
- After approval, maintainers will merge
Troubleshooting
Common Issues and Solutions
Issue 1: "ClassNotFoundException: com.mysql.jdbc.Driver"
Cause: JDBC driver JAR not in classpath
Solution:
```bash
# 1. Download the MySQL JDBC driver
wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-8.0.33.jar

# 2. Copy it to the lib directory
cp mysql-connector-java-8.0.33.jar $SEATUNNEL_HOME/lib/

# 3. Restart the job
seatunnel.sh -c config/job.conf -e spark
```
Issue 2: "OutOfMemoryError: Java heap space"
Cause: Insufficient JVM heap memory
Solution:
```bash
# Increase JVM memory
export JVM_OPTS="-Xms2G -Xmx8G"

# Or set it in seatunnel-env.sh
echo 'JVM_OPTS="-Xms2G -Xmx8G"' >> $SEATUNNEL_HOME/bin/seatunnel-env.sh
```
Issue 3: "Connection refused: connect"
Cause: Source/sink service not reachable
Solution:
```bash
# 1. Verify connectivity
ping source-host
telnet source-host 3306

# 2. Check credentials
mysql -h source-host -u root -p

# 3. Check firewall rules
#    Ensure port 3306 is open
```
Issue 4: "Table not found" during CDC
Cause: Binlog not enabled on MySQL
Solution:
```sql
-- Check binlog status
SHOW VARIABLES LIKE 'log_bin';
```

Enable binlog in my.cnf, then restart MySQL:

```ini
[mysqld]
log_bin = mysql-bin
binlog_format = row   # row format is required for CDC
```

Verify after the restart:

```sql
SHOW MASTER STATUS;
```
Issue 5: Slow Job Performance
Cause: Suboptimal configuration
Solutions:
```hocon
# 1. Increase parallelism
env {
  parallelism = 8   # or higher, based on cluster size
}

# 2. Increase batch sizes
source {
  Jdbc {
    fetch_size = 5000
    split_size = 100000
  }
}

sink {
  Jdbc {
    batch_size = 2000
  }
}

# 3. Enable connection pooling
source {
  Jdbc {
    pool_size = 10
    max_idle_time = 300
  }
}
```
Issue 6: "Kafka topic offset out of range"
Cause: Offset doesn't exist in topic
Solution:
```hocon
source {
  Kafka {
    # Reset to earliest or latest
    auto.offset.reset = "earliest"   # or "latest"

    # Or specify an explicit start mode
    start_mode = "earliest"
  }
}
```
FAQ
Q: What's the difference between BATCH and STREAMING mode?
A:
- BATCH: One-time execution, suitable for full database migration
- STREAMING: Continuous execution, suitable for real-time data sync and CDC
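In configuration terms, the difference is a single env setting; a minimal sketch of each:

```hocon
# Batch: runs once over a bounded dataset, then exits
env { job.mode = "BATCH" }

# Streaming: runs continuously; pair with checkpointing for recovery
env {
  job.mode = "STREAMING"
  checkpoint.interval = 60000
}
```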
Q: How do I handle schema changes during CDC?
A: SeaTunnel automatically detects schema changes in CDC mode. Configure in source:
```hocon
source {
  MySQL-CDC {
    schema_change_mode = "auto"   # auto-detect and apply
  }
}
```
Q: Can I transform data during synchronization?
A: Yes, use SQL transform:
transform { Sql { sql = "SELECT id, UPPER(name) as name FROM source" } }
Q: What's the maximum throughput?
A: Depends on:
- Hardware (CPU, RAM, Network)
- Source/sink database configuration
- Data size per record
- Network latency
Typical throughput: 100K - 1M records/second per executor
Q: How do I handle errors in production?
A: Configure restart strategy:
env { restart_strategy = "exponential_delay" restart_strategy.exponential_delay.initial_delay = 1000 restart_strategy.exponential_delay.max_delay = 30000 restart_strategy.exponential_delay.multiplier = 2.0 restart_strategy.exponential_delay.attempts_unlimited = true }
Q: Is there a web UI for job management?
A: Yes! Use SeaTunnel Web Project:
```bash
# Check out the web UI project
git clone https://github.com/apache/seatunnel-web.git
cd seatunnel-web
mvn clean install

# Run the web UI
java -jar target/seatunnel-web-*.jar

# Access at http://localhost:8080
```
Resources
Official Links
- SeaTunnel Official Website
- GitHub (Engine)
- GitHub (Website)
- GitHub (Web UI)
- Documentation Hub
- Connector List
- Downloads
Community
Related Projects
Learning Resources
- HOCON Configuration Guide
- SQL Functions Reference
- CDC Pattern Explained
- Distributed Systems Concepts
Version History
- 2.3.12 (Latest Stable) - Current recommended version
- 2.3.13-SNAPSHOT (Development)
- 20 historical versions maintained in documentation (1.x ~ 2.3.12)
- All Releases
Additional Notes
License
Apache License 2.0 - See LICENSE file
Security
- Report security issues via Apache Security
- Do NOT create public issues for security vulnerabilities
Support & Contribution
- Join the community Slack for support
- Submit feature requests on GitHub Issues
- Contribute code via Pull Requests
- Follow Contributing Guide
Last Updated: 2026-02-26
Skill Version: 2.3.13
Sources: apache/seatunnel + apache/seatunnel-website