Awesome-omni-skill seatunnel-skill

Apache SeaTunnel - A multimodal, high-performance, distributed data integration tool for massive data synchronization across 100+ connectors

install
source · Clone the upstream repo
git clone https://github.com/diegosouzapw/awesome-omni-skill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/development/seatunnel-skill" ~/.claude/skills/diegosouzapw-awesome-omni-skill-seatunnel-skill && rm -rf "$T"
manifest: skills/development/seatunnel-skill/SKILL.md
source content

Apache SeaTunnel OpenCode Skill

Apache SeaTunnel is a multimodal, high-performance, distributed data integration tool capable of synchronizing vast amounts of data daily. It connects hundreds of evolving data sources with support for real-time, CDC (Change Data Capture), and full database synchronization.

Quick Start

Prerequisites

  • Java: JDK 8 or higher
  • Maven: 3.6.0 or higher (for building from source)
  • Python: 3.7+ (optional, for Python API)
  • Git: For cloning the repository

Installation

1. Clone Repository

git clone https://github.com/apache/seatunnel.git
cd seatunnel

2. Build from Source

# Build entire project
mvn clean install -DskipTests

# Build specific module
mvn clean install -pl seatunnel-core -DskipTests

3. Download Pre-built Binary (Recommended)

Visit the official download page and select your version:

VERSION=2.3.12
wget https://archive.apache.org/dist/seatunnel/${VERSION}/apache-seatunnel-${VERSION}-bin.tar.gz
tar -xzf apache-seatunnel-${VERSION}-bin.tar.gz
cd apache-seatunnel-${VERSION}

4. Basic Configuration

# Set JAVA_HOME
export JAVA_HOME=/path/to/java

# Add to PATH
export PATH=$PATH:/path/to/seatunnel/bin

5. Verify Installation

seatunnel.sh --version

Hello World Example

Create a simple data integration job:

config/hello_world.conf

env {
  job.mode = "BATCH"
  job.name = "Hello World"
}

source {
  FakeSource {
    row.num = 100
    schema = {
      fields {
        id = "bigint"
        name = "string"
        age = "int"
      }
    }
  }
}

sink {
  Console {
    format = "json"
  }
}

Run the job:

seatunnel.sh -c config/hello_world.conf -e spark
# or with flink
seatunnel.sh -c config/hello_world.conf -e flink

Overview

Purpose and Target Users

SeaTunnel is designed for:

  • Data Engineers: Building large-scale data pipelines with minimal complexity
  • DevOps Teams: Managing data integration infrastructure
  • Enterprise Platforms: Handling 100+ billion data records daily
  • Real-time Analytics: Supporting streaming data synchronization
  • Legacy System Migration: CDC-based incremental sync from transactional databases

Core Capabilities

  1. Multimodal Support

    • Structured data (databases, data warehouses)
    • Unstructured data (video, images, binaries)
    • Semi-structured data (JSON, logs, binlog streams)
  2. Multiple Synchronization Methods (see the sketch after this list)

    • Batch: Full historical data transfer
    • Streaming: Real-time data pipeline
    • CDC: Incremental capture from databases
    • Full + Incremental: Combined approach
  3. 100+ Pre-built Connectors

    • Databases: MySQL, PostgreSQL, Oracle, SQL Server, MongoDB, DB2, OceanBase
    • Data Warehouses: Snowflake, BigQuery, Redshift, Iceberg
    • Data Lakes: Hive, Iceberg, Hudi, Paimon
    • Cloud SaaS: Salesforce, Shopify, Google Sheets
    • Message Queues: Kafka, RabbitMQ, Pulsar, RocketMQ, ActiveMQ
    • Search Engines: Elasticsearch, OpenSearch, Easysearch
    • OLAP Engines: ClickHouse, StarRocks, Doris, Druid
    • Time-series Databases: IoTDB, TDengine, InfluxDB
    • Vector Databases: Milvus, Qdrant
    • Graph Databases: Neo4j
    • Object Storage: S3, GCS, HDFS, OssFile, CosFile
  4. Multi-Engine Support

    • Zeta Engine: Lightweight, standalone deployment (no Spark/Flink required)
    • Apache Flink: Distributed streaming engine
    • Apache Spark: Distributed batch and micro-batch processing
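
The synchronization methods above boil down to a couple of config options. A minimal sketch (job.mode as in the examples below; startup.mode values per the MySQL-CDC connector, so treat exact names as version-dependent):

env {
  job.mode = "STREAMING"   # BATCH = one-time full transfer; STREAMING = continuous
}

source {
  MySQL-CDC {
    # "initial" = full snapshot first, then incremental changes (Full + Incremental)
    # "latest"  = incremental only, from the current binlog position
    startup.mode = "initial"
  }
}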

Features

High-Performance

  • Distributed Snapshot Algorithm: Ensures data consistency without locks
  • JDBC Multiplexing: Minimizes database connections for real-time sync
  • Log Parsing: Efficient CDC implementation with binary log analysis
  • Resource Optimization: Reduces computing resources and I/O overhead

Data Quality & Reliability

  • Real-time Monitoring: Track synchronization progress and data metrics
  • Data Loss Prevention: Transactional guarantees (exactly-once semantics; sketched below)
  • Deduplication: Prevents duplicate records during reprocessing
  • Error Handling: Graceful failure recovery and retry logic
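
How these guarantees surface in a job config, as a minimal sketch reusing options shown later in this guide (checkpoint.interval, semantic):

env {
  job.mode = "STREAMING"
  checkpoint.interval = 60000   # periodic snapshots enable loss-free recovery
}

sink {
  Kafka {
    semantic = "EXACTLY_ONCE"   # transactional writes; no duplicates on replay
  }
}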

Developer-Friendly

  • SQL-like Configuration: Intuitive HOCON job definition syntax
  • Visual Web UI: Drag-and-drop job builder (SeaTunnel Web Project)
  • Extensive Documentation: Comprehensive guides with i18n (English + Chinese)
  • Community Support: Active community via Slack and mailing lists

Production Ready

  • Proven at Scale: Used in enterprises processing billions of records daily
  • Version Stability: Regular releases with backward compatibility
  • Enterprise Features: Multi-tenancy, RBAC, audit logging
  • Cloud Native: Kubernetes-ready deployment

Installation

Installation Methods

Method 1: Binary Download (Recommended for Quick Start)

# 1. Download binary
VERSION=2.3.12
wget https://archive.apache.org/dist/seatunnel/${VERSION}/apache-seatunnel-${VERSION}-bin.tar.gz

# 2. Extract
tar -xzf apache-seatunnel-${VERSION}-bin.tar.gz
cd apache-seatunnel-${VERSION}

# 3. Verify
./bin/seatunnel.sh --version

Method 2: Build from Source

# 1. Clone repository
git clone https://github.com/apache/seatunnel.git
cd seatunnel

# 2. Build with Maven
mvn clean install -DskipTests

# 3. Navigate to distribution
cd seatunnel-dist/target/apache-seatunnel-*-bin/apache-seatunnel-*

# 4. Verify
./bin/seatunnel.sh --version

Method 3: Docker

# Pull official Docker image
docker pull apache/seatunnel:latest

# Run container
docker run -it apache/seatunnel:latest /bin/bash

# Or run a job directly
docker run -v /path/to/config:/config \
  apache/seatunnel:latest \
  seatunnel.sh -c /config/job.conf -e spark

Environment Setup

Set Java Home

# Bash/Zsh
export JAVA_HOME=/path/to/java
export PATH=$JAVA_HOME/bin:$PATH

# Verify
java -version

Configure for Spark Engine

# Set Spark Home (if using Spark engine)
export SPARK_HOME=/path/to/spark

# Verify Spark installation
$SPARK_HOME/bin/spark-submit --version

Configure for Flink Engine

# Set Flink Home (if using Flink engine)
export FLINK_HOME=/path/to/flink

# Verify Flink installation
$FLINK_HOME/bin/flink --version

System Requirements

| Requirement | Version/Spec |
|-------------|--------------|
| Java        | JDK 1.8+ |
| Memory      | 2GB+ (minimum), 8GB+ (recommended) |
| Disk        | 500MB (binary) + job storage |
| Network     | Connectivity to source/sink systems |
| Scala       | 2.12.15 (for Spark/Flink integration) |

Usage

1. Job Configuration (HOCON Format)

SeaTunnel uses HOCON (Human-Optimized Config Object Notation) for job configuration.

Basic Structure:

env {
  job.mode = "BATCH"  # or STREAMING
  job.name = "My Job"
  parallelism = 4
}

source {
  SourceConnector {
    option1 = value1
    option2 = value2
    schema = {
      fields {
        column1 = "type"
        column2 = "type"
      }
    }
  }
}

# Optional: Transform data
transform {
  TransformName {
    option = value
  }
}

sink {
  SinkConnector {
    option1 = value1
    option2 = value2
  }
}

2. Common Use Cases

Use Case 1: MySQL to PostgreSQL (Batch)

config/mysql_to_postgres.conf

env {
  job.mode = "BATCH"
  job.name = "MySQL to PostgreSQL"
}

source {
  Jdbc {
    driver = "com.mysql.cj.jdbc.Driver"
    url = "jdbc:mysql://mysql-host:3306/mydb"
    user = "root"
    password = "password"
    query = "SELECT * FROM users"
    connection_check_timeout_sec = 100
  }
}

sink {
  Jdbc {
    driver = "org.postgresql.Driver"
    url = "jdbc:postgresql://pg-host:5432/mydb"
    user = "postgres"
    password = "password"
    database = "mydb"
    table = "users"
    primary_keys = ["id"]
    connection_check_timeout_sec = 100
  }
}

Run:

seatunnel.sh -c config/mysql_to_postgres.conf -e spark

Use Case 2: Kafka Streaming to Elasticsearch

config/kafka_to_es.conf

env {
  job.mode = "STREAMING"
  job.name = "Kafka to Elasticsearch"
  parallelism = 2
}

source {
  Kafka {
    bootstrap.servers = "kafka-host:9092"
    topic = "events"
    patterns = "event.*"
    consumer.group = "seatunnel-group"
    format = "json"
    schema = {
      fields {
        event_id = "bigint"
        event_name = "string"
        timestamp = "bigint"
        payload = "string"
      }
    }
  }
}

transform {
  Sql {
    sql = "SELECT event_id, event_name, FROM_UNIXTIME(timestamp/1000) as ts, payload FROM source"
  }
}

sink {
  Elasticsearch {
    hosts = ["es-host:9200"]
    index = "events"
    index_type = "_doc"
    username = "elastic"
    password = "password"
  }
}

Run:

seatunnel.sh -c config/kafka_to_es.conf -e flink

Use Case 3: CDC from MySQL to Kafka

config/mysql_cdc_kafka.conf

env {
  job.mode = "STREAMING"
  job.name = "MySQL CDC to Kafka"
}

source {
  MySQL-CDC {
    base-url = "jdbc:mysql://mysql-host:3306/mydb"
    username = "root"
    password = "password"
    database-names = ["mydb"]
    table-names = ["mydb.users", "mydb.orders"]
    startup.mode = "initial"
    server-id = 5400
    snapshot.split.size = 8096
    snapshot.fetch.size = 1024
    server-time-zone = "UTC"
  }
}

sink {
  Kafka {
    bootstrap.servers = "kafka-host:9092"
    topic = "mysql_cdc"
    format = "canal_json"
    semantic = "EXACTLY_ONCE"
  }
}

Run:

seatunnel.sh -c config/mysql_cdc_kafka.conf -e flink

3. Running Jobs

Local Mode (Single Machine)

# Using Spark
seatunnel.sh -c config/job.conf -e spark

# Using Flink
seatunnel.sh -c config/job.conf -e flink

# Using Zeta (lightweight, the default engine in recent releases)
seatunnel.sh -c config/job.conf -e zeta

Cluster Mode

# Submit to Spark cluster
seatunnel.sh -c config/job.conf -e spark -m cluster -n hadoop-master:7077

# Submit to Flink cluster
seatunnel.sh -c config/job.conf -e flink -m remote -s localhost:8081

Verbose Output

seatunnel.sh -c config/job.conf -e spark -l DEBUG

Check Status

# View running jobs (Spark Cluster)
spark-submit --status <driver-id>

# View running jobs (Flink Cluster)
$FLINK_HOME/bin/flink list

4. SQL API (Advanced)

Use SQL for complex transformations:

source {
  Jdbc {  # e.g. a MySQL source
    # Source config...
  }
}

transform {
  Sql {
    # Multi-line SQL via a HOCON triple-quoted string
    sql = """
      SELECT
        id,
        UPPER(name) as name,
        age + 10 as age_plus_10,
        CURRENT_TIMESTAMP as created_at
      FROM source
      WHERE age > 18
    """
  }
}

sink {
  Jdbc {  # e.g. a PostgreSQL sink
    # Sink config...
  }
}
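
When a job contains several plugins, they are chained by table name using result_table_name / source_table_name (option names vary across 2.3.x releases; check the config concept docs for your version). A minimal self-contained sketch with FakeSource and Console:

env {
  job.mode = "BATCH"
}

source {
  FakeSource {
    result_table_name = "raw"
    row.num = 10
    schema = { fields { id = "bigint", name = "string" } }
  }
}

transform {
  Sql {
    source_table_name = "raw"
    result_table_name = "clean"
    sql = "SELECT id, UPPER(name) AS name FROM raw"
  }
}

sink {
  Console {
    source_table_name = "clean"
  }
}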

API Reference

Core Connector Types

Source Connectors

  • Jdbc: Generic JDBC databases (MySQL, PostgreSQL, Oracle, SQL Server, DB2, OceanBase)
  • Kafka: Apache Kafka topics
  • MySQL-CDC: MySQL change data capture
  • MongoDB: MongoDB collections
  • Postgres-CDC: PostgreSQL change data capture
  • S3/OssFile/CosFile/HdfsFile: Object storage / file systems
  • Http: HTTP/HTTPS endpoints
  • Hive/Iceberg/Hudi/Paimon: Data lake formats
  • Elasticsearch/Easysearch: Search engines
  • Redis/HBase/Cassandra: NoSQL databases
  • Pulsar/RocketMQ/RabbitMQ/ActiveMQ: Message queues
  • ClickHouse/StarRocks/Doris/Druid: OLAP engines
  • IoTDB/TDengine/InfluxDB: Time-series databases
  • Milvus/Qdrant: Vector databases
  • Neo4j: Graph databases
  • FakeSource: For testing and development

Transform Connectors

  • Sql: Execute SQL transformations
  • Dummy: Pass-through (testing)
  • FieldMapper: Rename/map columns (see the sketch after this list)
  • JsonPath: Extract data from JSON
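
A minimal FieldMapper sketch (the field_mapper option per the transform-v2 docs; table names here are placeholders):

transform {
  FieldMapper {
    source_table_name = "source"
    result_table_name = "mapped"
    field_mapper = {
      id = id             # keep column as-is
      name = user_name    # rename name -> user_name
    }
  }
}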

Sink Connectors

  • Jdbc: Write to JDBC-compatible databases
  • Kafka: Publish to Kafka topics
  • Elasticsearch: Write to Elasticsearch indices
  • S3: Write to S3 buckets
  • Redis: Write to Redis
  • HBase: Write to HBase tables
  • StarRocks: Write to StarRocks tables
  • ClickHouse/Doris/Druid: Write to OLAP engines
  • Hive/Iceberg/Hudi/Paimon: Write to data lakes
  • Console: Output to console (testing)

All source connectors above also have corresponding sink implementations where applicable.

Configuration Options

Common Source Options

source {
  ConnectorName {
    # Connection
    hostname = "host"
    port = 3306
    username = "user"
    password = "pass"

    # Data selection
    database = "db_name"
    table = "table_name"

    # Performance
    fetch_size = 1000
    connection_check_timeout_sec = 100
    split_size = 10000

    # Schema
    schema = {
      fields {
        id = "bigint"
        name = "string"
      }
    }
  }
}

Common Sink Options

sink {
  ConnectorName {
    # Connection
    hostname = "host"
    port = 3306
    username = "user"
    password = "pass"

    # Write behavior
    database = "db_name"
    table = "table_name"
    primary_keys = ["id"]
    batch_size = 500

    # Error handling
    max_retries = 3
    retry_wait_time_ms = 1000
    on_duplicate_key_update_column_names = ["field1"]
  }
}

Engine Options

env {
  # Execution mode
  job.mode = "BATCH"  # or STREAMING

  # Job identity
  job.name = "My Job"

  # Parallelism
  parallelism = 4

  # Checkpoint (streaming)
  checkpoint.interval = 60000

  # Restart strategy
  restart_strategy = "fixed_delay"
  restart_strategy.fixed_delay.attempts = 3
  restart_strategy.fixed_delay.delay = 10000
}

Debugging and Monitoring

View Job Metrics

# During execution, monitor logs
tail -f logs/seatunnel.log

# Check specific log level
grep ERROR logs/seatunnel.log

Enable Debug Logging

# Set log level to DEBUG
seatunnel.sh -c config/job.conf -e spark -l DEBUG

Use Test Sources

source {
  FakeSource {
    row.num = 1000
    schema = {
      fields {
        id = "bigint"
        name = "string"
      }
    }
  }
}

sink {
  Console {
    format = "json"
  }
}

Configuration

Project Structure

seatunnel/
├── bin/                          # Executable scripts
│   ├── seatunnel.sh             # Main entry point
│   └── seatunnel-submit.sh      # Spark submission script
├── config/                       # Configuration examples
│   ├── flink-conf.yaml          # Flink configuration
│   └── spark-conf.yaml          # Spark configuration
├── connectors/                  # Pre-built connectors
├── lib/                         # JAR dependencies
├── logs/                        # Runtime logs
└── plugin/                      # Plugin directory

Environment Variables

# Java configuration
export JAVA_HOME=/path/to/java
export JVM_OPTS="-Xms1G -Xmx4G"

# Spark configuration
export SPARK_HOME=/path/to/spark
export SPARK_MASTER=spark://master:7077

# Flink configuration
export FLINK_HOME=/path/to/flink

# SeaTunnel configuration
export SEATUNNEL_HOME=/path/to/seatunnel

Key Configuration Files

seatunnel-env.sh

Configure runtime environment:

JAVA_HOME=/path/to/java
JVM_OPTS="-Xms1G -Xmx4G -XX:+UseG1GC"
SPARK_HOME=/path/to/spark
FLINK_HOME=/path/to/flink

flink-conf.yaml
(for Flink engine)

taskmanager.memory.process.size: 2g
taskmanager.memory.jvm-overhead.fraction: 0.1
parallelism.default: 4

spark-conf.yaml
(for Spark engine)

driver-memory: 2g
executor-memory: 4g
num-executors: 4

Performance Tuning

For Batch Jobs

env {
  job.mode = "BATCH"
  parallelism = 8  # Increase for larger clusters
}

source {
  Jdbc {
    # Split large tables for parallel reads
    split_size = 100000
    fetch_size = 5000
  }
}

sink {
  Jdbc {
    # Batch inserts for better throughput
    batch_size = 1000
    # Use connection pooling
    max_retries = 3
  }
}

For Streaming Jobs

env {
  job.mode = "STREAMING"
  parallelism = 4
  checkpoint.interval = 30000  # 30 seconds
}

source {
  Kafka {
    # Consumer group for parallel reads
    consumer.group = "seatunnel-consumer"
    # Batch reading
    max_poll_records = 500
  }
}

Website & Documentation System

The official documentation site (https://seatunnel.apache.org) is managed through the apache/seatunnel-website repository, built with Docusaurus 2.4.3.

How Documentation Works

Documentation content lives in the main codebase (the apache/seatunnel repo, docs/ directory). The website repo fetches docs via npm run sync (or tools/build-docs.js) before building.

The sync process:

  1. Creates a swap/ directory and clones the apache/seatunnel codebase
  2. Copies docs/en to the website's docs/ directory
  3. Copies docs/zh to i18n/zh-CN/ for Chinese translations
  4. Copies docs/images to static/image_en and static/image_zh
  5. Copies docs/sidebars.js to the root level
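
Roughly, those steps amount to the following shell commands (illustrative only; the real logic lives in tools/build-docs.js):

git clone https://github.com/apache/seatunnel.git swap/seatunnel
cp -r swap/seatunnel/docs/en/* docs/
cp -r swap/seatunnel/docs/zh/* i18n/zh-CN/
cp -r swap/seatunnel/docs/images static/image_en
cp -r swap/seatunnel/docs/images static/image_zh
cp swap/seatunnel/docs/sidebars.js sidebars.js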

Website Repository Structure

seatunnel-website/
├── blog/                          # Blog posts (EN)
├── community/                     # Community docs
│   ├── contribution_guide/        # Contribution guidelines
│   │   ├── contribute.md
│   │   ├── committer.md
│   │   ├── code-review.md
│   │   ├── release.md
│   │   └── subscribe.md
│   └── submit_guide/             # Submission guidelines
│       ├── document.md
│       ├── license.md
│       └── submit-code.md
├── docs/                          # Current version docs (synced from main repo)
├── docusaurus.config.js           # Site configuration
├── i18n/zh-CN/                    # Chinese translations
│   ├── docusaurus-plugin-content-blog/
│   ├── docusaurus-plugin-content-docs/
│   └── docusaurus-plugin-content-docs-community/
├── package.json
├── sidebars.js                    # Doc sidebar config (synced)
├── src/
│   ├── components/
│   ├── css/
│   ├── pages/
│   │   ├── home/                  # Homepage
│   │   ├── team/                  # Team page (with Base64 avatars)
│   │   ├── user/                  # User showcase
│   │   └── versions/             # Version selector
│   └── styles/
├── static/                        # Static assets
│   ├── doc/image/, image_en/, image_zh/
│   ├── home/, image/, user/
│   └── js/google_translate_init.js
├── tools/
│   ├── build-docs.js              # Doc sync script
│   ├── common.js                  # Shared constants
│   ├── version.js                 # Version management
│   ├── image-copy.js              # Image processing
│   └── fetch-team-avatars.js      # Team avatar updater
├── versioned_docs/                # Historical version docs (20 versions: 1.x ~ 2.3.12)
├── versioned_sidebars/            # Historical sidebars
├── versions.json                  # Version registry
└── seatunnel_web_versions.json    # Web UI version registry

Documentation Categories (per version)

Each version's docs contain:

  • about.md - Project overview
  • command/ - CLI commands (e.g., connector-check)
  • concept/ - Core concepts (config, schema-evolution, speed-limit, SQL config, event-listener)
  • connector-v2/ - Connector documentation
    • source/ - Source connector docs (100+ connectors)
    • sink/ - Sink connector docs
    • changelog/ - Per-connector changelogs
    • formats/ - Data formats (Avro, Canal JSON, Debezium JSON, OGG JSON, Protobuf)
  • contribution/ - Contributor guides
  • faq.md - Frequently asked questions
  • other-engine/ - Flink/Spark engine specifics
  • seatunnel-engine/ - Zeta engine docs + telemetry
  • start-v2/ - Getting started guides
    • docker/ - Docker deployment
    • kubernetes/ - K8s deployment
    • locally/ - Local setup
  • transform-v2/ - Transform documentation (SQL, etc.)

Versioned Documentation

20 historical versions maintained: 1.x, 2.1.0 - 2.1.3, 2.2.0-beta, 2.3.0-beta - 2.3.12

Total: ~3691 versioned doc files + ~1652 i18n files

Website Development

# Clone
git clone git@github.com:apache/seatunnel-website.git
cd seatunnel-website

# Sync docs from main repo
npm run sync
# Or with SSH: export PROTOCOL_MODE=ssh && npm run sync

# Install dependencies
npm install

# Dev server (English)
npm run start

# Dev server (Chinese)
npm run start-zh

# Production build (needs ~10GB heap)
npm run build
# Or faster parallel build
npm run build:fast

# Serve built site
npm run serve

Adding a New Version

# 1. Create versioned snapshot
npm run version <target_version>

# 2. Update download links
# Edit src/pages/download/st_data.json

Branching Strategy

| Branch      | Purpose |
|-------------|---------|
| main        | Default development branch |
| asf-site    | Production (https://seatunnel.apache.org) |
| asf-staging | Staging (https://seatunnel.staged.apache.org) |

CI/CD

GitHub Actions workflow (.github/workflows/deploy.yml):

  • Triggers: push to main, PRs to main, daily cron (5:00 AM UTC)
  • Node.js 18.20.7
  • Steps: npm install -> npm run sync -> npm run build -> deploy to the asf-site branch

Key Conventions

  • Directory names: lowercase, underscore-separated, plural (e.g., scripts, components)
  • JS/static files: lowercase, dash-separated (e.g., render-dom.js)
  • Images: stored under static/{module_name}
  • Styles: placed in src/css/
  • Most pages have an "Edit this page" link pointing to the GitHub source

Development

Project Architecture

SeaTunnel follows a modular architecture:

seatunnel/
├── seatunnel-api/              # Core connector & table APIs
├── seatunnel-common/           # Shared utilities
├── seatunnel-core/             # CLI starters for Zeta/Spark/Flink
├── seatunnel-engine/           # Zeta engine implementation
├── seatunnel-translation/      # Spark/Flink translation layers
├── seatunnel-connectors-v2/    # 100+ connector implementations
│   └── connector-*/            # One module per connector
├── seatunnel-transforms-v2/    # Transform plugins
└── seatunnel-dist/             # Distribution package

Building from Source

Full Build

# Clone repository
git clone https://github.com/apache/seatunnel.git
cd seatunnel

# Build all modules
mvn clean install -DskipTests

# Build with tests (slower)
mvn clean install

Build Specific Module

# Build only the Kafka connector (module paths may vary by version)
mvn clean install -pl seatunnel-connectors-v2/connector-kafka -am -DskipTests

# Build only the Zeta engine
mvn clean install -pl seatunnel-engine -am -DskipTests

Build Distribution

# Create binary distribution
mvn clean install -DskipTests
cd seatunnel-dist
tar -tzf target/apache-seatunnel-*-bin.tar.gz | head

Running Tests

Unit Tests

# Run all tests
mvn test

# Run specific test class
mvn test -Dtest=MySqlConnectorTest

# Run with coverage
mvn test jacoco:report

Integration Tests

# Run integration tests
mvn verify

# Run with Docker containers
mvn verify -Pintegration-tests

Development Setup

IDE Configuration (IntelliJ IDEA)

  1. Import Project

    • File → Open → Select seatunnel directory
    • Choose Maven as build system
  2. Configure JDK

    • Project Settings → Project → JDK
    • Select JDK 1.8 or higher
  3. Enable Annotation Processing

    • Project Settings → Build → Compiler → Annotation Processors
    • Enable annotation processing
  4. Run Configuration

    • Run → Edit Configurations
    • Add new "Application" configuration
    • Set main class:
      org.apache.seatunnel.core.starter.command.CommandExecuteRunner

Code Style

SeaTunnel follows Apache project conventions:

  • 4-space indentation (not tabs)
  • Line length max 120 characters
  • Standard Java naming conventions
  • Organize imports alphabetically

Use the provided

.editorconfig
:

# Install EditorConfig plugin (IntelliJ)
# Then your IDE will auto-format code
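
For reference, a minimal .editorconfig matching the conventions above (illustrative; defer to the file shipped in the repo):

root = true

[*.java]
indent_style = space
indent_size = 4
max_line_length = 120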

Creating Custom Connectors

1. Implement the SeaTunnelSource Interface

// Hedged sketch against the connector-v2 API; generics and signatures vary by
// SeaTunnel version. CustomSplit/CustomState/CustomSourceReader are your own types.
import org.apache.seatunnel.api.source.Boundedness;
import org.apache.seatunnel.api.source.SeaTunnelSource;
import org.apache.seatunnel.api.source.SourceReader;
import org.apache.seatunnel.api.table.type.SeaTunnelRow;

public class CustomSource
        implements SeaTunnelSource<SeaTunnelRow, CustomSplit, CustomState> {

    @Override
    public String getPluginName() {
        return "Custom";  // the name used in job config: source { Custom { ... } }
    }

    @Override
    public Boundedness getBoundedness() {
        return Boundedness.BOUNDED;  // UNBOUNDED for streaming sources
    }

    @Override
    public SourceReader<SeaTunnelRow, CustomSplit> createReader(
            SourceReader.Context context) throws Exception {
        return new CustomSourceReader(context);  // emits rows for assigned splits
    }

    // createEnumerator(...)/restoreEnumerator(...) assign splits to readers and
    // resume from checkpointed state; omitted here for brevity.
}

2. Create Configuration Class

import org.apache.seatunnel.api.configuration.Option;
import org.apache.seatunnel.api.configuration.Options;

public class CustomSourceOptions {
    public static final Option<String> HOST =
        Options.key("host")
            .stringType()
            .noDefaultValue()
            .withDescription("Source hostname");

    public static final Option<Integer> PORT =
        Options.key("port")
            .intType()
            .defaultValue(9200)
            .withDescription("Source port");
}

3. Register Connector

META-INF/services/org.apache.seatunnel.api.source.SeaTunnelSource
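
The file lives on the connector's classpath and lists the fully qualified implementation class, e.g. (hypothetical package name):

com.example.seatunnel.CustomSource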

Contributing Guide

  1. Fork and Clone

    git clone https://github.com/YOUR_USERNAME/seatunnel.git
    cd seatunnel
    git remote add upstream https://github.com/apache/seatunnel.git
    
  2. Create Feature Branch

    git checkout -b feature/my-feature
    
  3. Make Changes

    • Follow code style guide
    • Add tests for new features
    • Update documentation
  4. Commit and Push

    git add .
    git commit -m "feat: add new feature"
    git push origin feature/my-feature
    
  5. Create Pull Request

    • Go to GitHub repository
    • Create PR with clear description
    • Link any related issues
    • Wait for review
  6. Code Review Process

    • Address feedback from maintainers
    • Update PR with changes
    • After approval, maintainers will merge

Troubleshooting

Common Issues and Solutions

Issue 1: "ClassNotFoundException: com.mysql.jdbc.Driver"

Cause: JDBC driver JAR not in classpath

Solution:

# 1. Download MySQL JDBC driver
wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-8.0.33.jar

# 2. Copy to lib directory
cp mysql-connector-java-8.0.33.jar $SEATUNNEL_HOME/lib/

# 3. Restart job
seatunnel.sh -c config/job.conf -e spark

Issue 2: "OutOfMemoryError: Java heap space"

Cause: Insufficient JVM heap memory

Solution:

# Increase JVM memory
export JVM_OPTS="-Xms2G -Xmx8G"

# Or set in seatunnel-env.sh
echo 'JVM_OPTS="-Xms2G -Xmx8G"' >> $SEATUNNEL_HOME/bin/seatunnel-env.sh

Issue 3: "Connection refused: connect"

Cause: Source/sink service not reachable

Solution:

# 1. Verify connectivity
ping source-host
telnet source-host 3306

# 2. Check credentials
mysql -h source-host -u root -p

# 3. Check firewall rules
# Ensure port 3306 is open

Issue 4: "Table not found" during CDC

Cause: Binlog not enabled on MySQL

Solution:

-- 1. Check binlog status
SHOW VARIABLES LIKE 'log_bin';

-- 2. Enable binlog in my.cnf (then restart MySQL):
--    [mysqld]
--    log_bin = mysql-bin
--    binlog_format = row   -- row format is required for CDC

-- 3. Verify after restart
SHOW MASTER STATUS;

Issue 5: Slow Job Performance

Cause: Suboptimal configuration

Solutions:

# 1. Increase parallelism
env {
  parallelism = 8  # or higher based on cluster
}

# 2. Increase batch sizes
source {
  Jdbc {
    fetch_size = 5000
    split_size = 100000
  }
}

sink {
  Jdbc {
    batch_size = 2000
  }
}

# 3. Enable connection pooling
source {
  Jdbc {
    pool_size = 10
    max_idle_time = 300
  }
}

Issue 6: "Kafka topic offset out of range"

Cause: Offset doesn't exist in topic

Solution:

source {
  Kafka {
    # Reset to earliest or latest
    auto.offset.reset = "earliest"  # or "latest"

    # Or specify explicit offsets
    start_mode = "earliest"
  }
}

FAQ

Q: What's the difference between BATCH and STREAMING mode?

A:

  • BATCH: One-time execution, suitable for full database migration
  • STREAMING: Continuous execution, suitable for real-time data sync and CDC

Q: How do I handle schema changes during CDC?

A: SeaTunnel's CDC connectors support schema evolution, but it must be enabled in the source (option name per the MySQL-CDC docs; availability varies by version):

source {
  MySQL-CDC {
    schema-changes.enabled = true
  }
}

Q: Can I transform data during synchronization?

A: Yes, use SQL transform:

transform {
  Sql {
    sql = "SELECT id, UPPER(name) as name FROM source"
  }
}

Q: What's the maximum throughput?

A: Depends on:

  • Hardware (CPU, RAM, Network)
  • Source/sink database configuration
  • Data size per record
  • Network latency

Typical throughput: 100K - 1M records/second per executor

Q: How do I handle errors in production?

A: Configure restart strategy:

env {
  restart_strategy = "exponential_delay"
  restart_strategy.exponential_delay.initial_delay = 1000
  restart_strategy.exponential_delay.max_delay = 30000
  restart_strategy.exponential_delay.multiplier = 2.0
  restart_strategy.exponential_delay.attempts_unlimited = true
}

Q: Is there a web UI for job management?

A: Yes! Use SeaTunnel Web Project:

# Check out web UI project
git clone https://github.com/apache/seatunnel-web.git
cd seatunnel-web
mvn clean install

# Run web UI
java -jar target/seatunnel-web-*.jar
# Access at http://localhost:8080

Resources

Official Links

  • Website & documentation: https://seatunnel.apache.org
  • Main repository: https://github.com/apache/seatunnel
  • Releases: https://archive.apache.org/dist/seatunnel/

Community

  • Slack and Apache mailing lists (invite links on the website)

Related Projects

  • apache/seatunnel-web - Visual job management UI
  • apache/seatunnel-website - Documentation site source

Learning Resources

  • Versioned documentation on the website (20 versions, 1.x ~ 2.3.12)

Version History

  • 2.3.12 (Latest Stable) - Current recommended version
  • 2.3.13-SNAPSHOT (Development)
  • 20 historical versions maintained in documentation (1.x ~ 2.3.12)
  • All releases: https://archive.apache.org/dist/seatunnel/

Additional Notes

License

Apache License 2.0 - See LICENSE file

Security

  • Report security issues via Apache Security
  • Do NOT create public issues for security vulnerabilities

Support & Contribution

  • Join the community Slack for support
  • Submit feature requests on GitHub Issues
  • Contribute code via Pull Requests
  • Follow Contributing Guide

Last Updated: 2026-02-26 · Skill Version: 2.3.13 · Sources: apache/seatunnel + apache/seatunnel-website