Awesome-omni-skill fluentbit-validator
Comprehensive toolkit for validating, linting, and testing Fluent Bit configurations. Use this skill when working with Fluent Bit config files, validating syntax, checking for best practices, identifying security issues, or performing dry-run testing.
git clone https://github.com/diegosouzapw/awesome-omni-skill
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/tools/fluentbit-validator" ~/.claude/skills/diegosouzapw-awesome-omni-skill-fluentbit-validator && rm -rf "$T"
Fluent Bit Config Validator
Overview
This skill provides a comprehensive validation workflow for Fluent Bit configurations, combining syntax validation, semantic checks, security auditing, best practice enforcement, and dry-run testing. Validate Fluent Bit configs with confidence before deploying to production.
Fluent Bit uses an INI-like configuration format with sections ([SERVICE], [INPUT], [FILTER], [OUTPUT], [PARSER]) and key-value pairs. This validator ensures configurations are syntactically correct, semantically valid, secure, and optimized for production use.
When to Use This Skill
Invoke this skill when:
- Validating Fluent Bit configurations before deployment
- Debugging configuration syntax errors
- Testing configurations with fluent-bit --dry-run
- Working with custom plugins that need documentation
- Ensuring configs follow Fluent Bit best practices
- Auditing configurations for security issues
- Optimizing performance settings (buffers, flush intervals)
- The user asks to "validate", "lint", "check", or "test" Fluent Bit configs
- Troubleshooting configuration-related errors
Validation Workflow
Follow this sequential validation workflow. Each stage catches different types of issues.
Recommended: For comprehensive validation, use --check all, which runs all validation stages in sequence:
python3 scripts/validate_config.py --file <config-file> --check all
Individual check modes are available for targeted validation when debugging specific issues.
Stage 1: Configuration File Structure
Verify the basic file structure and format:
python3 scripts/validate_config.py --file <config-file> --check structure
Expected format:
- INI-style sections with [SECTION] headers
- Key-value pairs with proper spacing
- Comments starting with #
- Sections: SERVICE, INPUT, FILTER, OUTPUT, PARSER (or MULTILINE_PARSER)
- Consistent indentation (spaces recommended over tabs)
Common issues caught:
- Missing section headers
- Malformed key-value pairs
- Invalid section names
- Syntax errors (unclosed brackets, etc.)
- Mixed tabs and spaces
- UTF-8 encoding issues
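A structure check of this kind can be sketched in a few lines of Python. This is a minimal illustration, not the code in validate_config.py: the function name check_structure, the message wording, and the decision to skip lines starting with @ (such as @INCLUDE and @SET) are assumptions made for the example.

```python
import re

# Section names listed in this document; treated here as the full allowed set.
KNOWN_SECTIONS = {"SERVICE", "INPUT", "FILTER", "OUTPUT", "PARSER", "MULTILINE_PARSER"}
SECTION_RE = re.compile(r"^\[([A-Za-z_]+)\]$")


def check_structure(text):
    """Return a list of (line_number, message) structural findings."""
    findings = []
    in_section = False
    for lineno, raw in enumerate(text.splitlines(), start=1):
        stripped = raw.strip()
        # Blank lines, comments, and @-directives are not checked in this sketch.
        if not stripped or stripped.startswith(("#", "@")):
            continue
        if stripped.startswith("["):
            match = SECTION_RE.match(stripped)
            if match is None:
                findings.append((lineno, "malformed section header"))
            elif match.group(1).upper() not in KNOWN_SECTIONS:
                findings.append((lineno, f"invalid section name '{match.group(1)}'"))
            in_section = True
            continue
        if not in_section:
            findings.append((lineno, "key-value pair before any section header"))
        # Leading whitespace is whatever lstrip() would remove.
        if "\t" in raw[: len(raw) - len(raw.lstrip())]:
            findings.append((lineno, "tab used for indentation"))
        if len(stripped.split(None, 1)) < 2:
            findings.append((lineno, "key without a value"))
    return findings
```

Returning (line_number, message) pairs rather than printing keeps the check reusable by a later reporting stage.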
Stage 2: Section Validation
Validate all configuration sections (SERVICE, INPUT, FILTER, OUTPUT, PARSER):
python3 scripts/validate_config.py --file <config-file> --check sections
This single command validates all section types. The checks performed for each section type are detailed below.
SERVICE Section Checks
Checks:
- Required parameters: Flush
- Valid parameter names (no typos)
- Parameter value types (Flush must be numeric)
- Log_Level values: off, error, warn, info, debug, trace
- HTTP_Server values: On/Off
- Parsers_File references (file existence)
Common issues:
- Missing Flush parameter
- Invalid Log_Level value
- Parsers_File path doesn't exist
- Negative or zero Flush interval
Best practices:
- Flush: 1-5 seconds (balance latency vs. efficiency)
- Log_Level: info for production, debug for troubleshooting
- HTTP_Server: On (for health checks and metrics)
- storage.metrics: on (for monitoring)
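The SERVICE checks above could look roughly like the following sketch. It assumes the section has already been parsed into a dict of string values and that keys are compared case-insensitively; check_service and its messages are hypothetical names, not the validator's actual API.

```python
VALID_LOG_LEVELS = {"off", "error", "warn", "info", "debug", "trace"}


def check_service(params):
    """Validate a SERVICE section given as a dict of key -> string value."""
    # Assumption: keys are treated as case-insensitive, so normalize first.
    normalized = {key.lower(): value for key, value in params.items()}
    findings = []

    flush = normalized.get("flush")
    if flush is None:
        findings.append("missing Flush")
    else:
        try:
            if float(flush) <= 0:
                findings.append("Flush must be greater than zero")
        except ValueError:
            findings.append(f"Flush is not numeric: '{flush}'")

    level = normalized.get("log_level")
    if level is not None and level.lower() not in VALID_LOG_LEVELS:
        findings.append(f"invalid Log_Level '{level}'")

    http_server = normalized.get("http_server")
    if http_server is not None and http_server.lower() not in {"on", "off"}:
        findings.append(f"HTTP_Server must be On or Off, got '{http_server}'")
    return findings
```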
INPUT Section Checks
Checks:
- Required parameters: Name
- Valid plugin names (tail, systemd, tcp, forward, http, etc.)
- Tag format (no spaces, valid characters)
- File paths exist (for tail plugin)
- Memory limits are set (Mem_Buf_Limit)
- DB file paths are valid
- Port numbers are in valid range (1-65535)
Common issues:
- Missing Name parameter
- Invalid plugin name (typo)
- Missing Tag parameter
- Path doesn't exist
- Missing Mem_Buf_Limit (OOM risk)
- Missing DB file (no position tracking)
- Port conflicts
Best practices:
- Always set Mem_Buf_Limit (50-100MB typical)
- Use DB for tail inputs (crash recovery)
- Set Skip_Long_Lines On (prevents hang)
- Use appropriate Tag patterns for routing
- Set Refresh_Interval for tail (10 seconds typical)
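As an illustration of the INPUT checks, the sketch below validates one section that has already been parsed into a dict with lowercase keys. The function name check_input is hypothetical, and the sketch skips the filesystem checks (Path and DB existence) so that it stays self-contained.

```python
def check_input(params):
    """Validate one INPUT section given as a dict of lowercase key -> value."""
    findings = []
    name = params.get("name")
    if not name:
        # Without a plugin name no plugin-specific check makes sense.
        return ["missing Name"]

    if "tag" not in params:
        findings.append("missing Tag")
    elif any(ch.isspace() for ch in params["tag"]):
        findings.append("Tag contains whitespace")

    if name == "tail":
        if "mem_buf_limit" not in params:
            findings.append("tail input without Mem_Buf_Limit (OOM risk)")
        if "db" not in params:
            findings.append("tail input without DB (no position tracking)")

    port = params.get("port")
    if port is not None:
        if not port.isdigit() or not 1 <= int(port) <= 65535:
            findings.append(f"Port out of range 1-65535: '{port}'")
    return findings
```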
FILTER Section Checks
Checks:
- Required parameters: Name, Match (or Match_Regex)
- Valid filter plugin names
- Match pattern syntax
- Tag pattern wildcards are valid
- Filter-specific parameters
Common issues:
- Missing Match parameter
- Invalid filter plugin name
- Match pattern doesn't match any INPUT tags
- Missing required plugin-specific parameters
Best practices:
- Use specific Match patterns (avoid "*" unless intended)
- Order filters logically (parsers before modifiers)
- Use kubernetes filter in K8s environments
- Parse JSON logs early in pipeline
OUTPUT Section Checks
Checks:
- Required parameters: Name, Match
- Valid output plugin names (including elasticsearch, kafka, loki, s3, cloudwatch, http, forward, file, opentelemetry)
- Host/Port validity
- Retry_Limit is set
- Storage limits are configured
- TLS configuration (if enabled)
- OpenTelemetry-specific: URI endpoints (metrics_uri, logs_uri, traces_uri), authentication headers, resource attributes
Common issues:
- Missing Match parameter
- Invalid output plugin name
- Match pattern doesn't match any INPUT tags
- Missing Retry_Limit (falls back to the default of 1 retry, after which data is dropped; no_limits or False retries indefinitely)
- Missing storage.total_limit_size (disk exhaustion risk)
- Hardcoded credentials (security issue)
Best practices:
- Set Retry_Limit 3-5
- Configure storage.total_limit_size
- Enable TLS in production
- Use environment variables for credentials
- Enable compression when available
PARSER Section Checks
Checks:
- Required parameters: Name, Format
- Valid parser formats: json, regex, logfmt, ltsv
- Regex syntax validity
- Time_Format compatibility with Time_Key
- MULTILINE_PARSER rule syntax
Common issues:
- Invalid regex patterns
- Time_Format doesn't match log timestamps
- Missing Time_Key when using Time_Format
- MULTILINE_PARSER rules don't match
Best practices:
- Test regex patterns with sample logs
- Use built-in parsers when possible
- Set proper Time_Format for timestamp parsing
- Use MULTILINE_PARSER for stack traces
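A regex syntax check can be approximated in Python, with one caveat: Fluent Bit parsers use Ruby-style regular expressions, so Python's re module is only a rough stand-in. The sketch below rewrites Ruby-style named groups before compiling. The name check_parser_regex is hypothetical, and a pattern that uses Ruby-only syntax beyond named groups may be reported as invalid even though Fluent Bit accepts it.

```python
import re


def check_parser_regex(pattern):
    """Return None if the pattern compiles, else a short error message.

    Assumption: the only dialect difference handled here is the named group,
    written (?<name>...) in Ruby-style regex and (?P<name>...) in Python.
    Lookbehinds, (?<=...) and (?<!...), are left untouched.
    """
    translated = re.sub(r"\(\?<(?![=!])", "(?P<", pattern)
    try:
        re.compile(translated)
    except re.error as exc:
        return f"invalid regex: {exc}"
    return None
```

A clean result here is a hint rather than proof; the dry-run stage remains the authoritative test.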
Stage 3: Tag Consistency Check
Validate that tags flow correctly through the pipeline:
python3 scripts/validate_config.py --file <config-file> --check tags
Checks:
- INPUT tags match FILTER Match patterns
- FILTER tags match OUTPUT Match patterns
- No orphaned filters (Match pattern doesn't match any INPUT)
- No orphaned outputs (Match pattern doesn't match any INPUT/FILTER)
- Tag wildcards are used correctly
Common issues:
- FILTER Match pattern doesn't match any INPUT Tag
- OUTPUT Match pattern doesn't match any logs
- Typo in Match pattern
- Incorrect wildcard usage
Example validation:
[INPUT]
    Tag kube.*      # Produces: kube.var.log.containers.pod.log
[FILTER]
    Match kube.*    # Matches: ✅
[OUTPUT]
    Match app.*     # Matches: ❌ No logs will reach this output
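The matching logic behind this example can be sketched as follows, assuming that * is the only wildcard in a Match pattern and that the tags each input actually produces are already known. tag_matches and find_orphans are illustrative names, not functions from the validator.

```python
import re


def tag_matches(pattern, tag):
    """Return True if a Match pattern with * wildcards matches a tag."""
    # Escape each literal piece, then join the pieces with "match anything".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, tag) is not None


def find_orphans(input_tags, match_patterns):
    """Return the Match patterns that match none of the given input tags."""
    return [
        pattern
        for pattern in match_patterns
        if not any(tag_matches(pattern, tag) for tag in input_tags)
    ]
```

For the example above, find_orphans(["kube.var.log.containers.pod.log"], ["kube.*", "app.*"]) returns ["app.*"], which is the output that would never receive logs.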
Stage 4: Security Audit
Scan configuration for security issues:
python3 scripts/validate_config.py --file <config-file> --check security
Checks performed:
- Hardcoded credentials:
  - HTTP_User, HTTP_Passwd in OUTPUT
  - AWS_Access_Key, AWS_Secret_Key
  - Passwords in plain text
  - API keys and tokens
- TLS configuration:
  - TLS disabled for production outputs
  - tls.verify Off (man-in-the-middle risk)
  - Missing certificate files
- File permissions:
  - DB files readable/writable
  - Parser files exist and readable
  - Log files have appropriate permissions
- Network exposure:
  - INPUT plugins listening on 0.0.0.0 without auth
  - Open ports without firewall mentions
  - HTTP_Server exposed without auth
Security best practices:
- Use environment variables: HTTP_User ${ES_USER}
- Enable TLS: tls On
- Verify certificates: tls.verify On
- Don't listen on 0.0.0.0 for sensitive inputs
- Use authentication for HTTP endpoints
Auto-fix suggestions:
# Before (insecure)
[OUTPUT]
    HTTP_User admin
    HTTP_Passwd password123
# After (secure)
[OUTPUT]
    HTTP_User ${ES_USER}
    HTTP_Passwd ${ES_PASSWORD}
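One way to detect the insecure form is to flag sensitive keys whose value is anything other than a ${VAR} reference. The sketch below assumes a section parsed into a dict; the key patterns are illustrative rather than the validator's actual list, and keys ending in _file are skipped on the assumption that they hold paths, not secrets.

```python
import re

# Keys named in this document plus a few generic fragments (illustrative only).
SENSITIVE_KEY_RE = re.compile(
    r"^(http_user|http_passwd|aws_access_key|aws_secret_key)$|passw|secret|token|api_key",
    re.IGNORECASE,
)
ENV_REFERENCE_RE = re.compile(r"^\$\{[A-Za-z_][A-Za-z0-9_]*\}$")


def find_hardcoded_credentials(params):
    """Return the sensitive keys whose values are literals, not ${VAR} references."""
    return [
        key
        for key, value in params.items()
        if SENSITIVE_KEY_RE.search(key)
        and not key.lower().endswith("_file")  # paths such as Kube_Token_File
        and not ENV_REFERENCE_RE.match(value.strip())
    ]
```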
Stage 5: Performance Analysis
Analyze configuration for performance issues:
python3 scripts/validate_config.py --file <config-file> --check performance
Checks:
- Buffer limits:
  - Mem_Buf_Limit is set on all tail inputs
  - storage.total_limit_size is set on outputs
  - Limits are reasonable (not too small or too large)
- Flush intervals:
  - Flush interval is appropriate (1-5 sec typical)
  - Not too low (high CPU) or too high (high memory)
- Resource usage:
  - Skip_Long_Lines enabled (prevents hang)
  - Refresh_Interval set (file discovery)
  - Compression enabled on network outputs
- Kubernetes-specific:
  - Buffer_Size 0 for kubernetes filter (recommended)
  - Mem_Buf_Limit not too low for container logs
Performance recommendations:
# Good configuration
[SERVICE]
    Flush 1                        # 1 second: good balance
[INPUT]
    Mem_Buf_Limit 50MB             # Prevents OOM
    Skip_Long_Lines On             # Prevents hang
    Refresh_Interval 10            # File discovery every 10s
[OUTPUT]
    storage.total_limit_size 5G    # Disk buffer limit
    Retry_Limit 3                  # Don't retry forever
    Compress gzip                  # Reduce bandwidth
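Checking whether a limit is "reasonable" first requires turning values such as 50MB or 5G into bytes. The sketch below does that and applies a window of 5MB to 1GB, which is a hypothetical threshold chosen purely for illustration. It also assumes decimal multipliers (1K = 1000 bytes); adjust UNIT_BYTES if your Fluent Bit version interprets suffixes differently.

```python
import re

# Assumption: decimal multipliers for the K/M/G suffixes.
UNIT_BYTES = {"": 1, "K": 1000, "M": 1000**2, "G": 1000**3}
SIZE_RE = re.compile(r"^(\d+)\s*([KMG]?)B?$", re.IGNORECASE)


def parse_size(value):
    """Convert a size string such as '50MB' or '5G' to bytes; None if unparseable."""
    match = SIZE_RE.match(value.strip())
    if match is None:
        return None
    return int(match.group(1)) * UNIT_BYTES[match.group(2).upper()]


def check_mem_buf_limit(value, low=5 * 1000**2, high=1000**3):
    """Flag limits outside a hypothetical 5MB-1GB window."""
    size = parse_size(value)
    if size is None:
        return f"unparseable size '{value}'"
    if size < low:
        return f"Mem_Buf_Limit {value} may be too small"
    if size > high:
        return f"Mem_Buf_Limit {value} may be too large"
    return None
```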
Stage 6: Best Practice Validation
Check against Fluent Bit best practices:
python3 scripts/validate_config.py --file <config-file> --check best-practices
Checks:
- Required configurations:
  - SERVICE section exists
  - At least one INPUT
  - At least one OUTPUT
  - HTTP_Server enabled (for health checks)
- Kubernetes configurations:
  - kubernetes filter used for K8s logs
  - Proper Kube_URL, Kube_CA_File, Kube_Token_File
  - Exclude_Path to prevent log loops
  - DB file for position tracking
- Reliability:
  - Retry_Limit set on outputs
  - DB file for tail inputs
  - storage.type filesystem for critical logs
- Observability:
  - HTTP_Server enabled
  - storage.metrics enabled
  - Proper Log_Level (info or debug)
Best practice checklist:
- ✅ SERVICE section with Flush parameter
- ✅ HTTP_Server enabled for health checks
- ✅ Mem_Buf_Limit on all tail inputs
- ✅ DB file for tail inputs (position tracking)
- ✅ Retry_Limit on all outputs
- ✅ storage.total_limit_size on outputs
- ✅ TLS enabled for production
- ✅ Environment variables for credentials
- ✅ kubernetes filter for K8s environments
- ✅ Exclude_Path to prevent log loops
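The first group of checks, the required sections, reduces to a set membership test. A minimal sketch, assuming the section names have already been extracted from the file (check_required_sections is a hypothetical name):

```python
def check_required_sections(section_names):
    """Report which of SERVICE, INPUT, and OUTPUT are absent from a config."""
    present = {name.upper() for name in section_names}
    return [
        f"no {required} section"
        for required in ("SERVICE", "INPUT", "OUTPUT")
        if required not in present
    ]
```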
Stage 7: Dry-Run Testing
Test configuration with Fluent Bit dry-run (if binary available):
fluent-bit -c <config-file> --dry-run
This catches:
- Configuration parsing errors
- Plugin loading errors
- Parser syntax errors
- File permission issues
- Missing dependencies
Common errors:
- Parser file not found:
[error] [config] parser file 'parsers.conf' not found
Fix: Create parser file or update Parsers_File path
- Plugin not found:
[error] [plugins] invalid plugin 'unknownplugin'
Fix: Check plugin name spelling or install plugin
- Invalid parameter:
[error] [input:tail] invalid property 'InvalidParam'
Fix: Remove invalid parameter or check documentation
- Permission denied:
[error] cannot open /var/log/containers/*.log
Fix: Check file permissions or run with appropriate user
If fluent-bit binary is not available:
- Skip this stage
- Document that dry-run testing was skipped
- Recommend testing in development environment
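The "run if available, otherwise skip and say so" behavior can be sketched with the standard library. The function name dry_run and the status strings are hypothetical, and the sketch assumes that fluent-bit exits with a non-zero code when the dry run fails.

```python
import shutil
import subprocess


def dry_run(config_path, binary="fluent-bit", timeout=30):
    """Run `fluent-bit -c <config> --dry-run`; return (status, output)."""
    executable = shutil.which(binary)
    if executable is None:
        # Skip the stage and record that it was skipped.
        return ("skipped", f"{binary} not found on PATH; dry-run testing was skipped")
    try:
        result = subprocess.run(
            [executable, "-c", config_path, "--dry-run"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return ("failed", f"dry-run did not finish within {timeout}s")
    # Assumption: exit code 0 means the configuration was accepted.
    status = "passed" if result.returncode == 0 else "failed"
    return (status, result.stdout + result.stderr)
```

Returning a "skipped" status instead of raising makes it straightforward to include the skipped stage in the final report.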
Stage 8: Documentation Lookup (if needed)
If configuration uses unfamiliar plugins or parameters:
Try context7 MCP first:
Use mcp__context7__resolve-library-id with "fluent-bit"
Then use mcp__context7__get-library-docs with:
- context7CompatibleLibraryID: /fluent/fluent-bit-docs
- topic: "<plugin-type> <plugin-name> configuration"
- page: 1
Fallback to WebSearch:
Search query: "fluent-bit <plugin-type> <plugin-name> configuration parameters site:docs.fluentbit.io"
Examples:
- "fluent-bit output elasticsearch configuration parameters site:docs.fluentbit.io"
- "fluent-bit filter kubernetes configuration parameters site:docs.fluentbit.io"
Extract information:
- Required parameters
- Optional parameters and defaults
- Valid value ranges
- Example configurations
Stage 9: Report and Fix Issues
After validation, present comprehensive findings:
1. Summarize all issues:
Validation Report for fluent-bit.conf
=====================================
Errors (3):
- [Line 15] OUTPUT elasticsearch missing required parameter 'Host'
- [Line 25] FILTER Match pattern 'app.*' doesn't match any INPUT tags
- [Line 8] INPUT tail missing Mem_Buf_Limit (OOM risk)
Warnings (2):
- [Line 30] OUTPUT elasticsearch has hardcoded password (security risk)
- [Line 12] INPUT tail missing DB file (no crash recovery)
Info (1):
- [Line 3] SERVICE Flush interval is 10s (consider reducing for lower latency)
Best Practices (2):
- Consider enabling HTTP_Server for health checks
- Consider enabling compression on OUTPUT elasticsearch
2. Categorize by severity:
- Errors (must fix): Configuration won't work, Fluent Bit won't start
- Warnings (should fix): Configuration works but has issues
- Info (consider): Optimization opportunities
- Best Practices: Recommended improvements
3. Propose specific fixes:
# Fix 1: Add missing Host parameter
[OUTPUT]
    Name es
    Match *
    Host elasticsearch.logging.svc    # Added
    Port 9200
# Fix 2: Add Mem_Buf_Limit to prevent OOM
[INPUT]
    Name tail
    Tag kube.*
    Path /var/log/containers/*.log
    Mem_Buf_Limit 50MB                # Added
# Fix 3: Use environment variable for password
[OUTPUT]
    Name es
    HTTP_User admin
    HTTP_Passwd ${ES_PASSWORD}        # Changed from hardcoded
4. Get user approval via AskUserQuestion
5. Apply approved fixes using Edit tool
6. Re-run validation to confirm
7. Provide completion summary:
✅ Validation Complete - 5 issues fixed
Fixed Issues:
- fluent-bit.conf:15 - Added missing Host parameter to OUTPUT elasticsearch
- fluent-bit.conf:8 - Added Mem_Buf_Limit 50MB to INPUT tail
- fluent-bit.conf:30 - Changed hardcoded password to environment variable
- fluent-bit.conf:12 - Added DB file for crash recovery
- fluent-bit.conf:25 - Fixed FILTER Match pattern to match INPUT tags
Validation Status: All checks passed ✅
- Structure: Valid
- Syntax: Valid
- Tags: Consistent
- Security: No issues
- Performance: Optimized
- Best Practices: Compliant
- Dry-run: Passed (if applicable)
8. Report-only summary (when user declines fixes):
If user chooses not to apply fixes, provide a report-only summary:
📋 Validation Report Complete - No fixes applied
Summary:
- Errors: 2 (must fix before deployment)
- Warnings: 16 (should fix)
- Info: 15 (optimization suggestions)
Critical Issues Requiring Attention:
- [Line 5] Invalid Log_Level 'invalid_level'
- [Line 52] [OUTPUT opentelemetry] missing required parameter 'Host'
Recommendations:
- Review the errors above before deploying this configuration
- Consider addressing warnings to improve reliability and security
- Run validation again after manual fixes: python3 scripts/validate_config.py --file <config> --check all
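Grouping findings by severity, as in the reports in this stage, can be sketched as follows. The tuple layout (severity, line, message) and the name format_report are assumptions made for illustration, not the validator's internal data model.

```python
from collections import defaultdict

SEVERITY_ORDER = ["error", "warning", "info", "best-practice"]
HEADINGS = {
    "error": "Errors",
    "warning": "Warnings",
    "info": "Info",
    "best-practice": "Best Practices",
}


def format_report(filename, findings):
    """Render findings, given as (severity, line, message) tuples, as text."""
    grouped = defaultdict(list)
    for severity, line, message in findings:
        grouped[severity].append((line, message))

    title = f"Validation Report for {filename}"
    lines = [title, "=" * len(title)]
    for severity in SEVERITY_ORDER:
        items = grouped.get(severity, [])
        if not items:
            continue
        lines.append(f"{HEADINGS[severity]} ({len(items)}):")
        # Findings without a line number (None) sort first and get no prefix.
        for line, message in sorted(items, key=lambda item: item[0] or 0):
            prefix = f"[Line {line}] " if line else ""
            lines.append(f"- {prefix}{message}")
    return "\n".join(lines)
```

Ordering by severity first and line number second puts the must-fix items at the top of the report.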
Common Issues and Solutions
Configuration Errors
Issue: Parser file not found
[error] [config] parser file 'parsers.conf' not found
Solution:
- Verify Parsers_File path in SERVICE section
- Check if file exists at specified location
- Use relative path from config file location
Issue: Missing required parameter
[error] [output:es] property 'Host' not set
Solution:
- Add required parameter to OUTPUT section
- Check documentation for required fields
Issue: Invalid plugin name
[error] [plugins] invalid plugin 'unknownplugin'
Solution:
- Check plugin name spelling
- Verify plugin is available (may need installation)
- Consult documentation for correct plugin names
Tag Routing Issues
Issue: No logs reaching output
# Logs are generated but don't appear in output
Debug:
- Check INPUT Tag matches FILTER Match
- Check FILTER Match/tag_prefix matches OUTPUT Match
- Enable debug logging: Log_Level debug
- Check for grep filters excluding all logs
Solution:
[INPUT]
    Tag kube.*
[FILTER]
    Match kube.*    # Must match INPUT Tag
[OUTPUT]
    Match kube.*    # Must match INPUT or FILTER tag
Memory Issues
Issue: Fluent Bit OOM killed
# Container or process killed due to memory
Solution:
- Add Mem_Buf_Limit to all tail inputs
- Reduce Mem_Buf_Limit values
- Set storage.total_limit_size on outputs
- Lower the Flush interval so buffered records are sent sooner (a long interval holds more data in memory)
- Add log filtering to reduce volume
Security Issues
Issue: Hardcoded credentials in config
[OUTPUT]
    HTTP_Passwd secretpassword
Solution:
- Use environment variables:
[OUTPUT]
    HTTP_Passwd ${ES_PASSWORD}
- Mount secrets in Kubernetes
- Use IAM roles for cloud services (AWS, GCP, Azure)
Issue: TLS disabled or not verified
[OUTPUT]
    tls On
    tls.verify Off
Solution:
- Enable verification for production:
[OUTPUT]
    tls On
    tls.verify On
    tls.ca_file /path/to/ca.crt
Integration with fluentbit-generator
This validator is automatically invoked by the fluentbit-generator skill after generating configurations. It can also be used standalone to validate existing configurations.
Generator workflow:
- Generate configuration using fluentbit-generator
- Automatically validate using fluentbit-validator
- Fix any issues found
- Re-validate until all checks pass
- Deploy with confidence
Resources
scripts/
validate_config.py
- Main validation script with all checks integrated in a single file
- Usage: python3 scripts/validate_config.py --file <config> --check <type>
- Available check types: all, structure, syntax, sections, tags, security, performance, best-practices, dry-run
- Comprehensive 1000+ line validator covering all validation stages
- Includes syntax validation, section validation, tag consistency, security audit, performance analysis, and best practices
- Returns detailed error messages with line numbers
- Supports JSON output format: --json
validate.sh
- Convenience wrapper script for easier invocation
- Usage: bash scripts/validate.sh <config-file>
- Automatically calls validate_config.py with proper Python interpreter
- Simplifies command-line usage
tests/
Test Configuration Files:
- valid-basic.conf: Valid basic Kubernetes logging setup
- valid-multioutput.conf: Valid configuration with multiple outputs
- valid-opentelemetry.conf: Valid OpenTelemetry output configuration (Fluent Bit 2.x+)
- invalid-missing-required.conf: Missing required parameters
- invalid-security-issues.conf: Security vulnerabilities (hardcoded credentials, disabled TLS)
- invalid-opentelemetry.conf: OpenTelemetry configuration errors
- invalid-tag-mismatch.conf: Tag routing issues
Running Tests:
# Test on valid config
python3 scripts/validate_config.py --file tests/valid-basic.conf
# Test on invalid config (should report errors)
python3 scripts/validate_config.py --file tests/invalid-security-issues.conf
# Test all configs
for config in tests/*.conf; do
    echo "Testing $config"
    python3 scripts/validate_config.py --file "$config"
done
Documentation Sources
Based on comprehensive research from:
- Fluent Bit Official Documentation
- Fluent Bit Operations and Best Practices
- Configuration File Format
- Context7 Fluent Bit documentation (/fluent/fluent-bit-docs)