asi · performing-log-source-onboarding-in-siem

Perform structured log source onboarding into SIEM platforms by configuring collectors, parsers, normalization, and validation for complete security visibility.

install
source · Clone the upstream repo
git clone https://github.com/plurigrid/asi
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/plurigrid/asi "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/asi/skills/performing-log-source-onboarding-in-siem" ~/.claude/skills/plurigrid-asi-performing-log-source-onboarding-in-siem && rm -rf "$T"
manifest: plugins/asi/skills/performing-log-source-onboarding-in-siem/SKILL.md
source content

Performing Log Source Onboarding in SIEM

Overview

Log source onboarding is the systematic process of integrating new data sources into a SIEM platform to enable security monitoring and detection. Proper onboarding requires planning data sources, configuring collection agents, building parsers, normalizing fields to a common schema, and validating data quality. According to the UK NCSC, onboarding should prioritize log sources that provide the highest security value relative to their ingestion cost.

When to Use

  • When conducting security assessments that involve log source onboarding into a SIEM
  • When following incident response procedures for related security events
  • When performing scheduled security testing or auditing activities
  • When validating security controls through hands-on testing

Prerequisites

  • SIEM platform deployed (Splunk, Elastic, Sentinel, QRadar, or similar)
  • Network access from source systems to SIEM collectors
  • Administrative access on source systems for agent installation
  • Common Information Model (CIM) or equivalent schema documentation
  • Change management approval for production system modifications

Log Source Priority Framework

Tier 1 - Critical (Onboard First)

Source | Log Type | Security Value
Active Directory | Security Event Logs | Authentication, privilege escalation
Firewalls | Traffic logs | Network access, C2 detection
EDR/AV | Endpoint alerts | Malware, process execution
VPN/Remote Access | Connection logs | Unauthorized access
DNS Servers | Query logs | C2 beaconing, data exfiltration
Email Gateway | Email security logs | Phishing, BEC

Tier 2 - High Priority

Source | Log Type | Security Value
Web Proxy | HTTP/HTTPS logs | Web-based attacks, data exfiltration
Cloud platforms (AWS/Azure/GCP) | Audit logs | Cloud security posture
Database servers | Audit/query logs | Data access, SQL injection
DHCP/IPAM | Address allocation | Asset tracking
File servers | Access logs | Data access monitoring

Tier 3 - Standard

Source | Log Type | Security Value
Application servers | App logs | Application-level attacks
Print servers | Print logs | Data loss prevention
Badge/physical access | Access logs | Physical security correlation
Network devices (switches/routers) | Syslog | Network anomalies
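The tiering above amounts to ranking sources by security value relative to ingestion cost. A minimal sketch of that ranking, with illustrative scores and volumes (not from this guide), might look like:

```python
# Rank candidate log sources by security value per GB of daily ingestion.
# The scores and volumes below are illustrative placeholders; replace
# them with your own assessment and measured volumes.
def rank_sources(sources):
    """Return sources sorted by value-per-GB, highest first."""
    return sorted(sources, key=lambda s: s["value"] / s["gb_per_day"],
                  reverse=True)

candidates = [
    {"name": "Active Directory", "value": 10, "gb_per_day": 5},
    {"name": "DNS Servers",      "value": 9,  "gb_per_day": 40},
    {"name": "Print servers",    "value": 2,  "gb_per_day": 2},
]

for s in rank_sources(candidates):
    print(f'{s["name"]}: {s["value"] / s["gb_per_day"]:.2f} value/GB')
```

Note how a high-value but very high-volume source (DNS) can rank below a cheap low-value one; this is why the NCSC guidance weighs value against cost rather than value alone.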

Onboarding Process

Step 1: Discovery and Assessment

1. Identify the log source:
   - System type and version
   - Log format (syslog, CEF, JSON, Windows Events, etc.)
   - Log volume estimate (EPS - events per second)
   - Network location and firewall requirements

2. Assess security value:
   - What threats can this source help detect?
   - Which MITRE ATT&CK techniques does it cover?
   - Is there an existing SIEM parser?

3. Estimate ingestion cost:
   - Daily volume in GB
   - License impact (per-GB or per-EPS pricing)
   - Storage retention requirements
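Converting an EPS estimate into daily gigabytes is simple arithmetic; a quick sketch, assuming you have measured an average event size for the source:

```python
# Estimate daily ingestion volume from an events-per-second rate.
# avg_event_bytes is an assumption you should measure per source.
def daily_gb(eps, avg_event_bytes):
    """Daily volume in GB for a sustained events-per-second rate."""
    return eps * avg_event_bytes * 86400 / 1e9

# e.g. a firewall at 500 EPS with ~300-byte events
print(f"{daily_gb(500, 300):.1f} GB/day")
```

Multiply the result by your retention window (hot plus cold) to size storage, and compare against per-GB license headroom before approving the source.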

Step 2: Configure Log Collection

Syslog-Based Collection (Firewalls, Network Devices)

# rsyslog configuration for receiving syslog
# /etc/rsyslog.d/10-siem-collection.conf

# UDP reception
module(load="imudp")
input(type="imudp" port="514" ruleset="siem_forwarding")

# TCP reception (imtcp may only be loaded once)
module(load="imtcp")
input(type="imtcp" port="514" ruleset="siem_forwarding")

# TLS reception (per-input stream driver parameters require
# rsyslog 8.2106+; certificates are configured via global() directives)
input(type="imtcp" port="6514" ruleset="siem_forwarding"
      StreamDriver.Name="gtls" StreamDriver.Mode="1"
      StreamDriver.AuthMode="x509/name")

ruleset(name="siem_forwarding") {
    # Forward to SIEM
    action(type="omfwd" target="siem.company.com" port="9514"
           protocol="tcp" queue.type="LinkedList"
           queue.filename="siem_fwd" queue.maxdiskspace="1g"
           queue.saveonshutdown="on" action.resumeRetryCount="-1")
}
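Once the collector is listening, send a test message before pointing production devices at it. A minimal sketch using only the standard library; the collector address and hostname are placeholders:

```python
import socket
import time

# Send a test RFC 3164 message over UDP to confirm the collector is
# reachable; 514/UDP matches the imudp input above. Replace COLLECTOR
# with your rsyslog host.
COLLECTOR = ("127.0.0.1", 514)

def send_test_syslog(msg, pri=134):  # 134 = facility local0, severity info
    timestamp = time.strftime("%b %d %H:%M:%S")
    payload = f"<{pri}>{timestamp} testhost siem-onboarding: {msg}"
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(payload.encode(), COLLECTOR)
    sock.close()
    return payload

print(send_test_syslog("collector reachability check"))
```

UDP delivery is fire-and-forget, so confirm arrival on the collector side (e.g. by searching the SIEM for the test hostname) rather than trusting the send alone.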

Windows Event Log Collection (Splunk Universal Forwarder)

# inputs.conf on Splunk Universal Forwarder
[WinEventLog://Security]
disabled = 0
index = wineventlog
sourcetype = WinEventLog:Security
evt_resolve_ad_obj = 1
checkpointInterval = 5

[WinEventLog://System]
disabled = 0
index = wineventlog
sourcetype = WinEventLog:System

[WinEventLog://Microsoft-Windows-Sysmon/Operational]
disabled = 0
index = wineventlog
sourcetype = XmlWinEventLog:Microsoft-Windows-Sysmon/Operational
renderXml = true

[WinEventLog://Microsoft-Windows-PowerShell/Operational]
disabled = 0
index = wineventlog
sourcetype = XmlWinEventLog:Microsoft-Windows-PowerShell/Operational

Cloud Log Collection (AWS CloudTrail)

{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Resources": {
    "CloudTrailToSIEM": {
      "Type": "AWS::CloudTrail::Trail",
      "Properties": {
        "TrailName": "siem-cloudtrail",
        "S3BucketName": "company-cloudtrail-logs",
        "IsLogging": true,
        "IsMultiRegionTrail": true,
        "IncludeGlobalServiceEvents": true,
        "EnableLogFileValidation": true,
        "EventSelectors": [
          {
            "ReadWriteType": "All",
            "IncludeManagementEvents": true,
            "DataResources": [
              {
                "Type": "AWS::S3::Object",
                "Values": ["arn:aws:s3"]
              }
            ]
          }
        ]
      }
    }
  }
}
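Before deploying, it is worth asserting the trail properties that matter for SIEM coverage. A minimal pre-deployment check, mirroring the template above as a plain dict rather than calling AWS (the check names and messages are illustrative):

```python
# Sanity-check the CloudTrail properties that affect SIEM coverage.
TRAIL_PROPS = {
    "TrailName": "siem-cloudtrail",
    "IsLogging": True,
    "IsMultiRegionTrail": True,
    "IncludeGlobalServiceEvents": True,
    "EnableLogFileValidation": True,
    "EventSelectors": [
        {"ReadWriteType": "All", "IncludeManagementEvents": True},
    ],
}

def validate_trail(props):
    """Return a list of misconfigurations that weaken SIEM coverage."""
    problems = []
    if not props.get("IsLogging"):
        problems.append("trail exists but is not logging")
    if not props.get("IsMultiRegionTrail"):
        problems.append("single-region trail may miss events")
    if not props.get("EnableLogFileValidation"):
        problems.append("log file integrity cannot be verified")
    if not any(s.get("IncludeManagementEvents")
               for s in props.get("EventSelectors", [])):
        problems.append("management events excluded")
    return problems

assert validate_trail(TRAIL_PROPS) == []
```

The same checks can be run against a live trail's `GetTrailStatus`/`GetEventSelectors` output once deployed.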

Step 3: Parse and Normalize

Custom Parser Example (Splunk props.conf/transforms.conf)

# props.conf
[custom:firewall:logs]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%dT%H:%M:%S%z
MAX_TIMESTAMP_LOOKAHEAD = 30
REPORT-firewall = firewall_extract_fields
FIELDALIAS-src = src_addr AS src_ip
FIELDALIAS-dst = dst_addr AS dest_ip
EVAL-action = case(fw_action=="allow", "allowed", fw_action=="deny", "blocked", true(), "unknown")
EVAL-vendor_product = "Custom Firewall"
LOOKUP-geo = geo_ip_lookup ip AS dest_ip OUTPUT country, city, latitude, longitude

# transforms.conf
[firewall_extract_fields]
REGEX = ^(\S+)\s+(\S+)\s+action=(\w+)\s+src=(\S+):(\d+)\s+dst=(\S+):(\d+)\s+proto=(\w+)\s+bytes=(\d+)
FORMAT = timestamp::$1 hostname::$2 fw_action::$3 src_addr::$4 src_port::$5 dst_addr::$6 dst_port::$7 protocol::$8 bytes::$9
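Test the extraction regex against a sample event before deploying it; the sample line below is an assumed example of this feed's format, not taken from a real device:

```python
import re

# The same pattern as the transforms.conf REGEX above.
FW_REGEX = re.compile(
    r"^(\S+)\s+(\S+)\s+action=(\w+)\s+src=(\S+):(\d+)"
    r"\s+dst=(\S+):(\d+)\s+proto=(\w+)\s+bytes=(\d+)"
)

sample = ("2024-05-01T12:00:00+0000 fw01 action=deny "
          "src=10.0.0.5:51515 dst=203.0.113.9:443 proto=tcp bytes=1420")

m = FW_REGEX.match(sample)
fields = dict(zip(
    ["timestamp", "hostname", "fw_action", "src_addr", "src_port",
     "dst_addr", "dst_port", "protocol", "bytes"], m.groups()))
print(fields)
```

Run this against a few dozen real lines from the source: any line the regex fails to match will surface as events with missing fields in the coverage checks in Step 4.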

CIM Field Mapping

Raw Field | CIM Field | Data Model
src_addr | src_ip | Network_Traffic
dst_addr | dest_ip | Network_Traffic
dst_port | dest_port | Network_Traffic
fw_action | action | Network_Traffic
bytes_sent + bytes_recv | bytes | Network_Traffic
user_name | user | Authentication
login_result | action | Authentication
process_path | process | Endpoint
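The mapping table above is effectively a rename plus one derived field. A minimal sketch of the same normalization in Python, useful for unit-testing a mapping before encoding it in the SIEM's props/transforms:

```python
# Rename raw fields to their CIM equivalents, per the table above.
CIM_MAP = {
    "src_addr": "src_ip",
    "dst_addr": "dest_ip",
    "dst_port": "dest_port",
    "fw_action": "action",
    "user_name": "user",
    "process_path": "process",
}

def normalize(raw):
    out = {CIM_MAP.get(k, k): v for k, v in raw.items()}
    # Network_Traffic "bytes" is the sum of both directions
    if "bytes_sent" in raw and "bytes_recv" in raw:
        out["bytes"] = int(raw["bytes_sent"]) + int(raw["bytes_recv"])
    return out

event = {"src_addr": "10.0.0.5", "dst_port": "443",
         "fw_action": "deny", "bytes_sent": "900", "bytes_recv": "520"}
print(normalize(event))
```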

Step 4: Validate Data Quality

# Verify events are arriving
index=new_source earliest=-1h
| stats count by sourcetype, host, source

# Check field extraction quality
index=new_source earliest=-1h
| stats count(src_ip) as has_src count(dest_ip) as has_dest count(action) as has_action count by sourcetype
| eval src_coverage=round(has_src/count*100,1)
| eval dest_coverage=round(has_dest/count*100,1)
| eval action_coverage=round(has_action/count*100,1)

# Verify CIM compliance
| datamodel Network_Traffic search
| search sourcetype=new_sourcetype
| stats count by source, sourcetype

# Check for timestamp parsing issues
index=new_source earliest=-1h
| eval time_diff=abs(_time - _indextime)
| stats avg(time_diff) as avg_lag max(time_diff) as max_lag by host
| where avg_lag > 300

Step 5: Enable Detection Coverage

# Verify the new source feeds existing data models
# (tstats must be the first command in a search)
| tstats count from datamodel=Authentication
    where sourcetype=new_sourcetype by _time span=1h

# Create source-specific detection rule
[New Source - Authentication Anomaly]
search = index=new_source sourcetype=new_sourcetype action=failure \
| stats count by src_ip, user \
| where count > 10
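The saved search above, sketched in Python over already-normalized events, makes the detection logic easy to unit-test with synthetic data (the event shape and threshold mirror the SPL):

```python
from collections import Counter

# Count authentication failures per (src_ip, user) and flag pairs
# exceeding the threshold, like the saved search above.
def brute_force_candidates(events, threshold=10):
    counts = Counter((e["src_ip"], e["user"])
                     for e in events if e.get("action") == "failure")
    return {pair: n for pair, n in counts.items() if n > threshold}

events = [{"src_ip": "10.0.0.5", "user": "admin", "action": "failure"}] * 12
print(brute_force_candidates(events))
```

Feeding synthetic failures like this through the real pipeline (source → collector → parser → rule) is also a good end-to-end test that the rule actually fires on the new sourcetype.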

Onboarding Checklist

  • Log source assessed and approved
  • Network connectivity verified
  • Collection agent/method configured
  • Log forwarding confirmed
  • Parser/field extraction configured
  • CIM compliance validated
  • Data model acceleration enabled
  • Volume within license budget
  • Retention policy configured
  • Detection rules enabled/created
  • Dashboard updated
  • Documentation completed
  • SOC team notified
