Agents tracing-upstream-lineage
Trace upstream data lineage. Use when the user asks where data comes from, what feeds a table, upstream dependencies, data sources, or needs to understand data origins.
git clone https://github.com/astronomer/agents
T=$(mktemp -d) && git clone --depth=1 https://github.com/astronomer/agents "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/tracing-upstream-lineage" ~/.claude/skills/astronomer-agents-tracing-upstream-lineage && rm -rf "$T"
skills/tracing-upstream-lineage/SKILL.md
Upstream Lineage: Sources
Trace the origins of data - answer "Where does this data come from?"
Lineage Investigation
Step 1: Identify the Target Type
Determine what we're tracing:
- Table: Trace what populates this table
- Column: Trace where this specific column comes from
- DAG: Trace what data sources this DAG reads from
Step 2: Find the Producing DAG
Tables are typically populated by Airflow DAGs. Find the connection:
- Search DAGs by name: Use `af dags list` and look for DAG names matching the table name
  - `load_customers` -> `customers` table
  - `etl_daily_orders` -> `orders` table
- Explore DAG source code: Use `af dags source <dag_id>` to read the DAG definition
  - Look for INSERT, MERGE, CREATE TABLE statements
  - Find the target table in the code
- Check DAG tasks: Use `af tasks list <dag_id>` to see what operations the DAG performs
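The name-matching heuristic from the first bullet can be sketched in Python. This is a minimal illustration, not part of the `af` CLI: the function name `candidate_dags` and the sample DAG ids are assumptions, and the list of ids is expected to be copied by hand from the `af dags list` output.

```python
import re

def candidate_dags(table_name, dag_ids):
    """Return DAG ids whose names contain the table's base name as a token."""
    # Drop any schema prefix: "analytics.orders" -> "orders"
    base = table_name.split(".")[-1].lower()
    # Also accept a naive singular form: "orders" -> "order"
    singular = base[:-1] if base.endswith("s") else base
    matches = []
    for dag_id in dag_ids:
        # Split "etl_daily_orders" into ["etl", "daily", "orders"]
        tokens = re.split(r"[^a-z0-9]+", dag_id.lower())
        if base in tokens or singular in tokens:
            matches.append(dag_id)
    return matches

# Hypothetical DAG ids, e.g. copied from the output of `af dags list`
dag_ids = ["load_customers", "etl_daily_orders", "refresh_dashboards"]
print(candidate_dags("customers", dag_ids))
print(candidate_dags("analytics.orders", dag_ids))
```

A name match is only a hint. Confirm it by reading the DAG source before treating the DAG as the producer.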
On Astro
If you're running on Astro, the Lineage tab in the Astro UI provides visual lineage exploration across DAGs and datasets. Use it to quickly trace upstream dependencies without manually searching DAG source code.
On OSS Airflow
Use DAG source code and task logs to trace lineage (no built-in cross-DAG UI).
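When tracing by hand from DAG source, a quick scan for write statements can narrow down which table a DAG populates. The sketch below is a heuristic under stated assumptions: `find_target_tables` is a hypothetical helper, the sample source is made up, and a regex will miss dynamically built SQL or dialects that write `MERGE` without `INTO`.

```python
import re

# Captures the table named after INSERT INTO, MERGE INTO, or CREATE TABLE.
WRITE_PATTERN = re.compile(
    r"\b(?:INSERT\s+INTO|MERGE\s+INTO|"
    r"CREATE\s+(?:OR\s+REPLACE\s+)?TABLE(?:\s+IF\s+NOT\s+EXISTS)?)"
    r"\s+([\w.]+)",
    re.IGNORECASE,
)

def find_target_tables(dag_source):
    """Return the set of tables the DAG source appears to write to."""
    return {m.group(1).lower() for m in WRITE_PATTERN.finditer(dag_source)}

# Hypothetical DAG source, e.g. text shown by `af dags source <dag_id>`
dag_source = '''
sql = """
    INSERT INTO analytics.orders_daily
    SELECT order_date, SUM(amount) FROM raw.orders GROUP BY order_date
"""
'''
print(find_target_tables(dag_source))
```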
Step 3: Trace Data Sources
From the DAG code, identify source tables and systems:
SQL Sources (look for FROM clauses):
    # In DAG code:
    SELECT * FROM source_schema.source_table  # <- This is an upstream source
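Pulling table names out of FROM and JOIN clauses can be roughly automated. The following is a sketch, assuming plain SQL strings. `find_source_tables` is a hypothetical helper, and the regex will misreport CTE names or constructs such as `EXTRACT(day FROM ts)`, so a real SQL parser is the safer choice for complex queries.

```python
import re

# A table reference is the identifier that follows FROM or JOIN.
SOURCE_PATTERN = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)

def find_source_tables(sql):
    """Return tables referenced after FROM or JOIN, in order of first appearance."""
    seen = []
    for name in SOURCE_PATTERN.findall(sql):
        if name.lower() not in seen:
            seen.append(name.lower())
    return seen

# Hypothetical query taken from a DAG definition
sql = """
    SELECT o.id, c.name
    FROM raw.orders AS o
    JOIN dim.customers AS c ON o.customer_id = c.id
"""
print(find_source_tables(sql))
```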
External Sources (look for connection references):
- `S3Operator` -> S3 bucket source
- `PostgresOperator` -> Postgres database source
- `SalesforceOperator` -> Salesforce API source
- `HttpOperator` -> REST API source
File Sources:
- CSV/Parquet files in object storage
- SFTP drops
- Local file paths
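The operator-to-source mapping listed under External Sources can be applied mechanically to DAG source text. This is an illustrative sketch only: the mapping repeats the operator names used above, real DAGs may import differently named classes, and `external_sources` and the sample source are hypothetical.

```python
# Operator names as listed in this skill; extend for the providers you use.
OPERATOR_SOURCES = {
    "S3Operator": "S3 bucket",
    "PostgresOperator": "Postgres database",
    "SalesforceOperator": "Salesforce API",
    "HttpOperator": "REST API",
}

def external_sources(dag_source):
    """Return the kinds of external systems suggested by operator names."""
    return sorted(
        {kind for op, kind in OPERATOR_SOURCES.items() if op in dag_source}
    )

# Hypothetical DAG source text
dag_source = (
    'extract = SalesforceOperator(task_id="extract")\n'
    'load = PostgresOperator(task_id="load")\n'
)
print(external_sources(dag_source))
```

The match is a plain substring test, so it only flags candidates. Check the connection id in the operator arguments to see which system is actually read.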
Step 4: Build the Lineage Chain
Recursively trace each source:
    TARGET: analytics.orders_daily
        ^
        +-- DAG: etl_daily_orders
              ^
              +-- SOURCE: raw.orders (table)
              |     ^
              |     +-- DAG: ingest_orders
              |           ^
              |           +-- SOURCE: Salesforce API (external)
              |
              +-- SOURCE: dim.customers (table)
                    ^
                    +-- DAG: load_customers
                          ^
                          +-- SOURCE: PostgreSQL (external DB)
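The recursion can be sketched as follows, assuming the facts from Steps 2 and 3 have already been collected into two dictionaries. `PRODUCED_BY`, `READS_FROM`, and `trace` are hypothetical names for illustration. The data mirrors the example chain above.

```python
# Table -> DAG that populates it (findings from Step 2)
PRODUCED_BY = {
    "analytics.orders_daily": "etl_daily_orders",
    "raw.orders": "ingest_orders",
    "dim.customers": "load_customers",
}
# DAG -> sources it reads (findings from Step 3)
READS_FROM = {
    "etl_daily_orders": ["raw.orders", "dim.customers"],
    "ingest_orders": ["Salesforce API"],
    "load_customers": ["PostgreSQL"],
}

def trace(node, depth=0, path=()):
    """Return indented lines describing everything upstream of `node`."""
    indent = "  " * depth
    label = "TARGET" if depth == 0 else "SOURCE"
    if node in path:
        # Guard against circular dependencies between tables
        return [f"{indent}{label}: {node} (cycle, stopping)"]
    dag = PRODUCED_BY.get(node)
    if dag is None:
        # No known producing DAG: treat as an external system
        return [f"{indent}{label}: {node} (external)"]
    lines = [f"{indent}{label}: {node} (table)", f"{indent}  DAG: {dag}"]
    for source in READS_FROM.get(dag, []):
        lines.extend(trace(source, depth + 2, path + (node,)))
    return lines

print("\n".join(trace("analytics.orders_daily")))
```

Any node without a known producing DAG is reported as external, which also marks where the manual investigation should continue or stop.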
Step 5: Check Source Health
For each upstream source:
- Tables: Check freshness with the checking-freshness skill
- DAGs: Check recent run status with `af dags stats`
- External systems: Note connection info from DAG code
Lineage for Columns
When tracing a specific column:
- Find the column in the target table schema
- Search DAG source code for references to that column name
- Trace through transformations:
  - Direct mappings: `source.col AS target_col`
  - Transformations: `COALESCE(a.col, b.col) AS target_col`
  - Aggregations: `SUM(detail.amount) AS total_amount`
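To find the expression behind a column, one option is to split the SELECT list on top-level commas and look for the alias. The sketch below assumes a single simple SELECT statement. `split_top_level`, `column_origin`, and the sample query are hypothetical, and nested subqueries or `EXTRACT(... FROM ...)` in the select list would need a real SQL parser.

```python
import re

def split_top_level(select_list):
    """Split on commas that are not inside parentheses."""
    items, depth, current = [], 0, []
    for ch in select_list:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            items.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    items.append("".join(current).strip())
    return items

def column_origin(sql, target_col):
    """Return the expression aliased to `target_col`, or None if not found."""
    match = re.search(r"SELECT\s+(.*?)\s+FROM\b", sql, re.IGNORECASE | re.DOTALL)
    if not match:
        return None
    alias = re.compile(
        r"^(.*?)\s+AS\s+" + re.escape(target_col) + r"$",
        re.IGNORECASE | re.DOTALL,
    )
    for item in split_top_level(match.group(1)):
        found = alias.match(item)
        if found:
            return found.group(1).strip()
    return None

# Hypothetical transformation query from a DAG
sql = """
    SELECT o.order_id AS order_id,
           COALESCE(o.email, c.email) AS contact_email,
           SUM(o.amount) AS total_amount
    FROM raw.orders AS o
    JOIN dim.customers AS c ON o.customer_id = c.id
    GROUP BY o.order_id, COALESCE(o.email, c.email)
"""
print(column_origin(sql, "contact_email"))
print(column_origin(sql, "total_amount"))
```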
Output: Lineage Report
Summary
One-line answer: "This table is populated by DAG X from sources Y and Z"
Lineage Diagram
    [Salesforce] --> [raw.opportunities] --> [stg.opportunities] --> [fct.sales]
                            |                         |
                      DAG: ingest_sfdc         DAG: transform_sales
Source Details
| Source | Type | Connection | Freshness | Owner |
|---|---|---|---|---|
| raw.orders | Table | Internal | 2h ago | data-team |
| Salesforce | API | salesforce_conn | Real-time | sales-ops |
Transformation Chain
Describe how data flows and transforms:
- Raw data lands in `raw.orders` via Salesforce API sync
- DAG `transform_orders` cleans and dedupes into `stg.orders`
- DAG `build_order_facts` joins with dimensions into `fct.orders`
Data Quality Implications
- Single points of failure?
- Stale upstream sources?
- Complex transformation chains that could break?
Related Skills
- Check source freshness: checking-freshness skill
- Debug source DAG: debugging-dags skill
- Trace downstream impacts: tracing-downstream-lineage skill
- Add manual lineage annotations: annotating-task-lineage skill
- Build custom lineage extractors: creating-openlineage-extractors skill