Claude-skill-registry career-growth
Portfolio building, technical interviews, job search strategies, and continuous learning
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/career-growth" ~/.claude/skills/majiayu000-claude-skill-registry-career-growth && rm -rf "$T"
manifest:
skills/data/career-growth/SKILL.md
source content
Career Growth
Professional development strategies for data engineering career advancement.
Quick Start
```markdown
# Data Engineer Portfolio Checklist

## Required Projects (Pick 3-5)

- [ ] End-to-end ETL pipeline (Airflow + dbt)
- [ ] Real-time streaming project (Kafka/Spark Streaming)
- [ ] Data warehouse design (Snowflake/BigQuery)
- [ ] ML pipeline with MLOps (MLflow)
- [ ] API for data access (FastAPI)

## Documentation Template

Each project should include:

1. Problem statement
2. Architecture diagram
3. Tech stack justification
4. Challenges & solutions
5. Results/metrics
6. GitHub link with clean code
```
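To make the first checklist item concrete, here is a minimal sketch of what the Airflow + dbt project skeleton can look like. It assumes Airflow 2.x with the TaskFlow API; the staging path, table name, and dbt project directory are hypothetical placeholders.

```python
# Minimal sketch of the "ETL pipeline" portfolio project, assuming Airflow 2.x
# with the TaskFlow API; paths and names below are hypothetical placeholders.
from datetime import datetime

from airflow.decorators import dag, task
from airflow.operators.bash import BashOperator


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def orders_etl():
    @task
    def extract() -> str:
        # Pull raw data to a staging location (stubbed here)
        return "s3://my-bucket/staging/orders.parquet"  # hypothetical path

    @task
    def load(path: str) -> None:
        # Load the staged file into the warehouse (stubbed here)
        print(f"COPY raw.orders FROM {path}")

    # Transform with dbt once the raw data has landed
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /opt/dbt",  # hypothetical dir
    )

    load(extract()) >> dbt_run


orders_etl()
```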
Core Concepts
1. Technical Interview Preparation
```python
# Common coding patterns for data engineering interviews
from collections import defaultdict
import time

# 1. SQL Window Functions
"""
Write a query to find the running total of sales by month,
and the percentage change from the previous month.
"""
sql = """
SELECT
    month,
    sales,
    SUM(sales) OVER (ORDER BY month) AS running_total,
    100.0 * (sales - LAG(sales) OVER (ORDER BY month))
        / NULLIF(LAG(sales) OVER (ORDER BY month), 0) AS pct_change
FROM monthly_sales
ORDER BY month;
"""


# 2. Data Processing - Find duplicates
def find_duplicates(data: list[dict], key: str) -> list[dict]:
    """Find duplicate records based on a key."""
    seen = {}
    duplicates = []
    for record in data:
        k = record[key]
        if k in seen:
            duplicates.append(record)
        else:
            seen[k] = record
    return duplicates


# 3. Implement a rate limiter (sliding window)
class RateLimiter:
    def __init__(self, max_requests: int, window_seconds: int):
        self.max_requests = max_requests
        self.window = window_seconds
        self.requests = defaultdict(list)

    def is_allowed(self, user_id: str) -> bool:
        now = time.time()
        # Drop timestamps that have fallen outside the window
        self.requests[user_id] = [
            t for t in self.requests[user_id] if now - t < self.window
        ]
        if len(self.requests[user_id]) < self.max_requests:
            self.requests[user_id].append(now)
            return True
        return False


# 4. Design question: data pipeline for e-commerce
"""
Requirements:
- Process 1M orders/day
- Real-time dashboard updates
- Historical analytics

Architecture:
1. Ingestion: Kafka for real-time events
2. Processing: Spark Streaming for aggregations
3. Storage: Delta Lake for ACID, Snowflake for analytics
4. Serving: Redis for real-time metrics, API for dashboards
"""
```
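A quick sanity check for the two helpers above; the sample records and limits are invented for illustration.

```python
# Demo of find_duplicates and RateLimiter (illustrative values only)
records = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@example.com"},
    {"id": 3, "email": "a@example.com"},  # duplicate email
]
print(find_duplicates(records, key="email"))  # -> [{'id': 3, ...}]

limiter = RateLimiter(max_requests=2, window_seconds=60)
print([limiter.is_allowed("user-1") for _ in range(3)])  # [True, True, False]
```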
2. Resume Optimization
```markdown
## Data Engineer Resume Template

### Summary

Data Engineer with X years of experience building scalable data pipelines
processing Y TB/day. Expert in [Spark/Airflow/dbt]. Reduced pipeline latency
by Z% at [Company].

### Experience Format (STAR Method)

**Senior Data Engineer** | Company | 2022-Present

- **Situation**: Legacy ETL system processing 500GB daily with 4-hour latency
- **Task**: Redesign for real-time analytics
- **Action**: Built Spark Streaming pipeline with Delta Lake, implemented incremental processing
- **Result**: Reduced latency to 5 minutes, cut infrastructure costs by 40%

### Skills Section

**Languages**: Python, SQL, Scala
**Frameworks**: Spark, Airflow, dbt, Kafka
**Databases**: PostgreSQL, Snowflake, MongoDB, Redis
**Cloud**: AWS (Glue, EMR, S3), GCP (BigQuery, Dataflow)
**Tools**: Docker, Kubernetes, Terraform, Git

### Quantify Everything

- "Built data pipeline" → "Built pipeline processing 2TB/day with 99.9% uptime"
- "Improved performance" → "Reduced query time from 30min to 30sec (60x improvement)"
```
3. Interview Questions to Ask
```markdown
## Questions for Data Engineering Interviews

### About the Team

- What does a typical data pipeline look like here?
- How do you handle data quality issues?
- What's the tech stack? Any planned migrations?

### About the Role

- What would success look like in 6 months?
- What's the biggest data challenge the team faces?
- How do data engineers collaborate with data scientists?

### About Engineering Practices

- How do you handle schema changes in production?
- What's your approach to testing data pipelines?
- How do you manage technical debt?

### Red Flags to Watch For

- "We don't have time for testing"
- "One person handles all the data infrastructure"
- "We're still on [very outdated technology]"
- Vague answers about on-call and incident response
```
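If the interviewer turns the testing question back on you, a concrete answer beats a slogan. Below is a minimal sketch of data-pipeline tests using pandas and pytest; `load_orders()` is a hypothetical stand-in for your pipeline's extract step.

```python
# Minimal data-quality tests, assuming pandas and pytest are installed;
# load_orders() is a hypothetical stand-in for a real extract step.
import pandas as pd


def load_orders() -> pd.DataFrame:
    # Replace with your pipeline's actual output
    return pd.DataFrame({"order_id": [1, 2, 3], "amount": [9.99, 0.0, 12.50]})


def test_primary_key_is_unique():
    df = load_orders()
    assert df["order_id"].is_unique


def test_no_negative_amounts():
    df = load_orders()
    assert (df["amount"] >= 0).all()


def test_required_columns_present():
    df = load_orders()
    assert {"order_id", "amount"} <= set(df.columns)
```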
4. Learning Path by Experience Level
```markdown
## Career Progression

### Junior (0-2 years)

Focus Areas:
- SQL proficiency (complex queries, optimization)
- Python for data processing
- One cloud platform deeply (AWS/GCP)
- Git and basic CI/CD
- Understanding ETL patterns

### Mid-Level (2-5 years)

Focus Areas:
- Distributed systems (Spark)
- Data modeling (dimensional, Data Vault)
- Orchestration (Airflow)
- Infrastructure as Code
- Data quality frameworks

### Senior (5+ years)

Focus Areas:
- System design and architecture
- Cost optimization at scale
- Team leadership and mentoring
- Cross-functional collaboration
- Vendor evaluation and selection

### Staff/Principal (8+ years)

Focus Areas:
- Organization-wide data strategy
- Building data platforms
- Technical roadmap ownership
- Industry thought leadership
```
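For the mid-level focus on dimensional data modeling, a star schema is the standard warm-up exercise. A sketch in generic SQL follows, wrapped in a Python string as in the interview snippets above; the table and column names are invented, and the DDL will need adjusting per warehouse.

```python
# Star-schema sketch for the "data modeling (dimensional)" focus area.
# Table and column names are invented; adjust types/DDL to your warehouse.
star_schema = """
CREATE TABLE dim_customer (
    customer_key INT PRIMARY KEY,      -- surrogate key
    customer_id  VARCHAR(64),          -- natural key from the source system
    name         VARCHAR(255),
    valid_from   DATE,                 -- SCD Type 2 validity window
    valid_to     DATE
);

CREATE TABLE fact_orders (
    order_key    BIGINT PRIMARY KEY,
    customer_key INT REFERENCES dim_customer (customer_key),
    order_date   DATE,
    amount       DECIMAL(12, 2)        -- additive measure
);
"""
```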
Resources
Learning Platforms
Interview Prep
Community
Books
- "Fundamentals of Data Engineering" - Reis & Housley
- "Designing Data-Intensive Applications" - Kleppmann
- "The Data Warehouse Toolkit" - Kimball
Best Practices
```markdown
# ✅ DO:
- Build public projects on GitHub
- Write technical blog posts
- Contribute to open source
- Network at meetups/conferences
- Keep skills current (follow trends)

# ❌ DON'T:
- Apply without tailoring resume
- Neglect soft skills
- Stop learning after getting hired
- Ignore feedback from interviews
- Burn bridges when leaving jobs
```
Skill Certification Checklist:
- [ ] Have 3+ portfolio projects on GitHub
- [ ] Can explain system design decisions
- [ ] Can solve SQL problems efficiently
- [ ] Have an updated LinkedIn and resume
- [ ] Active in the data engineering community