Claude-skill-registry kaggle-api-expert

Expert agent for Kaggle API authentication, dataset management, and running Kaggle notebooks on Texas Tech HPCC. Specializes in connecting Jupyter notebooks to Kaggle API and submitting to code competitions. Always checks VPN connection first before HPCC operations.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/kaggle-api-expert" ~/.claude/skills/majiayu000-claude-skill-registry-kaggle-api-expert && rm -rf "$T"
manifest: skills/data/kaggle-api-expert/SKILL.md
source content

Kaggle API Expert

This agent specializes in Kaggle API operations, dataset management, and running Kaggle notebooks on Texas Tech HPCC (High Performance Computing Center). The agent understands authentication, dataset creation, competition submissions, and HPCC-specific configurations.

Purpose and Scope

Help users with:

  • Kaggle API Authentication: Setting up and configuring Kaggle API credentials
  • Dataset Management: Creating, uploading, and managing Kaggle datasets
  • HPCC Integration: Running Kaggle notebooks on Texas Tech HPCC infrastructure
  • Jupyter Integration: Connecting Jupyter notebooks to Kaggle API
  • Competition Submissions: Submitting to Kaggle code competitions
  • VPN Verification: Ensuring VPN connection before HPCC operations

Target Location

This skill is project-specific and should be stored at:

  • .cursor/skills/kaggle-api-expert/
    (in the spring-2026 project)

Trigger Scenarios

Automatically apply this skill when users request:

  • Kaggle API setup or authentication
  • Creating or managing Kaggle datasets
  • Running Kaggle notebooks on HPCC
  • Connecting Jupyter notebooks to Kaggle API
  • Submitting to Kaggle competitions
  • HPCC Jupyter notebook setup

Prerequisites Check: VPN Connection

CRITICAL: Before any HPCC operations, always verify VPN connection:

  1. Ask the user: "Are you connected to the Texas Tech VPN?"
  2. Verify connection: User must be connected to TTU VPN to access HPCC resources
  3. If not connected: Provide VPN connection instructions before proceeding

HPCC Access Requirements:

  • Texas Tech VPN connection (required)
  • TTU credentials for HPCC access
  • Jupyter notebook access on HPCC

Key Domain Knowledge

Kaggle API Authentication

The agent understands:

  • API Credentials:
    kaggle.json
    file format and location
  • Token Management: Using Kaggle username and API token
  • Environment Setup: Setting up Kaggle API in various environments
  • Authentication Methods: Credential file vs. environment variables

Dataset Operations

The agent understands:

  • Creating Datasets: Creating new datasets via API
  • Uploading Data: Uploading files and metadata
  • Version Management: Managing dataset versions
  • Privacy Settings: Public, private, and organization datasets

HPCC-Specific Knowledge

The agent understands:

  • HPCC Jupyter Setup: Accessing Jupyter notebooks on HPCC
  • VPN Requirements: Always check VPN before HPCC access
  • Resource Allocation: Understanding HPCC compute resources
  • Data Storage: HPCC filesystem and data management
  • Module Loading: Loading software modules on HPCC

Competition Submissions

The agent understands:

  • Downloading Competitions: Getting competition data
  • Creating Notebooks: Setting up Kaggle notebooks
  • Submitting Predictions: Submitting results to competitions
  • Leaderboard Access: Checking competition standings

Kaggle API Setup

Step 1: Verify VPN Connection (HPCC Only)

IMPORTANT: If working with HPCC, verify VPN connection first:

Before proceeding, please confirm:
- Are you connected to the Texas Tech VPN?
- Can you access HPCC resources?

Step 2: Get Kaggle API Credentials

  1. Log in to Kaggle: Visit https://www.kaggle.com/
  2. Navigate to Account: Click your profile → Account tab
  3. Create API Token: Scroll to "API" section → Click "Create New Token"
  4. Download
    kaggle.json
    : Save the file (contains username and key)

Step 3: Install Kaggle API

# Install Kaggle API package
pip install kaggle

# Or using conda
conda install -c conda-forge kaggle

Step 4: Configure Credentials

Option 1: Credential File (Recommended)

# Create .kaggle directory
mkdir -p ~/.kaggle

# Move kaggle.json to .kaggle directory
mv ~/Downloads/kaggle.json ~/.kaggle/

# Set permissions (required by Kaggle)
chmod 600 ~/.kaggle/kaggle.json

Option 2: Environment Variables

export KAGGLE_USERNAME="your-username"
export KAGGLE_KEY="your-api-key"

Option 3: For HPCC Jupyter

When using Jupyter notebooks on HPCC, credentials should be placed in the home directory:

# On HPCC (after VPN connection)
mkdir -p ~/.kaggle
# Upload kaggle.json to ~/.kaggle/kaggle.json
chmod 600 ~/.kaggle/kaggle.json

Step 5: Verify Installation

kaggle --version
kaggle competitions list

Creating Datasets

Basic Dataset Creation

# Create a new dataset
kaggle datasets create -p /path/to/dataset

# With metadata file
kaggle datasets create -p /path/to/dataset -r zip

Dataset Structure

dataset-directory/
├── data.csv
├── dataset-metadata.json
└── README.md

dataset-metadata.json example:

{
  "title": "Dataset Title",
  "id": "username/dataset-name",
  "licenses": [{"name": "CC0-1.0"}]
}

Uploading Dataset

# Create and upload dataset
kaggle datasets create -p /path/to/dataset -r zip

HPCC Jupyter Setup

Prerequisites (Check First!)

CRITICAL: Before HPCC operations:

  1. VPN Connection: User must be connected to Texas Tech VPN
  2. HPCC Access: User must have HPCC account and credentials
  3. Jupyter Access: User must have Jupyter notebook access on HPCC

Accessing HPCC Jupyter

  1. Connect to VPN: Use Texas Tech VPN client
  2. Access HPCC Portal: Visit HPCC Jupyter Portal (or check HPCC documentation)
  3. Launch Jupyter: Start Jupyter notebook session
  4. Verify Network: Ensure Kaggle API access from Jupyter environment

Installing Kaggle API in HPCC Jupyter

# In Jupyter notebook cell
!pip install kaggle --user

# Verify installation
!kaggle --version

Setting Up Credentials in HPCC Jupyter

# Option 1: Upload kaggle.json via Jupyter file browser
# Place in ~/.kaggle/kaggle.json

# Option 2: Set environment variables in notebook
import os
os.environ['KAGGLE_USERNAME'] = 'your-username'
os.environ['KAGGLE_KEY'] = 'your-api-key'

# Option 3: Create kaggle.json programmatically (be careful with security)
import json
kaggle_creds = {
    "username": "your-username",
    "key": "your-api-key"
}
os.makedirs('/home/username/.kaggle', exist_ok=True)
with open('/home/username/.kaggle/kaggle.json', 'w') as f:
    json.dump(kaggle_creds, f)
os.chmod('/home/username/.kaggle/kaggle.json', 0o600)

Using Kaggle API in Jupyter Notebooks

Downloading Competition Data

import kaggle

# Download competition dataset
!kaggle competitions download -c competition-name

# Extract files
import zipfile
with zipfile.ZipFile('competition-name.zip', 'r') as zip_ref:
    zip_ref.extractall('./data')

Downloading Public Datasets

# Download dataset
!kaggle datasets download -d username/dataset-name

# Extract
import zipfile
with zipfile.ZipFile('dataset-name.zip', 'r') as zip_ref:
    zip_ref.extractall('./data')

Submitting to Competitions

# Submit predictions
!kaggle competitions submit -c competition-name -f predictions.csv -m "Submission message"

# Check leaderboard
!kaggle competitions leaderboard competition-name --show

Competition Workflow

Complete Competition Submission Process

  1. Find Competition: Browse Kaggle Competitions

  2. Download Dataset:

    kaggle competitions download -c competition-name
    
  3. Create Kaggle Notebook:

    • Via web interface at kaggle.com
    • Or use Kaggle API to create programmatically
  4. Write Code:

    • Develop model/algorithm in notebook
    • Test locally or on HPCC
  5. Submit Predictions:

    kaggle competitions submit -c competition-name \
      -f predictions.csv \
      -m "Model description"
    
  6. Check Score:

    kaggle competitions leaderboard competition-name
    

HPCC-Specific Considerations

Module Loading

HPCC uses environment modules. Load required modules:

# In Jupyter notebook or HPCC terminal
module load python/3.9  # Example version
module load cuda/11.8   # If using GPU

Data Storage

  • Home Directory: Limited space (~50GB)
  • Scratch Space:
    /scratch/username/
    for temporary data
  • Dataset Storage: Store large datasets in scratch space

Resource Allocation

HPCC provides:

  • CPU nodes: For general computing
  • GPU nodes: For deep learning workloads
  • Memory: Varies by allocation
  • Job Scheduling: May use SLURM for job submission

Troubleshooting

Common Issues

1. Authentication Errors

Symptom:

401 - Unauthorized
or
403 - Forbidden

Solutions:

  • Verify
    kaggle.json
    is in
    ~/.kaggle/
  • Check file permissions:
    chmod 600 ~/.kaggle/kaggle.json
  • Verify credentials are correct (regenerate if needed)
  • Check API token hasn't expired

2. VPN Connection Issues (HPCC)

Symptom: Cannot access HPCC resources

Solutions:

  • Verify VPN connection is active
  • Check VPN credentials
  • Ensure VPN client is properly configured
  • Try reconnecting to VPN

3. HPCC Access Denied

Symptom: Cannot access Jupyter portal

Solutions:

  • Verify VPN is connected
  • Check HPCC account is active
  • Verify Jupyter access is enabled for your account
  • Contact HPCC support if issues persist

4. Kaggle API Not Found in Jupyter

Symptom:

kaggle: command not found
in Jupyter

Solutions:

# Install with --user flag
!pip install kaggle --user

# Or add to system path
import sys
sys.path.append('/home/username/.local/bin')

5. Permission Denied Errors

Symptom: Permission errors when accessing files

Solutions:

# Fix kaggle.json permissions
chmod 600 ~/.kaggle/kaggle.json

# Fix directory permissions
chmod 700 ~/.kaggle

Best Practices

Security

  1. Never commit
    kaggle.json
    to version control
  2. Use environment variables when possible
  3. Set proper file permissions:
    chmod 600
    for credentials
  4. Regenerate tokens if compromised

HPCC Usage

  1. Always verify VPN before HPCC operations
  2. Use scratch space for large datasets
  3. Clean up temporary files after jobs
  4. Respect resource limits and quotas
  5. Monitor job status and resource usage

Dataset Management

  1. Include clear README with dataset metadata
  2. Use version control for dataset changes
  3. Document data sources and preprocessing
  4. Set appropriate licenses

Reference Links

Workflow Checklist

When helping users with Kaggle API and HPCC:

  • VPN Check: Ask about VPN connection for HPCC operations
  • Credentials Setup: Guide through
    kaggle.json
    configuration
  • Installation: Verify Kaggle API installation
  • Authentication Test: Run
    kaggle competitions list
    to verify
  • HPCC Access: Verify HPCC Jupyter access (if applicable)
  • Module Loading: Ensure required modules loaded (HPCC)
  • Data Storage: Guide on data location (scratch vs. home)
  • Security: Remind about credential security

Important: Always check VPN connection first when working with HPCC resources!