Claude-skill-registry kaggle-api-expert
Expert agent for Kaggle API authentication, dataset management, and running Kaggle notebooks on the Texas Tech HPCC. Specializes in connecting Jupyter notebooks to the Kaggle API and submitting to code competitions. Always checks the VPN connection before HPCC operations.
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/kaggle-api-expert" ~/.claude/skills/majiayu000-claude-skill-registry-kaggle-api-expert && rm -rf "$T"
skills/data/kaggle-api-expert/SKILL.md
Kaggle API Expert
This agent specializes in Kaggle API operations, dataset management, and running Kaggle notebooks on Texas Tech HPCC (High Performance Computing Center). The agent understands authentication, dataset creation, competition submissions, and HPCC-specific configurations.
Purpose and Scope
Help users with:
- Kaggle API Authentication: Setting up and configuring Kaggle API credentials
- Dataset Management: Creating, uploading, and managing Kaggle datasets
- HPCC Integration: Running Kaggle notebooks on Texas Tech HPCC infrastructure
- Jupyter Integration: Connecting Jupyter notebooks to Kaggle API
- Competition Submissions: Submitting to Kaggle code competitions
- VPN Verification: Ensuring VPN connection before HPCC operations
Target Location
This skill is project-specific and should be stored at:
.cursor/skills/kaggle-api-expert/ (in the spring-2026 project)
Trigger Scenarios
Automatically apply this skill when users request:
- Kaggle API setup or authentication
- Creating or managing Kaggle datasets
- Running Kaggle notebooks on HPCC
- Connecting Jupyter notebooks to Kaggle API
- Submitting to Kaggle competitions
- HPCC Jupyter notebook setup
Prerequisites Check: VPN Connection
CRITICAL: Before any HPCC operations, always verify VPN connection:
- Ask the user: "Are you connected to the Texas Tech VPN?"
- Verify connection: User must be connected to TTU VPN to access HPCC resources
- If not connected: Provide VPN connection instructions before proceeding
HPCC Access Requirements:
- Texas Tech VPN connection (required)
- TTU credentials for HPCC access
- Jupyter notebook access on HPCC
Key Domain Knowledge
Kaggle API Authentication
The agent understands:
- API Credentials: kaggle.json file format and location
- Token Management: Using Kaggle username and API token
- Environment Setup: Setting up Kaggle API in various environments
- Authentication Methods: Credential file vs. environment variables
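The credential lookup can be sketched with the standard library alone. This is an illustrative helper, not part of the Kaggle package: the name `load_kaggle_credentials` and the environment-variables-first ordering are our own choices, though the official client resolves credentials from the same two sources.

```python
import json
import os
from pathlib import Path

def load_kaggle_credentials(cred_path=None):
    """Return (username, key) from KAGGLE_USERNAME/KAGGLE_KEY or kaggle.json.

    Environment variables win if both are set; otherwise the credential
    file (default ~/.kaggle/kaggle.json) is read.
    """
    username = os.environ.get("KAGGLE_USERNAME")
    key = os.environ.get("KAGGLE_KEY")
    if username and key:
        return username, key
    path = Path(cred_path) if cred_path else Path.home() / ".kaggle" / "kaggle.json"
    with open(path) as f:
        creds = json.load(f)
    return creds["username"], creds["key"]
```

This is useful in notebooks for failing fast with a clear error before any `kaggle` command runs.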
Dataset Operations
The agent understands:
- Creating Datasets: Creating new datasets via API
- Uploading Data: Uploading files and metadata
- Version Management: Managing dataset versions
- Privacy Settings: Public, private, and organization datasets
HPCC-Specific Knowledge
The agent understands:
- HPCC Jupyter Setup: Accessing Jupyter notebooks on HPCC
- VPN Requirements: Always check VPN before HPCC access
- Resource Allocation: Understanding HPCC compute resources
- Data Storage: HPCC filesystem and data management
- Module Loading: Loading software modules on HPCC
Competition Submissions
The agent understands:
- Downloading Competitions: Getting competition data
- Creating Notebooks: Setting up Kaggle notebooks
- Submitting Predictions: Submitting results to competitions
- Leaderboard Access: Checking competition standings
Kaggle API Setup
Step 1: Verify VPN Connection (HPCC Only)
IMPORTANT: If working with HPCC, verify VPN connection first:
Before proceeding, please confirm:
- Are you connected to the Texas Tech VPN?
- Can you access HPCC resources?
Step 2: Get Kaggle API Credentials
- Log in to Kaggle: Visit https://www.kaggle.com/
- Navigate to Account: Click your profile → Account tab
- Create API Token: Scroll to "API" section → Click "Create New Token"
- Download kaggle.json: Save the file (it contains your username and API key)
Step 3: Install Kaggle API
```shell
# Install the Kaggle API package
pip install kaggle

# Or using conda
conda install -c conda-forge kaggle
```
Step 4: Configure Credentials
Option 1: Credential File (Recommended)
```shell
# Create the .kaggle directory
mkdir -p ~/.kaggle

# Move kaggle.json into it
mv ~/Downloads/kaggle.json ~/.kaggle/

# Set permissions (required by Kaggle)
chmod 600 ~/.kaggle/kaggle.json
```
Option 2: Environment Variables
```shell
export KAGGLE_USERNAME="your-username"
export KAGGLE_KEY="your-api-key"
```
Option 3: For HPCC Jupyter
When using Jupyter notebooks on HPCC, credentials should be placed in the home directory:
```shell
# On HPCC (after connecting to the VPN)
mkdir -p ~/.kaggle
# Upload kaggle.json to ~/.kaggle/kaggle.json, then:
chmod 600 ~/.kaggle/kaggle.json
```
Step 5: Verify Installation
```shell
kaggle --version
kaggle competitions list
```
Creating Datasets
Basic Dataset Creation
```shell
# Create a new dataset from a directory
kaggle datasets create -p /path/to/dataset

# Upload directories as zip archives (-r controls directory handling)
kaggle datasets create -p /path/to/dataset -r zip
```
Dataset Structure
```
dataset-directory/
├── data.csv
├── dataset-metadata.json
└── README.md
```
dataset-metadata.json example:
```json
{
  "title": "Dataset Title",
  "id": "username/dataset-name",
  "licenses": [{"name": "CC0-1.0"}]
}
```
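This metadata file can also be generated programmatically. The helper below is a sketch of our own (`write_dataset_metadata` is not a Kaggle API function); it just writes the minimal fields shown above:

```python
import json
from pathlib import Path

def write_dataset_metadata(directory, title, dataset_id, license_name="CC0-1.0"):
    """Write a minimal dataset-metadata.json into `directory`.

    `dataset_id` must look like "username/dataset-name".
    Returns the path of the written file.
    """
    owner, _, slug = dataset_id.partition("/")
    if not owner or not slug:
        raise ValueError("dataset_id must look like 'username/dataset-name'")
    meta = {
        "title": title,
        "id": dataset_id,
        "licenses": [{"name": license_name}],
    }
    path = Path(directory) / "dataset-metadata.json"
    path.write_text(json.dumps(meta, indent=2))
    return path
```

Run it on the dataset directory right before `kaggle datasets create -p` so the file and folder contents stay in sync.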
Uploading Dataset
```shell
# Create and upload the dataset (directories zipped)
kaggle datasets create -p /path/to/dataset -r zip
```
HPCC Jupyter Setup
Prerequisites (Check First!)
CRITICAL: Before HPCC operations:
- ✅ VPN Connection: User must be connected to Texas Tech VPN
- ✅ HPCC Access: User must have HPCC account and credentials
- ✅ Jupyter Access: User must have Jupyter notebook access on HPCC
Accessing HPCC Jupyter
- Connect to VPN: Use Texas Tech VPN client
- Access HPCC Portal: Visit HPCC Jupyter Portal (or check HPCC documentation)
- Launch Jupyter: Start Jupyter notebook session
- Verify Network: Ensure Kaggle API access from Jupyter environment
Installing Kaggle API in HPCC Jupyter
```python
# In a Jupyter notebook cell
!pip install kaggle --user

# Verify installation
!kaggle --version
```
Setting Up Credentials in HPCC Jupyter
```python
# Option 1: Upload kaggle.json via the Jupyter file browser
#           and place it at ~/.kaggle/kaggle.json

# Option 2: Set environment variables in the notebook
import os
os.environ['KAGGLE_USERNAME'] = 'your-username'
os.environ['KAGGLE_KEY'] = 'your-api-key'

# Option 3: Create kaggle.json programmatically (be careful with security)
import json

kaggle_creds = {
    "username": "your-username",
    "key": "your-api-key"
}
kaggle_dir = os.path.expanduser('~/.kaggle')  # avoids hardcoding the username
os.makedirs(kaggle_dir, exist_ok=True)
cred_file = os.path.join(kaggle_dir, 'kaggle.json')
with open(cred_file, 'w') as f:
    json.dump(kaggle_creds, f)
os.chmod(cred_file, 0o600)
```
Using Kaggle API in Jupyter Notebooks
Downloading Competition Data
```python
import kaggle  # authenticates on import using your credentials

# Download competition dataset
!kaggle competitions download -c competition-name

# Extract files
import zipfile
with zipfile.ZipFile('competition-name.zip', 'r') as zip_ref:
    zip_ref.extractall('./data')
```
Downloading Public Datasets
```python
# Download a dataset
!kaggle datasets download -d username/dataset-name

# Extract
import zipfile
with zipfile.ZipFile('dataset-name.zip', 'r') as zip_ref:
    zip_ref.extractall('./data')
```
Submitting to Competitions
```python
# Submit predictions
!kaggle competitions submit -c competition-name -f predictions.csv -m "Submission message"

# Check leaderboard
!kaggle competitions leaderboard competition-name --show
```
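It can save a wasted submission to sanity-check predictions.csv against the competition's sample submission before running `kaggle competitions submit`. The sketch below assumes both files are plain CSVs with a header row (the usual convention); `validate_submission` is a hypothetical helper of ours, not a Kaggle API call:

```python
import csv

def validate_submission(pred_path, sample_path):
    """Compare a predictions CSV against the sample submission.

    Returns a list of problems; an empty list means the file shape
    matches (same header, same row count).
    """
    problems = []
    with open(sample_path, newline="") as f:
        sample_rows = list(csv.reader(f))
    with open(pred_path, newline="") as f:
        pred_rows = list(csv.reader(f))
    if pred_rows[0] != sample_rows[0]:
        problems.append(f"header mismatch: {pred_rows[0]} vs {sample_rows[0]}")
    if len(pred_rows) != len(sample_rows):
        problems.append(
            f"row count {len(pred_rows) - 1}, expected {len(sample_rows) - 1}"
        )
    return problems
```

Shape checks catch the most common rejection reasons (wrong columns, missing rows); they do not validate the prediction values themselves.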
Competition Workflow
Complete Competition Submission Process
- Find Competition: Browse Kaggle Competitions
- Download Dataset:

```shell
kaggle competitions download -c competition-name
```

- Create Kaggle Notebook:
  - Via the web interface at kaggle.com
  - Or use the Kaggle API to create one programmatically
- Write Code:
  - Develop the model/algorithm in the notebook
  - Test locally or on HPCC
- Submit Predictions:

```shell
kaggle competitions submit -c competition-name \
  -f predictions.csv \
  -m "Model description"
```

- Check Score:

```shell
kaggle competitions leaderboard competition-name
```
HPCC-Specific Considerations
Module Loading
HPCC uses environment modules. Load required modules:
```shell
# In a Jupyter notebook or HPCC terminal
module load python/3.9   # example version
module load cuda/11.8    # if using GPU
```
Data Storage
- Home Directory: Limited space (~50GB)
- Scratch Space: /scratch/username/ for temporary data
- Dataset Storage: Store large datasets in scratch space
Resource Allocation
HPCC provides:
- CPU nodes: For general computing
- GPU nodes: For deep learning workloads
- Memory: Varies by allocation
- Job Scheduling: May use SLURM for job submission
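For allocations that require batch jobs rather than interactive Jupyter, a SLURM script looks roughly like the following. This is a sketch only: the partition name, module versions, and paths are placeholders, so check the HPCC user guide and `module avail` for the real values at your site.

```shell
#!/bin/bash
#SBATCH --job-name=kaggle-train
#SBATCH --partition=your-partition   # placeholder: use your site's partition name
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH --time=04:00:00
#SBATCH --output=%x-%j.out

module load python/3.9               # example version; check `module avail`
python train.py --data /scratch/$USER/competition-data
```

Submit with `sbatch script.sh` and monitor with `squeue -u $USER`.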
Troubleshooting
Common Issues
1. Authentication Errors
Symptom: 401 - Unauthorized or 403 - Forbidden
Solutions:
- Verify kaggle.json is in ~/.kaggle/
- Check file permissions: chmod 600 ~/.kaggle/kaggle.json
- Verify credentials are correct (regenerate if needed)
- Check API token hasn't expired
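These checks can be bundled into a single diagnostic. The function below is our own sketch (`check_kaggle_creds` is not a Kaggle command); it verifies the file exists, has 600 permissions, and parses as JSON with the expected keys:

```shell
# check_kaggle_creds: sanity-check a kaggle.json (defaults to ~/.kaggle/kaggle.json)
check_kaggle_creds() {
  f="${1:-$HOME/.kaggle/kaggle.json}"
  if [ ! -f "$f" ]; then
    echo "missing: $f"
    return 1
  fi
  # GNU stat first, BSD/macOS stat as fallback
  perms=$(stat -c %a "$f" 2>/dev/null || stat -f %Lp "$f")
  if [ "$perms" != "600" ]; then
    echo "warning: permissions are $perms, expected 600"
  fi
  if python3 -c "import json,sys; d=json.load(open(sys.argv[1])); sys.exit(0 if {'username','key'} <= d.keys() else 1)" "$f"; then
    echo "ok: $f has username and key"
  else
    echo "invalid: $f is not a valid kaggle.json"
    return 1
  fi
}
```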
2. VPN Connection Issues (HPCC)
Symptom: Cannot access HPCC resources
Solutions:
- Verify VPN connection is active
- Check VPN credentials
- Ensure VPN client is properly configured
- Try reconnecting to VPN
3. HPCC Access Denied
Symptom: Cannot access Jupyter portal
Solutions:
- Verify VPN is connected
- Check HPCC account is active
- Verify Jupyter access is enabled for your account
- Contact HPCC support if issues persist
4. Kaggle API Not Found in Jupyter
Symptom: kaggle: command not found in Jupyter
Solutions:
```python
# Install with the --user flag
!pip install kaggle --user

# If the CLI still isn't found, add ~/.local/bin to the shell PATH
# (sys.path only affects Python imports, not shell commands)
import os
os.environ['PATH'] += os.pathsep + os.path.expanduser('~/.local/bin')
```
5. Permission Denied Errors
Symptom: Permission errors when accessing files
Solutions:
```shell
# Fix kaggle.json permissions
chmod 600 ~/.kaggle/kaggle.json

# Fix directory permissions
chmod 700 ~/.kaggle
```
Best Practices
Security
- Never commit kaggle.json to version control
- Use environment variables when possible
- Set proper file permissions: chmod 600 for credentials
- Regenerate tokens if compromised
HPCC Usage
- Always verify VPN before HPCC operations
- Use scratch space for large datasets
- Clean up temporary files after jobs
- Respect resource limits and quotas
- Monitor job status and resource usage
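Scratch cleanup can be scripted. The sketch below lists (or, when asked, deletes) files older than N days under a directory, with dry-run as the default so nothing is removed by accident; `clean_scratch` is an illustrative helper of ours, and the scratch path should be adjusted for your site:

```shell
# clean_scratch DIR [DAYS] [delete]: list files older than DAYS (default 30);
# pass "delete" as the third argument to actually remove them
clean_scratch() {
  dir="$1"; days="${2:-30}"; mode="${3:-dry-run}"
  if [ "$mode" = "delete" ]; then
    find "$dir" -type f -mtime +"$days" -delete
  else
    find "$dir" -type f -mtime +"$days"
  fi
}
```

Typical usage: `clean_scratch /scratch/$USER 30` to preview, then rerun with `delete` once the listing looks right.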
Dataset Management
- Include clear README with dataset metadata
- Use version control for dataset changes
- Document data sources and preprocessing
- Set appropriate licenses
Reference Links
- Kaggle API Documentation: https://github.com/Kaggle/kaggle-api
- HPCC Jupyter Guide: https://www.depts.ttu.edu/hpcc/userguides/general_guides/jupyter-notebooks.php
- Kaggle Tutorials: https://github.com/Kaggle/kaggle-api/blob/main/docs/tutorials.md
- Kaggle Competitions: https://www.kaggle.com/competitions
Workflow Checklist
When helping users with Kaggle API and HPCC:
- VPN Check: Ask about VPN connection for HPCC operations
- Credentials Setup: Guide through kaggle.json configuration
- Installation: Verify Kaggle API installation
- Authentication Test: Run kaggle competitions list to verify
- HPCC Access: Verify HPCC Jupyter access (if applicable)
- Module Loading: Ensure required modules loaded (HPCC)
- Data Storage: Guide on data location (scratch vs. home)
- Security: Remind about credential security
Important: Always check VPN connection first when working with HPCC resources!