Claude-skill-registry apify-actor
Build and deploy Apify actors for web scraping and automation. Use for serverless scraping, data extraction, browser automation, and API integrations with Python.
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/apify-actor" ~/.claude/skills/majiayu000-claude-skill-registry-apify-actor && rm -rf "$T"
skills/data/apify-actor/SKILL.md
- curl piped into shell
- global npm install
- pip install
- makes HTTP requests (curl)
Apify Actor Development
Build serverless Apify actors for web scraping, browser automation, and data extraction using Python.
Prerequisites & Setup (MANDATORY)
Before creating or modifying actors, verify that the apify CLI is installed by running apify --help.
If it is not installed, you can run:
curl -fsSL https://apify.com/install-cli.sh | bash
# Or (Mac): brew install apify-cli
# Or (Windows): irm https://apify.com/install-cli.ps1 | iex
# Or: npm install -g apify-cli
When the apify CLI is installed, check that it is logged in with:
apify info # Should return your username
If it is not logged in, check if the APIFY_TOKEN environment variable is defined (if not, ask the user to generate one on https://console.apify.com/settings/integrations and then define APIFY_TOKEN with it).
Then run:
apify login -t $APIFY_TOKEN
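The checks above can also be scripted. The following is a minimal sketch, not part of the skill itself: the function name check_prerequisites is hypothetical, and it only tests that an apify executable is on PATH and that APIFY_TOKEN is set. It does not confirm that the CLI is actually logged in.

```python
import os
import shutil


def check_prerequisites(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready.

    `env` and `which` are injectable so the check can be tested without
    touching the real environment (hypothetical helper, not an Apify API).
    """
    env = os.environ if env is None else env
    problems = []
    if which("apify") is None:
        problems.append("apify CLI not found on PATH")
    if not env.get("APIFY_TOKEN"):
        problems.append("APIFY_TOKEN is not set")
    return problems
```

If the returned list is non-empty, fix each problem before running apify login.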
Quick Start Workflow
Creating a New Actor
- Copy template - Copy all files including hidden ones from the skill's assets/python-template/ directory to your new actor directory. The template is located at {base_dir}/assets/python-template/ where {base_dir} is the skill's base directory.
- Setup pre-commit - Run uv run pre-commit install for automatic quality checks
- Add dependencies - Use uv add package-name for each required dependency
- Implement logic - Write the actor code in src/main.py (the src/__main__.py entry point is already set up)
- Configure schemas - Update input/output schemas in .actor/input_schema.json and .actor/output_schema.json
- Configure platform settings - Update .actor/actor.json with actor metadata
- Write documentation - Create comprehensive .actor/ACTOR.md for the marketplace
- Test locally - Run apify run to verify functionality
- Deploy - Run apify push to deploy the actor on the Apify platform
CRITICAL REMINDERS:
- NEVER create requirements.txt
- NEVER use pip install or uv pip install
- ALWAYS use uv add to add dependencies
- ALWAYS use uv sync to install dependencies
- ALWAYS format with uv run ruff format . after file changes
- ALWAYS lint with uv run ruff check --fix . after file changes
- ALWAYS check the apify push output for build errors before considering deployment complete
- Input/output schemas should be updated when changing actor functionality
Core Concepts
Input/Output Pattern
Every actor follows this pattern:
- Input: JSON from key-value store (defined by input schema)
- Process: Actor logic extracts/transforms data
- Output: Results pushed to dataset or key-value store
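The three steps above can be sketched as follows. This is an illustration under assumptions rather than the template's actual code: the startUrls input field and the build_results helper are hypothetical, and the apify import is deferred so the processing function can be exercised without the SDK installed.

```python
def build_results(actor_input: dict) -> list:
    """Process step: turn actor input into dataset rows (placeholder logic)."""
    # "startUrls" is a hypothetical input field used only for this sketch.
    urls = actor_input.get("startUrls", [])
    return [{"url": entry["url"], "status": "pending"} for entry in urls]


async def main() -> None:
    # Imported lazily so build_results() stays testable without the SDK.
    from apify import Actor

    async with Actor:
        actor_input = await Actor.get_input() or {}  # Input
        rows = build_results(actor_input)            # Process
        await Actor.push_data(rows)                  # Output
```

Inside an actor, main() would be started with asyncio.run(main()), typically from src/__main__.py.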
Storage Types
- Dataset: Structured data (arrays of objects) - use for scraping results and tabular data
- Key-Value Store: Arbitrary data (files, objects) - use for screenshots, PDFs, state, and binary files
- Request Queue: URLs to crawl - use for deep web crawling and multi-page scraping workflows
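As a rough illustration of the first two storage types, the sketch below writes rows to the dataset and a binary file to the key-value store. It assumes the object passed in behaves like apify.Actor (push_data and set_value); FakeActor is a hypothetical stand-in so the example runs without the SDK, and the Request Queue is left out for brevity.

```python
import asyncio


async def store_outputs(actor, rows: list, screenshot: bytes) -> None:
    """Write tabular rows to the dataset and a binary file to the key-value store."""
    # Structured records go to the default dataset.
    await actor.push_data(rows)
    # Binary or arbitrary values go to the default key-value store.
    await actor.set_value("screenshot.png", screenshot, content_type="image/png")


class FakeActor:
    """Hypothetical stand-in for apify.Actor, used only to run this sketch."""

    def __init__(self) -> None:
        self.dataset = []
        self.kv = {}

    async def push_data(self, data):
        self.dataset.extend(data)

    async def set_value(self, key, value, content_type=None):
        self.kv[key] = value


fake = FakeActor()
asyncio.run(store_outputs(fake, [{"title": "Example"}], b"\x89PNG"))
```

In a real actor, Actor itself would be passed in place of fake, inside an async with Actor: block.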
Project Structure
my-actor/
├── .actor/
│   ├── actor.json              # Actor metadata
│   ├── input_schema.json       # Input schema
│   ├── output_schema.json      # Output schema
│   ├── ACTOR.md                # PUBLIC marketplace documentation (CRITICAL)
│   └── datasets/
│       └── dataset_schema.json # Dataset schema with views
├── src/ or package_name/       # Source code
│   ├── __init__.py
│   ├── __main__.py             # Entry point for CLI (REQUIRED)
│   └── main.py                 # Main actor logic
├── tests/                      # Test files
│   └── test_*.py
├── .dockerignore               # Docker build exclusions
├── .pre-commit-config.yaml     # Pre-commit hooks
├── Dockerfile                  # Container config
├── pyproject.toml              # Python project config
├── uv.lock                     # Dependency lock file
└── README.md                   # Development docs
Common Patterns
See references/python-sdk.md for complete examples of:
- Simple HTTP scraping with BeautifulSoup
- Browser automation with Playwright and Selenium
- Deep crawling with Request Queue
- Proxy management and error handling
- Storage APIs (Dataset, Key-Value Store, Request Queue)
Input Schema Design
Input schemas use JSON Schema format to define and validate actor inputs. See references/input-schema.md for:
- Field types (string, number, boolean, array, object)
- Special editors (requestListSources, globs, pseudoUrls, proxy, json, textarea)
- Validation patterns (regex, length, range, required fields)
- Complete examples with best practices
Key principles:
- Always include a description for every field
- Provide examples for all fields
- Set sensible defaults for ease of use
- Use appropriate editors for better UX
- Add units for numeric fields (pages, seconds, MB)
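To make these principles concrete, here is a small sketch that builds an input schema as a Python dict and checks that every field has a description. The field names (startUrls, maxPages) and the missing_descriptions helper are hypothetical; the authoritative list of supported keys is in references/input-schema.md. The dict would be written to .actor/input_schema.json as JSON.

```python
import json

# Illustrative schema only; field names are assumptions made for this sketch.
input_schema = {
    "title": "Example scraper input",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "startUrls": {
            "title": "Start URLs",
            "type": "array",
            "description": "Pages where the scraper begins.",
            "editor": "requestListSources",
            "prefill": [{"url": "https://example.com"}],
        },
        "maxPages": {
            "title": "Maximum pages",
            "type": "integer",
            "description": "Stop after this many pages.",
            "default": 10,
            "minimum": 1,
            "unit": "pages",
        },
    },
    "required": ["startUrls"],
}


def missing_descriptions(schema: dict) -> list:
    """Return the names of fields that lack a description."""
    return [
        name
        for name, field in schema["properties"].items()
        if not field.get("description")
    ]


schema_json = json.dumps(input_schema, indent=2)
```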
Output Schema Design
Output schemas define where actors store outputs and provide templates for accessing that data. See references/output-schema.md for:
- Schema structure and template variables (links.apiDefaultDatasetUrl, links.apiDefaultKeyValueStoreUrl, etc.)
- Dataset and key-value store output configurations
- Multiple output types in a single actor
- Integration with Python code
- Complete examples with emojis and descriptions
Key principles:
- Define all outputs explicitly (even if empty)
- Use descriptive titles with emojis for visual clarity
- Include helpful descriptions for users and LLM integrations
- Match templates to actual storage locations in code
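A possible shape for such a schema is sketched below as a Python dict. Treat it as an assumption to verify against references/output-schema.md: the property names (results, files) are hypothetical, and the helper only checks that each template uses one of the links.* variables mentioned above.

```python
import json

# Illustrative output schema; property names are assumptions for this sketch.
output_schema = {
    "actorOutputSchemaVersion": 1,
    "title": "Example scraper output",
    "properties": {
        "results": {
            "type": "string",
            "title": "📄 Scraped items",
            "description": "All items pushed to the default dataset.",
            "template": "{{links.apiDefaultDatasetUrl}}/items",
        },
        "files": {
            "type": "string",
            "title": "🗂️ Stored files",
            "description": "Keys saved in the default key-value store.",
            "template": "{{links.apiDefaultKeyValueStoreUrl}}/keys",
        },
    },
}


def templates_without_variable(schema: dict) -> list:
    """Return property names whose template does not use a links.* variable."""
    return [
        name
        for name, prop in schema["properties"].items()
        if "{{links." not in prop.get("template", "")
    ]


output_json = json.dumps(output_schema, indent=2, ensure_ascii=False)
```

Each template should point at the storage the code really writes to: a dataset template only makes sense if the actor calls push_data.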
ACTOR.md Documentation (CRITICAL)
The .actor/ACTOR.md file is the public-facing documentation that users see in the Apify marketplace. This is your actor's main sales page and user guide.
Required sections:
- Title & Description - Clear, compelling one-liner
- What it does - Bullet points of key capabilities
- Input - Example JSON with field explanations
- Output - Example JSON showing expected results
- Use Cases - Who benefits and why (with emojis)
- Standby Mode (if applicable) - API usage examples
- Tips & Best Practices - Performance and configuration guidance
See assets/python-template/.actor/ACTOR.md for a complete template.
Key principles:
- Write for non-technical users - assume no coding knowledge
- Use emojis to make sections scannable (🎯 🔍 ⚡ 🚀)
- Provide copy-paste ready code examples
- Show actual input/output samples, not schemas
- Highlight benefits and use cases clearly
Modifying Existing Actors
When modifying an existing actor:
- Understand current logic - Read src/main.py
- Check input schema - Review .actor/input_schema.json for expected inputs
- Add dependencies with uv - Use uv add package-name (NEVER pip install)
- Make code changes - Implement the requested features
- Format code - Run uv run ruff format . (MANDATORY)
- Lint code - Run uv run ruff check --fix . (MANDATORY)
- Test changes locally - Use apify run before deploying
- Update schema if needed - Add new fields to input schema
- Deploy - Push changes with apify push
Debugging Actors
- Test locally - Use apify run to test actor locally before deployment
- Check storage - Inspect ./storage/ directory for datasets, key-value stores, and request queues
- Add logging - Use Actor.log.info(), Actor.log.debug(), Actor.log.error() (see SDK references)
- View logs on platform - Check actor run logs in Apify Console for production issues
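For the storage check, a short helper can load what a local run produced. This sketch assumes the local layout stores one JSON file per item under ./storage/datasets/default/ and that bookkeeping files start with a double underscore; both are assumptions about the local storage format, and read_local_dataset is a hypothetical name. The demo at the bottom builds a throwaway directory so the example is self-contained.

```python
import json
import tempfile
from pathlib import Path


def read_local_dataset(storage_dir: Path, dataset: str = "default") -> list:
    """Load every JSON item file from a locally stored dataset, in filename order."""
    folder = storage_dir / "datasets" / dataset
    items = []
    for path in sorted(folder.glob("*.json")):
        # Assumption: files such as __metadata__.json are bookkeeping, not items.
        if path.name.startswith("__"):
            continue
        items.append(json.loads(path.read_text(encoding="utf-8")))
    return items


# Demo against a temporary directory that mimics the assumed layout.
with tempfile.TemporaryDirectory() as tmp:
    demo_folder = Path(tmp) / "datasets" / "default"
    demo_folder.mkdir(parents=True)
    (demo_folder / "000000001.json").write_text(
        '{"url": "https://example.com"}', encoding="utf-8"
    )
    (demo_folder / "__metadata__.json").write_text("{}", encoding="utf-8")
    demo_items = read_local_dataset(Path(tmp))
```

After apify run, calling read_local_dataset(Path("storage")) from the actor directory would list the pushed items, provided the layout matches these assumptions.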
Best Practices
Code Quality
- Validate input - Always check required fields and formats with clear error messages
- Handle errors - Use try/except with proper error logging and graceful degradation
- Structured logging - Use Actor.log with extra fields for better debugging
- Type hints - Add type annotations for better code clarity and IDE support
- Docstrings - Document functions and modules for maintainability
- Format with ruff - ALWAYS run uv run ruff format . before committing
- Lint with ruff - ALWAYS run uv run ruff check --fix . before deploying
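The first three practices (input validation, error handling, structured logging) might look like the sketch below. The field names url and maxPages, the InputError class, and both functions are hypothetical. A standard-library logger stands in for Actor.log here, so the example runs without the SDK.

```python
import logging
from typing import Optional

# Inside an actor, Actor.log would take the place of this logger.
logger = logging.getLogger("my_actor")


class InputError(ValueError):
    """Raised when actor input is missing or malformed."""


def validate_input(actor_input: dict) -> dict:
    """Check required fields and formats, raising InputError with a clear message."""
    url = actor_input.get("url")
    if not isinstance(url, str) or not url.startswith(("http://", "https://")):
        raise InputError("'url' is required and must start with http:// or https://")
    max_pages = actor_input.get("maxPages", 10)
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(max_pages, bool) or not isinstance(max_pages, int) or max_pages < 1:
        raise InputError("'maxPages' must be a positive integer")
    return {"url": url, "max_pages": max_pages}


def safe_validate(actor_input: dict) -> Optional[dict]:
    """Return validated input, or None after logging the reason for rejection."""
    try:
        return validate_input(actor_input)
    except InputError as exc:
        # Extra fields travel with the log record for structured handlers.
        logger.error("Invalid input", extra={"reason": str(exc)})
        return None
```

Catching only InputError keeps unexpected exceptions visible instead of hiding them behind a generic handler.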
Performance & Scalability
- Batch processing - Push data in batches (100-1000 items) for large datasets to reduce API calls
- Use proxies - Avoid IP blocking for web scraping with proxy configuration
- Resource limits - Set appropriate memory limits and timeouts in .actor/actor.json
- Optimize Docker - Use multi-stage builds, bytecode compilation, and minimal base images
- Consider Standby mode - For low-latency (<100ms), high-frequency use cases
Security & Configuration
- Environment variables - Never hardcode secrets; use Actor.config and environment variables
- Input validation - Use JSON Schema patterns, required fields, and runtime validation
- Run as non-root - Use myuser in Dockerfile for container security
- Minimize image size - Use .dockerignore to exclude unnecessary files and reduce build time
Development Workflow
- Testing - Write tests with pytest; use coverage and snapshot testing for reliability
- Pre-commit hooks - Use ruff and pre-commit for consistent code quality (MANDATORY)
- Use uv exclusively - NEVER use pip or requirements.txt; only use uv add and uv sync (MANDATORY)
- Lock dependencies - Always commit uv.lock for reproducible builds (MANDATORY)
- Test locally - Always test with apify run before deploying to catch issues early
- Dataset schemas - Define dataset_schema.json with views for better Apify Console UI
- CLI support - Add CLI entry points via __main__.py for local testing and development
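As an example of the testing item above, the sketch below pairs a small parsing function with two pytest-style tests. The extract_title function is hypothetical and uses only the standard library; in a real project the function would live in src/ and the tests in tests/test_*.py.

```python
from html.parser import HTMLParser


class _TitleParser(HTMLParser):
    """Collects the text found inside the <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html: str) -> str:
    """Return the page title with surrounding whitespace removed, or ''."""
    parser = _TitleParser()
    parser.feed(html)
    return parser.title.strip()


def test_extract_title():
    assert extract_title("<html><head><title> Hello </title></head></html>") == "Hello"


def test_extract_title_missing():
    assert extract_title("<p>no title</p>") == ""
```

Keeping parsing logic in pure functions like this lets most tests run without network access or the Apify SDK.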
Standby Mode (Real-time API)
Standby mode allows actors to run as persistent HTTP servers, providing instant responses without cold start delays.
Perfect for:
- Real-time APIs requiring <100ms response times
- Webhook endpoints that need immediate processing
- High-frequency requests (multiple requests per second)
- Integration with real-time services (Slack bots, chat applications, webhooks)
- Low-latency scraping APIs and on-demand data extraction
See references/standby-mode.md for complete implementation patterns, authentication, and examples.
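To show the general idea of a persistent HTTP server, here is a minimal standard-library sketch. It is not the pattern from references/standby-mode.md: StandbyHandler and make_server are hypothetical names, the ACTOR_STANDBY_PORT environment variable is an assumption to check against the official docs, and authentication is omitted. The demo binds to localhost on an OS-chosen port and sends itself one request.

```python
import json
import os
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Optional


class StandbyHandler(BaseHTTPRequestHandler):
    """Answers every GET with a small JSON document."""

    def do_GET(self) -> None:
        body = json.dumps({"status": "ok", "path": self.path}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # Silence the default per-request stderr logging.


def make_server(host: str = "127.0.0.1", port: Optional[int] = None) -> ThreadingHTTPServer:
    """Create the server; port 0 lets the OS pick a free port."""
    if port is None:
        # Assumption: the platform passes the port in this environment variable.
        port = int(os.environ.get("ACTOR_STANDBY_PORT", "0"))
    return ThreadingHTTPServer((host, port), StandbyHandler)


# Demo: start the server in a background thread and query it once.
server = make_server(port=0)
threading.Thread(target=server.serve_forever, daemon=True).start()
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # bypass proxies
with opener.open(f"http://127.0.0.1:{server.server_port}/health") as response:
    demo_reply = json.loads(response.read())
server.shutdown()
server.server_close()
```

A deployed actor would likely bind to 0.0.0.0 and keep serving instead of shutting down after one request.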
References
Detailed documentation in references/:
- python-sdk.md - SDK patterns and complete code examples
- standby-mode.md - Real-time API implementation
- input-schema.md - Input validation and UI configuration
- output-schema.md - Output configuration and templates
Troubleshooting
If you need information not covered in this skill, use the WebFetch tool with https://docs.apify.com/llms.txt to access the complete official documentation.