Claude-skill-registry apify-actor

Build and deploy Apify actors for web scraping and automation. Use for serverless scraping, data extraction, browser automation, and API integrations with Python.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/apify-actor" ~/.claude/skills/majiayu000-claude-skill-registry-apify-actor && rm -rf "$T"
manifest: skills/data/apify-actor/SKILL.md
safety · automated scan (high risk)
This is a pattern-based risk scan, not a security review. Our crawler flagged:
  • curl piped into shell
  • global npm install
  • pip install
  • makes HTTP requests (curl)
Always read a skill's source content before installing. Patterns alone don't mean the skill is malicious, but they warrant attention.
source content

Apify Actor Development

Build serverless Apify actors for web scraping, browser automation, and data extraction using Python.

Prerequisites & Setup (MANDATORY)

Before creating or modifying actors, verify that the apify CLI is installed: run apify --help.

If it is not installed, you can run:

curl -fsSL https://apify.com/install-cli.sh | bash

# Or (Mac): brew install apify-cli
# Or (Windows): irm https://apify.com/install-cli.ps1 | iex
# Or: npm install -g apify-cli

When the apify CLI is installed, check that it is logged in with:

apify info  # Should return your username

If it is not logged in, check whether the APIFY_TOKEN environment variable is set. If it is not, ask the user to generate a token at https://console.apify.com/settings/integrations and set APIFY_TOKEN to it.

Then run:

apify login -t $APIFY_TOKEN

Quick Start Workflow

Creating a New Actor

  1. Copy template - Copy all files, including hidden ones, from the skill's assets/python-template/ directory to your new actor directory. The template is located at {base_dir}/assets/python-template/, where {base_dir} is the skill's base directory.
  2. Setup pre-commit - Run uv run pre-commit install for automatic quality checks
  3. Add dependencies - Use uv add package-name for each required dependency
  4. Implement logic - Write the actor code in src/main.py (the src/__main__.py entry point is already set up)
  5. Configure schemas - Update input/output schemas in .actor/input_schema.json and .actor/output_schema.json
  6. Configure platform settings - Update .actor/actor.json with actor metadata
  7. Write documentation - Create comprehensive .actor/ACTOR.md for the marketplace
  8. Test locally - Run apify run to verify functionality
  9. Deploy - Run apify push to deploy the actor on the Apify platform

CRITICAL REMINDERS:

  • NEVER create requirements.txt
  • NEVER use pip install or uv pip install
  • ALWAYS use uv add to add dependencies
  • ALWAYS use uv sync to install dependencies
  • ALWAYS format with uv run ruff format . after file changes
  • ALWAYS lint with uv run ruff check --fix . after file changes
  • ALWAYS check the apify push output for build errors before considering deployment complete
  • Update input/output schemas whenever actor functionality changes

Core Concepts

Input/Output Pattern

Every actor follows this pattern:

  1. Input: JSON from key-value store (defined by input schema)
  2. Process: Actor logic extracts/transforms data
  3. Output: Results pushed to dataset or key-value store
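
For illustration, a minimal sketch of this pattern with the Apify Python SDK (the input field url and the output fields are hypothetical examples, not part of the template):

import asyncio

from apify import Actor


async def main() -> None:
    async with Actor:
        # 1. Input: read the JSON input defined by the input schema
        actor_input = await Actor.get_input() or {}
        url = actor_input.get("url", "https://example.com")  # hypothetical field

        # 2. Process: extract or transform data here
        result = {"url": url, "title": "Example Domain"}

        # 3. Output: push the result to the default dataset
        await Actor.push_data(result)


if __name__ == "__main__":
    asyncio.run(main())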

Storage Types

  • Dataset: Structured data (arrays of objects) - use for scraping results and tabular data
  • Key-Value Store: Arbitrary data (files, objects) - use for screenshots, PDFs, state, and binary files
  • Request Queue: URLs to crawl - use for deep web crawling and multi-page scraping workflows
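
A short sketch of each storage type through the Apify Python SDK (keys and URLs are illustrative, and exact add_request signatures vary between SDK versions, so treat this as a sketch rather than a reference):

from apify import Actor


async def storage_examples() -> None:
    # Dataset: append structured records (default dataset shown)
    await Actor.push_data({"url": "https://example.com", "status": 200})

    # Key-Value Store: save arbitrary values such as state or binary files
    await Actor.set_value("STATE", {"processed": 42})

    # Request Queue: enqueue further URLs for deep crawling
    queue = await Actor.open_request_queue()
    await queue.add_request({"url": "https://example.com/page/2"})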

Project Structure

my-actor/
├── .actor/
│   ├── actor.json                    # Actor metadata
│   ├── input_schema.json             # Input schema
│   ├── output_schema.json            # Output schema
│   ├── ACTOR.md                      # PUBLIC marketplace documentation (CRITICAL)
│   └── datasets/
│       └── dataset_schema.json       # Dataset schema with views
├── src/ or package_name/             # Source code
│   ├── __init__.py
│   ├── __main__.py                   # Entry point for CLI (REQUIRED)
│   └── main.py                       # Main actor logic
├── tests/                            # Test files
│   └── test_*.py
├── .dockerignore                     # Docker build exclusions
├── .pre-commit-config.yaml           # Pre-commit hooks
├── Dockerfile                        # Container config
├── pyproject.toml                    # Python project config
├── uv.lock                           # Dependency lock file
└── README.md                         # Development docs

Common Patterns

See references/python-sdk.md for complete examples of:

  • Simple HTTP scraping with BeautifulSoup
  • Browser automation with Playwright and Selenium
  • Deep crawling with Request Queue
  • Proxy management and error handling
  • Storage APIs (Dataset, Key-Value Store, Request Queue)

Input Schema Design

Input schemas use JSON Schema format to define and validate actor inputs. See references/input-schema.md for:

  • Field types (string, number, boolean, array, object)
  • Special editors (requestListSources, globs, pseudoUrls, proxy, json, textarea)
  • Validation patterns (regex, length, range, required fields)
  • Complete examples with best practices

Key principles:

  • Always include descriptions and examples for every field
  • Set sensible defaults for ease of use
  • Use appropriate editors for better UX
  • Add units for numeric fields (pages, seconds, MB)
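
Schema validation runs on the platform, but runtime checks still help locally. A hedged sketch (start_urls and max_pages are hypothetical field names):

from apify import Actor


async def read_validated_input() -> dict:
    actor_input = await Actor.get_input() or {}

    # Fail fast with a clear message when a required field is missing
    start_urls = actor_input.get("start_urls")
    if not start_urls:
        raise ValueError('Input field "start_urls" must be a non-empty list.')

    # Apply a sensible default and lower bound to numeric fields
    max_pages = max(1, int(actor_input.get("max_pages", 10)))

    return {"start_urls": start_urls, "max_pages": max_pages}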

Output Schema Design

Output schemas define where actors store outputs and provide templates for accessing that data. See references/output-schema.md for:

  • Schema structure and template variables (links.apiDefaultDatasetUrl, links.apiDefaultKeyValueStoreUrl, etc.)
  • Dataset and key-value store output configurations
  • Multiple output types in a single actor
  • Integration with Python code
  • Complete examples with emojis and descriptions

Key principles:

  • Define all outputs explicitly (even if empty)
  • Use descriptive titles with emojis for visual clarity
  • Include helpful descriptions for users and LLM integrations
  • Match templates to actual storage locations in code
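
As an illustration of the last principle, if the output schema references the default dataset and a named key-value record, the code must write to exactly those locations (the record key screenshot.png and the fields are hypothetical):

from apify import Actor


async def write_declared_outputs(screenshot: bytes) -> None:
    # Matches an output schema template that points at the default dataset
    await Actor.push_data({"url": "https://example.com", "price": 19.99})

    # Matches a key-value store template for a named record
    await Actor.set_value("screenshot.png", screenshot, content_type="image/png")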

ACTOR.md Documentation (CRITICAL)

The .actor/ACTOR.md file is the public-facing documentation that users see in the Apify marketplace. This is your actor's main sales page and user guide.

Required sections:

  1. Title & Description - Clear, compelling one-liner
  2. What it does - Bullet points of key capabilities
  3. Input - Example JSON with field explanations
  4. Output - Example JSON showing expected results
  5. Use Cases - Who benefits and why (with emojis)
  6. Standby Mode (if applicable) - API usage examples
  7. Tips & Best Practices - Performance and configuration guidance

See assets/python-template/.actor/ACTOR.md for a complete template.

Key principles:

  • Write for non-technical users - assume no coding knowledge
  • Use emojis to make sections scannable (🎯 🔍 ⚡ 🚀)
  • Provide copy-paste ready code examples
  • Show actual input/output samples, not schemas
  • Highlight benefits and use cases clearly

Modifying Existing Actors

When modifying an existing actor:

  1. Understand current logic - Read src/main.py
  2. Check input schema - Review .actor/input_schema.json for expected inputs
  3. Add dependencies with uv - Use uv add package-name (NEVER pip install)
  4. Make code changes - Implement the requested features
  5. Format code - Run uv run ruff format . (MANDATORY)
  6. Lint code - Run uv run ruff check --fix . (MANDATORY)
  7. Test changes locally - Use apify run before deploying
  8. Update schema if needed - Add new fields to the input schema
  9. Deploy - Push changes with apify push

Debugging Actors

  1. Test locally - Use apify run to test the actor locally before deployment
  2. Check storage - Inspect the ./storage/ directory for datasets, key-value stores, and request queues
  3. Add logging - Use Actor.log.info(), Actor.log.debug(), and Actor.log.error() (see the SDK references and the sketch after this list)
  4. View logs on platform - Check actor run logs in the Apify Console for production issues
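
A brief sketch of those logging calls (Actor.log behaves like a standard Python logger, so extra fields and .exception() work as usual):

from apify import Actor


async def logging_examples() -> None:
    async with Actor:
        Actor.log.info("Starting scrape", extra={"url": "https://example.com"})
        Actor.log.debug("Parsed 12 items from page 1")
        try:
            raise RuntimeError("example failure")
        except RuntimeError:
            # .exception() logs at ERROR level and includes the traceback
            Actor.log.exception("Request failed; continuing with next URL")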

Best Practices

Code Quality

  • Validate input - Always check required fields and formats with clear error messages
  • Handle errors - Use try/except with proper error logging and graceful degradation (see the sketch after this list)
  • Structured logging - Use Actor.log with extra fields for better debugging
  • Type hints - Add type annotations for better code clarity and IDE support
  • Docstrings - Document functions and modules for maintainability
  • Format with ruff - ALWAYS run uv run ruff format . before committing
  • Lint with ruff - ALWAYS run uv run ruff check --fix . before deploying
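
A minimal sketch of per-item error handling with graceful degradation (fetch_page is a hypothetical placeholder for real scraping logic):

from apify import Actor


async def fetch_page(url: str) -> dict:
    # Hypothetical placeholder; a real actor would fetch and parse here
    return {"url": url}


async def process_urls(urls: list[str]) -> None:
    failed: list[str] = []
    for url in urls:
        try:
            await Actor.push_data(await fetch_page(url))
        except Exception:
            # Log and continue so one bad URL does not abort the whole run
            Actor.log.exception("Failed to process URL", extra={"url": url})
            failed.append(url)
    if failed:
        # Persist failures for inspection or retries
        await Actor.set_value("FAILED_URLS", failed)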

Performance & Scalability

  • Batch processing - Push data in batches (100-1000 items) for large datasets to reduce API calls (see the sketch after this list)
  • Use proxies - Avoid IP blocking for web scraping with proxy configuration
  • Resource limits - Set appropriate memory limits and timeouts in .actor/actor.json
  • Optimize Docker - Use multi-stage builds, bytecode compilation, and minimal base images
  • Consider Standby mode - For low-latency (<100ms), high-frequency use cases
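
A sketch of batched pushes, assuming a batch size of 500 (any value in the 100-1000 range above is reasonable):

from collections.abc import AsyncIterator

from apify import Actor

BATCH_SIZE = 500  # assumption; tune per item size and dataset volume


async def push_in_batches(items: AsyncIterator[dict]) -> None:
    batch: list[dict] = []
    async for item in items:
        batch.append(item)
        if len(batch) >= BATCH_SIZE:
            await Actor.push_data(batch)  # one API call for the whole batch
            batch = []
    if batch:
        await Actor.push_data(batch)  # flush the remainder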

Security & Configuration

  • Environment variables - Never hardcode secrets; use Actor.config and environment variables (see the sketch after this list)
  • Input validation - Use JSON Schema patterns, required fields, and runtime validation
  • Run as non-root - Use myuser in the Dockerfile for container security
  • Minimize image size - Use .dockerignore to exclude unnecessary files and reduce build time
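
A short sketch of reading a secret from the environment rather than hardcoding it (MY_API_KEY is a hypothetical variable name):

import os

from apify import Actor


def get_api_key() -> str:
    api_key = os.environ.get("MY_API_KEY", "")
    if not api_key:
        Actor.log.error("MY_API_KEY is not set; aborting")
        raise RuntimeError("Missing required secret MY_API_KEY")
    return api_key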

Development Workflow

  • Testing - Write tests with pytest; use coverage and snapshot testing for reliability (see the sketch after this list)
  • Pre-commit hooks - Use ruff and pre-commit for consistent code quality (MANDATORY)
  • Use uv exclusively - NEVER use pip or requirements.txt; only use uv add and uv sync (MANDATORY)
  • Lock dependencies - Always commit uv.lock for reproducible builds (MANDATORY)
  • Test locally - Always test with apify run before deploying to catch issues early
  • Dataset schemas - Define dataset_schema.json with views for a better Apify Console UI
  • CLI support - Add CLI entry points via __main__.py for local testing and development
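
A minimal pytest sketch for a pure helper (parse_title is a hypothetical function; in a real actor it would live in src/ and be imported by the test):

import pytest


def parse_title(html: str) -> str:
    # Hypothetical pure helper used only to illustrate the test shape
    start = html.find("<title>")
    end = html.find("</title>")
    if start == -1 or end == -1:
        return ""
    return html[start + len("<title>") : end].strip()


@pytest.mark.parametrize(
    ("html", "expected"),
    [
        ("<title> Hello </title>", "Hello"),
        ("<p>no title</p>", ""),
    ],
)
def test_parse_title(html: str, expected: str) -> None:
    assert parse_title(html) == expected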

Standby Mode (Real-time API)

Standby mode allows actors to run as persistent HTTP servers, providing instant responses without cold start delays.

Perfect for:

  • Real-time APIs requiring <100ms response times
  • Webhook endpoints that need immediate processing
  • High-frequency requests (multiple requests per second)
  • Integration with real-time services (Slack bots, chat applications, webhooks)
  • Low-latency scraping APIs and on-demand data extraction
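
A minimal standby sketch using only the standard library, assuming the platform exposes the listening port via the ACTOR_STANDBY_PORT environment variable (see references/standby-mode.md for the authoritative pattern, including authentication):

import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


class StandbyHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # The server stays warm between requests, so there is no cold start
        body = json.dumps({"status": "ok", "path": self.path}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)


def main() -> None:
    port = int(os.environ.get("ACTOR_STANDBY_PORT", "8080"))
    HTTPServer(("", port), StandbyHandler).serve_forever()


if __name__ == "__main__":
    main()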

See references/standby-mode.md for complete implementation patterns, authentication, and examples.

References

Detailed documentation in references/:

  • python-sdk.md - SDK patterns and complete code examples
  • standby-mode.md - Real-time API implementation
  • input-schema.md - Input validation and UI configuration
  • output-schema.md - Output configuration and templates
Troubleshooting

If you need information not covered in this skill, use the WebFetch tool with https://docs.apify.com/llms.txt to access the complete official documentation.