# fiftyone-dataset-inference

Run ML model inference (YOLO, YOLOv8, CLIP, SAM, Detectron2, etc.) on FiftyOne datasets. Use when running models, applying detection, classification, segmentation, embeddings, or any model prediction task. Also use for end-to-end workflows that include importing data then running inference.

Install the full registry:

```
git clone https://github.com/majiayu000/claude-skill-registry
```

Or install just this skill:

```
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/fiftyone-dataset-inference" ~/.claude/skills/majiayu000-claude-skill-registry-fiftyone-dataset-inference && rm -rf "$T"
```

Skill file: `skills/data/fiftyone-dataset-inference/SKILL.md`

# Run Model Inference on FiftyOne Datasets
## Key Directives

ALWAYS follow these rules:

1. **Check if the dataset exists first**

   ```
   list_datasets()
   ```

   If the dataset doesn't exist, use the fiftyone-dataset-import skill to load it first.

2. **Set context before operations**

   ```
   set_context(dataset_name="my-dataset")
   ```

3. **Launch the App for inference**

   The App must be running to execute inference operators:

   ```
   launch_app(dataset_name="my-dataset")
   ```

4. **Ask the user for field names**

   Always confirm with the user:
   - Which model to use
   - The label field name for predictions (e.g., `predictions`, `detections`, `embeddings`)

5. **Close the App when done**

   ```
   close_app()
   ```
## Workflow

### Step 1: Verify Dataset Exists

```
list_datasets()
```

If the dataset is not in the list:
- Ask the user for the data location
- Use the fiftyone-dataset-import skill to import the data first
- Return to this workflow after the import completes

### Step 2: Load Dataset and Review

```
set_context(dataset_name="my-dataset")
dataset_summary(name="my-dataset")
```

Review:
- Sample count
- Media type
- Existing label fields

### Step 3: Launch App

```
launch_app(dataset_name="my-dataset")
```
### Step 4: Apply Model Inference

Ask the user for:
- Model name (see Available Zoo Models below)
- Label field for predictions

```
execute_operator(
    operator_uri="@voxel51/zoo/apply_zoo_model",
    params={
        "tab": "BUILTIN",
        "model": "yolov8n-coco-torch",
        "label_field": "predictions",
    },
)
```
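As an illustrative sketch (plain Python, no FiftyOne dependency), the params payload for the operator call above can be assembled by a small helper. The helper name `zoo_model_params` is hypothetical, and it only covers the keys this skill uses:

```python
def zoo_model_params(model: str, label_field: str, tab: str = "BUILTIN") -> dict:
    """Build the params dict for the @voxel51/zoo/apply_zoo_model operator.

    Only the keys shown in this skill are included; other operator
    options would be added to the returned dict the same way.
    """
    if not model or not label_field:
        raise ValueError("model and label_field are required")
    return {"tab": tab, "model": model, "label_field": label_field}


params = zoo_model_params("yolov8n-coco-torch", "predictions")
# → {"tab": "BUILTIN", "model": "yolov8n-coco-torch", "label_field": "predictions"}
```

Validating the two required arguments up front surfaces a missing label field before the operator runs, rather than after a model download.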
### Step 5: View Results

```
set_view(exists=["predictions"])
```

### Step 6: Clean Up

```
close_app()
```
## Available Zoo Models

Some models require additional packages. If a model fails with a dependency error, the response includes `install_command`. Offer to run it for the user.
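A minimal sketch of that flow in plain Python; the response shape with `missing_package` and `install_command` keys is an assumption based on the description above, and `offer_install` is a hypothetical helper:

```python
from typing import Optional


def offer_install(response: dict) -> Optional[str]:
    """Return the install command to offer the user if the failure was a
    missing dependency; otherwise return None.

    Assumes the MCP response is a dict that may carry `missing_package`
    and `install_command` keys, as described above.
    """
    if "install_command" in response:
        pkg = response.get("missing_package", "a required package")
        print(f"Model needs missing dependency: {pkg}")
        return response["install_command"]
    return None


cmd = offer_install({
    "error": "Missing dependency",
    "missing_package": "ultralytics",
    "install_command": "pip install ultralytics",
})
# cmd == "pip install ultralytics"; a successful response returns None
```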
### Detection Models

| Model | Extra Deps |
|---|---|
| Faster R-CNN | None |
| RetinaNet | None |
| YOLOv8 nano (fast) | `ultralytics` |
| YOLOv8 small | `ultralytics` |
| YOLOv8 medium | `ultralytics` |
| YOLOv8 large | `ultralytics` |
| YOLOv8 extra-large | `ultralytics` |

### Classification Models

| Model | Extra Deps |
|---|---|
| ResNet-50 | None |
| MobileNet v2 | None |
| Vision Transformer | None |

### Segmentation Models

| Model | Extra Deps |
|---|---|
| Segment Anything (base) | `segment-anything` |
| Segment Anything (large) | `segment-anything` |
| Segment Anything (huge) | `segment-anything` |
| DeepLabV3 | None |

### Embedding Models

| Model | Extra Deps |
|---|---|
| CLIP embeddings | `open-clip-torch` |
| DINOv2 small | None |
| DINOv2 base | None |
| DINOv2 large | None |
## Common Use Cases

### Use Case 1: Run Object Detection

```
# Verify dataset exists
list_datasets()

# Set context and launch
set_context(dataset_name="my-dataset")
launch_app(dataset_name="my-dataset")

# Apply detection model
execute_operator(
    operator_uri="@voxel51/zoo/apply_zoo_model",
    params={
        "tab": "BUILTIN",
        "model": "faster-rcnn-resnet50-fpn-coco-torch",
        "label_field": "predictions",
    },
)

# View results
set_view(exists=["predictions"])
```

### Use Case 2: Run Classification

```
set_context(dataset_name="my-dataset")
launch_app(dataset_name="my-dataset")
execute_operator(
    operator_uri="@voxel51/zoo/apply_zoo_model",
    params={
        "tab": "BUILTIN",
        "model": "resnet50-imagenet-torch",
        "label_field": "classification",
    },
)
set_view(exists=["classification"])
```

### Use Case 3: Generate Embeddings

```
set_context(dataset_name="my-dataset")
launch_app(dataset_name="my-dataset")
execute_operator(
    operator_uri="@voxel51/zoo/apply_zoo_model",
    params={
        "tab": "BUILTIN",
        "model": "clip-vit-base32-torch",
        "label_field": "clip_embeddings",
    },
)
```

### Use Case 4: Compare Ground Truth with Predictions

If the dataset has existing labels:

```
set_context(dataset_name="my-dataset")
dataset_summary(name="my-dataset")  # Check existing fields
launch_app(dataset_name="my-dataset")

# Run inference with a different field name
execute_operator(
    operator_uri="@voxel51/zoo/apply_zoo_model",
    params={
        "tab": "BUILTIN",
        "model": "yolov8m-coco-torch",
        "label_field": "predictions",  # Different from ground_truth
    },
)

# View both fields to compare
set_view(exists=["ground_truth", "predictions"])
```

### Use Case 5: Run Multiple Models

```
set_context(dataset_name="my-dataset")
launch_app(dataset_name="my-dataset")

# Run detection
execute_operator(
    operator_uri="@voxel51/zoo/apply_zoo_model",
    params={
        "tab": "BUILTIN",
        "model": "yolov8n-coco-torch",
        "label_field": "detections",
    },
)

# Run classification
execute_operator(
    operator_uri="@voxel51/zoo/apply_zoo_model",
    params={
        "tab": "BUILTIN",
        "model": "resnet50-imagenet-torch",
        "label_field": "classification",
    },
)

# Run embeddings
execute_operator(
    operator_uri="@voxel51/zoo/apply_zoo_model",
    params={
        "tab": "BUILTIN",
        "model": "clip-vit-base32-torch",
        "label_field": "embeddings",
    },
)
```
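Since the calls above all share one shape, they can be driven from a table. This is a sketch in plain Python with `execute_operator` passed in as a callable (in practice it is the MCP tool shown above); `run_models` and `MODEL_RUNS` are hypothetical names:

```python
# Model/field pairs taken from the multi-model example above.
MODEL_RUNS = [
    ("yolov8n-coco-torch", "detections"),
    ("resnet50-imagenet-torch", "classification"),
    ("clip-vit-base32-torch", "embeddings"),
]


def run_models(execute_operator, runs=MODEL_RUNS):
    """Apply each zoo model to its own label field via the supplied
    execute_operator callable (the MCP tool in practice)."""
    for model, label_field in runs:
        execute_operator(
            operator_uri="@voxel51/zoo/apply_zoo_model",
            params={"tab": "BUILTIN", "model": model, "label_field": label_field},
        )


# Demonstrate with a recording stub in place of the real MCP tool:
calls = []
run_models(lambda **kwargs: calls.append(kwargs))
# calls now holds three operator invocations, one per model
```

Keeping each model's output in its own label field (per the table) avoids one run overwriting another.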
## Troubleshooting

### Error: "Dataset not found"

- Use `list_datasets()` to see available datasets
- Use the fiftyone-dataset-import skill to import data first

### Error: "Model not found"

- Check the model name spelling
- Use `get_operator_schema("@voxel51/zoo/apply_zoo_model")` to see available models

### Error: "Missing dependency" (e.g., ultralytics, segment-anything)

- The MCP server detects missing dependencies
- The response includes `missing_package` and `install_command`
- Install the required package: `pip install <package>`
- Restart the MCP server after installing

### Inference is slow

- Use a smaller model variant (e.g., `yolov8n` instead of `yolov8x`)
- Use delegated execution for large datasets
- Consider filtering to a view first

### Out of memory

- Reduce the batch size
- Use a smaller model variant
- Process the dataset in chunks using views
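The chunking idea can be sketched without any FiftyOne API at all; `chunk_ranges` below is a hypothetical helper that just computes (start, stop) index pairs you could turn into skip/limit views:

```python
def chunk_ranges(total: int, chunk_size: int):
    """Yield (start, stop) index pairs covering `total` samples,
    suitable for building skip/limit views over a dataset."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, total, chunk_size):
        yield start, min(start + chunk_size, total)


# e.g. a 1000-sample dataset processed in chunks of 300:
ranges = list(chunk_ranges(1000, 300))
# → [(0, 300), (300, 600), (600, 900), (900, 1000)]
```

Running inference on each range in turn caps peak memory at one chunk's worth of samples.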
## Best Practices

- **Use descriptive field names** - `predictions`, `yolo_detections`, `clip_embeddings`
- **Don't overwrite ground truth** - Use different field names for predictions
- **Start with fast models** - Use nano/small variants first, upgrade if needed
- **Check existing fields** - Use `dataset_summary()` before running inference
- **Filter first for testing** - Test on a small view before processing the full dataset