Marketplace fiftyone-dataset-inference
Create a FiftyOne dataset from a directory of media files (images, videos, point clouds), optionally import labels in common formats (COCO, YOLO, VOC), run model inference, and store predictions. Use when users want to load local files into FiftyOne, apply ML models for detection, classification, or segmentation, or build end-to-end inference pipelines.
git clone https://github.com/aiskillstore/marketplace
T=$(mktemp -d) && git clone --depth=1 https://github.com/aiskillstore/marketplace "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/adonaivera/fiftyone-dataset-inference" ~/.claude/skills/aiskillstore-marketplace-fiftyone-dataset-inference && rm -rf "$T"
skills/adonaivera/fiftyone-dataset-inference/SKILL.md
Create Dataset and Run Inference
Overview
Create FiftyOne datasets from local directories, import labels in standard formats, and run model inference to generate predictions.
Use this skill when:
- Loading images, videos, or point clouds from a directory
- Importing labeled datasets (COCO, YOLO, VOC, CVAT, etc.)
- Running model inference on media files
- Building end-to-end ML pipelines
Prerequisites
- FiftyOne MCP server installed and running
- @voxel51/io plugin for importing data
- @voxel51/zoo plugin for model inference
- @voxel51/utils plugin for dataset management
Key Directives
ALWAYS follow these rules:
1. Explore directory first
Scan the user's directory before importing to detect media types and label formats.
2. Confirm with user
Present findings and get confirmation before creating datasets or running inference.
3. Set context before operations
set_context(dataset_name="my-dataset")
4. Launch App for inference
launch_app(dataset_name="my-dataset")
5. User specifies field names
Always ask the user for:
- Dataset name
- Label field for predictions
6. Close app when done
close_app()
Workflow
Step 1: Explore the Directory
Use Bash to scan the user's directory:
ls -la /path/to/directory
find /path/to/directory -type f | head -20
Identify media files and label files. See Supported Dataset Types section for format detection.
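The same scan can be sketched in Python when a per-type count is more useful than a raw listing. This is a minimal illustration, not part of the skill itself: the `scan_directory` helper and the `MEDIA_TYPES` map are hypothetical, and the extensions listed are an assumed subset that should be adjusted to the formats you expect.

```python
from collections import Counter
from pathlib import Path

# Assumed extension-to-media-type map (illustrative subset, not exhaustive).
MEDIA_TYPES = {
    ".jpg": "image", ".jpeg": "image", ".png": "image",
    ".mp4": "video", ".mov": "video",
    ".pcd": "point-cloud",
}

def scan_directory(root):
    """Count files under root by media type; unknown extensions count as 'other'."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Lowercase the suffix so "B.PNG" and "b.png" are treated alike.
            counts[MEDIA_TYPES.get(path.suffix.lower(), "other")] += 1
    return dict(counts)
```

Files counted as "other" (for example a JSON or XML file) are the candidates to inspect as label files.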
Step 2: Present Findings to User
Before creating the dataset, confirm with the user:
I found the following in /path/to/directory:
- 150 image files (.jpg, .png)
- Labels: COCO format (annotations.json)

Proposed dataset name: "my-dataset"
Label field: "ground_truth"

Should I proceed with these settings?
Step 3: Create Dataset
execute_operator(
    operator_uri="@voxel51/utils/create_dataset",
    params={
        "name": "my-dataset",
        "persistent": true
    }
)
Step 4: Set Context
Set context to the newly created dataset before importing:
set_context(dataset_name="my-dataset")
Step 5: Import Samples
For media only (no labels):
execute_operator(
    operator_uri="@voxel51/io/import_samples",
    params={
        "import_type": "MEDIA_ONLY",
        "style": "DIRECTORY",
        "directory": {"absolute_path": "/path/to/images"}
    }
)
For media with labels:
execute_operator(
    operator_uri="@voxel51/io/import_samples",
    params={
        "import_type": "MEDIA_AND_LABELS",
        "dataset_type": "COCO",
        "data_path": {"absolute_path": "/path/to/images"},
        "labels_path": {"absolute_path": "/path/to/annotations.json"},
        "label_field": "ground_truth"
    }
)
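The choice between the two import calls can be made explicit with a small helper. The sketch below is an assumption-laden illustration: `build_import_params` is a hypothetical function, and it only reproduces the parameter keys shown in this step; it does not call the operator.

```python
def build_import_params(data_path, labels_path=None, dataset_type=None,
                        label_field="ground_truth"):
    """Return params for @voxel51/io/import_samples, mirroring the two calls in this step."""
    if labels_path is None:
        # Media only: no dataset type or label field is needed.
        return {
            "import_type": "MEDIA_ONLY",
            "style": "DIRECTORY",
            "directory": {"absolute_path": data_path},
        }
    if dataset_type is None:
        raise ValueError("dataset_type is required when labels are imported")
    return {
        "import_type": "MEDIA_AND_LABELS",
        "dataset_type": dataset_type,
        "data_path": {"absolute_path": data_path},
        "labels_path": {"absolute_path": labels_path},
        "label_field": label_field,
    }
```

The returned dictionary would be passed as `params` to `execute_operator`; the label field default is only a placeholder, since the user should always choose it.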
Step 6: Validate Import
Verify samples imported correctly by comparing with source:
load_dataset(name="my-dataset")
Compare num_samples with the file count from Step 1. Report any discrepancy to the user.
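The comparison itself is simple arithmetic, sketched here as a hypothetical `check_import` helper. It assumes you already have the sample count reported by `load_dataset` and the file count from Step 1 as plain integers.

```python
def check_import(num_samples, expected_files):
    """Return None when counts match, else a message describing the discrepancy."""
    if num_samples == expected_files:
        return None
    diff = expected_files - num_samples
    # Fewer samples than files means some files were skipped during import.
    kind = "missing" if diff > 0 else "extra"
    return (f"Imported {num_samples} samples but found "
            f"{expected_files} files ({abs(diff)} {kind})")
```

A non-empty result is the message to relay to the user before continuing to inference.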
Step 7: Launch App
launch_app(dataset_name="my-dataset")
Step 8: Apply Model Inference
Ask the user for the model name and the label field for predictions.
execute_operator(
    operator_uri="@voxel51/zoo/apply_zoo_model",
    params={
        "tab": "BUILTIN",
        "model": "yolov8n-coco-torch",
        "label_field": "predictions"
    }
)
Step 9: View Results
set_view(exists=["predictions"])
Step 10: Clean Up
close_app()
Supported Media Types
| Extensions | Media Type |
|---|---|
| | image |
| | video |
| | point-cloud |
| | 3d |
Supported Dataset Types
| Value | File Pattern | Label Types |
|---|---|---|
| | Folder per class | classification |
| | Folder per class | classification |
| | | detections, segmentations, keypoints |
| | per image | detections |
| | per image | detections |
| | + | detections |
| | + | detections |
| | Single file | classifications, detections, polylines, keypoints |
| | XML directory | frame labels |
| | TFRecords | classification |
| | TFRecords | detections |
Common Zoo Models
Popular models for apply_zoo_model. Some models require additional packages. If a model fails with a dependency error, the response includes the install_command; offer to run it for the user.
Detection (PyTorch only):
- faster-rcnn-resnet50-fpn-coco-torch: Faster R-CNN (no extra deps)
- retinanet-resnet50-fpn-coco-torch: RetinaNet (no extra deps)
Detection (requires ultralytics):
- yolov8n-coco-torch: YOLOv8 nano (fast)
- yolov8s-coco-torch: YOLOv8 small
- yolov8m-coco-torch: YOLOv8 medium
Classification:
- resnet50-imagenet-torch: ResNet-50
- mobilenet-v2-imagenet-torch: MobileNet v2
Segmentation:
- sam-vit-base-hq-torch: Segment Anything
- deeplabv3-resnet101-coco-torch: DeepLabV3
Embeddings:
- clip-vit-base32-torch: CLIP embeddings
- dinov2-vits14-torch: DINOv2 embeddings
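When proposing a model to the user, the list above can be treated as a lookup by task. The following sketch is illustrative only: `ZOO_MODELS` and `suggest_models` are hypothetical names, the model identifiers are copied from the list, and the YOLOv8 filter reflects the note that those models require ultralytics.

```python
# Model names copied from the list above, grouped by task.
ZOO_MODELS = {
    "detection": [
        "faster-rcnn-resnet50-fpn-coco-torch",
        "retinanet-resnet50-fpn-coco-torch",
        "yolov8n-coco-torch",
        "yolov8s-coco-torch",
        "yolov8m-coco-torch",
    ],
    "classification": ["resnet50-imagenet-torch", "mobilenet-v2-imagenet-torch"],
    "segmentation": ["sam-vit-base-hq-torch", "deeplabv3-resnet101-coco-torch"],
    "embeddings": ["clip-vit-base32-torch", "dinov2-vits14-torch"],
}

def suggest_models(task, allow_ultralytics=False):
    """Return candidate model names for a task, skipping YOLOv8 unless ultralytics is allowed."""
    models = ZOO_MODELS.get(task, [])
    if allow_ultralytics:
        return list(models)
    return [m for m in models if not m.startswith("yolov8")]
```

Defaulting to models without extra dependencies keeps the first inference attempt from failing on a missing package.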
Common Use Cases
Use Case 1: Load Images and Run Detection
execute_operator(
    operator_uri="@voxel51/utils/create_dataset",
    params={"name": "my-images", "persistent": true}
)
set_context(dataset_name="my-images")
execute_operator(
    operator_uri="@voxel51/io/import_samples",
    params={
        "import_type": "MEDIA_ONLY",
        "style": "DIRECTORY",
        "directory": {"absolute_path": "/path/to/images"}
    }
)
load_dataset(name="my-images")  # Validate import
launch_app(dataset_name="my-images")
execute_operator(
    operator_uri="@voxel51/zoo/apply_zoo_model",
    params={
        "tab": "BUILTIN",
        "model": "faster-rcnn-resnet50-fpn-coco-torch",
        "label_field": "predictions"
    }
)
set_view(exists=["predictions"])
Use Case 2: Import COCO Dataset and Add Predictions
execute_operator(
    operator_uri="@voxel51/utils/create_dataset",
    params={"name": "coco-dataset", "persistent": true}
)
set_context(dataset_name="coco-dataset")
execute_operator(
    operator_uri="@voxel51/io/import_samples",
    params={
        "import_type": "MEDIA_AND_LABELS",
        "dataset_type": "COCO",
        "data_path": {"absolute_path": "/path/to/images"},
        "labels_path": {"absolute_path": "/path/to/annotations.json"},
        "label_field": "ground_truth"
    }
)
load_dataset(name="coco-dataset")  # Validate import
launch_app(dataset_name="coco-dataset")
execute_operator(
    operator_uri="@voxel51/zoo/apply_zoo_model",
    params={
        "tab": "BUILTIN",
        "model": "faster-rcnn-resnet50-fpn-coco-torch",
        "label_field": "predictions"
    }
)
set_view(exists=["predictions"])
Use Case 3: Import YOLO Dataset
execute_operator(
    operator_uri="@voxel51/utils/create_dataset",
    params={"name": "yolo-dataset", "persistent": true}
)
set_context(dataset_name="yolo-dataset")
execute_operator(
    operator_uri="@voxel51/io/import_samples",
    params={
        "import_type": "MEDIA_AND_LABELS",
        "dataset_type": "YOLOv5",
        "dataset_dir": {"absolute_path": "/path/to/yolo/dataset"},
        "label_field": "ground_truth"
    }
)
load_dataset(name="yolo-dataset")
launch_app(dataset_name="yolo-dataset")
Use Case 4: Classification with Directory Tree
For a folder structure like:
/dataset/
    /cats/
        cat1.jpg
        cat2.jpg
    /dogs/
        dog1.jpg
        dog2.jpg

execute_operator(
    operator_uri="@voxel51/utils/create_dataset",
    params={"name": "classification-dataset", "persistent": true}
)
set_context(dataset_name="classification-dataset")
execute_operator(
    operator_uri="@voxel51/io/import_samples",
    params={
        "import_type": "MEDIA_AND_LABELS",
        "dataset_type": "Image Classification Directory Tree",
        "dataset_dir": {"absolute_path": "/path/to/dataset"},
        "label_field": "ground_truth"
    }
)
load_dataset(name="classification-dataset")
launch_app(dataset_name="classification-dataset")
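To make the directory-tree convention concrete, the sketch below shows how such a layout encodes one label per file: the name of the parent folder. It is an illustration of the convention only, not how the importer is implemented, and `labels_from_tree` is a hypothetical helper.

```python
from pathlib import Path

def labels_from_tree(root):
    """Map each file one level below root to the name of its class folder."""
    root = Path(root)
    labels = {}
    for class_dir in sorted(root.iterdir()):
        if not class_dir.is_dir():
            continue  # Loose files at the top level carry no class label.
        for f in sorted(class_dir.iterdir()):
            if f.is_file():
                labels[f.relative_to(root).as_posix()] = class_dir.name
    return labels
```

Running this before the import gives an expected class list to compare against the labels that end up in the chosen label field.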
Troubleshooting
Error: "Dataset already exists"
- Use a different dataset name
- Or delete the existing dataset first with @voxel51/utils/delete_dataset
Error: "No samples found"
- Verify the directory path is correct
- Check file extensions are supported
- Check whether the files sit in nested subdirectories (use recursive=true if needed)
Error: "Labels path not found"
- Verify the labels file/directory exists
- Check the path is absolute, not relative
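Both checks can be run up front before calling the import operator. This is a minimal sketch using only the Python standard library; `check_labels_path` is a hypothetical helper name.

```python
import os

def check_labels_path(path):
    """Return a list of problems with a labels path; an empty list means it looks usable."""
    problems = []
    if not os.path.isabs(path):
        problems.append("path is relative; use an absolute path")
    if not os.path.exists(path):
        problems.append("path does not exist")
    return problems
```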
Error: "Model not found"
- Check model name spelling
- Verify model exists in FiftyOne Zoo
- Use list_operators() and get_operator_schema() to discover available models
Error: "Missing dependency" (e.g., torch, ultralytics)
- The MCP server detects missing dependencies
- Response includes missing_package and install_command
- Install the required package and restart MCP server
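A hedged sketch of turning such a response into an offer to the user follows. The response is assumed to be a dictionary carrying the `missing_package` and `install_command` fields named above; the actual response shape may differ, and `dependency_prompt` is a hypothetical helper.

```python
def dependency_prompt(response):
    """Return a prompt offering the install command, or None if no dependency is missing."""
    # Assumed response shape: a dict with the two fields named in this section.
    package = response.get("missing_package")
    command = response.get("install_command")
    if not package or not command:
        return None
    return f"Model needs '{package}'. Run `{command}` and restart the MCP server?"
```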
Slow inference
- Use a smaller model variant (e.g., yolov8n instead of yolov8x)
- Reduce batch size
- Consider delegated execution for large datasets
Best Practices
- Explore before importing - Always scan the directory first to understand the data
- Confirm with user - Present findings and get confirmation before creating datasets
- Use descriptive names - Dataset names and label fields should be meaningful
- Separate ground truth from predictions - Use different field names (e.g., ground_truth vs predictions)
- Start with fast models - Use lightweight models first, then upgrade if needed
- Check operator schemas - Use get_operator_schema() to discover available parameters
License
Copyright 2017-2025, Voxel51, Inc. Apache 2.0 License