install
source · Clone the upstream repo
git clone https://github.com/ComeOnOliver/skillshub
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ComeOnOliver/skillshub "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/TerminalSkills/skills/senior-computer-vision" ~/.claude/skills/comeonoliver-skillshub-senior-computer-vision && rm -rf "$T"
manifest:
skills/TerminalSkills/skills/senior-computer-vision/SKILL.mdsource content
Senior Computer Vision
Overview
Build and deploy computer vision pipelines for object detection, image segmentation, and visual AI. Supports YOLO (v8/v11), Faster R-CNN, SAM (Segment Anything Model), and Mask R-CNN. Includes TensorRT optimization for production deployment with real-time inference.
Instructions
When a user asks for computer vision help, determine the task:
Task A: Object detection with YOLO
- Install ultralytics:
pip install ultralytics
- Run inference:
from ultralytics import YOLO # Load a pretrained model model = YOLO("yolo11n.pt") # nano (fastest) | s | m | l | x (most accurate) # Detect objects in an image results = model("image.jpg") # Process results for result in results: boxes = result.boxes for box in boxes: cls = int(box.cls[0]) conf = float(box.conf[0]) label = model.names[cls] x1, y1, x2, y2 = box.xyxy[0].tolist() print(f"{label}: {conf:.2f} at [{x1:.0f},{y1:.0f},{x2:.0f},{y2:.0f}]") # Save annotated image results[0].save("result.jpg")
- Run on video:
# Process video with tracking results = model.track("video.mp4", show=False, save=True, tracker="bytetrack.yaml")
- Train a custom YOLO model:
model = YOLO("yolo11n.pt") model.train( data="dataset.yaml", # Path to dataset config epochs=100, imgsz=640, batch=16, device=0, # GPU index patience=20, # Early stopping )
Dataset YAML format:
path: ./dataset train: images/train val: images/val names: 0: cat 1: dog 2: bird
Task B: Image segmentation with SAM
from segment_anything import sam_model_registry, SamPredictor, SamAutomaticMaskGenerator # Load SAM model sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth") sam.to(device="cuda") # Point-based segmentation predictor = SamPredictor(sam) predictor.set_image(image) # numpy array (H, W, 3) # Segment with a point prompt masks, scores, logits = predictor.predict( point_coords=np.array([[500, 375]]), # (x, y) coordinates point_labels=np.array([1]), # 1 = foreground, 0 = background multimask_output=True, ) # Automatic mask generation (segment everything) mask_generator = SamAutomaticMaskGenerator(sam) masks = mask_generator.generate(image) # Each mask: {segmentation, area, bbox, predicted_iou, stability_score} print(f"Found {len(masks)} segments")
Task C: Faster R-CNN and Mask R-CNN with torchvision
import torchvision from torchvision.models.detection import fasterrcnn_resnet50_fpn_v2, maskrcnn_resnet50_fpn_v2 from torchvision.transforms import functional as F from PIL import Image # Object detection with Faster R-CNN det_model = fasterrcnn_resnet50_fpn_v2(weights="DEFAULT") det_model.eval().cuda() img = Image.open("image.jpg") img_tensor = F.to_tensor(img).unsqueeze(0).cuda() with torch.no_grad(): predictions = det_model(img_tensor)[0] # Filter by confidence threshold = 0.7 for i in range(len(predictions["scores"])): if predictions["scores"][i] > threshold: label = predictions["labels"][i].item() score = predictions["scores"][i].item() box = predictions["boxes"][i].tolist() print(f"Class {label}: {score:.2f} at {box}") # Instance segmentation with Mask R-CNN seg_model = maskrcnn_resnet50_fpn_v2(weights="DEFAULT") seg_model.eval().cuda() with torch.no_grad(): predictions = seg_model(img_tensor)[0] # predictions["masks"] contains per-instance binary masks
Task D: TensorRT optimization for deployment
# Export YOLO to TensorRT from ultralytics import YOLO model = YOLO("yolo11n.pt") model.export(format="engine", device=0, half=True) # Creates yolo11n.engine # Run inference with TensorRT engine trt_model = YOLO("yolo11n.engine") results = trt_model("image.jpg")
For custom models:
import tensorrt as trt import torch # Export PyTorch model to ONNX first torch.onnx.export( model, dummy_input, "model.onnx", opset_version=17, input_names=["input"], output_names=["output"], dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}}, ) # Convert ONNX to TensorRT # trtexec --onnx=model.onnx --saveEngine=model.engine --fp16
Task E: Image classification
from torchvision.models import efficientnet_v2_s, EfficientNet_V2_S_Weights from torchvision.transforms import functional as F from PIL import Image weights = EfficientNet_V2_S_Weights.DEFAULT model = efficientnet_v2_s(weights=weights).eval().cuda() preprocess = weights.transforms() img = Image.open("image.jpg") batch = preprocess(img).unsqueeze(0).cuda() with torch.no_grad(): logits = model(batch) probs = torch.softmax(logits, dim=1)[0] top5 = torch.topk(probs, 5) categories = weights.meta["categories"] for score, idx in zip(top5.values, top5.indices): print(f"{categories[idx]}: {score:.2%}")
Examples
Example 1: Count products on a shelf
User request: "Count how many bottles are on each shelf in this image"
model = YOLO("yolo11m.pt") results = model("shelf.jpg", conf=0.5) bottles = [b for b in results[0].boxes if model.names[int(b.cls[0])] == "bottle"] print(f"Detected {len(bottles)} bottles") results[0].save("shelf_annotated.jpg")
Example 2: Segment and extract a foreground object
User request: "Remove the background from this product photo"
Use SAM with a center-point prompt to segment the main object, then apply the mask to create a transparent PNG background.
Example 3: Real-time detection on a webcam
User request: "Run object detection on my webcam feed"
model = YOLO("yolo11n.pt") results = model(source=0, show=True, conf=0.5) # source=0 for webcam
Guidelines
- Start with the smallest model variant (nano/small) and scale up only if accuracy is insufficient.
- Use TensorRT or ONNX Runtime for production deployments; they provide 2-5x speedup over PyTorch.
- For custom detection tasks, fine-tune YOLO on your dataset rather than training from scratch.
- Set confidence thresholds based on the application: 0.5 for general use, 0.7+ for high-precision needs.
- Use half-precision (FP16) inference on GPUs for nearly 2x speedup with minimal accuracy loss.
- Pre-process images to the model's expected resolution before inference for best results.
- For video processing, use batch inference and tracking (ByteTrack) for temporal consistency.
- Benchmark inference speed with
(YOLO) before committing to a model size.model.benchmark()