Claude-skill-registry axiom-vision-ref
Vision framework API, VNDetectHumanHandPoseRequest, VNDetectHumanBodyPoseRequest, person segmentation, face detection, VNImageRequestHandler, recognized points, joint landmarks, VNRecognizeTextRequest, VNDetectBarcodesRequest, DataScannerViewController, VNDocumentCameraViewController, RecognizeDocumentsRequest
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/axiom-vision-ref" ~/.claude/skills/majiayu000-claude-skill-registry-axiom-vision-ref && rm -rf "$T"
skills/data/axiom-vision-ref/SKILL.md

Vision Framework API Reference
Comprehensive reference for Vision framework computer vision: subject segmentation, hand/body pose detection, person detection, face analysis, text recognition (OCR), barcode detection, and document scanning.
When to Use This Reference
- Implementing subject lifting using VisionKit or Vision
- Detecting hand/body poses for gesture recognition or fitness apps
- Segmenting people from backgrounds or separating multiple individuals
- Face detection and landmarks for AR effects or authentication
- Combining Vision APIs to solve complex computer vision problems
- Looking up specific API signatures and parameter meanings
- Recognizing text in images (OCR) with VNRecognizeTextRequest
- Detecting barcodes and QR codes with VNDetectBarcodesRequest
- Building live scanners with DataScannerViewController
- Scanning documents with VNDocumentCameraViewController
- Extracting structured document data with RecognizeDocumentsRequest (iOS 26+)
Related skills: See axiom-vision for decision trees and patterns, axiom-vision-diag for troubleshooting
Vision Framework Overview
Vision provides computer vision algorithms for still images and video:
Core workflow:
- Create request (e.g., VNDetectHumanHandPoseRequest())
- Create handler with image (VNImageRequestHandler(cgImage: image))
- Perform request (try handler.perform([request]))
- Access observations from request.results
Coordinate system: Lower-left origin, normalized (0.0-1.0) coordinates
Performance: Run on background queue - resource intensive, blocks UI if on main thread
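A minimal end-to-end sketch of that workflow, assuming a UIKit app and a CGImage input; the helper name detectHandPose(in:viewSize:) is illustrative, not part of the framework:

```swift
import Vision
import UIKit

// Illustrative helper: runs a request off the main thread and converts
// a normalized, lower-left-origin point into UIKit's upper-left space.
func detectHandPose(in image: CGImage, viewSize: CGSize) {
    DispatchQueue.global(qos: .userInitiated).async {
        let request = VNDetectHumanHandPoseRequest()
        let handler = VNImageRequestHandler(cgImage: image)
        do {
            try handler.perform([request])
            guard let observation = request.results?.first else { return }
            let wrist = try observation.recognizedPoint(.wrist)
            // Flip Y and scale to view coordinates.
            let viewPoint = CGPoint(
                x: wrist.location.x * viewSize.width,
                y: (1 - wrist.location.y) * viewSize.height
            )
            DispatchQueue.main.async {
                print("Wrist at \(viewPoint)")  // Update UI here
            }
        } catch {
            print("Vision error: \(error)")
        }
    }
}
```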
Subject Segmentation APIs
VNGenerateForegroundInstanceMaskRequest
Availability: iOS 17+, macOS 14+, tvOS 17+, visionOS 1+
Generates class-agnostic instance mask of foreground objects (people, pets, buildings, food, shoes, etc.)
Basic Usage
```swift
let request = VNGenerateForegroundInstanceMaskRequest()
let handler = VNImageRequestHandler(cgImage: image)
try handler.perform([request])

guard let observation = request.results?.first as? VNInstanceMaskObservation else { return }
```
InstanceMaskObservation
- allInstances: IndexSet containing all foreground instance indices (excludes background 0)
- instanceMask: CVPixelBuffer with UInt8 labels (0 = background, 1+ = instance indices)
- instanceAtPoint(_:): Returns instance index at normalized point
```swift
let point = CGPoint(x: 0.5, y: 0.5)  // Center of image
let instance = observation.instanceAtPoint(point)

if instance == 0 {
    print("Background tapped")
} else {
    print("Instance \(instance) tapped")
}
```
Generating Masks
createScaledMask(for:croppedToInstancesContent:)
Parameters:
- for: IndexSet of instances to include
- croppedToInstancesContent:
  - false = Output matches input resolution (for compositing)
  - true = Tight crop around selected instances

Returns: Single-channel floating-point CVPixelBuffer (soft segmentation mask)
```swift
// All instances, full resolution
let mask = try observation.createScaledMask(
    for: observation.allInstances,
    croppedToInstancesContent: false
)

// Single instance, cropped
let instances = IndexSet(integer: 1)
let croppedMask = try observation.createScaledMask(
    for: instances,
    croppedToInstancesContent: true
)
```
Instance Mask Hit Testing
Access raw pixel buffer to map tap coordinates to instance labels:
```swift
let instanceMask = observation.instanceMask
CVPixelBufferLockBaseAddress(instanceMask, .readOnly)
defer { CVPixelBufferUnlockBaseAddress(instanceMask, .readOnly) }

let baseAddress = CVPixelBufferGetBaseAddress(instanceMask)
let width = CVPixelBufferGetWidth(instanceMask)  // Mask dimensions, if needed
let bytesPerRow = CVPixelBufferGetBytesPerRow(instanceMask)

// Convert normalized tap to pixel coordinates
let pixelPoint = VNImagePointForNormalizedPoint(
    CGPoint(x: normalizedX, y: normalizedY),
    Int(imageWidth),
    Int(imageHeight)
)

// Calculate byte offset (one UInt8 label per pixel)
let offset = Int(pixelPoint.y) * bytesPerRow + Int(pixelPoint.x)

// Read instance label
let label = UnsafeRawPointer(baseAddress!).load(
    fromByteOffset: offset,
    as: UInt8.self
)

let instances = label == 0
    ? observation.allInstances
    : IndexSet(integer: Int(label))
```
VisionKit Subject Lifting
ImageAnalysisInteraction (iOS)
Availability: iOS 16+, iPadOS 16+
Adds system-like subject lifting UI to views:
```swift
let interaction = ImageAnalysisInteraction()
interaction.preferredInteractionTypes = .imageSubject  // Or .automatic
imageView.addInteraction(interaction)
```
Interaction types:
- .automatic: Subject lifting + Live Text + data detectors
- .imageSubject: Subject lifting only (no interactive text)
ImageAnalysisOverlayView (macOS)
Availability: macOS 13+
```swift
let overlayView = ImageAnalysisOverlayView()
overlayView.preferredInteractionTypes = .imageSubject
nsView.addSubview(overlayView)
```
Programmatic Access
ImageAnalyzer
```swift
let analyzer = ImageAnalyzer()
let configuration = ImageAnalyzer.Configuration([.text, .visualLookUp])
let analysis = try await analyzer.analyze(image, configuration: configuration)
```
ImageAnalysis
- subjects: [Subject] - All subjects in image
- highlightedSubjects: Set<Subject> - Currently highlighted (user long-pressed)
- subject(at:): Async lookup of subject at normalized point (returns nil if none)
```swift
// Get all subjects
let subjects = analysis.subjects

// Look up subject at tap
if let subject = try await analysis.subject(at: tapPoint) {
    // Process subject
}

// Change highlight state
analysis.highlightedSubjects = Set([subjects[0], subjects[1]])
```
Subject Struct
- image: UIImage/NSImage - Extracted subject with transparency
- bounds: CGRect - Subject boundaries in image coordinates
```swift
// Single subject image
let subjectImage = subject.image

// Composite multiple subjects
let compositeImage = try await analysis.image(for: [subject1, subject2])
```
Out-of-process: VisionKit analysis happens out-of-process (performance benefit, image size limited)
Person Segmentation APIs
VNGeneratePersonSegmentationRequest
Availability: iOS 15+, macOS 12+
Returns single mask containing all people in image:
```swift
let request = VNGeneratePersonSegmentationRequest()
// Configure quality level if needed
try handler.perform([request])

guard let observation = request.results?.first as? VNPixelBufferObservation else { return }
let personMask = observation.pixelBuffer  // CVPixelBuffer
```
VNGeneratePersonInstanceMaskRequest
Availability: iOS 17+, macOS 14+
Returns separate masks for up to 4 people:
```swift
let request = VNGeneratePersonInstanceMaskRequest()
try handler.perform([request])

guard let observation = request.results?.first as? VNInstanceMaskObservation else { return }

// Same InstanceMaskObservation API as foreground instance masks
let allPeople = observation.allInstances  // Up to 4 people (1-4)

// Get mask for person 1
let person1Mask = try observation.createScaledMask(
    for: IndexSet(integer: 1),
    croppedToInstancesContent: false
)
```
Limitations:
- Segments up to 4 people
- With >4 people: may miss people or combine them (typically background people)
- Use VNDetectFaceRectanglesRequest to count faces if you need to handle crowded scenes (see the sketch below)
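A hedged sketch of that fallback, assuming a CGImage input; the threshold of 4 simply mirrors the instance-mask limit above:

```swift
import Vision

// Count faces first, then decide which segmentation request to run.
// Sketch only: `image` is assumed to be a CGImage of the scene.
let faceRequest = VNDetectFaceRectanglesRequest()
let handler = VNImageRequestHandler(cgImage: image)
try handler.perform([faceRequest])

let faceCount = faceRequest.results?.count ?? 0
if faceCount > 4 {
    // Too crowded for per-person masks; fall back to a single mask of all people.
    let request = VNGeneratePersonSegmentationRequest()
    try handler.perform([request])
} else {
    let request = VNGeneratePersonInstanceMaskRequest()
    try handler.perform([request])
}
```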
Hand Pose Detection
VNDetectHumanHandPoseRequest
Availability: iOS 14+, macOS 11+
Detects 21 hand landmarks per hand:
```swift
let request = VNDetectHumanHandPoseRequest()
request.maximumHandCount = 2  // Default: 2, increase if needed

let handler = VNImageRequestHandler(cgImage: image)
try handler.perform([request])

for observation in request.results as? [VNHumanHandPoseObservation] ?? [] {
    // Process each hand
}
```
Performance note:
maximumHandCount affects latency. Pose computed only for hands ≤ maximum. Set to lowest acceptable value.
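For live video, the request is typically created once and reused across frames from an AVCaptureVideoDataOutput delegate. A sketch under that assumption (the class name, queue handling, and fixed `.up` orientation are illustrative):

```swift
import AVFoundation
import Vision

final class HandPoseProcessor: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    private let request: VNDetectHumanHandPoseRequest = {
        let r = VNDetectHumanHandPoseRequest()
        r.maximumHandCount = 1  // Lowest acceptable value keeps latency down
        return r
    }()

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .up)
        do {
            try handler.perform([request])
            guard let hand = request.results?.first else { return }
            let tip = try hand.recognizedPoint(.indexTip)
            if tip.confidence > 0.5 {
                // Use tip.location (normalized, lower-left origin)
            }
        } catch {
            // Drop the frame on error
        }
    }
}
```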
Hand Landmarks (21 points)
Wrist: 1 landmark
Thumb (4 landmarks):
- .thumbTip
- .thumbIP (interphalangeal joint)
- .thumbMP (metacarpophalangeal joint)
- .thumbCMC (carpometacarpal joint)

Fingers (4 landmarks each):
- Tip (.indexTip, .middleTip, .ringTip, .littleTip)
- DIP (distal interphalangeal joint)
- PIP (proximal interphalangeal joint)
- MCP (metacarpophalangeal joint)
Group Keys
Access landmark groups:
| Group Key | Points |
|---|---|
| .all | All 21 landmarks |
| .thumb | 4 thumb joints |
| .indexFinger | 4 index finger joints |
| .middleFinger | 4 middle finger joints |
| .ringFinger | 4 ring finger joints |
| .littleFinger | 4 little finger joints |
```swift
// Get all points
let allPoints = try observation.recognizedPoints(.all)

// Get index finger points only
let indexPoints = try observation.recognizedPoints(.indexFinger)

// Get specific point
let thumbTip = try observation.recognizedPoint(.thumbTip)
let indexTip = try observation.recognizedPoint(.indexTip)

// Check confidence
guard thumbTip.confidence > 0.5 else { return }

// Access location (normalized coordinates, lower-left origin)
let location = thumbTip.location  // CGPoint
```
Gesture Recognition Example (Pinch)
```swift
let thumbTip = try observation.recognizedPoint(.thumbTip)
let indexTip = try observation.recognizedPoint(.indexTip)

guard thumbTip.confidence > 0.5, indexTip.confidence > 0.5 else { return }

let distance = hypot(
    thumbTip.location.x - indexTip.location.x,
    thumbTip.location.y - indexTip.location.y
)
let isPinching = distance < 0.05  // Normalized threshold
```
Chirality (Handedness)
```swift
let chirality = observation.chirality  // .left, .right, or .unknown
```
Body Pose Detection
VNDetectHumanBodyPoseRequest (2D)
Availability: iOS 14+, macOS 11+
Detects 18 body landmarks (2D normalized coordinates):
```swift
let request = VNDetectHumanBodyPoseRequest()
try handler.perform([request])

for observation in request.results as? [VNHumanBodyPoseObservation] ?? [] {
    // Process each person
}
```
Body Landmarks (18 points)
Face (5 landmarks): .nose, .leftEye, .rightEye, .leftEar, .rightEar

Arms (6 landmarks):
- Left: .leftShoulder, .leftElbow, .leftWrist
- Right: .rightShoulder, .rightElbow, .rightWrist

Torso (7 landmarks):
- .neck (between shoulders)
- .leftShoulder, .rightShoulder (also in arm groups)
- .leftHip, .rightHip
- .root (between hips)

Legs (6 landmarks):
- Left: .leftHip, .leftKnee, .leftAnkle
- Right: .rightHip, .rightKnee, .rightAnkle
Note: Shoulders and hips appear in multiple groups
Group Keys (Body)
| Group Key | Points |
|---|---|
| .all | All 18 landmarks |
| .face | 5 face landmarks |
| .leftArm | shoulder, elbow, wrist |
| .rightArm | shoulder, elbow, wrist |
| .torso | neck, shoulders, hips, root |
| .leftLeg | hip, knee, ankle |
| .rightLeg | hip, knee, ankle |
```swift
// Get all body points
let allPoints = try observation.recognizedPoints(.all)

// Get left arm only
let leftArmPoints = try observation.recognizedPoints(.leftArm)

// Get specific joint
let leftWrist = try observation.recognizedPoint(.leftWrist)
```
VNDetectHumanBodyPose3DRequest (3D)
Availability: iOS 17+, macOS 14+
Returns 3D skeleton with 17 joints in meters (real-world coordinates):
```swift
let request = VNDetectHumanBodyPose3DRequest()
try handler.perform([request])

guard let observation = request.results?.first as? VNHumanBodyPose3DObservation else { return }

// Get 3D joint position
let leftWrist = try observation.recognizedPoint(.leftWrist)
let position = leftWrist.position            // simd_float4x4 matrix
let localPosition = leftWrist.localPosition  // Relative to parent joint
```
3D Body Landmarks (17 points): Similar to the 2D set, minus the ear landmarks (17 joints vs 18 2D landmarks)
3D Observation Properties
bodyHeight: Estimated height in meters
- With depth data: Measured height
- Without depth data: Reference height (1.8m)
heightEstimation: .measured or .reference
cameraOriginMatrix: simd_float4x4 camera position/orientation relative to subject
pointInImage(_:): Project 3D joint back to 2D image coordinates
```swift
let wrist2D = try observation.pointInImage(leftWrist)
```
3D Point Classes
VNPoint3D: Base class with simd_float4x4 position matrix
VNRecognizedPoint3D: Adds identifier (joint name)
VNHumanBodyRecognizedPoint3D: Adds localPosition and parentJoint
```swift
// Position relative to skeleton root (center of hip)
let modelPosition = leftWrist.position

// Position relative to parent joint (left elbow)
let relativePosition = leftWrist.localPosition
```
Depth Input
Vision accepts depth data alongside images:
```swift
// From AVDepthData
let handler = VNImageRequestHandler(
    cvPixelBuffer: imageBuffer,
    depthData: depthData,
    orientation: orientation
)

// From file (automatic depth extraction)
let handler = VNImageRequestHandler(url: imageURL)  // Depth auto-fetched
```
Depth formats: Disparity or Depth (interchangeable via AVFoundation)
LiDAR: Use in live capture sessions for accurate scale/measurement
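A small sketch of that disparity/depth interchange, assuming depthData and pixelBuffer come from your capture or photo pipeline; the conversion call is standard AVFoundation and the handler initializer is the depth-aware one shown above:

```swift
import AVFoundation
import Vision

// AVDepthData can be converted between disparity and depth representations
// before handing it to Vision. Sketch only: `depthData` and `pixelBuffer`
// are assumed to come from your capture pipeline.
let depthFloat32 = depthData.converting(toDepthDataType: kCVPixelFormatType_DepthFloat32)

let handler = VNImageRequestHandler(
    cvPixelBuffer: pixelBuffer,
    depthData: depthFloat32,
    orientation: .up
)
let request = VNDetectHumanBodyPose3DRequest()
try handler.perform([request])
```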
Face Detection & Landmarks
VNDetectFaceRectanglesRequest
Availability: iOS 11+
Detects face bounding boxes:
```swift
let request = VNDetectFaceRectanglesRequest()
try handler.perform([request])

for observation in request.results as? [VNFaceObservation] ?? [] {
    let faceBounds = observation.boundingBox  // Normalized rect
}
```
VNDetectFaceLandmarksRequest
Availability: iOS 11+
Detects face with detailed landmarks:
```swift
let request = VNDetectFaceLandmarksRequest()
try handler.perform([request])

for observation in request.results as? [VNFaceObservation] ?? [] {
    if let landmarks = observation.landmarks {
        let leftEye = landmarks.leftEye
        let nose = landmarks.nose
        let leftPupil = landmarks.leftPupil  // Revision 3+
    }
}
```
Revisions:
- Revision 1: Basic landmarks
- Revision 2: Detects upside-down faces
- Revision 3+: Pupil locations
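If behavior depends on a specific revision (e.g., pupil landmarks), pinning it explicitly keeps results stable across OS releases. A minimal sketch; checking supportedRevisions first is an assumption about your deployment target:

```swift
import Vision

let request = VNDetectFaceLandmarksRequest()
// Pin the revision if you depend on revision-specific landmarks such as pupils.
if VNDetectFaceLandmarksRequest.supportedRevisions.contains(VNDetectFaceLandmarksRequestRevision3) {
    request.revision = VNDetectFaceLandmarksRequestRevision3
}
try handler.perform([request])
```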
Person Detection
VNDetectHumanRectanglesRequest
Availability: iOS 13+
Detects human bounding boxes (torso detection):
```swift
let request = VNDetectHumanRectanglesRequest()
try handler.perform([request])

for observation in request.results as? [VNHumanObservation] ?? [] {
    let humanBounds = observation.boundingBox  // Normalized rect
}
```
Use case: Faster than pose detection when you only need location
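On iOS 15+ the request also exposes an upperBodyOnly flag for switching between torso-only and full-body boxes; a minimal sketch under that assumption:

```swift
import Vision

let request = VNDetectHumanRectanglesRequest()
// Default behavior detects the upper body (torso); set to false for full-body boxes (iOS 15+).
request.upperBodyOnly = false
try handler.perform([request])
```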
CoreImage Integration
CIBlendWithMask Filter
Composite subject on new background using Vision mask:
```swift
// 1. Get mask from Vision
guard let observation = request.results?.first as? VNInstanceMaskObservation else { return }
let visionMask = try observation.createScaledMask(
    for: observation.allInstances,
    croppedToInstancesContent: false
)

// 2. Convert to CIImage
let maskImage = CIImage(cvPixelBuffer: visionMask)

// 3. Apply filter
let filter = CIFilter(name: "CIBlendWithMask")!
filter.setValue(sourceImage, forKey: kCIInputImageKey)
filter.setValue(maskImage, forKey: kCIInputMaskImageKey)
filter.setValue(newBackground, forKey: kCIInputBackgroundImageKey)
let output = filter.outputImage  // Composited result
```
Parameters:
- Input image: Original image to mask
- Mask image: Vision's soft segmentation mask
- Background image: New background (or empty image for transparency)
HDR preservation: CoreImage preserves high dynamic range from input (Vision/VisionKit output is SDR)
Text Recognition APIs
VNRecognizeTextRequest
Availability: iOS 13+, macOS 10.15+
Recognizes text in images with configurable accuracy/speed trade-off.
Basic Usage
```swift
let request = VNRecognizeTextRequest()
request.recognitionLevel = .accurate  // Or .fast
request.recognitionLanguages = ["en-US", "de-DE"]  // Order matters
request.usesLanguageCorrection = true

let handler = VNImageRequestHandler(cgImage: image)
try handler.perform([request])

for observation in request.results as? [VNRecognizedTextObservation] ?? [] {
    // Get top candidates
    let candidates = observation.topCandidates(3)
    let bestText = candidates.first?.string ?? ""
}
```
Recognition Levels
| Level | Performance | Accuracy | Best For |
|---|---|---|---|
| .fast | Real-time | Good | Camera feed, large text, signs |
| .accurate | Slower | Excellent | Documents, receipts, handwriting |
Fast path: Character-by-character recognition (Neural Network → Character Detection)
Accurate path: Full-line ML recognition (Neural Network → Line/Word Recognition)
Properties
| Property | Type | Description |
|---|---|---|
| recognitionLevel | VNRequestTextRecognitionLevel | .fast or .accurate |
| recognitionLanguages | [String] | BCP 47 language codes, order = priority |
| usesLanguageCorrection | Bool | Use language model for correction |
| customWords | [String] | Domain-specific vocabulary |
| automaticallyDetectsLanguage | Bool | Auto-detect language (iOS 16+) |
| minimumTextHeight | Float | Min text height as fraction of image (0-1) |
| revision | Int | API version (affects supported languages) |
Language Support
```swift
// Check supported languages for current settings
let languages = try VNRecognizeTextRequest.supportedRecognitionLanguages(
    for: .accurate,
    revision: VNRecognizeTextRequestRevision3
)
```
Language correction: Improves accuracy but takes processing time. Disable for codes/serial numbers.
Custom words: Add domain-specific vocabulary for better recognition (medical terms, product codes).
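A short sketch of those two knobs; the sample vocabulary is illustrative, and note that customWords is only consulted while language correction is enabled:

```swift
import Vision

let request = VNRecognizeTextRequest()
request.recognitionLevel = .accurate

// Option A: codes / serial numbers — skip the language model entirely.
request.usesLanguageCorrection = false

// Option B: domain vocabulary — keep correction on and supplement it.
// (customWords has no effect when language correction is disabled.)
// request.usesLanguageCorrection = true
// request.customWords = ["ibuprofen", "AXM-100"]

try handler.perform([request])
```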
VNRecognizedTextObservation
boundingBox: Normalized rect containing recognized text
topCandidates(_:): Returns [VNRecognizedText] ordered by confidence
VNRecognizedText
| Property | Type | Description |
|---|---|---|
| string | String | Recognized text |
| confidence | VNConfidence | 0.0-1.0 |
| boundingBox(for:) | VNRectangleObservation? | Box for substring range |
```swift
// Get bounding box for substring
let text = candidate.string
if let range = text.range(of: "invoice") {
    let box = try candidate.boundingBox(for: range)
}
```
Barcode Detection APIs
VNDetectBarcodesRequest
Availability: iOS 11+, macOS 10.13+
Detects and decodes barcodes and QR codes.
Basic Usage
```swift
let request = VNDetectBarcodesRequest()
request.symbologies = [.qr, .ean13, .code128]  // Specific codes

let handler = VNImageRequestHandler(cgImage: image)
try handler.perform([request])

for barcode in request.results as? [VNBarcodeObservation] ?? [] {
    let payload = barcode.payloadStringValue
    let type = barcode.symbology
    let bounds = barcode.boundingBox
}
```
Symbologies
1D Barcodes:
- .codabar (iOS 15+)
- .code39, .code39Checksum, .code39FullASCII, .code39FullASCIIChecksum
- .code93, .code93i, .code128
- .ean8, .ean13
- .gs1DataBar, .gs1DataBarExpanded, .gs1DataBarLimited (iOS 15+)
- .i2of5, .i2of5Checksum, .itf14, .upce

2D Codes:
- .aztec, .dataMatrix
- .microPDF417 (iOS 15+), .microQR (iOS 15+)
- .pdf417, .qr
Performance: Specifying fewer symbologies = faster detection
Revisions
| Revision | iOS | Features |
|---|---|---|
| 1 | 11+ | Basic detection, one code at a time |
| 2 | 15+ | Codabar, GS1, MicroPDF, MicroQR, better ROI |
| 3 | 16+ | ML-based, multiple codes, better bounding boxes |
VNBarcodeObservation
| Property | Type | Description |
|---|---|---|
| payloadStringValue | String? | Decoded content |
| symbology | VNBarcodeSymbology | Barcode type |
| boundingBox | CGRect | Normalized bounds |
| topLeft, topRight, bottomLeft, bottomRight | CGPoint | Corner points |
VisionKit Scanner APIs
DataScannerViewController
Availability: iOS 16+
Camera-based live scanner with built-in UI for text and barcodes.
Check Availability
```swift
// Hardware support
DataScannerViewController.isSupported

// Runtime availability (camera access, parental controls)
DataScannerViewController.isAvailable
```
Configuration
```swift
import VisionKit

let dataTypes: Set<DataScannerViewController.RecognizedDataType> = [
    .barcode(symbologies: [.qr, .ean13]),
    .text(textContentType: .URL),   // Or nil for all text
    // .text(languages: ["ja"])     // Filter by language
]

let scanner = DataScannerViewController(
    recognizedDataTypes: dataTypes,
    qualityLevel: .balanced,        // .fast, .balanced, .accurate
    recognizesMultipleItems: true,
    isHighFrameRateTrackingEnabled: true,
    isPinchToZoomEnabled: true,
    isGuidanceEnabled: true,
    isHighlightingEnabled: true
)
scanner.delegate = self

present(scanner, animated: true) {
    try? scanner.startScanning()
}
```
RecognizedDataType
| Type | Description |
|---|---|
| .barcode(symbologies:) | Specific barcode types |
| .text() | All text |
| .text(languages:) | Text filtered by language |
| .text(textContentType:) | Text filtered by type (URL, phone, email) |
Delegate Protocol
```swift
protocol DataScannerViewControllerDelegate {
    func dataScanner(_ dataScanner: DataScannerViewController,
                     didTapOn item: RecognizedItem)
    func dataScanner(_ dataScanner: DataScannerViewController,
                     didAdd addedItems: [RecognizedItem],
                     allItems: [RecognizedItem])
    func dataScanner(_ dataScanner: DataScannerViewController,
                     didUpdate updatedItems: [RecognizedItem],
                     allItems: [RecognizedItem])
    func dataScanner(_ dataScanner: DataScannerViewController,
                     didRemove removedItems: [RecognizedItem],
                     allItems: [RecognizedItem])
    func dataScanner(_ dataScanner: DataScannerViewController,
                     becameUnavailableWithError error: DataScannerViewController.ScanningUnavailable)
}
```
RecognizedItem
```swift
enum RecognizedItem {
    case text(RecognizedItem.Text)
    case barcode(RecognizedItem.Barcode)

    var id: UUID { get }
    var bounds: RecognizedItem.Bounds { get }
}

// Text item
struct Text {
    let transcript: String
}

// Barcode item
struct Barcode {
    let payloadStringValue: String?
    let observation: VNBarcodeObservation
}
```
Async Stream
```swift
// Alternative to delegate
for await items in scanner.recognizedItems {
    // Current recognized items
}
```
Custom Highlights
```swift
// Add custom views over recognized items
scanner.overlayContainerView.addSubview(customHighlight)

// Capture still photo
let photo = try await scanner.capturePhoto()
```
VNDocumentCameraViewController
Availability: iOS 13+
Document scanning with automatic edge detection, perspective correction, and lighting adjustment.
Basic Usage
```swift
import VisionKit

let camera = VNDocumentCameraViewController()
camera.delegate = self
present(camera, animated: true)
```
Delegate Protocol
```swift
protocol VNDocumentCameraViewControllerDelegate {
    func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                      didFinishWith scan: VNDocumentCameraScan)
    func documentCameraViewControllerDidCancel(_ controller: VNDocumentCameraViewController)
    func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                      didFailWithError error: Error)
}
```
VNDocumentCameraScan
| Property | Type | Description |
|---|---|---|
| pageCount | Int | Number of scanned pages |
| imageOfPage(at:) | UIImage | Get page image at index |
| title | String | User-editable title |
```swift
func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                  didFinishWith scan: VNDocumentCameraScan) {
    controller.dismiss(animated: true)

    for i in 0..<scan.pageCount {
        let pageImage = scan.imageOfPage(at: i)
        // Process with VNRecognizeTextRequest
    }
}
```
Document Analysis APIs
VNDetectDocumentSegmentationRequest
Availability: iOS 15+, macOS 12+
Detects document boundaries for custom camera UIs or post-processing.
```swift
let request = VNDetectDocumentSegmentationRequest()
let handler = VNImageRequestHandler(ciImage: image)
try handler.perform([request])

guard let observation = request.results?.first as? VNRectangleObservation else {
    return  // No document found
}

// Get corner points (normalized)
let corners = [
    observation.topLeft,
    observation.topRight,
    observation.bottomLeft,
    observation.bottomRight
]
```
vs VNDetectRectanglesRequest:
- Document: ML-based, trained specifically on documents
- Rectangle: Edge-based, finds any quadrilateral
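A hedged sketch of turning those corners into a deskewed page with Core Image's CIPerspectiveCorrection; it assumes `image` is the same CIImage handed to the request handler above:

```swift
import CoreImage
import Vision

// Scale normalized Vision corners to pixel coordinates, then deskew.
// Sketch only: `observation` comes from VNDetectDocumentSegmentationRequest above.
let size = image.extent.size
func toPixels(_ p: CGPoint) -> CGPoint {
    CGPoint(x: p.x * size.width, y: p.y * size.height)
}

let corrected = image.applyingFilter("CIPerspectiveCorrection", parameters: [
    "inputTopLeft": CIVector(cgPoint: toPixels(observation.topLeft)),
    "inputTopRight": CIVector(cgPoint: toPixels(observation.topRight)),
    "inputBottomLeft": CIVector(cgPoint: toPixels(observation.bottomLeft)),
    "inputBottomRight": CIVector(cgPoint: toPixels(observation.bottomRight))
])
// `corrected` is the document cropped and perspective-corrected, ready for OCR.
```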
RecognizeDocumentsRequest (iOS 26+)
Availability: iOS 26+, macOS 26+
Structured document understanding with semantic parsing.
Basic Usage
```swift
let request = RecognizeDocumentsRequest()
let observations = try await request.perform(on: imageData)

guard let document = observations.first?.document else { return }
```
DocumentObservation Hierarchy
```
DocumentObservation
└── document: DocumentObservation.Document
    ├── text: TextObservation
    ├── tables: [Container.Table]
    ├── lists: [Container.List]
    └── barcodes: [Container.Barcode]
```
Table Extraction
```swift
for table in document.tables {
    for row in table.rows {
        for cell in row {
            let text = cell.content.text.transcript
            let detectedData = cell.content.text.detectedData
        }
    }
}
```
Detected Data Types
```swift
for data in document.text.detectedData {
    switch data.match.details {
    case .emailAddress(let email):
        let address = email.emailAddress
    case .phoneNumber(let phone):
        let number = phone.phoneNumber
    case .link(let url):
        let link = url
    case .address(let address):
        let components = address
    case .date(let date):
        let dateValue = date
    default:
        break
    }
}
```
TextObservation Hierarchy
```
TextObservation
├── transcript: String
├── lines: [TextObservation.Line]
├── paragraphs: [TextObservation.Paragraph]
├── words: [TextObservation.Word]
└── detectedData: [DetectedDataObservation]
```
API Quick Reference
Subject Segmentation
| API | Platform | Purpose |
|---|---|---|
| VNGenerateForegroundInstanceMaskRequest | iOS 17+ | Class-agnostic subject instances |
| VNGeneratePersonInstanceMaskRequest | iOS 17+ | Up to 4 people separately |
| VNGeneratePersonSegmentationRequest | iOS 15+ | All people (single mask) |
| ImageAnalysisInteraction (VisionKit) | iOS 16+ | UI for subject lifting |
Pose Detection
| API | Platform | Landmarks | Coordinates |
|---|---|---|---|
| VNDetectHumanHandPoseRequest | iOS 14+ | 21 per hand | 2D normalized |
| VNDetectHumanBodyPoseRequest | iOS 14+ | 18 body joints | 2D normalized |
| VNDetectHumanBodyPose3DRequest | iOS 17+ | 17 body joints | 3D meters |
Face & Person Detection
| API | Platform | Purpose |
|---|---|---|
| VNDetectFaceRectanglesRequest | iOS 11+ | Face bounding boxes |
| VNDetectFaceLandmarksRequest | iOS 11+ | Face with detailed landmarks |
| VNDetectHumanRectanglesRequest | iOS 13+ | Human torso bounding boxes |
Text & Barcode
| API | Platform | Purpose |
|---|---|---|
| VNRecognizeTextRequest | iOS 13+ | Text recognition (OCR) |
| VNDetectBarcodesRequest | iOS 11+ | Barcode/QR detection |
| DataScannerViewController | iOS 16+ | Live camera scanner (text + barcodes) |
| VNDocumentCameraViewController | iOS 13+ | Document scanning with perspective correction |
| VNDetectDocumentSegmentationRequest | iOS 15+ | Programmatic document edge detection |
| RecognizeDocumentsRequest | iOS 26+ | Structured document extraction |
Observation Types
| Observation | Returned By |
|---|---|
| VNInstanceMaskObservation | Foreground/person instance masks |
| VNPixelBufferObservation | Person segmentation (single mask) |
| VNHumanHandPoseObservation | Hand pose |
| VNHumanBodyPoseObservation | Body pose (2D) |
| VNHumanBodyPose3DObservation | Body pose (3D) |
| VNFaceObservation | Face detection/landmarks |
| VNHumanObservation | Human rectangles |
| VNRecognizedTextObservation | Text recognition |
| VNBarcodeObservation | Barcode detection |
| VNRectangleObservation | Document segmentation |
| DocumentObservation | Structured document (iOS 26+) |
Resources
WWDC: 2019-234, 2021-10041, 2022-10024, 2022-10025, 2025-272, 2023-10176, 2023-111241, 2023-10048, 2020-10653, 2020-10043, 2020-10099
Docs: /vision, /visionkit, /vision/vnrecognizetextrequest, /vision/vndetectbarcodesrequest
Skills: axiom-vision, axiom-vision-diag