Gsd-skill-creator augmented-reality-tracking
Tracking and registration fundamentals for augmented reality. Covers the six classes of tracking (marker, markerless, inertial, SLAM, outside-in, inside-out), registration error sources (calibration, latency, drift, distortion), the Azuma definition of AR, display configurations (optical see-through, video see-through, projection, handheld), and the interaction between tracking precision and user perception of "being there." Use when designing AR experiences, diagnosing registration issues, or selecting tracking technology.
git clone https://github.com/Tibsfox/gsd-skill-creator
T=$(mktemp -d) && git clone --depth=1 https://github.com/Tibsfox/gsd-skill-creator "$T" && mkdir -p ~/.claude/skills && cp -r "$T/examples/skills/spatial-computing/augmented-reality-tracking" ~/.claude/skills/tibsfox-gsd-skill-creator-augmented-reality-tracking && rm -rf "$T"
examples/skills/spatial-computing/augmented-reality-tracking/SKILL.md
Augmented Reality Tracking
Augmented reality succeeds or fails on registration — the precise alignment of virtual content with the physical world. A perfectly rendered virtual object floating six inches from where it should be is worse than no augmentation at all. This skill catalogs the tracking techniques, error sources, and design heuristics that determine whether AR content feels anchored to reality or simply pasted over it.
Agent affinity: azuma (registration, tracking, AR definition), sutherland (first HMD), furness (head tracking and displays)
Concept IDs: spatial-coordinate-navigation, spatial-reasoning-3d, spatial-signal-propagation
The Azuma Definition of AR
Azuma (1997) defined AR as any system that:
- Combines real and virtual
- Is interactive in real time
- Is registered in 3D
This is the minimum bar. Systems that meet fewer than all three are not AR in the technical sense — a video feed with overlaid text is not AR (not 3D-registered); a prerendered overlay is not AR (not real-time); a VR headset is not AR (no real content). The definition scopes what this skill addresses.
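The three criteria form a conjunction: a system counts as AR only if it satisfies all of them. Here is a minimal Python sketch of that rule. It assumes a hypothetical `SystemTraits` record, which is not part of any library, and encodes the three counter-examples from the paragraph above:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemTraits:
    """Hypothetical record of a system's properties, used only to illustrate the criteria."""
    combines_real_and_virtual: bool
    interactive_in_real_time: bool
    registered_in_3d: bool


def missing_criteria(traits: SystemTraits) -> list[str]:
    """Return the Azuma criteria this system fails, in the order they are listed."""
    checks = [
        ("combines real and virtual", traits.combines_real_and_virtual),
        ("interactive in real time", traits.interactive_in_real_time),
        ("registered in 3D", traits.registered_in_3d),
    ]
    return [name for name, met in checks if not met]


def is_ar(traits: SystemTraits) -> bool:
    # All three properties are required; meeting two of three is not enough.
    return not missing_criteria(traits)


# The three counter-examples from the text, encoded as assumptions.
video_with_text = SystemTraits(True, True, False)
prerendered_overlay = SystemTraits(True, False, True)
vr_headset = SystemTraits(False, True, True)

for name, traits in [("video feed with text", video_with_text),
                     ("prerendered overlay", prerendered_overlay),
                     ("VR headset", vr_headset)]:
    print(name, "->", missing_criteria(traits))
```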
Tracking Classes
| Class | Mechanism | Precision | Range | Cost | Failure modes |
|---|---|---|---|---|---|
| Marker-based | Recognized fiducial (ARTag, ArUco, QR code) | High | Line-of-sight to marker | Low | Occlusion, lighting |
| Markerless feature | Track natural features (corners, edges) | Medium-high | Feature-rich scenes | Medium | Textureless surfaces |
| Inertial (IMU) | Gyroscope + accelerometer | Medium (drifts) | Unlimited | Low | Drift over seconds |
| SLAM | Simultaneous localization and mapping | Medium-high | Unlimited | High | Dynamic scenes, low texture |
| Outside-in | External cameras watch trackers | Very high | Room-scale | High | Occlusion, range |
| Inside-out | Cameras on HMD watch environment | High | Unlimited | Medium | Feature loss in empty rooms |
Most modern AR combines several of these classes. Visual-inertial SLAM, for example, pairs an IMU (high update rate, but drifts within seconds) with visual features (lower update rate, but tied to the mapped environment, so drift stays bounded) to get the best of both.
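The sketch below makes the fusion idea concrete in a deliberately simplified form. It is a one-dimensional complementary filter on yaw. Production visual-inertial systems typically use extended-Kalman-filter or optimization back ends instead. The function name, gyro bias, and sample rates are assumptions chosen for illustration. The filter integrates the gyro at a high rate, and each sparse visual fix pulls the estimate back toward an absolute reference.

```python
def fuse_yaw(gyro_rates, dt, visual_fixes, alpha=0.98, yaw0=0.0):
    """1D complementary filter for yaw, in degrees (hypothetical helper).

    gyro_rates:   angular-velocity samples (deg/s) at the IMU rate
    dt:           IMU sample period (s)
    visual_fixes: {sample_index: absolute yaw (deg)} from a slower visual tracker
    alpha:        weight kept on the gyro-integrated estimate at each visual fix
    """
    yaw = yaw0
    history = []
    for i, rate in enumerate(gyro_rates):
        yaw += rate * dt  # high-rate dead reckoning; any gyro bias accumulates here
        if i in visual_fixes:
            # Low-rate absolute correction pulls the estimate back toward the visual pose.
            yaw = alpha * yaw + (1.0 - alpha) * visual_fixes[i]
        history.append(yaw)
    return history


# Assumed scenario: device held still (true yaw 0), gyro with a 0.5 deg/s bias,
# 1 kHz IMU, a visual fix roughly every 33 ms, 10 s of data.
dt = 0.001
rates = [0.5] * 10_000
fixes = {i: 0.0 for i in range(0, len(rates), 33)}

gyro_only = fuse_yaw(rates, dt, visual_fixes={})
fused = fuse_yaw(rates, dt, visual_fixes=fixes)
print(f"gyro only after 10 s: {gyro_only[-1]:.2f} deg")  # error keeps growing
print(f"fused, worst case:    {max(fused):.2f} deg")     # error stays bounded
```

With no visual fixes the bias integrates into a steadily growing error. With fixes, the error settles at a small bounded offset. Raising `alpha` trusts the gyro more, which is smoother but slower to correct. Lowering it trusts the camera more, which corrects faster but passes through more visual jitter.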
Sources of Registration Error
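Getting the virtual object to appear in the right place in the real world is hard because errors compound across the pipeline. Azuma (1997) separates static errors, which appear even when the user and the scene are still, from dynamic errors, which appear only during motion and are dominated by system latency. The main sources are: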
- Optical distortion — lenses distort the view; the distortion must be measured and compensated.
- Tracker error — the tracker's estimate of pose has some uncertainty.
- Mechanical misalignment — cameras, displays, and trackers are not perfectly aligned on the headset.
- Tracker-to-eye error — the tracker measures the headset position, but the user's eye is somewhere else; the offset must be calibrated per user.
- Latency — the time between the tracker reading and the rendered frame. At 100ms latency and 1 m/s head motion, content is 10 cm off.
- Incorrect calibration — any parameter that is wrong (lens curvature, IPD, camera extrinsics) propagates through all the above.
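The latency figure above is plain arithmetic: error equals latency times velocity. The same reasoning extends to head rotation, which often produces the larger error. The sketch below uses hypothetical helper names:

```python
import math


def linear_latency_error(latency_s: float, speed_m_per_s: float) -> float:
    """Offset (m) accumulated while a frame is in flight, for pure translation."""
    return latency_s * speed_m_per_s


def angular_latency_error(latency_s: float, head_rate_deg_per_s: float,
                          content_distance_m: float) -> tuple[float, float]:
    """Angular lag (deg) from head rotation, and the sideways offset (m) it causes
    for content at the given distance."""
    angle_deg = latency_s * head_rate_deg_per_s
    offset_m = content_distance_m * math.tan(math.radians(angle_deg))
    return angle_deg, offset_m


# The example from the text: 100 ms latency, 1 m/s translation.
print(linear_latency_error(0.100, 1.0))  # metres
# Assumed: a moderate 50 deg/s head turn, content placed 2 m away.
angle, offset = angular_latency_error(0.100, 50.0, 2.0)
print(f"{angle:.1f} deg lag, {offset * 100:.1f} cm at 2 m")
```

Assume a 50 deg/s head turn. The same 100 ms then produces a 5 degree lag. For content 2 m away, that is roughly 17.5 cm of misplacement, which exceeds the translational error in the text.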
Latency is the dominant problem
For handheld AR (phones and tablets), end-to-end latency of around 50ms is often tolerable. The device moves more slowly than a head does. Handheld AR is also video see-through, so the camera image is delayed along with the virtual content and the two stay aligned on screen. For HMD AR, latency under 20ms is usually required for content to feel anchored. Higher frame rates (90-120 Hz) and predictive tracking help but do not eliminate the problem.
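Predictive tracking renders each frame for the pose the head is expected to have when the frame reaches the display. The simplest form extrapolates at constant velocity. The sketch below uses hypothetical helper names and assumed motion values. It shows why prediction helps but does not eliminate error once the head accelerates:

```python
def predict_yaw(yaw_deg: float, rate_deg_per_s: float, latency_s: float) -> float:
    """Constant-angular-velocity prediction: extrapolate the last measured pose
    forward by the expected motion-to-photon latency."""
    return yaw_deg + rate_deg_per_s * latency_s


def compare(latency_s: float = 0.05, w0: float = 50.0, accel: float = 200.0,
            t0: float = 0.5) -> tuple[float, float]:
    """Registration error (deg) at display time for a head that is speeding up.

    Assumed motion model: yaw(t) = w0*t + 0.5*accel*t^2 (deg), sampled at t0.
    Returns (error without prediction, error with prediction).
    """
    def yaw(t: float) -> float:
        return w0 * t + 0.5 * accel * t * t

    def rate(t: float) -> float:
        return w0 + accel * t

    truth = yaw(t0 + latency_s)  # where the head is when the frame appears
    stale = yaw(t0)              # render at the last measured pose
    predicted = predict_yaw(yaw(t0), rate(t0), latency_s)
    return abs(truth - stale), abs(truth - predicted)


naive_err, predicted_err = compare()
print(f"no prediction: {naive_err:.2f} deg, with prediction: {predicted_err:.2f} deg")
```

For this assumed motion, the stale pose is off by 7.75 degrees and the prediction by 0.25 degrees. The residual equals 0.5 * accel * latency^2. It therefore shrinks quadratically as latency drops, but it never reaches zero while the head's velocity is changing.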
Display Configurations
Different AR hardware configurations trade off transparency, field of view, brightness, and social presence.
Optical see-through (OST)
Real world passes through a partially reflective combiner. Virtual content is projected into the optical path. Advantages: real world is unmediated, no camera latency. Disadvantages: virtual content cannot occlude real content, low contrast in bright conditions, narrow field of view in current hardware.
Video see-through (VST)
Real world is captured by cameras and composited with virtual content on a display. Advantages: virtual content can fully occlude real content, color correction possible. Disadvantages: all real content suffers camera latency, limited dynamic range, resolution cap.
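The occlusion difference between OST and VST follows from how each one forms the final image. The sketch below is an idealized per-pixel model with a single channel, luminance normalized to [0, 1], and hypothetical function names. Real combiners also tint and dim the world and are not perfectly additive.

```python
def composite_vst(real: float, virtual: float, alpha: float) -> float:
    """Video see-through: the world arrives as camera pixels, so virtual content
    can replace it outright (alpha = 1 fully occludes)."""
    return alpha * virtual + (1.0 - alpha) * real


def composite_ost(real: float, virtual: float, transmittance: float = 1.0) -> float:
    """Idealized optical see-through: the combiner adds display light to world light.
    No term removes world light, so dark virtual content cannot occlude."""
    return min(1.0, transmittance * real + virtual)


bright_wall = 0.8   # normalized luminance of the real background (assumed)
black_object = 0.0  # a virtual object that should look dark and solid

print("VST:", composite_vst(bright_wall, black_object, alpha=1.0))  # wall hidden
print("OST:", composite_ost(bright_wall, black_object))             # wall shows through

# Contrast of a fixed 0.2 overlay against dim vs. bright surroundings (OST).
for background in (0.05, 0.8):
    shown = composite_ost(background, 0.2)
    print(f"background {background}: contrast ratio {shown / background:.2f}")
```

The same additive model explains OST's low contrast in bright conditions. A 0.2 overlay on a 0.05 background gives a 5:1 ratio, but on a 0.8 background only 1.25:1.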
Projection
Virtual content is projected onto real surfaces. Advantages: users do not need to wear headsets, many users can see the same augmentation. Disadvantages: surfaces must be prepared, projectors need line of sight, content is geometry-dependent.
Handheld (phone/tablet)
Most common AR today. The device is both sensor and display. Advantages: ubiquitous hardware, low friction, easy sharing. Disadvantages: the user must hold the device, the field of view is small, and only one hand is free for interaction.
Registration Design Heuristics
Given that registration is never perfect, how should designers mitigate the consequences?
Tolerate drift in anchored content
If the virtual object is allowed to drift slightly when the user is not looking at it, errors can be smoothed. Looking away and back "re-anchors" content at the current pose estimate.
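One way to implement this heuristic is to apply tracker corrections freely while the content is out of view, and only in small steps while it is visible. The sketch below is minimal. `in_view`, `update_anchor`, and the 2 mm per-frame step are assumptions, not an established API:

```python
import math


def in_view(head_yaw_deg: float, bearing_deg: float, half_fov_deg: float) -> bool:
    """True if an object at the given bearing lies inside the horizontal field of view."""
    diff = (bearing_deg - head_yaw_deg + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
    return abs(diff) <= half_fov_deg


def update_anchor(displayed, corrected, visible: bool, max_step_m: float = 0.002):
    """Move the displayed anchor toward the tracker's latest corrected estimate.

    Out of view: snap straight to the correction (the user cannot see the jump).
    In view: creep toward it by at most max_step_m per frame, so the fix reads
    as slight drift rather than a visible pop. max_step_m is a hypothetical tuning value.
    """
    if not visible:
        return tuple(corrected)
    gap = math.dist(displayed, corrected)
    if gap <= max_step_m:
        return tuple(corrected)
    scale = max_step_m / gap
    return tuple(d + (c - d) * scale for d, c in zip(displayed, corrected))


shown = (0.0, 0.0, 0.0)
estimate = (0.10, 0.0, 0.0)  # tracker now believes the anchor is 10 cm to the right
print(update_anchor(shown, estimate, visible=in_view(0.0, 20.0, 45.0)))   # small step
print(update_anchor(shown, estimate, visible=in_view(0.0, 120.0, 45.0)))  # snap
```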
Use natural fiducials
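Tables, floors, and the corners where walls meet are usually stable and geometrically distinct. Blank wall surfaces can be featureless, but their edges and intersections give trackers something to hold. Content anchored to these features tends to register better than content floating in empty space.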
Shadow grounding
A virtual object casting a shadow on a real surface is more convincingly placed than one floating without a shadow. Even approximate shadows help.
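For content resting on a detected floor, one simple way to add a shadow is the classic planar projected shadow. It pushes each vertex along the light direction until the vertex meets the floor plane. The sketch below assumes y-up coordinates, a directional light, and a floor plane supplied by the tracker. All of these are hypothetical inputs here.

```python
def project_shadow(point, light_dir, floor_y: float = 0.0):
    """Planar projected shadow: slide a 3D point along a directional light until it
    hits the horizontal plane y = floor_y.

    point:     (x, y, z) in metres, y up
    light_dir: direction the light travels; must point downward (y component < 0)
    """
    px, py, pz = point
    lx, ly, lz = light_dir
    if ly >= 0:
        raise ValueError("light must travel downward to reach the floor")
    t = (floor_y - py) / ly  # ray parameter where the ray meets the plane
    return (px + t * lx, floor_y, pz + t * lz)


# A virtual object's corner 1 m above an assumed detected floor at y = 0.
corner = (0.0, 1.0, 0.0)
print(project_shadow(corner, (0.0, -1.0, 0.0)))  # overhead light: directly below
print(project_shadow(corner, (1.0, -1.0, 0.0)))  # 45-degree light: offset 1 m in x
```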
Contact-rich placement
Content that touches the real world at several points (a virtual chair whose legs all rest on the floor) gives the placement multiple constraints. Each contact is also a visible check: if any leg floats or sinks, the user notices.
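Contact points serve two purposes. Used jointly, they settle the object against the surface. Checked individually, they flag the leg a user would notice. The sketch below uses hypothetical helpers and an assumed 1 cm visibility tolerance, with a per-leg height standing in for a full pose:

```python
from statistics import median


def contact_gaps(leg_tips_y, floor_y: float):
    """Signed gap per leg: positive = floating above the floor, negative = sunk into it."""
    return [y - floor_y for y in leg_tips_y]


def settle(leg_tips_y, floor_y: float):
    """Use all contacts as joint constraints: shift the object vertically by the median
    gap, so a single bad leg does not drag the whole object."""
    shift = median(contact_gaps(leg_tips_y, floor_y))
    return [y - shift for y in leg_tips_y]


def legs_visibly_off(leg_tips_y, floor_y: float, tolerance_m: float = 0.01):
    """Indices of legs whose gap exceeds a hypothetical visibility tolerance."""
    return [i for i, gap in enumerate(contact_gaps(leg_tips_y, floor_y))
            if abs(gap) > tolerance_m]


# Assumed chair: the whole model floats 2 cm high, and the fourth leg ends 3 cm short.
legs = [0.02, 0.02, 0.02, 0.05]
settled = settle(legs, floor_y=0.0)
print(legs_visibly_off(legs, 0.0))     # every leg floats before settling
print(legs_visibly_off(settled, 0.0))  # only the short leg still shows a gap
```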
Hide occlusion mismatches
In OST systems that cannot occlude, design content that does not require occlusion for interpretation. Holographic-style ghosting is a common style choice.
AR vs Mixed Reality vs Extended Reality
Industry terminology has proliferated:
- AR — Azuma's definition; real world is primary, virtual is overlay
- MR — usually means AR with mutual interaction (virtual objects react to real geometry)
- VR — real world is replaced, virtual is primary
- XR — umbrella term for all of the above
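These terms are marketed aggressively and used loosely. Milgram and Kishino (1994) offer a more precise framing: a reality-virtuality continuum with fully real and fully virtual environments at its ends, where mixed reality covers everything in between, AR included. This skill uses Azuma's strict definition of AR and notes where extensions (occlusion, physics interaction, shared multi-user) apply.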
When to Use This Skill
- Designing AR applications with specific registration requirements
- Selecting tracking hardware for a use case
- Diagnosing registration drift, jitter, or misalignment
- Explaining AR limitations to stakeholders or users
When NOT to Use This Skill
- Pure VR design (no registration to real world)
- Non-interactive overlay graphics (not AR by Azuma's definition)
- Hardware selection for pure 3D scanning (use a photogrammetry skill)
Common Mistakes
| Mistake | Why it fails | Fix |
|---|---|---|
| Ignoring latency budget | Content lags head motion, user notices | Target 20ms or below for HMD |
| Testing only in ideal lighting | Visual tracking and SLAM lose features in low light, glare, or changing light | Test in diverse lighting |
| Assuming markers work everywhere | Markers fail when occluded, too distant, or viewed at steep angles | Add markerless fallback |
| Not calibrating per user | IPD and eye offset vary | Provide a calibration step |
| Placing content mid-air without anchors | Nothing to register against, drift obvious | Anchor to natural fiducials |
| Ignoring occlusion mismatch in OST | Ghostly content, breaks illusion | Design for the display type |
Cross-References
- azuma agent: Registration, tracking, and AR foundational survey
- sutherland agent: First head-mounted display, 1968
- furness agent: Military HMDs, Super Cockpit, high-stakes AR
- 3d-interaction-design skill: Interaction techniques that AR applications use
- immersive-environment-design skill: Environment design for AR-compatible spaces
References
- Azuma, R. (1997). "A survey of augmented reality." Presence: Teleoperators and Virtual Environments, 6(4), 355-385.
- Azuma, R., Baillot, Y., Behringer, R., Feiner, S., Julier, S., & MacIntyre, B. (2001). "Recent advances in augmented reality." IEEE Computer Graphics and Applications, 21(6), 34-47.
- Milgram, P., & Kishino, F. (1994). "A taxonomy of mixed reality visual displays." IEICE Transactions on Information and Systems, E77-D(12), 1321-1329.
- Sutherland, I. E. (1968). "A head-mounted three-dimensional display." Proceedings of the AFIPS '68 Fall Joint Computer Conference.
- Welch, G., & Foxlin, E. (2002). "Motion tracking: no silver bullet, but a respectable arsenal." IEEE Computer Graphics and Applications, 22(6), 24-38.