git clone https://github.com/ComeOnOliver/skillshub
T=$(mktemp -d) && git clone --depth=1 https://github.com/ComeOnOliver/skillshub "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/dbwls99706/ros2-engineering-skills/SKILL.md" ~/.claude/skills/comeonoliver-skillshub-ros2-engineering-skills && rm -rf "$T"
ROS 2 Engineering Skills
Single responsibility: This skill is an API reference & code template guide for ROS 2 development. It tells you how to use ROS 2 APIs correctly and what mistakes to avoid. It does NOT do CI/CD orchestration, incident response, data analysis, or deployment automation — those are separate skill categories.
A progressive-disclosure skill for ROS 2 development — from first workspace to production fleet deployment. Each section below gives you the essential decision framework; detailed patterns, code templates, and anti-patterns live in the
references/ directory. Read the relevant reference file before writing code.
How to use this skill
Progressive disclosure — do NOT read everything at once. This skill is structured in layers. Only load what you need for the current task:
- This file (SKILL.md) — always loaded. Contains decision routing, core principles, pitfalls, and anti-patterns. Sufficient for answering quick questions and making architectural decisions.
- references/*.md: load on demand. Use the Decision Router below to pick the 1–2 files relevant to the user's current task. Do NOT read all 20 reference files; that wastes context and causes confusion.
- scripts/: run only when the user needs code generation, QoS checking, or launch validation. These are tools, not reading material.
Steps:
- If .skill-runs.log exists in the workspace, read the last few lines to understand what was done and what issues occurred in previous sessions.
- Identify what the user is building (see Decision Router below).
- Read only the matching references/*.md file(s) for detailed guidance.
- Check the AI pitfalls table before generating any code.
- Apply the Core Engineering Principles in every artifact you produce.
- When multiple domains intersect (e.g. Nav2 + ros2_control), read both files but favor safety > determinism > simplicity when recommendations conflict.
Execution log: The Stop hook automatically appends a session summary to
.skill-runs.log in the workspace. This lets you see what was validated last
time and what issues were found — check it to avoid repeating past mistakes.
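Reading "the last few lines" of the log can be sketched as below. This is a minimal illustration, not part of the skill's own scripts: the helper name `tail_lines` and the sample log lines are made up, since the actual format written by the Stop hook is not specified here.

```python
import tempfile
from collections import deque
from pathlib import Path


def tail_lines(path, count=5):
    """Return the last `count` lines of a text file, or [] if it is missing."""
    log = Path(path)
    if not log.is_file():
        return []
    with log.open(encoding="utf-8", errors="replace") as handle:
        # A bounded deque keeps only the newest lines while streaming the file.
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]


# Demo in a throwaway directory; the log content is invented for illustration.
with tempfile.TemporaryDirectory() as workspace:
    log_path = Path(workspace) / ".skill-runs.log"
    missing = tail_lines(log_path)
    log_path.write_text(
        "run 1: ok\nrun 2: qos mismatch\nrun 3: ok\n", encoding="utf-8"
    )
    recent = tail_lines(log_path, count=2)

print(missing, recent)
```

Streaming through a bounded deque avoids loading a long-lived log fully into memory.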
Decision router
| User is doing... | Read |
|---|---|
| Creating a workspace, package, or build config | |
| Writing nodes, executors, callback groups | |
| Topics, services, actions, custom interfaces, QoS | |
| Lifecycle nodes, component loading, composition | |
| Launch files, conditional logic, event handlers | |
| tf2, URDF, xacro, robot_state_publisher | |
| ros2_control, hardware interfaces, controllers | |
| Real-time constraints, PREEMPT_RT, memory, jitter | |
| Nav2, SLAM, costmaps, behavior trees | |
| MoveIt 2, planning scene, grasp pipelines | |
| Camera, LiDAR, PCL, cv_bridge, depth processing | |
| Unit tests, integration tests, launch_testing, CI | |
| ros2 doctor, tracing, profiling, rosbag2 | |
| Docker, cross-compile, fleet deployment, OTA | |
| Gazebo, Isaac Sim, sim-to-real, use_sim_time | |
| SROS2, DDS security, certificates, supply chain | |
| micro-ROS, MCU/RTOS, XRCE-DDS, rclc | |
| Multi-robot fleet, Open-RMF, DDS discovery scale | |
| Message types, units, covariance, frame conventions | |
| ROS 1 migration, ros1_bridge, hybrid operation | |
Cross-cutting concerns: Security, error handling, and QoS are not isolated to single reference files — apply them whenever the data path crosses a trust boundary, a node owns hardware, or communication reliability matters. Use your judgment about which cross-cutting concerns apply to the user's specific situation.
Core engineering principles
These apply to every ROS 2 artifact you produce, regardless of domain.
1. Distro awareness
Always ask which ROS 2 distribution the user targets. Key differences:
| Feature | Foxy (EOL) | Humble (LTS) | Jazzy (LTS) | Kilted (non-LTS) | Rolling |
|---|---|---|---|---|---|
| EOL | Jun 2023 (ended) | May 2027 | May 2029 | Late 2026 | Rolling |
| Ubuntu | 20.04 | 22.04 | 24.04 | 24.04 | Latest |
| Default DDS | Fast DDS | Fast DDS | Fast DDS | Fast DDS | Fast DDS |
| Zenoh support | — | — | — | Tier 1 | Tier 1 |
| Type description support | No | No | Yes | Yes | Yes |
| Service introspection | No | No | Yes | Yes | Yes |
| EventsExecutor | No | No | Experimental | Stable (+ rclpy) | Stable (+ rclpy) |
| Default bag format | sqlite3 | sqlite3 | MCAP | MCAP | MCAP |
| ros2_control interface | N/A (separate) | 2.x | 4.x | 4.x | Latest |
| CMake recommendation | ament_target_deps | ament_target_deps | either | target_link_libs | target_link_libs |
When the user does not specify, default to the latest LTS (Jazzy). Pin the exact distro in Dockerfile, CI, and documentation so builds are reproducible.
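One way to make the "default to the latest LTS" rule explicit in tooling is a small lookup, sketched below. The dictionary keys and the function name `resolve_distro` are illustrative; the values are copied from a few rows of the table above and cover only those rows.

```python
# Values copied from the distro table above; key names are illustrative.
DISTRO_FEATURES = {
    "foxy": {"ubuntu": "20.04", "bag_format": "sqlite3", "service_introspection": False},
    "humble": {"ubuntu": "22.04", "bag_format": "sqlite3", "service_introspection": False},
    "jazzy": {"ubuntu": "24.04", "bag_format": "mcap", "service_introspection": True},
    "kilted": {"ubuntu": "24.04", "bag_format": "mcap", "service_introspection": True},
}
DEFAULT_DISTRO = "jazzy"  # latest LTS, used when the user does not specify


def resolve_distro(requested=None):
    """Normalize a distro name and return (name, feature dict)."""
    name = (requested or DEFAULT_DISTRO).strip().lower()
    if name not in DISTRO_FEATURES:
        raise ValueError(f"unknown or unsupported distro: {requested!r}")
    return name, DISTRO_FEATURES[name]


name, features = resolve_distro()
print(name, features["bag_format"])
```

Failing loudly on an unknown name is deliberate: silently falling back would hide a typo in a Dockerfile or CI matrix.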
2. C++ vs Python decision
Choose the language based on the node's role, not personal preference.
Use rclcpp (C++) when:
- The node sits in a control loop running ≥100 Hz
- Deterministic memory allocation matters (real-time path)
- The node is a hardware driver or controller plugin
- Intra-process zero-copy communication is required
Use rclpy (Python) when:
- The node is orchestration, monitoring, or parameter management
- Rapid prototyping with frequent iteration
- Heavy use of ML frameworks (PyTorch, TensorFlow) that are Python-native
- The node does not sit in a latency-critical path
Mixed stacks are normal. A typical robot has C++ drivers/controllers and Python orchestration/monitoring. Note: component_container (composition) only loads C++ components, which rclcpp_components resolves through class_loader. Python nodes run as separate processes, but can share a launch file and communicate over DDS on the same host with low (not zero) overhead.
Intra-process communication works for any nodes sharing a process — not only composable components. Any nodes instantiated in the same process with
use_intra_process_comms(true) can use zero-copy transfer.
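The criteria above can be written down as a checklist function. This is a hypothetical sketch: `NodeProfile` and `recommend_client_library` are names invented here, and the rules only restate the bullets in this section.

```python
from dataclasses import dataclass


@dataclass
class NodeProfile:
    """Hypothetical description of a node's role."""
    loop_rate_hz: float = 0.0
    realtime_path: bool = False
    driver_or_controller: bool = False
    needs_zero_copy: bool = False


def recommend_client_library(profile):
    """Return (library, reasons) following the C++ criteria listed above."""
    reasons = []
    if profile.loop_rate_hz >= 100.0:
        reasons.append("control loop at or above 100 Hz")
    if profile.realtime_path:
        reasons.append("deterministic allocation on a real-time path")
    if profile.driver_or_controller:
        reasons.append("hardware driver or controller plugin")
    if profile.needs_zero_copy:
        reasons.append("intra-process zero-copy required")
    if reasons:
        return "rclcpp", reasons
    return "rclpy", ["not on a latency-critical path"]


driver_choice = recommend_client_library(NodeProfile(loop_rate_hz=500.0))
monitor_choice = recommend_client_library(NodeProfile(loop_rate_hz=1.0))
print(driver_choice[0], monitor_choice[0])
```

Returning the reasons alongside the choice keeps the decision reviewable instead of a bare preference.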
3. Package structure conventions
Every package should follow this layout. Consistency across a workspace reduces onboarding time and makes CI scripts portable.
    my_package/
    ├── CMakeLists.txt            # or setup.py for pure Python
    ├── package.xml               # format 3, with <depend> tags
    ├── config/
    │   └── params.yaml           # default parameters
    ├── launch/
    │   └── bringup.launch.py     # Python launch file
    ├── include/my_package/       # C++ public headers (if library)
    ├── src/                      # C++ source files
    ├── my_package/               # Python modules (if ament_python or mixed)
    ├── test/                     # gtest, pytest, launch_testing
    ├── urdf/                     # URDF/xacro (if applicable)
    ├── msg/ srv/ action/         # custom interfaces (dedicated _interfaces package preferred)
    └── README.md
Separate interface definitions into a
*_interfaces package so downstream
packages can depend on interfaces without pulling in implementation.
4. Parameter discipline
- Declare every parameter with a type, description, range, and default in the node constructor — never use undeclared parameters.
- Use ParameterDescriptor with FloatingPointRange or IntegerRange for numeric bounds. The node rejects out-of-range values at set time (ROS 2 has no central parameter server).
- Group related parameters under a namespace prefix: controller.kp, controller.ki, controller.kd.
- Load defaults from a config/params.yaml; allow launch-time overrides.
- For dynamic reconfiguration, register a set-parameters callback (add_on_set_parameters_callback) and validate new values atomically before accepting.
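"Validate atomically" means a batch of parameter changes is accepted as a whole or rejected as a whole. The sketch below models that rule in plain Python, without rclpy; `FloatRange`, `RANGES`, the bounds, and `validate_update` are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FloatRange:
    low: float
    high: float


# Hypothetical bounds for the controller.* group mentioned above.
RANGES = {
    "controller.kp": FloatRange(0.0, 100.0),
    "controller.ki": FloatRange(0.0, 10.0),
    "controller.kd": FloatRange(0.0, 10.0),
}


def validate_update(current, proposed):
    """Return (accepted, reason, params); nothing changes unless all pass."""
    for name, value in proposed.items():
        bounds = RANGES.get(name)
        if bounds is None:
            return False, f"undeclared parameter: {name}", current
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False, f"{name} must be numeric", current
        if not bounds.low <= value <= bounds.high:
            return False, f"{name}={value} outside [{bounds.low}, {bounds.high}]", current
    # Every value passed, so the whole batch is applied together.
    return True, "", {**current, **proposed}


start = {"controller.kp": 1.0}
ok, _, updated = validate_update(start, {"controller.kp": 2.0, "controller.ki": 0.5})
bad, reason, unchanged = validate_update(start, {"controller.kp": 2.0, "controller.kd": 999.0})
print(ok, updated, bad, reason)
```

In a real node the same check would sit inside the on-set-parameters callback, which returns a success flag and a reason string.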
5. Error handling philosophy
- Nodes must not silently swallow errors. Log at the appropriate severity, then take a safe action (stop motion, request help, transition to error state).
- Prefer lifecycle node error transitions over ad-hoc boolean flags.
- When calling a service, always handle the "service not available" and "future timed out" cases explicitly.
- For hardware drivers, distinguish transient errors (retry with backoff) from fatal errors (transition to FINALIZED and alert the operator).
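The transient-versus-fatal split can be sketched as a retry helper with exponential backoff. The exception classes and `call_with_backoff` are hypothetical names. Because the helper sleeps between attempts, it belongs on a dedicated thread, not inside an executor callback.

```python
import time


class TransientError(Exception):
    """Recoverable fault, e.g. a bus timeout."""


class FatalError(Exception):
    """Unrecoverable fault; the caller should move to a safe state."""


def call_with_backoff(operation, attempts=4, base_delay=0.05, sleep=time.sleep):
    """Retry `operation` on TransientError, doubling the delay each time."""
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError as error:
            if attempt == attempts:
                # Retries exhausted: escalate so the caller can stop motion.
                raise FatalError("transient error persisted after retries") from error
            sleep(delay)
            delay *= 2


# Demo: a fake read that fails twice; delays are recorded instead of slept.
delays = []
state = {"calls": 0}


def flaky_read():
    state["calls"] += 1
    if state["calls"] < 3:
        raise TransientError("bus timeout")
    return 42


result = call_with_backoff(flaky_read, sleep=delays.append)
print(result, delays)
```

Injecting `sleep` keeps the helper testable without real waiting.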
6. Quality of Service defaults
Start from these profiles and adjust per use case:
| Use case | Reliability | Durability | History | Depth | Deadline | Lifespan |
|---|---|---|---|---|---|---|
| Sensor stream | BEST_EFFORT | VOLATILE | KEEP_LAST | 5 | — | — |
| Command velocity | RELIABLE | VOLATILE | KEEP_LAST | 1 | 100 ms | 200 ms |
| Map (latched) | RELIABLE | TRANSIENT_LOCAL | KEEP_LAST | 1 | — | — |
| Diagnostics | RELIABLE | VOLATILE | KEEP_LAST | 10 | — | — |
| Parameter events | RELIABLE | VOLATILE | KEEP_LAST | 1000 | — | — |
| Action feedback | RELIABLE | VOLATILE | KEEP_LAST | 1 | — | — |
| Safety heartbeat | RELIABLE | VOLATILE | KEEP_LAST | 1 | 500 ms | 1 s |
QoS mismatches are the #1 cause of "I published but nobody receives." Always check compatibility with
ros2 topic info -v when debugging.
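The two mismatches behind most "nobody receives" reports follow the DDS request-versus-offered rule: a subscriber may not request a stronger guarantee than the publisher offers. The sketch below checks only reliability and durability; the dict-based profile format and the function name are assumptions for illustration, not an rclpy API.

```python
# Higher rank means a stronger guarantee.
RELIABILITY_RANK = {"BEST_EFFORT": 0, "RELIABLE": 1}
DURABILITY_RANK = {"VOLATILE": 0, "TRANSIENT_LOCAL": 1}


def qos_problems(publisher, subscriber):
    """Return a list of incompatibilities; an empty list means compatible."""
    problems = []
    if RELIABILITY_RANK[subscriber["reliability"]] > RELIABILITY_RANK[publisher["reliability"]]:
        problems.append("subscriber requests RELIABLE, publisher offers BEST_EFFORT")
    if DURABILITY_RANK[subscriber["durability"]] > DURABILITY_RANK[publisher["durability"]]:
        problems.append("subscriber requests TRANSIENT_LOCAL, publisher offers VOLATILE")
    return problems


sensor_pub = {"reliability": "BEST_EFFORT", "durability": "VOLATILE"}
strict_sub = {"reliability": "RELIABLE", "durability": "VOLATILE"}
relaxed_sub = {"reliability": "BEST_EFFORT", "durability": "VOLATILE"}

mismatch = qos_problems(sensor_pub, strict_sub)
match = qos_problems(sensor_pub, relaxed_sub)
print(mismatch, match)
```

The reverse direction is fine: a RELIABLE publisher can serve a BEST_EFFORT subscriber.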
DEADLINE and LIFESPAN are critical for safety-critical systems. DEADLINE fires an event when no message arrives within the specified period (detect stale data). LIFESPAN discards messages older than the specified duration before delivery (prevent acting on stale data). See
references/communication.md section 9 for full API and examples.
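The difference between the two policies is easier to see in a toy model: DEADLINE is a check on the gap since the last arrival, LIFESPAN is a filter on sample age. This is a simplified plain-Python model with hypothetical function names, not the middleware's implementation. The 100 ms and 200 ms values come from the command velocity row above.

```python
def deadline_missed(last_arrival_s, now_s, deadline_s):
    """True when no message arrived within the deadline period."""
    return (now_s - last_arrival_s) > deadline_s


def drop_expired(samples, now_s, lifespan_s):
    """Keep (publish_time_s, payload) samples that are not older than lifespan."""
    return [(stamp, payload) for stamp, payload in samples if (now_s - stamp) <= lifespan_s]


# Command velocity profile from the table: deadline 100 ms, lifespan 200 ms.
queue = [(0.0, "cmd_a"), (0.5, "cmd_b")]
fresh = drop_expired(queue, now_s=0.625, lifespan_s=0.2)
stale_link = deadline_missed(last_arrival_s=0.5, now_s=0.625, deadline_s=0.1)
healthy_link = deadline_missed(last_arrival_s=0.5, now_s=0.5625, deadline_s=0.1)
print(fresh, stale_link, healthy_link)
```

Here `cmd_a` is 625 ms old and is discarded, while the 125 ms gap since `cmd_b` is enough to trip the 100 ms deadline.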
7. Naming conventions
| Entity | Convention | Example |
|---|---|---|
| Package | | |
| Node | | |
| Topic | with ns | |
| Service | | |
| Action | | |
| Parameter | with dot ns | |
| Frame | | , |
| Interface | | |
8. Thread safety and callbacks
- A MutuallyExclusiveCallbackGroup serializes its callbacks: state shared only among callbacks in that group needs no locks, but throughput is limited.
- A ReentrantCallbackGroup allows parallel execution: you must protect shared state with std::mutex (C++) or threading.Lock (Python).
- Calling a service from a callback: The service client must be in a separate MutuallyExclusiveCallbackGroup from the calling callback. Otherwise the executor deadlocks, because the callback waits for the response while the executor cannot deliver it. Always use async_send_request with a response callback; never use spin_until_future_complete inside an executor callback.
- Never do blocking work (file I/O, long computation, sleep) inside a timer or subscription callback on the default executor. Offload to a dedicated thread or use a MultiThreadedExecutor with a reentrant group.
- In rclcpp, prefer std::shared_ptr<const MessageT> in subscription callbacks to avoid unnecessary copies and enable zero-copy intra-process.
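The locking rule for reentrant groups can be illustrated with plain threads standing in for parallel executor callbacks. The `Odometer` class and its method names are hypothetical. The pattern to note is that every read and write of the shared counter happens under the same threading.Lock.

```python
import threading


class Odometer:
    """Shared state touched by callbacks that may run in parallel."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ticks = 0

    def on_encoder(self, delta):
        # The read-modify-write must be atomic with respect to other callbacks.
        with self._lock:
            self._ticks += delta

    @property
    def ticks(self):
        with self._lock:
            return self._ticks


odometer = Odometer()


def hammer():
    # Stands in for a callback firing many times on one executor thread.
    for _ in range(10_000):
        odometer.on_encoder(1)


workers = [threading.Thread(target=hammer) for _ in range(4)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

total = odometer.ticks
print(total)
```

Keeping the lock private to the class means callers cannot forget to take it.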
9. Lifecycle-first design
Default to lifecycle (managed) nodes for anything that owns resources: hardware drivers, sensor pipelines, planners, controllers.
                 ┌──────────────┐
    create() ──► │ Unconfigured │
                 └──────┬───────┘
           on_configure │
                 ┌──────▼───────┐
                 │   Inactive   │
                 └──────┬───────┘
            on_activate │
                 ┌──────▼───────┐
                 │    Active    │
                 └──────┬───────┘
          on_deactivate │
                 ┌──────▼───────┐
                 │   Inactive   │
                 └──────┬───────┘
             on_cleanup │
                 ┌──────▼───────┐
                 │ Unconfigured │
                 └──────┬───────┘
            on_shutdown │
                 ┌──────▼───────┐
                 │  Finalized   │
                 └──────────────┘
This gives the system manager (launch file, orchestrator, or operator) explicit control over when resources are allocated, when the node starts processing, and how it shuts down. It also makes error recovery predictable.
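The primary-state transitions can be captured in a small table, which is handy for checking an orchestration sequence before sending it to a real node. This is a plain-Python model with hypothetical names. It also includes shutdown from Inactive and Active, which the managed-node design allows even though the diagram above shows only the linear path.

```python
# (current state, transition) -> next state, primary states only.
TRANSITIONS = {
    ("unconfigured", "configure"): "inactive",
    ("inactive", "activate"): "active",
    ("active", "deactivate"): "inactive",
    ("inactive", "cleanup"): "unconfigured",
    ("unconfigured", "shutdown"): "finalized",
    ("inactive", "shutdown"): "finalized",
    ("active", "shutdown"): "finalized",
}


def step(state, transition):
    """Return the next state or raise if the transition is not allowed."""
    try:
        return TRANSITIONS[(state, transition)]
    except KeyError:
        raise ValueError(f"cannot '{transition}' from '{state}'") from None


def run(sequence, state="unconfigured"):
    """Apply transitions in order and return the final state."""
    for transition in sequence:
        state = step(state, transition)
    return state


bringup = run(["configure", "activate"])
full_cycle = run(["configure", "activate", "deactivate", "cleanup", "shutdown"])
print(bringup, full_cycle)
```

An invalid request such as activating an unconfigured node raises immediately, which mirrors how a lifecycle node refuses the transition.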
10. Build and CI hygiene
- Use colcon build --cmake-args -DCMAKE_BUILD_TYPE=RelWithDebInfo for development; Release for deployment.
- Enable -Wall -Wextra -Wpedantic and treat warnings as errors in CI.
- Run colcon test with --event-handlers console_cohesion+ so test output groups by package.
- Pin rosdep keys in rosdep.yaml for reproducible dependency resolution.
- Cache /opt/ros/, .ccache/, build/, and install/ in CI to cut build times by 60–80%.
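A CI script can build the two colcon invocations from one place so the development and deployment build types never drift apart. The function names below are hypothetical; the flags are the ones listed in this section.

```python
import shlex


def colcon_build_args(deploy=False):
    """Argument list for colcon build with the build type from this section."""
    build_type = "Release" if deploy else "RelWithDebInfo"
    return ["colcon", "build", "--cmake-args", f"-DCMAKE_BUILD_TYPE={build_type}"]


def colcon_test_args():
    """Argument list for colcon test with per-package output grouping."""
    return ["colcon", "test", "--event-handlers", "console_cohesion+"]


dev_build = colcon_build_args()
release_build = colcon_build_args(deploy=True)
test_run = colcon_test_args()

# Lists can be passed to subprocess.run directly; join only for display.
print(shlex.join(dev_build))
print(shlex.join(test_run))
```

Passing argument lists rather than shell strings avoids quoting problems when extra CMake arguments are appended later.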
Common anti-patterns
| Anti-pattern | Why it hurts | Fix |
|---|---|---|
| Global variables for node state | Breaks composition, untestable | Store state as class members |
| in for multi-node processes | Starves other nodes | Use or component composition |
| Hardcoded topic names | Breaks reuse across robots | Use relative names + namespace remapping |
| history with no bound | Memory grows unbounded on slow subscribers | Use with explicit depth |
| Using / | Blocks the executor thread | Use or a dedicated thread |
| Monolithic launch file for everything | Unmanageable past 10 nodes | Compose launch files with |
| Skipping dependencies | Builds locally, breaks CI and Docker | Declare every dependency explicitly |
| Publishing in constructor | Subscribers may not be ready, messages lost | Publish in or after a short timer |
| Ignoring QoS compatibility | Silent communication failure | Match publisher/subscriber QoS or check with |
| Creating timers/subs in callbacks | Resource leak, unpredictable behavior | Create all entities in constructor or |
| Synchronous service call in callback | Deadlocks the executor thread | Use with a callback or dedicated thread |
| Service client in same callback group as caller | Deadlocks even with async in | Put service client in a separate |
| No safe command on shutdown | Motors hold last velocity after node exits | Send zero-velocity in AND destructor (see ) |
| Dynamic subscriptions with | New subs are never picked up after | Use or for dynamic entities |
| CPU frequency governor left on / | 10-100 ms latency spikes in RT path | Set governor, disable turbo boost (see ) |
AI pitfalls — traps this skill has learned from
These are mistakes AI agents repeatedly make when generating ROS 2 code. Add a new line here every time a failure is discovered in practice.
| # | Pitfall | What goes wrong | Correct approach |
|---|---|---|---|
| 1 | Using inside a callback | Deadlocks the executor — the callback blocks waiting for a response that can never be delivered | Use with a response callback; put the service client in a separate |
| 2 | Generating Foxy-era API for Jazzy/Kilted | is deprecated, signature changed in ros2_control 4.x | Always check the distro feature matrix above before generating code |
| 3 | Omitting QoS in publisher/subscriber creation | Defaults silently mismatch — publisher sends but subscriber receives nothing | Always specify QoS explicitly; use the QoS defaults table in Principle 6 |
| 4 | Creating a directory inside a non-interfaces package | Builds locally but fails in CI — interface packages need | Put messages in a dedicated package |
| 5 | Hardcoding paths in launch files | Breaks on any other distro or install prefix | Use , , or environment substitutions |
| 6 | Forgetting tags in | works in overlay but and Docker builds fail | Declare every / as in package.xml |
| 7 | Using for rate control in rclpy | Blocks the executor thread; timers and subscriptions stop firing | Use or with a |
| 8 | Not sending zero-velocity on deactivate/shutdown | Robot holds last commanded velocity when the node crashes | Send zero-command in both and the destructor |
| 9 | Mixing and | Kilted deprecated — mixing causes link errors | Use with modern CMake targets for Kilted+; for Humble/Jazzy |
| 10 | Generating / code instead of / | ROS 1 patterns in a ROS 2 context — nothing compiles | This skill is ROS 2 only — always use / APIs |
| 11 | Ignoring parameter in simulation | Real clock diverges from Gazebo clock — tf lookups fail, controllers drift | Set in launch and pass to |
| 12 | Publishing before subscribers connect (no TRANSIENT_LOCAL) | First N messages lost — map, URDF, or initial config never received | Use durability for latched-style data, or publish in with a startup delay |
Maintenance rule: When you encounter a new AI failure pattern while using this skill, append it to this table with the next sequential number. The pitfall list is the single most valuable section for preventing repeated mistakes.
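Some pitfalls can be caught mechanically before code reaches review. As one example, pitfall 5 (hardcoded paths in launch files) can be flagged with a text scan. The sketch below assumes the hardcoded paths of interest sit under /opt/ros/<distro>. The regex, the function name, and the sample launch snippet are all made up for illustration.

```python
import re

# Matches an absolute path under /opt/ros/<distro>, e.g. /opt/ros/humble/share/...
HARDCODED_ROS_PATH = re.compile(r"/opt/ros/[a-z]+(?:/[\w./-]*)?")


def find_hardcoded_paths(launch_source):
    """Return (line_number, path) for each hardcoded ROS install path."""
    hits = []
    for number, line in enumerate(launch_source.splitlines(), start=1):
        for match in HARDCODED_ROS_PATH.finditer(line):
            hits.append((number, match.group(0)))
    return hits


# Invented launch-file fragment: line 1 hardcodes a path, line 2 does not.
sample = "\n".join([
    'urdf = "/opt/ros/humble/share/my_robot/urdf/robot.urdf"',
    'config = os.path.join(pkg_share, "config", "params.yaml")',
])

findings = find_hardcoded_paths(sample)
print(findings)
```

A scan like this only catches the literal form; paths assembled at runtime still need a human or a test to spot them.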
Distro-specific migration notes
When upgrading between distributions, check these breaking changes first:
Foxy → Humble:
- Complete API overhaul. Foxy packages require significant rework.
- ros2_control was not bundled in Foxy: it must be built separately.
- Lifecycle node API stabilized in Humble.
- Action server/client API changed significantly.
Humble → Jazzy:
- ros2_control API changed from 2.x to 4.x: export_state_interfaces() and export_command_interfaces() are now auto-generated by the framework. Manual overrides use on_export_state_interfaces(). See references/hardware-interface.md.
- Handle get_value() deprecated → use get_optional<T>() on LoanedStateInterface / LoanedCommandInterface (controller side). Hardware interfaces use set_state() / get_state() / set_command() / get_command() helpers with fully qualified names.
- All joints in <ros2_control> tag must exist in the URDF.
- Controller parameter loading changed: use --param-file with spawner.
- Default bag format changed from sqlite3 to MCAP. Use storage_id='mcap'.
- Default middleware changed internal config paths. Regenerate DDS profiles.
- nav2_params.yaml schema changes: recoveries_server renamed to behavior_server.
- ROS_AUTOMATIC_DISCOVERY_RANGE replaces ROS_LOCALHOST_ONLY (values: LOCALHOST, SUBNET, OFF, SYSTEM_DEFAULT).
- launch_ros actions have new parameter handling: test launch files explicitly.
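For the discovery-range change, an old environment can be translated mechanically. The helper below is a hypothetical migration aid, not ROS behavior. It assumes ROS_LOCALHOST_ONLY=1 corresponds to the LOCALHOST range and leaves the new variable unset otherwise, so the middleware default applies.

```python
VALID_RANGES = {"LOCALHOST", "SUBNET", "OFF", "SYSTEM_DEFAULT"}


def migrate_discovery_env(env):
    """Return a copy of `env` that uses ROS_AUTOMATIC_DISCOVERY_RANGE only."""
    migrated = {key: value for key, value in env.items() if key != "ROS_LOCALHOST_ONLY"}
    explicit = migrated.get("ROS_AUTOMATIC_DISCOVERY_RANGE")
    if explicit is not None:
        if explicit not in VALID_RANGES:
            raise ValueError(f"invalid discovery range: {explicit!r}")
        return migrated  # an explicit new-style setting is kept as-is
    if env.get("ROS_LOCALHOST_ONLY") == "1":
        # Assumption: the old localhost-only switch maps to the LOCALHOST range.
        migrated["ROS_AUTOMATIC_DISCOVERY_RANGE"] = "LOCALHOST"
    return migrated


legacy = {"ROS_LOCALHOST_ONLY": "1", "ROS_DOMAIN_ID": "7"}
converted = migrate_discovery_env(legacy)
untouched = migrate_discovery_env({"ROS_LOCALHOST_ONLY": "0"})
print(converted, untouched)
```

The input dict is never modified, so the helper can be run against os.environ safely to preview the result.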
Jazzy → Kilted (non-LTS):
- Zenoh promoted to Tier 1 middleware: rmw_zenoh is production-ready. Install: sudo apt install ros-kilted-rmw-zenoh-cpp, set RMW_IMPLEMENTATION=rmw_zenoh_cpp. Supports router/peer/client modes.
- EventsExecutor graduated from experimental: available in rclcpp::executors (no experimental namespace). Also ported to rclpy.
- ament_target_dependencies() deprecated: use target_link_libraries() with modern CMake targets (e.g. rclcpp::rclcpp, std_msgs::std_msgs__rosidl_typesupport_cpp).
- Multi-bag replay support in ros2 bag play.
- Gazebo Ionic is the paired simulator (Harmonic was Jazzy; Ionic is the Kilted pairing).
ROS 1 → ROS 2:
- See references/migration-ros1.md for a step-by-step strategy.
Quick reference — ros2 CLI
    # Workspace
    colcon build --symlink-install --packages-select my_pkg
    colcon test --packages-select my_pkg
    colcon graph --dot                      # dependency graph (DOT format)
    source install/setup.bash

    # Introspection
    ros2 node list
    ros2 topic list -t
    ros2 topic info /topic_name -v          # shows QoS details
    ros2 topic hz /topic_name
    ros2 topic bw /topic_name
    ros2 service list -t
    ros2 action list -t
    ros2 param list /node_name
    ros2 param describe /node_name param
    ros2 interface show std_msgs/msg/String

    # ros2_control
    ros2 control list_controllers
    ros2 control list_hardware_interfaces
    ros2 control list_hardware_components

    # Debugging
    ros2 doctor --report                    # alias: ros2 wtf
    ros2 run tf2_tools view_frames
    ros2 bag record -a -o my_bag
    ros2 bag info my_bag
    ros2 bag play my_bag --clock

    # Lifecycle
    ros2 lifecycle list /node_name
    ros2 lifecycle set /node_name configure
    ros2 lifecycle set /node_name activate