UBC Rover - Autonomous Arm Control

This is a single project focused on autonomous arm control for keypad/keyboard manipulation. The sections below are different workstreams inside the same effort. All work documented here is still actively in progress; policies are under training and results will be updated as they mature.

Overview

I built an autonomy stack for a rover-mounted robotic arm with one clear target: make sim-trained policies that can survive contact with real hardware.

What I Built

A Windows + WSL2 Docker workflow for reproducible sim + training.
SB3 PPO/SAC baselines for early policy experiments.
An imitation-learning bootstrap from teleop demonstrations.
RKLB, a cleaner framework extraction built around the Omega decider.
RLKit trials plus rl_sar runtime integration work.
Alpha + Omega: the integrated navigation + press policy flow.

Windows Docker

The Docker workflow exists to kill “works on my machine” problems. I wired the stack to run cleanly under Windows + WSL2 with a Docker-first workflow so new contributors can get to a running simulator without weeks of setup pain. The goal is a reliable, documented path from a fresh machine to a working training run.

SB3 PPO and SAC

This is the baseline RL track. I used SB3 PPO and SAC to build early policies, validate reward shaping, and get clean training/eval loops before moving to the more custom RLKit stack. It is intentionally practical: quick iteration, straightforward checkpoints, and easy comparison between runs.
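As an illustration of the kind of reward shaping a baseline run would validate, here is a minimal sketch in plain Python. The function name, the three terms (distance, press bonus, action cost), and all weights are assumptions made for this example; the project's actual reward is not documented here.

```python
import math


def shaped_reward(ee_xyz, key_xyz, pressed, action,
                  dist_weight=1.0, press_bonus=10.0, action_cost=0.01):
    """Hypothetical shaped reward for a reach-and-press task.

    ee_xyz  -- end-effector position (x, y, z) in metres
    key_xyz -- target key position (x, y, z) in metres
    pressed -- True once the simulator reports the key as pressed
    action  -- commanded deltas; penalised to discourage jerky motion
    """
    dist = math.dist(ee_xyz, key_xyz)        # dense term: closer is better
    effort = sum(a * a for a in action)      # squared action magnitude
    reward = -dist_weight * dist - action_cost * effort
    if pressed:
        reward += press_bonus                # sparse term: task completed
    return reward


# Sanity check: reward should increase from far, to near, to pressed.
key = (0.10, 0.05, 0.02)
idle = (0.0, 0.0, 0.0)
far = shaped_reward((0.30, 0.00, 0.20), key, False, idle)
near = shaped_reward((0.11, 0.05, 0.03), key, False, idle)
done = shaped_reward(key, key, True, idle)
print(far < near < done)
```

A monotonicity check like the last line is cheap to run before each training job and catches sign errors in the shaping terms early.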

Imitation Learning Trial

Imitation learning is the bootstrap. I collected teleop demonstrations and trained a behavior-cloned policy to give RL a reasonable starting point for contact-rich manipulation. The current dataset includes a real teleop capture (teleop_20260302_014016.parquet) with 2,850 rows.

**Imitation dataset overview.** Dashboard of dataset stats (2,850 teleop rows, 101 s span) with an end-effector XY coverage scatter plot from the bootstrap pass.
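Behavior cloning is supervised regression from observations to the operator's actions. The sketch below shows that idea in its smallest form: fitting a single proportional gain by least squares. The data is synthetic and generated inline; it does not read the teleop parquet file, and the observation/action definitions (`errors`, `commands`) and the "true" gain are assumptions made for illustration only.

```python
import random


def fit_gain(observations, actions):
    """Least-squares scalar gain k that minimises sum((a - k * o) ** 2)."""
    num = sum(o * a for o, a in zip(observations, actions))
    den = sum(o * o for o in observations)
    return num / den


# Synthetic "demonstrations": the operator commands roughly half of the
# remaining position error each step, plus a little hand jitter.
rng = random.Random(0)
TRUE_GAIN = 0.5  # assumed value, used only to generate the fake data
errors = [rng.uniform(-0.2, 0.2) for _ in range(500)]
commands = [TRUE_GAIN * e + rng.gauss(0.0, 0.005) for e in errors]

gain = fit_gain(errors, commands)


def cloned_policy(error):
    """Policy recovered from the demonstrations."""
    return gain * error


print(round(gain, 2))
```

A real bootstrap would swap the scalar gain for a neural network and the synthetic lists for logged observation/action columns, but the training signal is the same: match the demonstrated action for each observed state.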

Starting RKLB

RKLB is the framework extraction. It pulls the Omega decider state machine and backend abstraction into a cleaner, reusable package so the decision logic is separate from the low-level control plumbing. RKLB is early-stage but already useful as a stable base for integrating Alpha/Omega-style policies.
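The separation described above can be sketched as a state machine that only talks to a narrow backend interface. Everything named here (`ArmBackend`, `Phase`, `Decider`, `ScriptedBackend`, the phase names, and the tolerance) is hypothetical and chosen for illustration; it is not the RKLB API.

```python
from enum import Enum, auto
from typing import Protocol


class ArmBackend(Protocol):
    """Hypothetical interface; a simulator or hardware driver would implement it."""

    def distance_to_target(self) -> float: ...
    def is_stable(self) -> bool: ...
    def key_pressed(self) -> bool: ...


class Phase(Enum):
    NAVIGATE = auto()
    STABILIZE = auto()
    PRESS = auto()
    DONE = auto()


class Decider:
    """Decision logic only: reads the backend, never touches motors directly."""

    def __init__(self, backend: ArmBackend, reach_tol: float = 0.01):
        self.backend = backend
        self.reach_tol = reach_tol
        self.phase = Phase.NAVIGATE

    def step(self) -> Phase:
        b = self.backend
        if self.phase is Phase.NAVIGATE and b.distance_to_target() <= self.reach_tol:
            self.phase = Phase.STABILIZE
        elif self.phase is Phase.STABILIZE and b.is_stable():
            self.phase = Phase.PRESS
        elif self.phase is Phase.PRESS and b.key_pressed():
            self.phase = Phase.DONE
        return self.phase


class ScriptedBackend:
    """Fake backend replaying canned (distance, stable, pressed) readings."""

    def __init__(self, readings):
        self._readings = iter(readings)
        self._current = (1.0, False, False)

    def advance(self):
        self._current = next(self._readings)

    def distance_to_target(self):
        return self._current[0]

    def is_stable(self):
        return self._current[1]

    def key_pressed(self):
        return self._current[2]


readings = [(0.20, False, False), (0.005, False, False),
            (0.005, True, False), (0.005, True, True)]
backend = ScriptedBackend(readings)
decider = Decider(backend)
trace = []
for _ in readings:
    backend.advance()
    trace.append(decider.step().name)
print(trace)
```

Because the decider depends only on the three backend methods, the same transition logic can be unit-tested against scripted readings like these and then pointed at a simulator or hardware backend without changes.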

RLKit Trial and rl_sar Trial

This is the exploratory branch: custom reach/keypad environments and RLKit training in rover2026_rlkit, plus rl_sar used as a runtime/deployment layer rather than the primary trainer. RLKit is where the stronger simulator integration lives today; rl_sar is about making those policies deployable in a rover stack.
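To make "custom reach/keypad environment" concrete, here is a toy 2D reach task in plain Python. It mirrors the Gymnasium return convention (`reset` returns `(obs, info)`, `step` returns `(obs, reward, terminated, truncated, info)`) without importing Gymnasium. The class name, key layout, tolerances, and reward are all made up for this sketch and are not taken from rover2026_rlkit.

```python
import math
import random


class ToyKeypadReachEnv:
    """Hypothetical planar reach task: move the end effector over a chosen key."""

    # Made-up key centres in metres.
    KEYS = {"1": (0.00, 0.04), "2": (0.02, 0.04), "3": (0.04, 0.04)}

    def __init__(self, max_steps=50, tol=0.003, max_delta=0.01, seed=0):
        self.max_steps = max_steps
        self.tol = tol              # success radius around the key centre
        self.max_delta = max_delta  # per-step motion limit on each axis
        self.rng = random.Random(seed)

    def _obs(self):
        # Observation is the XY error from end effector to target key.
        return (self.target[0] - self.ee[0], self.target[1] - self.ee[1])

    def reset(self, key=None):
        self.key = key if key is not None else self.rng.choice(sorted(self.KEYS))
        self.target = self.KEYS[self.key]
        # Randomised starting pose.
        self.ee = (self.rng.uniform(-0.05, 0.05), self.rng.uniform(-0.05, 0.05))
        self.steps = 0
        return self._obs(), {"key": self.key}

    def step(self, action):
        # Clip the commanded XY delta to the per-step limit.
        dx, dy = (max(-self.max_delta, min(self.max_delta, a)) for a in action)
        self.ee = (self.ee[0] + dx, self.ee[1] + dy)
        self.steps += 1
        obs = self._obs()
        dist = math.hypot(*obs)
        terminated = dist <= self.tol
        truncated = (not terminated) and self.steps >= self.max_steps
        return obs, -dist, terminated, truncated, {"key": self.key}


# Roll out a scripted proportional controller as a smoke test.
env = ToyKeypadReachEnv()
obs, info = env.reset(key="2")
terminated = truncated = False
while not (terminated or truncated):
    action = (0.5 * obs[0], 0.5 * obs[1])
    obs, reward, terminated, truncated, info = env.step(action)
print(terminated, env.steps)
```

A scripted rollout like this is a useful first check on any new environment: if a hand-written controller cannot terminate the episode, a learned policy is unlikely to either.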

Alpha and Omega

Alpha + Omega is the integrated system that makes the behavior feel intelligent. Alpha handles navigation to the correct key. Omega executes the press with stabilized end-effector control. The split keeps the system understandable and makes failures diagnosable.

**Alpha + Omega contact sheet.** Six frames from a training evaluation showing the arm in simulation: navigate, stabilize, and press.
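The navigate-then-press split can be sketched on a toy point mass. This is a sketch under stated assumptions: `alpha_step` and `omega_step` are hand-written stand-ins for the learned policies, `z = 0` is treated as the key surface, and every gain and tolerance is invented for the example. The point it illustrates is diagnosability: the episode result names the stage that was active when it ended.

```python
import math


def alpha_step(ee, key_xy, gain=0.5):
    """Stand-in for Alpha: close the XY error, leave height unchanged."""
    x, y, z = ee
    return (x + gain * (key_xy[0] - x), y + gain * (key_xy[1] - y), z)


def omega_step(ee, key_xy, descend=0.01, hold_gain=0.8):
    """Stand-in for Omega: hold XY over the key while lowering Z."""
    x, y, z = ee
    return (x + hold_gain * (key_xy[0] - x),
            y + hold_gain * (key_xy[1] - y),
            max(0.0, z - descend))


def run_episode(start, key_xy, xy_tol=0.005, max_steps=100):
    """Run Alpha until within xy_tol of the key, then hand off to Omega."""
    ee = start
    stage = "alpha"
    for step in range(max_steps):
        xy_err = math.hypot(key_xy[0] - ee[0], key_xy[1] - ee[1])
        if stage == "alpha":
            if xy_err <= xy_tol:
                stage = "omega"          # handoff: navigation is done
            else:
                ee = alpha_step(ee, key_xy)
        if stage == "omega":
            ee = omega_step(ee, key_xy)
            if ee[2] <= 0.0:             # toy contact: reached the key surface
                return {"success": True, "stage": stage, "steps": step + 1}
    # Timed out: report which stage was still running.
    return {"success": False, "stage": stage, "steps": max_steps}


start = (0.30, -0.10, 0.05)
key_xy = (0.10, 0.05)
ok = run_episode(start, key_xy)
nav_timeout = run_episode(start, key_xy, max_steps=3)
press_timeout = run_episode(start, key_xy, max_steps=8)
print(ok["success"], nav_timeout["stage"], press_timeout["stage"])
```

With the two stages separated, a timeout during navigation and a timeout during the press show up as different failure labels, which is what makes it possible to tell whether Alpha or Omega needs more work.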

Current Status

Working in simulation with growing policy reliability.
Hardware validation is the next checkpoint once the simulator policies are consistent enough.
Main gaps are robustness across varied starting poses and contact sensitivity during the press.
This project targeted the University Rover Challenge (URC) task, but we didn’t make it into the competition. I may update this page later with a working version.

Stack

Python
ROS2
MuJoCo
Stable-Baselines3 (PPO, SAC)
RLKit (trial branch)
rl_sar (runtime/deployment layer)
PyTorch
Docker + WSL2
