Follow-Ahead Human-Robot Navigation via MCTS and Deep Reinforcement Learning

A hybrid planner that combines MCTS search with DRL value estimation to maintain follow-ahead behavior while handling obstacles and occlusions.

Abstract

We replicate the MCTS-DRL framework of Leisiazar et al. for robotic follow-ahead navigation. The method integrates Monte Carlo Tree Search with a pretrained RL value function, using learned value estimates in place of random rollouts to generate consistent short-term navigational goals. An LSTM-based human action predictor biases tree expansion toward more probable human futures. We re-implement the full framework from scratch in ROS2, including the MCTS planner, RL value function, and LSTM predictor, and deploy the system on a QBot2e using VICON for pose estimation and Cartographer for mapping. Experiments on a range of trajectories probe follow-ahead behavior under controlled conditions.

Key Contributions

  • Re-implements the MCTS-DRL framework of Leisiazar et al. from scratch in ROS2, including the MCTS planner, RL value function, and LSTM human action predictor.
  • Replaces random rollouts in MCTS with a pretrained RL value function, using learned value estimates to generate consistent short-term navigational goals.
  • Incorporates an LSTM-based human action predictor that biases tree expansion toward more probable human futures during planning (see the sketch below).
  • Deploys the full system on a QBot2e platform using VICON for pose estimation and Cartographer for mapping, validating follow-ahead behavior across a range of trajectories.
Figures: MCTS search diagram and planner pseudocode.
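
As a rough illustration of how the predictor can bias expansion, the Python sketch below samples which human-action branch to grow using the LSTM's output distribution. The three-way action set and the names lstm_action_probs and sample_human_branch are assumptions for illustration, not the paper's interface, and the LSTM itself is stubbed out.

```python
import numpy as np

# Hypothetical discrete human action set; the real predictor's label
# space may differ.
HUMAN_ACTIONS = ("straight", "left", "right")

def lstm_action_probs(pose_history):
    """Stand-in for the trained LSTM: maps a recent human pose history
    to a probability distribution over discrete human actions."""
    return np.array([0.8, 0.1, 0.1])  # placeholder output, not learned

def sample_human_branch(pose_history, rng=None):
    """Pick which human-action branch to expand next, weighted by the
    predictor, so search effort concentrates on probable human futures."""
    rng = rng or np.random.default_rng()
    probs = lstm_action_probs(pose_history)
    return HUMAN_ACTIONS[rng.choice(len(HUMAN_ACTIONS), p=probs)]
```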

Method Overview

  1. Represent robot-human state in 2D pose space and evaluate three discrete actions at each 0.5 s planning step.
  2. Train a DDQN in obstacle-free simulation so its Q-values capture follow-ahead quality from relative pose observations.
  3. Expand the MCTS tree over a 3-second receding horizon, pruning candidate nodes that collide or create occlusions (see the sketch after this list).
  4. Select the highest-value leaf as the short-term goal and execute with the ROS navigation stack.
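
A minimal sketch of one expansion step under these assumptions: step forward-simulates a discrete action for 0.5 s, q_value is the pretrained DDQN value estimate, and collides/occludes are the pruning checks. All four names are hypothetical placeholders, not the project's code.

```python
ROBOT_ACTIONS = ("forward", "turn_left", "turn_right")  # three discrete actions
DT, HORIZON = 0.5, 3.0            # planning step and horizon (seconds)
MAX_DEPTH = int(HORIZON / DT)     # 6 steps to the receding horizon

class Node:
    def __init__(self, state, depth=0, parent=None):
        self.state, self.depth, self.parent = state, depth, parent
        self.children, self.visits, self.value_sum = [], 0, 0.0

def expand_and_evaluate(node, step, q_value, collides, occludes):
    """Expand one node with the discrete actions, pruning children that
    collide or break line of sight, and score each survivor with the
    pretrained DDQN value estimate instead of a random rollout."""
    if node.depth >= MAX_DEPTH:
        return
    for action in ROBOT_ACTIONS:
        child_state = step(node.state, action, DT)
        if collides(child_state) or occludes(child_state):
            continue                         # prune invalid branches early
        child = Node(child_state, node.depth + 1, node)
        backup(child, q_value(child_state))  # learned estimate, no rollout
        node.children.append(child)

def backup(node, value):
    """Propagate a leaf estimate toward the root, as in standard MCTS."""
    while node is not None:
        node.visits += 1
        node.value_sum += value
        node = node.parent
```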

Experimental Setup & Results

We evaluate in a ROS2 Humble simulation with fake_vicon, fake_odom, and rviz2. The MCTS planner runs with UCB node selection over a 3-second horizon.
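
For reference, a sketch of the UCB1 selection rule over the Node structure sketched above; the exploration constant c = 1.4 is an assumption, since the exact value used is not reported.

```python
import math

def ucb_score(child, parent_visits, c=1.4):
    """UCB1: exploit the child's mean learned value, with an exploration
    bonus for rarely visited children."""
    if child.visits == 0:
        return float("inf")  # always try unvisited children first
    mean = child.value_sum / child.visits
    return mean + c * math.sqrt(math.log(max(parent_visits, 1)) / child.visits)

def select_child(node):
    """Descend the tree by choosing the child with the highest UCB score."""
    return max(node.children, key=lambda ch: ucb_score(ch, node.visits))
```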

Poster-reported directional prediction accuracy:

Trajectory Type   Accuracy
Straight          96.1%
Left Turn         75.8%
Right Turn        77.4%
Overall           90.1%

  • Paper-reported comparison: MCTS-DRL outperforms standalone MCTS and DRL in circular and S-shaped simulation trajectories.
  • In obstacle-free real-world tests, MCTS-DRL achieves comparable follow-ahead distance/orientation behavior to LBGP.
  • In obstacle-present settings, the robot adapts path choices to avoid both collisions and occlusions.

ROS1 to ROS2 Migration

  • This replication ports a ROS1-oriented methodology into a ROS2 Humble simulation workflow.
  • The core planning logic (MCTS expansion, DRL value lookup, collision/occlusion pruning) is preserved while interfaces are adapted to ROS2 nodes and topics (see the node sketch after this list).
  • Hardware deployment is validated on the QBot2e robot using VICON pose estimation and Cartographer mapping.
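
As a sketch of the adapted interface, the rclpy node below subscribes to a VICON-style human pose and republishes a short-term goal every 0.5 s. The topic names (/vicon/human_pose, /goal_pose) and the trivial goal placement are assumptions; in the actual system the goal comes from the MCTS-DRL planner.

```python
import rclpy
from rclpy.node import Node
from geometry_msgs.msg import PoseStamped

class ShortTermGoalPublisher(Node):
    """Minimal ROS2 Humble node: subscribes to the tracked human pose
    and publishes the planner's short-term goal for the nav stack.
    Topic names are illustrative, not the project's exact interface."""

    def __init__(self):
        super().__init__("follow_ahead_planner")
        self.goal_pub = self.create_publisher(PoseStamped, "/goal_pose", 10)
        self.human_sub = self.create_subscription(
            PoseStamped, "/vicon/human_pose", self.on_human_pose, 10)
        # Replan every 0.5 s, matching the discrete planning step.
        self.timer = self.create_timer(0.5, self.replan)
        self.latest_human = None

    def on_human_pose(self, msg: PoseStamped):
        self.latest_human = msg

    def replan(self):
        if self.latest_human is None:
            return
        goal = PoseStamped()
        goal.header.frame_id = "map"
        goal.header.stamp = self.get_clock().now().to_msg()
        # Hypothetical placement: reuse the human pose; the real goal is
        # the highest-value leaf returned by the MCTS-DRL planner.
        goal.pose = self.latest_human.pose
        self.goal_pub.publish(goal)

def main():
    rclpy.init()
    rclpy.spin(ShortTermGoalPublisher())
    rclpy.shutdown()

if __name__ == "__main__":
    main()
```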
Quanser QBot2e platform used for real-world deployment and validation.

Demo Scenarios

Video: follow-ahead demonstration runs.