Follow-Ahead Human-Robot Navigation via MCTS and Deep Reinforcement Learning

A hybrid planner that combines MCTS search with DRL value estimation to maintain follow-ahead behavior while handling obstacles and occlusions.

Abstract

We replicate the MCTS-DRL framework of Leisiazar et al. for robotic follow-ahead navigation. The method integrates Monte Carlo Tree Search with a pretrained RL value function, using learned value estimates in place of random rollouts to generate consistent short-term navigational goals. An LSTM-based human action predictor biases tree expansion toward more probable human futures. We re-implement the full framework from scratch in ROS2, including the MCTS planner, RL value function, and LSTM predictor, and deploy the system on a QBot2e using VICON for pose estimation and Cartographer for mapping. Experiments on a range of trajectories probe follow-ahead behavior under controlled conditions.

Key Contributions

  • Re-implements the MCTS-DRL framework of Leisiazar et al. from scratch in ROS2, including the MCTS planner, RL value function, and LSTM human action predictor.
  • Replaces random rollouts in MCTS with a pretrained RL value function, using learned value estimates to generate consistent short-term navigational goals.
  • Incorporates an LSTM-based human action predictor that biases tree expansion toward more probable human futures during planning (see the sketch below).
  • Deploys the full system on a QBot2e platform using VICON for pose estimation and Cartographer for mapping, validating follow-ahead behavior across a range of trajectories.
Figures: MCTS search diagram and planner pseudocode.
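
As a rough illustration of how the predictor can bias expansion, the Python sketch below samples which human-action branch to grow using the LSTM's output distribution. The three-way action set and the names lstm_action_probs and sample_human_branch are assumptions for illustration, not the paper's interface, and the LSTM itself is stubbed out.

```python
import numpy as np

# Hypothetical discrete human action set; the real predictor's label
# space may differ.
HUMAN_ACTIONS = ("straight", "left", "right")

def lstm_action_probs(pose_history):
    """Stand-in for the trained LSTM: maps a recent human pose history
    to a probability distribution over discrete human actions."""
    return np.array([0.8, 0.1, 0.1])  # placeholder output, not learned

def sample_human_branch(pose_history, rng=None):
    """Pick which human-action branch to expand next, weighted by the
    predictor, so search effort concentrates on probable human futures."""
    rng = rng or np.random.default_rng()
    probs = lstm_action_probs(pose_history)
    return HUMAN_ACTIONS[rng.choice(len(HUMAN_ACTIONS), p=probs)]
```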

Method Overview

  1. Represent robot-human state in 2D pose space and evaluate three discrete actions at each 0.5 s planning step.
  2. Train a DDQN in obstacle-free simulation so its Q-values capture follow-ahead quality from relative pose observations.
  3. Expand the MCTS tree over a 3-second receding horizon, pruning candidate nodes that collide or create occlusions (see the sketch after this list).
  4. Select the highest-value leaf as the short-term goal and execute with the ROS navigation stack.
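
A minimal sketch of one expansion step under these assumptions: step forward-simulates a discrete action for 0.5 s, q_value is the pretrained DDQN value estimate, and collides/occludes are the pruning checks. All four names are hypothetical placeholders, not the project's code.

```python
ROBOT_ACTIONS = ("forward", "turn_left", "turn_right")  # three discrete actions
DT, HORIZON = 0.5, 3.0            # planning step and horizon (seconds)
MAX_DEPTH = int(HORIZON / DT)     # 6 steps to the receding horizon

class Node:
    def __init__(self, state, depth=0, parent=None):
        self.state, self.depth, self.parent = state, depth, parent
        self.children, self.visits, self.value_sum = [], 0, 0.0

def expand_and_evaluate(node, step, q_value, collides, occludes):
    """Expand one node with the discrete actions, pruning children that
    collide or break line of sight, and score each survivor with the
    pretrained DDQN value estimate instead of a random rollout."""
    if node.depth >= MAX_DEPTH:
        return
    for action in ROBOT_ACTIONS:
        child_state = step(node.state, action, DT)
        if collides(child_state) or occludes(child_state):
            continue                         # prune invalid branches early
        child = Node(child_state, node.depth + 1, node)
        backup(child, q_value(child_state))  # learned estimate, no rollout
        node.children.append(child)

def backup(node, value):
    """Propagate a leaf estimate toward the root, as in standard MCTS."""
    while node is not None:
        node.visits += 1
        node.value_sum += value
        node = node.parent
```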

Experimental Setup & Results

We evaluate in a ROS2 Humble simulation with fake_vicon, fake_odom, and rviz2. The MCTS planner runs with UCB node selection over a 3-second horizon.
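
For reference, a sketch of the UCB1 selection rule over the Node structure sketched above; the exploration constant c = 1.4 is an assumption, since the exact value used is not reported.

```python
import math

def ucb_score(child, parent_visits, c=1.4):
    """UCB1: exploit the child's mean learned value, with an exploration
    bonus for rarely visited children."""
    if child.visits == 0:
        return float("inf")  # always try unvisited children first
    mean = child.value_sum / child.visits
    return mean + c * math.sqrt(math.log(max(parent_visits, 1)) / child.visits)

def select_child(node):
    """Descend the tree by choosing the child with the highest UCB score."""
    return max(node.children, key=lambda ch: ucb_score(ch, node.visits))
```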

Poster-reported directional prediction accuracy:

Trajectory Type   Accuracy
Straight          96.1%
Left Turn         75.8%
Right Turn        77.4%
Overall           90.1%

  • Paper-reported comparison: MCTS-DRL outperforms standalone MCTS and DRL in circular and S-shaped simulation trajectories.
  • In obstacle-free real-world tests, MCTS-DRL achieves comparable follow-ahead distance/orientation behavior to LBGP.
  • In obstacle-present settings, the robot adapts path choices to avoid both collisions and occlusions.

ROS1 to ROS2 Migration

  • This replication ports a ROS1-oriented methodology into a ROS2 Humble simulation workflow.
  • The core planning logic (MCTS expansion, DRL value lookup, collision/occlusion pruning) is preserved while interfaces are adapted to ROS2 nodes and topics (see the node sketch after this list).
  • Hardware deployment is validated on the QBot2e robot using VICON pose estimation and Cartographer mapping.
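
As a sketch of the adapted interface, the rclpy node below subscribes to a VICON-style human pose and republishes a short-term goal every 0.5 s. The topic names (/vicon/human_pose, /goal_pose) and the trivial goal placement are assumptions; in the actual system the goal comes from the MCTS-DRL planner.

```python
import rclpy
from rclpy.node import Node
from geometry_msgs.msg import PoseStamped

class ShortTermGoalPublisher(Node):
    """Minimal ROS2 Humble node: subscribes to the tracked human pose
    and publishes the planner's short-term goal for the nav stack.
    Topic names are illustrative, not the project's exact interface."""

    def __init__(self):
        super().__init__("follow_ahead_planner")
        self.goal_pub = self.create_publisher(PoseStamped, "/goal_pose", 10)
        self.human_sub = self.create_subscription(
            PoseStamped, "/vicon/human_pose", self.on_human_pose, 10)
        # Replan every 0.5 s, matching the discrete planning step.
        self.timer = self.create_timer(0.5, self.replan)
        self.latest_human = None

    def on_human_pose(self, msg: PoseStamped):
        self.latest_human = msg

    def replan(self):
        if self.latest_human is None:
            return
        goal = PoseStamped()
        goal.header.frame_id = "map"
        goal.header.stamp = self.get_clock().now().to_msg()
        # Hypothetical placement: reuse the human pose; the real goal is
        # the highest-value leaf returned by the MCTS-DRL planner.
        goal.pose = self.latest_human.pose
        self.goal_pub.publish(goal)

def main():
    rclpy.init()
    rclpy.spin(ShortTermGoalPublisher())
    rclpy.shutdown()

if __name__ == "__main__":
    main()
```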
Quanser QBot2e platform used for real-world deployment and validation.

Demo Scenarios

Video: follow-ahead demonstration runs.