Bio-Inspired Hyper-Redundant Robotic Arm Control with Hierarchical Deep Reinforcement Learning

Abstract
In addition to performing sophisticated locomotion, robotic arms with hyper-redundant degrees of freedom (DOFs) can more effectively circumvent obstacles and more robustly tolerate mechanical failure. Unfortunately, for such a hyper-redundant robotic arm, self-learning objective-driven behavior with conventional reinforcement learning algorithms proves to be quite challenging. This difficulty stems from the extremely large state and action spaces, which render robust learning of value functions ineffective and consequently lead to insufficient policy exploration. This challenge is reminiscent of the so-called "curse of dimensionality": the exponential explosion of states and actions demands exponentially more data and computation. In this work, we draw inspiration from how an octopus achieves extremely dexterous maneuverability while controlling virtually infinite DOFs. In particular, unlike the centralized encephalization found in humans, an octopus's central brain does not "micro-manage" its arms by issuing continuous control signals to each of them. Instead, each arm enjoys a high degree of autonomy and operates largely of its own volition, deviating completely from humans' centralized, brain-directed limb movement. Accordingly, we devise and implement a layered learning algorithm that collaboratively integrates global deep Q-learning with local Q-learning to effectively control a robotic arm with a huge number of DOFs. Specifically, we construct a global deep Q-network that learns a policy for generating local objectives in service of a global objective. Simultaneously, multiple local agents each learn a local policy for their individual local objectives. To illustrate the effectiveness of our layered learning scheme, we implemented a 24-DOF robotic arm that learns its control policy autonomously. We compare the learning performance of this hyper-redundant robotic arm under our new scheme against a conventional learning algorithm without the hierarchical structure. Our results show that, with the same amount of computational effort, our new scheme achieves significantly higher learning success rates and much better reward convergence.
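To make the layered scheme concrete, the listing below gives a minimal Python sketch of the two levels it describes: a global deep Q-network that scores candidate local objectives given the global arm state, and per-segment tabular Q-learning agents that pursue the objectives assigned to them. The class names (GlobalDQN, LocalQAgent), network sizes, and hyperparameters are illustrative assumptions for exposition, not the paper's actual implementation.

    # Sketch of the two-level scheme described in the abstract.
    # Assumptions: discrete local objectives, discrete local actions,
    # and hashable local states (all illustrative, not from the paper).
    import random
    from collections import defaultdict

    import torch
    import torch.nn as nn

    class GlobalDQN(nn.Module):
        """Maps the global arm state to Q-values over candidate local objectives."""
        def __init__(self, state_dim: int, num_objectives: int):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim, 64), nn.ReLU(),
                nn.Linear(64, num_objectives),
            )

        def forward(self, state: torch.Tensor) -> torch.Tensor:
            return self.net(state)

    class LocalQAgent:
        """Tabular Q-learning over (local state, objective, action) for one segment."""
        def __init__(self, num_actions: int, alpha=0.1, gamma=0.9, epsilon=0.1):
            self.q = defaultdict(float)  # keyed by (local_state, objective, action)
            self.num_actions = num_actions
            self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

        def act(self, local_state, objective) -> int:
            # Epsilon-greedy action selection for the assigned local objective.
            if random.random() < self.epsilon:
                return random.randrange(self.num_actions)
            return max(range(self.num_actions),
                       key=lambda a: self.q[(local_state, objective, a)])

        def update(self, s, g, a, reward, s_next):
            # One-step Q-learning update toward the TD target.
            best_next = max(self.q[(s_next, g, a2)]
                            for a2 in range(self.num_actions))
            td_target = reward + self.gamma * best_next
            self.q[(s, g, a)] += self.alpha * (td_target - self.q[(s, g, a)])

In a full training loop, the global network would be trained with standard DQN machinery (replay buffer, target network), rewarded by progress on the global objective, while each local agent would receive an intrinsic reward for achieving its assigned local objective.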