Bio-Inspired Hyper-Redundant Robotic Arm Control with Hierarchical Deep Reinforcement Learning

Abstract
In addition to performing sophisticated locomotion, robotic arms with hyper-redundant degrees of freedom (DOFs) can more effectively circumvent obstacles and more robustly tolerate mechanical failure. Unfortunately, for such a hyper-redundant robotic arm, self-learning objective-driven behavior with conventional reinforcement learning algorithms proves to be quite challenging. This difficulty stems from the extremely large state and action spaces, which render robust learning of value functions ineffective and consequently lead to insufficient policy exploration. This challenge is reminiscent of the so-called "curse of dimensionality": the exponential explosion of states and actions demands exponentially more data and computation. In this work, we draw inspiration from how an octopus achieves extremely dexterous maneuverability while controlling virtually infinite DOFs. In particular, unlike the centralized encephalization found in humans, an octopus's central brain does not "micro-manage" its arms by issuing continuous control signals to each of them. Instead, each arm enjoys a high degree of autonomy and operates largely of its own volition, deviating completely from humans' centralized, brain-directed limb movement. Accordingly, we devise and implement a layered learning algorithm that collaboratively integrates global deep Q-learning with local Q-learning to effectively control a robotic arm with a huge number of DOFs. Specifically, we construct a global deep Q-network that learns a policy for generating local objectives in service of a global objective. Simultaneously, multiple local agents each learn a local policy for their individual local objectives. To illustrate the effectiveness of our layered learning scheme, we implemented a 24-DOF robotic arm that learns its control policy autonomously. We compare the learning performance of this hyper-redundant robotic arm under our new scheme against a conventional learning algorithm without the hierarchical structure. Our results show that, with the same amount of computational effort, our new scheme achieves significantly higher learning success rates and much better reward convergence.
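To make the layered scheme concrete, the listing below gives a minimal Python sketch of the two levels it describes: a global deep Q-network that scores candidate local objectives given the global arm state, and per-segment tabular Q-learning agents that pursue the objectives assigned to them. The class names (GlobalDQN, LocalQAgent), network sizes, and hyperparameters are illustrative assumptions for exposition, not the paper's actual implementation.

    # Sketch of the two-level scheme described in the abstract.
    # Assumptions: discrete local objectives, discrete local actions,
    # and hashable local states (all illustrative, not from the paper).
    import random
    from collections import defaultdict

    import torch
    import torch.nn as nn

    class GlobalDQN(nn.Module):
        """Maps the global arm state to Q-values over candidate local objectives."""
        def __init__(self, state_dim: int, num_objectives: int):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim, 64), nn.ReLU(),
                nn.Linear(64, num_objectives),
            )

        def forward(self, state: torch.Tensor) -> torch.Tensor:
            return self.net(state)

    class LocalQAgent:
        """Tabular Q-learning over (local state, objective, action) for one segment."""
        def __init__(self, num_actions: int, alpha=0.1, gamma=0.9, epsilon=0.1):
            self.q = defaultdict(float)  # keyed by (local_state, objective, action)
            self.num_actions = num_actions
            self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

        def act(self, local_state, objective) -> int:
            # Epsilon-greedy action selection for the assigned local objective.
            if random.random() < self.epsilon:
                return random.randrange(self.num_actions)
            return max(range(self.num_actions),
                       key=lambda a: self.q[(local_state, objective, a)])

        def update(self, s, g, a, reward, s_next):
            # One-step Q-learning update toward the TD target.
            best_next = max(self.q[(s_next, g, a2)]
                            for a2 in range(self.num_actions))
            td_target = reward + self.gamma * best_next
            self.q[(s, g, a)] += self.alpha * (td_target - self.q[(s, g, a)])

In a full training loop, the global network would be trained with standard DQN machinery (replay buffer, target network), rewarded by progress on the global objective, while each local agent would receive an intrinsic reward for achieving its assigned local objective.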