Paper Title
Opportunistic Routing In Cognitive Radio Networks Using Reinforcement Learning

Cognitive radio (CR) technology is developing rapidly owing to its capability for adaptive learning and reconfiguration. Cognitive Radio Networks (CRNs) can therefore increase spectrum efficiency by allowing secondary users (SUs) to access the licensed band dynamically and opportunistically without interfering with the primary users (PUs). Daniel H. and Ryan W. Thomas define CRNs, in the context of machine learning, as networks that improve their performance through experience gained over time, without complete information about the environment in which they operate. This dynamism and opportunism can be learned through reinforcement learning (RL), which is concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. This paper proposes a routing scheme that uses Q-learning, the most widely used RL approach in wireless networks. In Q-learning, the learned action value, or Q-value, Q(state, event, action), is updated using the reward and recorded. For each state-event pair, an appropriate action is rewarded and its Q-value is increased; the Q-value thus indicates how appropriate an action selection is for a given state-event pair. At any time instant, the agent chooses the action that maximizes the Q-value. The reward corresponds to a performance metric such as throughput.
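The Q-learning mechanism described above can be sketched in a few lines of tabular code. This is a minimal illustrative sketch, not the paper's actual routing implementation: the function names, the epsilon-greedy exploration strategy, and the toy action labels are assumptions. It keys the Q-table by a (state, event) pair and an action, matching the Q(state, event, action) formulation, and applies the standard update Q ← Q + α(r + γ·max Q′ − Q).

```python
import random

ALPHA = 0.1    # learning rate (assumed value, for illustration)
GAMMA = 0.9    # discount factor (assumed value)
EPSILON = 0.1  # exploration probability for epsilon-greedy selection

def update_q(q, state_event, action, reward, next_state_event, actions):
    """One Q-learning update: Q <- Q + alpha * (r + gamma * max_a' Q' - Q).

    q is a dict mapping ((state, event), action) -> Q-value; unseen
    entries default to 0.0.
    """
    best_next = max(q.get((next_state_event, a), 0.0) for a in actions)
    old = q.get((state_event, action), 0.0)
    q[(state_event, action)] = old + ALPHA * (reward + GAMMA * best_next - old)

def choose_action(q, state_event, actions):
    """Epsilon-greedy: usually pick the action with the highest Q-value."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: q.get((state_event, a), 0.0))
```

In a routing context, the actions would correspond to candidate next-hop SUs, and the reward to an observed performance metric such as throughput on the chosen link.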