Like other machine learning algorithms, a reinforcement learning model needs to be trained before it can be used. The training phase centers on exploring the environment and receiving feedback, given specific actions performed in specific circumstances or states.
The life cycle of training a reinforcement learning model is based on the Markov Decision Process, which provides a mathematical framework for modeling decisions. Let's use a self-parking autonomous car as an example.
A simulator needs to model the environment, the actions of the agent, and the rewards received after each action. A reinforcement learning algorithm will use the simulator to learn through practice by taking actions in the simulated environment and measuring the outcome.
Initialize the environment: This function involves resetting the environment, including the agent, to the starting state.
Get the current state of the environment: This function provides the current state of the environment, which will change after each action is performed.
Apply an action to the environment: This function has the agent apply an action to the environment. The action changes the environment and may result in a reward.
Calculate the reward of the action: This function is closely tied to applying the action. Once the action has changed the environment, the reward for that action and its effect on the environment is calculated.
Determine whether the goal is achieved: This function determines whether the agent has achieved the goal. In an environment in which the goal cannot be achieved, the simulator needs to signal completion when it deems necessary.
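The simulator functions listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the book's implementation: the class name ParkingSimulator, the one-dimensional road, the reward values, and the step limit are all hypothetical choices made to keep the example small.

```python
from dataclasses import dataclass


@dataclass
class ParkingSimulator:
    """Toy 1-D road (hypothetical): the car drives left or right to a parking spot."""
    road_length: int = 6    # positions 0..5
    goal_position: int = 5  # where the parking spot is
    max_steps: int = 20     # give up after this many actions
    position: int = 0
    steps_taken: int = 0

    def reset(self) -> int:
        """Initialize the environment: put the car back at the start."""
        self.position = 0
        self.steps_taken = 0
        return self.get_state()

    def get_state(self) -> int:
        """Get the current state: here, simply the car's position."""
        return self.position

    def apply_action(self, action: int) -> int:
        """Apply an action (-1 = left, +1 = right) and return its reward."""
        new_position = self.position + action
        hit_wall = not (0 <= new_position < self.road_length)
        if not hit_wall:
            self.position = new_position
        self.steps_taken += 1
        return self.calculate_reward(hit_wall)

    def calculate_reward(self, hit_wall: bool) -> int:
        """Calculate the reward for the action that was just applied."""
        if hit_wall:
            return -10   # assumed penalty for driving off the road
        if self.is_goal_achieved():
            return 100   # assumed reward for reaching the parking spot
        return -1        # small cost per move to encourage short routes

    def is_goal_achieved(self) -> bool:
        """Determine whether the car has reached the parking spot."""
        return self.position == self.goal_position

    def is_done(self) -> bool:
        """Signal completion: goal reached, or the step budget is used up."""
        return self.is_goal_achieved() or self.steps_taken >= self.max_steps


sim = ParkingSimulator()
sim.reset()
reward = sim.apply_action(+1)  # drive one step to the right
```

The separate is_done check is one way to handle the case where the goal cannot be achieved: the simulator ends the episode itself once the assumed step budget runs out.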
After many simulation cycles in which the agent takes actions, the Q-table, which stores the learned value of each action in each state, will favour the sequences of actions that lead to the destination.
Note: If the environment data is not varied during training, the model will overfit and perform poorly in real-world situations.
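To make the training cycle concrete, here is a rough sketch of tabular Q-learning on a toy road. Everything in it is an assumption for illustration: the step and train functions, the six-position road, the reward values, and the hyperparameters (alpha, gamma, epsilon) are hypothetical, and epsilon-greedy exploration is just one common way of choosing actions.

```python
import random

ROAD_LENGTH = 6    # positions 0..5 on a toy 1-D road (assumed)
GOAL = 5           # the parking spot
ACTIONS = [-1, 1]  # drive left, drive right


def step(position, action):
    """Return (next_position, reward, done) for one move on the toy road."""
    nxt = position + action
    if not 0 <= nxt < ROAD_LENGTH:
        return position, -10, False  # drove off the road: penalty, stay put
    if nxt == GOAL:
        return nxt, 100, True        # parked: large reward, episode ends
    return nxt, -1, False            # ordinary move: small cost


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=50, seed=0):
    """Fill a Q-table (one row per state, one column per action) by trial and error."""
    rng = random.Random(seed)
    q_table = [[0.0 for _ in ACTIONS] for _ in range(ROAD_LENGTH)]
    for _ in range(episodes):
        position = 0  # initialize the environment
        for _ in range(max_steps):
            # Explore with probability epsilon, otherwise exploit the Q-table.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q_table[position][i])
            nxt, reward, done = step(position, ACTIONS[a])
            # Q-learning update: nudge the estimate toward
            # reward + discounted best value of the next state.
            best_next = 0.0 if done else max(q_table[nxt])
            q_table[position][a] += alpha * (
                reward + gamma * best_next - q_table[position][a]
            )
            position = nxt
            if done:
                break
    return q_table


q_table = train()
# Read the learned behaviour back out: the best action in each non-goal state.
greedy_policy = [
    ACTIONS[max(range(len(ACTIONS)), key=lambda i: row[i])]
    for row in q_table[:GOAL]
]
print(greedy_policy)
```

Because this toy always trains on the same road with the same parking spot, the resulting Q-table only knows that one layout, which is the overfitting risk described in the note above. Varying the start position or goal between episodes would be one way to expose the agent to more situations.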
Learn more about reinforcement learning in Grokking AI Algorithms from Manning Publications: http://bit.ly/gaia-book. Consider following me, @RishalHurbans, or join my mailing list for infrequent knowledge drops: https://rhurbans.com/subscribe.