You are using reinforcement learning to build a path-planning system for an indoor autonomous robot. You want it to enter a specific room that the end user specifies, so you define a reward function that gives a huge positive reward when the robot enters that room. After training, you notice some strange behaviour… What do you notice?
a. Nothing; everything works as intended.
b. The robot avoids the room.
c. Once the robot enters the room, it never leaves.
d. Once it reaches the room, the robot enters and exits it endlessly.
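One way to reason about the reward design in this question is to write it down as a function of the robot's previous and current position and add it up along a path. The sketch below is only an illustration under stated assumptions: the 1-D corridor, the cell numbering, the value 100, and the names `in_room`, `entry_reward`, and `total_reward` are all hypothetical and not part of the question. It also assumes the reward is paid on every entry event.

```python
# Hypothetical 1-D corridor: cells 0..3 are hallway, cell 4 is the target room.
ROOM_CELL = 4
ENTRY_REWARD = 100.0  # assumed value standing in for the "huge positive reward"


def in_room(cell: int) -> bool:
    """True when the robot is inside the target room."""
    return cell == ROOM_CELL


def entry_reward(prev_cell: int, cell: int) -> float:
    """Pay out only on a transition from outside the room to inside it."""
    if not in_room(prev_cell) and in_room(cell):
        return ENTRY_REWARD
    return 0.0


def total_reward(path: list[int]) -> float:
    """Sum the entry reward over consecutive cells of a path."""
    return sum(entry_reward(a, b) for a, b in zip(path, path[1:]))


# Two hand-written 8-step paths that both start in the hallway at cell 2.
enter_and_stay = [2, 3, 4, 4, 4, 4, 4, 4, 4]
enter_and_exit = [2, 3, 4, 3, 4, 3, 4, 3, 4]

stay_total = total_reward(enter_and_stay)
loop_total = total_reward(enter_and_exit)
print(stay_total, loop_total)  # prints: 100.0 400.0
```

Under these assumptions, the path that stays in the room collects the entry reward once, while the path that keeps stepping out and back in collects it four times over the same number of steps. Comparing the totals for different paths like this is a quick check on what behaviour a reward function actually pays for.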
Which of the following is false about reinforcement learning?
a. It seeks a model that yields the greatest expected average reward.
b. Reinforcement learning is reward-based learning.
c. Reinforcement learning is a type of supervised learning.
d. Reinforcement learning is a form of online learning.