A type of machine learning in which an agent learns to make decisions by interacting with an environment and receiving rewards or penalties for its actions, with the aim of maximizing cumulative reward over time.
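
As a rough illustration of this loop, the sketch below runs an epsilon-greedy agent on a toy two-action problem (a "multi-armed bandit", one of the simplest reinforcement learning settings, with no state). Everything here is an assumption made for illustration: the function name `run_bandit`, the reward probabilities, the +1/-1 reward scheme, and the choice of epsilon-greedy itself. It is one simple strategy among many, not a canonical implementation.

```python
import random

def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a toy bandit (hypothetical setup for illustration)."""
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    estimates = [0.0] * n_actions  # running estimate of each action's mean reward
    counts = [0] * n_actions       # how many times each action has been taken
    total_reward = 0.0             # cumulative reward the agent tries to maximize

    for _ in range(steps):
        # Explore a random action with probability epsilon,
        # otherwise exploit the action with the best current estimate.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])

        # Environment feedback: a reward (+1) or a penalty (-1).
        reward = 1.0 if rng.random() < reward_probs[action] else -1.0

        # Incremental mean update for the chosen action's estimate.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward

    return estimates, counts, total_reward

# Action 1 pays off more often than action 0 (assumed probabilities).
estimates, counts, total_reward = run_bandit([0.2, 0.8])
print(counts, total_reward)
```

Over many steps the agent should come to choose the better-paying action far more often, which is the "learning from rewards and penalties" the definition describes. Full reinforcement learning problems add states and delayed rewards on top of this basic loop.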
Last updated 10 months ago