A type of machine learning where an agent learns to make decisions by receiving rewards or penalties for actions.