mapping that assigns a scalar value (reward) to each state-action pair; \gls{fnReward} induces the sequence of rewards, each of which is a \gls{rv} with expected value