total discounted reward obtained from time $t$ onwards, given an \glsxtrshort{mdp} and policy \gls{policy}