A very simple property from calculus appears frequently in machine learning, statistics, reinforcement learning, and other fields. It is the statement that for any positive function \(f(x)\), the following equality holds:
\[\nabla_x f(x) = f(x) \nabla_x \log f(x)\]
This identity is often called the log-derivative trick, and it follows directly from the chain rule. For any positive function \(f(x)\):
\[\nabla_x \log f(x) = \frac{1}{f(x)} \nabla_x f(x)\]
Multiplying both sides by \(f(x)\), we obtain the desired equality:
\[f(x) \nabla_x \log f(x) = \nabla_x f(x)\]
In reinforcement learning, policy-based methods rely on the policy-gradient theorem, whose derivation uses the log-derivative trick to rewrite the gradient of an expectation as an expectation of a gradient.
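The identity is easy to verify numerically. Below is a minimal sketch that checks \(\nabla_x f(x) = f(x) \nabla_x \log f(x)\) by finite differences for an arbitrary positive example function \(f(x) = x^2 + 1\) (the function choice and step size are illustrative assumptions, not from the text above):

```python
import math

def f(x):
    # Example positive function: f(x) = x^2 + 1 > 0 for all x,
    # so log f(x) is defined everywhere.
    return x * x + 1.0

def numeric_grad(g, x, h=1e-6):
    # Central finite-difference approximation of g'(x).
    return (g(x + h) - g(x - h)) / (2.0 * h)

x = 1.7
lhs = numeric_grad(f, x)                                # ∇_x f(x)
rhs = f(x) * numeric_grad(lambda t: math.log(f(t)), x)  # f(x) ∇_x log f(x)
print(lhs, rhs)  # the two sides agree up to finite-difference error
```

Any other strictly positive differentiable function would work equally well here; positivity is what keeps \(\log f(x)\) defined.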
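To illustrate how the trick is used in that setting, here is a hedged sketch of a score-function (REINFORCE-style) Monte-Carlo gradient estimate. The specific distribution, objective, and sample count are assumptions chosen so the answer is known in closed form: for \(x \sim \mathcal{N}(\theta, 1)\) and \(f(x) = x^2\), we have \(\mathbb{E}[f(x)] = \theta^2 + 1\), so the true gradient with respect to \(\theta\) is \(2\theta\), and the score is \(\nabla_\theta \log p_\theta(x) = x - \theta\):

```python
import random

random.seed(0)

# Estimate d/dθ E_{x ~ N(θ, 1)}[x^2] via the log-derivative trick:
#   ∇_θ E[f(x)] = E[f(x) ∇_θ log p_θ(x)],  with score (x - θ).
# Ground truth for comparison: 2θ (since E[x^2] = θ^2 + 1).
theta = 0.5
n = 200_000
est = 0.0
for _ in range(n):
    x = random.gauss(theta, 1.0)
    est += (x * x) * (x - theta)  # f(x) * score
est /= n
print(est)  # close to 2 * theta = 1.0, up to Monte-Carlo noise
```

The key point the sketch demonstrates is that the gradient of an expectation is estimated using only samples from the distribution and the gradient of its log-density, never the gradient of \(f\) itself, which is exactly what policy-gradient methods exploit.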