
Reinforcement Learning — TD(λ) Introduction (2)

TD(λ) with eligibility trace

Jeremy Zhang
4 min read · Sep 6, 2019

In the last post, we talked about offline-λ, which is a forward-view update, and we applied it to the random walk example. It is called offline because the update happens only at the end of each episode: the algorithm needs the full episode trace in order to update the value function (the update rule is recalled right after the outline below). In this post, we are going to

  1. Introduce semi-gradient TD(λ), which is a more efficient update method
  2. Apply it to the random walk example and compare it with offline-λ
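As a quick reminder, here is the offline λ-return update in Sutton and Barto's notation, which should match the last post. The λ-return blends every n-step return up to the end of the episode, which is exactly why the full trace has to be available before any update can be made:

```latex
G_t^{\lambda} = (1-\lambda)\sum_{n=1}^{T-t-1} \lambda^{\,n-1}\, G_{t:t+n} \;+\; \lambda^{\,T-t-1}\, G_t

w_{t+1} = w_t + \alpha \,\big[\, G_t^{\lambda} - \hat{v}(S_t, w_t) \,\big]\, \nabla \hat{v}(S_t, w_t), \qquad t = 0, 1, \dots, T-1
```

Both the λ-return and the weight updates depend on returns that reach all the way to the terminal time T, so nothing can be computed until the episode finishes.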

TD(λ)

The biggest limitation of offline-λ is that the update happens only at the end of an episode. Consider cases with a tremendous number of states, or, even worse, continuing tasks, where an episode never ends. According to Sutton's book, there are at least three advantages of TD(λ) over offline-λ (a code sketch follows the list):

  1. it updates the weight vector on every step of an episode rather than only at the end, and thus its estimates may be better sooner.
  2. its computations are equally distributed in time rather than all at the end of the episode.
  3. it can be applied to continuing problems rather than just to episodic problems.
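To make the per-step nature of the update concrete, below is a minimal sketch of semi-gradient TD(λ) with an accumulating eligibility trace on a 19-state random walk. The state layout, the rewards (-1 on the left exit, +1 on the right), and the hyperparameters are assumptions for illustration, not necessarily the exact setup of the last post. The key lines are the trace update z ← γλz + ∇v̂(S, w) and the weight update w ← w + αδz, both of which happen at every step instead of at the end of the episode.

```python
import numpy as np

# Minimal sketch: semi-gradient TD(lambda) with an accumulating eligibility
# trace, assuming a 19-state random walk with one-hot (tabular) features,
# so v_hat(s, w) = w[s] and grad v_hat(s, w) is a one-hot vector.

N_STATES = 19                       # non-terminal states 1..19
START = 10                          # start in the middle
LEFT_TERMINAL, RIGHT_TERMINAL = 0, N_STATES + 1

def td_lambda_episode(weights, lam, alpha, gamma=1.0):
    """Run one episode, updating `weights` in place at every step."""
    trace = np.zeros_like(weights)          # eligibility trace z, reset per episode
    state = START
    while state not in (LEFT_TERMINAL, RIGHT_TERMINAL):
        next_state = state + np.random.choice([-1, 1])
        # reward is -1 on the left exit, +1 on the right exit, 0 otherwise
        if next_state == LEFT_TERMINAL:
            reward, next_value = -1.0, 0.0
        elif next_state == RIGHT_TERMINAL:
            reward, next_value = 1.0, 0.0
        else:
            reward, next_value = 0.0, weights[next_state]

        # accumulate trace: z <- gamma * lambda * z + grad v_hat(S, w)
        trace *= gamma * lam
        trace[state] += 1.0

        # one-step TD error and weight update: w <- w + alpha * delta * z
        delta = reward + gamma * next_value - weights[state]
        weights += alpha * delta * trace

        state = next_state

# usage: estimate state values with lambda = 0.8
weights = np.zeros(N_STATES + 2)    # indices 0 and N_STATES + 1 are terminals
for _ in range(100):
    td_lambda_episode(weights, lam=0.8, alpha=0.1)
print(weights[1:-1])                # estimated values of the 19 non-terminal states
```

Because the weights change during the episode, every step uses the latest estimate, and nothing needs to be stored beyond the trace vector, which is why the same loop works unchanged for continuing tasks.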
