Two Time-scale Off-Policy TD Learning: Non-asymptotic Analysis over Markovian Samples

Tengyu Xu, Shaofeng Zou, and Yingbin Liang. NeurIPS 2019.