In which we cover DeepMind's new paper on AlphaCode, a paper showing the Bellman error is not a good surrogate for value error, and more!
Gradient Update #18: DeepMind's AlphaCode is an Average Programmer, Bellman Error is bad for RL