Leon Lang
Modeling Human Beliefs about AI Behavior for Scalable Oversight
We explain how modeling human evaluator beliefs about AI behavior can help to better interpret their feedback.
Leon Lang, Patrick Forré
Factored space models: Towards causality between levels of abstraction
We develop a new foundation for a theory of causality, based on factored space models.
Scott Garrabrant, Matthias Georg Mayer, Magdalena Wache, Leon Lang, Sam Eisenstat, Holger Dell
The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret
We theoretically analyze to what extent an error in a learned reward function translates into regret of the resulting policies.
Lukas Fluri, Leon Lang, Alessandro Abate, Patrick Forré, David Krueger, Joar Skalse
Abstract Markov Random Fields
We use the recently generalized Hu Theorem to develop a theory of purely abstract Markov random fields.
Leon Lang, Clélia de Mulatier, Rick Quax, Patrick Forré