Publications

Modeling Human Beliefs about AI Behavior for Scalable Oversight

We explain how modeling human evaluators' beliefs about AI behavior can help interpret their feedback more accurately.

Leon Lang, Patrick Forré

Factored space models: Towards causality between levels of abstraction

We develop a new foundation for a theory of causality, based on factored space models.

Scott Garrabrant, Matthias Georg Mayer, Magdalena Wache, Leon Lang, Sam Eisenstat, Holger Dell

The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret

We theoretically analyze to what extent an error in a learned reward function translates into regret of the resulting policies.

Lukas Fluri, Leon Lang, Alessandro Abate, Patrick Forré, David Krueger, Joar Skalse

When Your AIs Deceive You: Challenges of Partial Observability in Reinforcement Learning from Human Feedback

We theoretically and empirically study safety issues of using RLHF with human evaluators who have limited information.

Leon Lang, Davis Foote, Stuart Russell, Anca Dragan, Erik Jenner, Scott Emmons

Abstract Markov Random Fields

We use the recently generalized Hu Theorem to develop a theory of purely abstract Markov random fields.

Leon Lang, Clélia de Mulatier, Rick Quax, Patrick Forré

Information Decomposition Diagrams Applied beyond Shannon Entropy: A Generalization of Hu's Theorem

We generalize information diagrams to functions beyond Shannon entropy, including Kolmogorov complexity and the generalization error from machine learning.

Leon Lang, Pierre Baudot, Rick Quax, Patrick Forré

A Program to Build E(N)-Equivariant Steerable CNNs

We propose a general method to implement equivariant convolutional neural networks and demonstrate it for 3D equivariant tasks. The implementation is based on the Wigner-Eckart theorem for steerable kernels.

Gabriele Cesa, Leon Lang, Maurice Weiler

A Wigner-Eckart Theorem for Group Equivariant Convolution Kernels

We generalize the famous Wigner-Eckart theorem from quantum mechanics in order to characterize steerable kernel spaces in representation theoretic terms.

Leon Lang, Maurice Weiler
