When Your AIs Deceive You: Challenges of Partial Observability in Reinforcement Learning from Human Feedback

We study, theoretically and empirically, the safety issues that arise when RLHF relies on human evaluators who have limited information.

Leon Lang, Davis Foote, Stuart Russell, Anca Dragan, Erik Jenner, Scott Emmons

Evaluating Shutdown Avoidance of Language Models in Textual Scenarios

We analyze, in textual scenarios, whether language models exhibit instrumental reasoning to avoid shutdown.

Teun van der Weij, Simon Lermen, Leon Lang

Last updated on Jul 3, 2023

A Program to Build E(N)-Equivariant Steerable CNNs

We propose a general method to implement equivariant convolutional neural networks and demonstrate it for 3D equivariant tasks. The implementation is based on the Wigner-Eckart theorem for steerable kernels.

Gabriele Cesa, Leon Lang, Maurice Weiler

A Wigner-Eckart Theorem for Group Equivariant Convolution Kernels

We generalize the famous Wigner-Eckart theorem from quantum mechanics to characterize steerable kernel spaces in representation-theoretic terms.

Leon Lang, Maurice Weiler

Learning to Request Guidance in Emergent Communication

We analyze the training behaviour of an agent that can ask for help. Asking is costly, so the agent learns to become more independent in familiar situations.

Benjamin Kolb, Leon Lang, Henning Bartsch, Arwin Gansekoele, Raymond Koopmanschap, Leonardo Romor, David Speck, Mathijs Mul, Elia Bruni

Last updated on Feb 27, 2022
