Self-attention, an architectural motif designed to model long-range interactions in sequential data, has driven numerous recent breakthroughs in natural language processing and beyond. In this talk, we will study the inductive bias of self-attention blocks by rigorously establishing which functions and long-range dependencies they can represent, and at what statistical cost. Our main result shows that bounded-norm Transformer layers can represent sparse functions of the input sequence, with sample complexity scaling only logarithmically with the context length. Furthermore, we propose new experimental protocols to support this analysis, drawing on the large body of work on provably learning sparse Boolean functions.
Based on joint work with Benjamin L. Edelman, Sham Kakade and Cyril Zhang.
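To make the flavor of these experimental protocols concrete, here is a minimal, hypothetical sketch (in PyTorch), not the authors' actual code: a single-head self-attention layer is trained to learn a sparse Boolean function of a length-T input sequence (here, the majority of k hidden coordinates); the target family, architecture, and all hyperparameters are illustrative assumptions.

```python
# Hypothetical sketch: learn a k-sparse Boolean function (majority of k
# hidden coordinates) of a length-T +/-1 sequence with one attention head.
import torch
import torch.nn as nn

torch.manual_seed(0)

T, k, d = 64, 3, 32              # context length, sparsity, embedding width (assumed)
n_train, n_test = 2000, 1000     # illustrative sample sizes
support = torch.randperm(T)[:k]  # the k hidden relevant coordinates

def make_data(n):
    # Inputs are uniform +/-1 sequences; the label depends only on the
    # k coordinates in `support` -- a sparse Boolean function.
    x = torch.randint(0, 2, (n, T)).float() * 2 - 1
    y = (x[:, support].sum(dim=1) > 0).float()   # majority of the k bits
    return x, y

class OneHeadAttention(nn.Module):
    # Single-head self-attention over token + position embeddings,
    # followed by a linear readout from the mean-pooled output.
    def __init__(self):
        super().__init__()
        self.tok = nn.Linear(1, d)                   # embed each +/-1 token
        self.pos = nn.Parameter(torch.randn(T, d) * 0.02)
        self.attn = nn.MultiheadAttention(d, num_heads=1, batch_first=True)
        self.readout = nn.Linear(d, 1)

    def forward(self, x):
        h = self.tok(x.unsqueeze(-1)) + self.pos     # (n, T, d)
        h, _ = self.attn(h, h, h)
        return self.readout(h.mean(dim=1)).squeeze(-1)

model = OneHeadAttention()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

x_tr, y_tr = make_data(n_train)
x_te, y_te = make_data(n_test)
for step in range(1000):
    opt.zero_grad()
    loss_fn(model(x_tr), y_tr).backward()
    opt.step()

with torch.no_grad():
    acc = ((model(x_te) > 0).float() == y_te).float().mean()
print(f"T={T}, k={k}: test accuracy {acc:.3f}")
```

Rerunning such a sketch while the context length T grows (with the sparsity k fixed) is one way to probe, empirically, the logarithmic dependence of sample complexity on context length stated above.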