BEGIN:VCALENDAR
VERSION:2.0
PRODID:icalendar-ruby
CALSCALE:GREGORIAN
METHOD:PUBLISH
BEGIN:VTIMEZONE
TZID:Europe/Vienna
BEGIN:DAYLIGHT
DTSTART:20190331T020000
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:CEST
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:20181028T030000
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:CET
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260404T110247Z
UID:5c6a9f104f1b5454928765@ist.ac.at
DTSTART;TZID=Europe/Vienna:20190311T090000
DTEND;TZID=Europe/Vienna:20190311T100000
DESCRIPTION:Speaker: Marco Mondelli\nhosted by Dan Alistarh\nAbstract: In t
 his talk\, I present an information-theoretic view of data science based o
 n the study of the following fundamental questions: What is the minimal am
 ount of information necessary to solve an assigned inference problem? Give
 n this minimal amount of information\, is it possible to design a low-comp
 lexity algorithm? What are the trade-offs between the parameters at play (
 e.g.\, dimensionality of the problem\, size of the data sample\, complexit
 y)? As a case study\, I consider the inference problem of fitting data with
  a linear combination of a large number of simple components. This idea li
 es at the core of a variety of methods\, most notably two-layer neural net
 works\, and also kernel regression\, sparse deconvolution and boosting. Mo
 re specifically\, we want to learn a concave function by using bump-like n
 eurons and fitting the centers of the bumps. The resulting optimization pr
 oblem is highly non-convex\, and little is known about convergence propert
 ies of gradient descent methods. Our main result is to show that stochasti
 c gradient descent (SGD) converges to the global optimum at exponential ra
 tes. As a by-product of our analysis\, we accurately track the dynamics of
  the SGD algorithm for a wide class of two-layer neural networks. To do so
 \, we prove that\, as the number of neurons grows large\, the evolution of
  SGD converges to a gradient flow in the space of probability distributions
 . Furthermore\, as the bump width vanishes\, this gradient flow solves a v
 iscous porous medium equation. The associated risk function exhibits a spe
 cial property\, known as displacement convexity\, and recent breakthroughs
  in optimal transport theory guarantee exponential convergence for this li
 mit dynamics. Numerical simulations demonstrate that the asymptotic theory
  captures well the behavior of stochastic gradient descent for moderate va
 lues of the parameters.
LOCATION:Mondi Seminar Room 2\, Central Building\, ISTA
ORGANIZER:mailto:tguggenb@ist.ac.at
SUMMARY:Marco Mondelli: Fundamental Limits and Practical Algorithms in Infe
 rence: From Communication to Learning
URL:https://talks-calendar.ista.ac.at/events/1814
END:VEVENT
END:VCALENDAR
