The widespread use of powerful edge devices, such as smartphones, has led to large-scale decentralized data generation. Since this data is often sensitive, it cannot be collected centrally, which poses a challenge to traditional machine learning pipelines that rely on centralized datasets. Federated learning (FL) addresses this by training models locally on devices and sharing only model updates, thereby preserving privacy. However, FL faces key challenges, including data and system heterogeneity, high communication costs, and limited device resources. This thesis presents a range of methods to improve federated learning, with a primary focus on handling data heterogeneity under realistic computational and communication constraints. In this talk, we present approaches that explicitly model and adapt to client diversity, as well as methods that personalize models to individual clients using hypernetworks.
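To give a rough sense of the hypernetwork-based personalization idea (a minimal sketch in the spirit of hypernetwork approaches to personalized FL, not necessarily the thesis's exact method), a server can hold a small hypernetwork that maps a learnable per-client embedding to the weights of a client-specific classifier head. All names and layer sizes below are hypothetical:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperNet(nn.Module):
    """Illustrative hypernetwork: maps a learnable client embedding to the
    weight and bias of a small client-specific linear classifier head.
    All dimensions are hypothetical, chosen only for the sketch."""

    def __init__(self, n_clients, embed_dim=16, in_features=32, n_classes=10):
        super().__init__()
        self.in_features, self.n_classes = in_features, n_classes
        # One learnable embedding per client; trained on the server.
        self.embeddings = nn.Embedding(n_clients, embed_dim)
        out_dim = n_classes * in_features + n_classes  # flattened weight + bias
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, 64), nn.ReLU(), nn.Linear(64, out_dim)
        )

    def forward(self, client_id):
        # Generate and unflatten the personalized head for one client.
        theta = self.mlp(self.embeddings(client_id))
        split = self.n_classes * self.in_features
        w = theta[:split].view(self.n_classes, self.in_features)
        b = theta[split:]
        return w, b

hyper = HyperNet(n_clients=100)
w, b = hyper(torch.tensor(3))      # personalized head for client 3
x = torch.randn(8, 32)             # a batch of client-3 features
logits = F.linear(x, w, b)         # client-specific predictions
```

In schemes of this kind, the hypernetwork and client embeddings live only on the server, and each client receives just its generated head, which is one way personalization can coexist with tight communication and device-resource budgets.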