In many real-world scenarios, data arise from diverse and complex environments that challenge traditional prediction and estimation methods. This talk explores the potential limitations and benefits of integrating heterogeneous data sources to improve predictive accuracy and treatment effect estimation. In the first part, we discuss how robust invariance-based methods are when the conditions for identifiability are not satisfied, and what can still be achieved in that setting. In the second part, we show how one can safely use models trained on external data, such as foundation models, to achieve efficiency gains. The methods derived in both parts appear effective on real-world data, and we hope to explore their usefulness in other settings through future collaborations.