Polycyclic aromatic systems (PASs) present a seemingly insurmountable challenge: vast chemical spaces, complex electronic structures, and elusive aromatic properties. Our mission, should we choose to accept it, is to harness the power of deep learning to decode these molecular mysteries. In this talk, we embark on a journey through this complex chemical space, combining traditional computational methods with cutting-edge artificial intelligence tools. We demonstrate how neural networks can be trained to predict electronic properties with unprecedented speed and accuracy. More importantly, we show how these networks can be interpreted to extract chemical insight.
To enable the application of such techniques to the design of novel functional PASs, we established the COMPAS Project (a COMputational database of Polycyclic Aromatic Systems), which already contains ~1 million molecules in three datasets (1–4). We also developed two new types of molecular representation to enable efficient and effective machine- and deep-learning models to train on the new data: a) a text-based representation (5) and b) a graph-based representation (6). By analyzing thousands of PAS structures with our dedicated representations, our AI agents not only achieve higher predictive ability with fewer data but have also uncovered hidden patterns and structure-property relationships that traditional methods might have missed.
Finally, we implemented the first guided diffusion-based model for inverse design of PASs: GaUDI (7). Our model generates new PASs with defined target properties. In addition to its flexible target function and high validity scores, GaUDI can also design molecules with properties beyond the distribution of the training data.
From small benzenoid systems to extended graphene-like structures, we show how deep learning can navigate this complex chemical landscape, offering new insights into molecular design and property prediction. This talk will not self-destruct in five seconds, but it will revolutionize how we think about combining artificial intelligence with molecular science.
(1) Wahab, A.; Pfuderer, L.; Paenurk, E.; Gershoni-Poranne, R. The COMPAS Project: A Computational Database of Polycyclic Aromatic Systems. Phase 1: Cata-Condensed Polybenzenoid Hydrocarbons. J. Chem. Inf. Model. 2022, 62 (16), 3704. https://doi.org/10.1021/acs.jcim.2c00503.
(2) Mayo Yanes, E.; Chakraborty, S.; Gershoni-Poranne, R. COMPAS-2: A Dataset of Cata-Condensed Hetero-Polycyclic Aromatic Systems. Sci. Data 2024, 11 (1), 97. https://doi.org/10.1038/s41597-024-02927-8.
(3) Chakraborty, S.; Mayo Yanes, E.; Gershoni-Poranne, R. Hetero-Polycyclic Aromatic Systems: A Data-Driven Investigation of Structure–Property Relationships. Beilstein J. Org. Chem. 2024, 20 (1), 1817–1830. https://doi.org/10.3762/bjoc.20.160.
(4) Wahab, A.; Gershoni-Poranne, R. COMPAS-3: A Dataset of Peri-Condensed Polybenzenoid Hydrocarbons. Phys. Chem. Chem. Phys. 2024, 26 (21), 15344–15357. https://doi.org/10.1039/D4CP01027B.
(5) Fite, S.; Wahab, A.; Paenurk, E.; Gross, Z.; Gershoni-Poranne, R. Text-Based Representations with Interpretable Machine Learning Reveal Structure-Property Relationships of Polybenzenoid Hydrocarbons. J. Phys. Org. Chem. 2022, e4458. https://doi.org/10.1002/poc.4458.
(6) Weiss, T.; Wahab, A.; Bronstein, A. M.; Gershoni-Poranne, R. Interpretable Deep-Learning Unveils Structure–Property Relationships in Polybenzenoid Hydrocarbons. J. Org. Chem. 2023, 88 (14), 9645–9656. https://doi.org/10.1021/acs.joc.2c02381.
(7) Weiss, T.; Mayo Yanes, E.; Chakraborty, S.; Cosmo, L.; Bronstein, A. M.; Gershoni-Poranne, R. Guided Diffusion for Inverse Molecular Design. Nat. Comput. Sci. 2023, 3 (10), 873–882. https://doi.org/10.1038/s43588-023-00532-0.