Katachi
Katachi (Alfonzo, Iyer et al. 2024) is a project with the fantastic Juan Alfonzo aimed at understanding the relation between how galaxies look (i.e., their morphology) and how they formed (i.e., their star formation histories).
It does this by training a neural network to predict the star formation histories of galaxies in the SDSS-MaNGA survey from their gri imaging (as opposed to fitting their full spectra). We use MaNGA because its observations are some of the most pristine data we have for recovering stellar populations in galaxies: resolved IFU spectroscopy lets us get past traditional concerns like outshining (by resolving the young and old light coming from different parts of the galaxy) and parameter degeneracies (by providing extremely high-resolution, high-SNR spectra).
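To make the prediction target concrete: a star formation history can be summarized as the average SFR in a set of lookback-time bins, and it is a compressed vector like this that a network can regress from imaging. The bin edges, mock stellar populations, and the function name below are illustrative assumptions of mine, not the actual setup from the paper:

```python
import numpy as np

def binned_sfh(t_form_gyr, mass_msun, bin_edges_gyr):
    """Average SFR [Msun/yr] in lookback-time bins, computed from the
    formation times and masses of a galaxy's stellar populations.
    (Illustrative sketch; bin choices here are hypothetical.)"""
    sfr = np.zeros(len(bin_edges_gyr) - 1)
    for k in range(len(sfr)):
        lo, hi = bin_edges_gyr[k], bin_edges_gyr[k + 1]
        in_bin = (t_form_gyr >= lo) & (t_form_gyr < hi)
        dt_yr = (hi - lo) * 1e9          # bin width in years
        sfr[k] = mass_msun[in_bin].sum() / dt_yr
    return sfr

# Mock galaxy: most mass formed long ago, plus a recent burst.
t_form = np.array([0.2, 0.5, 5.0, 9.0])   # lookback time [Gyr]
mass   = np.array([1e8, 2e8, 5e9, 4e9])   # stellar mass formed [Msun]
edges  = np.array([0.0, 1.0, 3.0, 10.0])  # coarse lookback-time bins
sfh = binned_sfh(t_form, mass, edges)     # SFR vector, one entry per bin
```

A vector like `sfh` (or a further-compressed version of it) is the kind of target the network learns to predict from each galaxy's three-band image.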
The fact that this works at all means that the morphologies of galaxies contain information about stellar assembly equivalent to what we can extract from galaxy spectra. And while this is scientifically interesting, it is also very encouraging for future telescopes like Rubin and Euclid that will have a very wide footprint (i.e., will see many, many galaxies) in a small number of filters, since it means we will be able to do star formation history science with the data from these telescopes.
But the point of Katachi was not just to predict the SFHs, but to understand the morphological imprints that allowed the network to make this prediction in the first place. This motivation comes from not wanting methods like Katachi to be purely predictive black boxes, but rather ML-enabled tools for model building that use all the information available from current and upcoming surveys (as opposed to throwing information away with summary statistics).
To build the explainable-AI (XAI) aspect into Katachi, we experimented with a whole bunch of gradient-based and other saliency methods to map the dependence of the final output back onto the input images; this was perhaps the most time-consuming step of the project. Ultimately, we settled on SHapley Additive exPlanations (SHAP; see the excellent explanation in Christoph Molnar's book), which produces a map for each galaxy showing which parts of the input image drive the prediction higher or lower. For example, it correctly identifies clumps as contributing to higher SFRs and younger ages, and bulges as contributing to lower SFRs / older ages.
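The idea behind those attribution maps can be seen in miniature by computing exact Shapley values over a handful of image regions instead of individual pixels. The toy model, region labels, and contribution numbers below are all hypothetical stand-ins for the trained network, chosen only to illustrate how a clump can be credited with pushing the predicted SFR up and a bulge with pushing it down:

```python
import numpy as np
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_regions):
    """Exact Shapley values for n_regions 'features'.
    value_fn(mask) is the model output when only the regions where
    mask is True are visible (the rest masked to a baseline)."""
    phi = np.zeros(n_regions)
    for i in range(n_regions):
        others = [j for j in range(n_regions) if j != i]
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                mask = np.zeros(n_regions, dtype=bool)
                mask[list(S)] = True
                without = value_fn(mask)        # prediction without region i
                mask[i] = True
                with_i = value_fn(mask)         # prediction with region i added
                # Shapley weight for a coalition of size k
                w = factorial(k) * factorial(n_regions - k - 1) / factorial(n_regions)
                phi[i] += w * (with_i - without)
    return phi

def toy_model(mask):
    """Hypothetical stand-in for the network: predicted SFR as a
    function of which image regions are visible. Region 0 acts like a
    star-forming clump, region 1 like smooth disk, region 2 like a bulge."""
    contrib = np.array([0.8, 0.1, -0.5])        # illustrative numbers only
    return 1.0 + contrib[mask].sum()

phi = shapley_values(toy_model, 3)
# Efficiency property: attributions sum to f(all regions) - f(baseline)
```

Real SHAP libraries approximate these values per pixel (enumerating all coalitions is exponential), but the interpretation of the resulting map is the same: positive attributions (here, the clump-like region) push the prediction up, negative ones (the bulge-like region) push it down.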