Figure 5.
Flow chart of the SB generative process. The main ViT takes the partially generated patch from the previous time step as input and yields the patch for the next time step (see Appendix B for details of the VAE). Three pieces of conditional information are incorporated: the encoded generative-process time step, an embedding of the a priori topography produced by a second ViT, and an embedding of the set of images produced by a set convolutional encoder followed by a ViT. This conditional information (denoted “context”) is processed by the main ViT at each time step (see Appendix C for a detailed description of the network architecture).
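The data flow in the figure can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: every function name, array size, and weight matrix below is hypothetical, the ViTs and the set convolutional encoder are replaced by single linear projections, the sinusoidal time encoding and the zero initial patch are assumed choices, and concatenation is only one possible way to assemble the context. The sketch shows only how the three conditioning signals are combined and passed to the main network at every generative step.

```python
import numpy as np

D = 8          # embedding width (hypothetical)
N_STEPS = 4    # number of generative time steps (hypothetical)
SIDE = 4       # patch side length in pixels (hypothetical)


def time_embedding(step, n_steps, dim=D):
    """Sinusoidal encoding of the generative time step (an assumed choice)."""
    t = step / n_steps
    freqs = 2.0 ** np.arange(dim // 2)
    return np.concatenate([np.sin(np.pi * freqs * t), np.cos(np.pi * freqs * t)])


def embed_topography(prior_topo, w_topo):
    """Stand-in for the topography ViT: one linear projection."""
    return prior_topo.ravel() @ w_topo


def embed_image_set(images, w_img):
    """Stand-in for the set encoder: project each image, then mean-pool,
    so the embedding does not depend on the order of the images."""
    return np.mean([img.ravel() @ w_img for img in images], axis=0)


def main_step(patch, context, w_ctx):
    """Stand-in for the main ViT: update the patch using the context."""
    return patch + 0.1 * np.tanh(context @ w_ctx).reshape(patch.shape)


def make_params(seed=0):
    """Random weights for the stand-in networks (illustrative only)."""
    rng = np.random.default_rng(seed)
    n_pix = SIDE * SIDE
    return {
        "w_topo": rng.normal(size=(n_pix, D)),
        "w_img": rng.normal(size=(n_pix, D)),
        "w_ctx": rng.normal(size=(3 * D, n_pix)),
    }


def generate(prior_topo, images, params, n_steps=N_STEPS):
    """Iterate the stand-in network, rebuilding the context at each step."""
    topo_emb = embed_topography(prior_topo, params["w_topo"])
    set_emb = embed_image_set(images, params["w_img"])
    patch = np.zeros_like(prior_topo)  # hypothetical initial state
    for step in range(n_steps):
        # "Context" = time-step encoding + topography and image-set embeddings.
        context = np.concatenate(
            [time_embedding(step, n_steps), topo_emb, set_emb]
        )
        patch = main_step(patch, context, params["w_ctx"])
    return patch
```

In this sketch the topography and image-set embeddings are computed once, while the time-step encoding changes at every iteration, so the context passed to the main network differs from step to step.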
© 2026. The Author(s). Published by the American Astronomical Society.