Image Details
Caption: Figure 5.
The convergence of the total reward per episode in reinforcement training. The solid blue line shows the mean reward of the training batch, which serves as the baseline. The dashed red and green lines represent the maximum and minimum rewards within the batch, respectively.
© 2025. The Author(s). Published by the American Astronomical Society.