Among equally performing models, the simplest one is the best.
If you want more theory, look at statistical learning theory, e.g. “Understanding Machine Learning” by Shai Shalev-Shwartz and Shai Ben-David. The idea there is that we have data {(x_1, y_1), …, (x_n, y_n)}, where each y_i is given by y_i = h(x_i) for some unknown function h, and we want to approximate h using the data. The approximation is selected from a family of functions (the hypothesis class H) by a learning algorithm, typically empirical risk minimization (ERM): pick the hypothesis in H with the smallest error on the training data.
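A minimal sketch of ERM over a finite hypothesis class (a hypothetical toy example; the `erm` helper and the threshold class are my own illustration, not from the book): we simply return the hypothesis with the fewest mistakes on the training data.

```python
# ERM sketch: pick the hypothesis in a finite class H that minimizes
# empirical (training) error on the data.
def erm(H, data):
    """Return the hypothesis h in H with the fewest mistakes on data."""
    return min(H, key=lambda h: sum(h(x) != y for x, y in data))

# Toy hypothesis class: threshold classifiers on the integers.
H = [lambda x, t=t: int(x >= t) for t in range(5)]
# Labels generated by the "true" function h(x) = int(x >= 2).
data = [(0, 0), (1, 0), (2, 1), (3, 1)]

best = erm(H, data)
print([best(x) for x, _ in data])  # reproduces the labels: [0, 0, 1, 1]
```

With a finite class and consistent data, ERM recovers a hypothesis with zero training error; the theory in the book is about when that also implies low error on unseen data.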
Given infinite data, perhaps the best hypothesis class is the smallest one (e.g. the one with the smallest VC dimension) that still contains the true function h. Then you can estimate h essentially perfectly.
Given finite data, the best hypothesis class is perhaps the one whose complexity is just right for the amount of data available: rich enough to approximate h well, yet simple enough not to overfit the sample.
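The finite-data tradeoff can be illustrated with polynomial regression (a hedged sketch of my own, not from the source: the target function, sample size, and degrees are arbitrary choices). With only 10 noisy samples from a quadratic, a degree-2 class generalizes far better than a degree-9 class, even though the degree-9 fit has lower training error.

```python
import numpy as np

rng = np.random.default_rng(0)
true_h = lambda x: 1.0 + 2.0 * x - 3.0 * x ** 2  # the unknown target function

# Small noisy training sample, plus a dense noise-free test grid.
x_train = rng.uniform(-1, 1, 10)
y_train = true_h(x_train) + rng.normal(0, 0.1, 10)
x_test = np.linspace(-1, 1, 200)
y_test = true_h(x_test)

results = {}
for degree in (2, 9):
    # np.polyfit is least-squares ERM over degree-d polynomials.
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: test MSE = {results[degree]:.4f}")
```

The degree-9 class is "too complex" for 10 samples: it interpolates the noise and its test error blows up, while the degree-2 class matches the data's complexity.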
Make Anki cards and review them every day. You’d be surprised how much your brain can retain and use if you revise regularly.