Often when I read ML papers, the authors compare their results against a benchmark (e.g. using RMSE, accuracy, …) and say “our new method improves on the benchmark by X%”. Nobody runs a significance test to check whether the new method Y actually outperforms benchmark Z. Is there a reason why? Especially when you break your results down further, e.g. into the analysis of certain classes in object classification, this seems important to me. Or am I overlooking something?
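
For illustration, here is a rough sketch of the kind of test I have in mind (the data and accuracies below are made up, just to show the mechanics): with two classifiers evaluated on the *same* test set, an exact McNemar test on the examples they disagree about already gives a p-value for the accuracy gap.

```python
import numpy as np
from scipy.stats import binomtest

# Fake per-example labels and predictions for two classifiers on one shared
# test set (illustrative only; in a paper these would come from real models).
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
pred_baseline = np.where(rng.random(1000) < 0.85, y_true, 1 - y_true)
pred_new      = np.where(rng.random(1000) < 0.88, y_true, 1 - y_true)

correct_a = pred_baseline == y_true
correct_b = pred_new == y_true

# Exact McNemar test: only the discordant examples (where exactly one model
# is right) carry information about which model is better.
b = int(np.sum(correct_a & ~correct_b))   # baseline right, new model wrong
c = int(np.sum(~correct_a & correct_b))   # new model right, baseline wrong
result = binomtest(min(b, c), n=b + c, p=0.5)

print(f"accuracy baseline={correct_a.mean():.3f}, new={correct_b.mean():.3f}, "
      f"McNemar p={result.pvalue:.4f}")
```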

  • Jurph@alien.topB

    I’d be grateful if a paper ran experiments with 5-10 different random seeds and reported the mean and variance (a rough sketch of what I mean is below).

    Unfortunately most papers are generated using stochastic grad student descent, where the seed keeps being re-rolled until a SOTA result is achieved.
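
    A minimal sketch of that kind of reporting, plus a paired test across seeds. `train_and_eval` is a stand-in for whatever training/evaluation loop the paper actually uses, and the numbers it returns here are fabricated for illustration.

    ```python
    import numpy as np
    from scipy.stats import ttest_rel

    def train_and_eval(method: str, seed: int) -> float:
        # Placeholder for a real training run; returns a test accuracy.
        rng = np.random.default_rng(seed)
        base = 0.85 if method == "baseline" else 0.87
        return base + rng.normal(0, 0.01)

    seeds = range(5)
    baseline = np.array([train_and_eval("baseline", s) for s in seeds])
    new      = np.array([train_and_eval("new", s) for s in seeds])

    # Report mean ± sample std over seeds, then a paired t-test across seeds.
    print(f"baseline: {baseline.mean():.3f} ± {baseline.std(ddof=1):.3f}")
    print(f"new:      {new.mean():.3f} ± {new.std(ddof=1):.3f}")
    print(f"paired t-test p-value: {ttest_rel(new, baseline).pvalue:.4f}")
    ```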