Often when I read ML papers, the authors compare their results against a benchmark (e.g. using RMSE, accuracy, …) and say "our new method improved results by X%". Nobody performs a significance test to check whether the new method Y actually outperforms benchmark Z. Is there a reason why? Especially when you break your results down, e.g. to the analysis of certain classes in object classification, this seems important to me. Or am I overlooking something?
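For concreteness, here is a hedged sketch of the kind of test I mean (not from any particular paper): comparing two classifiers evaluated on the *same* test set with an exact McNemar test, which only looks at the examples where the models disagree.

```python
from scipy.stats import binomtest

def mcnemar_exact(correct_a, correct_b):
    """Exact McNemar test for two classifiers scored on the SAME test set.

    correct_a, correct_b: booleans, True where each model predicted correctly.
    Returns a two-sided p-value for H0: both models have the same error rate.
    """
    # Only the discordant pairs carry information about a difference
    b = sum(a and not bb for a, bb in zip(correct_a, correct_b))  # A right, B wrong
    c = sum(bb and not a for a, bb in zip(correct_a, correct_b))  # B right, A wrong
    if b + c == 0:
        return 1.0  # the models never disagree
    # Under H0, each discordant outcome is a fair coin flip
    return binomtest(b, b + c, 0.5).pvalue

# Toy example: model A is correct on 9 cases where model B is wrong,
# and B is never right where A is wrong -> p = 2 * 0.5**9 ≈ 0.004
p = mcnemar_exact([True] * 9 + [False], [False] * 10)
```

The function name and toy data are illustrative only; the point is that such a check is a few lines, not a research project.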

  • chief167@alien.top · 10 months ago

    a combination of different factors:

    • it is not taught in most self-education programs.
    • therefore most practitioners don’t know 1) that it exists, 2) how to do it, or 3) how to do power calculations
    • since most don’t know it, there is no demand for it
    • it costs compute time and resources, as well as human time, so it’s skipped if nobody asks for it
    • there is no standardized approach for ML models: do you vary only the training run, or also how you partition your dataset? There is no prebuilt sklearn tooling for it either
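On the power-calculation point: a back-of-the-envelope sketch of how many test examples you would need to reliably detect a small accuracy gain (the function name and the 90% vs. 92% figures are illustrative assumptions, using the standard two-proportion z-test formula):

```python
from math import sqrt
from statistics import NormalDist

def n_per_model(p1, p2, alpha=0.05, power=0.8):
    """Approximate test-set size needed to detect accuracy p1 vs p2
    with a two-sided two-proportion z-test (textbook formula)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # critical value
    z_b = NormalDist().inv_cdf(power)           # power quantile
    p_bar = (p1 + p2) / 2
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar))
           + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return num / (p1 - p2) ** 2

# Detecting a 90% -> 92% accuracy improvement at 80% power
# needs roughly 3200 labeled test examples.
n = n_per_model(0.90, 0.92)
```

This is also why the per-class breakdowns mentioned above are hard: with only a few hundred examples per class, most "improved by X%" claims are underpowered.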