Often when I read ML papers, the authors compare their results against a benchmark (e.g. using RMSE, accuracy, …) and say “our new method improved results by X%”. Nobody runs a significance test to check whether the new method Y actually outperforms benchmark Z. Is there a reason why? Especially when you break your results down, e.g. to the analysis of certain classes in object classification, this seems important to me. Or am I overlooking something?
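
To make concrete what I mean by a significance test: for paired classification results, one common option would be something like McNemar’s test on which examples each model gets right. A rough sketch with made-up correctness vectors, purely for illustration (in practice these would come from comparing each model’s predictions to the labels):

```python
# Illustrative sketch: McNemar's test comparing two classifiers evaluated on
# the same test set ("does method Y really beat benchmark Z?").
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

rng = np.random.default_rng(0)

# Hypothetical per-example correctness on a shared test set of 500 examples.
correct_y = rng.random(500) < 0.87  # stand-in: method Y right on ~87% of examples
correct_z = rng.random(500) < 0.84  # stand-in: benchmark Z right on ~84%

# 2x2 table of agreements/disagreements between the two models.
table = np.array([
    [np.sum(correct_y & correct_z),  np.sum(correct_y & ~correct_z)],
    [np.sum(~correct_y & correct_z), np.sum(~correct_y & ~correct_z)],
])

result = mcnemar(table, exact=True)
print(f"McNemar p-value: {result.pvalue:.4f}")  # small p => difference unlikely to be chance
```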

  • bethebunny@alien.topB · 1 year ago

    While it’s not super common in academia, it’s actually really useful in industry. I use statistical bootstrapping, i.e. Poisson resampling of the input dataset, to train many runs of financial fraud models and estimate how much my experimental results vary with the sampling of the data.

    Having a measure of the variance of your results is critical when you’re deciding whether to ship models whose decisions have direct financial impact :P
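
    A minimal sketch of the idea, assuming a scikit-learn classifier and synthetic data standing in for the fraud dataset (the model, the AUC metric, and all numbers here are illustrative, not my actual setup):

    ```python
    # Illustrative sketch: Poisson bootstrap over the training set to estimate
    # how much a validation metric varies with the sampling of the data.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)

    # Synthetic, imbalanced data standing in for a fraud dataset.
    X, y = make_classification(n_samples=4000, weights=[0.95], random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

    scores = []
    for _ in range(30):  # one training run per bootstrap replicate
        # Poisson(1) counts approximate resampling the training rows with replacement.
        counts = rng.poisson(lam=1.0, size=len(X_train))
        X_res, y_res = np.repeat(X_train, counts, axis=0), np.repeat(y_train, counts)
        model = LogisticRegression(max_iter=1000).fit(X_res, y_res)
        scores.append(roc_auc_score(y_val, model.predict_proba(X_val)[:, 1]))

    scores = np.asarray(scores)
    print(f"AUC {scores.mean():.3f} +/- {scores.std(ddof=1):.3f} across resampled runs")
    ```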

    • Lanky_Product4249@alien.topB · 1 year ago

      Does it actually work? I.e., if you construct a 95% confidence interval from that variance, are your model predictions within the interval 95% of the time?
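
      Spelling out the check I mean, with a toy normal-mean problem and made-up numbers rather than your actual models: draw data whose true value is known, build a 95% interval from the bootstrap standard error, and count how often the true value lands inside it.

      ```python
      # Illustrative coverage check: build a 95% CI from a bootstrap standard error
      # on data with a known mean, and count how often the interval contains it.
      import numpy as np

      rng = np.random.default_rng(1)
      true_mean, n, trials, hits = 0.0, 200, 1000, 0

      for _ in range(trials):
          sample = rng.normal(true_mean, 1.0, size=n)
          boot_means = [rng.choice(sample, size=n, replace=True).mean() for _ in range(200)]
          se = np.std(boot_means, ddof=1)
          lo, hi = sample.mean() - 1.96 * se, sample.mean() + 1.96 * se
          hits += lo <= true_mean <= hi

      print(f"empirical coverage: {hits / trials:.3f}")  # near 0.95 if well calibrated
      ```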