I have a two-column dataset (x, y) similar to the one shown below. I need to predict y for values of x greater than 60. The predicted curve must continue the increasing trend the data shows up to x=60.

I have tried polynomial regression and SVR, but the predictions decline for x greater than 60. I have also tried fitting y = a ln(x) + b to the data, but the R² score is only 0.94. What model can I train for this purpose, or how can I improve the R² score by regressing on a more appropriate logarithmic function?

https://preview.redd.it/f9oxc20zga2c1.png?width=1208&format=png&auto=webp&s=b7918c9d9dd2bb930a2e903483d5a230f2dcfce5

  • PM_ME_YOUR_BAYES@alien.topB
    1 year ago

    The issue here is that you want to extrapolate to values outside the training set (x > 60). You could even reach zero error (R² = 1) on the training data, but that would be meaningless, because you are going to predict outside of that range. If you don’t have data for the range that interests you, the best thing you can do is rely on domain knowledge.
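
    To make that concrete, here is a minimal sketch. The data is synthetic (an assumed log-shaped curve with noise, not the dataset from the post), and the degree-9 polynomial is just an arbitrary flexible model. It fits the training range almost perfectly and then drifts far from the generating curve once x leaves that range.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Synthetic stand-in for the plotted data (assumption): log-shaped curve plus noise.
rng = np.random.default_rng(0)
x_train = np.arange(1, 61, dtype=float)
y_train = 10.0 * np.log(x_train) + rng.normal(0.0, 0.5, x_train.size)

def r2_score(y_true, y_pred):
    """Coefficient of determination computed on the given points."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Flexible model: tracks the training points closely.
poly = Polynomial.fit(x_train, y_train, deg=9)
r2_train = r2_score(y_train, poly(x_train))

# Evaluate well outside the training range and compare with the generating curve.
x_new = 100.0
poly_pred = poly(x_new)
true_value = 10.0 * np.log(x_new)

print(f"training R^2: {r2_train:.4f}")
print(f"polynomial at x={x_new:.0f}: {poly_pred:.1f}, generating curve: {true_value:.1f}")
```

    The training R² says nothing about the second printed line, which is the number you actually care about.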

    For example, if you have reason to believe that the function is going to approach an asymptote, you can exploit this knowledge by limiting the class of fitting functions to e.g. parametric sigmoids.
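
    As a rough sketch of the sigmoid idea, assuming SciPy is available: the three-parameter logistic below is one possible parametric sigmoid, and the data, the starting guess `p0`, and the names `L`, `k`, `x0` are all illustrative choices, not anything taken from the post.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, L, k, x0):
    """Three-parameter logistic: rises towards the asymptote L."""
    return L / (1.0 + np.exp(-k * (x - x0)))

# Synthetic saturating data (assumption), observed only up to x = 60.
rng = np.random.default_rng(0)
x_train = np.arange(1, 61, dtype=float)
y_train = logistic(x_train, 100.0, 0.08, 20.0) + rng.normal(0.0, 1.0, x_train.size)

# Rough starting guess: asymptote near the largest observed y, midpoint mid-range.
p0 = [y_train.max(), 0.1, np.median(x_train)]
params, _ = curve_fit(logistic, x_train, y_train, p0=p0, maxfev=10000)
L_hat, k_hat, x0_hat = params

# Extrapolated predictions keep increasing but cannot exceed the fitted asymptote.
x_new = np.array([80.0, 120.0, 200.0])
y_new = logistic(x_new, *params)
print(f"fitted asymptote: {L_hat:.1f}")
print("predictions:", np.round(y_new, 1))
```

    Because the function class itself is bounded and monotone, the extrapolation cannot turn downwards the way a polynomial or SVR fit can. It is only as good as the assumption that an asymptote exists.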

    Or if you know that the process you are modeling has a specific functional form, like logarithmic or square root, then limit the function space accordingly.
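
    A small sketch of what restricting the function space can look like, using only NumPy. Both families are fitted as y = a·f(x) + b by ordinary least squares on the transformed input; the data is synthetic (generated from a log curve, an assumption) and `fit_family` is a hypothetical helper name.

```python
import numpy as np

# Synthetic data generated from a logarithmic curve (assumption).
rng = np.random.default_rng(0)
x_train = np.arange(1, 61, dtype=float)
y_train = 10.0 * np.log(x_train) + 5.0 + rng.normal(0.0, 0.5, x_train.size)

def fit_family(transform, x, y):
    """Least-squares fit of y = a * transform(x) + b; returns (a, b)."""
    a, b = np.polyfit(transform(x), y, deg=1)
    return a, b

def r2_score(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

families = {"log": np.log, "sqrt": np.sqrt}
x_new = 120.0
results = {}
for name, transform in families.items():
    a, b = fit_family(transform, x_train, y_train)
    r2 = r2_score(y_train, a * transform(x_train) + b)
    pred = a * transform(x_new) + b
    results[name] = (a, b, r2, pred)
    print(f"{name:>4}: a={a:.2f}, b={b:.2f}, training R^2={r2:.3f}, y({x_new:.0f})={pred:.1f}")
```

    Both families are increasing everywhere, so neither will decline after x=60, but they disagree about how fast the curve keeps rising. The training fit alone cannot settle that; knowledge of the underlying process has to.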

    If you have any other kind of knowledge about your function, it can be encoded as a prior distribution in a Bayesian approach, such as Bayesian regression or a Gaussian process.
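
    For illustration, here is a minimal Bayesian linear regression on the features [1, ln x], written in plain NumPy with the standard closed-form Gaussian posterior. The Gaussian prior on the coefficients, the assumed-known noise level `noise_sd`, and the synthetic data are all assumptions made for the sketch.

```python
import numpy as np

# Synthetic data (assumption): y = 10 ln(x) + 5 plus Gaussian noise of known scale.
rng = np.random.default_rng(0)
x_train = np.arange(1, 61, dtype=float)
noise_sd = 0.5
y_train = 10.0 * np.log(x_train) + 5.0 + rng.normal(0.0, noise_sd, x_train.size)

def features(x):
    """Design matrix with columns [1, ln x]."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.column_stack([np.ones_like(x), np.log(x)])

# Prior over (b, a): zero mean, broad covariance. Domain knowledge would go here.
prior_mean = np.zeros(2)
prior_cov = 100.0 * np.eye(2)

# Closed-form Gaussian posterior for linear regression with known noise variance.
Phi = features(x_train)
prior_prec = np.linalg.inv(prior_cov)
post_cov = np.linalg.inv(prior_prec + Phi.T @ Phi / noise_sd**2)
post_mean = post_cov @ (prior_prec @ prior_mean + Phi.T @ y_train / noise_sd**2)

def predict(x):
    """Posterior predictive mean and standard deviation at x."""
    phi = features(x)
    mean = phi @ post_mean
    var = noise_sd**2 + np.sum((phi @ post_cov) * phi, axis=1)
    return mean, np.sqrt(var)

mean_in, sd_in = predict(30.0)     # inside the training range
mean_out, sd_out = predict(200.0)  # far outside it
print(f"x=30:  {mean_in[0]:.1f} +/- {sd_in[0]:.2f}")
print(f"x=200: {mean_out[0]:.1f} +/- {sd_out[0]:.2f}")
```

    The predictive uncertainty widens as x moves away from the data, which is an honest summary of how much the training set can say about x > 60 under this model.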

    Bottom line: there is no magic “make it work” button in ML/statistical modeling. You have to embed your domain knowledge in the model. The modeling process is not a blind one.

    • ninadsutrave@alien.topOPB
      1 year ago

      Can’t the training set show how the curve is rising (the change in y for each corresponding change in x)? Couldn’t that rate of change then be carried forward to all future values of x, so that the predictions follow the trend?

      Thinking out loud with my intuition here. Is there any model that resembles the above logic?

      • PM_ME_YOUR_BAYES@alien.topB
        1 year ago

        That can be informative, but as I was saying, you have to limit the function space to those compatible with your hypothesis.

        I repeat my question in a clearer way: do you know (or have a guess of) what the function would look like after x=60?

        Since you mentioned the rate of change, have you ever plotted the numerical derivative of this function? Maybe it has a recognizable shape that could help you identify the right function class.
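
        A quick way to try this, sketched with NumPy on a noise-free synthetic log curve (an assumption; real data would probably need smoothing before differentiating). If dy/dx behaves like a power of x, the slope of log(dy/dx) against log(x) hints at the family: close to -1 points towards a logarithm, close to -0.5 towards a square root.

```python
import numpy as np

# Noise-free synthetic stand-in for the data (assumption).
x = np.arange(1, 61, dtype=float)
y = 10.0 * np.log(x) + 5.0

# Numerical derivative via finite differences.
dy_dx = np.gradient(y, x)

# Skip the first few points, where finite differences are least accurate.
mask = x >= 5
slope, intercept = np.polyfit(np.log(x[mask]), np.log(dy_dx[mask]), deg=1)
print(f"log-log slope of dy/dx: {slope:.2f}")
```

        This only works while the derivative stays positive, since the logarithm of dy/dx is taken. With noisy data the raw finite differences can easily go negative.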