Hey guys, beginner's question:

I am preparing a dataframe for a machine learning model. The purpose of the model is to predict whether people infected with COVID will die or not.

To do this, I am looking at conditions and symptoms such as sore throat, cough, comorbidities, and gender, and binarizing them into “yes”/“no” or “male”/“female”.

I have a problem. One of the variables is “pregnant”, but only individuals of the female sex can be pregnant. How can I deal with this variable?

Can I keep it in the dataframe and assign the value “not pregnant” to all male individuals? Or could this harm the model?

  • tw_f@alien.topB
    1 year ago

    I’d just set it to False when sex=“M”, since men can’t get pregnant. If you find a True value in any of those rows, it’s probably an annotation error.

    This is the most pragmatic solution I guess.
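
For what it's worth, here is a rough sketch of that approach in pandas. The thread doesn't say which library or column names are in use, so `sex`, `pregnant`, the "M"/"F" codes, and the toy rows are all assumptions for illustration. It counts the suspicious rows first, so possible annotation errors can be inspected before they are overwritten.

```python
import pandas as pd

# Hypothetical toy data; column names and codes are assumptions.
df = pd.DataFrame({
    "sex": ["M", "F", "F", "M"],
    "pregnant": [None, True, False, True],  # last row: male marked pregnant
})

is_male = df["sex"] == "M"

# Count males marked as pregnant before overwriting them,
# since these are probably annotation errors worth a look.
suspect = is_male & df["pregnant"].eq(True)
n_suspect = int(suspect.sum())

# Set pregnant to False for every male row (missing or True alike).
df.loc[is_male, "pregnant"] = False

# Safe here because no female row is missing; with real data,
# handle missing values for females before casting to bool.
df["pregnant"] = df["pregnant"].astype(bool)

print(n_suspect)
print(df)
```

Tree-based models can usually pick up the sex/pregnancy interaction from the two columns on their own; for a linear model, the column then effectively means "female and pregnant".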