• 0 Posts
  • 1 Comment
Joined 1 year ago
Cake day: November 26th, 2023

  • It likely won’t matter. Most models (I’m guessing you’re using something like xgboost) can deal robustly with this kind of correlation between features.

    If you like, you can combine the two into a single variable (0 for male, 1 for female and 2 for pregnant female), which may slightly improve performance, assuming every record in the dataset fits that rule (pregnant trans men, for example, would not). This way, a tree-based model can draw a boundary between 0 and 1 to split on gender, or between 1 and 2 to split on pregnancy.
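
A rough sketch of that combined encoding, in plain Python. The function name `encode_sex_pregnancy` and the input values `"male"` / `"female"` are made up for illustration, so adjust them to whatever your columns actually contain. Records that don't fit the 0/1/2 rule raise an error instead of being silently mis-coded.

```python
def encode_sex_pregnancy(sex: str, pregnant: bool) -> int:
    """Map (sex, pregnant) to one code: 0 male, 1 female, 2 pregnant female.

    Hypothetical helper; the label strings are assumptions about the data.
    """
    if sex == "male":
        if pregnant:
            # The 0/1/2 scheme has no slot for this record.
            raise ValueError("pregnant male record does not fit the 0/1/2 encoding")
        return 0
    if sex == "female":
        return 2 if pregnant else 1
    raise ValueError(f"unexpected sex value: {sex!r}")


# Toy rows, not real data.
rows = [("male", False), ("female", False), ("female", True)]
codes = [encode_sex_pregnancy(sex, pregnant) for sex, pregnant in rows]
print(codes)
```

With this coding, a threshold between 0 and 1 separates by gender and a threshold between 1 and 2 separates by pregnancy, which is the kind of split a tree-based model can pick up.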