Adjusting Model Probabilities for Imbalanced Datasets

Imagine you're working on a binary classification problem where the dataset is highly imbalanced: 99.8% of the samples have an outcome of 0, while only 0.2% have an outcome of 1. To address this imbalance, you decide to down-sample the data by randomly selecting 1% of the samples with an outcome of 0 and retaining all samples with an outcome of 1. After training your model on this reduced dataset, you need to predict the probability of an individual having an outcome of 1. How would you recalibrate these probabilities to apply the model effectively to the original, imbalanced dataset?
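One common way to approach this is a prior (odds) correction. It is sketched here under two assumptions: the 0-outcome samples were dropped uniformly at random, and the model is reasonably well calibrated on the down-sampled data. Keeping only a fraction beta = 0.01 of the negatives multiplies the odds of outcome 1 by 1/beta, so multiplying the predicted odds by beta undoes the shift: p = beta * p_s / (beta * p_s + 1 - p_s), where p_s is the probability predicted on the down-sampled data. The names `recalibrate`, `p_sampled`, and `beta` are illustrative, not part of the original problem.

```python
def recalibrate(p_sampled: float, beta: float = 0.01) -> float:
    """Map a probability learned on down-sampled data back to the original class balance.

    p_sampled: probability of outcome 1 predicted by the model trained on the reduced data.
    beta: fraction of outcome-0 samples that were kept (0.01 in the problem statement).
    Assumes negatives were dropped uniformly at random (illustrative sketch only).
    """
    if not 0.0 <= p_sampled <= 1.0:
        raise ValueError("p_sampled must lie in [0, 1]")
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must lie in (0, 1]")
    # Down-sampling inflated the odds of outcome 1 by 1/beta; scale them back by beta.
    return beta * p_sampled / (beta * p_sampled + (1.0 - p_sampled))


# Sanity check: the base rate seen in the down-sampled data should map back to 0.2%.
sampled_base_rate = 0.002 / (0.002 + 0.998 * 0.01)  # roughly 0.167
original_base_rate = recalibrate(sampled_base_rate)  # roughly 0.002
```

The correction is monotonic, so it leaves the ranking of individuals unchanged and only rescales the probabilities. An equivalent option for a logistic model is to add log(beta) to the fitted intercept.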
