I have always wondered how robust XGBoost is to correlation among independent variables. Should one check for multicollinearity before building an XGBoost model? In this post I will cover the impact of correlation on XGBoost using two datasets from Kaggle: Credit Fraud Data and BNP Paribas Cardif Claims Management.

Correlation is a statistical measure that expresses the extent to which two variables are linearly related (i.e. they change together at a constant rate). It’s a common tool for describing simple relationships without making a statement about cause and effect. …
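To make the "linearly related" part concrete, here is a minimal sketch of the Pearson correlation coefficient in plain Python. The helper name `pearson_r` and the toy data are my own, for illustration only; they are not taken from the Kaggle datasets. The sketch shows that an exact linear relationship gives a coefficient of magnitude 1, while a variable that depends on another in a purely non-linear (here, symmetric quadratic) way can have a coefficient of essentially zero.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered products and squares; the 1/n factors cancel out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data, symmetric around zero.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
linear = [3.0 * x + 2.0 for x in xs]   # changes with x at a constant rate
negative = [-0.5 * x for x in xs]      # constant rate, opposite direction
quadratic = [x ** 2 for x in xs]       # fully determined by x, but not linear

r_linear = pearson_r(xs, linear)        # close to +1
r_negative = pearson_r(xs, negative)    # close to -1
r_quadratic = pearson_r(xs, quadratic)  # close to 0 despite the dependence

print(r_linear, r_negative, r_quadratic)
```

The quadratic case is worth keeping in mind: a low correlation only rules out a linear relationship, not a relationship altogether.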

