
Machine Learning or Econometrics for Credit Scoring: Let's Get the Best of Both Worlds

Abstract : Decision trees and related ensemble methods such as random forest are state-of-the-art tools in the field of machine learning for credit scoring. Although they have been shown to outperform logistic regression, they lack interpretability, which drastically reduces their use in the credit risk management industry, where decision-makers and regulators need transparent score functions. This paper proposes to get the best of both worlds by introducing a new, simple and interpretable credit scoring method that uses information from decision trees to improve the performance of logistic regression. Formally, rules extracted from various short-depth decision trees built with pairs of predictive variables are used as predictors in a penalized (regularized) logistic regression. By modeling such univariate and bivariate threshold effects, we achieve a significant improvement in the performance of the logistic regression while preserving its simple interpretation. Applications using simulated data and four real credit default datasets show that our new method outperforms traditional logistic regressions. Moreover, it compares competitively with random forest while providing an interpretable scoring function. JEL Classification: G10, C25, C53
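The two-step idea in the abstract (extract threshold rules from short-depth trees grown on pairs of predictors, then use those rules as dummy predictors in a penalized logistic regression) can be illustrated with a small standard-library Python sketch. It is a minimal sketch under our own assumptions, not the authors' implementation: splits minimize Gini impurity, each pair gets a depth-2 tree whose second-level splits are forced onto the other variable of the pair, the penalty is an L1 (lasso) term fitted by proximal gradient descent with an arbitrary `lam=0.01`, and the function names (`best_split`, `pair_rules`, `extract_rules`, `fit_l1_logit`) and the simulated data are hypothetical.

```python
import math
import random
from itertools import combinations

def gini(labels):
    """Gini impurity 2p(1 - p) of 0/1 labels (0 for an empty list)."""
    p = sum(labels) / len(labels) if labels else 0.0
    return 2.0 * p * (1.0 - p)

def best_split(xs, ys):
    """Threshold t minimising the weighted Gini impurity of x <= t versus x > t."""
    if gini(ys) == 0.0:
        return None, float("inf")  # pure or empty node: nothing to split
    pairs = sorted(zip(xs, ys))
    best_t, best_imp = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal values cannot be separated
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        imp = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if imp < best_imp:
            best_t, best_imp = (pairs[i - 1][0] + pairs[i][0]) / 2.0, imp
    return best_t, best_imp

def rule_holds(rule, row):
    """A rule is a tuple of (variable, threshold, is_le) conditions joined by AND."""
    return all((row[j] <= t) == le for j, t, le in rule)

def pair_rules(X, y, a, b):
    """Rules from a depth-2 tree on (a, b). Assumption (hypothetical): the root
    splits on the better of the two variables and each child on the other one."""
    cand = []
    for j in (a, b):
        t, imp = best_split([r[j] for r in X], y)
        if t is not None:
            cand.append((imp, j, t))
    if not cand:
        return []
    _, root, t1 = min(cand)
    other = b if root == a else a
    rules = [((root, t1, True),)]  # univariate threshold effect
    for le in (True, False):  # left child (x <= t1), then right child (x > t1)
        rows = [i for i, r in enumerate(X) if (r[root] <= t1) == le]
        t2, _ = best_split([X[i][other] for i in rows], [y[i] for i in rows])
        if t2 is not None:
            rules.append(((root, t1, le), (other, t2, True)))  # bivariate effect
    return rules

def extract_rules(X, y):
    """Distinct rules collected over every pair of predictors."""
    rules = []
    for a, b in combinations(range(len(X[0])), 2):
        rules += [q for q in pair_rules(X, y, a, b) if q not in rules]
    return rules

def design(X, rules):
    """Raw predictors followed by one 0/1 dummy per extracted rule."""
    return [list(r) + [float(rule_holds(q, r)) for q in rules] for r in X]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_l1_logit(Z, y, lam=0.01, lr=0.5, n_iter=300):
    """L1-penalised logistic regression by proximal gradient descent (ISTA);
    the intercept b0 is not penalised."""
    n, p = len(Z), len(Z[0])
    b0, w = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for row, yi in zip(Z, y):
            r = sigmoid(b0 + sum(wj * zj for wj, zj in zip(w, row))) - yi
            g0 += r
            for j in range(p):
                g[j] += r * row[j]
        b0 -= lr * g0 / n
        for j in range(p):
            v = w[j] - lr * g[j] / n
            w[j] = math.copysign(max(abs(v) - lr * lam, 0.0), v)  # soft-threshold
    return b0, w

# Hypothetical data: default iff x0 > 0.6 AND x1 > 0.5; x2 is pure noise.
rng = random.Random(0)
X_sim = [[rng.random() for _ in range(3)] for _ in range(200)]
y_sim = [int(r[0] > 0.6 and r[1] > 0.5) for r in X_sim]
rules = extract_rules(X_sim, y_sim)
Z = design(X_sim, rules)
b0, w = fit_l1_logit(Z, y_sim)
for q, coef in zip(rules, w[3:]):  # w[:3] are the raw linear effects
    print(q, round(coef, 3))
```

Each coefficient on a rule dummy reads like any other logistic-regression coefficient: it shifts the log-odds of default when the threshold condition holds, which is what keeps the resulting score function interpretable. The L1 penalty is what allows many candidate rules to be pruned; its weight would typically be tuned, for example by cross-validation, rather than fixed as in this sketch.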
Document type :
Preprints, Working Papers, ...

Cited literature [69 references]
Contributor : Sullivan Hué
Submitted on : Monday, November 2, 2020 - 8:15:51 AM
Last modification on : Tuesday, January 19, 2021 - 3:28:14 AM


Files produced by the author(s)


  • HAL Id : hal-02507499, version 2


Elena Dumitrescu, Sullivan Hué, Christophe Hurlin, Sessi Tokpavi. Machine Learning or Econometrics for Credit Scoring: Let's Get the Best of Both Worlds. 2020. ⟨hal-02507499v2⟩


