Credit decision support based on real set of cash loans using integrated machine learning algorithms
Article - peer-reviewed publication
Abstract
en
One of the important research problems for financial institutions is the assessment of credit
risk and the decision whether to grant or refuse a loan. Recently, machine-learning-based methods
have been increasingly employed to solve such problems. However, selecting an appropriate feature
selection technique, sampling mechanism, and/or classifier for credit decision support is very
challenging and can affect the quality of the loan recommendations. To address this challenge, this
article examines the effectiveness of various data science techniques for credit decision support.
In particular, a processing pipeline was designed that consists of methods for data resampling,
feature discretization, feature selection, and binary classification. We suggest building
appropriate decision models that leverage pertinent methods for each of these steps. The feasibility
of the selected models was analyzed through rigorous experiments on real data describing clients’
ability to repay loans. During the experiments, we analyzed the impact of feature selection on the
results of binary classification, and the impact of data resampling with feature discretization on
the results of feature selection and binary classification. After experimental evaluation, we found
that the correlation-based feature selection technique and the random forest classifier yield the
best performance on the underlying problem.
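The preprocessing stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not say which resampling or discretization methods were used, so random oversampling and equal-width binning are assumptions here. The correlation ranking is a simplified stand-in for correlation-based feature selection, the final classifier (for example a random forest) is omitted, and the toy rows are synthetic. All function names are hypothetical.

```python
import random
from math import sqrt


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until both classes have equal size.

    Assumes binary labels 0/1 with at least one row of each class.
    """
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    minority = min(by_class, key=lambda c: len(by_class[c]))
    majority = 1 - minority
    deficit = len(by_class[majority]) - len(by_class[minority])
    extra = [rng.choice(by_class[minority]) for _ in range(deficit)]
    out_rows = by_class[majority] + by_class[minority] + extra
    out_labels = ([majority] * len(by_class[majority])
                  + [minority] * (len(by_class[minority]) + deficit))
    return out_rows, out_labels


def discretize_equal_width(column, bins=3):
    """Map numeric values to bin indices 0..bins-1 using equal-width intervals."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0] * len(column)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in column]


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def rank_features(rows, labels):
    """Feature indices sorted by absolute correlation with the label, descending."""
    n_features = len(rows[0])
    scores = [abs(pearson([r[j] for r in rows], labels)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def prepare(rows, labels, bins=3, top_k=1, seed=0):
    """Resample, discretize every feature, then keep the top_k ranked features."""
    rows, labels = oversample_minority(rows, labels, seed)
    columns = [discretize_equal_width([r[j] for r in rows], bins)
               for j in range(len(rows[0]))]
    rows = [list(values) for values in zip(*columns)]
    kept = rank_features(rows, labels)[:top_k]
    return [[r[j] for j in kept] for r in rows], labels, kept


# Synthetic toy data: column 0 separates the classes, column 1 is noise.
toy_rows = [[1000, 5], [1200, 1], [1100, 9], [3000, 4], [900, 6], [1300, 2]]
toy_labels = [0, 0, 0, 1, 0, 0]
selected, balanced_labels, kept = prepare(toy_rows, toy_labels)
```

The selected, discretized features in `selected` would then be passed to a binary classifier. Applying resampling before discretization and selection mirrors the abstract's interest in how those two steps affect feature selection, but the ordering used here is itself an assumption.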