Meet new GiniMachine
Get a free 30-day trial

Machine Learning Model Evaluation: Gini Index | ROC-AUC | Kolmogorov – Smirnov Score

Say you have built your first machine-learning model and are ready to use it for your business. But how do you evaluate its quality and performance? How can you be sure it will make accurate predictions?

Let’s take a look at how to find answers to these questions in the GiniMachine interface.

But first…

Why Does Model Performance Matter?

The final model quality has a direct impact on the business decisions you make. Let’s say you want to prioritize your potential clients using a data-driven approach. You build a machine-learning model and use it to score incoming requests. This way, your sales managers reach out to the most valuable prospects first. If your model has high quality and predictive power, this process helps you optimize your workflow and save time and money. The higher the model performance is, the more trustworthy its predictions are.

What is Accuracy in Machine Learning?

Accuracy in machine learning is a metric used to evaluate algorithms that split data into two or more categories, also known as classes. Its complement, the error rate, equals 1 minus the accuracy.

For example, you build a model to tell good loan applications from bad ones. Such a model is a binary classifier, as it splits all the applications into two groups. The performance of this model is determined by how well it can guess which borrowers will repay on time and which won’t.

Accuracy is one of the simplest ways to measure model performance. Mathematically, it is the share of correct predictions among all predictions and is calculated by the formula:

Accuracy = Correct Predictions / Total Predictions

For binary classification, when you have only 2 categories, you can also express accuracy through the four cells of the confusion matrix, which shows exactly what kinds of mistakes the model makes:

Accuracy = (TP + TN) / (TP + TN + FP + FN)


TP – True Positives – correct predictions of trustworthy borrowers;

TN – True Negatives – correct predictions of loans that won’t be repaid;

FP – False Positives – mistakes where the model classified the loan as good, but in reality, it was not repaid;

FN – False Negatives – mistakes where the model classified the loan as bad, but in reality, the borrower was trustworthy.
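The confusion-matrix formula can be sketched in a few lines of Python. This is only an illustration: the function name `accuracy` and the counts in the example are made up for this article and are not taken from GiniMachine.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of correct predictions (TP + TN) among all predictions."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("the confusion matrix is empty")
    return (tp + tn) / total


# Hypothetical counts for 100 loan applications (illustrative only).
print(accuracy(tp=70, tn=15, fp=10, fn=5))  # 0.85
```

Here 85 of the 100 applications were classified correctly, so the accuracy is 0.85 and the error rate is 0.15.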

In GiniMachine, we use the chart representation of this matrix:

Density distribution by classes

Using this graph, you can check how well the model predicts the desired results and how often it makes mistakes. The blue curve represents all the positive records, and the gray one shows the negative records. As you can see in the example above, the bad cases had scores mainly between 0 and 0.2, while the good cases had scores from 0.7 to 0.9. However, you can also notice that some parts of these curves overlap. That means the model assigned high scores to some cases that weren’t reliable in reality and low scores to some trustworthy ones.

Evaluating model performance with accuracy is very common. However, this metric alone is usually not enough to be sure you can trust the model predictions. Many other measures are used to evaluate the predictive power of a model, for example:

  • Precision
  • Recall
  • AUC/ROC curve
  • F-score
  • Kolmogorov–Smirnov test
  • Gini Index, and more.

Today we won’t cover them all. Instead, we’ll look more closely at the ROC curve and AUC, the Gini Index, and the Kolmogorov–Smirnov score, as these metrics are used to evaluate binary classification algorithms. Moreover, you can find them in the GiniMachine interface.

ROC Curve & AUC

The Receiver Operating Characteristic curve is also known as the ROC curve. This plot shows how the True Positive Rate (TPR) depends on the False Positive Rate (FPR) as the score cutoff changes.

These rates are calculated using the formulas:

TPR = TP / (TP + FN)

FPR = FP / (TN + FP) 


TP – True Positives, 

TN – True Negatives,

FP – False Positives,

FN – False Negatives.
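As a rough sketch of how these two rates arise, the snippet below computes TPR and FPR for one cutoff. It assumes that label 1 marks a positive record, label 0 a negative one, and that every score at or above the cutoff is predicted positive. The function name `rates_at_threshold` and the sample data are hypothetical.

```python
def rates_at_threshold(labels, scores, threshold):
    """Return (TPR, FPR) when every score >= threshold is predicted positive."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    return tp / (tp + fn), fp / (fp + tn)


# Made-up example: four positive and four negative records.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.05]

for cutoff in (0.25, 0.5, 0.75):
    tpr, fpr = rates_at_threshold(labels, scores, cutoff)
    print(cutoff, tpr, fpr)
# 0.25 1.0 0.25
# 0.5 0.75 0.25
# 0.75 0.5 0.0
```

Each cutoff gives one (FPR, TPR) point. Sweeping the cutoff over all score values and connecting the points produces the ROC curve.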

AUC stands for the area under the ROC curve and can be anything between 0 and 1. The results are usually interpreted as follows:

  • AUC 0 – the model is wrong in 100% of cases: its predictions are exactly the opposite of reality;
  • AUC 0.5 – the model has no predictive power, it makes random predictions;
  • AUC 1.0 – the ideal model with the ideal measure of separability.

So if the model has an AUC above 0.5, it performs better than random guessing and has some potential for implementation. The closer this value is to 1, the better.
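One way to make these three reference values concrete is to compute AUC by hand. The sketch below does not integrate the curve. It uses an equivalent pairwise view instead: AUC is the share of (positive, negative) pairs in which the positive record gets the higher score, with ties counted as half. The function name `roc_auc` and the data are assumptions for illustration, and this is not necessarily how GiniMachine computes the value.

```python
def roc_auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]))  # 1.0, perfect separation
print(roc_auc([1, 1, 0, 0], [0.1, 0.2, 0.8, 0.9]))  # 0.0, exactly the opposite
print(roc_auc([1, 1, 0, 0], [0.5, 0.5, 0.5, 0.5]))  # 0.5, no predictive power
```

The nested loop is quadratic in the number of records, which is fine for a small illustration but too slow for large datasets.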

Working with GiniMachine, you don’t need to make all these calculations yourself. After the model is built, you can simply expand the model details and see the AUC and the chart:

roc-auc chart

The dotted diagonal line corresponds to an AUC of 0.5. If the chart lies exactly on this line, the model has the same predictive power as flipping a coin. If the ROC curve of your model is above the dotted line and bends toward the upper left corner, it can be considered a good model. For commercial implementation, we recommend using models with an AUC of at least 0.75.

Kolmogorov – Smirnov Score

Another way to evaluate the accuracy of the machine learning model is by running a Kolmogorov-Smirnov (K-S) test.

This test is more complicated but more powerful and in combination with the ROC curve allows us to evaluate model performance more precisely.

In a nutshell, after the model is built, we score the testing data and look at how the scores are distributed for the good and the bad cases. Then, we plot the cumulative distribution functions (CDFs) of the two groups and find the largest vertical distance between the two curves. This number, also called the K-S score, can be anything between 0 and 1:

Kolmogorov – Smirnov score

To interpret the results, it is enough to know that the higher this score is, the better the model separates the good cases from the bad ones.
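The calculation described above can be sketched as follows. The snippet assumes that label 1 marks a good case and label 0 a bad one. It builds the empirical CDF of the scores for each group and returns the largest gap between the two. The function name `ks_score` and the sample data are hypothetical and only illustrate the idea.

```python
def ks_score(labels, scores):
    """Largest distance between the score CDFs of the two classes."""
    good = [s for y, s in zip(labels, scores) if y == 1]
    bad = [s for y, s in zip(labels, scores) if y == 0]
    if not good or not bad:
        raise ValueError("both classes must be present")
    best = 0.0
    # The empirical CDFs only change at observed scores, so checking those is enough.
    for t in sorted(set(scores)):
        cdf_good = sum(1 for s in good if s <= t) / len(good)
        cdf_bad = sum(1 for s in bad if s <= t) / len(bad)
        best = max(best, abs(cdf_good - cdf_bad))
    return best


# Made-up example: four good and four bad records.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.05]
print(ks_score(labels, scores))  # 0.75
```

In this example, three of the four bad cases score 0.2 or lower while none of the good cases do, which gives the maximum gap of 0.75.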

When you evaluate the predictive power of a model built by GiniMachine, it’s important to analyze the scores together rather than one at a time. For example, if you see a ROC AUC score of 0.7, you may think the model is good enough. But at the same time, you can get a K-S score of 0.21, which points to a problem with the dataset.

By analyzing these three graphs in the GiniMachine interface together, you can get a deep understanding of the model quality and predictive power. You can also determine where to set the cutoff to minimize risks for your business when you implement decision-making automation.

Gini Index

Last but not least, the metric we want to mention is the Gini Index. This index was originally used only in economics, but it turned out to be closely connected with the ROC AUC score: Gini = 2 × AUC − 1. The Gini Index can be anything between -1 and 1, and a Gini of 0 corresponds to a ROC AUC of 0.5, which makes it easier to understand. For commercial purposes, it is recommended to use models with a Gini Index of 0.6 or higher, which corresponds to a ROC AUC of 0.8 or higher.

Due to the ease of interpretation, this index is currently the main parameter for determining the model quality.
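The link between the two metrics is the standard relation Gini = 2 × AUC − 1. A minimal sketch of the conversion in both directions is shown below. The function names `gini_from_auc` and `auc_from_gini` are made up for this example.

```python
def gini_from_auc(auc: float) -> float:
    """Convert a ROC AUC value (0..1) to a Gini Index (-1..1)."""
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must be between 0 and 1")
    return 2.0 * auc - 1.0


def auc_from_gini(gini: float) -> float:
    """Convert a Gini Index (-1..1) back to a ROC AUC value (0..1)."""
    if not -1.0 <= gini <= 1.0:
        raise ValueError("Gini must be between -1 and 1")
    return (gini + 1.0) / 2.0


# Values are rounded to hide floating-point noise.
for auc in (0.0, 0.5, 0.75, 0.8, 1.0):
    print(auc, round(gini_from_auc(auc), 3))
# 0.0 -1.0
# 0.5 0.0
# 0.75 0.5
# 0.8 0.6
# 1.0 1.0
```

A random model sits at a Gini of 0 and a perfect one at 1, so the index reads as a score on a scale where zero means no predictive power.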

The Accuracy Paradox

When talking about accuracy scores, it’s important to mention the accuracy paradox. Even though the indexes we use to evaluate ML model performance can give a general idea of model prediction power, they can also be misleading. This happens when we use imbalanced data for building a model. 

In the ideal case, you should use a balanced dataset that has an equal number of positive and negative records. Unfortunately, it’s rarely possible to get enough data to compose such a dataset, especially since, in some cases, the imbalance is quite natural. For example, if you try to guess which transactions look suspicious, you end up with thousands of transactions, and only a few of them are really dangerous. A model that marks every transaction as safe would then be right almost every time and still miss every dangerous one.
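The paradox is easy to reproduce with a toy example. The sketch below assumes a made-up dataset of 1,000 transactions, of which 10 are suspicious (label 1), and a trivial "model" that marks everything as safe (label 0). The function name `evaluate` and all the numbers are hypothetical.

```python
def evaluate(labels, predictions):
    """Return (accuracy, share of suspicious cases that were caught)."""
    pairs = list(zip(labels, predictions))
    correct = sum(1 for y, p in pairs if y == p)
    caught = sum(1 for y, p in pairs if y == 1 and p == 1)
    suspicious = sum(1 for y in labels if y == 1)
    return correct / len(labels), caught / suspicious


# 10 suspicious transactions among 1,000 (illustrative numbers only).
labels = [1] * 10 + [0] * 990
always_safe = [0] * 1000  # a "model" that never flags anything

acc, caught_share = evaluate(labels, always_safe)
print(acc)           # 0.99
print(caught_share)  # 0.0
```

The accuracy of 0.99 looks excellent, yet the model catches none of the suspicious transactions. This is why accuracy should be read together with metrics such as TPR, AUC, or the K-S score when the classes are imbalanced.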

Even though collecting more data is still the most efficient way to deal with an imbalanced dataset, there is something you can do. There are systems (and GiniMachine is one of them) that can build reliable models even with low data heterogeneity. So if collecting more data is time-consuming and expensive for your business, you can start building ML models with the information you already have using these advanced platforms.

Today we only briefly touched on the topic of model efficiency, but this knowledge will already be enough for you to build your first model using GiniMachine. The system will do all the calculations, and provide you with all the charts, so you can easily and quickly evaluate the results thanks to the user-friendly interface. Sign in and start a free trial to build your first model with GiniMachine.
