Wynn Tee for Quantitative Risk Solutions PLT

Posted on Nov 15, 2021 • Edited on Jan 20, 2022 • Originally published at addin.qrstoolbox.com

How to calculate accuracy ratio in Excel using only a formula

#excel #risk #statistics #datascience

Risk practitioners often use accuracy ratio (AR) to measure the discriminatory power of binary classification models, such as models of credit default and insurance fraud. The closer AR is to 1, the higher the discriminatory power of the model.

Definition

The above diagram shows the cumulative accuracy profiles of a realistic model, a random model, and a perfect model.

As the proportion of observations increases, a perfect model would correctly classify all events before all non-events, but a random model would indiscriminately classify events and non-events together. A realistic model would be somewhere in between.

AR is the ratio of the area between the cumulative accuracy profiles of the realistic and random models (B) to the area between the cumulative accuracy profiles of the perfect and random models (A+B).

Unfortunately, Excel does not come with a native function for calculating AR. It is possible to calculate AR in Excel manually, but the process involves auxiliary rows and columns with complicated formulas that have to be adjusted as observations are added or removed. The problem is exacerbated with the standard error of AR.

QRS.DISC.AR

Fortunately, QRS Toolbox for Excel includes the QRS.DISC.AR function for calculating AR. It is applicable to both grouped and ungrouped data.

To try QRS.DISC.AR yourself, first add QRS Toolbox to your instance of Excel and then open the example workbook.

The workbook contains 2 worksheets. In the UNGROUPED worksheet:

Cells A2–A2001 contain credit scores for 2000 borrowers. The scores range between 0 for least creditworthy and 100 for most creditworthy.
Cells B2–B2001 contain ones if credit default occurred and zeros otherwise.

In the GROUPED worksheet, cells A2–E8 contain data from the previous worksheet grouped into 7 score ranges, each with an alphabetical rating grade.

Ungrouped data example

=QRS.DISC.AR(A2:A2001, B2:B2001)

To calculate AR of the ungrouped data, open the UNGROUPED worksheet and enter the formula =QRS.DISC.AR(A2:A2001, B2:B2001) in cell D1. The result is -0.794, which is generally considered to be a large AR in absolute terms.

The result is negative-valued, because the credit scores and credit default events in this example are negatively correlated by design. In a perfect model, score=0 corresponds to event=1, and score=100 corresponds to event=0.

Significance test

=QRS.DISC.AR(A2:A2001, B2:B2001, "TEST", "RAG")

To determine the statistical significance of the AR, add "TEST", "RAG" to the formula. The result now contains a second row with a red/amber/green rating that summarizes the significance test.

The AR in this example has a green rating. A green/amber rating means the AR is significant at the 5%/10% significance level. A red rating means the AR is not significant at the 10% significance level.

Please read the documentation to learn how to return the p-value and other useful information about the significance test, as well as how to change the significance levels of the ratings.

Labels and transpose

=QRS.DISC.AR(A2:A2001, B2:B2001, "TEST", "RAG", "LABELS", TRUE)

To add labels to the result, add "LABELS", TRUE to the formula. To swap the rows and columns of the result, add "TRANSPOSE", TRUE to the formula.

Grouped data example

=QRS.DISC.AR(B2:B8, D2:E8)

To calculate AR of the grouped data, switch to the GROUPED worksheet and enter the formula =QRS.DISC.AR(B2:B8, D2:E8) in cell F9. The result is -0.797, which is similar to the AR of the ungrouped data.

The TEST, LABELS, and TRANSPOSE options can be used as before.

Final remarks

If you find QRS.DISC.AR useful, please share this page with other potential users.

DEV Community