Welcome back to the AI Bayeslab Statistics series.
Building on likelihood, the report criterion, and d' (d prime), let's construct the "ROC Curve" decision-making space. Everything we discussed previously concerns the parameters of signal detection theory, so we can now visualize those parameters as an ROC curve. In other words, our focus remains on applying signal detection theory to performance evaluation.
So today we will explore the ROC curve. Following our custom, we begin by summarizing the key points; the primary definition to highlight is the ROC curve itself.
1. ROC Curve Explained
1) ROC Curve Graph

Caption: An ROC curve for d' = 3, with the X-axis showing the False Alarm Rate and the Y-axis showing the Hit Rate.
2) What Is the ROC Curve?
ROC Curve definition:
The ROC curve, or receiver operating characteristic curve, plots the probability of a true positive (probability of a Hit) against the probability of a false positive (probability of a false alarm) as the decision threshold varies for the same stimulus.
The ROC curve is also known as the sensitivity (isosensitivity) curve: every point on the same curve reflects the same sensitivity, responding to the same stimulus under different thresholds.
First, let's review detection theory's assumption that its two metrics are independent:
(1) Variations in β, which reflects the subject's response bias, occur independently of d', which measures perceptual sensitivity.
This independence lets us trace how changes in β shift the probability of a hit and the probability of a false alarm, which is exactly what an ROC curve plots.
(2) When P(H) equals P(FA), there is no objective discriminability (d' = 0).
Let's revisit the example from the previous article, fixing d' at a constant value of 3. With d' = 3, choosing different thresholds (C) traces out an ROC curve on which every point has the same sensitivity, even though the judgment criteria differ.
It's important to note that the previous article aimed to optimize bonus allocation (minimize cost) by adjusting the threshold C. There, C was defined directly in terms of performance scores (e.g., C = 70 or 90 points), based on the following distributions:
High performers (signal): Mean = 85 (σ = 5)
Average performers (noise): Mean = 65 (σ = 8)
The corresponding error-rate formulas are:
$$ \text{Type I error (false alarm)} = 1 - \Phi\left(\frac{C - 65}{8}\right) $$
$$ \text{Type II error (miss)} = \Phi\left(\frac{C - 85}{5}\right) $$
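As a quick numeric check, here is a minimal sketch (assuming Python with scipy installed; C = 70 is an arbitrary sample threshold, and the distribution parameters are the ones above):

```python
# Quick numeric check of the two error rates at a sample threshold C = 70.
from scipy.stats import norm

C = 70
type1 = 1 - norm.cdf((C - 65) / 8)   # false alarm: an average employee scores >= C
type2 = norm.cdf((C - 85) / 5)       # miss: a high performer scores < C
print(f"Type I (false alarm) = {type1:.3f}")  # ~0.266
print(f"Type II (miss)       = {type2:.3f}")  # ~0.001
```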
In other words, the threshold C in the previous article was expressed in score units. To plot an ROC curve here, those score thresholds must be converted, using the distribution parameters, into the standardized criterion of signal detection theory.
In signal detection theory, when plotting an ROC curve, sensitivity is held fixed (here d' = 3) to illustrate how β and C vary with differing prior probabilities. In this context, C denotes a standardized decision criterion, measured in d' units, given by:
$$ C = -\frac{\ln(\beta)}{d'} $$
At this stage, C is no longer tied to performance scores; it is purely a signal detection theory (SDT) parameter. We can compare the tables before and after the conversion:
Pre-conversion Table: Policy Simulation Effects (d' = 3.0)

Post-conversion Table: Policy Simulation Effects (d' = 3.0)
Threshold (C) | Z_N (Noise) | Z_SN (Signal) | P(Hit) | P(FA) | C_sdt | β (O(SN)/O(N)) | Policy Type |
--- | --- | --- | --- | --- | --- | --- | --- |
55 | -1.25 | -6.00 | 1.000 | 0.894 | -3.63 | 69.63 | Extremely Lenient (High α Error) |
60 | -0.63 | -5.00 | 1.000 | 0.736 | -2.82 | 20.42 | Very Lenient |
65 | 0.00 | -4.00 | 1.000 | 0.500 | -2.00 | 7.39 | Lenient |
70 | 0.63 | -3.00 | 0.999 | 0.264 | -1.19 | 2.72 | Slightly Lenient |
75 | 1.25 | -2.00 | 0.977 | 0.106 | -0.38 | 1.44 | Neutral |
80 | 1.88 | -1.00 | 0.841 | 0.030 | 0.44 | 0.69 | Slightly Strict |
85 | 2.50 | 0.00 | 0.500 | 0.006 | 1.25 | 0.27 | Strict |
90 | 3.13 | 1.00 | 0.159 | 0.001 | 2.06 | 0.10 | Very Strict |
Note: To make the ROC curve easier to plot, several additional points were generated automatically. Together, these data points form a standard ROC curve for d' = 3.
3) Visualization: Plotting the ROC Curve in Python with an AI Agent
We again use Bayeslab's AI block and direct prompts to have the AI agent generate the necessary Python code. To make the chart easier to read, we labeled the curve directly by the leniency of the decision criterion, running from the bottom left to the top right. Below is an initial outline of the shape, followed by a stand-alone sketch you can run yourself.

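Since the article's chart comes from Bayeslab's AI block, here is a minimal stand-alone sketch (assuming numpy, scipy, and matplotlib are installed) that reproduces the same shape directly from the score distributions above:

```python
# Stand-alone sketch of the d' = 3 ROC curve from the article's score distributions.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

mu_n, sd_n = 65, 8    # average performers (noise)
mu_s, sd_s = 85, 5    # high performers (signal)

# Sweep the score threshold C from very lenient to very strict
C = np.linspace(40, 110, 400)
p_fa = 1 - norm.cdf((C - mu_n) / sd_n)   # false alarm rate at each threshold
p_hit = 1 - norm.cdf((C - mu_s) / sd_s)  # hit rate at each threshold

plt.plot(p_fa, p_hit, label="ROC (d' = 3)")
plt.plot([0, 1], [0, 1], "--", color="gray", label="chance (d' = 0)")
plt.xlabel("False Alarm Rate P(FA)")
plt.ylabel("Hit Rate P(Hit)")
plt.legend()
plt.show()
```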
4) ROC Curve Analysis
The ROC chart is constructed and read as follows:
Use the false alarm rate (α, the Type I error rate) as the X-axis and the hit rate (1 − β, where β here denotes the Type II error rate, not the likelihood ratio) as the Y-axis, plotting points corresponding to different criteria (C).
Strict policy → Close to the bottom left (low α Error, low hit rate)
Lenient policy → Close to the top right (high α Error, high hit rate)
In this context, the slope of the curve at each point corresponds to a likelihood ratio (a β value). Observing the ROC curve reveals that:
When the criterion is extremely lenient, P(H) and P(FA) are both close to 1 (the top-right corner).
When the criterion is extremely strict, P(H) and P(FA) are both close to 0 (the bottom-left corner).
As the criterion sweeps from one extreme to the other, the operating point traces out the complete ROC curve.
At a specific β value ($$β_{opt}$$), the curve reaches its optimal operating point, achieving maximum benefit (the red point marked in the chart below).
$$β_{opt}$$: the optimal decision criterion.

(1) Key Calculation Descriptions
Z-score computation (considering performance distribution):
Average employees (noise): Mean = 65, σ = 8 →$$z_N = \frac{C - 65}{8} $$
High performance (signal): Mean = 85, σ = 5 → $$ z_{SN} = \frac{C - 85}{5}$$
Signal Detection Criterion ($$C_{sdt}$$):
$$C_{sdt} = \frac{z_N + z_{SN}}{2}$$
Hit Rate (P(Hit)) and False Alarm Rate (P(FA)):
$$P(Hit) = 1 - \Phi(z_{SN})$$
$$P(FA) = 1 - \Phi(z_N)$$
(where $$\Phi$$ denotes the standard normal cumulative distribution function)
Likelihood Ratio (β):
$$ \beta = e^{-d' \cdot C_{sdt}}$$ (with d' = 3)
Response Bias Classification:
β > 1: Lenient criterion (observations are more readily identified as signal)
β = 1: Neutral criterion
β < 1: Strict criterion (observations are more readily classified as noise)
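As a concrete check, here is a minimal sketch (assuming Python with scipy) that reproduces the hit and false-alarm figures in the C = 75 row of the post-conversion table from the formulas above:

```python
# Reproducing the C = 75 row of the post-conversion table.
from scipy.stats import norm

C = 75
z_n = (C - 65) / 8           # z-score under the noise distribution  -> 1.25
z_sn = (C - 85) / 5          # z-score under the signal distribution -> -2.00
p_hit = 1 - norm.cdf(z_sn)   # hit rate          -> ~0.977
p_fa = 1 - norm.cdf(z_n)     # false alarm rate  -> ~0.106
c_sdt = (z_n + z_sn) / 2     # standardized criterion -> ~ -0.38
print(f"z_N={z_n:.2f}, z_SN={z_sn:.2f}, P(Hit)={p_hit:.3f}, "
      f"P(FA)={p_fa:.3f}, C_sdt={c_sdt:.2f}")
```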
(2) Characteristics of the ROC Curve
Curve Shape: With the False Alarm Rate on the x-axis and the Hit Rate on the y-axis, points such as (0.006, 0.500) and (0.106, 0.977) form a convex curve, with AUC ≈ 0.98 (the theoretical value for d' = 3).
Equal Discriminability: All points share d' = 3, indicating constant sensitivity; only β (equivalently, C_sdt) varies.
β and Curve Slope: The slope of the tangent at each point on the curve equals the likelihood ratio at the corresponding criterion.
Note: AUC stands for "Area Under the Curve." It is the measure of the ability of a classifier to distinguish between classes and is used to summarize the ROC curve's performance.
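For reference, under an equal-variance Gaussian model the AUC follows directly from d'; since the two distributions here have unequal σ (8 vs. 5), this closed form is an approximation:

$$ \mathrm{AUC} = \Phi\!\left(\frac{d'}{\sqrt{2}}\right), \qquad \Phi\!\left(\frac{3}{\sqrt{2}}\right) \approx \Phi(2.12) \approx 0.983 $$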
2. How to Apply SDT in Performance Evaluation
1) From Decision Matrix to ROC Curve
Based on the previous article and the first half of this discussion, we have seen how to derive a Payoff Matrix and then plot the corresponding ROC Curve.

Constructing ROC Space: Plot points corresponding to different criteria (C) with the false alarm rate (α) on the x-axis and the hit rate (1 − β, with β here the miss rate) on the y-axis.
Strict policy → Points cluster near the bottom-left (low α, low hit rate).
Lenient policy → Points cluster near the top-right (high α, high hit rate).
Managerial Implications:
Curve shape reflects d'→ Higher discriminability (larger d') results in a more convex curve toward the top-left.
Optimal threshold selection → Determine the tangent point based on the company's cost-benefit ratio (a numeric sketch follows below).
Early-stage companies might choose α=15%, β=10% (rapid expansion phase).
Mature companies might choose α=5%, β=30% (lean management phase).

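The tangent-point idea can be made concrete. Below is a minimal sketch of the standard SDT payoff-weighted formula for the optimal likelihood ratio; the base rate and all payoff numbers are hypothetical placeholders, not values from the article:

```python
# Hypothetical sketch: payoff-weighted optimal criterion (all numbers illustrative).
from math import log

p_signal = 0.20                 # assumed share of high performers (hypothetical)
p_noise = 1 - p_signal
V_hit, C_miss = 10, 5           # payoff of a hit; cost of a miss (hypothetical units)
V_cr, C_fa = 2, 8               # payoff of a correct rejection; cost of a false alarm

# Standard SDT optimum: the ROC slope at the best operating point
beta_opt = (p_noise / p_signal) * (V_cr + C_fa) / (V_hit + C_miss)

# Convert to a standardized criterion via the article's relation C = -ln(beta) / d'
d_prime = 3
c_sdt_opt = -log(beta_opt) / d_prime
print(f"beta_opt = {beta_opt:.2f}, C_sdt at tangent point = {c_sdt_opt:.2f}")
```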
2) Three-Dimensional Optimization Strategy
Below, we detail the three-step optimization rule, which defines the priority of decision-making strategies:
Improve d' (The Fundamental Solution)
Develop a more precise evaluation system (e.g., OKR + 360-degree feedback).
Extend the evaluation period (to reduce noise from performance fluctuations).
Why is d' the top priority?
Think about this: Waking up after 8 hours of sleep leaves you feeling refreshed and clear-headed. How much more effectively can you read a report in this state compared to when you drag yourself out of bed after only 2 hours, feeling groggy and unfocused? The difference is clear.
Similarly, when a company's daily management communication is unclear, the first step should always be:
Clarifying communication,
Streamlining processes,
Optimizing SOPs (Standard Operating Procedures),
Setting clear standards, and
Conducting regular reviews.
Thus, optimizing d' always comes first.
The following is a comparison graph of ROC curves under different d' values, generated by the Bayeslab AI module. As shown in the figure, each curve is marked with its optimal profit-loss balance point (the red dot), whose position shifts as d' changes:
Key Findings:
When d' increases, the curve shifts toward the upper-left, demonstrating:
✅ Significantly higher Hit Rate — More accurate identification of high-performing employees
✅ Synchronously lower False Alarm Rate — Reduced misjudgment of underperformers
✅ Clearer cost-benefit boundary — The optimal threshold (red dot) moves toward higher efficiency at lower cost
Why This Matters:
This reflects true cost reduction and efficiency improvement, achieving higher decision accuracy (hit rate) while minimizing resource waste (false alarms). Only by fundamentally enhancing d' (discriminability) can organizations break free from the trade-off dilemma of "either increasing rewards or tightening thresholds."

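Since the figure above comes from the Bayeslab AI module, here is a minimal stand-alone sketch (assuming numpy, scipy, and matplotlib, and an equal-variance parametrization for simplicity) that reproduces the same family of curves:

```python
# ROC curves for several d' values under an equal-variance Gaussian model.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

c = np.linspace(-5, 5, 400)                  # sweep of the decision criterion
for d_prime in [0.5, 1.0, 2.0, 3.0]:
    p_fa = 1 - norm.cdf(c)                   # noise  ~ N(0, 1)
    p_hit = 1 - norm.cdf(c - d_prime)        # signal ~ N(d', 1)
    plt.plot(p_fa, p_hit, label=f"d' = {d_prime}")
plt.plot([0, 1], [0, 1], "--", color="gray")  # chance line
plt.xlabel("False Alarm Rate")
plt.ylabel("Hit Rate")
plt.legend()
plt.show()
```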
Dynamically Adjust C (the Threshold):
(1) Key Variables:
Budget: The total amount currently available in the bonus pool.
employee_scores: A list of performance scores for all employees (e.g., [85, 72, 90, ...]).
bonus_unit: The unit cost of the bonus per person (a constant, e.g., 50,000 yuan/person).
c: The candidate performance threshold (C), gradually lowered from 100 to 60.
(2) Dynamic Adjustment Logic:
Test thresholds from strict to lenient:
Start with a high threshold (e.g., C = 100), rewarding only the top few outstanding employees.
While total expenditure stays within the budget, keep lowering the threshold (C = 99, 98, ...); stop at the most lenient threshold that still satisfies the budget.
Example:
If the budget is 5 million yuan and the bonus unit is 50,000 yuan, at most 100 people can be rewarded.
If C = 80 qualifies 120 employees (total expenditure = 6 million yuan > 5 million yuan), stop and keep C = 81, the most lenient threshold at which qualifying employees × 50,000 yuan ≤ 5 million yuan.
Fallback Threshold:
The sweep never drops below a floor of C = 60; if no stricter threshold fits the budget, C = 60 is used as the final default, ensuring at least some employees can receive a bonus (see the sketch below).
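Here is a minimal sketch of the descending-threshold sweep just described. Variable names follow the article's list above; the sample score data is illustrative:

```python
# Descending threshold sweep: keep lowering C while the budget is satisfied.
def pick_threshold(budget, employee_scores, bonus_unit):
    """Lower C from 100 toward the floor of 60, keeping the most lenient
    threshold whose total bonus cost still fits within the budget."""
    best = None
    for c in range(100, 59, -1):
        cost = bonus_unit * sum(1 for s in employee_scores if s >= c)
        if cost <= budget:
            best = c      # feasible: remember it, then try a more lenient C
        else:
            break         # lowering C further only increases cost
    return best if best is not None else 60   # fallback floor per the article

budget = 5_000_000        # bonus pool (yuan)
bonus_unit = 50_000       # bonus per rewarded employee (yuan)
employee_scores = [85, 72, 90, 66, 78, 95, 81]   # toy data
print(f"Chosen threshold C = {pick_threshold(budget, employee_scores, bonus_unit)}")
```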
Tiered Bonus Design:
Score Range | Bonus Multiplier | Theoretical Distribution (%) |
--- | --- | --- |
≥ 90 | 2.0x | Top Performers (16%) |
80–89 | 1.2x | High Performers (68%) |
70–79 | 0.6x | Average Performers (27%) |
(Note: Percentages may not sum to 100% due to rounding or overlapping criteria.)
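A tiny helper applying the tiers above (treating scores below 70 as earning no tiered bonus, which is an assumption the table does not state explicitly):

```python
def bonus_multiplier(score):
    # Tiers from the table above; the 0.0 case for scores below 70 is an assumption.
    if score >= 90:
        return 2.0
    if score >= 80:
        return 1.2
    if score >= 70:
        return 0.6
    return 0.0

base_bonus = 50_000
print(bonus_multiplier(92) * base_bonus)   # -> 100000.0
```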
Summary: A Quantitative Perspective on Management Decision-Making
When d' cannot be improved: The ROC curve determines the lower bound of error rates.
Policy iteration path:
First, improve d' (optimize the evaluation system).
Next, optimize C (select a point on the ROC curve).
Finally, design the incentive structure (tiered bonuses/non-monetary compensation).
This case study illustrates the complete management-science pipeline: from the core signal detection parameters (d' and C), to the decision space (the ROC curve), and finally to actionable policies.
Stay tuned and subscribe to Bayeslab, and master the wisdom of statistics at low cost with the AI Agent online tool.