The Metrics

The Good, the Bad, and the Ugly

2025-11-06

Introduction

  • Understand how performance was evaluated
    • What can be fixed
    • What can’t be fixed
    • Missing pieces
  • Identify possible improvements
  • Prepare for future evaluations

Outline

  1. Background

  2. Data and Analysis (& Flaws)

    1. Distributions
    2. Inference
    3. Observational Units
    4. Ratios
    5. Comparisons
  3. Conclusion

Background

Budget Cuts with Quantitative Data

Reasonable strategy … so where did things go wrong?

Each additional FTE raises the average … significantly

Highly ranked departments are slated for elimination!

Choice of Metrics

  • Evaluate Teaching performance

  • Evaluate Research performance

Rolled into metrics:

  • AAU membership criteria

  • Teaching Expenses

  • Revenue

  • Service

Fac Senate & FS Exec Cmte

  • March - Prog Eval Rubric will be from CCPE

  • April - Broad outline of metrics (the devil is in the details!)

  • May - Timeline, draft metrics presented to DEOs/Deans

  • June - Degree cuts likely, ELT plan due June 30

    • Interim EVC: Commitment to transparency, treating all depts equally
    • FS Exec: “gatekeeping” concerns - DEOs not able to share info w/ faculty
  • July - APC officers notified

  • August - APC meets

    • VC Davis - metric data is proprietary/confidential

Checking the Claims

  • Transparent Process

  • Departments were consulted before plan was announced

  • All departments were treated equally

Can’t see/fix the data \(\Rightarrow\) not transparent

Department POV

  • May 15 - Academic Program Metrics discussion

    • Dept Exec Officers (DEOs), Deans, ELT
    • Marked confidential – cannot be shared with faculty
  • August 15 - Metric values provided to Stat DEO (Statistics metrics spreadsheet)

  • August 18 - Metric definitions provided to Stat Department Faculty

  • Sept 11 - Affected units notified by EVC Button or Interim VC Heng-Moss

  • Shortly afterward - Full metrics spreadsheet begins circulating unofficially

  • Oct 1-10 - APC hearings

Timeline

  • \(\approx 140\) days where ELT could have talked to a Statistician
  • Within 29 days, we found a huge number of problems
  • ELT wanted us to have 0 days to examine the “full” metrics data

Checking the Claims

  • Transparent Process

  • Departments were consulted before plan was announced

  • All departments were treated equally

Can’t see/fix the data \(\Rightarrow\) not transparent

???

Data and Analysis Flaws

                                  🍓                  
                                  🍊        🍑        
    🫐              🍓  🍓  🍓🍓🍊🍊  🍓    🍑🍑                          🍉
    🫐🫐🍇🍇        🍓🍓🍓🍓🍊🍊🍊🍊🍊🍊  🍑🍑🍑🍑🍏                    🍉🍉🍉
  🫐🫐🍇🍇🍇  🍓🍓🍓🍓🍓🍊🍊🍊🍊🍊🍊🍊🍊🍑🍑🍑🍑🍏🍏🍏🍏  🍍🍍      🍉🍉🍉🍉🍉🍉
🫐🫐🍇🍇🍇🍓🍇🍓🍓🍊🍊🍊🍊🍊🍊🍊🍊🍊🍊🍑🍑🍑🍏🍏🍏🍏🍏🍆🍉🍉🍍🍍🍉🍉🍉🍉🍉🍉🍉🍉

Observational Units

Units

Observational unit:
the entity from which data is collected in a study

Analysis unit:
the entity that is analyzed in a study (the level at which conclusions are drawn)

Mapping between these can be tricky
but it is critical to get it right!!

Observation vs. Analysis

  • Instructional metrics: typically observed at the class or instructor level, analyzed by department

  • Research metrics: observed at the faculty level, analyzed by department

Individuals are mapped to departments through a complicated apportionment process (a sketch of the idea follows)

But the individuals in a department change over time!
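A minimal sketch of that rollup, with entirely hypothetical names, apportionment fractions, and citation counts; the real apportionment rules are only described as “complicated” here, so this is illustration only:

```python
import pandas as pd

# Hypothetical observational units: one row per faculty member per appointment.
# The apportionment fraction says how much of each person "counts" toward a department.
faculty = pd.DataFrame({
    "name":          ["A", "A", "B", "C"],
    "department":    ["Statistics", "Mathematics", "Statistics", "Statistics"],
    "apportionment": [0.49, 0.51, 1.00, 1.00],
    "citations":     [120, 120, 80, 40],  # observed per person, repeated on each appointment row
})

# Analysis units: departments. Headcount and citations are apportioned sums, so
# person A contributes 0.49 FTE and 0.49 * 120 citations to Statistics.
dept = (
    faculty.assign(wtd_citations=faculty["citations"] * faculty["apportionment"])
           .groupby("department")
           .agg(fte=("apportionment", "sum"), citations=("wtd_citations", "sum"))
)
print(dept)
# Any change in who is apportioned where (hires, departures, joint appointments)
# changes the department totals even if no individual's record changes.
```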

Department Composition


  • Still need to adjust for time since degree

Distributions

What’s in a Distribution?

Distribution of heights in an all-female class.
  • Statistical definition: A function that gives probabilities of occurrence for possible events

What’s in a Distribution?

Operationally, to create distributions, we need:

  • Comparable observational units (individuals)
  • Consistent measurement method
  • Reason to think evaluating the group of individuals makes sense

What’s in a Distribution?

  • Mixture distributions can be useful in some situations… IF the goal is to understand the whole system

  • They’re less helpful if the goal is ranking groups (see the sketch below)
    🫐 < 🍇 < 🍓 < 🍊 < 🍑 < 🍏 < 🍆 < 🍍 < 🍉
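A minimal simulation of that point, with made-up group means standing in for the fruit groups; nothing here is UNL data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical disciplines with very different scales (think citation cultures),
# but identical *relative* performance within each group.
group_means = {"blueberry": 5, "grape": 20, "orange": 80, "melon": 400}
scores = {g: rng.normal(mu, 0.1 * mu, size=50) for g, mu in group_means.items()}

# Pool everything and compute one set of z-scores, as if it were a single distribution.
pooled = np.concatenate(list(scores.values()))
z = {g: (s - pooled.mean()) / pooled.std() for g, s in scores.items()}

for g in group_means:
    print(f"{g:>9}: mean z = {z[g].mean():+.2f}")
# The ranking of groups is driven almost entirely by the group means: every
# "blueberry" sits below the pooled mean and every "melon" above it,
# no matter how strong an individual is within its own discipline.
```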

What’s in a Distribution?

Our actual mixture distribution is more like this:

What’s in a Distribution?

If you have to work with a mixture distribution, you must account for lurking variables:

  • instructional norms
    (lab/classroom space, pedagogy, instruction method)
  • amount of grant funding that’s available in the discipline
  • citation accumulation rate… and many, many more

Statistics can be used to adjust for these factors; failing to account for structural differences invalidates the conclusions
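One standard way to “adjust for these factors” is to residualize the metric on a structural variable before comparing units. A minimal sketch with fabricated numbers, using discipline-level funding availability as the lurking variable:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical: each department's raw grant dollars are driven mostly by how much
# funding exists in its discipline, plus a smaller department-specific effect.
funding_available = rng.uniform(1, 100, size=40)   # lurking structural variable
dept_effect = rng.normal(0, 5, size=40)            # what we actually want to measure
raw_metric = 3.0 * funding_available + dept_effect

# Regress the metric on the structural variable and keep the residuals.
X = np.column_stack([np.ones_like(funding_available), funding_available])
beta, *_ = np.linalg.lstsq(X, raw_metric, rcond=None)
adjusted = raw_metric - X @ beta

# The raw metric ranks departments by discipline funding; the adjusted metric
# recovers the department-specific signal.
print(np.corrcoef(raw_metric, funding_available)[0, 1])  # ~ 1.0
print(np.corrcoef(adjusted, dept_effect)[0, 1])          # close to 1.0
```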

What is in a Z score?

\(Z\)-scores can be used in two different ways:

  1. make metrics on different scales comparable

  2. make observations from different distributions (more) comparable
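Both uses start from the same definition, where \(\mu_X\) and \(\sigma_X\) are the mean and standard deviation of the distribution that \(X\) is assumed to come from:

\[ Z = \frac{X - \mu_X}{\sigma_X} \]

Everything below hinges on which distribution supplies \(\mu_X\) and \(\sigma_X\).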

What is in a Z score?

  • Standardization makes metrics on different scales comparable

  • Standardization helps with apples to pumpkins comparisons

  • Standardization within different groups (see the sketch below)
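A minimal sketch of the difference, with fabricated citation counts for two hypothetical fields; pooled standardization answers “how far from the all-university average,” within-group standardization answers “how far from your own discipline”:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)

# Fabricated citation counts for two hypothetical fields with different citation cultures.
df = pd.DataFrame({
    "field": ["statistics"] * 30 + ["physics"] * 30,
    "citations": np.concatenate([rng.normal(200, 40, 30), rng.normal(550, 110, 30)]),
})

# Pooled z-score: one global mean and standard deviation for everyone.
df["z_pooled"] = (df["citations"] - df["citations"].mean()) / df["citations"].std()

# Within-group z-score: each field is standardized against its own distribution.
df["z_within"] = df.groupby("field")["citations"].transform(lambda x: (x - x.mean()) / x.std())

print(df.groupby("field")[["z_pooled", "z_within"]].mean().round(2))
# Pooled: one field lands almost entirely below zero and the other above it,
# which is the apples-to-pumpkins problem. Within-group: both fields center at zero.
```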

Inference

Metrics Documentation

From Distributions of Instructional Metrics, May 17, 2025 by Jason Casey

With all of these metrics, there is no inferential interpretation being made: the values are known descriptives, rather than estimates of a parameter.

  • OK, but… values don’t come from the same distributions
    • Grad-only (or grad-dominant) programs vs. undergrad programs
    • Grant funding isn’t similar across disciplines
    • Department size matters - excess capacity -> maximize SCH

And we’re back to 🍎 and 🎃. The interpretation is important!

Metrics Documentation

From Distributions of Instructional Metrics, May 17, 2025 by Jason Casey

Our use of the z-score was as a simple scaling mechanism to aid in creating a centroid (i.e., the instructional averages).

Neither positive nor negative z-scores carry any interpretation other than distance from the mean in standard deviations.

A negative score does not imply that a unit does something poorly, nor does a positive imply the opposite interpretation.

Justification

  • Budget cuts based on metrics are fundamentally inferential.
  • Not all inference is directly about parameters.

Justification

Educational administration is one of the lower performing units by UNL metric analysis for instruction and research. The unit was negative on seven of nine instruction metrics and six of eight research metrics.

– Josh Davis, Vice Chancellor, EDAD APC Hearing, Oct 9 2025

Skewed Distributions

Normalized scores should have mean = 0 and variance = 1

“Transparency”

  • Without \(\mu_X\) and \(\sigma_X\), there is no connection between an individual X and its Z-score.

  • Alleged “Z-scores” provided to departments were wrong

It was not possible for departments to vet the data
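For illustration, the kind of check a department would have wanted to run, using a hypothetical helper and made-up scores; without \(\mu_X\) and \(\sigma_X\), the one check that actually ties a Z-score back to its raw value cannot be run:

```python
import numpy as np

def check_z_scores(z, raw=None, mu=None, sigma=None, tol=0.05):
    """Basic vetting: true z-scores have mean ~0 and sd ~1, and if mu and sigma
    are disclosed, each raw value must reproduce its own z-score exactly."""
    z = np.asarray(z, dtype=float)
    report = {
        "mean_near_zero": abs(z.mean()) < tol,
        "sd_near_one": abs(z.std(ddof=1) - 1) < tol,
    }
    if raw is not None and mu is not None and sigma is not None:
        report["reconstructs"] = np.allclose((np.asarray(raw) - mu) / sigma, z)
    return report

# Example call with made-up values; without mu and sigma the reconstruction check
# (the only one that connects an individual X to its Z-score) cannot be run at all.
print(check_z_scores([0.31, -1.20, 0.88, -0.42, 0.43]))
```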

Skewed Distributions

An exam where 87% score zero is a bad exam.

Skewed?? – Better Metrics

  • Make the metric more inclusive: other awards held by faculty are an indicator of future potential for highly prestigious awards

(Still) Skewed?? – Better Statistics

  • Use ranks, not raw values

  • Ranks can be converted to Z-scores (see the sketch below)

  • If the denominator is known, Z-scores and percentiles can be converted back into ranks
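A minimal sketch of the rank-based route, using scipy's normal quantile function on a made-up, zero-heavy metric:

```python
import numpy as np
from scipy import stats

# Made-up, heavily skewed metric: most departments have zero "highly prestigious" awards.
raw = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 2, 5])
n = len(raw)

# Midranks handle the ties among all the zeros.
ranks = stats.rankdata(raw)                 # 1 = smallest value

# Ranks -> z-scores via the standard normal quantile function (normal scores);
# percentiles come from the same ranks, so the three scales are interchangeable here.
z_from_ranks = stats.norm.ppf(ranks / (n + 1))
percentiles = 100 * ranks / n

print(np.round(z_from_ranks, 2))
print(np.round(percentiles, 1))
# Unlike z-scores of the raw values, these stay on a bounded, comparable scale
# even when most of the observations are zero.
```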

Ratios & Normalization

Ranks and Percentiles

The research performance of the Statistics Department at UNL is ranked 39th best out of 158 Statistics Departments.

  • ranks and percentiles are essentially ratios, and therefore depend very much on the denominator \(n\)

  • interpretation changes drastically depending on the denominator

citations_2014_2023_avg

AcA for 2020 to 2023: 11 faculty, 123 articles with 2,157 citations

| Metric                     | Calculation | Value | Rank | Percentile |
|----------------------------|-------------|-------|------|------------|
| Citation count per article | 2,157 / 123 | 17.5  | 38   | 75.9%      |
| Article count per faculty  | 123 / 11    | 11.2  | 27   | 82.9%      |
| Citation count per faculty | 2,157 / 11  | 196.1 | 34   | 78.5%      |

citations_2014_2023_avg for Statistics: 311.6167 … ???

… what is the denominator?

citations_2014_2023_avg

AcA for 2020 to 2023: 11 faculty, 123 articles with 2,157 citations
| Metric                     | Calculation | Value | Rank | Percentile |
|----------------------------|-------------|-------|------|------------|
| Citation count per article | 2,157 / 123 | 17.5  | 38   | 75.9%      |
| Article count per faculty  | 123 / 11    | 11.2  | 27   | 82.9%      |
| Citation count per faculty | 2,157 / 11  | 196.1 | 34   | 78.5%      |

AcA for 2014 to 2023: 11 faculty, 289 articles with 4,670 citations

| Metric                     | Calculation | Value | Rank | Percentile |
|----------------------------|-------------|-------|------|------------|
| Citation count per article | 4,670 / 289 | 16.2  | 68   | 57.0%      |
| Article count per faculty  | 289 / 11    | 26.3  | 21   | 86.7%      |
| Citation count per faculty | 4,670 / 11  | 424.5 | 49   | 69.0%      |

citations_2014_2023_avg for Statistics: 311.6167 \(\approx\) 310.3 = (196.1 + 424.5)/2???

Different Disciplines:

different citation cultures 🍎 🎃

| Dept       | Citations | Faculty | Citations/Faculty | Rank | Out of | Percentile |
|------------|-----------|---------|-------------------|------|--------|------------|
| Statistics | 2,157     | 11      | 196.1             | 34   | 158    | 78.5%      |
| Physics    | 15,959    | 29      | 550.3             | 76   | 127    | 40.1%      |


What is in an average?

  • Statisticians like averages: for \(X_1, X_2, \ldots, X_n\) with the same mean \(\mu\) and the same variance \(\sigma^2\),

\[ \frac{X_1 + X_2 + \dots + X_n}{n} = \bar{X} \ {\dot\sim} \ N\left(\mu, \frac{\sigma^2}{n}\right) \]

  • Averages allow more precision, better predictions, more confident conclusions

  • … so why is dividing by t_tt_headcount_2014_2023_avg a problem?

What is in a ratio?

  • If \(X\) and \(N\) are BOTH random variables, \(\frac{X}{N}\) has a ratio distribution

Any signal gets lost in the variability; the metrics do not reflect actual performance
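A minimal Monte Carlo sketch of that point, with fabricated expenditure and headcount scales (not the actual metric inputs):

```python
import numpy as np

rng = np.random.default_rng(11)
n = 100_000

# Fabricated year-to-year values: both the numerator and the denominator are noisy.
expenditures = rng.normal(2_000_000, 400_000, n)   # e.g. annual P1 expenditures
headcount    = rng.normal(12, 2, n)                # e.g. T/TT headcount

ratio = expenditures / headcount

def cv(x):
    """Coefficient of variation: standard deviation relative to the mean."""
    return x.std() / x.mean()

print(f"numerator   CV: {cv(expenditures):.2f}")
print(f"denominator CV: {cv(headcount):.2f}")
print(f"ratio       CV: {cv(ratio):.2f}")
# The ratio is noisier than either input, and the closer the denominator's
# distribution sits to zero, the wilder (eventually Cauchy-like) the ratio gets.
```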

Ratios in the Metrics

  • p1_expenditures_normalized1 = \[ \frac{\text{p1_expenditures_2014_2023_avg}}{\text{t_tt_headcount_2014_2023_avg}} \]

  • total_sponsored_awards_inc_nuf_rsch_pub_serv_teach_avg_awards_budget = \[ \frac{\text{Average_total_sponsored_awards_inc_nuf_rsch_pub_serv_teach_fy2020_fy2024}}{\text{budget_from_evc_file_state_appropriated_budget}} \]

  • instructional_sch_4Y_share_growth =

Change in share (percentage) of total instructional SCH from AY2020 to AY2024


- … every metric was ‘normalized’; most are ratios of RVs rather than averages

What should have been done?

Observational units

  • Research performance measures products by faculty members, one at a time

  • Teaching performance measures classes, one at a time

  • Data based on single products and classes are auditable

But:

  • Internal record keeping seems inconsistent and full of errors
  • Rather than addressing errors in the data and biases in the metrics, departments were told “nothing will change”

What is in a Cauchy Distribution?

  • Cauchy: \(X/Y\) with \(E[Y] = 0\), e.g. the ratio of two independent mean-zero normals. Division by zero, statistically speaking (simulated below)

  • research_awards_growth_inc_nuf_fy20_fy24 is in plot #6
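A minimal simulation of “division by zero, statistically speaking”: the ratio of two independent mean-zero normals is standard Cauchy, and its running average never settles down the way the earlier averaging slide promises for ordinary metrics:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Growth-style metric built as a ratio whose denominator is centered at zero.
x = rng.normal(0, 1, n)
y = rng.normal(0, 1, n)
cauchy_like = x / y                      # standard Cauchy

running_mean = np.cumsum(cauchy_like) / np.arange(1, n + 1)

# For a well-behaved metric the running mean converges; a Cauchy running mean
# keeps jumping whenever a denominator lands near zero, so "average growth"
# of such a ratio is meaningless.
for k in (100, 1_000, 10_000, 100_000):
    print(f"mean of first {k:7d}: {running_mean[k - 1]:8.2f}")
```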

Comparisons

Scholarly Research Index (SRI)

  • measure created by Academic Analytics to evaluate research performance

  • based on # awards, # books, # citations, # articles, # grants, grant dollars, (clinical) trials and patents

SRI for all Statistics Departments

SRI allows a comparison across fields under certain circumstances; SRI percentiles by field would be better (sketched below)
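Computing within-field percentiles is straightforward once departments are grouped by discipline. A minimal sketch with hypothetical SRI values (the actual Academic Analytics data is proprietary):

```python
import pandas as pd

# Hypothetical SRI values for a few departments in two fields.
sri = pd.DataFrame({
    "field":      ["Statistics", "Statistics", "Statistics", "Physics", "Physics", "Physics"],
    "department": ["UNL", "Peer A", "Peer B", "UNL", "Peer C", "Peer D"],
    "sri":        [0.60, 0.10, 0.95, -0.20, 0.40, 1.30],
})

# Raw SRI compares across fields only under certain circumstances;
# a within-field percentile compares each department to its own discipline.
sri["field_percentile"] = 100 * sri.groupby("field")["sri"].rank(pct=True)

print(sri.sort_values(["field", "field_percentile"], ascending=[True, False]))
```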


SRI Shift for other Departments

“All departments are treated the same way” – Definitely not!

Checking the Claims

  • Transparent Process

  • Departments were consulted before plan was announced ???

  • All departments were treated equally 😬😱

Missed Opportunity

  • Rather than using the Public AAU as a punitive measuring stick

  • Use the difference strategically for investment:

    SRI\(_{\text{Public AAU}}\) - SRI\(_{\text{Public}}\)

  • The size of the differences indicates AAU relevance

Conclusions

Metrics

  • Statistical fundamentals matter: distributions, observational units, variance, and real-world interpretation

  • The general description of the metrics sounds good, but

    • The implementation is incredibly problematic
  • Inferential intent - using metrics to make these choices

    • Disclaimers don’t fix the results – negative scores were used to infer poor performance

- AAU metrics are like the hard, summative problems on exams – it’s important to measure the precursors

Fixes

  • Use SRI percentile instead of the other research metrics (translate to a z-score if necessary)

  • Don’t take ratios of random quantities

  • Compare like to like – graduate vs. undergraduate credit hours, service vs. major courses.

  • Only evaluate stable programs – use a 5-year grace period (consistent with CCPE guidelines)

- Involve statisticians in the process!!