The Good, the Bad, and the Ugly
2025-11-06
Background
Data and Analysis (& Flaws)
Conclusion
Reasonable strategy … so where did things go wrong?
Evaluate Teaching performance
Evaluate Research performance
Rolled into metrics:
AAU membership criteria
Teaching Expenses
Revenue
Service
March - Prog Eval Rubric will be from CCPE
April - Broad outline of metrics (the devil is in the details!)
May - Timeline, draft metrics presented to DEOs/Deans
June - Degree cuts likely, ELT plan due June 30
July - APC officers notified
August - APC meets
Transparent Process
Departments were consulted before plan was announced
All departments were treated equally
Can’t see/fix the data \(\Rightarrow\) not transparent
May 15 - Academic Program Metrics discussion
August 18 - Metric definitions provided to Stat Department Faculty
Sept 11 - Affected units notified by EVC Button or Interim VC Heng-Moss
Shortly afterward - Full metrics spreadsheet begins circulating unofficially
Oct 1-10 - APC hearings
Transparent Process
Departments were consulted before plan was announced
All departments were treated equally
Can’t see/fix the data \(\Rightarrow\) not transparent
???
🍓
🍊 🍑
🫐 🍓 🍓 🍓🍓🍊🍊 🍓 🍑🍑 🍉
🫐🫐🍇🍇 🍓🍓🍓🍓🍊🍊🍊🍊🍊🍊 🍑🍑🍑🍑🍏 🍉🍉🍉
🫐🫐🍇🍇🍇 🍓🍓🍓🍓🍓🍊🍊🍊🍊🍊🍊🍊🍊🍑🍑🍑🍑🍏🍏🍏🍏 🍍🍍 🍉🍉🍉🍉🍉🍉
🫐🫐🍇🍇🍇🍓🍇🍓🍓🍊🍊🍊🍊🍊🍊🍊🍊🍊🍊🍑🍑🍑🍏🍏🍏🍏🍏🍆🍉🍉🍍🍍🍉🍉🍉🍉🍉🍉🍉🍉
Observational unit:
the entity from which data are collected in a study
Analysis unit:
the entity at which data are analyzed in a study
Mapping between these can be tricky
but it is critical to get it right!!
Instructional metrics: typically observed at the class or instructor level, analyzed by department
Research metrics: observed at the faculty level, analyzed by department
Complicated apportionment process for mapping individuals to departments
But, individuals in a department change over time!
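A minimal sketch of that mapping, with entirely hypothetical instructors, values, and apportionment shares (not the university's actual rules): instructor-level observations are rolled up into department-level values using each person's apportioned share.

```python
# Toy sketch: rolling instructor-level (observational unit) metrics up to
# department-level (analysis unit) values using apportionment weights.
# All names and numbers are hypothetical.

# Each record: (instructor, metric value, {department: apportionment share})
records = [
    ("A", 120.0, {"STAT": 1.00}),                # fully in Statistics
    ("B", 300.0, {"STAT": 0.50, "MATH": 0.50}),  # split appointment
    ("C", 210.0, {"MATH": 1.00}),
]

dept_totals, dept_weights = {}, {}
for _, value, shares in records:
    for dept, w in shares.items():
        dept_totals[dept] = dept_totals.get(dept, 0.0) + w * value
        dept_weights[dept] = dept_weights.get(dept, 0.0) + w

# Apportioned per-FTE average for each department
dept_avg = {d: dept_totals[d] / dept_weights[d] for d in dept_totals}
print(dept_avg)  # {'STAT': 180.0, 'MATH': 240.0}
```

If the roster or the apportionment shares change from year to year, both the weighted total and the divisor change with them, which is exactly the instability noted above.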
Operationally, to create distributions, we need:
Mixture distributions can be useful in some situations… IF the goal is to understand the whole system
They’re less helpful if the goal is ranking groups
🫐 < 🍇 < 🍓 < 🍊 < 🍑 < 🍏 < 🍆 < 🍍 < 🍉
Our actual mixture distribution is more like this:
Or this:
Or this:
If you have to work with a mixture distribution, you must account for lurking variables: disciplinary norms (e.g., citation cultures), department size, and the mix of graduate, undergraduate, and service teaching.
Statistical adjustments can account for these factors, but ignoring such structural differences invalidates the conclusions.
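A small simulation with made-up parameters shows why this matters: pool two disciplines with very different citation cultures, z-score the mixture, and a structural difference shows up as an apparent performance gap.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical disciplines with different citation cultures:
# same relative "performance", very different baselines.
dept_a = rng.normal(loc=20, scale=5, size=50)    # low-citation field
dept_b = rng.normal(loc=200, scale=50, size=50)  # high-citation field

pooled = np.concatenate([dept_a, dept_b])        # the mixture
z = (pooled - pooled.mean()) / pooled.std(ddof=1)

# Every member of the low-citation field ends up with a negative z-score,
# even though nobody in either field is doing anything "poorly".
print((z[:50] < 0).mean())          # ~1.0
print(z[:50].mean(), z[50:].mean())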
\(Z\)-scores can be used in two different ways:
make metrics on different scales comparable
make observations from different distributions (more) comparable
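A sketch of the first use, with hypothetical metric names and values: standardize metrics that live on different scales, then average the standardized columns into a per-department centroid (the memo quoted below calls this "a simple scaling mechanism").

```python
import numpy as np

# Hypothetical department-level metrics on very different scales
sch_per_fte  = np.array([310.0, 280.0, 450.0, 120.0])   # student credit hours
cost_per_sch = np.array([210.0, 330.0, 190.0, 510.0])   # dollars

def zscore(x):
    return (x - x.mean()) / x.std(ddof=1)

# After standardizing, the two columns are unitless and comparable, so they
# can be averaged into a per-department "centroid".  Whether a metric should
# be flipped so that "higher is better" is a separate (and consequential)
# modeling choice; the z-score itself only measures distance from the mean
# in standard deviations.
centroid = (zscore(sch_per_fte) + zscore(cost_per_sch)) / 2
print(centroid.round(2))
```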
From Distributions of Instructional Metrics, May 17, 2025 by Jason Casey
With all of these metrics, there is no inferential interpretation being made: the values are known descriptives, rather than estimates of a parameter.
And we’re back to 🍎 and 🎃. The interpretation is important!
From Distributions of Instructional Metrics, May 17, 2025 by Jason Casey
Our use of the z-score was as a simple scaling mechanism to aid in creating a centroid (i.e., the instructional averages).
Neither positive nor negative z-scores carry any interpretation other than distance from the mean in standard deviations.
A negative score does not imply that a unit does something poorly, nor does a positive imply the opposite interpretation.
Educational administration is one of the lower performing units by UNL metric analysis for instruction and research. The unit was negative on seven of nine instruction metrics and six of eight research metrics.
– Josh Davis, Vice Chancellor, EDAD APC Hearing, Oct 9 2025
Normalized scores should have mean = 0 and variance = 1
Without \(\mu_X\) and \(\sigma_X\), there is no connection between an individual X and its Z-score.
Alleged “Z-scores” provided to departments were wrong
It was not possible for departments to vet the data
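Two checks a department could have run had the inputs been shared; a sketch assuming genuine z-scores and using hypothetical values.

```python
import numpy as np

def looks_like_zscores(z, tol=1e-6):
    """True z-scores of a full population have mean 0 and (population) SD 1."""
    z = np.asarray(z, dtype=float)
    return abs(z.mean()) < tol and abs(z.std(ddof=0) - 1) < tol

def recover_original(z, mu, sigma):
    """Without mu and sigma there is no way back from Z to X: X = mu + sigma * Z."""
    return mu + sigma * np.asarray(z, dtype=float)

x = np.array([12.0, 15.0, 9.0, 20.0])
z = (x - x.mean()) / x.std(ddof=0)
print(looks_like_zscores(z))                          # True
print(recover_original(z, x.mean(), x.std(ddof=0)))   # [12. 15.  9. 20.]
```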
An exam where 87% score zero is a bad exam.
Use ranks, not raw values
Ranks can be converted to Z-scores
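One common conversion (a sketch, not necessarily the one the process would adopt): map rank i out of n to the normal quantile of the corresponding proportion.

```python
from scipy.stats import norm

def rank_to_z(rank, n):
    """Convert a rank (1 = best) out of n into a normal score.

    (n - rank + 0.5) / n is the proportion of units at or below this one,
    so better ranks map to larger z-scores.
    """
    return norm.ppf((n - rank + 0.5) / n)

# Statistics' research rank quoted below: 39th of 158
print(rank_to_z(39, 158))   # ~0.69
```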
The research performance of the Statistics Department at UNL is ranked 39th best out of 158 Statistics Departments.
ranks and percentiles are essentially ratios, and therefore depend very much on the denominator \(n\)
interpretation changes drastically depending on the denominator
citations_2014_2023_avg
AcA for 2020 to 2023: 11 faculty, 123 articles with 2,157 citations
| Metric | Calculation | Value | Rank | Percentile |
|---|---|---|---|---|
| Citation count per article | 2,157 / 123 | 17.5 | 38 | 75.9% |
| Article count per faculty | 123 / 11 | 11.2 | 27 | 82.9% |
| Citation count per faculty | 2,157 / 11 | 196.1 | 34 | 78.5% |
citations_2014_2023_avg for Statistics: 311.6167 … ???
… what is the denominator?
citations_2014_2023_avg
| Metric | Calculation | Value | Rank | Percentile |
|---|---|---|---|---|
| Citation count per article | 2,157 / 123 | 17.5 | 38 | 75.9% |
| Article count per faculty | 123 / 11 | 11.2 | 27 | 82.9% |
| Citation count per faculty | 2,157 / 11 | 196.1 | 34 | 78.5% |
| Metric | Calculation | Value | Rank | Percentile |
|---|---|---|---|---|
| Citation count per article | 4,670 / 289 | 16.2 | 68 | 57.0% |
| Article count per faculty | 289 / 11 | 26.3 | 21 | 86.7% |
| Citation count per faculty | 4,670 / 11 | 424.5 | 49 | 69.0% |
citations_2014_2023_avg for Statistics: 311.6167 \(\approx\) 310.3 = (196.1 + 424.5)/2???
different citation cultures 🍎 🎃
| Dept | Citations | Faculty | Citations/Faculty | Rank | out of | Percentile |
|---|---|---|---|---|---|---|
| Statistics | 2157 | 11 | 196.1 | 34 | 158 | 78.5% |
| Physics | 15959 | 29 | 550.3 | 76 | 127 | 40.1% |
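For reference, the percentiles in these tables are consistent with the simple convention below (inferred from the reported numbers, not from a published definition), which makes the dependence on the denominator explicit: the same rank of 34 is the 78.5th percentile out of 158 departments but only the 73.2nd out of 127.

\[
\text{percentile} = \frac{n - \text{rank}}{n}, \qquad \frac{158 - 34}{158} \approx 78.5\%, \qquad \frac{127 - 34}{127} \approx 73.2\%
\]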
\[ \frac{X_1 + X_2 + \cdots + X_n}{n} = \bar{X} \ {\dot\sim} \ N\left(\mu, \frac{\sigma^2}{n}\right) \]
Averages allow more precision, better predictions, more confident conclusions
… so why is dividing by t_tt_headcount_2014_2023_avg a problem?
\[
\text{p1_expenditures_normalized1} = \frac{\text{p1_expenditures_2014_2023_avg}}{\text{t_tt_headcount_2014_2023_avg}}
\]
\[
\text{total_sponsored_awards_inc_nuf_rsch_pub_serv_teach_avg_awards_budget} = \frac{\text{Average_total_sponsored_awards_inc_nuf_rsch_pub_serv_teach_fy2020_fy2024}}{\text{budget_from_evc_file_state_appropriated_budget}}
\]
instructional_sch_4Y_share_growth = change in share (percentage) of total instructional SCH from AY2020 to AY2024
… every metric was ‘normalized’, but most are ratios of random variables rather than averages (see the simulation sketch below)
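A quick simulation with made-up numbers: a ratio of two noisy averages does not behave like a single average; it is biased and compounds the noise of numerator and denominator.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sims = 100_000

# Hypothetical department: the "true" ratio is 100 / 10 = 10, but both the
# numerator (average expenditures) and the denominator (average T/TT
# headcount) are themselves noisy averages.
numer = rng.normal(loc=100.0, scale=15.0, size=n_sims)
denom = rng.normal(loc=10.0,  scale=1.5,  size=n_sims)

ratio = numer / denom

print(numer.mean() / denom.mean())   # ~10.0 : ratio of the means
print(ratio.mean())                  # ~10.2 : the ratio of RVs is biased upward
print(ratio.std())                   # ~2.1  : noise from both inputs compounds
```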
Observational units
Research performance measures products by faculty members, one at a time
Teaching performance measures classes, one at a time
Data based on single products and classes are auditable
But:
SRI (Scholarly Research Index): a measure created by Academic Analytics to evaluate research performance
based on # awards, # books, # citations, # articles, # grants, grant dollars, (clinical) trials and patents
“All departments are treated the same way” – Definitely not!
Transparent Process
Departments were consulted before plan was announced ???
All departments were treated equally 😬😱
Rather than using Public AAU as a measuring stick for punishment,
use the difference strategically for investment:
SRI\(_{\text{Public AAU}}\) - SRI\(_{\text{Public}}\)
The size of the difference indicates AAU relevance
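A sketch of that strategy with entirely hypothetical SRI values: compute the Public-AAU-minus-Public gap by field and sort by it, so the largest gaps (the most AAU-relevant fields) surface as investment candidates.

```python
# Hypothetical SRI benchmarks by field; real values would come from
# Academic Analytics' public-AAU and public comparison groups.
sri = {
    #  field:        (public AAU,  public)
    "Statistics":     (1.10,       0.60),
    "Physics":        (0.90,       0.70),
    "Education Adm.": (0.40,       0.35),
}

gaps = {field: aau - pub for field, (aau, pub) in sri.items()}

# Larger gap => the field matters more for looking like a public AAU,
# so it is a candidate for investment rather than cuts.
for field, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{field:15s} {gap:+.2f}")
```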
Statistical fundamentals matter! Distributions, observational units, variance, and real-world interpretation
The general description of the metrics sounds good, but the devil is in the details
Inferential intent: using the metrics to make these choices is inference, even though the values were presented as descriptives
AAU metrics are like the hard, summative problems on exams – it’s important to measure the precursors
Use SRI percentile instead of the other research metrics (translate to a z-score if necessary)
Don’t take ratios of random quantities
Compare like to like – graduate vs. undergraduate credit hours, service vs. major courses.
Only evaluate stable programs – use a 5-year grace period (consistent with CCPE guidelines)
Involve statisticians in the process!!