Grantee Research
Using Machine Learning to Measure Skills from Standardized Test Responses and Predict Students’ Future Earnings
When it comes to understanding students’ skills and economic outlook, we are not making full use of state standardized test data.

Education policymakers, practitioners, and researchers often use standardized test scores to measure student progress, likelihood of future academic and career success, and intervention impacts.

But conventional standardized test scales and summary scores were not built for all these purposes and therefore don’t serve them all well. The scales and scores group test responses according to an assessment’s underlying design and structure, not based on the items that show the greatest association with economic outcomes. Existing summary scores are also too broad to pinpoint essential skills.

This research team analyzed administrative data for 12 million Texas students (including more than a billion item responses) to understand which questions best predicted earnings at age 25. Using multiple machine-learning techniques, the researchers packaged questions into nuanced measures of skills.

They found that questions can be repackaged into detailed skill measures and that certain math and literacy skills are particularly powerful predictors of students’ future earnings.

The results suggest that existing standardized test data can be repurposed for skill measurement and economic success prediction.

 

Key Takeaways

A test’s underlying design can tell us some things about the broad abilities that are more strongly associated with adult wages. On average, a correct answer to a math question was associated with wages 1 percentage point higher than a correct answer to a reading question. This general pattern also held at the level of the subjects the tests were designed to measure: math subjects or objectives (e.g., number concepts or algebraic understanding) were more predictive of wages than reading objectives (e.g., text summarization or identifying relationships). Other test design features that predicted wages included question difficulty and placement later in the test (consistent with evidence of high wage returns to student cognitive endurance).

Still, the research team found that math summary scores and even math subject scores (e.g., geometry) hid which specific skills mattered most. By applying large language model (LLM) techniques to link 613 Common Core standards to digitized test questions, the team was able to obtain a detailed skill profile that revealed the specific competencies most strongly linked to wages.

 



The Skills That Are Associated with the Highest Wage Boosts Tend to Be Procedural, Computational, and Math-Based

Top 15 Common Core Standard Skills in Terms of Estimated Age-25 Wage Returns
 

  • $1300-$1400 wage return
    • Derive triangle area formula (1/2 ab sin C)
    • Define degree measure using a circle
  • $1200-$1299 wage return
    • Use unit cubes to measure volume
    • Interpret signs in ordered pairs
    • Measure and draw angles with a protractor
    • Recognize angles and angle measurement
  • $1100-$1199 wage return
    • Write order comparisons in context
    • Use additive angle measures to find unknowns
    • Relate angle measure to degrees
    • Define volume by packing unit cubes
  • $1000-$1099 wage return
    • Order rational numbers; use absolute value
    • Generate equivalent expressions
    • Interpret graphs as solution sets
    • Define a coordinate system with axes
    • Derive circle/solid formulas informally

Source: Research team's analysis
 


 

Skills most highly associated with wages at age 25 are math based, with a focus on procedural, domain-specific computation skills rather than more conceptual or interpretive ones. For example, students who could apply geometric tools, such as using coordinate systems, saw markedly higher wages than peers who were stronger in more conceptual geometry tasks, such as recognizing geometric symmetry.

Among literacy skills, one stood out: students’ ability to summarize text and identify the main idea and supporting details showed the strongest relationship to wages at age 25, in some cases exceeding many math skills.

Although skill-wage relationships did not differ much by race or ethnicity, the research team did find patterns in the types of questions some minoritized students were more likely or less likely to answer correctly. Namely, Black and Hispanic students were more likely to incorrectly answer questions that were particularly predictive of earnings. The achievement gaps between white and Hispanic students and between white and Black students using the wage-item-anchored scales were about 45 percent greater than those calculated using conventional summary scores.


Potential Implications for Policymakers and Practitioners

Additional validation and translational work are needed to bolster these findings and enable their uptake into policy and practice settings, but SUMI sees the following potential applications:

Turn test data into actionable skill maps for policymakers and school leaders. Test makers and implementation and analysis partners could consider postprocessing supports or supplemental analytic tools (e.g., wage-prediction metrics or automated skill-cluster reports) that would allow schools and policymakers to draw more accessible and actionable insights from tests without changing the tests themselves.

For example, test administrators could partner with schools and districts to generate clear, manageable “top 10 skills” lists by grade level, based on the skill clusters most predictive of long-term outcomes. Because these insights would come from tests already in use, leaders could immediately apply them to guide instruction, resource allocation, and broader strategic planning.

Strengthen alignment between curricula and the most predictive standards. Elevating high-leverage skills can concentrate learning time where it matters most. The federal Institute of Education Sciences, state boards of education, and other government leaders could consider how this study and others that identify test-item-level drivers of later upward mobility could improve curriculum and accountability standards.

For example, Common Core standards that show the strongest links to later economic mobility can be strengthened in curriculum design, daily instructional routines, and high school redesign efforts. Using artificial intelligence (AI) to review curricula can help districts and states quickly determine where these predictive skills appear across grade levels (and where they’re missing), creating a road map for targeted improvements and strategic interventions.

Use skill-level patterns to pinpoint gaps, prevent disparities, and monitor resource allocation. To disrupt long-standing inequities, district superintendents, chief academic officers, and school principals could use these more granular indicators of later mobility to identify which students and skills need more investment, potentially enabling more targeted, better-timed interventions.

Future Research Directions

Future research could explore the following:

  • when disparities in skills that are strongly associated with wages develop and how they might relate to high school course taking and other structures that could be intervention targets
  • how high-return skills accumulate over time (i.e., which skills are needed early on to build the higher-return skills in later grades)
  • how to use item-level data available to school districts to identify students who are lagging in economically relevant skills (or, conversely, to identify “diamonds in the rough”)

Methods, Data Sources, and Measures

This research team analyzed the administrative test data of 12 million Texas public school students in grades 3–12 linked to adult wage data retrieved from the Texas Education Research Center. To calculate students’ earnings at age 25, the researchers looked at unemployment insurance records for people paid wages in Texas between 2003 and 2019 and converted these into real wages.
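As a rough illustration of the real-wage conversion mentioned above, the sketch below deflates a nominal wage with a price index. The function name, the choice of base year, and the index values are hypothetical and chosen only for illustration; the source does not say which price index or base year the researchers used.

```python
def to_real_wages(nominal_wage, price_index_year, price_index_base):
    """Express a nominal wage in base-year dollars.

    real = nominal * (index in base year / index in the year the wage was paid)
    """
    if price_index_year <= 0:
        raise ValueError("price index must be positive")
    return nominal_wage * price_index_base / price_index_year


# Hypothetical index values, for illustration only (not actual price data).
price_index = {2010: 100.0, 2019: 120.0}

# A wage of 30,000 paid in 2010, expressed in 2019 dollars.
real = to_real_wages(30000.0, price_index[2010], price_index[2019])
print(real)
```

Putting every year's wages in the same base-year dollars is what makes earnings at age 25 comparable across cohorts who reached that age in different years.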

They first used regression analysis to examine how easily observable, design-based characteristics of a test question (e.g., question placement, difficulty, or math versus reading subject) predicted its association with wages at age 25.
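A minimal sketch of this kind of item-level regression follows, assuming ordinary least squares on synthetic data. The variable names, the made-up coefficients, and the noise-free setup are all assumptions for illustration; the researchers' actual specification, controls, and data are not described in this summary.

```python
import numpy as np


def fit_item_regression(features, wage_assoc):
    """Ordinary least squares of each item's wage association on item features.

    Returns [intercept, coefficient per feature column].
    """
    X = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(X, wage_assoc, rcond=None)
    return coef


# Synthetic test items (illustrative only, not the study's data).
rng = np.random.default_rng(0)
n_items = 200
is_math = rng.integers(0, 2, n_items)    # 1 = math item, 0 = reading item
difficulty = rng.uniform(0, 1, n_items)  # hypothetical difficulty index
position = rng.uniform(0, 1, n_items)    # 0 = start of test, 1 = end of test
features = np.column_stack([is_math, difficulty, position])

# Made-up "true" coefficients: intercept, math, difficulty, position.
true_coef = np.array([0.5, 1.0, 0.8, 0.3])
wage_assoc = true_coef[0] + features @ true_coef[1:]  # noise-free for clarity

coef = fit_item_regression(features, wage_assoc)
print(coef)
```

Each fitted coefficient reads as the change in an item's wage association for a one-unit change in that design feature, holding the other features fixed.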

The researchers then used machine learning to map individual questions to the Common Core skill taxonomies for English language arts and math to evaluate how fine-grained skills predict wages. They evaluated skill-wage relationships for differences by student sex (male or female), race or ethnicity (Hispanic, white, or Black), free and reduced-price lunch eligibility, gifted status, and English as a second language status.
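To make the question-to-standard mapping step concrete, here is a deliberately simplified sketch. It swaps in a bag-of-words cosine similarity for the team's LLM-based approach, so it is a stand-in rather than their method. The standard codes (`STD-A`, `STD-B`) and the sample question are hypothetical; the two standard descriptions are taken from the top-15 list earlier in this summary.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_standard(question, standards):
    """Return the code of the standard whose description best matches the question."""
    q = Counter(tokenize(question))
    scores = {
        code: cosine(q, Counter(tokenize(desc)))
        for code, desc in standards.items()
    }
    return max(scores, key=scores.get)


# Hypothetical codes; descriptions borrowed from the skills list above.
standards = {
    "STD-A": "Measure and draw angles with a protractor",
    "STD-B": "Use unit cubes to measure volume",
}
question = "Use a protractor to measure the angle shown and draw an angle of 60 degrees."

match = best_standard(question, standards)
print(match)
```

A word-overlap measure like this misses paraphrases and inflected forms (it treats "angle" and "angles" as different words), which is one reason a language-model-based approach is a more plausible fit for linking hundreds of standards to test questions.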

In ongoing work, the researchers are using AI techniques to construct new test questions designed to strongly predict wage earnings to shed further light on why some questions are so much more predictive of long-term outcomes than others.
