AJ
Project Lab
Adejumobi Joshua
AI Evaluation · Project Lab

Investigating What LLMs Are Really Doing

"Does the model mean what it says, or has it just learned to look like it does?"

The Question

When a language model produces a fair-looking answer to a sensitive question, has it actually reasoned more carefully, or has it just learned to detect that it's being tested and adjust its output accordingly? When it refuses a harmful request, does its internal state actually reflect that it recognized the harm, or is it producing the safe-looking response while internally indifferent? Both questions point at the same problem. AI safety today is mostly measured by what models output. But output and internal state can disagree. Adejumobi's lab investigates both, and the answer matters for how the field measures whether AI systems are actually getting safer.

What Students Build

Over five weeks, students contribute to two connected investigations.

The first asks whether a published method for reducing bias in language models actually works the way the paper claims. Students reproduce the original results on a standard bias benchmark, then test whether those results hold up when the evaluation framing is changed. If the reductions disappear when the model can't tell it's being tested, that is a strong signal the method is teaching compliance rather than reasoning.

The second goes deeper. Students take an open language model, give it harmful and ambiguous requests, record its responses, and extract its internal activations, the patterns that arise inside the model as it processes each prompt. They then train simple classifiers on those internal patterns to ask: when the model refuses, do its internals look like other refusals? When it complies, do its internals look like other compliances? Or are there cases where the model says one thing while its internals say another?

The investigations share a tooling backbone, and students contribute to both. They leave with working pipelines, real findings, and a written report. Last summer, Adejumobi's students presented research at the Women in Machine Learning workshop at NeurIPS. This summer's projects are positioned to produce similarly strong outputs.

The Mentors

Adejumobi Joshua leads SeqHub AI Research, contributing to the field of AI evaluation: how large language models behave with respect to bias, sycophancy, safety, and alignment. Last summer her students presented at the Women in Machine Learning workshop at NeurIPS. This summer she is inviting students into two live investigations: whether what a language model outputs corresponds to what it actually represents internally, and whether prompting alone can reduce bias in language models, or whether models are simply performing compliance because they sense they are being evaluated. Real research questions, with results that could go either way.

Who This Is For

Students need to be comfortable with Python, writing functions, working with data, running scripts. Some exposure to running language models, through Hugging Face or basic API calls, is helpful. No prior research experience is required, but a willingness to sit with uncertainty is. The right student here cares about doing research carefully, is comfortable with the possibility that their hypothesis is wrong, and gets interested when the data doesn't match the prediction. Students looking for a polished portfolio project will struggle. This is research, not a product.

Logistics

Five weeks. Mondays, Wednesdays, Fridays, 11:00 AM to 12:15 PM ET. Friday sessions extend to 1:00 PM for Demo Day. Cohorts of 3 to 4 students per mentor. $4,500. Apply by May 11, 2026.

Beyond the live sessions, students work on their own, and they are not alone when they do. The lab is supported by a 24/7 Slack channel and a team of scholars and practitioners at the Academy. Students also work alongside SeqHub's AI co-teacher, which helps them think through problems on off days without doing the work for them. Plan for 10 to 12 hours per week, with 4.5 hours in live sessions and the rest on independent work.