Building an Open Assessment Pipeline on the Learning Commons Knowledge Graph
There's a gap in AI and education between building infrastructure and putting it to work. CZI built the Learning Commons Knowledge Graph — 250,000 standards, 2,000 learning components, 273,000 relationships mapping how K-12 math concepts connect. Student Achievement Partners built the Coherence Map showing how standards build on each other across grades. This is remarkable infrastructure.
But infrastructure alone doesn't change classrooms.
Someone has to build the applied layer — the tools that turn knowledge graphs into student-facing practice, that use learning progressions to generate adaptive sequences, that evaluate AI-generated content against the standards those graphs encode. That's what we built with Open Items.
What Open Items is
Open Items is open assessment infrastructure for K-12 education: 34,000+ CC-licensed assessment items with AI generation, LLM evaluation, and adaptive practice. It's built directly on the Learning Commons Knowledge Graph, which means every item is aligned to standards through the same graph that CZI, state departments of education, and curriculum developers use.
The pipeline works like this:
- Standards come from the Knowledge Graph. We import frameworks, standards, and learning components directly from the CASE Network. When we generate items for "3.NF.A.1," we're referencing the same canonical standard that every other system using the graph sees.
- AI generates items aligned to those standards. Using Gemini 3 Flash, we generate 7-item difficulty sequences per skill — from easy direct application through hard transfer problems. Each sequence costs fractions of a cent.
- LLM evaluators score quality on 5 dimensions. Factual accuracy, grade appropriateness, pedagogical soundness, schema validity, and completeness. Items scoring above threshold are auto-approved (85% pass rate). The rest go to human reviewers.
- Human reviewers approve or revise. Nothing reaches students without a human sign-off. The evaluation pipeline handles volume; humans handle judgment.
- Students practice adaptively. Elo-rated difficulty calibration adjusts to each learner. The Knowledge Graph's prerequisite relationships enable diagnostic placement and remediation suggestions.
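The adaptive step in the last bullet can be sketched with a standard Elo update. This is a minimal illustration of the idea, not the production implementation — the rating scale, the K-factor, and the function names here are assumptions:

```python
def expected_score(learner_rating: float, item_rating: float) -> float:
    """Probability the learner answers the item correctly (logistic Elo)."""
    return 1.0 / (1.0 + 10 ** ((item_rating - learner_rating) / 400.0))

def elo_update(learner_rating: float, item_rating: float,
               correct: bool, k: float = 32.0) -> tuple[float, float]:
    """After one response, move the learner and item ratings in opposite
    directions, proportional to how surprising the outcome was."""
    expected = expected_score(learner_rating, item_rating)
    actual = 1.0 if correct else 0.0
    delta = k * (actual - expected)
    # A correct answer raises the learner's rating and lowers the item's
    # estimated difficulty; an incorrect answer does the reverse.
    return learner_rating + delta, item_rating - delta
```

Because items are rated by the same mechanism as learners, difficulty calibration improves as more students respond — which is what lets an open item bank's psychometric data compound over time.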
Why the Knowledge Graph matters
Before the Knowledge Graph, every ed-tech company maintained its own standards mapping. Company A's version of "3.NF.A.1" might be subtly different from Company B's. Alignment was self-reported and unverifiable. Curriculum coherence — how concepts build on each other across grades — was invisible.
The Knowledge Graph changes this. It provides a shared, canonical representation of educational standards with explicit relationships: prerequisites, equivalences, alignments across frameworks. When we say an item is aligned to a standard, that alignment is traceable through a public graph that anyone can inspect.
For Open Items specifically, the graph enables three things we couldn't do before:
- Prerequisite-aware generation. When generating items for a 5th-grade fractions skill, we can traverse the graph backward to identify prerequisite 3rd-grade concepts and generate scaffolding items that address gaps.
- Cross-framework alignment. Items aligned to Common Core automatically map to state-specific versions of the same standards — California, Texas, New York — through the graph's equivalence relationships.
- Coverage analysis at scale. We can see exactly which learning components have items, which don't, and where the gaps are — across the entire K-12 math landscape.
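The prerequisite traversal in the first bullet is, at its core, a backward breadth-first walk over the graph's prerequisite edges. A minimal sketch — the edge dictionary below is illustrative and hand-built for the example, not the Knowledge Graph's actual API or data:

```python
from collections import deque

# Illustrative prerequisite edges: skill -> its direct prerequisites.
# In the real system these edges come from the Knowledge Graph; this
# toy subset exists only to make the traversal concrete.
PREREQS = {
    "5.NF.A.1": ["4.NF.A.1", "4.NF.B.3"],
    "4.NF.A.1": ["3.NF.A.1"],
    "4.NF.B.3": ["3.NF.A.1"],
    "3.NF.A.1": [],
}

def prerequisite_closure(standard: str, edges: dict[str, list[str]]) -> list[str]:
    """All standards reachable by walking prerequisite edges backward,
    in breadth-first order (nearest prerequisites first)."""
    seen, order = {standard}, []
    queue = deque(edges.get(standard, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # skip standards already visited via another path
        seen.add(node)
        order.append(node)
        queue.extend(edges.get(node, []))
    return order

# For a 5th-grade fractions skill, surface the 4th- and 3rd-grade
# concepts that scaffolding items should target.
print(prerequisite_closure("5.NF.A.1", PREREQS))
# → ['4.NF.A.1', '4.NF.B.3', '3.NF.A.1']
```

Breadth-first order matters here: the nearest prerequisites are the most likely remediation targets, while deeper ones only come into play when the gaps go further back.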
What we've learned so far
AI generation is cheap, but evaluation is the bottleneck. Generating a full K-12 item bank costs $25-50. But ensuring those items are mathematically correct, grade-appropriate, and pedagogically sound — that's where the real work is. Our LLM evaluation pipeline achieves 98% mathematical accuracy, but the 2% that slip through can be seriously wrong. Human review remains essential.
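The threshold routing behind that human-review loop can be sketched in a few lines. The dimension names come from the pipeline description earlier; the cutoff value and the min-aggregation are illustrative assumptions, not the production scoring rule:

```python
# The five evaluation dimensions named in the pipeline description.
DIMENSIONS = (
    "factual_accuracy",
    "grade_appropriateness",
    "pedagogical_soundness",
    "schema_validity",
    "completeness",
)

def route_item(scores: dict[str, float], threshold: float = 0.9) -> str:
    """Route an item from its LLM evaluation scores (each in [0, 1]).

    The 0.9 cutoff and the min-aggregation are illustrative choices:
    requiring every dimension to clear the bar means one weak dimension
    (say, a factual error) cannot be averaged away by strong ones.
    """
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing evaluation dimensions: {missing}")
    worst = min(scores[d] for d in DIMENSIONS)
    return "auto_approved" if worst >= threshold else "human_review"
```

Taking the minimum rather than the mean is one way to keep the 2% of seriously wrong items out of the auto-approve path: a single low factual-accuracy score is enough to force human review.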
The graph is more useful than we expected. We initially treated it as a standards lookup table. But the prerequisite relationships turned out to be the most valuable part — they enable adaptive learning paths, diagnostic placement, and intelligent remediation that actually follow the mathematical structure of K-12 learning.
Open infrastructure creates compound value. Every item we generate, evaluate, and calibrate adds to a public commons that any researcher, teacher, or tool builder can use. This is the opposite of the proprietary model where each company's item bank is locked behind licensing agreements. An open item bank with open psychometric data is a public good that gets more valuable as more people use it.
What we're building toward
Open Items is one applied project on top of the Knowledge Graph. We think there should be many. The graph is infrastructure; it's designed to be built on. We're publishing everything — code, items, psychometric data, evaluation results — so others can build on our work just as we built on CZI's.
If you're building on the Learning Commons Knowledge Graph, or thinking about it, we'd love to compare notes. If you're a researcher who needs calibrated assessment items for a study, our item bank is CC-licensed and ready. If you're building educational tools and want to integrate open items via API, we're building that too.
The source code is on GitHub. The live platform is at impact-edu.ai/openitems. Everything is open.