The Epistemology
Students. Courses. Occupations. Employers.
Each unit of analysis traces to one authoritative institution.
The California Community Colleges Chancellor's Office maintains the Management Information System Data Mart — the statewide system of record for enrollment, course-taking, and academic outcomes across all 116 colleges. Every student who enrolls in a California community college is represented in this system.
Kallipolis models student populations that mirror real enrollment patterns reported by each institution. The system surfaces academic trajectories, program concentrations, and competency profiles — the supply side of workforce development made empirical. A coordinator can see not just how many students are in a program, but what skills they carry and how those skills align with regional employer demand.
Student populations are synthetically generated and calibrated to DataMart's grade distributions by program area. Aggregate patterns (enrollment concentration, academic performance, program retention) match institutional reality by design. Synthetic generation is a present-day privacy safeguard, and the architecture is designed to replace it with real records through direct institutional partnership.
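One way to picture calibrated synthetic generation is weighted sampling from a published grade distribution. The sketch below is illustrative only: the grade shares, the function names, and the sampling approach are assumptions, not DataMart figures or the actual Kallipolis generator.

```python
import random
from collections import Counter

# Hypothetical grade shares for one program area.
# Illustrative numbers only, not actual DataMart figures.
GRADE_SHARES = {"A": 0.35, "B": 0.30, "C": 0.20, "D": 0.05, "F": 0.10}


def sample_grades(shares, n, seed=0):
    """Draw n synthetic grades whose expected frequencies match `shares`."""
    rng = random.Random(seed)  # seeded so a run is reproducible
    grades = list(shares)
    weights = [shares[g] for g in grades]
    return rng.choices(grades, weights=weights, k=n)


def observed_shares(grades):
    """Return the share of each grade actually present in the sample."""
    counts = Counter(grades)
    total = len(grades)
    return {g: counts[g] / total for g in counts}


if __name__ == "__main__":
    synthetic = sample_grades(GRADE_SHARES, 20000)
    print(observed_shares(synthetic))
```

With a large enough sample, the observed shares converge on the target shares, which is the sense in which aggregate patterns can match by design while no individual record is real.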
Each college's course catalog is its curricular commitment — the institutional declaration of what it promises to teach, at what depth, with what outcomes. No third-party aggregator has the authority to define what an institution teaches. Kallipolis sources directly from the institution itself.
Every course is interpreted through a skills taxonomy that connects curriculum to labor market language. The system knows not just what courses exist, but what skills they develop and how those skills map to occupational demand. This bridge between education and industry is what makes partnership proposals empirically grounded rather than anecdotal.
Course content is extracted from institutional catalog publications. Learning outcomes and course objectives are interpreted against a controlled skills vocabulary — skills are assigned from the taxonomy, not freely generated. This constraint ensures consistency across institutions and prevents the system from inventing competencies that don't exist in the curriculum.
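The controlled-vocabulary constraint can be sketched as a filter: candidate skills proposed for a course are kept only if they appear in the taxonomy. The taxonomy entries and the function name here are hypothetical, and the real interpretation step is likely more involved than exact string matching.

```python
# Hypothetical controlled vocabulary. Real taxonomy entries are not given
# in the source and would be far more numerous.
TAXONOMY = frozenset(
    {"welding", "blueprint reading", "cnc machining", "quality inspection"}
)


def assign_skills(candidates, taxonomy=TAXONOMY):
    """Keep only candidate skills that exist in the taxonomy.

    Candidates are normalized (trimmed, lowercased), deduplicated, and
    returned in first-seen order. Anything outside the taxonomy is dropped
    rather than added, so no new competency can be invented.
    """
    assigned = []
    for candidate in candidates:
        skill = candidate.strip().lower()
        if skill in taxonomy and skill not in assigned:
            assigned.append(skill)
    return assigned


if __name__ == "__main__":
    proposed = ["Welding", " blueprint reading ", "synergy", "welding"]
    print(assign_skills(proposed))
```

Because every institution's courses pass through the same filter, two colleges that describe the same competency in different words end up with the same skill label.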
The Centers of Excellence for Labor Market Research are the analytical arm of California's community college system. Their institutional purpose is to produce the labor market intelligence that workforce development decisions depend on. COE research is regionally calibrated to community college service areas — more relevant than national BLS or O*NET data for the institutions Kallipolis serves.
For every region, Kallipolis surfaces the occupations that community colleges are positioned to serve — filtered to the credential range between a postsecondary certificate and a bachelor's degree. Each occupation carries regional wages, employment levels, growth projections, and annual openings. Combined with skill alignment data, this turns abstract labor market trends into actionable partnership targets.
COE's regional demand data is filtered to the workforce-development band — the occupations where community college credentials are the pathway. Each occupation is assigned skills from the same controlled taxonomy used for courses, creating a shared vocabulary that makes skill gap identification possible. The system can identify not just what occupations exist, but which skills employers need that the curriculum does not yet develop.
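Because courses and occupations share one vocabulary, a skill gap reduces to a set difference. The following is a minimal sketch under stated assumptions: the ordered list of education levels, the inclusive treatment of both ends of the band, and all names and sample data are hypothetical.

```python
# Hypothetical ordering of typical entry-level education.
EDUCATION_LEVELS = [
    "high school diploma",
    "postsecondary certificate",
    "associate degree",
    "bachelor's degree",
    "master's degree",
]
# Assumption: the band includes both endpoints.
BAND_LOW = "postsecondary certificate"
BAND_HIGH = "bachelor's degree"


def in_band(level):
    """True if `level` falls inside the workforce-development band."""
    low = EDUCATION_LEVELS.index(BAND_LOW)
    high = EDUCATION_LEVELS.index(BAND_HIGH)
    return low <= EDUCATION_LEVELS.index(level) <= high


def skill_gaps(occupations, curriculum_skills):
    """Map each in-band occupation to the skills the curriculum lacks.

    `occupations` maps a name to (education level, set of skills).
    """
    taught = set(curriculum_skills)
    gaps = {}
    for name, (level, skills) in occupations.items():
        if in_band(level):
            gaps[name] = sorted(set(skills) - taught)
    return gaps


if __name__ == "__main__":
    occupations = {
        "Welder": ("postsecondary certificate", {"welding", "blueprint reading"}),
        "Surgeon": ("master's degree", {"surgery"}),
    }
    print(skill_gaps(occupations, {"welding"}))
```

An empty list for an occupation means the curriculum already covers every skill assigned to it. A non-empty list is a candidate for curriculum development.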
The California Employment Development Department maintains employer records for every organization with payroll obligations in the state. EDD's administrative reach — processing payroll tax for California's entire employer base — gives it coverage no private database can match. These are verifiable, publicly maintained records that carry institutional legitimacy in public-sector procurement.
Kallipolis surfaces only the employers that community colleges can meaningfully engage: organizations with operational capacity for workforce partnerships. Each employer is connected to the occupations it hires for, the skills those roles require, and the curriculum that develops those skills. A coordinator sees not a list of companies, but a ranked landscape of partnership-ready organizations with empirical alignment scores.
Employer records are filtered to organizations above a size threshold that ensures partnership capacity. Industry classification connects each employer to plausible occupations through a sector-to-role crosswalk. The result is a curated set of real organizations — not a comprehensive economic census, but a workforce development lens on the employers that matter for institutional action.
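The size filter and the sector-to-role crosswalk can be sketched as two small steps. The threshold value, the sector labels, the crosswalk entries, and the employer names are all placeholders chosen for illustration. The source does not state the actual threshold or classification scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employer:
    name: str
    sector: str
    employees: int


# Hypothetical values. The real threshold and crosswalk are not given.
MIN_EMPLOYEES = 50
SECTOR_TO_ROLES = {
    "manufacturing": ["Machinist", "Welder"],
    "health care": ["Medical Assistant"],
}


def partnership_candidates(employers, min_employees=MIN_EMPLOYEES,
                           crosswalk=SECTOR_TO_ROLES):
    """Drop employers below the size threshold, then attach plausible roles.

    Returns a dict of employer name -> list of roles from the crosswalk.
    Sectors missing from the crosswalk yield an empty role list.
    """
    candidates = {}
    for employer in employers:
        if employer.employees < min_employees:
            continue  # too small to count as partnership-ready here
        candidates[employer.name] = list(crosswalk.get(employer.sector, []))
    return candidates


if __name__ == "__main__":
    records = [
        Employer("Example Fabrication Co", "manufacturing", 220),
        Employer("Corner Clinic", "health care", 12),
    ]
    print(partnership_candidates(records))
```

The crosswalk yields plausible roles, not confirmed openings, which is consistent with the limitation that employer records capture registration rather than hiring intent.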
Forward Deployment
Student data is synthetic, not real enrollments. Catalog PDFs lag behind live curricula. COE granularity varies by region. EDD captures registered employers, not hiring intent.
Each of these limitations is an invitation to collaborate. The pipeline is architected for institutional partnership. Direct MIS feeds replace synthetic students. Curriculum API access makes courses real-time. Local industry contacts validate employer readiness. Forward-deployment into the institution closes the gaps that distance creates.
The Pipeline
Each data authority feeds a dedicated pipeline stage. The stages converge in a single graph where curriculum, labor market, and institutional data connect through a shared skills vocabulary.
Catalog extraction and skill derivation
COE demand feed and OEWS supplement
EDD scraping and LLM enrichment
Synthetic generation calibrated to DataMart
Graph loading into the unified knowledge model
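To illustrate how the stages might converge in one graph, the sketch below stores typed nodes and undirected edges, then walks from an occupation through shared skills to the courses that develop them. The class, the node kinds, and the sample data are assumptions made for illustration. The source does not describe the actual graph store or schema.

```python
from collections import defaultdict


class SkillGraph:
    """Undirected graph whose nodes are (kind, name) tuples."""

    def __init__(self):
        self.edges = defaultdict(set)

    def add_edge(self, a, b):
        self.edges[a].add(b)
        self.edges[b].add(a)

    def neighbors(self, node, kind):
        """Return the neighbors of `node` that have the given kind."""
        return {n for n in self.edges.get(node, set()) if n[0] == kind}


def courses_for_occupation(graph, occupation):
    """Find courses linked to an occupation through at least one shared skill."""
    courses = set()
    for skill in graph.neighbors(("occupation", occupation), "skill"):
        for course in graph.neighbors(skill, "course"):
            courses.add(course[1])
    return sorted(courses)


if __name__ == "__main__":
    g = SkillGraph()
    # Hypothetical course, skill, and occupation names.
    g.add_edge(("course", "WELD 101"), ("skill", "welding"))
    g.add_edge(("course", "ENGR 110"), ("skill", "blueprint reading"))
    g.add_edge(("occupation", "Welder"), ("skill", "welding"))
    g.add_edge(("occupation", "Welder"), ("skill", "blueprint reading"))
    print(courses_for_occupation(g, "Welder"))
```

Skill nodes are the only join point between courses and occupations in this sketch, which mirrors the role the shared vocabulary plays in the text above.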