Nineteen Publishable Artifacts · Voice · Agents · Telehealth · Knowledge Graphs · One Trajectory

Notes from the Clinical Frontier

A working notebook of preliminary manuscripts, prototypes, and open questions across healthcare AI — benchmarks, fine-tuned open models, voice and telehealth agents, clinical knowledge graphs, and the safety evaluations that hold them honest.

I'm Chandra Vikram, a healthcare AI engineer. My path here wasn't a straight line. I studied pharmaceutical sciences for my bachelor's, and in my final year I realised what kept pulling me back wasn't the lab bench but the technology behind modern medicine. That recognition turned into a Master's in Health Informatics at Indiana University, and from there into building real systems at the intersection of clinical workflows and AI. Coming in from a non-engineering background taught me something I keep coming back to: the most useful clinical AI gets built by people who can speak both languages, the clinic and the codebase. I presented at the AMIA FHIR App Challenge in San Francisco in 2024. You can reach me at chandravikram10@outlook.com.

I believe healthcare AI, when it's done with rigour, with safety treated as a first-class engineering constraint, and with clinicians firmly in the loop, is among the most transformative technologies of our generation. The right systems shorten the path from question to answer in clinical reasoning, return clinician time to patients, and surface evidence that even careful experts can miss. The wrong systems do real harm. Every project below is built around that distinction: safety as a deliverable rather than a footnote, evaluation pinned against published baselines, and failure modes named before any benchmark number is claimed.

Below are nineteen of the projects I'm currently working on, spanning FHIR-grounded infrastructure, clinical benchmarks, safety and alignment, voice and telehealth agents, knowledge graphs, and retrieval evaluation. Each entry has its own page describing what the project is about, the approach I'm exploring, and the papers I'm currently reading as supporting work. Every citation links out, so you can follow the references yourself. Click any entry in the index to dive in.

These are ideas I find genuinely challenging and important enough to spend real time on. Some are infrastructure plays. Some are safety evaluations I think the field needs but no one has built. Some are clinical workflows that have stayed stubbornly broken for years. Each entry sketches the problem I'm trying to address and the approach I'm exploring. Nothing here is finished, and nothing is being claimed. This is a working notebook.

Index of Works

19 entries

01	Atrium	Infrastructure	A reference Model Context Protocol server exposing FHIR clinical data via SMART-on-FHIR launch.	Read paper
02	Caliper	Benchmark	A FHIR-grounded extension of HealthBench with a public cross-model leaderboard.	Read paper
03	Asclepius	Safety · Red-team	An adversarial benchmark for medical LLM jailbreaks and sycophantic capitulation.	Read paper
04	Longitude	Benchmark · Long-context	Diagnostic reasoning over decade-long longitudinal patient records, 150k–500k tokens each.	Read paper
05	Oracle	Agent · Reasoning	A differential-diagnosis agent that emits evidence-grounded reasoning traces, citation per claim.	Read paper
06	Safeguard	Domain · Clinical Pharmacy	A pre-dispatch safety validator for total parenteral nutrition orders. The unfair advantage.	Read paper
07	Auris	Multimodal · Speech	Ambient clinician-patient voice to validated FHIR resources, end-to-end.	Read paper
08	Chartwalker	Agent · Computer-use	A Claude-driven agent navigating a real EHR interface with a deterministic grading harness.	Read paper
09	Reason·Med	Fine-tune · Reasoning	An open clinical reasoning model trained via continued pretraining, SFT, and GRPO on Qwen3-8B.	Read paper
10	Conscience	Alignment · Fine-tune	Constitutional AI applied to clinical decision support, fine-tuned on Qwen3-8B.	Read paper
11	Triagemind	Agent · ED Triage	A four-agent ED triage system with calibrated uncertainty, red-flag screens, and structured handoff — pinned to Sax et al.'s 32.2% ESI mistriage baseline.	Read paper
12	Calline	Voice · Triage Line	Voice-first after-hours nurse triage agent: streaming ASR, gpt-realtime, sub-second TTS, uncertainty-gated escalation.	Read paper
13	Telesight	Telehealth · Visit Copilot	A three-phase telehealth copilot covering pre-visit chart prep, intra-visit CDS with Five-Rights gating, and post-visit instructions plus coding.	Read paper
14	Pharos	Voice · Chronic Disease	Long-horizon voice agent for HF and type-2 diabetes with persistent memory and clinician oversight loop. Targets 80% adherence at 12 weeks.	Read paper
15	Vestibule	Voice · Transitions	Post-discharge transition agent with a 24h/48h/72h/7d voice-call cadence pinned against Jencks's 19.6% 30-day Medicare readmission baseline.	Read paper
16	Medigraph	Knowledge Graph	Patient-centric clinical knowledge graph from FHIR + clinical notes + UMLS / SNOMED / RxNorm / LOINC. Composes with Atrium.	Read paper
17	Graphcore	GraphRAG	Microsoft GraphRAG methodology applied to clinical guideline corpora — first cost-quality Pareto curve for clinical GraphRAG.	Read paper
18	Chaincite	RAG · Faithfulness	Clinical RAG benchmark measuring citation correctness AND faithfulness — built on Wallat et al.'s ICTIR 2025 distinction.	Read paper
19	Ragprobe	RAG · Adversarial	Adversarial robustness benchmark for clinical RAG — PoisonedRAG / BadRAG / GARAG / Phantom / indirect-injection on clinical corpora.	Read paper

Atrium

A reference Model Context Protocol server for SMART-on-FHIR healthcare data — the plumbing every clinical AI team is currently rebuilding.

Infrastructure Tooling · Protocol

The Problem

Every healthcare AI team rebuilds the same plumbing between FHIR resources and language models. There is no canonical, production-grade Model Context Protocol server for clinical data. Hospitals that want to adopt MCP-enabled agents will either build it themselves or pay a vendor — and neither path serves the field.

What You'll Build

A reference MCP server, written in TypeScript, that exposes a SMART-on-FHIR sandbox as MCP tools and resources. The first release covers seven resource types — Patient, Observation, Condition, MedicationRequest, Encounter, DiagnosticReport, AllergyIntolerance — with paginated queries, code-system search, and longitudinal slicing. Authentication via SMART OAuth. Full audit logging for HIPAA defensibility. A synthea-seed companion script ships a realistic synthetic patient cohort so any developer can try it in under five minutes.
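As a rough illustration of what "paginated queries, code-system search, and longitudinal slicing" could look like underneath an MCP tool, here is a minimal sketch in TypeScript. It is not Atrium's actual code: the names `ObservationQuery`, `buildObservationSearchUrl`, and `nextPageUrl`, the default page size of 50, and the example base URL are all assumptions made for illustration. The only parts taken from the FHIR specification are the standard search parameters (`patient`, `code` as a `system|code` token, `date` with `ge`/`le` prefixes, `_count`) and the `next` relation in `Bundle.link`.

```typescript
// Hypothetical helper types and names, for illustration only.
interface ObservationQuery {
  patientId: string;
  loincCode?: string; // code-system search, e.g. "4548-4"
  since?: string;     // lower date bound for longitudinal slicing (ISO date)
  until?: string;     // upper date bound (ISO date)
  pageSize?: number;  // mapped to the FHIR _count parameter
}

interface BundleLink {
  relation: string;
  url: string;
}

interface Bundle {
  resourceType: "Bundle";
  link?: BundleLink[];
}

// Build a FHIR search URL for Observation resources.
function buildObservationSearchUrl(baseUrl: string, q: ObservationQuery): string {
  const params: [string, string][] = [["patient", q.patientId]];
  if (q.loincCode) {
    // Token search takes the form system|code.
    params.push(["code", `http://loinc.org|${q.loincCode}`]);
  }
  if (q.since) params.push(["date", `ge${q.since}`]);
  if (q.until) params.push(["date", `le${q.until}`]);
  // Assumed default page size; a real server would make this configurable.
  params.push(["_count", String(q.pageSize === undefined ? 50 : q.pageSize)]);

  const query = params
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join("&");
  // Strip trailing slashes so the path is joined with exactly one "/".
  return `${baseUrl.replace(/\/+$/, "")}/Observation?${query}`;
}

// Return the URL of the next page of a search Bundle, if the server sent one.
function nextPageUrl(bundle: Bundle): string | undefined {
  for (const link of bundle.link || []) {
    if (link.relation === "next") return link.url;
  }
  return undefined;
}
```

A tool handler built on this would call `buildObservationSearchUrl` once, fetch the resulting Bundle with the SMART access token, and keep following `nextPageUrl` until it returns `undefined` or a caller-imposed page limit is reached. Keeping URL construction separate from the network call makes the query logic easy to unit-test against a Synthea-seeded sandbox.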

Why It Lands

MCP is Anthropic's protocol. A polished, well-tested, well-documented healthcare MCP server is the kind of artifact that ends up in their official integrations registry, and that gets read directly by the team that built MCP. It also becomes load-bearing infrastructure for projects 05, 06, and 07 in this dossier.

Twelve-Month Roadmap

Three Rules That Make It Ship

Publication standard is per tier — never skipped.

No project starts without its evaluation defined.

Something ships every Friday.

Stack Appendix

Models & Bases

Datasets

Training & Compute

Eval & Tooling