VIRTUAL PATIENT · CRETIC PROJECT

Artificial Intelligence
for clinical reasoning research.

A multimodal AI-driven simulation environment. A multi-agent care team converses with the trainee; a real-rhythm physiologic monitor runs against emergency-driven scenarios with down-going trajectories; real-time affect capture and live process / sequence / transition-network analytics turn the session into a research-grade learning-analytics stream — every event row enriched with the contemporary vital signs.

Rohy in voice mode — a 3D patient avatar with a cinema-style subtitle overlay reading 'I feel dizzy and lightheaded', live multi-lead ECG and plethysmograph, a vital-sign panel showing tachycardia and hypoxia, and an alarm banner for low diastolic pressure.
AI-driven Multi-agent Multimodal Real rhythm engine Emergency scenarios Real-time affect Transition network analytics Vital-enriched event log
CRETIC project emblem — a red heart with an inscribed ECG trace.
RESEARCH CONTEXT

Built for CRETIC. Useful everywhere.

Rohy is engineered to support CRETIC — Optimizing Clinical Reasoning in Time-Critical Scenarios: A data-driven multimodal approach, a Research Council of Finland project that develops an innovative gamified platform combining virtual patients and educational-escape-room elements to simulate clinical emergencies for healthcare students and trainees.

The platform is built to serve that research agenda — but is deliberately general. Every component above works in a non-CRETIC simulation context.

Project led by Sonsoles López-Pernas → · School of Computing, University of Eastern Finland.

FUNDED BY THE RESEARCH COUNCIL OF FINLAND
MULTI-AGENT PLATFORM

A care team, not a chatbot.

Real clinical encounters are not a single back-and-forth with one person. A trainee — alone or working as part of a team — moves between the patient, the patient's family when the patient cannot give a history themselves, the bedside nurse who holds the medication record and the recent vitals, and the on-call consultant brought in for a second opinion. Each interlocutor is a distinct AI agent with its own persona, voice, knowledge scope, and arrival cue. Conversations branch and reconverge in a single session transcript, with the dialogue fidelity an actual clinical interview demands.

  • Patient interview — natural-language history-taking from a model that holds the case's clinical narrative, stays in role, and only volunteers what a patient at this stage of the encounter would plausibly volunteer.
  • Family or surrogate — for the patient who cannot speak for themselves (unconscious, intoxicated, paediatric, post-arrest, dementia, aphasia), the family agent carries the collateral history. They know the home medications and what happened on the way to the ED; they do not know the labs.
  • Bedside nurse and on-call consultant — the nurse holds the running medication record, the most recent set of vitals, and what was tried in the last few minutes. The consultant is paged in with a 1–3 minute server-anchored arrival and brings the broader differential. Per-case rosters script who is present from the start and who arrives on what cue.
  • Solo or team problem-solving — a single trainee orchestrates the whole team in one session; cohort or team training reviews and analyses the same transcript exactly as the dialogue happened. Sessions are recorded turn-by-turn for replay and debrief.
  • Knowledge scope is admin-controlled per agent, per case — the patient does not know the consultant's notes, the family does not know the labs, the nurse does not hold the post-discharge plan. The information asymmetry is the realism.
  • Local LLM backends (LM Studio, Ollama) are first-class; cloud providers are optional. Runtime model switching with per-user usage tracking.
PATIENT MONITOR

A real monitor, with a real rhythm engine.

Sum-of-Gaussians waveform synthesis renders morphologically correct PQRST-T at any heart rate. Rhythms compose with modifier overlays — ST elevation, ST depression, electrolyte derangements, conduction blocks, ectopic beats, signal noise — so a single engine can drive the full continuum from physiological baseline through pathological extreme.

  • Heart rate, SpO₂, NIBP, respiratory rate, temperature, and EtCO₂ as independent channels with admin-editable display ranges and alarm thresholds.
  • A pharmacokinetically curated treatment-effects engine with explicit onset, peak, and duration parameters; time-decaying changes propagate to the relevant vitals automatically.
  • Manually pinned vitals and rhythm states survive engine ticks — the override is the educator's intent, not a transient.
  • Snapshot binding: admin edits to a case during a running session do not bleed into the running monitor.
EMERGENCY-DRIVEN SCENARIOS

Designed to deteriorate.

Cases run on a time-keyframed scenario engine with explicit down-going trajectories. Vitals drift toward decompensation unless the trainee intervenes appropriately. Alarms cross severity thresholds, banners surface, audio escalates; acknowledgement and snooze route the alarm through audit history rather than silencing it.

  • Time-keyframed vital trajectories with interpolation between frames; the engine ticks until trainee action redirects the course.
  • Severity-bounded alarms — urgent, beep, chime, silent — routed simultaneously to banner, toast, audio, and audit surfaces.
  • A central notification dispatcher with one routing matrix per role; no parallel toast/banner/alarm paths to silently disagree.
  • Cross-case acknowledgement clearing — case A's acknowledged alarms never silence brand-new alarms in case B.
MULTIMODAL CAPABILITIES

Text. Voice. 3D talking head.

The trainee can type to the patient or speak. A 3D talking head responds with viseme-driven lipsync; in voice mode, a glassy cinema-style subtitle overlay renders the current line as the model streams it. Sentence-level pre-fetching keeps time-to-first-audio sub-second on local engines.

  • Browser STT for input, continuous mode with auto-pause when the assistant speaks — no round-trip to a cloud transcription service required.
  • Local TTS backends (Kokoro in-process ONNX, Piper subprocess) ship by default; cloud TTS (Google, OpenAI) is optional.
  • Viseme-driven lipsync over the canonical morph-target ordering; every cross-platform avatar in the library conforms to the same rig.
  • Cinema-style subtitle overlay updates token-by-token during a streaming reply; click anywhere to reveal the full transcript.
LABORATORY & RADIOLOGY ROOMS

Order. Wait. Read.

Laboratory and Radiology are first-class rooms in the simulation, not modal popups. A worklist tracks pending orders and ready reports; clicking a ready row opens the hospital-style report inline and adds it to a dismissible pill stack. Reference ranges are gender-specific where it clinically matters; flags surface high/low/abnormal automatically.

  • Searchable lab catalogue grouped by clinical category, with panel templates for common workups (Acute MI, DKA, Sepsis, Stroke, PE, …).
  • Radiology suite covering X-ray, CT, MRI, ultrasound, cardiac (12-lead ECG, echocardiogram), nuclear medicine, and fluoroscopy.
  • Realistic turnaround band so the wait pattern is felt without grinding a teaching slot to a halt.
  • Per-case admin editor for abnormal reports; image and video upload for case-attached findings.
Laboratory Investigations screen — a searchable test catalogue on the left, a hospital-style Laboratory Report in the centre showing a Hematocrit result with reference range and flagging, and a worklist on the right with READY and VIEWED sections. The Laboratory room carries a numeric badge dot in the bottom navigator.
PHYSICAL EXAMINATION

Where you tap matters. So does how.

An anatomically accurate body silhouette with invisible polygon hit regions surfaces named exam areas. Each region accepts inspection, palpation, percussion, and auscultation as discrete techniques; auscultation includes audio playback at the relevant location.

  • Anterior, posterior, and lateral views; gender-specific.
  • Cranial nerves, motor, sensory, reflexes, and coordination as discrete examinables alongside the regional map.
  • Idempotent recording keyed on (session, region, technique) — retries do not duplicate findings.
Physical Examination screen — an Anterior/Posterior body silhouette with the chest region highlighted, an examination-technique chooser with Auscultation selected, and a clinical narrative reading 'Vesicular breath sounds are heard throughout all lung zones with no added sounds…'.
RESEARCH-GRADE LEARNING ANALYTICS

Clustering, pattern mining, transition networks — in real time.

A research-grade learning-analytics dashboard runs against the live session. Every trainee action is encoded as a verb-typed event, fed through unsupervised clustering, sequence and process mining, and surfaced as state distributions, directed weighted transition networks, and temporal proportion plots — per cluster and per session. The pipeline is the same one a learning-analytics researcher would build by hand. It runs while the case is still in progress.

  • Real-time clustering of trainee action sequences via unsupervised machine learning. Silhouette score, average sequence length per cluster, and per-cluster state-distribution histograms are all reported on the same dashboard.
  • Sequence and process mining — ordered trajectories per session, state frequencies, density, edge counts, temporal stacked plots showing how state proportions evolve across the timesteps of the case.
  • Transition-network analytics — directed weighted graphs with edge probabilities, in-strength centrality, and a per-cluster network so behavioural archetypes can be compared side-by-side.
  • Pattern mining over the verb-coded action stream surfaces recurring micro-sequences (communicating → regulating → examining → treating, …) that distinguish a cluster from the cohort baseline.
Research-grade learning-analytics dashboard. A top strip shows session-level statistics — 65 sequences, 783 events, 10 states, 45.0% density, 45 edges — alongside '3 clusters found, silhouette score 0.393'. Below, three columns for Cluster 1 (7 sequences, 11%, avg length 35.6), Cluster 2 (44 sequences, 68%, avg length 6.6), and Cluster 3 (14 sequences, 22%, avg length 17.1), each with a horizontal state-distribution bar chart (communicating, navigating, regulating, examining, treating, …), a circular directed-weighted transition network, and a stacked-bar temporal proportion plot showing how state mix evolves across the timesteps of each cluster.
REAL-TIME EMOTION CAPTURE

Affect on-device. Joined to the action stream.

Camera-side emotion analytics runs entirely in the browser. Multi-task affect models produce dominant-emotion estimates and valence/arousal regression at a configurable sample rate; only aggregated windows leave the device. The affect stream shares the session, the time base, and the active room with the action stream — so behavioural trajectories and affective trajectories can be analysed jointly without an offline join.

  • Browser-only inference (MediaPipe face landmarker + ONNX Runtime Web). Aggregated windows only — raw frames and landmarks never leave the device; the server validator rejects either at the wire.
  • Valence and arousal as time series; dominant emotion as a categorical estimate; a transition network on the affect side mirrors the one on the action side.
  • Per-tenant capture configuration: window length, sample interval, minimum valid frames, smoothing alpha, hold time, switch confidence.
  • Frozen-at-write row-visibility flags — flipping a tenant toggle never retroactively exposes data captured under stricter rules.
Real-time emotion-capture dashboard. Top-left: capture timeline with valence and arousal traces over hundreds of windows. Top-right: emotion-distribution bar chart across six states. A statistics strip below reports analyzed windows, latest state, instability, and phase. Bottom: a circular transition-network graph with weighted edges between emotion nodes and self-loops on most nodes.
VITAL-ENRICHED EVENT STREAM

Every event row carries the patient's vitals — and the trainee's affect.

The session is not a black box. Every navigation, examination, order, treatment, message, and alarm response is captured as a verb-typed event; each row is enriched server-side with the active room, the contemporary vital signs — heart rate, blood pressure, SpO₂, respiratory rate, temperature, EtCO₂ — and the corresponding affect window at the moment the action occurred.

  • Verb-coded action stream merged into clinical labels for downstream analysis.
  • Each row stamped with the active room (Patient · Examination · Laboratory · Radiology · Consultant) so behavioural trajectories segregate by setting.
  • Vital-sign columns on every event let behavioural sequences correlate directly with physiologic state — no offline merge required.
  • Affect columns on every event row (valence, arousal, dominant emotion) close the loop on multimodal analysis.
  • Cohort export in CSV or JSON: login logs, chat logs, complete-session bundles, questionnaire responses.
CASE DEBRIEF

A tutor, not a judge.

The End & Debrief button hands the finished case to a Socratic discussant — a senior clinician-educator persona with its own voice, avatar, and LLM, separate from the patient. The opening turn is sent as a silent LLM call so meta-prompts never appear in the learner-utterance audit trail; the debrief itself probes reasoning, scaffolds answers, and asks before telling.

  • Separate persona, voice, avatar, and LLM from the patient; per-case overrides are first-class.
  • Captures structured feedback into session notes for cohort export.
  • Same multimodal infrastructure as the patient — text or voice, lipsync, subtitle overlay.
Case Debrief landing screen — patient portrait and chief complaint on the left, a Default Discussant persona labelled 'Case Debrief Tutor' on the right with a Start debrief button. The Consultant room is active in the bottom navigator.
YOUR INFRASTRUCTURE

Local-first. Cloud optional.

Local LLMs and local TTS are first-class. Cloud providers are optional. The default install path puts a local voice in the patient's mouth and routes inference through a local model server on the same host. Air-gapped installation is a documented operator path.

Multi-tenant ready, role-hierarchy aware, audit-logged, soft-deleted with right-to-erasure purge, instrumented with structured observability.

LOCAL
Kokoro TTS Piper TTS LM Studio Ollama MediaPipe ONNX Runtime Web
CLOUD (OPTIONAL)
Anthropic Claude OpenAI Google Gemini Google Cloud TTS OpenAI TTS
INSTALL & UPDATE

Install in the shape that fits the site.

Four install paths and two upgrade paths cover the realistic deployment surface — from a single classroom laptop to an air-gapped institutional host. The same operator tool handles in-place upgrades with snapshot and automatic rollback on failure.

Install

Linux + systemd from source

Full production install — bootstrap script wires up rohy.service, nginx, certbot, and the operator user.

sudo deploy/bootstrap.sh \
--frontend-url=https://your-host/rohy

Single-machine

Personal devbox, classroom lab, demo box. No systemd, no nginx — runs directly under your user account.

bash deploy/local-install.sh

Air-gapped

Self-contained source tarball plus a Docker image tarball, both sha256-verified. Documented operator path for sites without internet.

bash deploy/bundle-airgap.sh
# transfer + install per docs
Update

Source / systemd

Snapshot the DB, run the pre-flight, apply, verify with smoke and the 27-check tech-test; automatic rollback on failure.

sudo rohy-update check
sudo rohy-update apply
sudo rohy-update rollback # if you need it

Docker

Bump the image tag. Migrations auto-run on the new container's first boot; rollback is a pull of the previous tag.

docker compose pull rohy
docker compose up -d
ABOUT THE AUTHOR

Built by a medical doctor — for medical doctors.

“As a computer scientist now — who was a medical doctor before — I was thinking of how to link computer science, analytics, clinical reasoning, and medicine.”

Mohammed Saqr is a Professor of Computer Science at the University of Eastern Finland. He studied, trained, and practised medicine at the Faculty of Medicine, University of Alexandria, where he specialised in Neurology at the university's flagship teaching hospital. He taught clinical reasoning to medical students and residents, and has researched medicine, medical education, learning analytics, and problem-based learning extensively over the years.

Rohy is the product of that joint perspective. The clinical surfaces — symptom-led interview, alarm-bounded vitals, family-as-surrogate when the patient cannot speak, a consultant page that takes time — reflect what a clinician would actually want a trainee to encounter. The analytics surfaces — verb-coded events, real-time clustering, sequence and process mining, transition networks, vital- and affect-enriched logs — are what a learning-analytics researcher would expect to find. Designed by a medical doctor, for the medical doctors and educators who train the next ones.

saqr.me → · Professor of Computer Science · University of Eastern Finland · Trained neurologist · Alexandria Faculty of Medicine