Interpretable ML + Modern React, built for clinicians and patients alike
A full-stack clinical decision support system that surfaces early diabetes risk signals from routine patient data.
Combines an interpretable ML model with a modern React frontend, presenting results tailored for both clinicians and patients.
Warning
Medical Disclaimer: This system is intended for educational and research purposes only. It does not provide medical diagnoses and should not be used as a substitute for professional medical advice.
- Why Clinical Insight Engine?
- Key Features
- Architecture
- Tech Stack
- Getting Started
- Project Structure
- API Reference
- ML Pipeline
- Single-Patient Prediction
- Environment Variables
- Troubleshooting
- Roadmap
- Contributing
- Contributors
Diabetes affects over 500 million adults worldwide, yet early risk signals are often buried in routine clinical data. Clinical Insight Engine bridges that gap:
| Problem | Our Approach |
|---|---|
| Risk models are opaque black boxes | Interpretable Logistic Regression with per-feature impact scores |
| Results are one-size-fits-all | Dual-view output: detailed for clinicians, simplified for patients |
| Predictions lack context | Confidence-aware assessments with actionable follow-up recommendations |
| Patient data sits in silos | Longitudinal tracking with full assessment history |
Collects clinically relevant inputs:
Age · Gender · Hypertension · Heart Disease · Smoking History · BMI · HbA1c · Blood Glucose
Results are presented in two views: Clinician View and Patient View.
- Stores assessments with full timestamps
- Enables longitudinal patient risk tracking over time
- Interactive bar charts for factor contributions
- Diabetes correlation heatmap for data exploration
graph TB
subgraph Client["Client: React + TypeScript"]
UI["Risk Assessment Form"]
CV["Clinician View"]
PV["Patient View"]
VIZ["Data Visualizations"]
HIST["Assessment History"]
end
subgraph Server["Server: Express.js"]
API["REST API Routes"]
VAL["Zod Validation"]
ORM["Drizzle ORM"]
PY["Python Bridge"]
end
subgraph ML["ML Pipeline: Python"]
PROC["Data Preprocessing"]
MODEL["Logistic Regression"]
INTERP["Feature Interpretation"]
CACHE["Model Cache (pickle)"]
end
subgraph DB["PostgreSQL"]
ASSESS["Assessments Table"]
end
Client -->|"HTTP Requests"| API
API --> VAL --> ORM
API --> PY -->|"spawn process"| ML
ORM --> DB
ML -->|"risk scores + factors"| PY
CACHE -.->|"load cached model"| MODEL
| Layer | Technology | Purpose |
|---|---|---|
| Frontend | React 18 + TypeScript | UI framework with type safety |
| | Vite | Lightning-fast dev server & bundler |
| | Tailwind CSS | Utility-first styling with dark mode |
| | TanStack Query | Server state & cache management |
| | React Hook Form + Zod | Form handling with schema validation |
| | Recharts | Interactive data visualizations |
| | Framer Motion | Smooth UI animations |
| Backend | Express.js | REST API server |
| | Drizzle ORM | Type-safe database queries |
| | PostgreSQL 14+ | Relational data storage |
| | Zod | Runtime schema validation |
| ML Pipeline | Python 3.10+ | ML runtime environment |
| | scikit-learn | Logistic Regression model |
| | pandas / NumPy | Data manipulation & preprocessing |
| | pickle | Model & scaler caching |
| Tool | Version | Check | Download |
|---|---|---|---|
| Node.js | 18+ LTS | `node -v` | nodejs.org |
| npm | 9+ | `npm -v` | bundled with Node |
| Python | 3.10+ | `python3 --version` | python.org |
| PostgreSQL | 14+ | `psql --version` | postgresql.org |
| Git | Any | `git --version` | git-scm.com |
| Docker | 20+ | `docker --version` | docker.com |
| Docker Compose | 2+ | `docker compose version` | bundled with Docker |
If you have Docker installed, you can skip the manual installation of Node.js, Python, and PostgreSQL entirely and start the application with a single command, run from the project root:

docker compose up

This command will:
- Spin up a PostgreSQL 16 database container with persistent storage.
- Build the app container including Node.js 20 and a Python 3 virtual environment with all scikit-learn/pandas dependencies.
- Wait for the database to be healthy, then run migrations (`npm run db:push`).
- Automatically seed the database with sample clinical assessments (in development mode).
- Launch the full-stack server with live-reloading (HMR) enabled.

Once started, open your browser and navigate to:
- Web App & REST API: http://localhost:3000

To stop the services while preserving your data:

docker compose down

To stop the services and completely reset the database (deleting persistent volumes):

docker compose down -v

If you update `package.json` or `requirements.txt` dependencies, trigger a clean rebuild:
docker compose up --build

git clone https://github.com/gopaljilab/Clinical-Insight-Engine.git
cd Clinical-Insight-Engine

npm install

Linux / macOS

cp .env.example .env

Windows (PowerShell)

Copy-Item .env.example .env

Windows (Command Prompt)

copy .env.example .env

If `.env.example` doesn't exist, create `.env` manually and add:

DATABASE_URL=postgresql://postgres:postgres@localhost:5432/clinical_insight_engine

Developer Authentication Setup (optional)
For local frontend authentication testing, create a .env.local file (git-ignored):
NODE_ENV=development
NEXT_PUBLIC_APP_URL=http://localhost:3000
DEV_CLINICIAN_EMAIL=developer@cardioguard.local
DEV_CLINICIAN_PASSWORD=DevSecurePassword123!
NEXT_PUBLIC_LOCAL_ENCRYPTION_KEY=your_local_32_character_secret_key_here

Rules of thumb:
- `.env`: database & server secrets only
- `.env.local`: local seeded credentials only (never commit)
- Restart the dev server after editing `.env.local` so Vite reloads variables
- Never paste demo credentials into UI, docs, screenshots, or PRs
- Start the app with `npm run dev`
- Open `http://localhost:5173`
- Click Login or Go to App
- Enter your `.env.local` seeded credentials
- Complete the simulated OTP step
- You'll be redirected to `/dashboard`
In development mode, the login form shows a small amber notice reminding you to use local seeded credentials. This banner and the `DEV_*` variables are never exposed in production builds.
Linux (Ubuntu / Debian)

# Install PostgreSQL
sudo apt update && sudo apt install postgresql postgresql-contrib
# Start & enable the service
sudo systemctl start postgresql
sudo systemctl enable postgresql
# Create database & set password
sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'postgres';"
sudo -u postgres psql -c "CREATE DATABASE clinical_insight_engine;"

macOS (Homebrew)

# Install PostgreSQL
brew install postgresql
# Start the service
brew services start postgresql
# Create database & set password
psql postgres -c "ALTER USER postgres WITH PASSWORD 'postgres';"
psql postgres -c "CREATE DATABASE clinical_insight_engine;"

Windows
- Download and install PostgreSQL from postgresql.org/download/windows
- During installation, use:
  - Username: `postgres`
  - Password: `postgres`
  - Port: `5432`
- Create a database named `clinical_insight_engine` using pgAdmin or the PostgreSQL CLI.
- Update your `.env` file:

DATABASE_URL=postgresql://postgres:postgres@localhost:5432/clinical_insight_engine

Push the database schema:

npm run db:push

The server runs a PostgreSQL preflight check on startup. If you see `Database startup check failed`, verify that:
- PostgreSQL service is running
- `DATABASE_URL` in `.env` is correct
- The migration above has been run
- Port `5432` is not blocked
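
When checking whether `DATABASE_URL` is correct, it can help to see how the connection string breaks down into user, host, port, and database name. The following is a minimal sketch using only the Python standard library; the helper name `describe_database_url` is hypothetical and not part of this project.

```python
from urllib.parse import urlsplit


def describe_database_url(url: str) -> dict:
    """Split a PostgreSQL connection string into its parts (password omitted)."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port or 5432,  # fall back to PostgreSQL's default port
        "database": parts.path.lstrip("/"),
    }


info = describe_database_url(
    "postgresql://postgres:postgres@localhost:5432/clinical_insight_engine"
)
print(info)
```

If the printed host, port, or database name differs from your local PostgreSQL setup, the preflight check is likely to fail for that reason.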
Linux / macOS

# Create virtual environment
python3 -m venv .venv
# Activate
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt

Windows (PowerShell)

# Create virtual environment
py -m venv .venv
# Activate
.\.venv\Scripts\Activate.ps1
# Install dependencies
pip install -r requirements.txt

If the dataset already exists in the project:

# Linux / macOS
cp attached_assets/diabetes_dataset.csv ./diabetes_dataset.csv
# Windows (PowerShell)
Copy-Item attached_assets/diabetes_dataset.csv ./diabetes_dataset.csv

If the dataset is missing, generate synthetic data:

# Linux / macOS
python3 -c "from analyze import create_synthetic_data; create_synthetic_data()"
# Windows
py -c "from analyze import create_synthetic_data; create_synthetic_data()"

# Start the full-stack dev server
npm run dev

| Service | URL |
|---|---|
| Frontend | http://localhost:5173 |
| Backend API | http://localhost:3000 |

Stop the dev server:

Ctrl + C

Deactivate the Python environment:

deactivate

Clinical-Insight-Engine/
│
├── client/                          # React frontend
│   └── src/
│       ├── components/              # Reusable UI components
│       ├── pages/                   # Route-level page components
│       ├── hooks/                   # Custom React hooks
│       │   ├── use-assessments.ts   # TanStack Query hooks for API calls
│       │   └── use-toast.ts         # Toast notification state
│       ├── lib/                     # Utilities & API client
│       │   ├── queryClient.ts       # Global fetch config + React Query setup
│       │   └── utils.ts             # cn() Tailwind class merge utility
│       └── utils/
│           ├── search_filters.ts    # Patient search & filter logic
│           └── date_fix.ts          # Safe date parser helper
│
├── server/                          # Express.js backend
│   ├── index.ts                     # Server entry point & startup
│   ├── routes.ts                    # API route definitions
│   ├── storage.ts                   # Data access layer (DB queries)
│   ├── db.ts                        # Drizzle ORM + PostgreSQL pool
│   ├── static.ts                    # Serves built React frontend
│   ├── vite.ts                      # Vite dev server integration (HMR)
│   └── db_fix.ts                    # Clean process exit on DB errors
│
├── shared/                          # Shared between client & server
│   ├── schema.ts                    # Drizzle DB schema + Zod types
│   └── routes.ts                    # Shared API request/response schemas
│
├── script/
│   └── build.ts                     # esbuild + Vite production build script
│
├── attached_assets/                 # Static assets (dataset, images)
│   └── diabetes_dataset.csv
│
├── analyze.py                       # ML pipeline: training & inference
├── main.py                          # Python entry point
├── diabetes_dataset.csv             # Training dataset (root copy)
├── correlation_heatmap.png          # Diabetes feature correlation heatmap
├── patient.json                     # Sample patient input for CLI prediction
│
├── drizzle.config.ts                # Drizzle ORM configuration
├── vite.config.ts                   # Vite bundler configuration
├── tailwind.config.ts               # Tailwind CSS configuration
├── tsconfig.json                    # TypeScript configuration
├── postcss.config.js                # PostCSS configuration
├── components.json                  # shadcn/ui component registry
├── pyproject.toml                   # Python project metadata
├── requirements.txt                 # Python dependencies
├── package.json                     # Node.js dependencies & scripts
├── package-lock.json                # Locked dependency versions
├── uv.lock                          # uv Python lock file
│
├── README.md                        # Project documentation
├── ANALYSIS_README.md               # ML analysis documentation
├── CONTRIBUTING.md                  # Contribution guidelines
└── CODE_OF_CONDUCT.md               # Community code of conduct

| Method | Endpoint | Description |
|---|---|---|
| GET | `/health` | Application health check endpoint for monitoring |
| POST | `/api/assessments` | Submit a new risk assessment |
| GET | `/api/assessments` | Retrieve assessment history |
| GET | `/api/assessments/:id` | Get a specific assessment by ID |
| POST | `/api/ingest/fhir` | Ingest a FHIR R4 JSON bundle |
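
The `POST /api/assessments` endpoint listed above can also be called from Python. The sketch below only builds the request with the standard library so it can be inspected without a running server; the helper name `build_assessment_request` is hypothetical, and the base URL assumes the default backend port of 3000.

```python
import json
import urllib.request


def build_assessment_request(
    payload: dict, base_url: str = "http://localhost:3000"
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a new assessment."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/assessments",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


patient = {
    "gender": "Female",
    "age": 52,
    "hypertension": True,
    "heartDisease": False,
    "smokingHistory": "former",
    "bmi": 30.1,
    "hba1cLevel": 6.4,
    "bloodGlucoseLevel": 148,
}
request = build_assessment_request(patient)

# To send it against a running server, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Keeping request construction separate from sending makes it easy to check the URL, headers, and body in isolation.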
# Health Check
curl -X GET http://localhost:3000/health
# Submit Assessment
curl -X POST http://localhost:3000/api/assessments \
-H "Content-Type: application/json" \
-d '{
"gender": "Female",
"age": 52,
"hypertension": true,
"heartDisease": false,
"smokingHistory": "former",
"bmi": 30.1,
"hba1cLevel": 6.4,
"bloodGlucoseLevel": 148
}'

The `/api/ingest/fhir` endpoint accepts standard FHIR R4 JSON bundles containing patient demographic details, clinical vitals/lab values, and clinical notes.
- Patient: Extracts `id`, `name`, `gender` (mapped to `Male`/`Female`), and calculates patient `age` from `birthDate`.
- Observation: Extracts clinical values such as `BMI`, `HbA1c`, `Blood Glucose`, and flags `hypertension` and `heartDisease` using LOINC codes and display terms.
- DocumentReference: Extracts note titles, descriptions, and decoded base64 attachments, merging them into a unified clinical note transcript.
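
The Patient mapping described above (gender normalization and age from `birthDate`) could look roughly like the sketch below. Both function names are hypothetical, the sketch assumes a full `YYYY-MM-DD` birth date, and returning `None` for genders other than male/female is an assumption rather than the project's documented behavior.

```python
from datetime import date
from typing import Optional


def age_from_birth_date(birth_date: str, today: Optional[date] = None) -> int:
    """Return completed years between an ISO birth date and `today`."""
    born = date.fromisoformat(birth_date)
    today = today or date.today()
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    return today.year - born.year - (0 if had_birthday else 1)


def map_gender(fhir_gender: str) -> Optional[str]:
    """Map FHIR administrative gender codes to the model's labels."""
    return {"male": "Male", "female": "Female"}.get(fhir_gender.lower())


print(age_from_birth_date("1980-01-01", today=date(2024, 6, 15)))
print(map_gender("male"))
```

Passing `today` explicitly keeps the calculation reproducible in tests.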
To ensure clinical decisions are traceable and verifiable, the pipeline extracts source citations for key clinical features. When note text is found in DocumentReference entries, the parser:
- Performs regex/vitals and keyword scanning for Hypertension (e.g. BP measurements like `145/90` or keywords like `hypertension`), Heart Disease (e.g. `CAD`, `myocardial infarction`), and Smoking History (e.g. `former smoker`, `never smoked`).
- Extracts the exact sentence snippet enclosing the evidence (`source_snippet`).
- Computes the zero-indexed character bounds `[start, end]` within the raw concatenated text (`source_index`).
- If no evidence is found, these values are returned as `null`.
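
The citation steps above can be sketched as follows. This is an illustration under stated assumptions, not the project's parser: the patterns and the `find_evidence` helper are hypothetical, sentences are split naively on `.`, `!`, and `?` (so decimals such as `6.4` would split a sentence), and the end index is treated as exclusive.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical patterns; the real parser's rules may differ.
BP_PATTERN = re.compile(r"\b\d{2,3}/\d{2,3}\b|hypertension", re.IGNORECASE)
HEART_PATTERN = re.compile(r"\bCAD\b|myocardial infarction", re.IGNORECASE)
SMOKING_PATTERN = re.compile(
    r"quit smoking|former smoker|never smoked", re.IGNORECASE
)


def find_evidence(
    note: str, pattern: "re.Pattern[str]"
) -> Tuple[Optional[str], Optional[List[int]]]:
    """Return (source_snippet, [start, end]) for the first match, or (None, None)."""
    match = pattern.search(note)
    if match is None:
        return None, None
    # Start just after the previous sentence terminator (or at 0 if none).
    start = max(note.rfind(mark, 0, match.start()) for mark in ".!?") + 1
    # End at the next sentence terminator, or at the end of the note.
    stops = [note.find(mark, match.end()) for mark in ".!?"]
    stops = [index for index in stops if index != -1]
    end = min(stops) if stops else len(note)
    # Trim whitespace while keeping the indices aligned with the raw text.
    while start < end and note[start].isspace():
        start += 1
    while end > start and note[end - 1].isspace():
        end -= 1
    return note[start:end], [start, end]


note = "Routine visit. BP reading 145/95 noted. Quit smoking last year."
for pattern in (BP_PATTERN, HEART_PATTERN, SMOKING_PATTERN):
    print(find_evidence(note, pattern))
```

Because the bounds index into the raw note, `note[start:end]` always reproduces the snippet, which is what lets a viewer highlight the cited text.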
A successful FHIR ingestion response returns the extracted clinical note and explainable insights:
{
"status": "success",
"id": 42,
"clinical_note": "Routine visit. BP reading 145/95 noted. Quit smoking last year.",
"explainable_insights": [
{
"insight": "Patient shows signs of hypertension",
"source_snippet": "BP reading 145/95 noted",
"source_index": [15, 38]
},
{
"insight": "Patient shows signs of heart disease",
"source_snippet": null,
"source_index": null
},
{
"insight": "Patient has a smoking history (former)",
"source_snippet": "Quit smoking last year",
"source_index": [40, 62]
}
]
}

On the Clinician View tab of the results page, the clinical note is rendered in an interactive viewer:
- Interactive Highlights: Clicking any cited insight automatically highlights the matching text in the note.
- Auto-Scroll: The highlighted source text is scrolled smoothly into view.
- Keyboard Navigation:
  - Use Arrow Down / Arrow Right to move to the next cited insight.
  - Use Arrow Up / Arrow Left to move to the previous cited insight.
  - Press Escape to clear the selection and highlight.
curl -X POST http://localhost:3000/api/ingest/fhir \
-H "Content-Type: application/json" \
-d '{
"resourceType": "Bundle",
"type": "collection",
"entry": [
{
"resource": {
"resourceType": "Patient",
"id": "pat-123",
"name": [
{
"use": "official",
"given": ["John", "Edward"],
"family": "Smith"
}
],
"gender": "male",
"birthDate": "1980-01-01"
}
},
{
"resource": {
"resourceType": "Observation",
"code": {
"coding": [
{
"system": "http://loinc.org",
"code": "39156-5",
"display": "Body Mass Index"
}
]
},
"valueQuantity": {
"value": 24.5,
"unit": "kg/m2"
}
}
}
]
}'

The machine learning pipeline (`analyze.py`) implements an interpretable risk assessment model:
graph LR
A["Raw Data"] --> B["Cleaning & Validation"]
B --> C["Feature Engineering"]
C --> D["StandardScaler"]
D --> E["Logistic Regression"]
E --> F["Risk Score 0-100%"]
E --> G["Feature Importance"]
F --> H["Cached Model"]
G --> H
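
To illustrate how a logistic regression yields both a 0-100% risk score and per-feature impact, here is a standard-library sketch. The coefficients, intercept, and feature names are made up for illustration and are not the trained model's values; each contribution is simply coefficient times standardized feature value, which is one common way to read a linear model.

```python
import math
from typing import Dict, Tuple

# Hypothetical coefficients for standardized features (illustration only).
COEFFICIENTS: Dict[str, float] = {
    "age": 0.8,
    "bmi": 0.5,
    "hba1c": 1.6,
    "blood_glucose": 1.2,
}
INTERCEPT = -2.0


def risk_with_contributions(
    scaled: Dict[str, float],
) -> Tuple[float, Dict[str, float]]:
    """Return (risk score in percent, per-feature log-odds contributions)."""
    contributions = {
        name: COEFFICIENTS[name] * value for name, value in scaled.items()
    }
    log_odds = INTERCEPT + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))  # logistic (sigmoid) link
    return round(probability * 100, 1), contributions


score, impact = risk_with_contributions(
    {"age": 0.5, "bmi": 1.0, "hba1c": 1.25, "blood_glucose": 0.0}
)
print(score, impact)
```

Positive contributions push the score up and negative ones pull it down, which is the kind of information a bar chart of factor contributions can display.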
| Step | Details |
|---|---|
| Data Cleaning | Filters unrealistic values (BMI < 10, glucose < 50, HbA1c < 3) and replaces with medians |
| Encoding | Gender → binary; Smoking history → one-hot encoding |
| Scaling | StandardScaler on age, BMI, HbA1c, blood glucose |
| Model | LogisticRegression with balanced class weights |
| Caching | Trained model + scaler serialized via pickle for fast inference |
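
The data cleaning row above can be sketched without pandas. The thresholds come from the table; the column names, the helper names, and the choice to compute the median from the remaining valid values are assumptions made for this example.

```python
from statistics import median
from typing import Dict, List

# Lower bounds from the table above; column names are illustrative.
MIN_VALID: Dict[str, float] = {"bmi": 10.0, "blood_glucose": 50.0, "hba1c": 3.0}


def replace_unrealistic(values: List[float], minimum: float) -> List[float]:
    """Replace values below `minimum` with the median of the valid values."""
    valid = [value for value in values if value >= minimum]
    if not valid:
        return list(values)  # nothing trustworthy to compute a median from
    fill = median(valid)
    return [value if value >= minimum else fill for value in values]


def clean_columns(columns: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Apply the per-column lower bound where one is defined."""
    return {
        name: replace_unrealistic(values, MIN_VALID[name])
        if name in MIN_VALID
        else list(values)
        for name, values in columns.items()
    }


raw = {"bmi": [22.0, 5.0, 30.0, 26.0], "hba1c": [5.5, 6.4, 1.0]}
cleaned = clean_columns(raw)
print(cleaned)
```

Replacing rather than dropping rows keeps the dataset size stable, at the cost of pulling the affected values toward the center of the distribution.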
# Linux/macOS
python3 analyze.py
# Windows
py analyze.py

Create a patient JSON file:
{
"gender": "Female",
"age": 52,
"hypertension": true,
"heartDisease": false,
"smokingHistory": "former",
"bmi": 30.1,
"hba1cLevel": 6.4,
"bloodGlucoseLevel": 148
}

Run prediction:
# Linux/macOS
python3 analyze.py predict_file patient.json
# Windows
py analyze.py predict_file patient.json

| Variable | File | Description |
|---|---|---|
| `DATABASE_URL` | `.env` | PostgreSQL connection string |
| `NODE_ENV` | `.env.local` | Set to `development` for local dev features |
| `SESSION_SECRET` | `.env` | Required in production for signed Express sessions |
| `DEV_CLINICIAN_EMAIL` | `.env.local` | Seeded clinician email (dev only) |
| `DEV_CLINICIAN_PASSWORD` | `.env.local` | Seeded clinician password (dev only) |
| `NEXT_PUBLIC_LOCAL_ENCRYPTION_KEY` | `.env.local` | Local encryption key (dev only) |
| `ENABLE_PHI_REDACTION` | `.env` | Enable privacy-preserving PHI redaction (defaults to `true`) |
Security: `.env.local` is git-ignored and should never be committed. Production builds do not expose dev credentials.

Request limits: JSON and URL-encoded API payloads are limited to `256kb` by default. Add route-specific upload handling before increasing this global limit.

Production sessions: When the app runs behind a TLS-terminating reverse proxy or load balancer, Express trusts one proxy hop in production so that secure session cookies are issued for requests carrying `X-Forwarded-Proto: https`.
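
A client can check a payload against the request limit before sending it. This sketch assumes that `256kb` means 256 × 1024 bytes of encoded JSON; the constant and the helper name are hypothetical.

```python
import json

# Assumption: "256kb" is interpreted as 256 * 1024 bytes.
MAX_BODY_BYTES = 256 * 1024


def fits_request_limit(payload: dict, limit: int = MAX_BODY_BYTES) -> bool:
    """Return True if the UTF-8 encoded JSON body is within the limit."""
    return len(json.dumps(payload).encode("utf-8")) <= limit


small = {"gender": "Female", "age": 52}
large = {"note": "x" * (256 * 1024)}
print(fits_request_limit(small), fits_request_limit(large))
```

Measuring the encoded bytes rather than the character count matters for notes containing non-ASCII text, since those characters take more than one byte each.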
"PostgreSQL is unreachable"
- Verify PostgreSQL is running: `sudo systemctl status postgresql` (Linux) or `brew services list` (macOS)
- Confirm `DATABASE_URL` in `.env` matches your local credentials
- Ensure port `5432` is not blocked by another process
- Check that the `clinical_insight_engine` database exists

"Database startup check failed"
- Run `npm run db:push` to create/update the required tables
- Verify your `.env` file is in the project root (not inside `server/` or `client/`)

Python model errors
- Ensure the virtual environment is activated: `source .venv/bin/activate`
- Verify dependencies: `pip install -r requirements.txt`
- If `diabetes_dataset.csv` is missing, copy it: `cp attached_assets/diabetes_dataset.csv ./`
- Or generate synthetic data: `python3 -c "from analyze import create_synthetic_data; create_synthetic_data()"`

Port conflicts
- The dev server defaults to port 5173 (Vite)
- If occupied, Vite will automatically pick the next available port
- Check for processes: `lsof -i :5173` (Linux/macOS) or `netstat -ano | findstr :5173` (Windows)
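
As a cross-platform alternative to `lsof` and `netstat`, a short Python check can report whether something is already listening on the ports this project uses. The helper name `port_in_use` is hypothetical, and the sketch only probes the IPv4 loopback address.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return probe.connect_ex((host, port)) == 0


for port in (5173, 3000, 5432):
    print(port, "in use" if port_in_use(port) else "free")
```

A port reported as in use before you start the dev server points to a leftover process; after startup, the same check confirms the server is reachable.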
- Longitudinal patient risk tracking across visits
- Counterfactual reasoning: "What single change reduces risk most?"
- Cohort discovery and population-level insights
- Integration with Electronic Health Records (EHR)
- Advanced bias detection and ML fairness metrics
- Cloud deployment (Vercel / Render)
We love contributions! Whether it's a bug fix, a new feature, or improved docs, every PR makes a difference.
- Fork the repository
- Create your feature branch (`git checkout -b feat/amazing-feature`)
- Commit your changes (`git commit -m 'feat: add amazing feature'`)
- Push to the branch (`git push origin feat/amazing-feature`)
- Open a Pull Request
Please read our Contributing Guide and Code of Conduct before submitting.
Gopal Gupta, Computer Science Engineer · Full-Stack Developer · Data Science & ML Enthusiast
Built with ❤️ for better preventive healthcare
⭐ Star this repo if you find it useful. It helps others discover the project!
- All schema changes must go through drizzle-kit generate.
- Improve heading hierarchy for better readability
- Ensure consistent spacing between sections
- Use proper Markdown formatting for code blocks and lists
- Align all installation and usage steps properly
- Introduction
- Features
- Tech Stack
- Installation
- Usage
- Project Structure
- Contribution Guidelines
- License
- Add badges (optional): build, license, contributors
- Add screenshots for better UI understanding
- Standardize code blocks for commands
Improve onboarding experience for new contributors and users by making README more structured, readable, and professional.
