System / Health — Live probes + Incidents — HexaHealth Help

Opening the page

/super-admin/system shows the real-time status of the HexaHealth platform's services.

This page is available only to the SUPER_ADMIN role. Customers cannot see it (the public status page for customers lives elsewhere).

Live Probes

The system probes every service each time the page refreshes:

| Service | What is probed |
|---|---|
| Database | SELECT 1, measuring latency |
| File Storage (R2/S3) | env config check |
| AI · Claude | GET /v1/models (no token billing) |
| LINE OA Bridge | GET /v2/bot/info |

Each probe has a 5-second timeout.
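A minimal sketch of how a per-probe timeout could be enforced, assuming each probe is an async function that rejects on failure. The names probeWithTimeout and ProbeResult are hypothetical and are not taken from the HexaHealth codebase; only the 5-second limit comes from this page.

```typescript
// Hypothetical result shape for one probe run.
type ProbeResult = { ok: boolean; latencyMs: number; error?: string };

// Runs a probe and reports failure if it does not settle within timeoutMs.
// The 5000 ms default mirrors the 5-second timeout described above.
async function probeWithTimeout(
  probe: () => Promise<void>,
  timeoutMs: number = 5000,
): Promise<ProbeResult> {
  const started = Date.now();
  let timer: ReturnType<typeof setTimeout> | undefined;

  // A promise that only ever rejects, once the timeout elapses.
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`probe timed out after ${timeoutMs} ms`)),
      timeoutMs,
    );
  });

  try {
    // Whichever settles first wins: the probe itself or the timeout.
    await Promise.race([probe(), timeout]);
    return { ok: true, latencyMs: Date.now() - started };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, latencyMs: Date.now() - started, error: message };
  } finally {
    // Avoid leaving a pending timer behind when the probe finished in time.
    clearTimeout(timer);
  }
}
```

Note that Promise.race only stops waiting; it does not cancel the underlying request. A real probe would probably also pass an AbortSignal to its HTTP or database client.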

Status shown

Every service card shows:

Status badge: 🟢 operational / 🟡 degraded / 🔴 outage
Region (ap-southeast-1, us-east-1, etc.)
Uptime 30d (% from cron history)
Latency live (current ms)
p95 latency 30d (from probe history)
Error message (on outage): shows the raw error

"Degraded" = the service responds more slowly than its threshold (e.g. DB > 200 ms, Claude > 2 s)
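The badge logic can be sketched as a small classifier. This is an illustration only: the function name, the service keys, and the fallback for services without a threshold are assumptions. Only the two thresholds (DB 200 ms, Claude 2 s) come from this page.

```typescript
type Status = "operational" | "degraded" | "outage";

// Thresholds taken from the examples above. Values for other services
// are not documented here, so they are left out.
const DEGRADED_THRESHOLD_MS: Record<string, number> = {
  database: 200,
  claude: 2000,
};

// Maps one probe result to a status badge.
function classify(service: string, ok: boolean, latencyMs: number): Status {
  if (!ok) return "outage"; // probe failed or timed out
  const threshold = DEGRADED_THRESHOLD_MS[service];
  // Assumption: a service without a configured threshold is never "degraded".
  if (threshold !== undefined && latencyMs > threshold) return "degraded";
  return "operational";
}
```

For example, classify("database", true, 250) yields "degraded", while the same latency for "claude" stays "operational".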

Auto-refresh

The page refreshes every 60 seconds, and a Probe now (Probe ตอนนี้) button triggers a manual probe.

Cron Background

/api/cron/health-check runs every 5 minutes via Vercel cron:

Probes every service
Stores the results in the HealthCheckResult table
Computes 30-day uptime and p95 latency
Auto-opens an incident after 3 consecutive probe failures
Auto-resolves the incident after 2 consecutive probe passes
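The aggregation and incident rules above can be sketched as pure functions over the stored results. All names here are hypothetical. The nearest-rank p95 and the null result for an empty history are assumptions, since this page does not say how the real cron route calculates them.

```typescript
// Hypothetical shape of one stored row (a subset of HealthCheckResult).
type Check = { ok: boolean; latencyMs: number };

// Uptime = share of passing checks in the window, as a percentage.
function uptimePercent(checks: Check[]): number | null {
  if (checks.length === 0) return null; // no history yet
  const passed = checks.filter((c) => c.ok).length;
  return (passed / checks.length) * 100;
}

// p95 by the nearest-rank method: the smallest value such that at least
// 95% of the samples are less than or equal to it.
function p95(latencies: number[]): number | null {
  if (latencies.length === 0) return null;
  const sorted = [...latencies].sort((a, b) => a - b);
  const rank = Math.ceil((95 * sorted.length) / 100);
  return sorted[rank - 1];
}

// Decides what the cron run should do for one service.
// `recent` holds pass/fail results in chronological order (newest last).
function nextIncidentAction(
  recent: boolean[],
  incidentOpen: boolean,
): "open" | "resolve" | "none" {
  const lastAre = (n: number, value: boolean): boolean =>
    recent.length >= n && recent.slice(-n).every((ok) => ok === value);

  if (!incidentOpen && lastAre(3, false)) return "open"; // 3 failures in a row
  if (incidentOpen && lastAre(2, true)) return "resolve"; // 2 passes in a row
  return "none";
}
```

Opening needs more evidence than resolving (3 failures versus 2 passes), so a single flaky probe does not open an incident.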

Incident Log

/super-admin/system → Incident & Maintenance Log:

Every incident shows:

Title
Severity: info / minor / major / critical
Status: investigating / identified / monitoring / resolved
Service affected
Start / end
Description
Postmortem URL (link to Notion / blog)

Manual Incident

Click + Add incident (+ เพิ่ม incident):

Fill in title, severity, status, service, description
Use it for planned maintenance or for outages the probes do not catch (e.g. a UI bug)

The incident can be edited or deleted later.

Action shortcut buttons

At the bottom of the page are links to external tools:

View Logs → Vercel logs dashboard
Cache (Cloudflare) → CF dashboard
External monitoring → Better Stack / UptimeRobot

Setup for production

In .env:

CRON_SECRET=<random-token>: bearer token for cron auth (Vercel cron sends it with each request)
No extra settings are needed for the probes; they use each service's existing env vars
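A minimal sketch of the check the cron route could perform, assuming the token arrives as "Authorization: Bearer <CRON_SECRET>", which is how Vercel cron sends it. The function name is hypothetical, and rejecting requests when the secret is unset is an assumption.

```typescript
// Returns true only when the Authorization header carries the expected
// bearer token. authHeader is the raw header value (null if absent);
// cronSecret is the value of the CRON_SECRET env var (undefined if unset).
function isAuthorizedCronRequest(
  authHeader: string | null,
  cronSecret: string | undefined,
): boolean {
  // Assumption: fail closed. Without a configured secret nobody may
  // trigger the cron route.
  if (!cronSecret) return false;
  return authHeader === `Bearer ${cronSecret}`;
}
```

A route handler would call this with request.headers.get("authorization") and process.env.CRON_SECRET, and respond with 401 when it returns false.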

Example alerting integrations (future)

There is currently no automatic Slack/email alert. You have to add one yourself:

Sentry: capture failed probes as errors → alert according to Sentry rules
Better Stack: monitor the same system + Slack notification
Custom webhook: modify the cron route to POST to Slack
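The custom webhook option might look like the sketch below, assuming a Slack incoming webhook, which accepts a JSON body with a text field. The FailedProbe shape, the function names, and the SLACK_WEBHOOK_URL env var are hypothetical; none of this exists in the current cron route.

```typescript
// Hypothetical summary of a failing service, built by the cron route.
type FailedProbe = {
  service: string;
  error: string;
  consecutiveFailures: number;
};

// Builds the JSON body for a Slack incoming webhook.
function buildSlackPayload(probe: FailedProbe): { text: string } {
  return {
    text:
      `:red_circle: ${probe.service} probe failed ` +
      `${probe.consecutiveFailures}x in a row: ${probe.error}`,
  };
}

// POSTs the alert and reports whether Slack accepted it.
// webhookUrl would come from an env var such as SLACK_WEBHOOK_URL (assumed).
async function notifySlack(
  webhookUrl: string,
  probe: FailedProbe,
): Promise<boolean> {
  const response = await fetch(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildSlackPayload(probe)),
  });
  return response.ok;
}
```

Calling notifySlack only when an incident is auto-opened, rather than on every failed probe, would keep the channel to one message per incident.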