
That overlay is real: the face mesh comes from Apple Vision and the emotion bars from the actual HSEmotion model run on this portrait (it reads Sadness, 59%). Only the heart rate is staged — rPPG needs a moving video, not a still.
Vizier
Reads emotion and heart rate from a face on your screen — entirely on-device.
Vizier sits in your menu bar. Point it at a face on your screen — a video call, a recording — and it estimates, in real time and without sending anything anywhere, two things: the person's emotion (an eight-class facial-expression read) and their heart rate, inferred from the faint colour changes in skin as blood flows beneath it.
How it works
- Select a region — You draw a box around a face on screen (⌘R) — usually the other person's tile in a call.
- Capture the screen, not a camera — ScreenCaptureKit streams just that region. Vizier never opens the webcam — it reads pixels already on your display.
- Find the face — Apple's Vision framework locates the face and its contour every frame, and a skin-masked RGB mean is sampled from the full-resolution pixels.
- Analyze in a local sidecar — The samples are handed to a bundled Python process over an owner-only Unix socket. The POS algorithm (Wang et al., 2017) recovers a pulse from the near-invisible colour shifts of blood flow; HSEmotion (EfficientNet-B0 / AffectNet) classifies the expression into eight categories.
- Display — Heart rate, a live BVP (blood-volume pulse) waveform, and the emotion mix render in the menu bar and in a floating overlay, shown as the readout panel on the right of the image above.
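The capture-to-pulse path above can be sketched in a few lines of NumPy. This is a minimal illustration, not Vizier's code: `skin_rgb_mean`, `pos_pulse` and `estimate_bpm` are hypothetical names, and the 1.6-second sliding window and the 0.7 to 4 Hz search band (42 to 240 bpm) are assumed values. The projection axes and the alpha-tuning step follow the POS method of Wang et al. (2017). Reading the heart rate off the strongest spectral peak is one common choice; Vizier may do this differently.

```python
import numpy as np

# POS projection axes (Wang et al., 2017): two directions in temporally
# normalised RGB space that are orthogonal to the skin-tone direction.
POS_AXES = np.array([[0.0, 1.0, -1.0],
                     [-2.0, 1.0, 1.0]])


def skin_rgb_mean(frame: np.ndarray, skin_mask: np.ndarray) -> np.ndarray:
    """Average [R, G, B] over skin pixels; frame is (H, W, 3), mask is (H, W) bool."""
    return frame[skin_mask].astype(np.float64).mean(axis=0)


def pos_pulse(rgb: np.ndarray, fs: float, window_s: float = 1.6) -> np.ndarray:
    """Recover a pulse trace from per-frame skin RGB means of shape (N, 3)."""
    n_frames = rgb.shape[0]
    win = int(round(window_s * fs))          # assumed window length in frames
    pulse = np.zeros(n_frames)
    for start in range(n_frames - win + 1):
        block = rgb[start:start + win].T     # shape (3, win)
        # Temporal normalisation: divide each channel by its mean in the window,
        # which removes the DC level (skin tone, camera gain).
        normed = block / block.mean(axis=1, keepdims=True)
        s1, s2 = POS_AXES @ normed           # the two projected signals
        # Alpha-tuning: rescale s2 to s1's spread before summing them.
        h = s1 + (s1.std() / (s2.std() + 1e-12)) * s2
        # Overlap-add the zero-mean window into the running output.
        pulse[start:start + win] += h - h.mean()
    return pulse


def estimate_bpm(pulse: np.ndarray, fs: float,
                 lo_hz: float = 0.7, hi_hz: float = 4.0) -> float:
    """Heart rate in beats per minute from the strongest in-band spectral peak."""
    spectrum = np.abs(np.fft.rfft(pulse - pulse.mean()))
    freqs = np.fft.rfftfreq(pulse.size, d=1.0 / fs)
    band = (freqs >= lo_hz) & (freqs <= hi_hz)
    return float(60.0 * freqs[band][np.argmax(spectrum[band])])
```

If you feed it a synthetic 30-second, 30 fps trace with a 1.2 Hz pulse that is strongest in green, plus slow lighting drift and pixel noise, `estimate_bpm` lands near 72 bpm. The lighting drift scales all three channels equally, so it drops out after normalisation and projection.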

The floating readout: a live landmark thumbnail, the current heart rate over a bedside-monitor pulse wave, and all eight emotions as bars — neutral, happiness, sadness, anger, fear, surprise, disgust, contempt — rolling-averaged so they don't flicker frame to frame.
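The rolling average that keeps the eight bars from flickering can be as simple as a fixed-length window of recent per-frame scores, averaged per label. The sketch below is an assumption about the approach, not Vizier's implementation. `RollingEmotions` and the 15-frame default window are hypothetical. It takes scores keyed by label rather than by index because the order of the classifier's output vector is not documented here.

```python
from collections import deque

# The eight labels shown in the overlay, in display order.
EMOTIONS = ("neutral", "happiness", "sadness", "anger",
            "fear", "surprise", "disgust", "contempt")


class RollingEmotions:
    """Rolling mean of per-frame emotion scores over the last `window` frames."""

    def __init__(self, window: int = 15):
        # deque(maxlen=...) silently drops the oldest frame once full.
        self._frames = deque(maxlen=window)

    def update(self, scores: dict) -> dict:
        """Add one frame's scores (label -> probability) and return the smoothed mix."""
        missing = set(EMOTIONS) - scores.keys()
        if missing:
            raise ValueError(f"missing scores for {sorted(missing)}")
        self._frames.append([float(scores[label]) for label in EMOTIONS])
        n = len(self._frames)
        return {label: sum(frame[i] for frame in self._frames) / n
                for i, label in enumerate(EMOTIONS)}
```

A longer window gives steadier bars but reacts more slowly to a genuine change of expression. An exponential moving average is a common alternative that trades the same two properties with a single decay factor.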
Privacy
- Nothing is transmitted: there are no network calls in either the Swift app or the Python sidecar.
- Nothing is persisted: face pixels live only in a 30-second in-memory window, and only the derived numbers cross the socket. No screenshots, no database.
- Local-only IPC: the Swift host and the ML sidecar talk over a per-process Unix socket with mode 0600, so only the owning user can read or write it.
- Screen, not camera: Vizier uses the Screen Recording permission, never the camera entitlement.
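An owner-only, per-process socket like the one described above might be set up as follows on the sidecar side. This is a sketch under assumptions: `open_private_socket` and the `vizier.sock` filename are hypothetical, and Vizier's actual setup may differ. It binds inside a fresh directory from `tempfile.mkdtemp`, which is created readable and writable only by the current user. That extra layer matters because, as the Linux unix(7) man page notes, some systems (older BSDs, for example) ignore permissions on the socket file itself.

```python
import os
import socket
import tempfile


def open_private_socket(name: str = "vizier.sock"):
    """Bind a listening Unix stream socket that only the current user can reach."""
    # mkdtemp creates a uniquely named directory with mode 0700, so each
    # process gets its own path and other users cannot even look inside.
    directory = tempfile.mkdtemp(prefix="vizier-")
    path = os.path.join(directory, name)
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    # Restrict the socket file itself to owner read/write (0600).
    os.chmod(path, 0o600)
    server.listen(1)
    return server, path
```

The host would connect to the returned path. Whoever creates the socket should also remove the file and its directory on shutdown, since a bound Unix socket path outlives the process that created it.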
A note on consent. Vizier analyzes whoever is on screen — typically the other person on a call — and they have no way to know. That's worth sitting with. The numbers are a curiosity and a tech demo, not a lie detector or a medical device: rPPG from a compressed video stream is noisy, and emotion classifiers are coarse and biased.
Under the hood
- Swift + SwiftUI MenuBarExtra (host app + overlay)
- ScreenCaptureKit (region capture)
- Apple Vision (face + landmark detection)
- Python sidecar — NumPy POS rPPG + HSEmotion ONNX
- msgpack over a 0600 Unix domain socket
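To show what "msgpack over a Unix socket" looks like on the wire, here is a hand-rolled encoder for the small slice of the MessagePack format that a per-frame message would plausibly need. It is illustrative only. `pack_subset` and the `{"t": ..., "rgb": [...]}` message shape are hypothetical, and a real sidecar would call the `msgpack` package's `packb` rather than anything like this.

```python
import struct


def pack_subset(obj) -> bytes:
    """Encode a small subset of MessagePack (illustration, not the msgpack package)."""
    if obj is None:
        return b"\xc0"                                      # nil
    if obj is True or obj is False:
        return b"\xc3" if obj else b"\xc2"                  # true / false
    if isinstance(obj, int) and 0 <= obj <= 0x7F:
        return bytes([obj])                                 # positive fixint
    if isinstance(obj, float):
        return b"\xcb" + struct.pack(">d", obj)             # float 64, big-endian
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        if len(data) <= 31:
            return bytes([0xA0 | len(data)]) + data         # fixstr
        if len(data) <= 0xFF:
            return b"\xd9" + bytes([len(data)]) + data      # str 8
    if isinstance(obj, (list, tuple)) and len(obj) <= 15:
        body = b"".join(pack_subset(item) for item in obj)
        return bytes([0x90 | len(obj)]) + body              # fixarray
    if isinstance(obj, dict) and len(obj) <= 15:
        body = b"".join(pack_subset(k) + pack_subset(v) for k, v in obj.items())
        return bytes([0x80 | len(obj)]) + body              # fixmap
    raise TypeError(f"outside this sketch's subset: {obj!r}")
```

A sample such as `{"t": 12.5, "rgb": [141.2, 102.7, 88.4]}` encodes to 44 bytes. Because every MessagePack value carries its own type and length, a reader can decode back-to-back messages straight off the stream without a separate length prefix. The `msgpack` package's `Unpacker`, fed incoming bytes with `feed()`, works this way.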