Vizier's overlay on a portrait

That overlay is real: the face mesh comes from Apple Vision, and the emotion bars come from the actual HSEmotion model run on this portrait (it reads Sadness, 59%). Only the heart rate is staged: remote photoplethysmography (rPPG) needs a moving video, not a still.

2026 · macOS · On-device

Vizier

Reads emotion and heart rate from a face on your screen — entirely on-device.

Vizier sits in your menu bar. Point it at a face on your screen — a video call, a recording — and it estimates, in real time and without sending anything anywhere, two things: the person's emotion (an eight-class facial-expression read) and their heart rate, inferred from the faint colour changes in skin as blood flows beneath it.

How it works

  1. Select a region. You draw a box around a face on screen (⌘R), usually the other person's tile in a call.
  2. Capture the screen, not a camera. ScreenCaptureKit streams just that region. Vizier never opens the webcam; it reads pixels already on your display.
  3. Find the face. Apple's Vision framework locates the face and its contour every frame, and a skin-masked RGB mean is sampled from the full-resolution pixels.
  4. Analyze in a local sidecar. The samples are handed to a bundled Python process over an owner-only Unix socket. The POS (plane-orthogonal-to-skin) algorithm (Wang et al., 2017) recovers a pulse from the near-invisible colour shifts of blood flow; HSEmotion (EfficientNet-B0 / AffectNet) classifies the expression into eight categories.
  5. Display. Heart rate, a live blood volume pulse (BVP) waveform, and the emotion mix render in the menu bar and a floating overlay, which is the readout panel on the right of the image above.
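
Steps 3 and 4 can be sketched in a few lines of NumPy. This is a minimal illustration of the published POS recipe, not Vizier's actual sidecar code: the function names, the 1.6-second window, and the 0.7 to 4.0 Hz heart-rate band are assumptions chosen for the example, and a real pipeline would add filtering and quality checks.

```python
import numpy as np

# POS projection from Wang et al. (2017): each row maps normalized
# (R, G, B) onto an axis orthogonal to the skin-tone direction.
POS_PROJECTION = np.array([[0.0, 1.0, -1.0],
                           [-2.0, 1.0, 1.0]])


def masked_rgb_mean(frame, skin_mask):
    """Mean (R, G, B) over pixels where skin_mask is True.

    frame is an (H, W, 3) array, skin_mask an (H, W) boolean array.
    (Hypothetical helper; the mask itself would come from the face contour.)
    """
    return frame[skin_mask].mean(axis=0)


def pos_pulse(rgb, fps, window_s=1.6):
    """Recover a pulse (BVP) trace from per-frame RGB means, shape (N, 3)."""
    rgb = np.asarray(rgb, dtype=float)
    n = rgb.shape[0]
    win = int(round(window_s * fps))  # assumed window length
    pulse = np.zeros(n)
    for end in range(win, n + 1):
        start = end - win
        block = rgb[start:end]
        # Temporal normalization: divide each channel by its window mean.
        normalized = block / block.mean(axis=0)
        s = POS_PROJECTION @ normalized.T  # shape (2, win)
        std_s2 = s[1].std()
        alpha = s[0].std() / std_s2 if std_s2 > 0 else 0.0
        h = s[0] + alpha * s[1]
        # Overlap-add the zero-mean window result into the output trace.
        pulse[start:end] += h - h.mean()
    return pulse


def estimate_bpm(bvp, fps, lo_hz=0.7, hi_hz=4.0):
    """Heart rate as the strongest spectral peak inside an assumed band."""
    bvp = np.asarray(bvp, dtype=float)
    bvp = bvp - bvp.mean()
    spectrum = np.abs(np.fft.rfft(bvp * np.hanning(len(bvp)))) ** 2
    freqs = np.fft.rfftfreq(len(bvp), d=1.0 / fps)
    band = (freqs >= lo_hz) & (freqs <= hi_hz)
    return 60.0 * freqs[band][np.argmax(spectrum[band])]
```

Feeding `pos_pulse` the last 30 seconds of `masked_rgb_mean` samples and passing the result to `estimate_bpm` gives one heart-rate reading. The projection cancels brightness changes that scale all three channels equally, which is why POS tolerates some lighting drift.
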
Vizier's readout panel

The floating readout: a live landmark thumbnail, the current heart rate over a bedside-monitor pulse wave, and all eight emotions as bars — neutral, happiness, sadness, anger, fear, surprise, disgust, contempt — rolling-averaged so they don't flicker frame to frame.
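
The rolling average behind the bars can be sketched as a fixed-length window over per-frame class probabilities. The class name `RollingEmotionMix` and the 15-frame default are assumptions for illustration; the page does not say how long Vizier's averaging window is.

```python
from collections import deque

EMOTIONS = ("neutral", "happiness", "sadness", "anger",
            "fear", "surprise", "disgust", "contempt")


class RollingEmotionMix:
    """Averages the last `window` probability vectors, one per frame."""

    def __init__(self, window=15):  # window length is an assumed value
        self._frames = deque(maxlen=window)

    def update(self, probs):
        """Add one frame's eight probabilities and return the smoothed mix."""
        if len(probs) != len(EMOTIONS):
            raise ValueError("expected one probability per emotion")
        self._frames.append(tuple(probs))
        count = len(self._frames)
        # zip(*frames) groups values by emotion so each column is averaged.
        means = [sum(column) / count for column in zip(*self._frames)]
        return dict(zip(EMOTIONS, means))
```

A longer window gives steadier bars at the cost of reacting more slowly to a genuine change of expression.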

Privacy

Nothing is transmitted
There are no network calls in either the Swift app or the Python sidecar.
Nothing is persisted
Face pixels live only in a 30-second in-memory window; just the resulting numbers cross the socket. No screenshots, no database.
Local-only IPC
The Swift host and the ML sidecar talk over a per-process Unix socket, chmod 0600 — owner-only.
Screen, not camera
It uses the Screen Recording permission, never the camera entitlement.
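
The owner-only socket described above can be sketched with Python's standard library. This assumes the sidecar is the side that binds the socket, and the `vizier-<pid>.sock` file name is hypothetical; the page only states that the socket is per-process and chmod 0600.

```python
import os
import socket


def per_process_socket_path(directory):
    """Build a socket path unique to this process (hypothetical naming)."""
    return os.path.join(directory, f"vizier-{os.getpid()}.sock")


def bind_owner_only_socket(path):
    """Bind a Unix stream socket that only the owning user can open."""
    if os.path.exists(path):
        os.unlink(path)  # clear a stale socket left by an earlier run
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    # Tighten the umask so the socket file is created as 0600, leaving
    # no moment where other users could connect to it.
    old_umask = os.umask(0o177)
    try:
        server.bind(path)
    finally:
        os.umask(old_umask)
    os.chmod(path, 0o600)  # explicit, in case the umask was not honoured
    server.listen(1)
    return server
```

Setting the umask before `bind` and then calling `chmod` is a belt-and-braces choice: either step alone usually yields mode 0600, but together they avoid relying on one platform's behaviour.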

A note on consent. Vizier analyzes whoever is on screen — typically the other person on a call — and they have no way to know. That's worth sitting with. The numbers are a curiosity and a tech demo, not a lie detector or a medical device: rPPG from a compressed video stream is noisy, and emotion classifiers are coarse and biased.

Under the hood