This is different from the existing "Only Record My Voice" request, which is about filtering OUT other voices. This request is about knowing WHO is talking: distinguishing between speakers in the transcript.
My use case: I live with my fiancée and Omi captures both of us. My AI assistant currently can't distinguish between us without guessing from context clues in the transcript. This limits what the system can do with the data: it can't attribute statements, track who said what, or build per-person context.
What I'd like:
- Basic 2-speaker diarization (even just "Speaker A" vs "Speaker B")
- On-device processing preferred (privacy, no cloud round-trip)
- Speaker labels in the webhook payload alongside transcript segments
- Bonus: ability to label/train speakers ("Speaker A = Mark", "Speaker B = Claire")
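To make the ask concrete, here is a rough sketch of what a diarized webhook payload could look like. All field names (`segments`, `speaker`, `speaker_labels`) are illustrative, not Omi's actual schema:

```python
# Hypothetical example of a webhook payload with per-segment speaker labels.
# Field names are illustrative, not Omi's real schema.
payload = {
    "session_id": "example-session",
    "segments": [
        {"start": 0.0, "end": 2.4, "speaker": "SPEAKER_A", "text": "Morning!"},
        {"start": 2.6, "end": 5.1, "speaker": "SPEAKER_B", "text": "Coffee's ready."},
    ],
    # Optional user-assigned names, covering the "label/train speakers" bonus item
    "speaker_labels": {"SPEAKER_A": "Mark", "SPEAKER_B": "Claire"},
}

def attributed_lines(p):
    """Resolve speaker IDs to user-assigned names where available."""
    names = p.get("speaker_labels", {})
    return [
        f'{names.get(seg["speaker"], seg["speaker"])}: {seg["text"]}'
        for seg in p["segments"]
    ]

print(attributed_lines(payload))
```

Even without the naming layer, stable "Speaker A"/"Speaker B" IDs in each segment would let downstream tools do the attribution themselves.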
Even imperfect diarization would be transformative for couples/family use. The existing voice isolation request solves a different problem — this is about attribution, not filtering.
Related: upvoted "Only Record My Voice" as a complementary feature.
In Review
Feature Requests
4 days ago

samshields-oc