AI-powered analysis of screen recordings that turns user-submitted videos into structured, time-ordered reproduction steps for faster triage and resolution
Built a screen recording analysis service that uses multimodal AI to watch user-submitted bug-report videos and produce structured, timestamped reproduction steps, error indicators, and a concise issue summary, so that support and engineering can triage and fix issues without re-watching long recordings.
Users often submit screen recordings when reporting bugs, but support agents and developers previously had to watch entire videos manually to extract reproduction steps. Transcriptions and logs were disconnected from what was happening on screen, and critical moments were easily missed or poorly documented.
We designed and implemented an analysis pipeline that accepts a video URL plus optional context (issue title, description, transcription, logs). The service downloads the recording, uploads it to Google's multimodal AI (Gemini), and uses a structured prompt to extract time-ordered key events, each with a timestamp (MM:SS), a short summary, a description, and a significance score (1–5). The model correlates on-screen actions with the logs and transcription and produces a single issue summary; we then validate and filter the results (e.g., discarding timestamps beyond the video duration) before returning structured JSON. Retries and schema validation ensure reliable, actionable output for downstream triage and issue resolution.
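As a rough illustration of the core analysis step, here is a minimal sketch assuming a Python implementation with the google-generativeai SDK and Pydantic v2; the model name, prompt wording, environment variable, and field names are illustrative placeholders rather than the exact production values:

```python
import os
import time

import google.generativeai as genai
from pydantic import BaseModel, Field

genai.configure(api_key=os.environ["GEMINI_API_KEY"])  # assumed env var name


class KeyEvent(BaseModel):
    timestamp: str = Field(pattern=r"^\d{2}:\d{2}$")  # MM:SS within the recording
    summary: str
    description: str
    significance: int = Field(ge=1, le=5)


class AnalysisResult(BaseModel):
    issue_summary: str
    events: list[KeyEvent]


PROMPT = """You are analysing a screen recording attached to a bug report.
Issue title: {title}
Issue description: {description}
Transcription: {transcription}
Logs: {logs}

Return JSON with an "issue_summary" string and an "events" array, where each
event has "timestamp" (MM:SS), "summary", "description", and "significance"
(1-5). Correlate on-screen actions with the transcription and logs."""


def analyze_recording(video_path: str, context: dict) -> AnalysisResult:
    # Upload the downloaded recording and wait until Gemini finishes processing it.
    video_file = genai.upload_file(path=video_path)
    while video_file.state.name == "PROCESSING":
        time.sleep(5)
        video_file = genai.get_file(video_file.name)
    if video_file.state.name != "ACTIVE":
        raise RuntimeError(f"Video upload failed: {video_file.state.name}")

    model = genai.GenerativeModel("gemini-1.5-pro")  # assumed model name
    response = model.generate_content(
        [video_file, PROMPT.format(**context)],
        generation_config=genai.GenerationConfig(
            response_mime_type="application/json",
        ),
    )
    # Pydantic enforces the schema the prompt asks for; a mismatch raises here.
    return AnalysisResult.model_validate_json(response.text)
```

Requesting a JSON response and validating it against an explicit schema is what keeps the output machine-consumable for downstream triage tooling.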
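The validation, filtering, and retry layer could look like the following sketch, which builds on the block above; the three-attempt cap and the helper names are assumptions for illustration:

```python
from pydantic import ValidationError


def mmss_to_seconds(timestamp: str) -> int:
    minutes, seconds = timestamp.split(":")
    return int(minutes) * 60 + int(seconds)


def filter_events(result: AnalysisResult, duration_s: float) -> AnalysisResult:
    # Drop events whose timestamps fall beyond the end of the recording and
    # keep the remainder in chronological order.
    kept = [e for e in result.events if mmss_to_seconds(e.timestamp) <= duration_s]
    kept.sort(key=lambda e: mmss_to_seconds(e.timestamp))
    return AnalysisResult(issue_summary=result.issue_summary, events=kept)


def analyze_with_retries(video_path: str, context: dict, duration_s: float,
                         attempts: int = 3) -> AnalysisResult:
    last_error: Exception | None = None
    for _ in range(attempts):
        try:
            return filter_events(analyze_recording(video_path, context), duration_s)
        except (ValidationError, ValueError) as exc:
            # Malformed JSON or schema violations trigger another attempt.
            last_error = exc
    raise RuntimeError("Recording analysis failed after retries") from last_error
```

The validated result can then be serialised (e.g., via `model_dump()`) as the structured JSON payload returned to support and engineering.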