ML Gesture-Controlled Dual-Deck DJ Mixer
Computer Vision | Real-Time Audio
USC SEP Hackathon (Sponsored by a16z) | October 2025
A professional dual-deck DJ system controlled entirely through hand gestures using computer vision and machine learning. Built during USC’s SEP Hackathon (sponsored by a16z), the system replaces expensive DJ hardware by turning any webcam into a full-featured gesture controller for a dual-deck mixer, with sub-50ms end-to-end latency.
One-minute demo showing real-time gesture control across both decks
SYSTEM ARCHITECTURE
The system processes hand gestures through a multi-stage pipeline to achieve professional-grade audio control:
Input
Google MediaPipe tracks 21 landmarks per hand at 30+ FPS, enabling precise dual-hand gesture recognition with minimal latency.
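A minimal sketch of how this stage could be wired up with the @mediapipe/hands JavaScript API; the option values and CDN path here are illustrative assumptions, not the project's exact configuration:

```ts
import { Hands, Results } from '@mediapipe/hands';
import { Camera } from '@mediapipe/camera_utils';

const video = document.querySelector<HTMLVideoElement>('video')!;

const hands = new Hands({
  // locateFile tells MediaPipe where to fetch its WASM and model assets.
  locateFile: (file) => `https://cdn.jsdelivr.net/npm/@mediapipe/hands/${file}`,
});
hands.setOptions({
  maxNumHands: 2,               // one hand per deck
  modelComplexity: 1,
  minDetectionConfidence: 0.7,  // illustrative thresholds
  minTrackingConfidence: 0.5,
});
hands.onResults((results: Results) => {
  // Up to two arrays of 21 {x, y, z} landmarks, normalized to [0, 1]
  // relative to the frame; these feed the gesture classifier.
  for (const landmarks of results.multiHandLandmarks ?? []) {
    // classifyPosture(landmarks) ...
  }
});

// Pump webcam frames into the model on each animation frame.
const camera = new Camera(video, {
  onFrame: async () => { await hands.send({ image: video }); },
  width: 640,
  height: 480,
});
camera.start();
```

Because the landmarks arrive normalized to the frame rather than to the hand, the scale-invariant normalization described in the next stage is what makes them usable as control signals.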
Gesture Processing
Raw hand coordinates are classified into postures (palm, fist, pinch, finger counts), then processed through a five-mode gesture detector with intelligent priority handling to prevent conflicts. Coordinate normalization ensures scale-invariant control regardless of hand distance, while hysteresis filtering eliminates jitter for smooth, professional transitions.
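As a concrete illustration of posture classification with hysteresis, the hypothetical pinch detector below normalizes the thumb-index gap by hand span (MediaPipe landmark indices: 0 = wrist, 4 = thumb tip, 8 = index tip, 9 = middle-finger MCP); the 0.25/0.35 thresholds are placeholder values, not the project's tuned ones:

```ts
type Landmark = { x: number; y: number; z: number };

const dist = (a: Landmark, b: Landmark) => Math.hypot(a.x - b.x, a.y - b.y);

// Pinch detection with hysteresis: the state only flips when the normalized
// thumb-index gap crosses the enter threshold going down or the exit
// threshold going up, so values hovering near one cutoff cannot cause jitter.
class PinchDetector {
  private pinching = false;

  update(lm: Landmark[]): boolean {
    const handSpan = dist(lm[0], lm[9]);        // wrist to middle-finger MCP
    const gap = dist(lm[4], lm[8]) / handSpan;  // thumb tip to index tip, scale-invariant
    if (!this.pinching && gap < 0.25) this.pinching = true;       // enter (placeholder)
    else if (this.pinching && gap > 0.35) this.pinching = false;  // exit (placeholder)
    return this.pinching;
  }
}

// Usage: const isPinching = detector.update(results.multiHandLandmarks[0]);
```

Dividing by hand span is what makes the same pinch read identically whether the hand is near or far from the camera, and the gap between the two thresholds is what eliminates jitter at the decision boundary.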
Audio Architecture
An event-driven architecture decouples gesture detection from audio processing. Each deck runs an independent Tone.js engine with multi-stem playback (vocals, drums, bass), real-time tempo control (0.8x-1.2x), and dynamic bandpass filtering (400-8000 Hz). A master crossfader blends both decks, maintaining synchronized timing throughout.
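A simplified Tone.js sketch of this topology; the deck factory and stem filenames are hypothetical, while the clamping ranges match the figures above:

```ts
import * as Tone from 'tone';

// Hypothetical deck factory: one Player per stem, all stems routed through a
// shared bandpass filter, playback synced to the global Transport.
function makeDeck(stems: { vocals: string; drums: string; bass: string }) {
  const filter = new Tone.Filter({ type: 'bandpass', frequency: 1000, Q: 1 });
  const players = Object.values(stems).map((url) =>
    new Tone.Player(url).connect(filter).sync().start(0)
  );
  return {
    output: filter,
    setTempo(rate: number) {
      // Gesture-mapped tempo, clamped to the 0.8x-1.2x range.
      const r = Math.min(1.2, Math.max(0.8, rate));
      players.forEach((p) => (p.playbackRate = r));
    },
    setFilter(hz: number) {
      // Gesture-mapped bandpass center, clamped to 400-8000 Hz.
      filter.frequency.rampTo(Math.min(8000, Math.max(400, hz)), 0.05);
    },
  };
}

const deckA = makeDeck({ vocals: 'a_vocals.mp3', drums: 'a_drums.mp3', bass: 'a_bass.mp3' });
const deckB = makeDeck({ vocals: 'b_vocals.mp3', drums: 'b_drums.mp3', bass: 'b_bass.mp3' });

// Master crossfader: fade = 0 plays deck A only, fade = 1 plays deck B only.
const crossfader = new Tone.CrossFade(0.5).toDestination();
deckA.output.connect(crossfader.a);
deckB.output.connect(crossfader.b);

// In a browser, Tone.start() must be called from a user gesture first.
Tone.loaded().then(() => Tone.Transport.start());
```

The event-driven decoupling described above would then amount to gesture events invoking these setters, so the detector and the audio engine can evolve independently.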
Performance
The optimized pipeline achieves end-to-end latency under 50ms, fast enough for live performance, while the React UI provides synchronized visual feedback with waveforms, sliders, and VU meters.
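One way a latency figure like this can be instrumented is to stamp each frame before inference and measure the elapsed time once the mapped audio parameters are applied; the sketch below assumes the hands and video objects from the input sketch and a hypothetical applyGestures helper:

```ts
import { Hands, Results } from '@mediapipe/hands';
import { Camera } from '@mediapipe/camera_utils';

declare const hands: Hands;                        // from the input sketch above
declare const video: HTMLVideoElement;
declare function applyGestures(r: Results): void;  // hypothetical classifier-to-audio path

let frameStart = 0;

hands.onResults((results) => {
  applyGestures(results);  // classify postures, update Tone.js parameters
  const latencyMs = performance.now() - frameStart;
  if (latencyMs > 50) {
    console.warn(`pipeline latency ${latencyMs.toFixed(1)} ms`);
  }
});

new Camera(video, {
  onFrame: async () => {
    frameStart = performance.now();  // stamp the frame before inference
    await hands.send({ image: video });
  },
  width: 640,
  height: 480,
}).start();
```

Note that this probe measures camera-to-parameter-update time only; audible latency additionally includes the Web Audio output buffer.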