15,000+ hours of multi-speaker conversations in 10+ languages
Our proprietary dataset is helping top research labs train the world's most advanced speech models. It contains:
Speaker-separated audio files
Audio sampled at 24 kHz or higher
Off-the-shelf, ready today
Natural, unscripted conversations
Topic and speaker diversity
Metadata on accents and dialects