
AI is meaningful when you can naturally interact with it.
We are an audio data research company.
Our mission is to bring AI into the real world through voice, the most important interface for human interaction.
We develop audio datasets with the same rigor researchers bring to models.

Our datasets are used by Fortune 100 companies and research labs that work with speech recognition, translation, synthesis, and conversational AI.
A dataset suite designed for speech-to-speech, multilingual, and voice interaction systems
Converse
Our flagship English dataset includes over 15,000 hours of channel-separated, natural two-speaker conversations covering a wide range of topics.
Atlas
A multilingual dataset spanning 15+ languages. It includes metadata on dialects and accents and follows the same format as Converse.


Chorus
A dataset of conversations involving three or more speakers, originally designed for training speaker-separation and diarization models.
Dialog
A collection of expert conversations across a range of domains.
Browse more datasets or design one with us
We offer additional proprietary datasets not listed here.
Contact us to request a sample, explore more options, or collaborate on a new dataset.
How to access our datasets
1. Request samples
We will set up a quick call to understand your use case and then send you relevant data samples.
2. Purchase access
Enter into a data license agreement covering the datasets and use cases your team needs.
3. Receive data
For off-the-shelf datasets, we will grant your team access within one to two days.
Bonus: Experiment with us
We frequently partner with research teams to design new shapes of data for any use case.
Join us to shape the future of audio AI
We’re hiring for research, engineering, and operations roles.