AI is meaningful when you can naturally interact with it.

We are an audio data research company.
Our mission is to bring AI into the real world through voice, the most important interface for human interaction.

2
Process

We develop audio datasets with the same rigor researchers bring to models.

i. Hypothesize

Determine an audio AI capability we wish to unlock.

ii. Design

Architect a shape of data to teach models that capability.

iii. Experiment

Launch a targeted data collection.

iv. Evaluate & Iterate

Measure data quality and tune the collection until a small, high-signal set is achieved.

v. Productionize

Scale the dataset to thousands of hours.

vi. Release

Publish the dataset and improve it continuously.


Our datasets are used by Fortune 100 companies and research labs that work with speech recognition, translation, synthesis, and conversational AI.

3
Featured Datasets

A dataset suite designed for speech-to-speech, multilingual, and voice interaction systems

Converse

Our flagship English dataset includes over 15,000 hours of channel-separated, natural two-speaker conversations covering a wide range of topics.

Atlas

A multilingual dataset spanning 15+ languages. It includes metadata on dialects and accents and follows the same format as Converse.

Chorus

A dataset of conversations involving three or more speakers, originally designed for training speaker-separation and diarization models.

Dialog

A collection of expert conversations across a range of domains.
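
For teams new to channel-separated audio (the format described for Converse above), here is a minimal Python sketch of what working with it can look like. It assumes, purely for illustration, a 16-bit stereo PCM WAV file with one speaker per channel; the actual delivery format of our datasets is not specified here, and the function name `split_channels` and the file names are hypothetical.

```python
import wave
from array import array


def split_channels(stereo_path, first_path, second_path):
    """Write each channel of a 16-bit stereo WAV to its own mono WAV.

    Assumption: one speaker per channel. Returns samples per channel.
    """
    with wave.open(stereo_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo PCM audio")
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    # Stereo samples are interleaved: ch0, ch1, ch0, ch1, ...
    # The 2-byte samples are only regrouped, never reinterpreted.
    samples = array("h")
    samples.frombytes(frames)
    channels = (samples[0::2], samples[1::2])

    for path, channel in zip((first_path, second_path), channels):
        with wave.open(path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(2)
            dst.setframerate(rate)
            dst.writeframes(channel.tobytes())

    return len(channels[0])


# Hypothetical usage:
# split_channels("conversation.wav", "speaker_a.wav", "speaker_b.wav")
```

Keeping each speaker on a separate channel means overlapping speech stays cleanly attributable to one speaker, without needing a separation model first.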

Browse more datasets or design one with us

We offer additional proprietary datasets not listed here.
Contact us to request a sample, explore more options, or collaborate on a new dataset.

Contact us

How to access our datasets

1. Request samples

We will set up a quick call to understand your use case and then send you relevant data samples.

2. Purchase access

Enter into a data license agreement covering the datasets and use cases your team needs.

3. Receive data

For off-the-shelf datasets, we will grant your team access within one to two days.

Bonus: Experiment with us

We frequently partner with research teams to design new shapes of data for any use case.

Contact us for more information.

4
Careers

Join us to shape the future of audio AI

We’re hiring for research, engineering, and operations roles.