Announcing our $5M Seed Round Led by First Round

Jan 16, 2025

We started David AI six months ago to build the data layer for audio AI. We believe audio will become as central to human-to-AI interaction as it is to human-to-human interaction, but in order to get there, model developers need access to much more high-quality audio data than they have today.

Today, we’re excited to announce our $5M seed round, led by First Round Capital with participation from BoxGroup, Y Combinator, SV Angel, Liquid 2, and an awesome set of angels.

To achieve their potential, audio models need better data.

Audio models need substantially more training data to improve reasoning performance, naturalness, and robustness. That data has historically been hard to come by.

High-quality audio data is fragmented – there's no Common Crawl for audio. It's scarce in the right formats – for example, the most-cited multi-channel speech datasets in research have until now been dated and only hundreds of hours long. It’s also hard to generate new audio – you need to ensure content accuracy as with text, while also accounting for acoustic properties, microphones and recording environments, languages, and localizations.

Speaking to researchers, we realized there was an opportunity to take audio data collection off their plates, so we built a product and an operation designed for 1,000x scale.

David AI is the first audio-native AI data platform.

In 2025, audio AI will have its ‘ChatGPT moment’. Our mission is to accelerate this by helping our customers bring better audio models to market, faster.

We’re building the infrastructure to collect studio-grade audio data at an unprecedented scale across every language and geography – exponentially expanding the breadth of available audio data, while preserving the sound quality nuances that make or break a model. This requires novel software, hardware, and operations built specifically for audio.

Since founding David AI, we’ve collected the largest corpus of channel-separated speech data on the market. The dataset is 10x the size of the next-largest one and spans ~15 languages, with rich accent and dialect metadata. Our data has already been used to train several of the best speech models on the market.

Join us.

We’re a lean team that met while working at Scale AI, and we obsess over execution. In six months, we’ve exceeded seven figures in revenue, partnering with leading AI labs from FAANG companies to startups.

If you’re excited about audio AI and driving measurable impact for the best AI companies in the world, join us. We’re hiring founding engineers and operators – when there’s a fit, we move quickly.

Apply here or reach out at tomer@withdavid.ai.