Enterprise-Grade Audio Datasets
High-quality, validated audio recordings for training and evaluating AI systems. Double-blind validated. Community-sourced. Ethically licensed.
Toishanese Transit Corpus v1.0
v1.0.0
High-quality audio recordings of transit-related phrases in the Toishanese dialect. Includes bus, BART, directions, and taxi conversations.
1,247 clips · 3.2h duration · 89 speakers
$10,000
One-time license
Toishanese Medical Corpus v0.5 (Beta)
v0.5.0
Medical and healthcare-related phrases, including symptoms, pharmacy, and appointment scheduling. Currently in beta, with collection ongoing.
423 clips · 1.1h duration · 34 speakers
$2,500
One-time license
Toishanese Family & Greetings v1.0
v1.0.0
Cultural greetings, family terms, honorifics, and celebration phrases. Essential for culturally aware AI systems.
892 clips · 2.4h duration · 67 speakers
$10,000
One-time license
DLI Toishanese Foundation (17 hrs)
v1.0.0
Defense Language Institute Toishanese Basic Course. 17+ hours of structured audio with aligned textbook transcriptions.
1,306 clips · 17.2h duration · 12 speakers
$50,000
One-time license
Licensing Tiers
Research
$2,500
- Academic use only
- No commercial deployment
- Attribution required
- Standard support
Commercial
$10,000
- Full commercial rights
- Production deployment
- Attribution required
- Priority support
Sovereign
$50,000
- Exclusive license option
- Air-gapped deployment
- No attribution required
- Dedicated support
Need a custom dataset or have questions about licensing?
Contact Enterprise Sales