Two undergraduates from Korea have built "Dia," an open-source speech AI model that competes with tools like ElevenLabs and Google’s NotebookLM. With no prior funding and minimal experience, they used Google’s TPU Research Cloud to train the model. Dia is now publicly available on Hugging Face and GitHub.
The Decode:
• Dia Offers Rich Voice Control - The 1.6B-parameter model can generate podcast-style dialogues with control over tone, speaker tags, disfluencies, and even nonverbal sounds such as coughs and laughter. Users can prompt Dia to generate random voices or to clone real ones. In early demos, it rivaled commercial tools in both quality and flexibility.
• Voice Cloning and Lack of Safeguards Raise Flags - Dia makes voice cloning simple, and its open release comes without strong safeguards. The creators discourage misuse but also disclaim responsibility for it. The training data has not been disclosed, which raises potential copyright concerns.
See the full story here: https://decodeai.ghost.io/undergrads-outsmart-big-tech-ai/