philip lelyveld The world of entertainment technology

10 Aug 2017

Stanford AI researchers make ‘socially inclusive’ NLP using Urban Dictionary and Twitter

Stanford University AI researchers have created a “socially equitable” natural language processing (NLP) tool they say improves upon off-the-shelf AI solutions used today that fail to account for things like regional dialects, slang, or the natural way people talk when they regularly speak more than one language.

In a paper published late last week, researchers found the tool, called Equilid, to be more accurate than commonly used language identification tools like langid.py and Google’s CLD2. Popular language identification tools, the paper argues, draw on “European-centric corpora” of the written word, as well as websites, Wikipedia, and newswires, sources that may not best represent the way people actually talk.

The report finds that more effective identification of language in underrepresented dialects could “help reveal dangerous trends in infectious diseases in the areas that need it most.”

“Flu tracking, election predictions, anytime you’re trying to do social sensing or you want to use social media to predict outcomes, that’s kind of where we see our application having the biggest impact,” lead author David Jurgens told VentureBeat in a phone interview.

AI trained with slang

To make Equilid work, language and text were drawn from a variety of sources, like European legislation and Wikipedia, but also Urban Dictionary, conversations about articles in the Talk section of the Wikipedia website, and text in African American Vernacular English, also known as ebonics. Interpretations of the Bible and Quran were also used; the Watchtower magazine from Jehovah’s Witnesses, which is translated into hundreds of languages, was also a rich resource.

Jurgens said that by far the majority of the language used to train Equilid and strengthen its ability to recognize specific geographic regions came from Twitter. Equilid draws upon language from nearly 98 million tweets from 1.5 million users in 53 languages.

Equality through accuracy

The goal with Equilid, Jurgens said, was not just to make a more socially equitable product, but to improve the accuracy and overall quality of NLP.

See the full story here: https://venturebeat.com/2017/08/08/stanford-ai-researchers-make-socially-inclusive-nlp-using-urban-dictionary-and-twitter/
