philip lelyveld The world of entertainment technology

2 Dec 2014

How Google “Translates” Pictures Into Words Using Vector Space Mathematics

Now Oriol Vinyals and pals at Google are using a similar approach to translate images into words. Their technique uses a neural network to study a dataset of 100,000 images and their captions, learning how to classify the content of images.

But instead of producing a set of words that describe the image, their algorithm produces a vector that represents the relationship between the words. This vector can then be plugged into Google’s existing translation algorithm to produce a caption in English, or indeed in any other language. In effect, Google’s machine learning approach has learnt to “translate” images into words.
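The pipeline described above can be sketched in miniature: an image is encoded as a vector, and a decoder then "translates" that vector into a word sequence, just as a translation model decodes a sentence vector into another language. The sketch below uses random weights and a made-up toy vocabulary purely for illustration; a real system like Google's learns all of these parameters from data.

```python
import numpy as np

# Toy sketch of the encode-then-decode captioning idea described above.
# The weights, vocabulary, and "CNN feature" are made-up stand-ins;
# in the real system they are learned from image-caption pairs.

rng = np.random.default_rng(0)
vocab = ["<start>", "a", "dog", "on", "grass", "<end>"]
V, H = len(vocab), 8

# Pretend this 16-dim vector is the output of a CNN encoder for one image.
image_feature = rng.normal(size=16)

# Decoder parameters (random here; learned in practice).
W_init = rng.normal(size=(H, 16))   # projects the image vector to an initial state
W_h = rng.normal(size=(H, H))       # recurrent weights
W_emb = rng.normal(size=(H, V))     # word embeddings
W_out = rng.normal(size=(V, H))     # hidden state -> vocabulary scores

def greedy_caption(feature, max_len=10):
    """Greedily decode a word sequence conditioned on the image vector."""
    h = np.tanh(W_init @ feature)          # seed the decoder with the image
    word = vocab.index("<start>")
    caption = []
    for _ in range(max_len):
        h = np.tanh(W_h @ h + W_emb[:, word])
        word = int(np.argmax(W_out @ h))   # pick the highest-scoring next word
        if vocab[word] == "<end>":
            break
        caption.append(vocab[word])
    return caption

print(greedy_caption(image_feature))
```

With random weights the output is gibberish, but the structure is the point: the same decoder machinery used for language translation consumes an image vector instead of a sentence vector.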

To test the efficacy of this approach, they used human evaluators recruited from Amazon’s Mechanical Turk to rate captions generated automatically in this way along with those generated by other automated approaches and by humans.

The results show that the new system, which Google calls Neural Image Caption, fares well. Using a well-known dataset of images called PASCAL, Neural Image Caption clearly outperformed other automated approaches. “NIC yielded a BLEU score of 59, to be compared to the current state-of-the-art of 25, while human performance reaches 69,” say Vinyals and co.
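For context on what those scores measure: BLEU rates a generated caption by how much of its wording overlaps with a reference caption. The snippet below is a deliberately simplified illustration using modified unigram precision only; real BLEU also combines higher-order n-grams and a brevity penalty, and the example sentences and numbers here are made up, not from the paper.

```python
from collections import Counter

def unigram_precision(candidate, reference):
    """Fraction of candidate words that also appear in the reference,
    with each word's count clipped by its count in the reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(count, ref[word]) for word, count in cand.items())
    return overlap / sum(cand.values())

cand = "a dog is running on the grass"
ref = "a dog runs across the grass"
print(round(100 * unigram_precision(cand, ref), 1))  # → 57.1
```

Here 4 of the candidate's 7 words ("a", "dog", "the", "grass") match the reference, giving roughly 57 on the 0–100 scale the quoted scores use.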

That’s not bad, and the approach looks set to get better as the size of the training datasets increases.

See the full story here: http://www.technologyreview.com/view/532886/how-google-translates-pictures-into-words-using-vector-space-mathematics/?utm_campaign=newsletters&utm_source=newsletter-daily-all&utm_medium=email&utm_content=20141202
