philip lelyveld The world of entertainment technology

10 Aug 2022

The ‘Nonsense Language’ That Could Subvert Image Synthesis Moderation Systems

New research from Columbia University suggests that the safeguards that prevent image synthesis models such as DALL-E 2, Imagen and Parti from outputting damaging or controversial imagery are susceptible to a kind of adversarial attack that involves ‘made-up’ words.

The author has developed two approaches that can potentially override the content moderation measures in an image synthesis system, and has found that they are remarkably robust even across different architectures, indicating that the weakness is more than just systemic, and may key on some of the most fundamental principles of text-to-image synthesis. ...

The first, and the stronger of the two, is called macaronic prompting. The term ‘macaronic’ originally referred to a mixture of multiple languages, as found in Esperanto or Unwinese. Perhaps the most culturally diffused example would be Urdu-English, a type of ‘code mixing’ common in Pakistan, which quite freely mixes English nouns and Urdu suffixes. ...

Cryptic Language in DALL-E 2

It has been suggested before that the gibberish DALL-E 2 outputs whenever it tries to depict written language could in itself be a ‘hidden vocabulary’. However, prior research into this mysterious language has not offered any way to develop nonce strings that can summon up specific imagery. ...

See the full story here: https://www.unite.ai/the-nonsense-language-that-could-subvert-image-synthesis-moderation-systems/
