...
Available at compl-ai.org, the release “includes the first technical interpretation of the EU AI Act, mapping regulatory requirements to technical ones,” and provides tools “to evaluate Large Language Models (LLMs) under this mapping” to gauge their extent of compliance, the group says.
Reuters calls the framework an “EU AI Act checker,” explaining that the suite of tests offers insight into areas where AI models appear at risk of falling short of the law. For example, “discriminatory output” has been a persistent problem in the development of generative AI models, which often reflect human biases around gender and race, among other attributes.
“When testing for discriminatory output, LatticeFlow’s LLM Checker gave OpenAI’s GPT-3.5 Turbo a relatively low score of 0.46,” Reuters writes, noting that in the same category, “Alibaba Cloud’s Qwen1.5-72B-Chat model received only a 0.37.”
Tests for “prompt hijacking,” a form of cyberattack in which malicious prompts are disguised as legitimate in order to obtain sensitive information, resulted in Meta’s Llama 2 13B Chat model getting a 0.42 score from the LLM Checker, with Mistral’s 8x7B Instruct model receiving a 0.38, Reuters says.
...
See the full story here: https://www.etcentric.org/eu-ai-act-checker-holds-big-ai-accountable-for-compliance/