Many popular social media and online platforms use hate speech detectors that a team of researchers led by Professor N. Asokan has now shown to be brittle and easy to deceive. Bad grammar and awkward spelling, intentional or not, might make toxic social media comments harder for AI detectors to spot.
The team put seven state-of-the-art hate speech detectors to the test. All of them failed.
Modern natural language processing (NLP) techniques can classify text based on individual characters, words or sentences. When faced with textual data that differs from the data used in their training, they begin to fumble.
"We inserted typos, changed word boundaries or added neutral words to the original hate speech. Removing spaces between words was the most powerful attack, and a combination of these methods was effective even against Google's comment-ranking system Perspective," says Tommi Gröndahl, doctoral student at Aalto University.
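The three kinds of change Gröndahl describes can be illustrated with a small Python sketch. It is not the researchers' code. The function names, the default neutral word and the word-list "detector" are all hypothetical, and the toy detector stands in for none of the seven systems tested. It only shows why a filter that looks for whole words can miss text once the spaces are gone.

```python
import random

# Hypothetical word list for a toy word-level filter (an assumption for
# illustration; real detectors are trained models, not word lists).
BLOCKLIST = {"awful"}


def toy_flag(text: str) -> bool:
    """Flag the text if any whitespace-separated word is on the blocklist."""
    return any(word in BLOCKLIST for word in text.lower().split())


def remove_spaces(text: str) -> str:
    """Change word boundaries by deleting every space."""
    return text.replace(" ", "")


def insert_typo(text: str, rng: random.Random) -> str:
    """Swap two adjacent inner letters in one randomly chosen longer word.

    The swap leaves the text unchanged if the two letters happen to be equal.
    """
    words = text.split(" ")
    candidates = [i for i, w in enumerate(words) if len(w) >= 4]
    if not candidates:
        return text
    i = rng.choice(candidates)
    w = words[i]
    j = rng.randrange(1, len(w) - 2)  # keep first and last letters in place
    words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)


def append_neutral_word(text: str, word: str = "love") -> str:
    """Add a neutral word to the end of the text (default is a placeholder)."""
    return f"{text} {word}"


if __name__ == "__main__":
    original = "you are awful"
    rng = random.Random(0)
    for variant in (
        original,
        insert_typo(original, rng),
        remove_spaces(original),
        append_neutral_word(original),
    ):
        print(f"{variant!r}: flagged={toy_flag(variant)}")
```

In this toy setting the original sentence is flagged, while the version without spaces is not, because the filter never sees the blocklisted word as a separate token. Appending a neutral word does nothing to a word list; that change matters for trained classifiers, where extra benign words can shift the overall score.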
Read more at: https://phys.org/news/2018-09-detectors-online-speech-easily-duped.html#jCp

