Goy tuuhuud: Detectors for online hate speech can be easily duped by humans, study shows

Sunday, September 23, 2018

Detectors for online hate speech can be easily duped by humans, study shows

Many popular social media and online platforms use hate speech detectors that a team of researchers led by Professor N. Asokan have now shown to be brittle and easy to deceive. Bad grammar and awkward spelling, intentional or not, might make toxic social media comments harder for AI detectors to spot.

The team put seven state-of-the-art hate speech detectors to the test. All of them failed.

Modern natural language processing (NLP) techniques can classify text based on individual characters, words or sentences. When faced with text that differs from their training data, they begin to fumble.

"We inserted typos, changed word boundaries or added neutral words to the original hate speech. Removing spaces between words was the most powerful attack, and a combination of these methods was effective even against Google's comment-ranking system Perspective," says Tommi Gröndahl, doctoral student at Aalto University.

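The three manipulations Gröndahl describes can be illustrated with a small Python sketch. This is not the researchers' code: `toy_detector`, its blocklist, and the helper names are hypothetical stand-ins, and a simple word-level blocklist is assumed in place of a trained classifier. It only shows why a detector that relies on whole-word matches loses track of text once its spelling or word boundaries change.

```python
import random

# Hypothetical toy blocklist: an assumed stand-in for a trained word-level model.
FLAGGED_WORDS = {"idiots", "scum"}


def toy_detector(text: str) -> bool:
    """Flag text if any whitespace-separated token is on the blocklist."""
    return any(token in FLAGGED_WORDS for token in text.lower().split())


def insert_typo(text: str, rng: random.Random) -> str:
    """Swap two adjacent inner letters in every word longer than three letters."""
    words = []
    for word in text.split():
        if len(word) > 3:
            i = rng.randrange(1, len(word) - 2)
            word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        words.append(word)
    return " ".join(words)


def remove_spaces(text: str) -> str:
    """Change word boundaries by deleting every space."""
    return text.replace(" ", "")


def add_neutral_word(text: str, neutral: str = "love") -> str:
    """Append a neutral word to the text."""
    return f"{text} {neutral}"


original = "you are all idiots"
rng = random.Random(0)  # fixed seed so the typo position is repeatable

typo_version = insert_typo(original, rng)
no_space_version = remove_spaces(original)
padded_version = add_neutral_word(original)
combined_version = add_neutral_word(remove_spaces(original))

for label, text in [
    ("original", original),
    ("typo", typo_version),
    ("no spaces", no_space_version),
    ("neutral word", padded_version),
    ("combined", combined_version),
]:
    print(f"{label:12} flagged={toy_detector(text)}  text={text!r}")
```

In this toy setup the typo and space-removal variants slip past the blocklist, while appending a neutral word alone does not, because the flagged token is still present. Padding of that kind would only matter for a detector that weighs all the words in a message, which this sketch does not model.
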