A Review on Obstacles and Solutions - Analysis & Notes.
Tags: tech, ml, ai
Yin and Zubiaga (2021) provide a comprehensive overview of the challenges and solutions involved in building generalisable hate speech detection models. The review highlights the importance of generalisation for real-world applications and critically examines existing research, focusing on:
1. Demonstrating the lack of generalisation:
- Cross-dataset testing: The paper summarises studies showing that hate speech detection models perform poorly when tested on datasets different from the ones they were trained on. Performance drops are observed across model types, including traditional machine learning models, neural networks, and BERT-based approaches (a minimal sketch of this evaluation setup follows this list).
- Dataset similarity: The paper explores the influence of dataset characteristics on model generalisation. Datasets with similar content and annotation schemes tend to produce models that generalise better to each other.
- Overestimation of performance: The authors point out that previous research might have overestimated the generalisation performance of hate speech detection models due to methodological flaws in model training and evaluation.
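As a concrete illustration of what cross-dataset testing looks like in practice, here is a minimal sketch: train a simple classifier on one corpus and compare in-dataset macro-F1 against macro-F1 on a second corpus. The `load_dataset` helper and the dataset names are hypothetical placeholders, not from the paper; TF-IDF plus logistic regression stands in for the "traditional machine learning" baselines the review discusses.

```python
# Minimal sketch of cross-dataset evaluation: train on one hate speech corpus,
# then compare in-dataset performance against performance on a second corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

def load_dataset(name):
    """Hypothetical loader returning (texts, binary_labels) for a named corpus."""
    raise NotImplementedError

texts_a, labels_a = load_dataset("dataset_A")   # training corpus
texts_b, labels_b = load_dataset("dataset_B")   # corpus from a different source

# Split dataset A so in-dataset performance is measured on held-out examples.
X_train, X_test_a, y_train, y_test_a = train_test_split(
    texts_a, labels_a, test_size=0.2, random_state=0, stratify=labels_a
)

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)

# The gap between these two scores is the (lack of) generalisation the review documents.
print("in-dataset macro-F1:   ", f1_score(y_test_a, model.predict(X_test_a), average="macro"))
print("cross-dataset macro-F1:", f1_score(labels_b, model.predict(texts_b), average="macro"))
```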
2. Analysing key obstacles to generalisation:
- Non-standard grammar and vocabulary: The paper highlights the challenges posed by the unique linguistic phenomena in social media hate speech, such as obfuscated spelling, code words, and informal language use. These aspects hinder the effectiveness of standard NLP pre-training approaches and contribute to false negatives.
- Limited and biased labelled data:
- Small dataset size: The paper discusses the challenges of acquiring and annotating large datasets for hate speech detection due to the subjective nature of hate speech and the complexity of annotation. Small dataset sizes lead to overfitting and limit generalisability.
- Sampling bias: The paper examines how various sampling methods used to collect hate speech data introduce biases. Keyword search and focusing on specific users lead to skewed datasets, potentially overfitting to specific language styles, authors, or topics.
- Annotation bias: The paper highlights the impact of inconsistent annotation schemes and differing definitions of hate speech across datasets. This leads to different training objectives and limits the generalisation of models trained on individual datasets.
- Representation bias: The paper explores the impact of societal biases reflected in hate speech data. Underrepresentation of minority groups in both data and annotators leads to biases against these groups, potentially amplifying harm instead of mitigating it.
3. Examining existing solutions:
- Addressing non-standard grammar and vocabulary:
- Lexicon expansion: The paper discusses approaches to identify code words used in hate communities and expand relevant lexicons.
- Subword and context-enriched embeddings: The paper reviews the benefits of character-level features, subword embeddings, and sentence embeddings to handle out-of-vocabulary words.
- Domain-specific pre-training: The paper highlights the importance of training word embeddings on relevant social media or news data to improve performance.
- Addressing limited and biased labelled data:
- Domain-specific pre-training: The paper mentions the potential of using pre-trained embeddings tailored for hate speech detection, such as White Supremacy Word2Vec.
- Transfer learning: The paper discusses the use of transfer learning from other tasks like sentiment analysis to benefit from larger datasets. However, the effectiveness of this approach remains inconsistent.
- Data augmentation: The paper explores methods to debias training data through data augmentation, including word replacement strategies and balanced sampling of identity terms (a minimal sketch of the identity-term swap idea appears after this list).
- Model regularisation: The paper describes approaches to mitigate biases by adding regularisation terms to the model’s loss function that reduce the association between identity terms and hate speech labels (see the loss-function sketch after this list).
- Addressing implicit hate expression:
- Contextualisation: The paper suggests including contextual information in datasets, such as original news articles or related sentences within a post, to enable understanding of implicit hate speech.
- Annotation schemes: The paper highlights the recent emergence of datasets with specific labels for implicit hate speech, allowing for better distinction between implicit and explicit expressions.
- Model design: The paper mentions approaches to capture implicit hate speech by incorporating features based on linguistic patterns and sentiment analysis.
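To make the word-replacement debiasing idea mentioned under data augmentation more concrete, here is a minimal sketch that swaps identity terms while keeping labels fixed. The identity-term list, the helper name, and the toy examples are illustrative assumptions, not taken from the paper or from any specific published lexicon.

```python
import random

# Illustrative identity-term groups; real debiasing work uses curated lists.
IDENTITY_TERMS = ["women", "men", "muslims", "christians", "immigrants", "gay people"]

def augment_with_identity_swaps(texts, labels, n_copies=1, seed=0):
    """For each text containing an identity term, add copies with that term
    replaced by a different identity term, keeping the original label.
    This balances how often each group co-occurs with each label."""
    rng = random.Random(seed)
    aug_texts, aug_labels = list(texts), list(labels)
    for text, label in zip(texts, labels):
        lowered = text.lower()
        for term in IDENTITY_TERMS:
            if term in lowered:
                for _ in range(n_copies):
                    replacement = rng.choice([t for t in IDENTITY_TERMS if t != term])
                    aug_texts.append(lowered.replace(term, replacement))
                    aug_labels.append(label)
                break  # only swap on the first matched term, to keep the sketch simple
    return aug_texts, aug_labels

# Toy usage with made-up posts, just to show the input/output shape.
texts = ["all immigrants should be banned", "I had lunch with gay people today"]
labels = [1, 0]
print(augment_with_identity_swaps(texts, labels))
```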
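Similarly, for the regularisation approach, here is a minimal sketch of one way a penalty term could be added to the training loss to weaken the link between identity-term presence and the hate label. The exact formulations in the work the review cites differ; this version (penalising an above-average predicted hate probability on identity-term posts) is an assumption for illustration, written with PyTorch.

```python
import torch
import torch.nn.functional as F

def debiased_loss(logits, labels, has_identity_term, lam=0.1):
    """Binary cross-entropy plus a penalty that discourages the model from
    scoring identity-term posts as hateful more often than the batch average.
    `has_identity_term` is a 0/1 tensor marking posts that mention an identity group."""
    bce = F.binary_cross_entropy_with_logits(logits, labels)
    probs = torch.sigmoid(logits)
    mask = has_identity_term.bool()
    if mask.any():
        # Zero gap means the mere presence of an identity term carries no extra
        # weight towards the hateful label on its own.
        gap = probs[mask].mean() - probs.mean()
        penalty = gap.clamp(min=0.0) ** 2
    else:
        penalty = torch.tensor(0.0, device=logits.device)
    return bce + lam * penalty

# Toy usage with random values, just to show the shapes involved.
logits = torch.randn(8)
labels = torch.randint(0, 2, (8,)).float()
has_identity_term = torch.randint(0, 2, (8,)).float()
print(debiased_loss(logits, labels, has_identity_term))
```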
4. Discussing future research directions:
- Datasets: The paper calls for clear label definitions, consistent annotation guidelines, and improved understanding of hate speech perception. It also emphasises the need for representative data collection methods and addressing biases during dataset construction.
- Models: The paper advocates for reducing overfitting through multi-dataset training, transfer learning, and incorporating domain-specific knowledge. It also encourages further investigation of bias mitigation strategies and developing models that can handle implicit hate speech.
- Model application and impact: The paper emphasises the importance of evaluating models beyond mathematical metrics. It suggests considering real-world applications, conducting in-depth error analysis, and studying the sociotechnical impact of hate speech detection systems.
Key takeaways:
- Building generalisable hate speech detection models is challenging due to the unique nature of online hate speech and limitations in existing NLP methods and data.
- Existing research has shown that current models overfit to specific datasets and exhibit biases against certain groups.
- To address these challenges, future research should focus on improving dataset quality, developing more generalisable models, and considering the wider context and impact of hate speech detection systems.
- Interdisciplinary collaboration is essential to address the multifaceted nature of hate speech detection.
Overall, this paper provides a valuable roadmap for researchers working on hate speech detection. It highlights the critical issues surrounding generalisation and bias, offering insights and directions for future research to develop practical and ethical solutions.
For this analysis, I used the framework outlined in:
Carey, M. A., Steiner, K. L., & Petri, W. A., Jr. (2020). Ten simple rules for reading a scientific paper. PLOS Computational Biology, 16(7), e1008032. https://doi.org/10.1371/journal.pcbi.1008032
References
Yin, W., & Zubiaga, A. (2021). Towards generalisable hate speech detection: a review on obstacles and solutions. arXiv preprint arXiv:2102.08886. https://arxiv.org/abs/2102.08886