Making computers understand sarcasm? Yeah, right.

Sarcasm Detection using NLP — Part 0

Shrusti Ghela
3 min read · Aug 4, 2022


I have a friend who does not understand sarcasm. Everything I say gets taken literally! Crazy, right?

While we are on the topic of not understanding sarcasm, do you remember Sheldon from the show ‘The Big Bang Theory’?

Sarcasm sign 😂

This made me wonder how humans actually distinguish between sarcasm and sincerity. After all, we don’t see people holding up a sarcasm sign every time they speak, do we? And can we teach computers to identify sarcasm?

There was a time when computers could not understand human language at all. The field of Natural Language Processing (NLP) has come a long way since then, to the point that people outside academia and tech are discussing it. One of the most common NLP applications is sentiment analysis.

“Sentiment analysis is often used to understand people’s subjective opinions. However, the analysis results may be biased if people use sarcasm in their statements. In order to correctly understand people’s true intentions, being able to detect sarcasm is critical.” [1]

And this is where this project comes into the picture!

“Sarcasm detection is a very narrow research field in NLP, a specific case of sentiment analysis where instead of detecting sentiment in the whole spectrum, the focus is on sarcasm. Therefore the task of this field is to detect if a given text is sarcastic or not.” [3]

Here, I focus on solving SemEval-2022 Task 6: iSarcasmEval, Intended Sarcasm Detection in English and Arabic [2].

About the data:

  • The data was collected using a new method in which the sarcasm labels for texts are provided by the authors themselves, thus eliminating labeling proxies (in the form of predefined tags or third-party annotators).
  • This method was used to collect data in two languages: Arabic and English.
  • For every sarcastic text in the dataset, the intended meaning is also provided by the author.
  • Linguistic experts have classified each sarcastic text into categories of ironic speech defined by Leggitt and Gibbs (2000): sarcasm, irony, satire, understatement, overstatement, and rhetorical question.
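To make the description above concrete, here is a minimal Python sketch of what a single record could look like. The class name, field names, and the example text are my own illustrative assumptions, not the column names or contents of the actual iSarcasmEval files; only the six category names come from the task description.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# The six ironic speech categories listed in the task description.
IRONY_CATEGORIES = frozenset({
    "sarcasm", "irony", "satire",
    "understatement", "overstatement", "rhetorical_question",
})


@dataclass(frozen=True)
class AuthorLabeledText:
    """Hypothetical record layout; field names are assumptions."""
    text: str
    is_sarcastic: bool                       # label given by the author
    intended_meaning: Optional[str] = None   # only for sarcastic texts
    categories: FrozenSet[str] = frozenset() # assigned by linguistic experts

    def __post_init__(self) -> None:
        unknown = self.categories - IRONY_CATEGORIES
        if unknown:
            raise ValueError(f"unknown categories: {sorted(unknown)}")
        if not self.is_sarcastic and (self.intended_meaning or self.categories):
            raise ValueError(
                "only sarcastic texts carry an intended meaning or categories"
            )


# A made-up example, not taken from the dataset.
example = AuthorLabeledText(
    text="Oh great, another Monday.",
    is_sarcastic=True,
    intended_meaning="I am not happy that it is Monday.",
    categories=frozenset({"sarcasm"}),
)
```

The validation in `__post_init__` simply encodes the two rules stated in the bullets: categories must come from the fixed list, and only sarcastic texts have an intended meaning or a category.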

The task is further divided into three subtasks:

  • Subtask A: Given a text, determine whether it is sarcastic or non-sarcastic.
  • Subtask B: A binary multi-label classification task. Given a text, determine which ironic speech category it belongs to, if any.
  • Subtask C: Given a sarcastic text and its non-sarcastic rephrase, i.e. two texts that convey the same meaning, determine which is the sarcastic one.
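One way to read the three subtasks is as three different prediction targets built from the same kind of data. The sketch below is only my own framing, with hypothetical function names and an assumed category ordering; it is not the official task format or scoring code.

```python
import random
from typing import Iterable, List, Tuple

# Assumed fixed ordering of the six categories for the multi-label vector.
CATEGORIES = [
    "sarcasm", "irony", "satire",
    "understatement", "overstatement", "rhetorical_question",
]


def subtask_a_target(is_sarcastic: bool) -> int:
    """Subtask A: one binary label per text (1 = sarcastic)."""
    return int(is_sarcastic)


def subtask_b_target(categories: Iterable[str]) -> List[int]:
    """Subtask B: one binary label per category (multi-label)."""
    present = set(categories)
    return [int(name in present) for name in CATEGORIES]


def subtask_c_pair(
    sarcastic_text: str, rephrase: str, rng: random.Random
) -> Tuple[str, str, int]:
    """Subtask C: return the two texts in random order plus the index
    (0 or 1) of the sarcastic one, so position gives nothing away."""
    if rng.random() < 0.5:
        return sarcastic_text, rephrase, 0
    return rephrase, sarcastic_text, 1
```

For example, `subtask_b_target({"irony", "satire"})` gives a vector with ones only in the irony and satire positions, and a non-sarcastic text maps to all zeros, which covers the "if any" in the Subtask B description.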

I will try to solve these tasks with several different approaches on the data at hand and share my work in upcoming posts.

References:

[1] L. Xu, V. Xu, Project Report: Sarcasm Detection (2019), CS224n: Natural Language Processing with Deep Learning, Stanford University.

[2] A. Farha et al., SemEval-2022 Task 6: iSarcasmEval, Intended Sarcasm Detection in English and Arabic (2022), SemEval-2022.

[3] A. Berasategi, Sarcasm Detection with NLP (2020), Towards Data Science.
