Hi there, let’s get back to one of our favorite topics: Vossian Antonomasia (VA). In our 2019 EMNLP-IJCNLP paper, we tried to detect VA automatically using rule-based methods and a simple neural network approach. The latter showed very promising results, so we continued along this line and brought in some heavier machinery – the Michael Jordans in the field of natural language processing (NLP): pre-trained language models (PLMs).
Neural networks, and PLMs like BERT in particular, have been shown to improve a wide range of NLP tasks, especially those for which large labeled datasets are not available.
The advantage of language models is pre-training, which is conducted on large amounts of unlabeled text data; BERT, for example, was trained on the English Wikipedia and the BooksCorpus. During pre-training, the model acquires a basic understanding of language that can then be reused for downstream tasks such as named entity recognition (NER), sentiment analysis, or part-of-speech tagging.
We decided to try this on Vossian Antonomasia for our latest paper, published in Frontiers in Artificial Intelligence. Instead of classifying complete sentences, that is, deciding whether a sentence contains a VA expression or not, we reformulated the task: the machine learning model is now trained to identify all parts of a VA within a sentence, that is, the source, target, and modifier, and to distinguish them from one another. This is called sequence tagging. The following example illustrates the tagging:
Words: | A | Spice | Girls | of | hip-hop | , | the | Wu-Tang | Clan | offers | something | for | every | kind | of | rap | fan |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Tags: | - | B-SRC | I-SRC | - | B-MOD | - | B-TRG | I-TRG | I-TRG | - | - | - | - | - | - | - | - |
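In code, such an annotated sentence boils down to two aligned lists. Here is a minimal sketch of one training example in Python (we write “O” for the “-” cells above, following the usual BIO convention; the actual dataset format may differ):

```python
# One word-level annotated training example: the BIO tags mark the
# beginning (B-) and inside (I-) of source, modifier, and target spans,
# while "O" marks words that do not belong to any VA part.
example = {
    "words": ["A", "Spice", "Girls", "of", "hip-hop", ",", "the",
              "Wu-Tang", "Clan", "offers", "something", "for",
              "every", "kind", "of", "rap", "fan"],
    "tags":  ["O", "B-SRC", "I-SRC", "O", "B-MOD", "O", "B-TRG",
              "I-TRG", "I-TRG", "O", "O", "O", "O", "O", "O", "O", "O"],
}
assert len(example["words"]) == len(example["tags"])
```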
To train the neural networks, we annotated our VA dataset on the word level, that is, we marked for each word in 3,066 sentences whether it is part of a source, target, or modifier, or does not belong to a VA at all.
We then used the BERT base model and fine-tuned it on the annotated data. In particular, we added a classification layer on top of the BERT model that predicts a tag for each word of the input sentence; during fine-tuning, the parameters of both BERT and the new layer are updated on the training data.
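In current libraries, this setup takes only a few lines. The following is a minimal sketch with the Hugging Face transformers library, not our actual training code; the exact label set and BERT variant shown here are assumptions for illustration:

```python
# Sketch: BERT with a token classification head for VA sequence tagging.
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O", "B-SRC", "I-SRC", "B-MOD", "I-MOD", "B-TRG", "I-TRG"]
id2label = dict(enumerate(labels))
label2id = {label: i for i, label in id2label.items()}

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
# AutoModelForTokenClassification places a linear layer on top of BERT
# that predicts one tag per (sub)word token.
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased",
    num_labels=len(labels),
    id2label=id2label,
    label2id=label2id,
)
# Fine-tuning (e.g., with transformers.Trainer) then updates all
# parameters, both BERT's and the new layer's, on the annotated data.
```

Since BERT tokenizes words into subwords, the word-level labels have to be aligned to subword tokens during preprocessing; a common convention is to assign each label to the first subword of a word and to ignore the remaining subwords in the loss.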
In addition, we trained a neural network model from scratch: a bidirectional long short-term memory (BLSTM) network with a conditional random field (CRF) layer on top that likewise tags each word, using a concatenation of ELMo and GloVe embeddings as input.
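For illustration, here is a minimal sketch of such a BLSTM-CRF tagger in PyTorch, using the pytorch-crf package; the embedding pipeline (in our case, concatenated ELMo and GloVe vectors) is abstracted into a single input tensor:

```python
# Sketch of a BLSTM-CRF sequence tagger (pip install pytorch-crf).
import torch.nn as nn
from torchcrf import CRF

class BLSTMCRF(nn.Module):
    def __init__(self, embedding_dim: int, hidden_dim: int, num_tags: int):
        super().__init__()
        self.lstm = nn.LSTM(embedding_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.hidden2tag = nn.Linear(2 * hidden_dim, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, embeddings, tags=None, mask=None):
        # embeddings: (batch, seq_len, embedding_dim), e.g. ELMo + GloVe
        lstm_out, _ = self.lstm(embeddings)
        emissions = self.hidden2tag(lstm_out)  # per-token tag scores
        if tags is not None:
            # Training: negative log-likelihood of the gold tag sequence
            return -self.crf(emissions, tags, mask=mask)
        # Inference: most likely tag sequence (Viterbi decoding)
        return self.crf.decode(emissions, mask=mask)
```

The CRF layer models dependencies between adjacent tags, for example that I-SRC can only follow B-SRC or I-SRC, which plain per-token classification cannot enforce.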
Task | Approach | Precision | Recall | F1 |
---|---|---|---|---|
Classification | Baseline | 0.876 | 0.880 | 0.878 |
Classification | BLSTM-ATT | 0.921 | 0.974 | 0.947 |
Classification | BERT-CLF | 0.971 | 0.977 | 0.974 |
Sequence tagging | Baseline | 0.765 | 0.616 | 0.682 |
Sequence tagging | BLSTM-CRF | 0.908 | 0.907 | 0.907 |
Sequence tagging | BERT-SEQ | 0.908 | 0.944 | 0.926 |
First, we improved the sentence classification task by almost 0.1 points in F1 score (from 0.878 to 0.974). Second, we achieved strong results on the new sequence tagging task (an F1 score of 0.93), where BERT outperforms the BLSTM-CRF model.
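A strict and common way to score such taggers is on the entity level, where a predicted span only counts as a hit if it matches the annotated span exactly. As an illustration, here is a sketch with the seqeval package (not necessarily our exact evaluation code):

```python
# Entity-level scoring for BIO-tagged sequences (pip install seqeval).
from seqeval.metrics import f1_score, precision_score, recall_score

y_true = [["O", "B-SRC", "I-SRC", "O", "B-MOD", "O", "B-TRG", "I-TRG"]]
y_pred = [["O", "B-SRC", "I-SRC", "O", "B-MOD", "O", "B-TRG", "O"]]

# Two of the three gold spans (SRC, MOD) are found exactly; the TRG
# span is truncated and therefore counts as a miss.
print(precision_score(y_true, y_pred))  # ~0.67
print(recall_score(y_true, y_pred))     # ~0.67
print(f1_score(y_true, y_pred))         # ~0.67
```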
In addition to the evaluation on our annotated dataset, we conducted a robustness study on real-world newspaper data. We also studied the model's ability to predict new types of VA, focusing on new types of source entities (e.g., organizations, locations, fictional characters) and on new syntactic variations around the source (e.g., “a SOURCE on”, “of SOURCE of”). In total, the model identified around 10,000 VA candidates in the NYT corpus that our previous models were not able to find. The following table shows the most frequently predicted source candidates; due to limited capacity, we only evaluated samples of these candidates.
Source Candidates | Count |
---|---|
Holy Grail | 116 |
Cadillac | 88 |
Pied Piper | 85 |
Rolls-Royce | 71 |
Paris | 60 |
Harvard | 58 |
Microsoft | 43 |
Venice | 42 |
Demon Barber | 39 |
King | 37 |
Switzerland | 37 |
McDonald's | 35 |
Darth Vader | 34 |
Wild West | 33 |
Cinderella | 32 |
Goliath | 29 |
Woodstock | 29 |
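Counts like those in this table can be obtained by collecting all predicted source spans from the tagged sentences. A minimal sketch (the input data here is hypothetical and stands for the tagger's output on the NYT corpus):

```python
# Sketch: aggregating predicted source spans into frequency counts.
from collections import Counter

def extract_sources(words, tags):
    """Yield source phrases, i.e., maximal B-SRC/I-SRC spans."""
    span = []
    for word, tag in zip(words, tags):
        if tag == "B-SRC":
            if span:
                yield " ".join(span)
            span = [word]
        elif tag == "I-SRC" and span:
            span.append(word)
        else:
            if span:
                yield " ".join(span)
            span = []
    if span:
        yield " ".join(span)

# Hypothetical predictions for two sentences.
tagged_sentences = [
    (["the", "Cadillac", "of", "minivans"],
     ["O", "B-SRC", "O", "B-MOD"]),
    (["a", "Pied", "Piper", "of", "pop"],
     ["O", "B-SRC", "I-SRC", "O", "B-MOD"]),
]

counts = Counter()
for words, tags in tagged_sentences:
    counts.update(extract_sources(words, tags))
print(counts.most_common())  # [('Cadillac', 1), ('Pied Piper', 1)]
```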
In summary, in our latest paper we developed new models that extract VA on the word level, that is, they tag all words in a sentence that belong to a VA expression. In addition to the high evaluation scores on our annotated dataset, multiple robustness studies showed that the best model can predict new variants of VA, covering both new syntactic variations and new types of named entities as sources.
Fully annotating and analyzing the predicted candidates in more depth is one of our next projects. If you are interested in participating, feel free to contact us. Stay tuned!