Advanced NLP: Assignment 2
Transformer-Based Semantic Role Labeling (SRL)
A) Introduction
For this assignment you will fine-tune a transformer-based model for SRL (given the predicate). You can reuse some of the code developed previously to pre-process the corpus and evaluate the results; however, you will need to adapt it as needed. The task consists of: i) familiarizing yourself with the HuggingFace Transformers API; ii) adapting a pre-existing codebase for NER and fine-tuning a BERT-base model for SRL; and iii) evaluating the results.
B) Objectives:
- Familiarize yourself with the Hugging Face Transformers libraries, specifically targeting an understanding of how to use and fine-tune existing LLMs to perform sequence labelling
- Learn about the concept of subword tokenization, inherent to transformer models, and learn to deal with its impact on the input and the output of these models
- Gain hands-on experience developing/adapting transformer-based classifiers by fine-tuning a BERT-family LLM for token-level semantic role labeling.
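To make the subword-tokenization objective concrete, here is a toy, self-contained sketch of the issue. The mini-vocabulary and the greedy splitter below are purely illustrative (not a real BERT tokenizer), but they show how one word can become several subwords and why gold labels must be realigned (the extra subwords conventionally receive the ignore index -100):

```python
# Toy WordPiece-style splitter over a made-up mini-vocabulary.
# Illustrative only: real tokenizers (e.g., DistilBERT's) use much larger
# vocabularies, but the alignment problem is the same.

TOY_VOCAB = {"ask", "##ed", "write", "sentence", "pia", "luis", "to", "this", "."}

def wordpiece(word, vocab=TOY_VOCAB):
    """Greedy longest-match-first split of one word into subword pieces."""
    pieces, start = [], 0
    word = word.lower()
    while start < len(word):
        end = len(word)
        while end > start:
            piece = word[start:end]
            if start > 0:                  # continuation pieces get the ## prefix
                piece = "##" + piece
            if piece in vocab:
                pieces.append(piece)
                start = end
                break
            end -= 1
        else:                              # no piece matched: unknown word
            return ["[UNK]"]
    return pieces

def align_labels(words, labels):
    """Keep the gold label on the first subword; mask the rest with -100."""
    sub_tokens, sub_labels = [], []
    for word, label in zip(words, labels):
        pieces = wordpiece(word)
        sub_tokens.extend(pieces)
        sub_labels.extend([label] + [-100] * (len(pieces) - 1))
    return sub_tokens, sub_labels

tokens, labels = align_labels(["Pia", "asked", "Luis"], [1, 2, 3])
# tokens == ["pia", "ask", "##ed", "luis"]; labels == [1, 2, -100, 3]
```

The starter notebook's `tokenize_and_align_labels` performs the same realignment using the real tokenizer's word-to-subword mapping.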
C) What to do:
1. Start by reading Simple BERT Models for Relation Extraction and Semantic Role Labeling (Peng Shi, Jimmy Lin) and Joint Training with Semantic Role Labeling for Better Generalization in Natural Language Inference (Cemil Cengiz, Deniz Yuret).
Note: If you have not done so for a previous course, you can also benefit from reading:
NegBERT: A Transfer Learning Approach for Negation Detection and Scope Resolution (Aditya Khandelwal, Suraj Sawant). This is for a different NLP task, but their methods can also be suitable for the task of SRL.
2. Make sure you can run and understand the Python Notebook you will be adapting: https://github.com/huggingface/notebooks/blob/main/examples/token_classification.ipynb . We recommend using this adapted version made available for ML4NLP here. This notebook should be your starter code. It is designed for NER, and the goal is to adapt it for SRL.
Note: When you adapt the code, questions are likely to arise. If they do, revisit (or get better acquainted with) Hugging Face's Transformers and companion libraries. These are important libraries for NLP, but they are also quite extensive. You are not expected to know the library by heart, but you should be able to navigate the documentation as needed to understand existing codebases. Depending on your background and on the time you have spent with this library, we recommend the following:
- https://huggingface.co/docs/transformers/en/tasks/token_classification (quick introduction to token classification)
- If you need a bit more detail, you may also find it useful to follow some sections of this tutorial: https://huggingface.co/learn/nlp-course/en/chapter1/1 – sections 2 "Using Transformers" and 3 "Fine-tuning a pretrained model" should be especially useful.
3. Adapt the notebook for the task of SRL, and fine-tune an LLM from the BERT-family for this sequence labeling task:
a) Make sure you explicitly deal with the relation between predicates and the task of SRL. You can follow one of the methods used in any of the papers listed above, or choose another suitable method. Using a suitable method is a minimum requirement to pass this assignment. This will require you to adapt the input to the model in some way. If you have questions about the suitability of a new method, ask!
b) We recommend that you use distilbert-base-uncased (the default model in the original notebook) for your work. This model is smaller and therefore faster to fine-tune (and can also be used on less powerful machines). If you wish, you may use other BERT-style models for your experiments. Make sure you state clearly which model you are using and why.
c) Consider how you should post-process the output of the model to provide metrics that are suitable for the shared task. In particular, you must ensure that the number of tokens in your confusion matrix matches the number of predictions expected by the test set. This needs to be motivated and dealt with explicitly.
d) Prepare the code to evaluate your model on the evaluation set. Provide Precision, Recall and F1 measures for token-level classification. Make sure to also include a labeled confusion matrix that supports the classification metrics. Store your model’s predictions over the preprocessed test set and save it in text format (e.g., as a tsv). Make sure this file contains, at least, the token, the gold label and the predicted label for each prediction.
e) Prepare a ready-to-use function where one can use the trained model to perform SRL on standalone sentences, given the predicate. Among other necessary arguments (e.g., model, etc.), the function should allow the input of a sentence segmented as a list of strings (e.g., ['Pia', 'asked', 'Luis', 'to', 'write', 'this', 'sentence', '.']) and a list defining the location of the predicate to label (e.g., [0,0,0,0,1,0,0,0] for the predicate 'write'). If you can think of a better way to design this function, that is also acceptable, as long as it is well documented. Provide an example showing that the function runs on a sentence with more than one predicate (you can choose your own sentence(s)!). Note: this function will be important for your take-home exam.
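One way to make the predicate explicit (following the sentence/predicate pair idea from Shi & Lin's paper above) is to encode the input as a sentence–predicate pair. The sketch below only shows the input-building step; the function name and argument shapes are illustrative, not part of the starter code:

```python
# Hypothetical helper: turn a pre-segmented sentence plus a 0/1 predicate-location
# list into a (sentence, predicate) text pair, suitable for a pair-style encoding.
# This is one possible design, not the only acceptable one.

def build_srl_input(tokens, predicate_mask):
    """Return (sentence_text, predicate_text) for a pair-style encoding."""
    if len(tokens) != len(predicate_mask):
        raise ValueError("tokens and predicate_mask must have the same length")
    predicate = [t for t, m in zip(tokens, predicate_mask) if m == 1]
    if not predicate:
        raise ValueError("predicate_mask must mark at least one token")
    return " ".join(tokens), " ".join(predicate)

sent = ['Pia', 'asked', 'Luis', 'to', 'write', 'this', 'sentence', '.']
mask = [0, 0, 0, 0, 1, 0, 0, 0]
text, pred = build_srl_input(sent, mask)
# text == "Pia asked Luis to write this sentence ."
# pred == "write"
```

With a Hugging Face tokenizer the pair can then be encoded by passing both strings (e.g., `tokenizer(text, pred)`), which yields a single sequence with the two segments separated. A sentence with two predicates would simply call the function twice, once per predicate mask.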
4. Make sure you carefully document the pipeline. Try to make use of a mix of markdown and code comments.
5. Submit one zip file containing a Jupyter notebook (and HTML printout) accompanied by a requirements.txt file and any number of Python modules with helper functions. Also include your model's predictions on the test set (e.g., as a tsv). Do not upload the saved model on Canvas! Provide a link to download the model instead (it needs to be a public link) – make sure the link is available at the top of your notebook. Make sure you run the notebook and save it with the output of all cells. Read more about the requirements below.
D) Requirements for Jupyter Notebook:
The Python Notebook should be formatted in a way that will substitute a written report. As such, it should be crafted with care, highlight all important steps of the pipeline and, when necessary, include explaining text and notes about decisions. Your report must include:
- A (publicly open) link to download the trained model. We recommend using Google Drive to share a zip containing the model. Do not upload your model on Canvas!
- A printed summary of the statistics for both the training and test sets (see above). Make sure your evaluation (and confusion matrix) matches the numbers you have printed in these statistics (i.e. the total number of tokens must match).
- Explain and exemplify, providing 1 or 2 examples, how you chose to preprocess the input. Make sure your examples include both human-readable (i.e., using text) examples and machine-readable input (i.e., using sub-word ids). Use prints to show these examples (do not provide them as text/comments).
- A printed example showing an excerpt (1 or 2 sentences) of the data as it is fed into the model (e.g., similar to the output of the function tokenize_and_align_labels() in the starter notebook). Make sure no gold labels are passed into the model as features other than the label.
- Explain and exemplify, providing 1 or 2 examples, how you process the output of the model. Make sure that your system’s predictions match the tokenization required by the shared-task. Describe the heuristics you use to go from subword to token level predictions.
- A printed evaluation table using Scikit-learn's classification report, including a labeled confusion matrix. You should also include a couple of paragraphs discussing these results (similar to A1).
- A printed example showing that your function to perform SRL on standalone sentences is working.
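One common heuristic for the subword-to-token step requested above is to keep the prediction of the first subword of each word and discard the rest. A minimal sketch, assuming a `word_ids`-style mapping (a list of word indices with `None` for special tokens, as produced by fast Hugging Face tokenizers):

```python
# Collapse subword-level predictions to one prediction per original token,
# using the first-subword heuristic. The inputs below are made up for
# illustration; in practice word_ids comes from the tokenizer encoding.

def subwords_to_tokens(word_ids, subword_preds):
    """Keep the first subword's prediction for each word; drop the rest."""
    token_preds, seen = [], set()
    for wid, pred in zip(word_ids, subword_preds):
        if wid is None or wid in seen:   # special token or non-first subword
            continue
        seen.add(wid)
        token_preds.append(pred)
    return token_preds

# [CLS] Pia ask ##ed Luis [SEP]  ->  word_ids: None, 0, 1, 1, 2, None
preds = subwords_to_tokens([None, 0, 1, 1, 2, None],
                           ["O", "ARG0", "V", "V", "ARG2", "O"])
# preds == ["ARG0", "V", "ARG2"]
```

Other heuristics (e.g., majority vote over a word's subwords) are also acceptable; whichever you choose, document and motivate it explicitly.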
Please note that you can and should delete cells in the starter notebook that are used for demonstration/tutorial purposes (i.e., not specifically for your experiment). The notebook should focus on your experiment.
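For the evaluation table itself, the kind of printout requested can be produced with scikit-learn. The gold and predicted labels below are fabricated purely for illustration; your notebook would use the model's actual post-processed predictions:

```python
# Illustrative token-level evaluation: precision/recall/F1 per label plus a
# labeled confusion matrix. The data here is made up for the example.

from sklearn.metrics import classification_report, confusion_matrix

gold = ["ARG0", "V", "ARG1", "O", "O", "ARG1"]
pred = ["ARG0", "V", "O",    "O", "O", "ARG1"]
labels = ["ARG0", "ARG1", "V", "O"]

print(classification_report(gold, pred, labels=labels, zero_division=0))

cm = confusion_matrix(gold, pred, labels=labels)  # rows = gold, columns = predicted
for lab, row in zip(labels, cm):
    print(f"{lab:>5}", row)
```

Note that the sum of all confusion-matrix cells equals the total number of evaluated tokens, which is the number your printed corpus statistics must match.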
E) Notes on Computation Power:
We know you might not have the computing power to fully train your model for multiple epochs on a local machine, but using a free service like https://colab.research.google.com/ should be sufficient to run at least one epoch (probably more). Focus on the quality of execution, and not necessarily the quality of the end product.
F) What to submit:
Each student submits one zip file using the predefined naming convention (e.g. A2-Student Name.zip).
Inside the zip you should include:
- A requirements.txt with the necessary installation requirements.
- A Python Notebook showcasing the full experiment. This should be submitted both as a notebook (.ipynb) and as an HTML file (.html). Make sure you save the notebook (and the HTML) after running every cell – so the outputs are also saved. You should be able to confirm this by inspecting the HTML.
- Any number of helper Python modules (if needed).
- The model’s predictions on the test set as a text file (e.g., as a tsv).
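The predictions file can be written with the standard library's csv module. The file name, column order, and rows below are just one reasonable choice for the token / gold / predicted format described above:

```python
# Minimal sketch of storing predictions as a tab-separated file with the
# required columns. The rows are dummy data for illustration.

import csv

rows = [("Pia", "ARG0", "ARG0"), ("asked", "V", "V"), ("Luis", "ARG2", "O")]

with open("test_predictions.tsv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(["token", "gold_label", "predicted_label"])
    writer.writerows(rows)
```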
G) Grading:
The assignment will be graded on a Pass/Fail basis, based on the following requirements:
- Produce a running Python Notebook, including all steps (corpus preprocessing,
processing the input for fine-tuning, training, post processing the output and evaluating the trained model) to train and evaluate a transformer-based model for SRL;
Note: make sure the code runs and does not depend on any files that are not included with your submission (including preprocessed datasets). The only files your code can (and should) depend on are the original Universal Propbank data.
- The Python notebook should be structured and documented with explanations about the code pipeline.