58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), July 5 - July 8
Unlike last year's NAACL (North American Chapter of the Association for Computational Linguistics, the regional arm of ACL) conference in Minneapolis, ACL 2020 was fully virtual, which turned out to be very convenient. There were no parking problems, and everybody could watch more sessions than usual. One essential thing to keep in mind with virtual conferences is time zones: presenters are spread around the globe, so make sure to convert each presentation time to your local time. Here is my account of ACL 2020 and a quick framing of what I think the effect and impact of its sessions will be.
ACL 2020 - Effect and Impact
BERT (Bidirectional Encoder Representations from Transformers) is not going away anytime soon.
Bidirectionality in deep neural networks gives better representations of words because it takes both the left and the right context into consideration. The built-in bidirectionality of BERT shows that context helps in most NLP (Natural Language Processing) tasks, which is why BERT is being used and explored so widely by NLP practitioners. BERT has been around for almost two years now and was formally published at NAACL 2019. Here is the list of papers at ACL 2020 that explicitly included BERT in their names: MobileBERT, ExpBERT, DeeBERT, GAN-BERT, exBERT, BERTRAM, SentiBERT, CamemBERT, CluBERT, tBERT, TaBERT, SenseBERT, FastBERT, and schuBERT, and there are several more papers that did not include BERT in the name but still used it. The papers that used BERT generally follow one of these approaches (a minimal fine-tuning sketch follows the list):
1. Pre-train BERT on a new dataset.
2. Fine-tune BERT on a specific task.
3. Fine-tune BERT on a specific task, then benchmark it on a standard dataset such as GLUE.
4. Include BERT in the system pipeline for classification or for generating embeddings.
5. Explore BERT through ablation, or add external features to improve task performance.
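To make approach 2 concrete, here is a minimal sketch of fine-tuning BERT for a two-label text classification task. It assumes the Hugging Face `transformers` library and PyTorch; the checkpoint name, toy examples, and hyperparameters are illustrative assumptions, not details taken from any of the papers above.

```python
# Minimal sketch (assumed setup): fine-tune BERT for two-label classification
# with Hugging Face transformers + PyTorch. Toy data for illustration only.
import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

texts = ["the movie was great", "the plot made no sense"]   # toy examples (assumed)
labels = torch.tensor([1, 0])                               # 1 = positive, 0 = negative

inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):                      # a few passes over the toy batch
    optimizer.zero_grad()
    outputs = model(**inputs, labels=labels)
    outputs.loss.backward()             # cross-entropy loss from the classification head
    optimizer.step()
```

In practice, the same loop would run over a real task dataset (e.g., a GLUE task) with proper batching, evaluation, and early stopping.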
Effect and Impact:
- Using deep neural models like BERT for NLP tasks is becoming the standard way to get decent performance in the shortest amount of time. As an organization, we should be looking at how to include such models in our product platforms, because most of our products are knowledge-based solutions that rely heavily on NLP-based tasks (e.g., Question Answering, Summarization, Text Classification).
- The downside to using deep neural models for NLP tasks is that working with them is still largely a trial-and-error process. The ACL community is well aware of this dilemma, which is why the BERT bandwagon we see at ACL is accompanied by studies aimed at understanding deep neural models for NLP tasks, as discussed below.
- The number of submitted papers using BERT was overwhelming and clearly shows its influence on current NLP research.
Deep Neural Network NLP models need to be understood.
Deep neural network models have been pushing state-of-the-art performance on several NLP tasks. The tutorial "Interpretability and Analysis in Neural NLP," as well as several papers, looks into how these deep neural networks work:
"Showing your work doesn't always work"
"Contextual Embeddings: When are they worth it?"
"Understanding Attention for Text Classification"
Effect and Impact:
- The ACL community is doing studies to understand deep neural NLP models through probing, testing, or measuring mutual information. The interest in understanding neural NLP models is giving rise to visualization tools such as LSTMVis, Captum, and AllenNLP Interpret, and is building up a toolbox of interpretability methods for deep neural NLP models (a minimal probing sketch follows this list).
- As of now, the studies on understanding neural NLP models give us practical insights into:
- how neural networks acquire syntactic knowledge
- how neural networks aid the scientific study of language
- how to build better systems
- accountability
- the user's right to an explanation
- Several other papers at ACL 2020 also explore the inner workings of deep neural NLP models.
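As a concrete illustration of the probing idea mentioned above, here is a minimal sketch: freeze a pre-trained encoder, extract its hidden states, and train a simple linear classifier to predict a linguistic property. It assumes Hugging Face `transformers`, PyTorch, and scikit-learn; the toy sentences and the property being probed are made-up examples, not taken from any specific ACL paper.

```python
# Minimal probing sketch (assumed setup): freeze BERT, take the [CLS] vector,
# and fit a linear probe for a toy linguistic property (is the subject plural?).
import torch
from sklearn.linear_model import LogisticRegression
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")
encoder.eval()

sentences = ["the dog sleeps", "the dogs sleep", "a cat runs", "the cats run"]
is_plural_subject = [0, 1, 0, 1]   # toy labels for illustration

with torch.no_grad():
    batch = tokenizer(sentences, padding=True, return_tensors="pt")
    # [CLS] vector of the last layer as a fixed sentence representation
    features = encoder(**batch).last_hidden_state[:, 0, :].numpy()

probe = LogisticRegression(max_iter=1000).fit(features, is_plural_subject)
print(probe.score(features, is_plural_subject))   # probe accuracy (training data here)
```

If a simple probe like this predicts the property well on held-out data, the frozen representation likely encodes it; real probing studies control for probe capacity and use proper train/test splits.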
Question Answering is being explored using transformer models, data augmentation, distant supervision, and new approaches for retrieval.
With BERT achieving higher accuracy on SQuAD (Stanford Question Answering Dataset) through the power of contextual embeddings, transformer models are being explored for Question Answering.
Here are some of the ACL 2020 papers that used BERT-based models for Question Answering (a minimal extractive QA sketch follows the list):
- Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-based Question Answering
- Contextualized Sparse Representations for Real-Time Open-Domain Question Answering
- Span Selection Pre-training for Question Answering
- DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering
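For reference, here is a minimal sketch of extractive Question Answering with a transformer fine-tuned on SQuAD, using the Hugging Face `pipeline` API. The checkpoint is a publicly available SQuAD model chosen purely for illustration; the papers above each use their own architectures and training setups.

```python
# Minimal extractive QA sketch (assumed setup): a publicly available
# SQuAD-fine-tuned checkpoint behind the Hugging Face pipeline API.
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

context = (
    "ACL 2020 was held online from July 5 to July 8, 2020. "
    "Many papers built on BERT for question answering."
)
result = qa(question="When was ACL 2020 held?", context=context)
print(result)   # a span copied from the context, plus a confidence score
```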
Effect and Impact:
- We can see the impact on Question Answering of large models (e.g., BERT) pre-trained on large corpora (e.g., Wikipedia), since such pre-training captures knowledge from the corpus.
- Transformer models (e.g., BERT) for Question Answering can be combined with new unsupervised retrieval approaches (a retrieve-then-read sketch follows this list).
- Augmenting training data and using distant supervision can also improve Question Answering.
- Transformer models are good at extracting answers, but the answers to some questions live in other sources, such as knowledge graphs and sometimes images.
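Here is a minimal retrieve-then-read sketch of that combination: a simple unsupervised retriever (plain TF-IDF here, as a stand-in for the learned sparse or dense retrievers in the cited papers) selects a passage, and a SQuAD-fine-tuned reader extracts the answer span. Everything in it is illustrative rather than taken from a specific paper.

```python
# Minimal retrieve-then-read sketch (assumed setup): TF-IDF retrieval as a
# stand-in for learned retrievers, followed by a SQuAD-fine-tuned reader.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from transformers import pipeline

passages = [
    "BERT is a bidirectional transformer pre-trained on large text corpora.",
    "SQuAD is a reading-comprehension dataset built from Wikipedia articles.",
    "ACL 2020 was held online from July 5 to July 8, 2020.",
]
question = "What is SQuAD built from?"

# 1) Retrieve: rank passages by TF-IDF cosine similarity to the question.
vectorizer = TfidfVectorizer().fit(passages + [question])
scores = cosine_similarity(vectorizer.transform([question]), vectorizer.transform(passages))
best_passage = passages[scores.argmax()]

# 2) Read: extract the answer span from the retrieved passage.
reader = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
print(reader(question=question, context=best_passage))
```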
Conclusion
Deep neural networks (e.g., BERT) for NLP tasks are here to stay. We should invest in understanding them, because the better we understand these models, the less we have to rely on trial and error. If we use deep neural networks in our systems, a better understanding of them can help us build systems that are more effective, accountable, and interpretable. Deep neural network models are also shaping the research direction of NLP tasks; Question Answering, for example, is being explored with such models.