I worked on several NLP tasks as a Machine Learning Engineer at 42Maru, with a special focus on text summarization.
42Maru is a Korean start-up based in Seoul. They provide QA (Question Answering) solutions based on deep learning.
When I joined the company, my first assignments focused on research across several NLP tasks:
English paraphrasing: I learned and worked with Keras, and implemented a Pervasive Attention neural network.
Document Similarity: I started working on this project right after BERT came out, so I studied the architecture extensively. I used BERT as a service and implemented a Siamese network on top of it, using Keras. This architecture led to great results on the SICK and STS-B datasets.
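To give an idea of the setup (a minimal sketch, not the exact 42Maru architecture): two precomputed BERT sentence embeddings, such as the 768-dimensional vectors served by bert-as-service, go through a shared dense encoder in Keras, and the cosine similarity of the two encodings is regressed against the gold similarity score. Layer sizes and the loss are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import Model, layers

EMB_DIM = 768  # assumed embedding size (bert-as-service default)

def build_siamese(emb_dim: int = EMB_DIM) -> Model:
    # Shared encoder applied to both sentence embeddings (weights are tied)
    shared = tf.keras.Sequential([
        layers.Dense(256, activation="relu"),
        layers.Dense(128, activation="relu"),
    ])

    in_a = layers.Input(shape=(emb_dim,), name="sentence_a")
    in_b = layers.Input(shape=(emb_dim,), name="sentence_b")
    enc_a, enc_b = shared(in_a), shared(in_b)

    # Cosine similarity of the two encodings, rescaled from [-1, 1] to [0, 1]
    cosine = layers.Dot(axes=1, normalize=True)([enc_a, enc_b])
    score = layers.Lambda(lambda x: (x + 1.0) / 2.0)(cosine)

    model = Model(inputs=[in_a, in_b], outputs=score)
    model.compile(optimizer="adam", loss="mse")  # regress the similarity score
    return model
```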
Question Generation: In order to improve the MRC model through data augmentation, a coworker and I implemented a basic attentional encoder-decoder architecture (in PyTorch) for question generation. We then improved the performance by using BERT as the encoder (and an LSTM as the decoder) and achieved SOTA on the SQuAD dataset for QG (a sketch of this encoder-decoder idea follows the results table):
|  | Previous SOTA | Our Model |
| --- | --- | --- |
| BLEU 1 | 45.07 | 49.85 |
| BLEU 2 | 29.58 | 36.79 |
| BLEU 3 | 21.60 | 29.36 |
| BLEU 4 | 16.38 | 23.98 |
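As a rough illustration of the question-generation architecture described above (a BERT encoder feeding an attentional LSTM decoder), here is a minimal PyTorch sketch. The checkpoint name, hidden sizes, and the dot-product attention variant are assumptions, not the exact implementation.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class QuestionGenerator(nn.Module):
    def __init__(self, vocab_size: int, hidden: int = 768):
        super().__init__()
        self.encoder = BertModel.from_pretrained("bert-base-uncased")
        self.embed = nn.Embedding(vocab_size, hidden)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden * 2, vocab_size)

    def forward(self, src_ids, src_mask, tgt_ids):
        # Contextual token representations of the source passage
        enc = self.encoder(input_ids=src_ids, attention_mask=src_mask).last_hidden_state

        # LSTM decoder over the (shifted) target question tokens
        dec, _ = self.decoder(self.embed(tgt_ids))        # (B, T, H)

        # Dot-product attention of each decoder step over the encoder states
        scores = torch.bmm(dec, enc.transpose(1, 2))      # (B, T, S)
        scores = scores.masked_fill(src_mask[:, None, :] == 0, -1e9)
        context = torch.bmm(torch.softmax(scores, dim=-1), enc)

        # Combine decoder state and attended context to predict the next token
        return self.out(torch.cat([dec, context], dim=-1))
```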
<aside> 🎯 Skills
Keras
BERT
Transformer
NLP
Pytorch
Tensorflow
Flask
</aside>
Machine Reading Comprehension: After trying to improve the score of the company's previous model (built with TensorFlow), we implemented two APIs (English and Korean) to make these MRC models available through a website (using Flask).
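A minimal sketch of what such a Flask endpoint can look like; the route, payload fields, and `answer_question` helper are hypothetical placeholders, not the actual 42Maru API.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def answer_question(question: str, context: str) -> str:
    """Placeholder for the loaded MRC model's inference call."""
    raise NotImplementedError

@app.route("/mrc/en", methods=["POST"])
def mrc_en():
    payload = request.get_json()
    answer = answer_question(payload["question"], payload["context"])
    return jsonify({"answer": answer})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```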
Text Summarization was my main task while working at 42Maru.
I developed several models beating the SOTA in English text summarization, and built a production-ready, controllable demonstration.
After learning the basics of text summarization, I used PreSumm as a starting point (because it was the SOTA at the time) and integrated XLNet, a newer model. I was able to improve the SOTA on the CNN/DM dataset in both extractive and abstractive summarization (a sketch of the extractive scoring idea follows the tables):
| Extractive | PreSumm (Previous SOTA) | Ours |
| --- | --- | --- |
| ROUGE 1 | 43.23 | 43.73 |
| ROUGE 2 | 20.24 | 20.50 |
| ROUGE L | 39.63 | 40.08 |
| Abstractive | PreSumm (Previous SOTA) | Ours |
| --- | --- | --- |
| ROUGE 1 | 42.13 | 42.60 |
| ROUGE 2 | 19.60 | 20.16 |
| ROUGE L | 39.18 | 39.67 |
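For context on the extractive side: PreSumm-style extractive models build one vector per sentence (from a [CLS]-like token inserted before each sentence) and train a small classifier to decide whether each sentence belongs in the summary; the work here swapped XLNet in as the encoder inside the PreSumm code base. Below is a minimal sketch of such a scoring head, with the hidden size and selection rule as assumptions.

```python
import torch
import torch.nn as nn

class ExtractiveHead(nn.Module):
    """Scores each sentence of a document for inclusion in the summary."""

    def __init__(self, hidden: int = 768):
        super().__init__()
        self.scorer = nn.Linear(hidden, 1)

    def forward(self, sent_vecs: torch.Tensor) -> torch.Tensor:
        # sent_vecs: (batch, n_sentences, hidden) per-sentence encoder vectors
        return torch.sigmoid(self.scorer(sent_vecs)).squeeze(-1)

def select_summary(sentences, scores, k: int = 3):
    # Keep the k highest-scoring sentences, in their original document order
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    return [sentences[i] for i in sorted(top)]
```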
<aside> 🎯 Skills
Pytorch
BART
Transformers
Fairseq
Beam search
Tensorflow 2
</aside>
In addition to better scores, my contributions made it possible to generate summaries from documents of any length, whereas the original PreSumm code limits the length of the input document.
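The write-up does not detail how this length limitation was lifted, so the following is only a labelled assumption showing one generic workaround: split the document into overlapping, encoder-sized chunks, summarize each chunk, and join the partial summaries. The `summarize_chunk` callable is a hypothetical stand-in for a call into the summarization model.

```python
from typing import Callable, List

def chunk_tokens(tokens: List[str], max_len: int = 512, stride: int = 384) -> List[List[str]]:
    # Overlapping windows so content cut at a chunk boundary still appears in full once
    return [tokens[i:i + max_len] for i in range(0, max(1, len(tokens) - max_len + stride), stride)]

def summarize_long(tokens: List[str], summarize_chunk: Callable[[List[str]], str]) -> str:
    # `summarize_chunk` is a hypothetical stand-in for the per-chunk model call
    return " ".join(summarize_chunk(chunk) for chunk in chunk_tokens(tokens))
```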
Later, new models (BART, PEGASUS, etc.) were released, beating the SOTA by a large margin. I studied these models and their frameworks (transformers, fairseq).
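As a concrete example of what working with these models looks like, here is a small sketch of running the released BART summarization checkpoint through the transformers library with beam search; the generation settings are common defaults, not tuned values from this work.

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tok = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

def summarize(document: str) -> str:
    inputs = tok(document, truncation=True, max_length=1024, return_tensors="pt")
    summary_ids = model.generate(
        inputs["input_ids"],
        num_beams=4,             # beam search decoding
        max_length=142,          # CNN/DM-style summary length
        no_repeat_ngram_size=3,  # avoid repeated trigrams in the output
    )
    return tok.decode(summary_ids[0], skip_special_tokens=True)
```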
I was able to improve the BART model by adding a mechanism that lets us control the abstractiveness of the generated summaries, without even re-training the model! This mechanism also gave better results:
|  | BART (Previous SOTA) | Ours |
| --- | --- | --- |
| ROUGE 1 | 44.16 | 44.86 |
| ROUGE 2 | 21.28 | 21.60 |
| ROUGE L | 40.90 | 41.77 |
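The actual control mechanism is not described here, so the sketch below is only a generic illustration of decode-time control: it pushes a pretrained BART toward more abstractive output, without any retraining, by banning trigrams copied verbatim from the source through `generate`'s `bad_words_ids` argument. This is an assumption for illustration, not the mechanism used at 42Maru.

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tok = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

def source_trigrams(ids):
    # Every consecutive 3-token sequence appearing in the source document
    return [ids[i:i + 3] for i in range(len(ids) - 2)]

def more_abstractive_summary(document: str) -> str:
    enc = tok(document, truncation=True, max_length=1024, return_tensors="pt")
    banned = source_trigrams(enc["input_ids"][0].tolist())  # illustrative copy penalty
    summary_ids = model.generate(
        enc["input_ids"],
        num_beams=4,
        max_length=142,
        bad_words_ids=banned,  # forbid generating source trigrams verbatim
    )
    return tok.decode(summary_ids[0], skip_special_tokens=True)
```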
But BART is a big model, and the GPUs available at the company weren't big enough to train it.