Multi-Document Text Summarization using Distilled Transformers
Abstract - As the volume of unstructured data grows, the need to extract meaningful and precise insights from it has become critical. Text data is often stored in different file formats, including PDF, DOCX, and images. Advanced abstractive summarization techniques based on Transformers can reduce the time and effort spent extracting useful insights from lengthy and varied documents. In our approach, we use distilled Transformers to solve this problem faster and more effectively. Distillation is a compression technique in which a small model is trained to mimic the behavior of a larger model. This yields better performance than existing Transformer models, with the additional benefit of models that are lightweight, responsive, and energy efficient. We tested our hypothesis with distilBART and distilPEGASUS and obtained promising results on metrics such as ROUGE scores.
Keywords - Document Summarization, Distil Transformers, Abstractive Summarization, distilBART, distilPEGASUS, NLP.
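
As an illustration of the distillation idea mentioned in the abstract (a small model trained to mimic a larger one), the following is a minimal sketch of the generic soft-target distillation loss: the KL divergence between temperature-softened teacher and student output distributions. It is not necessarily the procedure used to produce distilBART or distilPEGASUS; the function names, the example logits, and the temperature value are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions.

    The T^2 factor is the usual rescaling that keeps gradient magnitudes
    comparable across temperatures (an assumption of this sketch).
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )
    return kl * temperature ** 2


# Hypothetical logits over a 4-token vocabulary for one decoding step.
teacher = [4.0, 1.0, 0.5, -2.0]
student = [2.5, 1.5, 0.0, -1.0]

loss_mismatch = distillation_loss(student, teacher)
loss_match = distillation_loss(teacher, teacher)
print(f"student vs. teacher: {loss_mismatch:.4f}")
print(f"teacher vs. itself:  {loss_match:.4f}")
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge. In practice this term is typically combined with the ordinary cross-entropy loss against reference summaries.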