Paper Title
Modified N-Gram based Model for Identifying and Filtering Near-Duplicate Documents Detection

Abstract
During last three decades World Wide Web (WWW) has expanded exponentially. A great deal of the web is full of duplicate or near-duplicate content. Documents that are served on the web are in different formats like PDF, HTML, excel and text. Our proposed solution is created on a publicly available dataset files. The dataset consists of files which are tagged as duplicate. Our work in this paper is based on the duplicate and near duplicate document detection using n-Gram based, a low-dimensional demonstration(LSI-SVD) approach, implemented in c#.net. Keywords - Duplicate document, N-gram, SVD (Singular Value Decomposition), LSI(Latent Semantic Indexing), Cosine similarity etc.