IJACEN Modified N-Gram based Model for Identifying and Filtering Near-Duplicate Documents Detection

Journal Paper

Paper Title : Modified N-Gram based Model for Identifying and Filtering Near-Duplicate Documents Detection

Author : Farheen Naaz, Farheen Siddiqui

Article Citation : Farheen Naaz, Farheen Siddiqui (2017), "Modified N-Gram based Model for Identifying and Filtering Near-Duplicate Documents Detection", International Journal of Advance Computational Engineering and Networking (IJACEN), pp. 55-59, Volume-5, Issue-10

Abstract : Over the last three decades, the World Wide Web (WWW) has expanded exponentially. A great deal of the web consists of duplicate or near-duplicate content. Documents served on the web come in different formats such as PDF, HTML, Excel, and plain text. Our proposed solution is built on a publicly available dataset whose files are tagged as duplicates. This paper addresses duplicate and near-duplicate document detection using an n-gram based, low-dimensional representation (LSI-SVD) approach, implemented in C#.NET.

Keywords : Duplicate document, N-gram, SVD (Singular Value Decomposition), LSI (Latent Semantic Indexing), Cosine similarity.
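
The n-gram and cosine-similarity idea named in the abstract can be illustrated with a minimal sketch. This is not the authors' C#.NET implementation, and it leaves out the LSI-SVD dimensionality-reduction step entirely. The function names (`word_ngrams`, `cosine_similarity`, `is_near_duplicate`), the choice of word trigrams, and the 0.8 threshold are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def word_ngrams(text, n=3):
    """Count the word n-grams (as tuples) of a text. Hypothetical helper."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a text shorter than n words has no n-grams
    return dot / (norm_a * norm_b)


def is_near_duplicate(doc_a, doc_b, n=3, threshold=0.8):
    """Flag a pair as near-duplicate; the threshold is an assumed value."""
    sim = cosine_similarity(word_ngrams(doc_a, n), word_ngrams(doc_b, n))
    return sim >= threshold


doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "the quick brown fox jumps over the lazy cat"
print(cosine_similarity(word_ngrams(doc_a), word_ngrams(doc_b)))
print(is_near_duplicate(doc_a, doc_b))
```

In an LSI-style pipeline, the n-gram count vectors would typically be projected into a lower-dimensional space with a truncated SVD before the cosine comparison; that step is omitted here to keep the sketch dependency-free.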

Type : Research paper

Published : Volume-5, Issue-10


Published on 2017-12-30

