International Journal of Advance Computational Engineering and Networking (IJACEN)

Paper Title
Fast Data Clustering and Outlier Detection using K-Means Clustering on Apache Spark

Abstract
The components of the information society are now present in every area of our lives. As computers have become central to daily life, the information being collected has grown more meaningful and more specific. Both the volume of information and the speed of access to it have increased. Big data is the result of transforming data gathered from different sources, such as social media posts, blogs, photos, videos, and log files, into a meaningful and workable form. Clustering big data with machine learning methods is very useful: it groups highly similar records together by partitioning the data into distinct groups. Once a dataset has been partitioned, outlier detection is used to find fraudulent data. Because clustering big data can be time consuming, this study uses the Apache Spark fast cluster computing architecture with the K-means clustering method to make data clustering and outlier detection faster. The aim is a clustering process that is fault tolerant, reliable, consistent, and fast. MLlib, the machine learning library among Spark's components, has a relatively small code size and is easy to use; its goal is to make practical machine learning scalable and useful. The K-means method in MLlib, which is used in this study, analyzes big data successfully. The results on a sample dataset are presented in tables and graphs.
Index Terms: Apache Spark, Big Data, K-means Clustering, Outlier Detection.
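
The abstract describes a two-stage process: partition the data with K-means, then look for outliers within the partitioned data. The following is a minimal single-machine sketch of that idea in plain Python, not the authors' Spark implementation. In Spark the clustering step would be delegated to MLlib's K-means rather than written by hand. The function names, the sample points, and the outlier rule (flag a point whose distance to its centroid exceeds the mean distance plus two standard deviations) are all assumptions made for illustration; the abstract does not say how outliers are scored.

```python
import math
import statistics


def kmeans(points, initial_centroids, iterations=10):
    """Lloyd's k-means on a list of equal-length tuples.

    Returns (centroids, assignments), where assignments[i] is the index
    of the centroid nearest to points[i].
    """
    centroids = list(initial_centroids)

    def nearest(p):
        # Index of the centroid closest to p (Euclidean distance).
        return min(range(len(centroids)),
                   key=lambda c: math.dist(p, centroids[c]))

    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [nearest(p) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))

    # Final assignment against the final centroids.
    return centroids, [nearest(p) for p in points]


def find_outliers(points, centroids, assignments, num_std=2.0):
    """Flag points unusually far from their own centroid.

    Assumed rule (not from the paper): distance > mean + num_std * std,
    computed over all point-to-centroid distances.
    """
    distances = [math.dist(p, centroids[a])
                 for p, a in zip(points, assignments)]
    cutoff = statistics.mean(distances) + num_std * statistics.pstdev(distances)
    return [p for p, d in zip(points, distances) if d > cutoff]


# Made-up sample data: two tight groups plus one stray point.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (1.1, 1.2), (0.9, 0.9),
    (8.0, 8.0), (8.2, 7.9), (7.9, 8.1), (8.1, 8.2), (7.8, 7.8),
    (8.0, 14.0),
]
centroids, assignments = kmeans(points, [(1.0, 1.0), (8.0, 8.0)])
outliers = find_outliers(points, centroids, assignments)
print(outliers)  # [(8.0, 14.0)]
```

The initial centroids are passed in explicitly so that the run is deterministic. A distributed implementation would keep the same assignment and update steps but compute them in parallel over partitions of the data, which is the source of the speedup the abstract targets.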


Author - Yadigar Erdem, Caner Ozcan

Published on 2017-09-08
   
   