A Fuzzy Approach for Discovery of Web Usage Patterns from Web Log Data
Web server access logs contain substantial data about users' accesses to a Web site. To extract information about user
preferences from these logs, Web Usage Mining (WUM) is performed. WUM comprises three main steps:
preprocessing, knowledge extraction and results analysis. During the preprocessing stage, raw web log data is transformed
into a set of user profiles. Each user profile captures a set of URLs representing a user session. Clustering can be applied to
this sessionized data in order to capture similar interests and trends among users’ navigational patterns. Since the sessionized
data may contain thousands of user sessions and each user session may consist of hundreds of URL accesses, dimensionality
reduction is achieved by eliminating the low support URLs. However, directly eliminating low support URLs and small-sized
sessions may result in the loss of a significant amount of information, especially when the count of low support URLs and small
sessions is large. We propose a fuzzy solution to deal with this problem by assigning weights to URLs and user sessions
based on a fuzzy membership function. After assigning the weights, we apply the fuzzy c-means clustering algorithm to
discover the clusters of user profiles. Our results show that fuzzy feature evaluation and dimensionality reduction result in
better performance and validity indices for the discovered clusters.
Keywords - Web usage mining; fuzzy c-means clustering; feature evaluation; dimensionality reduction.
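
As a rough illustration of the clustering step named in the abstract, the following is a minimal sketch of the standard fuzzy c-means update rules applied to weighted session vectors. It is not the authors' implementation: the function name `fuzzy_c_means`, the fuzzifier `m = 2`, the stopping tolerance, and the four sample session vectors (one weight per URL) are all assumptions made for the example, and the fuzzy membership function used to produce such weights is not specified here.

```python
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means on a list of equal-length numeric vectors.

    Returns (centers, memberships); memberships[i][j] is the degree to
    which point i belongs to cluster j, and each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial membership matrix, normalised so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(iters):
        # Each center is the mean of all points, weighted by membership^m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )

        # Recompute memberships from Euclidean distances to the centers.
        shift = 0.0
        for i in range(n):
            dist = [
                max(sum((points[i][d] - centers[j][d]) ** 2
                        for d in range(dim)) ** 0.5, 1e-12)
                for j in range(c)
            ]
            new_row = [
                1.0 / sum((dist[j] / dist[k]) ** (2.0 / (m - 1.0))
                          for k in range(c))
                for j in range(c)
            ]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        # Stop once no membership value changed by more than tol.
        if shift < tol:
            break
    return centers, u


# Hypothetical weighted session vectors: one row per session, one column
# per URL, values in [0, 1] standing in for fuzzy URL weights.
sessions = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.2],
    [0.0, 0.0, 1.0, 1.0],
    [0.1, 0.0, 1.0, 1.0],
]

centers, memberships = fuzzy_c_means(sessions, c=2)

# Hard label per session: the cluster with the highest membership.
labels = [max(range(len(row)), key=row.__getitem__) for row in memberships]
print(labels)
```

Unlike hard clustering, each session keeps a graded membership in every cluster, so a session that mixes two navigational interests is not forced entirely into one profile; the hard labels above are derived only for inspection.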