CLASSIFICATION SYSTEM OF LIBRARY BOOK BASED ON SIMILARITY OF THE BOOK TITLE USING K-MEANS METHOD (CASE STUDY LIBRARY OF BHAYANGKARA SURABAYA)

At present, books in the library of Universitas Bhayangkara Surabaya are grouped by title and by existing subject field. As a result, some books are shelved in a field that does not match their title. To make grouping easier, this research groups books by the similarity of their titles using the K-Means method with a dissimilarity distance. The data consist of 500 book titles from the library of Universitas Bhayangkara Surabaya. Each title is first pre-processed with Information Retrieval System techniques to obtain its base words, and these base words are then used as features in the grouping process so that the similarity between titles can be measured. The research concludes that the Library Book Grouping System Based on Similarity of Book Title Using K-Means Method (Case Study of Bhayangkara Library Surabaya) is suitable for data in which each title is specific. Across the test runs, some clusters consistently place books together according to their similarity. Of all the test results, the best silhouette value was obtained with K = 7 in the first run, with a silhouette value of 0.2221.


INTRODUCTION
A library is a place for learning through reading, where books are provided and arranged according to the subject field of their titles. The task of a library is to develop its book collection, manage and care for library materials, provide services, and carry out library administration. One example is grouping the books on each shelf neatly and appropriately so that visitors can find books easily. A well-managed library can serve as a comfortable place for visitors to read and study.

Background
At present, books in the library of Universitas Bhayangkara Surabaya are still grouped by existing subject field. As a result, some books are placed alongside titles they do not match, which can make it difficult for visitors to find the book they are looking for. A previous researcher, Krisna Dwi Ananta, built a book grouping system for the UBHARA library in a Final Project using the K-Means method with Cosine Similarity distance. However, the results of that system are not yet relevant or suitable enough to be applied, because the only attributes selected were the page count and thickness of each book. This can cause books of different genres to be mixed in one cluster.
Therefore, this final project conducts research entitled "Classification System of Library Book Based on Similarity of The Book Title Using K-Means Method (Case Study of Bhayangkara Library Surabaya)". In this research, each book title is pre-processed first, and the result is used as a feature so that the similarity between titles can be determined. This research is expected to produce a system that groups books by the similarity of their titles, making it easier for library visitors to find the book they want.

Formulation of the problem
Some of the main issues related to the research are as follows:
1) How should each book title be pre-processed with the Information Retrieval System to obtain title similarity, so that the results can be used as features in the grouping process?
2) How can a library book grouping system based on the similarity of book titles be built using the K-Means method with Dissimilarity distance measurement?
3) How can cluster validity, or grouping accuracy, be measured for each grouping result?

Limitations of the problem
The limitations of the problem in this study are as follows:
1) The grouping process uses the K-Means method with Dissimilarity distance measurement.
2) The features used for grouping are the base words obtained by pre-processing the book titles with Information Retrieval System techniques.
3) The data grouped are only book titles in the Indonesian language.
4) The data are taken only from book titles in the UBHARA library.

Research Purposes
This research has the following objectives:
1) To create an application that groups books in the UBHARA library based on the similarity of their titles.
2) To analyze whether the Information Retrieval System is appropriate as a pre-processing step for obtaining the base words that serve as features in K-Means grouping with Dissimilarity distance measurement.

THEORETICAL BASIS
The theoretical basis contains the theories that support the research and the development of the system.

Data Mining
Tan (2006) defines data mining as a process for obtaining useful information from large data warehouses. Data mining can also be interpreted as extracting new information from large volumes of data to aid decision making. The term data mining is sometimes used interchangeably with knowledge discovery. Techniques often mentioned in the data mining literature include clustering, classification, association rule mining, neural networks, genetic algorithms, and others. What distinguishes data mining is the development of these techniques for application to large-scale databases. Before data mining became popular, these techniques could only be used on small-scale data.

K-Means Algorithm
K-Means is a non-hierarchical (partitioning) data grouping method that seeks to partition existing data into two or more groups. The method partitions the data so that data with the same characteristics are placed in the same group and data with different characteristics are placed in other groups. The description of K-Means in this Final Project follows the book Data Mining Concept and Application Using Matlab (Eko Prasetyo). The purpose of this grouping is to minimize the objective function set in the grouping process, which generally attempts to minimize variation within a group and maximize variation between groups.
Grouping data with the K-Means method generally follows this algorithm:
1. Determine the number of groups.
2. Allocate the data into groups at random.
3. Calculate the center (centroid / average) of the data in each group.
4. Allocate each data point to the nearest centroid / average.
5. Return to step 3 if any data still move between groups, if the change in centroid values is above the specified threshold, or if the change in the value of the objective function is still above the specified threshold.
In step 3, the centroid of each group is recalculated as the mean of all data values on each feature. If M denotes the number of data points in a group, i denotes the i-th feature in a group, and p denotes the data dimension, the centroid of the i-th feature is computed with the formula below. The formula is applied over all p dimensions, so i runs from 1 to p.

$$C_i = \frac{1}{M} \sum_{j=1}^{M} x_{ji}$$

Here $x_{ji}$ is the value of feature i of the j-th data point in the group.
In step 4, the data are reallocated to groups in the K-Means method by comparing the distance between each data point and the centroid of each group. Each data point is reallocated strictly to the one group whose centroid is closest to it. This allocation can be formulated as follows (MacQueen, 1967).

$$a_{il} = \begin{cases} 1, & d = \min\{D(x_i, C_l)\} \\ 0, & \text{otherwise} \end{cases}$$

$a_{il}$ is the membership value of point $x_i$ to the center of group $C_l$, $d$ is the shortest distance from data point $x_i$ to the K groups after being compared, and $C_l$ is the $l$-th centroid.
The objective function used for K-Means is determined by the distance and the membership value of the data in each group. The objective function used is as follows (MacQueen, 1967).

$$J = \sum_{i=1}^{N} \sum_{l=1}^{K} a_{il} \, D(x_i, C_l)^2$$

N is the number of data points, K is the number of groups, $a_{il}$ is the membership value of data point $x_i$ to the center of group $C_l$, $C_l$ is the center of the $l$-th group, and $D(x_i, C_l)$ is the distance from point $x_i$ to group $C_l$. $a_{il}$ has a value of 0 or 1: if a data point is a member of a group, $a_{il} = 1$; otherwise, $a_{il} = 0$.
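The K-Means steps described in this section can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the study: the function names (`k_means`, `centroid`, `euclidean`), the shuffled round-robin initial allocation, the handling of empty groups, and the default Euclidean distance are all assumptions made for the example. A different distance, such as a dissimilarity measure, can be passed through the `distance` parameter. The objective is computed here as the sum of squared distances from each point to its assigned centroid.

```python
import math
import random


def euclidean(x, c):
    """Straight-line distance between a data point and a centroid."""
    return math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, c)))


def centroid(points):
    """Step 3: per-feature mean of the M points in one group."""
    m = len(points)
    return [sum(p[i] for p in points) / m for i in range(len(points[0]))]


def k_means(data, k, distance=euclidean, max_iter=100, seed=0):
    """Return (labels, centroids, objective) for data given as lists of numbers."""
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and the number of data points")
    rng = random.Random(seed)
    # Steps 1-2: fix K and allocate the data to groups at random.
    # Shuffled round-robin labels (an assumption) keep every group non-empty.
    labels = [i % k for i in range(len(data))]
    rng.shuffle(labels)
    centroids = []
    for _ in range(max_iter):
        # Step 3: recompute the centroid of every group.
        new_centroids = []
        for l in range(k):
            members = [x for x, lab in zip(data, labels) if lab == l]
            # An empty group keeps its previous centroid (an assumption).
            new_centroids.append(centroid(members) if members else centroids[l])
        centroids = new_centroids
        # Step 4: allocate each point to the group with the nearest centroid.
        new_labels = [
            min(range(k), key=lambda l: distance(x, centroids[l])) for x in data
        ]
        # Step 5: stop when no data point changes group.
        if new_labels == labels:
            break
        labels = new_labels
    # Membership is 1 only for the assigned group, so only that distance counts.
    objective = sum(distance(x, centroids[lab]) ** 2 for x, lab in zip(data, labels))
    return labels, centroids, objective
```

Calling `k_means(data, 2)` on a list of feature vectors returns one group label per data point, the final centroids, and the objective value. To keep the sketch short, only the "no data moves group" stopping condition from step 5 is checked.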

Distance Measurement

Cosine Similarity
In this method, the term weights of each document are represented as a vector. For example, document a is represented by vector $a = \{a_1, a_2, a_3, \dots, a_n\}$ and document b by vector $b = \{b_1, b_2, b_3, \dots, b_n\}$. The correlation between the two documents can be quantified by the cosine of the angle between the two vectors, as in the equation below:

$$\mathrm{sim}(a, b) = \frac{a \cdot b}{|a| \, |b|}$$

where $|a|$ and $|b|$ are the norms of a and b. Because term weights are non-negative, the value of sim(a, b) ranges from 0 to 1, and the higher the value, the greater the similarity of the two vectors.
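The cosine computation can be illustrated with a short Python sketch. It assumes that raw term counts serve as the term weights, which the text does not specify, and the helper names `term_vectors` and `cosine_similarity` are hypothetical.

```python
import math
from collections import Counter


def term_vectors(tokens_a, tokens_b):
    """Count each term over a shared vocabulary so both vectors have equal length."""
    vocabulary = sorted(set(tokens_a) | set(tokens_b))
    count_a, count_b = Counter(tokens_a), Counter(tokens_b)
    return [count_a[t] for t in vocabulary], [count_b[t] for t in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: a title with no terms shares nothing with any other title.
        return 0.0
    return dot / (norm_a * norm_b)
```

For two titles given as lists of base words, `term_vectors` builds the two weight vectors and `cosine_similarity` returns a value near 1 for titles that share most of their terms and 0 for titles with no terms in common.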

Dissimilarity
Dissimilarity is a numerical measure of the degree to which two objects differ; its range is 0 to 1, or even up to ∞. This distance is used in the K-Means calculations in equation (1) and equation (3). If similarity measures how alike two objects are, then dissimilarity measures how unlike they are, and if the interval is [0,1], dissimilarity can be formulated as follows.

$$\mathrm{dissimilarity}(a, b) = 1 - \mathrm{sim}(a, b)$$
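As a worked illustration, assume the common definition of dissimilarity as one minus the cosine similarity on the interval [0,1], and take two hypothetical term vectors that are not from the study's data:

```latex
\begin{align*}
a &= (1, 1, 0), \qquad b = (1, 1, 1) \\
a \cdot b &= 1 \cdot 1 + 1 \cdot 1 + 0 \cdot 1 = 2 \\
|a| &= \sqrt{2}, \qquad |b| = \sqrt{3} \\
\mathrm{sim}(a, b) &= \frac{a \cdot b}{|a|\,|b|} = \frac{2}{\sqrt{6}} \approx 0.816 \\
\mathrm{dissimilarity}(a, b) &= 1 - \mathrm{sim}(a, b) \approx 0.184
\end{align*}
```

The two vectors share two of three terms, so their similarity is high and their dissimilarity, the distance used for grouping, is correspondingly small.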

System Planning
System planning provides an overview of the design of the system to be built or developed and helps in understanding the flow of information and processes in the system.

Image 1. DFD of Preprocessing
Image 1 shows the DFD level for Preprocessing using the Information Retrieval System, which proceeds as follows:
1. Token process: retrieve every word and delete unimportant characters.
2. Stopword process: delete conjunctions and other words that appear often.
3. Stemming process: return each word to its base word, which is used as a feature in the grouping process.
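The three pre-processing steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the stopword list and the affix lists are hypothetical and deliberately tiny, and the `stem` function is a toy rule-based stemmer, whereas real Indonesian stemming needs a dictionary-based algorithm that this sketch does not attempt.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real system needs a fuller one.
STOPWORDS = {"dan", "di", "ke", "dari", "yang", "untuk", "dengan", "pada"}

# Hypothetical affix lists for the toy stemmer below.
PREFIXES = ("meng", "mem", "men", "me", "ber", "di", "pe")
SUFFIXES = ("kan", "an")


def tokenize(title):
    """Token process: lowercase the title and keep only runs of letters."""
    return re.findall(r"[a-z]+", title.lower())


def remove_stopwords(tokens):
    """Stopword process: drop conjunctions and other very frequent words."""
    return [t for t in tokens if t not in STOPWORDS]


def stem(word):
    """Toy stemmer: strip at most one prefix and one suffix, keeping >= 4 letters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 4:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            word = word[:-len(suffix)]
            break
    return word


def preprocess(title):
    """Run the token, stopword, and stemming processes in order."""
    return [stem(t) for t in remove_stopwords(tokenize(title))]
```

With these toy rules, the words "Membaca" and "Bacaan" both reduce to the base word "baca", which is the kind of shared feature the grouping process relies on.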