An Adaptive Technique to Categorize Indic Language Documents
The significant growth of electronic media for storing and exchanging text documents has led to the use of tools that analyze and categorize documents based on their content. The availability of full-text documents in electronic form emphasizes the need for intelligent information retrieval techniques. In Sri Lanka, most public services rely on text documents written in Sinhala. As a result, there is an essential need for a system that can analyze and process documents in Sinhala. The main techniques examined in this study are data pre-processing and data clustering. The approach makes use of a transformation based on the text frequency, which enhances the clustering performance. This research provides an approach based on Latent Semantic Analysis to process text documents written in Sinhala, and to empower citizens and organizations to do their daily work more easily.
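The pipeline outlined above (pre-processing, a frequency-based transformation, Latent Semantic Analysis, then clustering) can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions: TF-IDF stands in for the frequency-based transformation, k-means stands in for the clustering step, whitespace splitting stands in for Sinhala pre-processing, and the English toy tokens are placeholders for pre-processed Sinhala terms. The function names `tfidf_matrix`, `lsa_embed`, and `kmeans` are hypothetical.

```python
import numpy as np

def tfidf_matrix(docs):
    """Return a documents-by-terms TF-IDF matrix and its vocabulary."""
    # Assumption: whitespace splitting stands in for real Sinhala pre-processing.
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    col = {tok: j for j, tok in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for i, doc in enumerate(tokenized):
        for tok in doc:
            tf[i, col[tok]] += 1.0
    df = (tf > 0).sum(axis=0)            # documents containing each term
    idf = np.log(len(docs) / df) + 1.0   # assumed weighting: rarer terms count more
    return tf * idf, vocab

def lsa_embed(matrix, k):
    """Project documents onto the top-k latent dimensions via truncated SVD."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(1, k):
        dists = np.min(
            [np.linalg.norm(points - c, axis=1) for c in centers], axis=0
        )
        centers.append(points[np.argmax(dists)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        # Assign each point to its nearest center, then recompute the centers.
        gaps = np.linalg.norm(points[:, None] - centers[None], axis=2)
        labels = np.argmin(gaps, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

# Toy corpus: placeholder tokens standing in for pre-processed Sinhala terms.
docs = [
    "tax income tax payment",
    "income tax form payment",
    "school exam student",
    "student school exam result",
]

weighted, vocab = tfidf_matrix(docs)
embedded = lsa_embed(weighted, k=2)
labels = kmeans(embedded, k=2)
print(labels)
```

On this toy corpus the two tax-related documents fall into one cluster and the two school-related documents into the other. With a real Sinhala corpus, the pre-processing step (tokenization, stop-word removal, and any stemming) would need to be language-specific, and the number of latent dimensions and clusters would have to be chosen empirically.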