Machine Learning For Real Estate Contracts Automatic Categorization of Text
Authors: - IJCTC.Mani, J.Jayasudha
Automatic Text Classification is a machine learning task that automatically assigns a given
document to one or more pre-defined categories based on its textual content and extracted features.
Automatic Text Classification has important applications in content management, contextual search,
opinion mining, product review analysis, spam filtering and text sentiment mining. This paper explains
the generic strategy for automatic text classification and analyses existing solutions to major issues
such as dealing with unstructured text, handling a large number of features and selecting a machine
learning technique appropriate to the text-classification application.
There are three kinds of model: statistical, rule-based and hybrid. A statistical model is based on the
training text configured for each category. A rule-based model is based on four kinds of term list:
Positive terms (mandatory terms), Negative terms (excluding terms), Relevant terms and Irrelevant
terms. A hybrid model combines the statistical and rule-based models and tends to give the most
accurate result of the three. In practice, the model is first created as a statistical model; to get a more
exact result, terms are then added during fine-tuning, so that the final model is a hybrid.
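The hybrid approach described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the statistical part is assumed to be a small multinomial Naive Bayes, and the names `NaiveBayes`, `Rules`, `classify`, the `weight` parameter, and the "lease" and "sale" categories are all hypothetical. Positive and Negative terms are treated as hard filters, while Relevant and Irrelevant terms only adjust the statistical score; terms are matched as single words.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def tokenize(text):
    """Lowercase, split on whitespace and strip basic punctuation."""
    stripped = (t.strip(".,;:()").lower() for t in text.split())
    return [t for t in stripped if t]


class NaiveBayes:
    """Assumed statistical model: multinomial Naive Bayes, add-one smoothing."""

    def __init__(self):
        self.word_counts = {}        # category -> Counter of tokens
        self.doc_counts = Counter()  # category -> number of training texts
        self.vocab = set()

    def train(self, category, text):
        tokens = tokenize(text)
        self.word_counts.setdefault(category, Counter()).update(tokens)
        self.doc_counts[category] += 1
        self.vocab.update(tokens)

    def log_score(self, category, tokens):
        counts = self.word_counts[category]
        total = sum(counts.values()) + len(self.vocab)
        prior = self.doc_counts[category] / sum(self.doc_counts.values())
        return math.log(prior) + sum(
            math.log((counts[t] + 1) / total) for t in tokens
        )


@dataclass
class Rules:
    """Hypothetical container for the four term lists of one category."""
    positive: set = field(default_factory=set)    # mandatory terms
    negative: set = field(default_factory=set)    # excluding terms
    relevant: set = field(default_factory=set)    # raise the score
    irrelevant: set = field(default_factory=set)  # lower the score


def classify(text, model, rules_by_category, weight=1.0):
    """Return the best category, or None if the rules exclude every category."""
    tokens = tokenize(text)
    present = set(tokens)
    best, best_score = None, -math.inf
    for category in model.doc_counts:
        rules = rules_by_category.get(category, Rules())
        if not rules.positive <= present:  # a mandatory term is missing
            continue
        if rules.negative & present:       # an excluding term is present
            continue
        score = model.log_score(category, tokens)
        score += weight * len(rules.relevant & present)
        score -= weight * len(rules.irrelevant & present)
        if score > best_score:
            best, best_score = category, score
    return best


# Step 1: start with a purely statistical model (toy training text).
model = NaiveBayes()
model.train("lease", "tenant landlord rent monthly lease term")
model.train("sale", "buyer seller purchase price closing deed")

# Step 2: fine-tune by adding terms, which turns it into a hybrid model.
rules = {
    "lease": Rules(positive={"rent"}),
    "sale": Rules(negative={"tenant"}),
}

lease_result = classify("The tenant shall pay rent monthly.", model, rules)
sale_result = classify("Buyer pays the purchase price at closing.", model, rules)
no_result = classify("The tenant agrees.", model, rules)
```

With an empty `rules` dictionary, `classify` falls back to the statistical score alone, which mirrors the starting point before any terms are added during fine-tuning.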
We will discuss in detail issues pertaining to three different problems, namely, document representation,
classifier construction, and classifier evaluation.