Wenjun Shao
Guizhou University of Finance and Economics, Guiyang, Guizhou Province, China, 550004

Abstract:

In this paper, we propose a new approach to the automatic classification of legal feature words based on the TF-IDF algorithm. The proposed algorithm takes various application scenarios into account in order to extract high-quality legal data, and it can be used in fields such as legal research, decision support, and legal information retrieval.

Different countries have many different types of laws, and it is often difficult for lawyers to determine whether a given piece of legislation applies to their case. Our approach performs automatic classification with the TF-IDF algorithm in three steps: preprocessing, extraction, and feature selection. The main purpose of these three steps is to ensure that the legal feature words extracted after training are of high quality and standardized, so that the algorithm can be used in different application areas.

The preprocessing step uses a word embedding technique to encode each legal feature word as a 100-dimensional one-hot vector. We then use the TF-IDF algorithm to compute the frequency value of each feature word in the source text collection, and we compute the semantic similarity between target documents and source documents according to different models, such as a logistic model or a decision tree. Finally, a relevant model is chosen at random from this collection of models.
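
To make the TF-IDF weighting and the document comparison concrete, here is a minimal sketch in Python. It is not the implementation used in this paper. It assumes whitespace tokenization, the common weighting tf * log(N / df), and cosine similarity as the similarity measure, none of which the text above specifies. The function names (tokenize, tfidf_vectors, cosine, top_features) and the sample sentences are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real legal-text preprocessing)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one {word: weight} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents in which each word occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            {w: (c / total) * math.log(n_docs / df[w]) for w, c in counts.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(weight * v.get(w, 0.0) for w, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_features(vector, k=3):
    """Return the k highest-weighted words, breaking ties alphabetically."""
    return sorted(vector, key=lambda w: (-vector[w], w))[:k]


if __name__ == "__main__":
    # Hypothetical source documents followed by one target document.
    docs = [
        "the tenant shall pay rent to the landlord",
        "the court may award damages for breach of contract",
        "breach of contract damages",
    ]
    vectors = tfidf_vectors(docs)
    target = vectors[-1]
    for text, vec in zip(docs[:-1], vectors[:-1]):
        print(round(cosine(target, vec), 3), top_features(vec), text)
```

In this sketch, a word that occurs in every document receives weight zero, so very common words contribute nothing to the similarity score. The target document is weighted together with the source collection so that both share the same document frequencies.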