Jingyou Zhang
School of Mathematics and Statistics, Chongqing Three Gorges University, Chongqing, 404100, China
Haiping Zhong
College of Mathematics and Computer Science, YuZhang Normal University, Nanchang, 330103, China

DOI: https://doi.org/10.5912/jcb1055


Abstract:

Clustering healthcare data plays a vital role in discovering meaningful patterns in large, complex datasets. However, existing rough set clustering algorithms for mixed data suffer from overlapping cluster boundaries, which reduces clustering accuracy. To address this issue, we propose a rough set mixed data clustering algorithm based on fuzzy mathematics evaluation and tailored to healthcare data analysis. The approach first calculates the outlier degree of each object in the dataset and selects objects with low outlier degrees as initial cluster centers. Fuzzy mathematics evaluation is then used to partition the clusters into core areas and boundary areas, yielding well-defined cluster structures. To handle heterogeneous healthcare data, samples with different cluster structures are processed by weighting and summation. To measure the distance between categorical attributes, we devise a distance formula that takes into account the sequential characteristics of ordered categorical attributes. This attribute distance is used to determine the centroid of each cluster, completing the rough set mixed data clustering. We validate the algorithm on six healthcare datasets obtained from the UCI repository. The experimental results show that its average clustering accuracy surpasses that of the hybrid data clustering algorithm based on K-Modes and k-means. Specifically, on the solar flare dataset, our method achieves an average accuracy of 0.739, outperforming the two comparison algorithms by 0.108 and 0.15, respectively. Overall, the results indicate that the proposed algorithm handles healthcare data effectively across the test datasets.
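As an illustration of the idea of an order-aware distance for ordered categorical (classification) attributes combined by weighting and summation, the following is a minimal sketch and not the formula proposed in this paper. It assumes that an ordered attribute's distance is the rank gap scaled to [0, 1], that unordered attributes use a simple 0/1 mismatch, and that numeric attributes are already scaled to [0, 1]. The function names, the weights, and the example records are hypothetical.

```python
from typing import Dict, Sequence


def ordinal_distance(a, b, levels: Sequence) -> float:
    """Distance between two values of an ordered categorical attribute.

    Assumption: the rank gap is scaled to [0, 1], so adjacent levels
    are closer than levels far apart in the ordering.
    """
    if len(levels) < 2:
        return 0.0
    return abs(levels.index(a) - levels.index(b)) / (len(levels) - 1)


def mixed_distance(
    x: Sequence,
    y: Sequence,
    numeric: Sequence[int],
    ordinal: Dict[int, Sequence],
    nominal: Sequence[int],
    weights=(1.0, 1.0, 1.0),
) -> float:
    """Weighted sum of per-type distances between records x and y.

    numeric: indices of numeric attributes (assumed pre-scaled to [0, 1])
    ordinal: maps an attribute index to its ordered list of levels
    nominal: indices of unordered categorical attributes
    weights: hypothetical weights for the numeric, ordinal, nominal parts
    """
    w_num, w_ord, w_nom = weights
    d_num = sum(abs(x[i] - y[i]) for i in numeric)
    d_ord = sum(ordinal_distance(x[i], y[i], lv) for i, lv in ordinal.items())
    d_nom = sum(0.0 if x[i] == y[i] else 1.0 for i in nominal)
    return w_num * d_num + w_ord * d_ord + w_nom * d_nom


# Hypothetical records: (scaled lab value, severity, ward)
severity = ["mild", "moderate", "severe"]
p = (0.25, "mild", "A")
q = (0.75, "severe", "B")
r = (0.25, "moderate", "A")

d_pq = mixed_distance(p, q, numeric=[0], ordinal={1: severity}, nominal=[2])
d_pr = mixed_distance(p, r, numeric=[0], ordinal={1: severity}, nominal=[2])
```

With this scaling, "mild" is half as far from "moderate" as it is from "severe", whereas a plain mismatch distance would treat both pairs as equally different. A cluster centroid could then be chosen as the record or attribute value that minimizes the summed distance to the cluster's members, although the paper's own centroid rule may differ.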