Chen Bing
Information Engineering School of Jiaozuo University, Jiaozuo, 454000, China
Zhang Ting
Information Engineering School of Jiaozuo University, Jiaozuo, 454000, China

DOI: https://doi.org/10.5912/jcb1040


Abstract:

Because of memory limitations, heterogeneous data mining algorithms for the Internet of Things (IoT) take too long to process large-scale data. This paper proposes a heterogeneous data mining algorithm for the IoT based on the Spark artificial intelligence architecture. The heterogeneous data are abstracted into feature values of a unified dimension relevant to the target task, attributes are established, and the preprocessing result is output. A distributed parallel computing framework is built on the Spark artificial intelligence architecture, and the preprocessed data are distributed to different execution points. The data at each execution point are converted into a bit matrix to calculate the degree of association between attributes, and the data are partitioned according to the association result. Local and global frequent pattern trees are then generated according to the association rules to complete heterogeneous data mining. Experimental results show that the execution time of the proposed algorithm is 1892 s, which is 1147 s, 1991 s and 624 s less than that of the HUIM-ACO, FP-Growth and ENFP-Growth algorithms, respectively. The proposed algorithm effectively shortens data processing time and improves execution efficiency.
IoT-enabled technologies have been employed for the prevention of many chronic illnesses, and the most essential of these is continuous, real-time tracking. Wearable medical devices with sensors, health clouds and mobile apps generate a constant stream of data, known as streaming big data. Because the data are produced so quickly, it is difficult to gather, process and analyse them in real time in order to act during crises and extract hidden value. Outdated procedures are both time-consuming and limiting. Real-time processing of large data streams is therefore essential to an efficient and scalable solution. This study proposes a novel architecture for real-time health status prediction and analytics using big data technologies.
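The bit-matrix step mentioned in the abstract, in which the data at each execution point are converted into a bit matrix before attribute association is calculated, can be illustrated with a minimal single-machine sketch. The abstract does not define the association measure, so this sketch assumes plain co-occurrence support; the function names (`build_bit_columns`, `support`, `frequent_pairs`) and the sample records are hypothetical and are not taken from the paper. It does not reproduce the Spark distribution or the frequent pattern tree construction.

```python
from itertools import combinations


def build_bit_columns(transactions):
    """Map each item to an int whose bit i is set when record i contains the item."""
    columns = {}
    for row_index, items in enumerate(transactions):
        for item in set(items):
            columns[item] = columns.get(item, 0) | (1 << row_index)
    return columns


def support(columns, itemset):
    """Count records containing every item in itemset (bitwise AND, then popcount)."""
    items = list(itemset)
    if not items:
        return 0
    bits = columns.get(items[0], 0)
    for item in items[1:]:
        bits &= columns.get(item, 0)
    return bin(bits).count("1")


def frequent_pairs(columns, min_support):
    """Return attribute pairs whose co-occurrence support meets min_support.

    Assumption: co-occurrence support stands in for the paper's
    attribute association degree, which the abstract does not define.
    """
    pairs = {}
    for a, b in combinations(sorted(columns), 2):
        count = support(columns, (a, b))
        if count >= min_support:
            pairs[(a, b)] = count
    return pairs


# Hypothetical preprocessed IoT records, one list of attribute labels per record.
transactions = [
    ["temp_high", "humidity_high", "fan_on"],
    ["temp_high", "fan_on"],
    ["humidity_high"],
    ["temp_high", "humidity_high", "fan_on"],
]

columns = build_bit_columns(transactions)
pairs = frequent_pairs(columns, min_support=2)
```

Storing each attribute as a bit column reduces a support count to one AND per attribute followed by a population count, which is one plausible reason for choosing a bit-matrix representation when memory and execution time are the constraints.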