فصلنامه علمی کارافن

فصلنامه علمی کارافن

خوشه بندی براساس ماتریس وابستگی و گراف جستجوی سطح اول

نوع مقاله : مقاله پژوهشی (نظری)

نویسنده
استادیار گروه مهندسی کامپیوتر ، دانشگاه ملی مهارت، تهران، ایران
چکیده
الگوریتم‌های خوشه‌بندی مبتنی بر چگالی به دلیل توانایی آنها در شناسایی خوشه‌هایی با اشکال مختلف و اشیاء نویز معمولاً در یادگیری ماشین و داده‌کاوی استفاده می‌شوند. این الگوریتم‌‌ها در تحلیل داده‌ها و کاربرد خروجی تحلیل آنها در صنعت و تجارت معروف هستند. با این حال، الگوریتم‌های سنتی خوشه‌بندی ممکن است در مجموعه داده‌هایی با چگالی‌های مختلف و خوشه‌های همسایه درهم تنیده مشکل داشته باشند. برای پرداختن به این چالش‌ها، یک الگوریتم خوشه‌بندی مبتنی بر چگالی جدید در این مقاله پیشنهاد شده است. در این الگوریتم ماتریس وابستگی و گراف جستجوی سطح اول برای یافتن نقاط پر چگال و ارتباط بین آتها استقاده شده است، مفهوم فضای مربوطه برای تعریف چگالی محلی و سراسری معرفی می‌گردد، و از یک روش شناسایی نقاط مرکزی برای شناسایی ساختارهای خوشه‌ای استفاده شده است. این الگوریتم همچنین از یک استراتژی تخصیص بر اساس فضای مربوطه برای اشیاء باقی مانده برای دستیابی به نتایج خوشه‌بندی دقیق استفاده می‌کند. نتایج تجربی بر روی مجموعه داده‌های واقعی، اثربخشی روش پیشنهادی را در عملکرد خوشه‌بندی نشان می‌دهد.
کلیدواژه‌ها
موضوعات

عنوان مقاله English

Clustering based on Affinity Matrix First Search-Breadth Graph

نویسنده English

shahin pourbahrami
Assistant Professor, Department of Computer Engineering, National University of Skills (NUS), Tehran, Iran
چکیده English

Density-based clustering algorithms are commonly used in machine learning and data mining due to their ability to identify clusters with different shapes and noisy objects. These algorithms are famous in data analysis and the use of their analysis output in industry and business. However, traditional clustering algorithms may have difficulty with datasets with different densities and overlapping neighboring clusters. To address these challenges, a new density-based clustering algorithm is proposed in this article. In this algorithm, the dependency matrix and the first -level search graph are used to find the dense points and the connection between the points, the concept of the relevant space is introduced to define the local and global density, and a central point identification method is used to identify the cluster structures. This algorithm also uses an allocation strategy based on the relevant space for the remaining objects to achieve accurate clustering results. Experimental results on real data sets show the effectiveness of the proposed method in clustering performance.

کلیدواژه‌ها English

clustering
nearest neighbor
first level search
Affinity matrix
graph
[1] Chen, Y., Hu, X., Fan, W., Shen, L., Zhang, Z., Liu, X., Du, J., Li, H., Chen, Y., & Li, H. (2020). Fast density peak clustering for large scale data based on kNN. Knowledge-Based Systems, 187, 104824, https://doi.org/10.1016/j.knosys.2019.06.032.
[2] Jan, Z., Ai-Ansari, N., Mousa, O., Abd-Alrazaq, A., Ahmed, A., Alam, T., & Househ, M. (2021). The role of machine learning in diagnosing bipolar disorder: scoping review. Journal of medical Internet research, 23(11), e29749,  https://preprints.jmir.org/preprint/29749.
[3] Wen, J., Xuan, S., Li, Y., Peng, Q., & Gao, Q. (2020). Image segmentation algorithm based on neutrosophic fuzzy clustering with non‐local information. IET Image Processing, 14(3), 576-584, https://doi.org/10.1049/iet-ipr.2018.5949.
[4] Zou, Q., Lin, G., Jiang, X., Liu, X., & Zeng, X. (2020). Sequence clustering in bioinformatics: an empirical study. Briefings in bioinformatics, 21(1), 1-10, https://doi.org/10.1093/bib/bby090.
[5] Tombros, A., Villa, R., & Van Rijsbergen, C. J. (2002). The effectiveness of query-specific hierarchic clustering in information retrieval. Information processing & management, 38(4), 559-582, https://doi.org/10.1016/S0306-4573(01)00048-6.
[6] Jain, A. K. (2008). Data clustering: 50 years beyond k-means. Joint European Conference on Machine Learning and Knowledge Discovery in Databases, https://doi.org/10.1007/978-3-540-87479-9_3.
[7] Jin, H., Leung, K.-S., Wong, M.-L., & Xu, Z.-B. (2005). Scalable model-based cluster analysis using clustering features. Pattern Recognition, 38(5), 637-649, https://doi.org/10.1016/j.patcog.2004.07.012.
[8] Murtagh, F., & Contreras, P. (2012). Algorithms for hierarchical clustering: an overview. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2(1), 86-97,  https://doi.org/10.1002/widm.53.
[9] Kriegel, H. P., Kröger, P., Sander, J., & Zimek, A. (2011). Density‐based clustering. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(3), 231-240,  https://doi.org/10.1002/widm.30.
[10] Kumar, K. M., & Reddy, A. R. M. (2016). A fast DBSCAN clustering algorithm by accelerating neighbor searching using Groups method. Pattern Recognition, 58, 39-48, https://doi.org/10.1016/j.patcog.2016.03.008.
[11] Tang, C., Wang, H., Wang, Z., Zeng, X., Yan, H., & Xiao, Y. (2021). An improved OPTICS clustering algorithm for discovering clusters with uneven densities. Intelligent Data Analysis, 25(6), 1453-1471, https://doi.org/10.3233/IDA-205497.
[12] Hu, L., & Chan, K. C. (2015). A density-based clustering approach for identifying overlapping protein complexes with functional preferences. BMC bioinformatics, 16, 1-16, https://doi.org/10.1186/s12859-015-0583-3.
[13] Rodriguez, A., & Laio, A. (2014). Clustering by fast search and find of density peaks. science, 344(6191), 1492-1496, DOI: 10.1126/science.1242072.
[14] Guo, W., Wang, W., Zhao, S., Niu, Y., Zhang, Z., & Liu, X. (2022). Density peak clustering with connectivity estimation. Knowledge-Based Systems, 243, 108501, https://doi.org/10.1016/j.knosys.2022.108501.
[15] Xu, X., Ding, S., & Shi, Z. (2018). An improved density peaks clustering algorithm with fast finding cluster centers. Knowledge-Based Systems, 158, 65-74, https://doi.org/10.1016/j.knosys.2018.05.034.
[16] Xie, J., Gao, H., Xie, W., Liu, X., & Grant, P. W. (2016). Robust clustering by detecting density peaks and assigning points based on fuzzy weighted K-nearest neighbors. Information Sciences, 354, 19-40, https://doi.org/10.1016/j.ins.2016.03.011.
[17] Xu, X., Ding, S., Wang, L., & Wang, Y. (2020). A robust density peaks clustering algorithm with density-sensitive similarity. Knowledge-Based Systems, 200, 106028, https://doi.org/10.1016/j.knosys.2020.106028.
[18] Hou, J., Zhang, A., & Qi, N. (2020). Density peak clustering based on relative density relationship. Pattern Recognition, 108, 107554, https://doi.org/10.1016/j.patcog.2020.107554.
[19] Tong, W., Liu, S., & Gao, X.-Z. (2021). A density-peak-based clustering algorithm of automatically determining the number of clusters. Neurocomputing, 458, 655-666, https://doi.org/10.1016/j.neucom.2020.03.125.
[20] Zhu, Y., Ting, K. M., Carman, M. J., & Angelova, M. (2021). CDF Transform-and-Shift: An effective way to deal with datasets of inhomogeneous cluster densities. Pattern Recognition, 117, 107977, https://doi.org/10.1016/j.patcog.2021.107977.
[21] Yaohui, L., Zhengming, M., & Fang, Y. (2017). Adaptive density peak clustering based on K-nearest neighbors with aggregating strategy. Knowledge-Based Systems, 133, 208-220, https://doi.org/10.1016/j.knosys.2017.07.010.
[22] Wang, Y., Wang, D., Zhang, X., Pang, W., Miao, C., Tan, A.-H., & Zhou, Y. (2020). McDPC: multi-center density peak clustering. Neural Computing and Applications, 32, 13465-13478, https://doi.org/10.1007/s00521-020-04754-5.
[23] Ding, S., Du, W., Xu, X., Shi, T., Wang, Y., & Li, C. (2023). An improved density peaks clustering algorithm based on natural neighbor with a merging strategy. Information Sciences, 624, 252-276, https://doi.org/10.1016/j.ins.2022.12.078.
[24] Pourbahrami, S., Balafar, M. A., & Khanli, L. M. (2023). ASVMK: A novel SVMs Kernel based on Apollonius function and density peak clustering. Engineering Applications of Artificial Intelligence, 126, 106704, https://doi.org/10.1016/j.engappai.2023.106704.
[25] Pourbahrami, S., & Hashemzadeh, M. (2022). A geometric-based clustering method using natural neighbors. Information Sciences, 610, 694-706, https://doi.org/10.1016/j.ins.2022.08.047.
[26] Pourbahrami, S., Balafar, M. A., Khanli, L. M., & Kakarash, Z. A. (2020). A survey of neighborhood construction algorithms for clustering and classifying data points. Computer Science Review, 38, 100315, https://doi.org/10.1016/j.cosrev.2020.100315.
[27] Pourbahrami, S., Khanli, L. M., & Azimpour, S. (2019). A novel and efficient data point neighborhood construction algorithm based on Apollonius circle. Expert Systems with Applications, 115, 57-67, https://doi.org/10.1016/j.eswa.2018.07.066.
[28] Abdolmaleki, N., Khanli, L. M., Hashemzadeh, M., & Pourbahrami, S. (2022). ACQC: Apollonius Circle‐based Quantum Clustering. Journal of Computational Science, 64, 101877, https://doi.org/10.1016/j.jocs.2022.101877.
[29] Pourbahrami, S., Khanli, L. M., & Azimpour, S. (2020). Improving neighborhood construction with Apollonius region algorithm based on density for clustering. Information Sciences, 522, 227-240, https://doi.org/10.1016/j.ins.2020.02.049.
[30] Sundqvist, M., Chiquet, J., & Rigaill, G. (2023). Adjusting the adjusted Rand Index: A multinomial story. Computational Statistics, 38(1), 327-347, https://doi.org/10.1007/s00180-022-01230-7.
[31] Amelio, A., & Pizzuti, C. (2017). Correction for closeness: Adjusting normalized mutual information measure for clustering comparison. Computational Intelligence, 33(3), 579-601,  https://doi.org/10.1111/coin.12100.
 
دوره 22، شماره 1
فنی و مهندسی
بهار 1404
صفحه 36-59

  • تاریخ دریافت 30 مرداد 1403
  • تاریخ بازنگری 13 آبان 1403
  • تاریخ پذیرش 04 آذر 1403