خوشه بندی براساس ماتریس وابستگی و گراف جستجوی سطح اول

پوربهرامی, شهین

doi:10.48301/kssa.2024.468427.2944

خوشه بندی براساس ماتریس وابستگی و گراف جستجوی سطح اول

نوع مقاله : مقاله پژوهشی (نظری)

نویسنده

شهین پوربهرامی

استادیار گروه مهندسی کامپیوتر ، دانشگاه ملی مهارت، تهران، ایران

10.48301/kssa.2024.468427.2944

چکیده

الگوریتم‌های خوشه‌بندی مبتنی بر چگالی به دلیل توانایی آنها در شناسایی خوشه‌هایی با اشکال مختلف و اشیاء نویز معمولاً در یادگیری ماشین و داده‌کاوی استفاده می‌شوند. این الگوریتم‌‌ها در تحلیل داده‌ها و کاربرد خروجی تحلیل آنها در صنعت و تجارت معروف هستند. با این حال، الگوریتم‌های سنتی خوشه‌بندی ممکن است در مجموعه داده‌هایی با چگالی‌های مختلف و خوشه‌های همسایه درهم تنیده مشکل داشته باشند. برای پرداختن به این چالش‌ها، یک الگوریتم خوشه‌بندی مبتنی بر چگالی جدید در این مقاله پیشنهاد شده است. در این الگوریتم ماتریس وابستگی و گراف جستجوی سطح اول برای یافتن نقاط پر چگال و ارتباط بین آتها استقاده شده است، مفهوم فضای مربوطه برای تعریف چگالی محلی و سراسری معرفی می‌گردد، و از یک روش شناسایی نقاط مرکزی برای شناسایی ساختارهای خوشه‌ای استفاده شده است. این الگوریتم همچنین از یک استراتژی تخصیص بر اساس فضای مربوطه برای اشیاء باقی مانده برای دستیابی به نتایج خوشه‌بندی دقیق استفاده می‌کند. نتایج تجربی بر روی مجموعه داده‌های واقعی، اثربخشی روش پیشنهادی را در عملکرد خوشه‌بندی نشان می‌دهد.

کلیدواژه‌ها

خوشه بندی

نزدیکترین همسایه

جستجوی سطح اول

ماتریس وابستگی

گراف

موضوعات

هوش مصنوعی

عنوان مقاله English

Clustering based on Affinity Matrix First Search-Breadth Graph

نویسنده English

shahin pourbahrami

Assistant Professor, Department of Computer Engineering, National University of Skills (NUS), Tehran, Iran

چکیده English

Density-based clustering algorithms are commonly used in machine learning and data mining due to their ability to identify clusters with different shapes and noisy objects. These algorithms are famous in data analysis and the use of their analysis output in industry and business. However, traditional clustering algorithms may have difficulty with datasets with different densities and overlapping neighboring clusters. To address these challenges, a new density-based clustering algorithm is proposed in this article. In this algorithm, the dependency matrix and the first -level search graph are used to find the dense points and the connection between the points, the concept of the relevant space is introduced to define the local and global density, and a central point identification method is used to identify the cluster structures. This algorithm also uses an allocation strategy based on the relevant space for the remaining objects to achieve accurate clustering results. Experimental results on real data sets show the effectiveness of the proposed method in clustering performance.

کلیدواژه‌ها English

clustering

nearest neighbor

first level search

Affinity matrix

graph

[1] Chen, Y., Hu, X., Fan, W., Shen, L., Zhang, Z., Liu, X., Du, J., Li, H., Chen, Y., & Li, H. (2020). Fast density peak clustering for large scale data based on kNN. Knowledge-Based Systems, 187, 104824, https://doi.org/10.1016/j.knosys.2019.06.032.

[2] Jan, Z., Ai-Ansari, N., Mousa, O., Abd-Alrazaq, A., Ahmed, A., Alam, T., & Househ, M. (2021). The role of machine learning in diagnosing bipolar disorder: scoping review. Journal of medical Internet research, 23(11), e29749, https://preprints.jmir.org/preprint/29749.

[3] Wen, J., Xuan, S., Li, Y., Peng, Q., & Gao, Q. (2020). Image segmentation algorithm based on neutrosophic fuzzy clustering with non‐local information. IET Image Processing, 14(3), 576-584, https://doi.org/10.1049/iet-ipr.2018.5949.

[4] Zou, Q., Lin, G., Jiang, X., Liu, X., & Zeng, X. (2020). Sequence clustering in bioinformatics: an empirical study. Briefings in bioinformatics, 21(1), 1-10, https://doi.org/10.1093/bib/bby090.

[5] Tombros, A., Villa, R., & Van Rijsbergen, C. J. (2002). The effectiveness of query-specific hierarchic clustering in information retrieval. Information processing & management, 38(4), 559-582, https://doi.org/10.1016/S0306-4573(01)00048-6.

[6] Jain, A. K. (2008). Data clustering: 50 years beyond k-means. Joint European Conference on Machine Learning and Knowledge Discovery in Databases, https://doi.org/10.1007/978-3-540-87479-9_3.

[7] Jin, H., Leung, K.-S., Wong, M.-L., & Xu, Z.-B. (2005). Scalable model-based cluster analysis using clustering features. Pattern Recognition, 38(5), 637-649, https://doi.org/10.1016/j.patcog.2004.07.012.

[8] Murtagh, F., & Contreras, P. (2012). Algorithms for hierarchical clustering: an overview. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2(1), 86-97, https://doi.org/10.1002/widm.53.

[9] Kriegel, H. P., Kröger, P., Sander, J., & Zimek, A. (2011). Density‐based clustering. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(3), 231-240, https://doi.org/10.1002/widm.30.

[10] Kumar, K. M., & Reddy, A. R. M. (2016). A fast DBSCAN clustering algorithm by accelerating neighbor searching using Groups method. Pattern Recognition, 58, 39-48, https://doi.org/10.1016/j.patcog.2016.03.008.

[11] Tang, C., Wang, H., Wang, Z., Zeng, X., Yan, H., & Xiao, Y. (2021). An improved OPTICS clustering algorithm for discovering clusters with uneven densities. Intelligent Data Analysis, 25(6), 1453-1471, https://doi.org/10.3233/IDA-205497.

[12] Hu, L., & Chan, K. C. (2015). A density-based clustering approach for identifying overlapping protein complexes with functional preferences. BMC bioinformatics, 16, 1-16, https://doi.org/10.1186/s12859-015-0583-3.

[13] Rodriguez, A., & Laio, A. (2014). Clustering by fast search and find of density peaks. science, 344(6191), 1492-1496, DOI: 10.1126/science.1242072.

[14] Guo, W., Wang, W., Zhao, S., Niu, Y., Zhang, Z., & Liu, X. (2022). Density peak clustering with connectivity estimation. Knowledge-Based Systems, 243, 108501, https://doi.org/10.1016/j.knosys.2022.108501.

[15] Xu, X., Ding, S., & Shi, Z. (2018). An improved density peaks clustering algorithm with fast finding cluster centers. Knowledge-Based Systems, 158, 65-74, https://doi.org/10.1016/j.knosys.2018.05.034.

[16] Xie, J., Gao, H., Xie, W., Liu, X., & Grant, P. W. (2016). Robust clustering by detecting density peaks and assigning points based on fuzzy weighted K-nearest neighbors. Information Sciences, 354, 19-40, https://doi.org/10.1016/j.ins.2016.03.011.

[17] Xu, X., Ding, S., Wang, L., & Wang, Y. (2020). A robust density peaks clustering algorithm with density-sensitive similarity. Knowledge-Based Systems, 200, 106028, https://doi.org/10.1016/j.knosys.2020.106028.

[18] Hou, J., Zhang, A., & Qi, N. (2020). Density peak clustering based on relative density relationship. Pattern Recognition, 108, 107554, https://doi.org/10.1016/j.patcog.2020.107554.

[19] Tong, W., Liu, S., & Gao, X.-Z. (2021). A density-peak-based clustering algorithm of automatically determining the number of clusters. Neurocomputing, 458, 655-666, https://doi.org/10.1016/j.neucom.2020.03.125.

[20] Zhu, Y., Ting, K. M., Carman, M. J., & Angelova, M. (2021). CDF Transform-and-Shift: An effective way to deal with datasets of inhomogeneous cluster densities. Pattern Recognition, 117, 107977, https://doi.org/10.1016/j.patcog.2021.107977.

[21] Yaohui, L., Zhengming, M., & Fang, Y. (2017). Adaptive density peak clustering based on K-nearest neighbors with aggregating strategy. Knowledge-Based Systems, 133, 208-220, https://doi.org/10.1016/j.knosys.2017.07.010.

[22] Wang, Y., Wang, D., Zhang, X., Pang, W., Miao, C., Tan, A.-H., & Zhou, Y. (2020). McDPC: multi-center density peak clustering. Neural Computing and Applications, 32, 13465-13478, https://doi.org/10.1007/s00521-020-04754-5.

[23] Ding, S., Du, W., Xu, X., Shi, T., Wang, Y., & Li, C. (2023). An improved density peaks clustering algorithm based on natural neighbor with a merging strategy. Information Sciences, 624, 252-276, https://doi.org/10.1016/j.ins.2022.12.078.

[24] Pourbahrami, S., Balafar, M. A., & Khanli, L. M. (2023). ASVMK: A novel SVMs Kernel based on Apollonius function and density peak clustering. Engineering Applications of Artificial Intelligence, 126, 106704, https://doi.org/10.1016/j.engappai.2023.106704.

[25] Pourbahrami, S., & Hashemzadeh, M. (2022). A geometric-based clustering method using natural neighbors. Information Sciences, 610, 694-706, https://doi.org/10.1016/j.ins.2022.08.047.

[26] Pourbahrami, S., Balafar, M. A., Khanli, L. M., & Kakarash, Z. A. (2020). A survey of neighborhood construction algorithms for clustering and classifying data points. Computer Science Review, 38, 100315, https://doi.org/10.1016/j.cosrev.2020.100315.

[27] Pourbahrami, S., Khanli, L. M., & Azimpour, S. (2019). A novel and efficient data point neighborhood construction algorithm based on Apollonius circle. Expert Systems with Applications, 115, 57-67, https://doi.org/10.1016/j.eswa.2018.07.066.

[28] Abdolmaleki, N., Khanli, L. M., Hashemzadeh, M., & Pourbahrami, S. (2022). ACQC: Apollonius Circle‐based Quantum Clustering. Journal of Computational Science, 64, 101877, https://doi.org/10.1016/j.jocs.2022.101877.

[29] Pourbahrami, S., Khanli, L. M., & Azimpour, S. (2020). Improving neighborhood construction with Apollonius region algorithm based on density for clustering. Information Sciences, 522, 227-240, https://doi.org/10.1016/j.ins.2020.02.049.

[30] Sundqvist, M., Chiquet, J., & Rigaill, G. (2023). Adjusting the adjusted Rand Index: A multinomial story. Computational Statistics, 38(1), 327-347, https://doi.org/10.1007/s00180-022-01230-7.

[31] Amelio, A., & Pizzuti, C. (2017). Correction for closeness: Adjusting normalized mutual information measure for clustering comparison. Computational Intelligence, 33(3), 579-601, https://doi.org/10.1111/coin.12100.