Implementation of a Noisy Hyperlink Removal System: Using the Semantic and Relational Approach of the DBpedia Ontology

Taghandiki, Kazem

doi:10.48301/kssa.2023.382583.2426

Article type: Research article (theoretical)

Author

Kazem Taghandiki

Faculty Member, Department of Computer Engineering, Technical and Vocational University (TVU), Tehran, Iran.

Abstract

As web data grows rapidly, the web graph, a graph representation of the web, keeps growing and has gradually shifted from a content-based structure to a non-content-based one. Junk data such as noisy hyperlinks in the web structure graph causes problems for many link mining algorithms and reduces the speed and efficiency of information retrieval algorithms. Previous work removes noisy hyperlinks using structural and string-based approaches. These approaches mistakenly remove some useful hyperlinks and, in some situations, fail to detect noisy ones. In this paper, an interactive crawler first crawled websites to build a dataset of noisy and useful hyperlinks. Semantic web approaches and resources such as the DBpedia ontology were then used to examine the semantic and relational structure of these hyperlinks. Finally, with the DBpedia ontology reasoner enabled, noisy hyperlinks were removed from the web structure graph. Experiments on the system demonstrate the accuracy and capability of Semantic Web technologies in removing noisy hyperlinks.

Subjects

Artificial Intelligence

Keywords

Semantic Web
Noisy Hyperlinks
Ontology
Reasoner
Semantic Similarity
Relatedness Similarity
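
The abstract describes the removal step only at a high level. As a rough illustration of the general idea, the following is a minimal sketch, assuming each hyperlink has already been assigned a semantic similarity score and a relatedness score between its source and target pages (for example, derived from ontology concepts). The names `Hyperlink`, `is_noisy`, and `remove_noisy`, the 0.3 threshold, the "neither score reaches the threshold" rule, and the sample values are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperlink:
    """One edge of the web structure graph with precomputed scores (assumed in [0, 1])."""
    source: str
    target: str
    semantic_sim: float     # hypothetical semantic similarity of source and target
    relatedness_sim: float  # hypothetical relational (relatedness) similarity


def is_noisy(link: Hyperlink, threshold: float = 0.3) -> bool:
    """Hypothetical rule: a link is noisy when neither score reaches the threshold."""
    return max(link.semantic_sim, link.relatedness_sim) < threshold


def remove_noisy(links: list[Hyperlink], threshold: float = 0.3) -> list[Hyperlink]:
    """Return only the links that are not classified as noisy."""
    return [link for link in links if not is_noisy(link, threshold)]


# Toy data; the scores are made up for illustration only.
links = [
    Hyperlink("news/article", "news/related-story", 0.82, 0.64),
    Hyperlink("news/article", "ads/casino-banner", 0.05, 0.10),
    Hyperlink("news/article", "news/author-profile", 0.20, 0.71),
]
kept = remove_noisy(links)
```

In this sketch, a link with low semantic similarity but high relatedness (the author profile) is kept, which reflects the abstract's point that looking at both semantic and relational structure can avoid discarding useful hyperlinks.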


Volume 20, Issue 3
Technical and Engineering
Azar 1402 (November-December 2023)
Pages 485-507
