Distributed Record Linkage in Healthcare Data with Apache Spark

Mohammad Heydari; Reza Sarshar; Mohammad Ali Soltanshahi

Published in: First National Conference on Artificial Intelligence and Software Engineering

Publication year: 1402 (Solar Hijri)

Document type: Conference paper

Language: English

The full text of this paper is available as a 6-page PDF.

Permanent link to this paper:

https://civilica.com/doc/1912860

National scientific document ID:

AISOFT01_025

Indexing date: 28 Bahman 1402

Abstract:

Healthcare data is a valuable resource for research, analysis, and decision-making in the medical field. However, healthcare data is often fragmented and distributed across various sources, making it challenging to combine and analyze effectively. Record linkage, also known as data matching, is a crucial step in integrating and cleaning healthcare data to ensure data quality and accuracy. Apache Spark, a powerful open-source distributed big data processing framework, provides a robust platform for performing record linkage tasks with the aid of its machine learning library. In this study, we developed a new distributed data matching model based on the Apache Spark Machine Learning library. To ensure the correct functioning of our model, the validation phase has been performed on the training data. The main challenge is data imbalance because a large amount of data is labeled false, and a small number of records are labeled true. By utilizing SVM and Regression algorithms, our results demonstrate that research data was neither over-fitted nor under-fitted, and this shows that our distributed model works well on the data.
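The abstract does not say how candidate record pairs are turned into features or how the class imbalance is handled, so the following is only a minimal single-machine sketch under stated assumptions, not the authors' Spark pipeline. It assumes hypothetical fields (`name`, `dob`, `city`), uses Python's `difflib.SequenceMatcher` as a stand-in string similarity, and computes inverse-frequency class weights, one common way to offset a label distribution dominated by non-matches.

```python
from collections import Counter
from difflib import SequenceMatcher


def pair_features(a, b, fields=("name", "dob", "city")):
    """Return one similarity score in [0, 1] per compared field.

    The field names are hypothetical; the paper's real schema is not given.
    """
    return [
        SequenceMatcher(None, a[f].lower(), b[f].lower()).ratio()
        for f in fields
    ]


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


# Illustrative toy records (invented for this sketch, not from the paper).
r1 = {"name": "Sara Ahmadi", "dob": "1980-02-11", "city": "Tehran"}
r2 = {"name": "Sara Ahmady", "dob": "1980-02-11", "city": "Tehran"}
features = pair_features(r1, r2)

# One true match among ten candidate pairs mimics the imbalance
# described in the abstract (many "false" labels, few "true" labels).
labels = [1] + [0] * 9
weights = balanced_class_weights(labels)
```

In a Spark setting, per-row weights of this kind could plausibly be supplied through the `weightCol` parameter that estimators such as `LogisticRegression` and `LinearSVC` in `pyspark.ml.classification` accept; whether the authors did so is not stated.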

Keywords:

Record Linkage, Data Matching, Apache Spark, Distributed Machine Learning

Authors

Mohammad Heydari

School of Industrial and Systems Engineering, Tarbiat Modares University, Tehran, Iran

Reza Sarshar

School of Industrial and Systems Engineering, Tarbiat Modares University, Tehran, Iran

Mohammad Ali Soltanshahi

School of Industrial and Systems Engineering, Tarbiat Modares University, Tehran, Iran