Projects

An open-source library that leverages Python’s data science ecosystem to build powerful end-to-end Entity Resolution workflows.

2022-now

Description

Entity Resolution (ER) aims to identify different entity descriptions that pertain to the same real-world object. Due to its quadratic time complexity, ER is typically carried out in two steps: first, blocking restricts the computational cost to similar descriptions, and then matching estimates the actual similarity between them. A plethora of techniques has been proposed for each step. To facilitate their use by researchers and practitioners, we present pyJedAI, an open-source library that leverages Python’s data science ecosystem to build powerful end-to-end ER workflows. We demonstrate how both expert and novice users can accomplish this in an intuitive, yet efficient and effective way.
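The two-step workflow described above can be illustrated with a minimal sketch in plain Python. It does not use the pyJedAI API; the function names (`token_blocking`, `jaccard`, `resolve`), the toy records, and the 0.5 threshold are all assumptions chosen for illustration. Blocking groups descriptions that share at least one token, so only those pairs reach the matching step, which here scores them with Jaccard similarity.

```python
from collections import defaultdict
from itertools import combinations


def tokens(text):
    """Lowercase a description and split it into a set of whitespace tokens."""
    return set(text.lower().split())


def token_blocking(records):
    """Blocking step: return candidate id pairs that share at least one token."""
    blocks = defaultdict(set)
    for rid, text in records.items():
        for tok in tokens(text):
            blocks[tok].add(rid)
    candidates = set()
    for ids in blocks.values():
        # Every pair inside a block becomes a candidate for matching.
        candidates.update(combinations(sorted(ids), 2))
    return candidates


def jaccard(a, b):
    """Matching step: Jaccard similarity of the two token sets."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def resolve(records, threshold=0.5):
    """Compare only the candidate pairs and keep those above the threshold."""
    return sorted(
        pair
        for pair in token_blocking(records)
        if jaccard(records[pair[0]], records[pair[1]]) >= threshold
    )


# Hypothetical toy data: ids 1/2 and 3/4 describe the same products.
records = {
    1: "Apple iPhone 13 128GB",
    2: "iphone 13 apple 128gb black",
    3: "Samsung Galaxy S21",
    4: "Galaxy S21 by Samsung",
}
matches = resolve(records)
print(matches)
```

On this toy input, blocking produces only two candidate pairs instead of the six that an exhaustive comparison would need, which is the cost reduction the description refers to.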


A knowledge lake management system for the agrifood data space
HORIZON-EUROPE

2022-now

Description

In this project, we research and develop tools for Entity Linking, Entity Resolution and Schema Matching in order to provide innovative and sustainable solutions for the agrifood and agritech industry.

A Winner-Take-All Hashing-Based Unsupervised Model for Entity Resolution Problems. [B. Sc. Thesis]

2021-2022

Description

In this project, we propose an end-to-end unsupervised learning model for Entity Resolution problems on string data sets. An innovative prototype selection algorithm is used to create a rich space that is both Euclidean and a dissimilarity space. Part of this work is a detailed presentation of the theoretical benefits of such a space. We then present an embedding scheme based on rank-ordered vectors that circumvents the Curse of Dimensionality. The core of our framework is a locality-sensitive hashing algorithm named Winner-Take-All, which accelerates our model's run time while maintaining strong scores in the similarity checking phase. For that phase, we adopt the Kendall Tau rank correlation coefficient, a metric for comparing rankings. Finally, we use two state-of-the-art frameworks to evaluate our methodology consistently on a well-known Entity Resolution data set.
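To make the two central ingredients concrete, here is a minimal sketch of generic Winner-Take-All hashing and of the Kendall Tau coefficient in plain Python. It is not the thesis implementation: the function names, the window size `k`, the number of permutations, and the example vectors are assumptions for illustration, and the Kendall Tau variant shown is tau-a, which assumes no tied values.

```python
import random
from itertools import combinations


def make_permutations(dim, count, seed=0):
    """Draw `count` random permutations of the indices 0..dim-1."""
    rng = random.Random(seed)
    perms = []
    for _ in range(count):
        perm = list(range(dim))
        rng.shuffle(perm)
        perms.append(perm)
    return perms


def wta_hash(vector, permutations, k):
    """Winner-Take-All code: for each permutation, look at the first k
    permuted entries and record the position of the largest one."""
    code = []
    for perm in permutations:
        window = [vector[i] for i in perm[:k]]
        code.append(max(range(k), key=window.__getitem__))
    return tuple(code)


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical embeddings: b has the same rank order as a, c the reverse.
a = [0.9, 0.1, 0.5, 0.3]
b = [9.0, 1.0, 5.0, 3.0]
c = [1.0, 4.0, 2.0, 3.0]

perms = make_permutations(dim=4, count=8)
same_bucket = wta_hash(a, perms, k=3) == wta_hash(b, perms, k=3)
print(same_bucket, kendall_tau(a, b), kendall_tau(a, c))
```

Because the hash depends only on the relative order of the entries, `a` and `b` always receive the same code regardless of their magnitudes. In a pipeline of this kind, vectors with equal or similar codes would be treated as candidates, and a rank correlation such as Kendall Tau would then score each candidate pair.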