Reliable Conversational Domain-specific Data Exploration and Analysis.
Horizon-MSCA Doctoral Networks, Grant Agreement No. 101168951
2025-now
Description: A foundational framework that bridges formal data retrieval and natural language innovation for reliable assistive technology.
Conversational AI and Large Language Models (LLMs) such as ChatGPT and Bard promise to solve complex problems through simple conversations. Unfortunately, their answering processes are inscrutable and prone to bias and hallucinations, and they incur high computational costs. The ARMADA doctoral network will train 15 highly skilled Early Stage Researchers to specialize in Conversational AI and tackle the challenges raised by recent advances in LLM development, particularly when assisting analysis in sensitive domains. These specialists will acquire unique knowledge and skills in Natural Language Processing, Machine Learning, Data Management, and Algorithms to evaluate and improve the reliability of LLMs. A reliable LLM will produce timely, consistent, and verifiable answers, guiding users in important decision-making processes.
A knowledge lake management system for the agrifood data space
Horizon Europe
2022-2024
Description: In this project, we created tools for Entity Linking, Entity Resolution, and Schema Matching tasks, and provided innovative and sustainable solutions for the agrifood and agritech industry.
An open-source library that leverages Python's data science ecosystem to build powerful end-to-end Entity Resolution workflows.
2022-2024
Description: pyJedAI is a Python framework that aims to offer both experts and novice users robust and fast solutions for multiple types of Entity Resolution problems. It is built on state-of-the-art Python frameworks. pyJedAI constitutes the sole open-source Link Discovery tool that is capable of exploiting the latest breakthroughs in Deep Learning and NLP techniques, which are publicly available through the Python data science ecosystem. This applies to both blocking and matching, thus ensuring high time efficiency, high scalability, and high effectiveness, without requiring any labelled instances from the user.
From 2022 until 2024, I was the architect and main developer of this open-source project. I currently serve as one of its maintainers.
2024
Description: Developed three approaches for recommending casino games based on their descriptions and attributes, testing pyJedAI and graph-clustering algorithms on the task. Entity Resolution techniques were applied to a content-based recommendation challenge, yielding quite interesting results.
Out of the thirty participating teams, we were selected as one of the two finalist teams that won the competition.
A Winner-Take-All Hashing-Based Unsupervised Model for Entity Resolution Problems. [B. Sc. Thesis]
2021-2022
Description: In this project, we proposed an end-to-end unsupervised learning model that can be used for Entity Resolution problems on string data sets.