Trustworthy Multimodal Retrieval-Augmented Generation for Complex Industrial Systems

PhD student:
Director(s):
Starting date: October 2025
Application domain: Industrial
Host institution: University Lumière Lyon2

Retrieval-Augmented Generation (RAG) systems have recently shown strong potential by combining information retrieval models with large language models. Unlike models that rely solely on the knowledge stored in their parameters, RAG grounds generation in external sources retrieved at query time. However, ensuring the trustworthiness of these systems remains a challenge, as their performance can vary unpredictably across queries and contexts.
Conformal prediction offers a promising statistical framework to address this limitation. By bounding the prediction error rate at a user-chosen level, conformal methods attach statistical guarantees to their outputs, which makes them valuable for high-stakes industrial applications. However, industrial environments typically involve complex, multimodal data such as textual documents, technical diagrams, sensor outputs, and maintenance records. Effectively integrating these modalities is a major research challenge: multimodal architectures must not only align heterogeneous sources in a unified semantic space, but also preserve the modality-specific nuances essential for accurate retrieval and generation.
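To make the notion of a bounded error rate concrete, the following is a minimal sketch of standard split conformal prediction. It is an illustration only, not the method this thesis will develop. It assumes that each candidate (for example, a retrieved document) receives a nonconformity score where lower means more plausible, such as one minus a retrieval similarity, and that calibration and test examples are exchangeable. Under that assumption, the resulting set contains the correct candidate with probability at least 1 - alpha. The function names, candidate names, and score values are all hypothetical.

```python
import math


def conformal_threshold(calibration_scores, alpha):
    """Return the split-conformal threshold for a target error rate alpha.

    calibration_scores: nonconformity scores of the *correct* answer on
    held-out calibration examples (lower = more plausible).
    """
    n = len(calibration_scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: only the trivial
        # set (keep everything) carries the guarantee.
        return math.inf
    return sorted(calibration_scores)[rank - 1]


def prediction_set(candidate_scores, threshold):
    """Keep every candidate whose nonconformity score is within the threshold."""
    return [name for name, score in candidate_scores.items() if score <= threshold]


if __name__ == "__main__":
    # Hypothetical calibration scores (assumed, not from any real system).
    calibration = [0.30, 0.05, 0.20, 0.10, 0.40, 0.25, 0.55]
    threshold = conformal_threshold(calibration, alpha=0.25)

    # Hypothetical candidates from different modalities for one query.
    candidates = {"manual_p12": 0.15, "sensor_log_3": 0.38, "diagram_7": 0.62}
    print(threshold, prediction_set(candidates, threshold))
```

The guarantee here is marginal: it holds on average over queries rather than for each individual query, and it says nothing about how large the returned set is. Both points matter when scores come from heterogeneous modalities whose scales may not be directly comparable.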
This thesis aims to explore and develop novel methods that extend conformal guarantees to multimodal data. These contributions will support the design of more robust and trustworthy multimodal RAG systems.