Building Bridges with Industry @ CLARIN2024

Tuesday, 15 October 2024, 14:00 - 15:30

Description

This first industry track session in a CLARIN Annual Conference brings together academics, research infrastructure experts and industry representatives from the Spanish industry landscape specialising in AI and language technologies, as well as healthcare, customer support and telecommunications. The objective of the session is to build and open up bridges between industrial R&D, academic research and existing resource infrastructures such as CLARIN. The industry experts will provide their perspectives on the value of collaborating with academic institutions and research infrastructures to foster more effective knowledge, data and technology transfer in order to accelerate technological advancements and create broader societal benefits.

This session is organised by Henk van den Heuvel and Iulianna van der Lek from CLARIN ERIC in close collaboration with Albert Cañigueral and Maite Melero from the Barcelona Supercomputing Center.

Programme

14:00 Welcome & Introduction by Henk van den Heuvel (CLARIN ERIC) (5 min)

14:05 Industry Presentations | Chair: Henk van den Heuvel (10 min per presentation)

Expanding the reach and building connections: lessons from the CKAN project by Adrià Mercader (Link Digital)
Translating AI Safety and Security Research into Industry Practice by Pau Rué (Alinia AI)
AI: The future of Mobile Interaction by Jordi Luque Serrano (Telefonica Innovación Digital)
Evaluating LLMs for factuality in multilingual, low-resourced scenarios associated with domain adaptation by José Manuel Gómez-Pérez (Expert.AI)

14:50 CLARIN’s Role in Building Bridges by Henk van den Heuvel (CLARIN ERIC) (10 min)

15:00 Round Table Discussions | Moderator: German Rigau (CLARIN-ES) (25 min)

How can the resources of academia and RIs create value for industry?

15:25 Conclusions and Wrap-Up (5 min)

Speakers

Henk van den Heuvel (CLARIN ERIC)

Presentation abstract: In my presentation, I will briefly describe the CLARIN infrastructure and sketch an initial perspective of how CLARIN could foster collaboration between academia and industry through its infrastructure. Next, I look forward to opening up this perspective for receiving feedback and suggestions from the audience in the round table discussion.

About: Henk van den Heuvel is the director of the Centre for Language and Speech Technology (CLST) and Head of the Humanities Lab at the Faculty of Arts at Radboud University in Nijmegen (Netherlands), where he also holds the position of Research Data Manager (data steward). Henk joined the CLARIN Board of Directors in October 2023. Henk has expertise in Language and Speech Technology, specifically speech transcription, automatic speech recognition, and language learning. To this end, he has been involved in collecting, compiling, and validating many spoken and written language resources, which are used to train computers to recognise and transcribe speech. This, in turn, is vital to developing resources for language learners and people who have experienced language attrition.

Adrià Mercader (Senior Solutions Architect at Link Digital)

Presentation abstract: Over its long history as an open-source project, CKAN has expanded its original use case of government data portals to be adopted by various organizations, including research centres and private companies. It has also fostered an ecosystem of commercial vendors providing services and support around it. We will explore what might have caused this, dive into some examples, and see if there are lessons that the CLARIN project can take on board.

About: Adrià Mercader (website, LinkedIn, GitHub) has over a decade of experience working in the Open Data and Civic Tech fields. He is currently a Senior Solutions Architect at Link Digital. He is also a core maintainer of the CKAN project and previously had a long tenure as the Tech Lead of the Open Knowledge Foundation. Over the years, he has overseen many data publication-related projects, working alongside government departments, UN agencies, non-profit organisations and the private sector. He is an enthusiast of the Open web, especially the communities built around Open Source projects and open standards.

Pau Rué (Alinia AI)

Presentation: “Alinia AI: Translating AI Safety and Security Research into Industry Practice”.

About: Pau Rué (LinkedIn) is an experienced leader in data and Artificial Intelligence. He has successfully developed practical Machine Learning and AI solutions across various industries. Most recently, Pau served as VP of Artificial Intelligence at a HealthTech startup and as Director of Machine Learning at a SaaS company. In these roles, he observed the rise of generative AI, its potential, and its risks firsthand. Currently, Pau leads AI Research and Development at Alinia AI. Alinia helps businesses manage their AI systems according to safety and security needs and guidelines. Alinia AI's approach is driven by state-of-the-art research on AI safety and security.

Jordi Luque Serrano (Senior Research Scientist at Telefonica Research)

Presentation abstract: Large Language Models (LLMs), once envisioned as simple chatbots, have the potential to reshape the mobile landscape, becoming a new computing paradigm. Driven by advancements in multimodal AI and agent capabilities, LLMs can increasingly respond to multiple inputs, make decisions, and take action. This opens up possibilities for LLMs to become the foundation of new AI-powered operating systems for future generations of mobile devices. This shift in mobile interactions could see us fully interfacing with our smartphones through natural language rather than traditional apps, effectively transforming them into powerful language-controlled assistants. However, despite their growing sophistication, current "simple" chatbots still face challenges in achieving proper human-like understanding and context awareness in security and privacy or, even more alarming, in ethical alignment. For example, companies' current usage policies restrict the use of LLMs for various activities, including unethical ones such as committing crimes and generating harmful content, as well as offering legal, financial or medical advice. These policies aim to ensure responsible and ethical use of these models, exemplifying how companies would like them to behave. Unfortunately, guidelines and current alignment techniques do not guarantee how the models behave. Are we ready for a future where AI is truly integrated into our lives and controls our devices?

About: Jordi Luque (LinkedIn) holds the esteemed position of Senior Research Scientist at Telefonica Innovación Digital, within the Discovery’s Research team. With a wealth of expertise spanning over 15 years, he has dedicated his career to the fields of signal processing, time series analysis and machine learning techniques, particularly in the domains of speech, language, biometrics and complex networks. Furthermore, he fulfils the pivotal role of Principal Consortium Coordinator for the EU project ELOQUENCE, an ambitious initiative to pioneer innovative solutions for high-stakes applications of large language models.
Jordi also holds a position as an Assistant Professor at the Universitat Politècnica de Catalunya (UPC), where he teaches Machine Learning courses in the Artificial Intelligence degree programme and the Interaction and Designing of Interfaces course in the Informatics degree. His commitment to the academic and research community extends to his active involvement as an organizer and committee member of several national and international conferences.
Jordi Luque is a co-author of numerous patents and approximately one hundred scientific publications, and is the winner of several international and national technology evaluations. He is passionate about advancing state-of-the-art speech and language technologies while consistently seeking to create impactful, real-world applications that address social challenges.

José Manuel Gómez-Pérez (Director of Language Technology Research at Expert.AI)

Presentation abstract: In the rapidly advancing world of AI, Large Language Models (LLMs) offer transformative potential, but their widespread adoption hinges on overcoming key challenges such as factuality that become particularly acute in scenarios of domain adaptation and limited multilingual support. For sectors ranging from healthcare to finance, ensuring that AI systems produce accurate, reliable information across multiple languages and specialized domains is critical. This talk explores innovative techniques, including those being developed in the INESData project, for evaluating and improving LLMs, with a focus on reducing errors and ensuring factual accuracy in low-resourced, multilingual scenarios. These advancements promise to unlock significant opportunities for diverse communities and industries across Europe, driving more efficient operations, enhancing decision-making, and building trust in AI-powered solutions.

About: José Manuel Gómez-Pérez is the Director of Language Technology Research at expert.ai, specializing in Natural Language Processing, Machine Reasoning, and large Knowledge Bases. With over 20 years of experience and a PhD in Computer Science and Artificial Intelligence, he has made significant contributions to the field. His work, highlighted by more than 100 peer-reviewed publications, has secured multimillion-dollar contracts in both the private and public sectors, including collaborations with multinational companies in pharma, insurance, and media, as well as prestigious institutions such as the European Space Agency. He is also the author of several influential books and has received numerous awards for his research, including scholarships and best paper recognition. In addition, José Manuel has taught at several leading international universities, and his insights have been featured in publications like Nature, Scientific American, and El País. Recently, he has contributed to shaping the European Strategic Agenda for Digital Language Equality and is part of several initiatives of the European Commission aimed at addressing the challenges posed by AI and large language models.

To learn more about upcoming similar initiatives, please watch the Industry and GLAM section on our website.