Publisher

Ioannis Patras

1 month ago

PhD Position in Multimodal (Audio and Vision) Conversational Foundation Models at Queen Mary University of London, United Kingdom

Degree Level

PhD

Field of study

Computer Science

Funding

Full funding available

Deadline

December 31, 2026

Country

United Kingdom

University

Queen Mary University of London

Keywords

Computer Science
Electrical Engineering
Artificial Intelligence
Computer Vision
Reinforcement Learning
Interpretability
Machine Learning

About this position

This funded PhD position at Queen Mary University of London offers an exciting opportunity to advance the field of multimodal conversational foundation models. The research is conducted in collaboration with Tavus Inc, a US-based startup specializing in digital humans, and is embedded within the Multimedia and Vision Research group and the Centre for Multimodal AI. The project aims to design and train components of a next-generation multimodal model capable of perceiving and generating both verbal and non-verbal responses in conversational contexts.

Unlike traditional text-based dialogue systems, this research focuses on capturing the richness of human-to-human interaction, including facial expressions, voice intonation, gestures, emotions, and social signals. The objectives include multimodal perception of human behaviour using supervised, unsupervised, and reinforcement learning methodologies; post-training techniques for controllable multimodal generation aligned with conversational goals and personality; and the development of diffusion-based methods for high-quality, identity-preserving audio-visual output.

The team regularly publishes in top conferences and journals such as CVPR, ICCV, ECCV, NeurIPS, TPAMI, and IJCV, and has access to extensive computational resources, including a server with 64 A100 GPUs and exclusive access to additional servers. The research environment is highly collaborative, with regular interaction between the London-based and international teams of Tavus Inc.

Funding covers full tuition fees (for both Home and International students) and an annual stipend of £22,780 for three years. Applicants should hold or expect to obtain an MSc in Electronic Engineering, Computer Science, or a closely related discipline; a distinction or first-class degree is highly desirable. The application requires a CV, cover letter, research proposal, two references, and a certificate of English language proficiency (if applicable).

Applications are accepted year-round until the position is filled, with a preferred start date of June 1, 2026 (or as soon as possible). For further information, prospective candidates are encouraged to contact Prof Ioannis Patras at [email protected] and review the supervisor’s academic profile at https://ipatras.github.io and https://www.qmul.ac.uk/eecs/people/profiles/patrasioannis.html. Detailed application instructions are available at http://eecs.qmul.ac.uk/phd/how-to-apply/.

This position is ideal for candidates passionate about artificial intelligence, computer vision, and multimodal machine learning, seeking to contribute to cutting-edge research in conversational AI and digital avatars.

Funding details

Full funding including tuition fees and living expenses is available for this position. The scholarship covers all educational costs and provides a monthly stipend.

How to apply

Please submit your application including a cover letter, CV, academic transcripts, and contact information for two references. Applications should be sent via the online portal before the deadline.
