- Title of the talk:
Multimodal Learning with Vision and Language

- Abstract:

Multimodal machine learning aims to develop methods for problems that require modeling and integrating multiple, mostly complementary, modalities. In the past few years, advances in deep learning have led to unified architectures that can efficiently process multimodal data, with many interesting applications in computer vision, image processing, speech processing, and natural language processing.
 
In this tutorial, we will provide a comprehensive review and analysis of multimodal learning methods, focusing specifically on visual and textual data. We will discuss neural network architectures commonly used in computer vision and natural language processing, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), and neural attention mechanisms, along with deep generative models such as variational auto-encoders (VAEs) and generative adversarial networks (GANs). Using these architectures as building blocks, we will then cover some newly emerging tasks that combine vision and language, such as image captioning, visual question answering, visual dialogue, joint video and language alignment, image synthesis from text, and language-guided image manipulation.

 

Aykut Erdem

Associate Professor in the Department of Computer Engineering at Hacettepe University and a co-founder of the Hacettepe University Computer Vision Laboratory (HUCVL)

 

- Short biography:

Aykut Erdem is an Associate Professor in the Department of Computer Engineering at Hacettepe University and a co-founder of the Hacettepe University Computer Vision Laboratory (HUCVL). The broad goal of his research is to explore better ways to understand, interpret, and manipulate visual data. His research interests span a diverse set of topics, ranging from image editing to visual data mining and multimodal learning for vision and language. He received his BSc and MSc degrees from Middle East Technical University in 2001 and 2003, respectively. During his doctoral studies at the same institution, he was a guest researcher at Virginia Tech in the summer of 2004 and a visiting scholar at MIT in the fall of 2007. After completing his doctorate in 2008, he worked as a post-doctoral researcher at Ca' Foscari University of Venice under the EU-FP7 SIMBAD project from 2008 to 2010. For more information, please see his webpage at https://web.cs.hacettepe.edu.tr/~aykut

 

Erkut Erdem

Associate Professor in the Department of Computer Engineering at Hacettepe University

 

- Short biography:

Erkut Erdem is an Associate Professor in the Department of Computer Engineering at Hacettepe University. He received his Master's and Ph.D. degrees in Computer Science from Middle East Technical University in 2003 and 2008, respectively. He pursued his post-doctoral research at Télécom ParisTech, École Nationale Supérieure des Télécommunications, from 2009 to 2010. He is one of the founders of the Hacettepe University Computer Vision Laboratory. His research interests are in computer vision and machine learning, with applications to image editing and integrated vision and language. For more information, visit http://web.cs.hacettepe.edu.tr/~erkut/