Our paper Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution was accepted at ICCV 2019! Work with awesome collaborators from the National University of Singapore.
New technical report on arXiv: C.Y. Ma, Y. Kalantidis, G. AlRegib, P. Vajda, M. Rohrbach, Z. Kira. Learning to Generate Grounded Image Captions without Localization Supervision. arXiv:1906.00283, 2019. [Project Page]
Delighted to announce that four papers were accepted at CVPR 2019!
Our first Workshop on Computer Vision for Global Challenges (CV4GC) was accepted at CVPR this year! Really excited about organizing an initiative to bring the computer vision community closer to socially impactful tasks, datasets and applications for the whole world. Check out the CV4GC website!
Our paper Focal Visual-Text Attention for Memex Question Answering was accepted for publication in the IEEE Transactions on Pattern Analysis and Machine Intelligence (impact factor: 9.455). It introduces our MemexQA Dataset, the first publicly available multimodal question answering dataset consisting of real personal photo albums.
Our paper Large-scale Visual Relationship Detection was accepted at the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI 2019, acceptance rate 16.2%). [Update Feb 2019]: Code is now on GitHub.
Our paper A^2-Nets: Double Attention Networks was accepted at NIPS 2018. See you in Montreal! Work with awesome collaborators from the National University of Singapore.
Our work on visual similarity search over the whole Flickr corpus just launched! Try it yourself by clicking on the magnifying glass icon at the top right corner of any photo page! Story covered in The Verge, Engadget, Petapixel, Digital Trends and Venture Beat.
After two amazing years at Yahoo Research, joined the Computer Vision Group at Facebook Research in Menlo Park.
Our paper "Tag Prediction in Flickr: A view from the darkroom" on large-scale image classification with noisy training data received the best paper award at the 1st Workshop on Large Scale Computer Vision Systems at NIPS 2016.
Our paper "Multimodal Classification of Moderated Online Pro-Eating Disorder Content" was accepted at the ACM CHI 2017 conference (25% acceptance rate).
Will be a guest lecturer at Fei-Fei's and Juan Carlos' CS 131 Computer Vision: Foundations and Applications course at Stanford during the 2016-2017 Fall Semester.
Grew up and lived in Greece until 2015 with brief breaks in Sweden, Spain and the United States. Lived in San Francisco from 2015 till 2017 and currently in Oakland.
The large-scale visual similarity search work of my PhD came to a nice closure when we applied it to a truly web-scale real-time application, powering the visual search feature for Flickr. At the same time, my interests expanded towards modeling of vision and language, and I collaborated with Stanford on the Visual Genome project. During my time at Facebook my interests expanded to video understanding, deep architectures and broader representation learning.
Currently conducting research on representation learning, video understanding, multi-modal classification and large-scale vision and language.
Full list at my Google Scholar profile.