News

2025

Paper accepted at CVPR 2025: "DUNE: Distilling a Universal Encoder from Heterogeneous 2D and 3D Teachers".
[paper, project page, code, poster]

Paper accepted at CVPR 2025: "LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation".
[paper, code, HF demo, poster]

Serving as an Area Chair for ICCV 2025.

2024

Paper accepted at ECCV 2024: "UNIC: Universal Classification Models via Multi-teacher Distillation". [paper, code]

Keynote speaker at Netherlands Conference on Computer Vision (NCCV 2024). Den Bosch, The Netherlands, 29th of May 2024.

Invited talk at the Synthetic Data for Computer Vision Workshop at CVPR 2024.

Paper accepted at CVPR 2024: "Label Propagation for Zero-shot Classification with Vision-Language Models". [paper, code]

Co-organizing the first ever African Computer Vision Summer School (ACVSS). The inaugural occurrence will be held in Kenya, in July 2024. Fully funded for African students, open to all.

Invited talk at the University of Bristol on "Improving Generalization with Generative AI and Test-time training". Part of the MaVi Seminar Series of the Machine Learning and Computer Vision Group.

Paper accepted at ICLR 2024: "Weatherproofing Retrieval for Localization with Generative AI and Geometric Consistency" [paper, project page]
Serving as an Area Chair for ECCV 2024.

2023

Paper accepted at NeurIPS 2023: "Test-time Training for Matching-based Video Object Segmentation" [paper, project page]
Serving as an Action Editor for the Transactions on Machine Learning Research (TMLR) journal.

Serving as an Area Chair for NeurIPS 2023 and CVPR 2024.

Paper accepted at CVPR 2023: "Fake it till you make it: Learning transferable representations from synthetic ImageNet clones" [paper, project page]

Paper accepted at ICLR 2023 (spotlight): "No Reason for No Supervision: Improved Generalization in Supervised Models" [paper, project page & pretrained models, code]

Paper accepted at SCIA 2023 and the L3D-IVU-2023 Workshop at CVPR 2023: "Rethinking matching-based few-shot action recognition" [paper, project page & pretrained models, code]

2022

Find me on Mastodon at @skamalas@sigmoid.social

Invited talk on "Granularity-aware Adaptation for Image Retrieval over Multiple Tasks". Instance-Level Recognition Workshop (ECCV 2022).

Invited talk on "Improving generalization for classification and retrieval tasks". Visual Recognition Group at CTU in Prague.

Paper accepted at TPAMI: "PoseBERT: A Generic Transformer Module for Temporal 3D Human Modeling" [paper, code]

Paper accepted at ECCV 2022: "Granularity-aware Adaptation for Image Retrieval over Multiple Tasks" [paper]

Paper accepted at TMLR: "TLDR: Twin Learning for Dimensionality Reduction" [paper, code]

Amanda and I curated an illustrated version of "The summer day" by Mary Oliver, with watercolors created by the DALLE-Mini model

Paper accepted at ICLR 2022: "Learning Super-Features for Image Retrieval" [paper, code, Hugging Face Spaces Demo]

Serving as an Area Chair for CVPR 2023

Co-organizing the Wiki-M3L 2022: Wikipedia and Multi-Modal & Multi-Lingual Research workshop at ICLR 2022.

Invited talks on "Advances in Self-supervised Learning and measuring generalization" [slides]:

January 13th, Inria Rennes (Linkmedia Speaks Science lecture series, virtual)

January 18th, National Technical University of Athens (Research Challenges in Computer Science 2022, virtual) [video]

February 9th, University of Edinburgh (Advanced Vision Graduate Course, virtual)

2021

Our camper van Alppy celebrates 10k kilometers with us in six months 😍
Invited talks in Autumn 2021: Inria Grenoble (THOTH team, virtual), CTU in Prague (VRG group), Inria Paris (WILLOW team), ParisTech (Imagine team), University of Leicester (CSE Computing Seminar, virtual), KAUST (VisionCAIR group, virtual)
Paper accepted at 3DV 2021: "Leveraging MoCap Data for Human Mesh Recovery" [paper, blogpost]
Gave a two-part guest lecture on image representations at the Hanoi University of Science & Technology (HUST) (virtual) [slides]
Paper accepted at ICCV 2021: "Concept Generalization in Visual Representation Learning" [paper, code, project page]
Co-organizing the PAISS 2021 summer school
Co-organizing the ActivityNet Entities Object Localization Challenge at CVPR 2021
Paper accepted at CVPR 2021: "Probabilistic Embeddings for Cross-Modal Retrieval" [paper, code]
Invited talk at the Robotics Institute at Carnegie Mellon University on Self-supervised Learning and generalization (VASC Seminar Series, virtual)

2020

Serving as an Area Chair for ICCV 2021 and CVPR 2021
Paper accepted at NeurIPS 2020: Hard Negative Mixing for Contrastive Learning [paper, project page, blog post, slides, 3-min video]
Call for an open PhD position with Prof. Giorgos Tolias at CTU in Prague. [update: position has now been filled]
Paper accepted at ECCV 2020: Learning to Generate Grounded Image Captions without Localization Supervision [paper, code]
New chapter: I joined NAVER LABS Europe in Grenoble, France as a research scientist
Co-organizing the CV for Agriculture (CV4A) workshop at ICLR 2020

2019

Paper accepted at ICCV 2019: Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution [paper, code]
Tutorial and invited talk at Data Science Africa 2019 in Accra, Ghana:
- Tutorial on Image representations and fine-grained recognition [normal resolution slides (~6.2MB), lower resolution slides (~1.6MB)]
- Talk on Learnings from the Computer Vision for Global Challenges (CV4GC) initiative [slides]
Four papers accepted at CVPR 2019:
- Grounded Video Description (oral) [paper, ActivityNet-Entities dataset, code]
- Graph-Based Global Reasoning Networks [paper, code]
- DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition [paper, code]
- Less is More: Learning Highlight Detection from Video Duration [paper]
Co-organizing the Computer Vision for Global Challenges (CV4GC) workshop at CVPR 2019
Paper accepted at PAMI: Focal Visual-Text Attention for Memex Question Answering [paper, MemexQA Dataset]
Paper accepted at AAAI 2019: Large-Scale Visual Relationship Understanding [paper, code]

2018 and older

Paper accepted at NeurIPS 2018: A^2-Nets: Double Attention Networks [paper, code]
Paper accepted at ECCV 2018: Multi-Fiber Networks for Video Recognition [paper, code]
Interactive visual art installation Do Silhouettes Dream? on display from July 26th till August 2nd 2017 at the ArtScience Museum in Singapore [short paper, code, short interview]
New chapter: In February 2017 I joined Facebook Research in Menlo Park, California as a research scientist
Our work on Similarity Search at Flickr just launched [News coverage: The Verge, Engadget, Petapixel, Digital Trends and Venture Beat]
Best paper award at the LSCVS 2016 workshop at NeurIPS 2016 for Tag Prediction in Flickr: A view from the darkroom [paper]
Paper accepted at CHI 2017: Multimodal Classification of Moderated Online Pro-Eating Disorder Content [paper]

A very short personal bio

Grew up, lived and studied in Athens until 2015, with short breaks like a semester in Lund, a summer in Barcelona and a few summers in California. Moved to California in January 2015. Lived in Hayes Valley, San Francisco from 2015 to late 2017, and then at the wonderful "Roots and Branches" co-op in Oakland till late 2019. Returned to Europe the long way, via the Amazon jungle, Ghana, Costa Rica and many other places. Lived in Grenoble, under the Alps, from mid 2020 till 2023, mostly during the pandemic years. Enjoyed the mountains but missed the sun and sea. Moved to Barcelona in 2023. Deeply grateful to be sharing a home with Amanda.
Passionate about research, photography, filmmaking, art, music, yoga, surfing and travelling.

A short research bio

I am currently working as a Principal Scientist at NAVER LABS Europe. I got my PhD in 2014 and I have been working at top-tier industrial research labs around the world since then. I have published scientific papers at all the top-tier venues of computer vision and AI (see the list of selected publications below, or visit my Google Scholar profile for a complete list and citation counts). A summary of my work experience follows:

- 2020 - now: Research scientist at NAVER LABS Europe. Research on universal models for robotic perception, self-supervised representation learning, generalization and transfer learning, adaptation of large models with limited data/resources, multi-modal learning and some video understanding.

- 2017 - 2019: Research scientist at Facebook Research in Menlo Park. Research mostly focusing on video understanding, deep learning architecture modeling and vision and language.

- 2015 - 2017: Research scientist at Yahoo Research in San Francisco. Research powering the visual search feature on Flickr. Also collaborated with Stanford on the Visual Genome project, creating a dataset for modeling vision and language.

- 2009 - 2014: PhD at the National Technical University of Athens on large-scale geometry indexing, nearest neighbor search and clustering. Supervised by Prof. Stefanos Kollias and Yannis Avrithis, working closely with my research brother Giorgos Tolias.

Socially impactful research: I am also very passionate about urging my research community to tackle more socially impactful problems. I co-lead the Computer Vision for Global Challenges initiative, and have organized the Computer Vision for Global Challenges workshop at CVPR 2019, and the Computer Vision for Agriculture (CV4A) workshop at ICLR 2020. I also co-led the organization of the Wiki-M3L 2022: Wikipedia and Multi-Modal & Multi-Lingual Research workshop at ICLR 2022. In 2024 I am helping co-organize the African Computer Vision Summer School (ACVSS) series. The inaugural occurrence was in Kenya, in July 2024.

Mentoring: I love to mentor students/younger researchers, especially from underrepresented groups in our field. Have been a mentor for the Deep Learning Indaba Mentorship Programme since 2021. Feel free to reach out to me directly if you are from an underrepresented group and think I can help you navigate your studies, research or next steps in AI. Happy to help.

Contact Details

email: ykalant(at)image.ntua.gr, yannis.kalantidis(at)naverlabs.com

Selected Publications

Full list at my Google Scholar profile.

2025

M.B. Sarıyıldız, P. Weinzaepfel, T. Lucas, P. de Jorge, D. Larlus, Y. Kalantidis
DUNE: Distilling a Universal Encoder from Heterogeneous 2D and 3D Teachers
CVPR 2025 [Project page] [code]

V. Stojnić, Y. Kalantidis, J. Matas, G. Tolias
LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation
CVPR 2025 [code] [HF demo]

2024

M.B. Sarıyıldız, P. Weinzaepfel, T. Lucas, D. Larlus, Y. Kalantidis
UNIC: Universal Classification Models via Multi-teacher Distillation
ECCV 2024 [Code]

V. Stojnić, Y. Kalantidis, G. Tolias
Label Propagation for Zero-shot Classification with Vision-Language Models
CVPR 2024 [Code]

Y. Kalantidis*, M.B. Sarıyıldız*, R.S. Rezende, P. Weinzaepfel, D. Larlus, G. Csurka
Weatherproofing Retrieval for Localization with Generative AI and Geometric Consistency
ICLR 2024 [Project page]

2023

J. Bertrand*, G. Kordopatis-Zilos*, Y. Kalantidis, G. Tolias
Test-time Training for Matching-based Video Object Segmentation
NeurIPS 2023 [project page]

M.B. Sariyildiz, K. Alahari, D. Larlus, Y. Kalantidis
Fake it till you make it: Learning transferable representations from synthetic ImageNet clones
CVPR 2023 [Project page]

M.B. Sariyildiz, Y. Kalantidis, K. Alahari, D. Larlus
No Reason for No Supervision: Improved Generalization in Supervised Models
ICLR 2023 (spotlight) [Project page] [code]

J. Bertrand, Y. Kalantidis, G. Tolias
Rethinking matching-based few-shot action recognition
SCIA 2023 (oral) & L3D-IVU @ CVPR 2023 [Project page] [code]

2022

F. Baradel, T. Groueix, P. Weinzaepfel, R. Brégier, Y. Kalantidis, G. Rogez
PoseBERT: A Generic Transformer Module for Temporal 3D Human Modeling
TPAMI 2022 [code]

J. Almazan, B. Ko, G. Gu, D. Larlus, Y. Kalantidis
Granularity-aware Adaptation for Image Retrieval over Multiple Tasks
ECCV 2022 [Project page]

Y. Kalantidis, C. Lasance, J. Almazan, D. Larlus
TLDR: Twin Learning for Dimensionality Reduction
TMLR 2022 [code]

P. Weinzaepfel, T. Lucas, D. Larlus, Y. Kalantidis
Learning Super-Features for Image Retrieval
ICLR 2022 [code]

2021

F. Baradel, T. Groueix, P. Weinzaepfel, R. Brégier, Y. Kalantidis, G. Rogez
Leveraging MoCap Data for Human Mesh Recovery
3DV 2021 [Blog post]

M.B. Sariyildiz, Y. Kalantidis, D. Larlus, K. Alahari
Concept Generalization in Visual Representation Learning
ICCV 2021 [Project page] [code]

S. Chun, S.J. Oh, R.S. de Rezende, Y. Kalantidis, D. Larlus
Probabilistic Embeddings for Cross-Modal Retrieval
CVPR 2021 [code]

2020

Y. Kalantidis, M.B. Sariyildiz, N. Pion, P. Weinzaepfel, D. Larlus
Hard Negative Mixing for Contrastive Learning
NeurIPS 2020 [Project page] [Blog post] [slides]

B. Kang, S. Xie, M. Rohrbach, Z. Yan, A. Gordo, J. Feng, Y. Kalantidis
Decoupling Representation and Classifier for Long-Tailed Recognition
ICLR 2020

C.Y. Ma, Y. Kalantidis, G. AlRegib, P. Vajda, M. Rohrbach, Z. Kira
Learning to Generate Grounded Image Captions without Localization Supervision
ECCV 2020 [Project Page] [code]

2019

Y. Chen, H. Fan, B. Xu, Z. Yan, Y. Kalantidis, M. Rohrbach, Y. Shuicheng, J. Feng
Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution
ICCV 2019
[Numerous Opensource Implementations]
Y. Chen, M. Rohrbach, Z. Yan, Y. Shuicheng, J. Feng, Y. Kalantidis
Graph-Based Global Reasoning Networks
CVPR 2019
L. Zhou, Y. Kalantidis, X. Chen, J. Corso, M. Rohrbach
Grounded Video Description
CVPR 2019 (oral)
[ActivityNet-Entities dataset and code]
B. Xiong, Y. Kalantidis, D. Ghadiyaram, K. Grauman
Less is More: Learning Highlight Detection from Video Duration
CVPR 2019
Z. Shou, Z. Yan, Y. Kalantidis, L. Sevilla-Lara, M. Rohrbach, X. Lin, S.-F. Chang
DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition
CVPR 2019
L. Jiang, L. Cao, Y. Kalantidis, S. Farfade and A. Hauptmann
Focal Visual-Text Attention for Memex Question Answering
TPAMI 2019
J. Zhang, Y. Kalantidis, M. Rohrbach, M. Paluri, A. Elgammal, M. Elhoseiny
Large-Scale Visual Relationship Understanding
AAAI 2019
[code]

2018

Y. Chen, Y. Kalantidis, J. Li, Y. Shuicheng, J. Feng
A^2-Nets: Double Attention Networks
NeurIPS 2018
Y. Chen, Y. Kalantidis, J. Li, Y. Shuicheng, J. Feng
Multi-Fiber Networks for Video Recognition
ECCV 2018
[code]

2017

R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y. Kalantidis, L.-J. Li, D.A. Shamma, M. Bernstein and L. Fei-Fei
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
IJCV 2017
L. Jiang, Y. Kalantidis, L. Cao, S. Farfade, J. Tang and A. Hauptmann
Delving Deep into Personal Photo and Video Search
WSDM 2017
S. Chancellor, Y. Kalantidis, J.A. Pater, M. De Choudhury and D.A. Shamma
Multimodal Classification of Moderated Online Pro-Eating Disorder Content
CHI 2017

2016

P. Garrigues, S. Farfade, H. Izadinia, Kofi Boakye and Y. Kalantidis
Tag Prediction in Flickr: A view from the darkroom
Large Scale Computer Vision systems Workshop at NeurIPS 2016 (Best paper award)
Y. Kalantidis, C. Mellina and S. Osindero
Cross-dimensional Weighting for Aggregated Deep Convolutional Features
Web-scale Vision and Social Media (VSM) Workshop, ECCV 2016
[code]
Y. Kalantidis, L. Kennedy, H. Nguyen, C. Mellina and D.A. Shamma
LOH and behold: Web-scale visual search, recommendation and clustering using Locally Optimized Hashing
Web-scale Vision and Social Media (VSM) Workshop, ECCV 2016
Y. Kalantidis, A. Farahat, L. Kennedy, R. Baeza-Yates and D.A. Shamma
Visual Congruent Ads for Image Search
ICPR 2016

2011 - 2015

Y. Avrithis, Y. Kalantidis, E. Anagnostopoulos and I. Z. Emiris
Web-scale image clustering revisited
ICCV 2015 (oral)
Y. Kalantidis and Y. Avrithis
Locally Optimized Product Quantization for Approximate Nearest Neighbor Search
CVPR 2014
[code]
G. Tolias, Y. Kalantidis, and Y. Avrithis
Towards large-scale geometry indexing by feature selection
CVIU 2014
Y. Kalantidis, L. Kennedy and L.-J. Li
Getting the Look: Clothing Recognition and Segmentation for Automatic Clothing Suggestions in Everyday Photos
ICMR 2013
Y. Avrithis and Y. Kalantidis
Approximate Gaussian mixtures for large scale vocabularies
ECCV 2012
G. Tolias, Y. Kalantidis, and Y. Avrithis
Symcity: Feature selection by symmetry for large scale image retrieval
ACM Multimedia 2012 (oral)
Y. Avrithis, Y. Kalantidis, G. Tolias, and E. Spyrou
Retrieving landmark and non-landmark images from community photo collections
ACM Multimedia 2010 (oral)
Y. Avrithis, G. Tolias, and Y. Kalantidis
Feature map hashing: Sub-linear indexing of appearance and global geometry
ACM Multimedia 2010 (oral)
Y. Kalantidis, LG. Pueyo, M. Trevisiol, R. van Zwol, and Y. Avrithis
Scalable triangulation-based logo recognition
ICMR 2011
Y. Kalantidis, G. Tolias, Y. Avrithis, M. Phinikettos, E. Spyrou, P. Mylonas, and S. Kollias
Viral: Visual image retrieval and localization
Multimedia Tools and Applications (MTAP) 2011

Photography

Primarily shoot 35mm film and upload selected works on Flickr and recently also on Instagram.
Note: All rights for the photos below are reserved. Feel free to contact me for non-commercial, research or artistic uses; happy to provide them for such use cases.

Burning Man car souvlaki

Cascata Umana in Toscany

Half Dome Cables

Roadtripping on the Pacific Coast Highway, California

Ignore the boat

As tide goes by

Strolling through a rainy day

Pretty Lighthouse, Pretty Flame - Burning Man 2016

What is hip?

Stepping Stone

Double exposure, Bolinas, CA

(Street) music pays

Grandma turns 100

Freestyle acro

No parking any time. Yes jazzing any time

Paperback

Family on hammock

Archer

Favela superstar

One Way

Film

Have shot two films, an award-winning short and a feature-length documentary screened at the Athens International Film Festival (Νύχτες Πρεμιέρας) in 2015. Both are available online.

Love is Blind (2008)

Short film (8', black and white)
Official Selection:
- 11th International Panorama of Independent Film and Video Creators (Patra, Greece 2009)
- Athens Fantastic Film Festival 2009 (Ilioupoli, Athens, Greece 2009)
Awards:
- Audience Choice Award (Athens Fantastic Film Festival 2009, Ilioupoli, Greece)
- Online Fantastic Film Audience Choice Award (2010, www.bigbang.gr)
Blind Crossing (2015)

Documentary Feature Film (57', color)

Visit the IMDB page.
Official Selection:
- Athens International Film Festival (Athens, Greece, 2015)
- Chalkida DocFest (Chalkida, Greece, 2015)

Interactive art

Do Silhouettes Dream?

...is an interactive art installation.

It will be on display from July 26th till August 2nd 2017 at the ArtScience Museum in Singapore.

"Do Silhouettes Dream?" aims to place fascinating results of cutting-edge artificial intelligence in a public embodied experience. As participants stand in front of a projection screen, their silhouettes create an area of hallucination that changes with their movement. Hallucinations may vary; some are "dreams", transporting the area covered by the silhouette to past or future times. Others escape reality and either alter the visual artistic style or form "deep dreams", i.e. hallucinations created through a feedback loop on artificial deep neural networks.

More information can be found in this short paper, while code to reproduce the installation is available on Github.

The artwork was accepted in the Art Track of the Creativity and Cognition 2017 conference as part of the Microbites of Innovation showcase. Co-created with Clayton Mellina and Cheng Xu.

You may watch a short interview with Cheng and me here.

Some photos and videos here!

(*) Figures above feature a Creative Commons image courtesy of Mike Cartmell on Flickr and a frame from Singapore City Day-Night Time lapse, used with permission.

Music

Keyboard/piano player, lover of improvisational music (aka lazy to study). Have played with many bands [citations needed] spanning different genres, e.g. Greek rock with Amorfi Plektani, pop-rock with λaternative, jazz with the Silly Walks and improvisational folk-rock with Dimitris K. Karras.

Disturbing Music: In 2014, curated a web-radio series dedicated to experimental and improvisational music. All playlists can be found in the Disturbing Music Blog.

2023 Update: Played with the House Band at the CVPR 2023 reception. It was a lot of fun and definitely one of the largest gigs ever. So I have technically gigged in New Orleans now, I guess :) - Here is a tweet with some basic proof.

Traveling

Lover of road trips, occasional car/van dweller (see also Alppy the Van below). Below is a map of countries visited with red dots denoting current and past homes.

Surfing

Hopelessly in love with surfing, despite being a (very-very-slowly-progressing) beginner. The map below shows some of the surf spots surfed over the years.

Alppy the Van

In March 2021, and after years of planning, we adopted a 2016 Opel Vivaro that was living quietly under the shadow of the Alps. During the Summer of 2021 and with the help of Dionysis the woodsmith, Alppy the Van (they/them) was transformed from a 9-seater van to a home. An old canine friend, Alppy Junior, is excited to be our loyal companion and up for any adventure!

Alppy News
> Mar 2021: Alppy the Van finds a new household
> Jul 2021: The first version of Alppy as a home is completed 😍
> Aug 2021: First roadtrip with Alppy around the Peloponnese, the West coast of Greece and Tuscany, Italy
> Sept 2021: In less than 6 months, Alppy celebrates 10k kilometers with us 💗! Cheers to many, many more!
