Yash Kant

Hi! I am a Ph.D. student in the Robotics Group and the Department of Computer Science at the University of Toronto, advised by Igor Gilitschenski. Previously, I was a Research Visitor at Georgia Tech, advised by Devi Parikh and Dhruv Batra.

I want to build robots that specialize in day-to-day tasks by learning from simulations! Along these lines, for the past year or so, I have been working on projects at the intersection of Simulation, Embodied AI, and 3DV (stay tuned!).

I finished my undergraduate studies at the Indian Institute of Technology Roorkee. I have interned at Microsoft, Bangalore, and visited the National University of Singapore twice as a research assistant.

If you have any questions, want to collaborate, or would like to discuss research, feel free to send me an email or schedule a chat! I enjoy talking to new people.

I am also looking for a research internship starting in Summer 2022, preferably in the areas of 3DV and Simulation. Feel free to reach out!

Email  /  CV  /  Github  /  Google Scholar  /  Twitter  /  LinkedIn

Clean My House: Rearranging households without explicit instructions
Yash Kant, et al. (Coming Soon!)

For the past several months, I have been building a challenging task and the accompanying simulation pipeline in Habitat to support benchmarking household robots!

In this task, an agent is spawned in a cluttered (untidy) house and must locate misplaced objects and return them to their correct positions (receptacles) without any explicit instructions. To capture the rich diversity of real-world scenarios, we support cluttering 15 household environments with 1800+ everyday 3D object models spread across 118 categories!

Contrast and Classify: Training Robust VQA Models
Yash Kant, Abhinav Moudgil, Dhruv Batra, Devi Parikh, Harsh Agrawal
International Conference on Computer Vision (ICCV), 2021
Self-Supervised Learning Workshop at NeurIPS, 2020
arXiv / project page / code / slides

We propose a training scheme that steers VQA models toward answering paraphrased questions consistently, beating previous baselines by an absolute 5.8% on consistency metrics without any drop in accuracy!

Spatially Aware Multimodal Transformers for TextVQA
Yash Kant, Dhruv Batra, Peter Anderson, Jiasen Lu, Alexander Schwing, Devi Parikh, Harsh Agrawal
European Conference on Computer Vision (ECCV), 2020
VQA Workshop at CVPR, 2020
arXiv / project page / code / short talk / long talk / slides

We built a self-attention module to reason over spatial graphs in images, achieving an absolute performance improvement of more than 4% on two TextVQA benchmarks!

Automated Video Description for Blind and Low Vision Users
Aditya Bodi, Pooyan Fazli, Shasta Ihorn, Yue-Ting Siu, Andrew T Scott, Lothar Narins,
Yash Kant, Abhishek Das, Ilmi Yoon
CHI Extended Abstracts, 2021

We built a system that automatically generates descriptions for videos and answers blind and low-vision users' queries about them!



I borrowed this template from Jon Barron's website.