I want to build robots that specialize in doing day-to-day tasks by learning from simulations! Along these lines, for the past year or so, I have been working on projects at the intersection of Simulation, Embodied AI, and 3DV (stay tuned!).
For the past several months, I have been building a challenging task and a simulation pipeline in Habitat to support benchmarking household robots!
In this task, an agent is spawned in a cluttered (untidy) house and must locate misplaced objects and return them to their correct receptacles without any explicit instructions. To capture the rich diversity of real-world scenarios, we support cluttering 15 household environments with 1,800+ everyday 3D object models spread across 118 categories!
We propose a training scheme that steers VQA models toward answering paraphrased questions consistently, beating previous baselines by an absolute 5.8% on consistency metrics without any drop in accuracy!
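(The paper has the exact training scheme; as a rough, illustrative sketch only, one generic way to encourage this kind of consistency is to penalize divergence between the answer distributions the model produces for a question and its paraphrase. The function names and two-way KL formulation below are my own hypothetical choices, not the method from the paper.)

```python
import math

def kl(p, q, eps=1e-12):
    """KL divergence between two discrete answer distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def consistency_loss(ans_original, ans_paraphrase):
    """Symmetric-KL penalty: zero when the model answers a question and
    its paraphrase with identical distributions, larger as they diverge.
    Added to the usual VQA loss during training (illustrative sketch)."""
    return 0.5 * (kl(ans_original, ans_paraphrase) + kl(ans_paraphrase, ans_original))
```

A perfectly consistent model (identical answer distributions for both phrasings) incurs zero penalty, while disagreement between the two predictions is penalized smoothly.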
We built a system to automatically generate descriptions for videos and answer blind and low-vision users' queries about them!
Adding Complement Objective Training to Pythia: I experimented with adding Complement Objective Training to FAIR's vision-and-language framework Pythia. I wrote a report on my findings here, and the code is here.
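For context, Complement Objective Training (Chen et al., ICLR 2019) adds an auxiliary objective that maximizes the entropy of the predicted distribution over the *incorrect* classes, flattening the probability mass the model leaks onto wrong answers. A minimal single-sample sketch of that complement-entropy term (stdlib Python for readability, not the Pythia integration itself):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def complement_entropy_loss(logits, target, eps=1e-12):
    """Single-sample sketch of the Complement Objective Training term.
    Renormalizes probability mass over the incorrect classes, then returns
    the *negative* entropy of that complement distribution, so minimizing
    this loss flattens the model's predictions over wrong answers."""
    probs = softmax(logits)
    denom = 1.0 - probs[target] + eps  # mass on the complement classes
    comp = [p / denom for i, p in enumerate(probs) if i != target]
    entropy = -sum(p * math.log(p + eps) for p in comp)
    return -entropy  # minimized alongside the usual cross-entropy
```

In training, this term is combined with standard cross-entropy (in the original paper, the two objectives are alternated per batch), so the model stays confident on the correct class while staying maximally uncertain among the incorrect ones.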