Hi, I am a PhD student at École polytechnique fédérale de Lausanne (EPFL), working on machine learning and computer vision under the supervision of Prof. Amir Zamir at VILab. Currently, I'm most interested in uncovering System-2–like thinking abilities in vision and multimodal models; for example, understanding how a model can leverage additional compute at test time to "think more". More broadly, I am curious about how System-1–style generalizable representation learning and System-2–style reasoning and adaptation can be effectively integrated, and how such capabilities relate to and differ from human cognition.
Previously, I completed my master's studies at ShanghaiTech, working with Prof. Xuming He. I studied semantic segmentation under distribution shifts and explored topics such as test-time adaptation, domain generalization, and uncertainty estimation.