Shot in Tamarindo, Costa Rica

Sicong Jiang

PhD @ McGill | Research Intern @ Google DeepMind

I am a final-year PhD candidate at McGill University and currently a Research Intern on the Google DeepMind Autonomous Agent Team, led by Edward Grefenstette, where I focus on self-improving agents. Previously, I led research at Abaka AI as a Founding Scientist, where I architected and scaled data pipelines that produced mission-critical datasets for several frontier AI labs.

My research addresses the fundamental challenge of building reliable AI agents through the lens of automated evaluation and reward modeling. I develop structured benchmarks and reward models—such as EditReward (ICLR 2026) and AgentThink (EMNLP 2025)—to enhance long-horizon reasoning and robustness. My goal is to design high-fidelity feedback loops that enable agents to achieve robust, scalable self-improvement.

📢 Actively looking for full-time Research Scientist / MLE roles starting in Fall 2026. Happy to connect!

Scholar • LinkedIn • GitHub • X • Email • Resume

Jun 2026

🎉 ChartNet featured in MIT News! Grateful for the fantastic collaboration with the MIT-IBM Lab.

Mar 2026

🚀 Excited to join Google DeepMind as a Research Intern in London, UK.

Feb 2026

🎉 Two papers accepted by CVPR 2026 (one Main + one Findings). Check ChartNet, EgoTL.

Feb 2026

🎉 Two papers accepted by ICRA 2026. Check FASIONAD+, MTRDrive.

Jan 2026

🎉 One paper accepted by ICLR 2026. Check EditReward.

Nov 2025

🎉 One paper accepted (oral) by the Bridge Program of AAAI 2026.

Aug 2025

🎉 One paper accepted by EMNLP 2025. Check AgentThink.

Aug 2025

🤝 Joined 2077AI-Foundation—thrilled to contribute to the AI open-source community!

Jul 2025

🚀 Joined Abaka AI as a Founding Technical Member in Palo Alto, California.

Jul 2025

🎉 One paper accepted by ICCV 2025 Foundation Models for AD Workshop. Check VLA4AD Survey.

Mar 2025

✉️ Invited to contribute to Humanity's Last Exam, an AGI reasoning benchmark.

Feb 2025

🎉 One paper accepted by ICLR 2025 Trustworthy LLM Workshop. Check SparseAttack-LLM4TS.

Jan 2025

🎉 One paper accepted by AISTATS 2025. Check Attack-LLM4TS.

* indicates equal contribution. For the full list, visit Google Scholar.

AI Agents, Benchmarks & Evaluation

EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing

K. Wu*, S. Jiang*, M. Ku, P. Nie, M. Liu, W. Chen
ICLR 2026
Website • Paper • GitHub ⭐ 138

AgentThink: Tool-Augmented Reasoning in VLMs for Autonomous Driving

K. Qian*, S. Jiang*, Y. Zhong*, Z. Luo, Z. Huang, et al.
EMNLP 2025
Website • Paper • GitHub ⭐ 142

VeriWeb: Verifiable Long-Chain Web Benchmark for Agentic Information-Seeking

2077AI Team
Under review, 2025
Website • Paper • GitHub ⭐ 86 • HuggingFace Daily Paper #2

EgoTL: Egocentric Think-Aloud Chains for Long-Horizon Tasks

L. Liu, D. Li, Y. Liang, S. Jiang, H. Vijay, H. Hu, et al.
CVPR 2026 Findings
Website • Paper

Foundation Models: Robustness, Safety & Applications

A Survey on Vision–Language–Action Models for Autonomous Driving

S. Jiang*, Z. Huang*, K. Qian*, Z. Luo, T. Zhu, et al.
ICCV Workshop, 2025
Paper • GitHub ⭐ 579 • Tech Channel Report

Adversarial Vulnerabilities in Large Language Models for Time Series Forecasting

F. Liu*, S. Jiang*, L. Miranda-Moreno, S. Choi, L. Sun
AISTATS 2025
Paper • GitHub ⭐ 15

FASIONAD+: Enhanced Safety in Autonomous Driving with Adaptive Feedback

Z. Luo*, S. Jiang*, K. Qian*, Z. Huang, J. Miao, et al.
ICRA 2026
Paper

MTRDrive: Memory-Tool Synergistic Reasoning for Robust Autonomous Driving in Corner Cases

Z. Luo*, K. Qian*, J. Wang, Y. Luo, J. Miao, Z. Fu, Y. Wang, S. Jiang, Z. Huang, et al.
ICRA 2026
Paper

Communication-Aware Reinforcement Learning for Cooperative Adaptive Cruise Control

S. Jiang, S. Choi, L. Sun
TRB Annual Meeting (Oral), 2024
Paper

Research Intern

Google DeepMind

Mar 2026 – Present · London, UK

Mentors: Jingling Li, Lucia Lopez Rivilla, and Edward Grefenstette

Self-evolving Agents: Developing self-contained training and inference frameworks where agents iteratively refine their reasoning paths and behavior capabilities through self-generated feedback loops.

Agentic Post Training: Designing execution-grounded reward mechanisms and online reinforcement learning pipelines to internalize verification trajectories, mitigating hallucination and enhancing robust zero-shot self-correction.

Director of Research

Abaka AI

Aug 2025 – Feb 2026 · Palo Alto, CA, United States

Research: As a founding member of the Research team, I led benchmarking and evaluation for agentic and multimodal LLMs, headed the EditReward (ICLR'26) project, and co-developed large-scale benchmarks including SuperGPQA (NeurIPS'25), ChartNet (CVPR'26), EgoTL (CVPR'26), and VeriWeb.

Advanced Dataset & Pipeline Design: Led several zero-to-one pipeline builds, architecting and deploying high-difficulty dataset solutions and production pipelines across coding, IMO-level math, multimodal data, agentic trajectories, and RL environments. Several frontier AI labs use these datasets and pipelines directly for model training and evaluation.

Core Contributor

2077AI Open Source Foundation

Aug 2025 – Mar 2026

As a core contributor, conducted research across benchmarks, datasets, and agent evaluation for the open-source community.

Agent Evaluation: Led research on agent evaluation and training datasets, focusing on long-horizon reasoning, tool use, and self-evolving agent capabilities.

Multimodal Image Datasets: Led multimodal dataset research for image generation, including preference data and evaluation frameworks for alignment and controllability.

Applied Scientist Intern

Mercor

May 2025 – Aug 2025

Multimodal Data Pipelines: Built data pipelines and multi-stage QA systems for multimodal LLM projects, overseeing large-scale annotation workflows and label consistency.

Dataset Quality & Validation: Conducted analysis and validation to refine annotations and ensure robust datasets for LLM post-training.

Research Assistant

McGill University

Jan 2022 – May 2025 · Montreal, QC, Canada

AgentThink (Agent Reasoning): Led a collaboration with Xiaomi and Tsinghua on tool-augmented reasoning for vision-language models in autonomous driving, achieving +54% answer accuracy on open-source models.

Adversarial LLM4TS: Developed a black-box attack framework and public benchmarks for LLM-based time-series forecasting, in collaboration with the Amazon Chronos and Nixtla teams.

Research Assistant

Georgia Institute of Technology

Aug 2019 – Dec 2020 · Atlanta, GA, United States

Multi-Agent RL Exploration: Developed a multi-agent search strategy combining MADDPG with frontier-based exploration, and built evaluation benchmarks for exploration efficiency.

Awards

2024

McGill Engineering Doctoral Award (MEDA)

2021

TISED Doctoral Recruitment Award (DRA), McGill University

2019

Outstanding Graduate of Liaoning Province; Most Influential Graduate, Northeastern University

2017

National 1st Prize, China Undergraduate Mathematical Contest in Modeling

2017

1st Class Academic Scholarship, Northeastern University

Academic Service

Workshop Organizer

Conference Reviewer

Conference on Neural Information Processing Systems (NeurIPS)
International Conference on Learning Representations (ICLR)
International Conference on Artificial Intelligence and Statistics (AISTATS)
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
International Conference on Computer Vision (ICCV)
Conference on Language Modeling (COLM)
Conference on Empirical Methods in Natural Language Processing (EMNLP)
AAAI Conference on Artificial Intelligence (AAAI)
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
IEEE International Conference on Robotics and Automation (ICRA)
IEEE Intelligent Transportation Systems Conference (ITSC)

Journal Reviewer

IEEE Robotics and Automation Letters (RA-L)
Transportation Research Part C: Emerging Technologies (TRC)
IEEE Transactions on Intelligent Transportation Systems (T-ITS)

I enjoy music by Tyler, the Creator, SZA and Chappell Roan.

Sometimes I also listen to Taylor Swift, Olivia Rodrigo and 9m88.

My favorite influencer is Allywoo on RedNote.

Cat: Bobo, a golden-shaded British Shorthair who is good at programming with buttons.