About
I completed my Ph.D. in Computer Science (June 2020) at the School of Computing & Information Systems (SCIS), Singapore Management University (SMU) (ranked 81st overall and 4th in Software Engineering research on CSRankings), under the supervision of Prof. Lingxiao Jiang. I have published papers at top-tier academic conferences across several areas of Computer Science, including Software Engineering (ICSE, ESEC/FSE, ASE), Artificial Intelligence (AAAI), Natural Language Processing (EMNLP, ACL), and Information Retrieval (SIGIR).
I’m also an active open-source contributor, with most of my work available on my GitHub. Notable projects include CodeTF (~1500 stars), CodeT5+ (~2400 stars), CodeCapybara, and The Vault.
Throughout my research career, I’ve had the honor of working with brilliant researchers from the SOAR Group at SMU, FSoft AI Center, Huawei Ireland Research Center, and Salesforce AI Research.
(Past) affiliations:
- Head of AI, FPT Software AI Center, Viet Nam
- Senior Research Scientist, Salesforce AI Research, worked with Prof. Steven Hoi.
- Principal Research Scientist, Huawei Ireland Research Center, worked with Prof. Yijun Yu.
Research Interests
I am deeply passionate about the future of AI for Software Development (AI4Code, AI4Software), because I believe software engineering is a crucial skill in our evolving economy and a key step toward human-level artificial intelligence. Software runs on nearly every modern device, and my goal is to build tools and conduct research that help developers with real-world software engineering tasks. My work involves developing algorithms to train and fine-tune Large Language Models for code (CodeLLMs). I also explore how to combine CodeLLMs with multi-agent systems and traditional program analysis techniques. The aim is to create coding assistants that fit seamlessly into the software development lifecycle and improve the developer experience. In summary, my research is structured around four pillars:
- Foundation: Developing large language models tailored for coding (CodeLLMs) to set the groundwork for further enhancements.
- Optimization: Refining CodeLLMs to address challenges like hallucinations and security issues, enhancing trustworthiness, and establishing benchmarking standards.
- Application: Applying and adapting CodeLLMs to software engineering tasks such as code generation, code search, code summarization, program synthesis, automated bug detection and program repair, code migration, and software testing.
- Integration: Seamlessly integrating these models into the software development life cycle to foster effective collaboration between human developers and AI-driven tools, including IDE extensions and low-code/no-code platforms.
Highlighted Publications
- CodeT5+: Open Code Large Language Models for Code Understanding and Generation, EMNLP 2023 [on Marktechpost].
- CodeTF: One-Stop Transformer-based Library for CodeLLMs [on YCombinator, Syncedreview, Marktechpost].
- TreeCaps: Tree-based Capsule Networks for Source Code Processing, AAAI 2021.
- InferCode: Self-Supervised Learning of Code Representations by Predicting Subtrees, ICSE 2021.
- Self-Supervised Learning for Code Retrieval and Summarization through Semantic-Preserving Program Transformations, SIGIR 2021.