MMAI 2026
Bio
Hi, I’m Emma, a junior majoring in Computation and Cognition at MIT. I do research on engineering glutamate responsive fMRI probes. Outside of class, I am a varsity sailor and enjoy eating, snowboarding, rock-climbing, and singing.
Homework
Homework 1 — Dataset
Curated and analyzed the BUSI breast ultrasound and BCSS histopathology datasets, exploring preprocessing, annotation pipelines, and evaluation strategies for multimodal oncology AI.
Datasets / Multimodal AnalysisHomework 2 — Multimodal Fusion
Implemented and compared four fusion architectures (Early, Late, Tensor, LMF) combining ResNet-50 image embeddings and mask geometry features on the BUSI dataset; Late Fusion achieved best accuracy at 91.45%.
Multimodal FusionHomework 3 — Vision-Language Models
Fine-tuned Qwen2.5-VL-3B-Instruct on BUSI breast ultrasound images for lesion classification using LoRA, exploring how VLMs perform on specialized medical imaging tasks with limited data.
Vision-Language Models / Fine-tuningHomework 4 — Reinforcement Learning for VLMs (GRPO)
Implemented GRPO advantage computation and rule-based reward functions to train Qwen3-VL-2B-Instruct on BUSI. Best run (lr=5e-5, temp=0.7, LoRA r=32) achieved 48.1% accuracy vs. 21.8% zero-shot baseline, with 100% format compliance.
Reinforcement Learning / GRPOHomework 5 — Multimodal AI Agents
Built a human-in-the-loop clinical decision-support agent for breast ultrasound triage using smolagents and Qwen2.5-VL-3B. Designed a custom HITL benchmark, evaluated against adversarial tasks, and instrumented with Langfuse for trace-level observability.
Agentic AI / HITL SystemsHighlights
HW1 — BUSI dataset class distribution: benign, malignant, and normal
HW1 — Sample BUSI images with segmentation masks across classes
HW2 — PCA of ResNet-50 image embeddings vs. mask geometry features
HW2 — Fusion method comparison: Late Fusion wins at 91.45%
HW3 — LoRA fine-tuning runs: simpler setups generalized better on small datasets
HW3 — Pre-trained Qwen2.5-VL baseline: 75% accuracy, with benign over-prediction
HW4 — GRPO sweep: run3 (lr=5e-5, temp=0.7, r=32) achieved best last-5 avg reward of 1.500
HW4 — GRPO-trained model: 48.1% accuracy and 100% format compliance vs. 21.8% zero-shot baseline
HW5 — BUSI HITL Agent: VLM-backed smolagents loop with clinician review, BI-RADS reference, and PubMed search
HW5 — Text-only vs. vision agent: same 2/5 pass rate, but vision flips which tasks succeed
About This Site
Repository
Source code lives at github.com/greenMangoes13/mmai.
Built from the Academic Project Page Template and adapted into a course portfolio / homework hub.
License
This site content is licensed under Creative Commons Attribution-ShareAlike 4.0 International .
Site Manifest
@misc{wang_mmai_2026,
title = {MMAI 2026},
author = {Emma Wang},
howpublished = {\url{https://greenMangoes13.github.io/mmai/}},
note = {Modeling Multimodal AI 2026 course site: homework, final project, and notes},
year = {2026}
}