MMAI 2026

Emma Wang

Modeling Multimodal AI 2026
Homework, final project, and random thoughts

Bio

Hi, I’m Emma, a junior majoring in Computation and Cognition at MIT. My research focuses on engineering glutamate-responsive fMRI probes. Outside of class, I am a varsity sailor and enjoy eating, snowboarding, rock climbing, and singing.

multimodal AI, coursework experiments, computer vision, LLMs / VLMs

Final Project — EEG2Video

Auditing and improving EEG-to-video generation models, benchmarking cross-modal reconstruction quality and evaluating alignment between neural signals and generated visual content.

Group Repo / My Fork

Homework

Homework 1 — Dataset

Curated and analyzed the BUSI breast ultrasound and BCSS histopathology datasets, exploring preprocessing, annotation pipelines, and evaluation strategies for multimodal oncology AI.

Datasets / Multimodal Analysis

Homework 2 — Multimodal Fusion

Implemented and compared four fusion architectures (Early, Late, Tensor, LMF) that combine ResNet-50 image embeddings with mask geometry features on the BUSI dataset. Late Fusion achieved the best accuracy, 91.45%.

Multimodal Fusion
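To make the Early vs. Late distinction concrete, here is a minimal sketch in plain Python. It is not the homework code: the function names, the equal weighting, and the toy probabilities are assumptions for illustration, and averaging per-modality probabilities is only one common way to do late fusion.

```python
# Illustrative sketch only: names, weights, and numbers are hypothetical.
CLASSES = ("benign", "malignant", "normal")  # the three BUSI classes


def early_fusion(image_emb, geom_feats):
    """Early fusion: concatenate modality features into a single vector
    that one downstream classifier would consume."""
    return list(image_emb) + list(geom_feats)


def late_fusion(image_probs, geom_probs, image_weight=0.5):
    """Late fusion: each modality has its own classifier; combine their
    class probabilities with a weighted average (an assumed rule)."""
    if len(image_probs) != len(geom_probs):
        raise ValueError("both modalities must score the same classes")
    w = image_weight
    return [w * p + (1.0 - w) * q for p, q in zip(image_probs, geom_probs)]


def predict(probs, classes=CLASSES):
    """Return the class name with the highest fused probability."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return classes[best]


if __name__ == "__main__":
    fused = late_fusion([0.6, 0.3, 0.1], [0.2, 0.7, 0.1])
    print(predict(fused))
```

In this toy case the image branch alone would pick benign, while the geometry branch is confident enough to shift the fused decision to malignant.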

Homework 3 — Vision-Language Models

Fine-tuned Qwen2.5-VL-3B-Instruct on BUSI breast ultrasound images for lesion classification using LoRA, exploring how VLMs perform on specialized medical imaging tasks with limited data.

Vision-Language Models / Fine-tuning

Homework 4 — Reinforcement Learning for VLMs (GRPO)

Implemented GRPO advantage computation and rule-based reward functions to train Qwen3-VL-2B-Instruct on BUSI. The best run (lr=5e-5, temp=0.7, LoRA r=32) achieved 48.1% accuracy versus a 21.8% zero-shot baseline, with 100% format compliance.

Reinforcement Learning / GRPO
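For readers new to GRPO, the sketch below shows the general shape of a rule-based reward and a group-relative advantage. It is a hedged illustration, not the homework implementation: the `<answer>` tag convention, the reward values, and the use of population standard deviation are all assumptions.

```python
import re
from statistics import mean, pstdev

# Hypothetical output convention: the model wraps its label in <answer> tags.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def format_reward(completion):
    """1.0 if the completion follows the assumed tag format, else 0.0."""
    return 1.0 if ANSWER_RE.search(completion) else 0.0


def accuracy_reward(completion, label):
    """1.0 if the tagged answer matches the label (case-insensitive)."""
    match = ANSWER_RE.search(completion)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip().lower() == label.lower() else 0.0


def group_advantages(rewards, eps=1e-6):
    """Group-relative advantage: standardize each reward against the
    mean and standard deviation of its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    group = [
        "<answer>benign</answer>",
        "<answer>malignant</answer>",
        "benign",
    ]
    rewards = [format_reward(c) + accuracy_reward(c, "benign") for c in group]
    print(rewards, group_advantages(rewards))
```

Because advantages are computed within each group of completions for the same prompt, a group whose completions all earn the same reward contributes zero advantage.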

Homework 5 — Multimodal AI Agents

Built a human-in-the-loop clinical decision-support agent for breast ultrasound triage using smolagents and Qwen2.5-VL-3B. Designed a custom HITL benchmark, evaluated the agent against adversarial tasks, and instrumented it with Langfuse for trace-level observability.

Agentic AI / HITL Systems

Highlights

HW1 — BUSI dataset class distribution: benign, malignant, and normal

HW1 — Sample BUSI images with segmentation masks across classes

HW2 — PCA of ResNet-50 image embeddings vs. mask geometry features

HW2 — Fusion method comparison: Late Fusion wins at 91.45%

HW3 — LoRA fine-tuning runs: simpler setups generalized better on small datasets

HW3 — Pre-trained Qwen2.5-VL baseline: 75% accuracy, with benign over-prediction

HW4 — GRPO sweep: run3 (lr=5e-5, temp=0.7, r=32) achieved best last-5 avg reward of 1.500

HW4 — GRPO-trained model: 48.1% accuracy and 100% format compliance vs. 21.8% zero-shot baseline

HW5 — BUSI HITL Agent: VLM-backed smolagents loop with clinician review, BI-RADS reference, and PubMed search

HW5 — Text-only vs. vision agent: same 2/5 pass rate, but vision flips which tasks succeed

About This Site

Repository

Source code lives at github.com/greenMangoes13/mmai.

Built from the Academic Project Page Template and adapted into a course portfolio / homework hub.

License

This site content is licensed under Creative Commons Attribution-ShareAlike 4.0 International.