MiniApp Leaderboard
Data
MiniAppBench is the first comprehensive benchmark designed to evaluate principle-driven, interactive application generation. Unlike prior benchmarks that emphasize static UI layouts or isolated algorithmic code snippets, MiniAppBench targets MiniApps—HTML-based applications that require both faithful visual rendering and non-trivial interaction logic.
The dataset is split into two subsets: validation (100 instances) and test (400 instances), and is available as the MiniAppBench dataset. The validation set includes publicly available evaluation references to support reproducible experiments, while the test set keeps the references hidden to enable unbiased evaluation.
Leaderboard
All results shown on this leaderboard are evaluated on the test split of MiniAppBench.
Submit
Submission requirements
- Please sign in with Hugging Face before submitting.
- One submission per user per day (UTC).
- Upload a .zip file only.
- The .zip must contain the HTML outputs for the test set queries.
- Each file should be named using the query index: `<index>.html` (e.g., `1.html`, `2.html`, ...).
- We may contact you via email for verification and request additional materials. Please be prepared to provide:
  - Model access (one of the following):
    - Preferred: an inference API endpoint we can use to reproduce the results.
    - Alternatively: model checkpoints (ckpts) plus clear deployment / inference instructions (environment, dependencies, and how to run).
  - A related paper, if available (e.g., an arXiv link or a PDF).
- After you submit, we will update the results within 3 days.
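The packaging steps above can be sketched as a small script. This is an illustrative helper, not an official MiniAppBench tool: the function name, directory layout, and query count are assumptions; it simply bundles `1.html` through `<num_queries>.html` at the root of a `.zip`, failing fast if any output is missing.

```python
import zipfile
from pathlib import Path


def package_submission(output_dir: str, zip_path: str, num_queries: int) -> None:
    """Bundle <index>.html outputs into a flat .zip for submission.

    Hypothetical helper: assumes outputs are stored as 1.html ... <num_queries>.html
    directly inside output_dir.
    """
    out = Path(output_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for i in range(1, num_queries + 1):
            html_file = out / f"{i}.html"
            if not html_file.exists():
                # Fail early so an incomplete submission is caught before upload.
                raise FileNotFoundError(f"Missing output for query {i}: {html_file}")
            # Store each file at the archive root so entries are named <index>.html.
            zf.write(html_file, arcname=html_file.name)
```

For the test split you would call, e.g., `package_submission("outputs", "submission.zip", 400)` and upload the resulting `submission.zip`.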