What Is Reused From The Paper
- Success progression by feedback round (R0 to R5).
- Success rates by benchmark task category.
- Error-stage perspective based on SAP validation order.
ABAP Code Generation Benchmark
SAP ABAP LLM MODEL TESTING
This dashboard benchmarks LLMs on SAP ABAP code generation: 180 tasks, 10 repetitions per task, and up to 5 feedback iterations per attempt. The baseline methodology comes from the original paper, Benchmarking Large Language Models for ABAP Code Generation (2601.15188).
Only fully evaluated models are shown (Max rounds tested = 6, that is, the initial generation R0 plus feedback rounds R1 to R5). Sort by any column and filter by model name.
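As a rough illustration of how success progression by feedback round could be computed from the setup described above (180 tasks, 10 repetitions, rounds R0 to R5), here is a minimal Python sketch. The function name `cumulative_success` and the input shape (one entry per attempt, holding the round at which it first succeeded, or `None` if it never did) are assumptions made for this example; they are not taken from the paper or from the dashboard's actual code.

```python
from collections import Counter

# Figures stated on this page.
TASKS = 180
REPETITIONS = 10
MAX_ROUND = 5  # R0 is the initial generation; R1..R5 are feedback rounds

# Attempts per model under this setup: 180 * 10 = 1800.
ATTEMPTS_PER_MODEL = TASKS * REPETITIONS


def cumulative_success(first_success_rounds, max_round=MAX_ROUND):
    """Return cumulative success rates for rounds R0..R{max_round}.

    first_success_rounds: one entry per attempt (hypothetical format);
    an int is the round in which the attempt first succeeded,
    None means it never succeeded within the allowed rounds.
    """
    total = len(first_success_rounds)
    if total == 0:
        raise ValueError("no attempts supplied")

    # Count how many attempts first succeeded in each round.
    counts = Counter(r for r in first_success_rounds if r is not None)

    rates = []
    solved = 0
    for r in range(max_round + 1):
        solved += counts.get(r, 0)   # attempts solved by the end of round r
        rates.append(solved / total)
    return rates


if __name__ == "__main__":
    # Toy data: six attempts, not real benchmark results.
    sample = [0, 0, 1, None, 5, 3]
    for r, rate in enumerate(cumulative_success(sample)):
        print(f"R{r}: {rate:.2%}")
```

Because the rates are cumulative, the value for R5 is the share of attempts that succeeded at any point within the six generations, and the series never decreases from one round to the next.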