AI Construct Assembly Benchmark

GeneBench

GeneBench compares AI models' ability to assemble experimentally verified constructs. Each model is benchmarked against verified Addgene plasmids, which serve as the 100% accuracy control.

Models Tested: 8
Plasmids per Model: 500
Addgene Verified: 100%

Quality vs Performance

Compare accuracy, speed, and cost across AI models

[Interactive chart: one point per model, with selectable X and Y axes and filters for provider and minimum accuracy (0% to 100%). Legend: Claude Opus 4, Claude Sonnet 4, GPT-5, GPT-4o, Gemini 3 Pro, Gemini 3 Flash, DeepSeek V3, Llama 4 405B]

Model Leaderboard

Ranked by assembly accuracy across 500 verified Addgene plasmids


#	Model	Provider	Accuracy	Speed	Cost	Errors
1	Claude Opus 4	Anthropic	97.8%	45s	$0.12	3
2	GPT-5	OpenAI	96.5%	52s	$0.15	5
3	Gemini 3 Pro	Google	95.8%	38s	$0.08	6
4	Claude Sonnet 4	Anthropic	94.2%	28s	$0.04	8
5	DeepSeek V3	DeepSeek	92.1%	42s	$0.01	11
6	GPT-4o	OpenAI	91.3%	35s	$0.05	12
7	Llama 4 405B	Meta	90.8%	55s	$0.03	13
8	Gemini 3 Flash	Google	89.5%	18s	$0.02	15

Benchmarks were conducted using verified Addgene plasmid sequences as ground truth.
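
For readers who want to explore the accuracy and cost trade-off outside the interactive chart, the leaderboard rows can be loaded into a small script. The sketch below is illustrative only: the `Row` type and the `filter_rows` and `pareto_front` helpers are hypothetical names, not part of GeneBench, and the units of Speed and Cost are taken as listed in the table without further assumptions. It mirrors the Provider and Min Accuracy filters and then finds the models that no other model beats on both accuracy and cost.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Row:
    """One leaderboard row (hypothetical container, not a GeneBench type)."""
    model: str
    provider: str
    accuracy: float  # percent, as listed
    speed_s: float   # seconds, as listed
    cost_usd: float  # dollars, as listed
    errors: int


# Values copied verbatim from the leaderboard table.
ROWS = [
    Row("Claude Opus 4", "Anthropic", 97.8, 45, 0.12, 3),
    Row("GPT-5", "OpenAI", 96.5, 52, 0.15, 5),
    Row("Gemini 3 Pro", "Google", 95.8, 38, 0.08, 6),
    Row("Claude Sonnet 4", "Anthropic", 94.2, 28, 0.04, 8),
    Row("DeepSeek V3", "DeepSeek", 92.1, 42, 0.01, 11),
    Row("GPT-4o", "OpenAI", 91.3, 35, 0.05, 12),
    Row("Llama 4 405B", "Meta", 90.8, 55, 0.03, 13),
    Row("Gemini 3 Flash", "Google", 89.5, 18, 0.02, 15),
]


def filter_rows(rows: List[Row], provider: Optional[str] = None,
                min_accuracy: float = 0.0) -> List[Row]:
    """Keep rows matching an optional provider and a minimum accuracy."""
    return [
        r for r in rows
        if (provider is None or r.provider == provider)
        and r.accuracy >= min_accuracy
    ]


def pareto_front(rows: List[Row]) -> List[Row]:
    """Rows that no other row beats on both accuracy (higher) and cost (lower)."""
    def dominates(a: Row, b: Row) -> bool:
        no_worse = a.accuracy >= b.accuracy and a.cost_usd <= b.cost_usd
        strictly_better = a.accuracy > b.accuracy or a.cost_usd < b.cost_usd
        return no_worse and strictly_better

    return [r for r in rows if not any(dominates(o, r) for o in rows)]


if __name__ == "__main__":
    for row in pareto_front(ROWS):
        print(f"{row.model}: {row.accuracy}% at ${row.cost_usd:.2f}")
```

On the listed figures, the accuracy and cost frontier contains Claude Opus 4, Gemini 3 Pro, Claude Sonnet 4, and DeepSeek V3; every other model is matched or beaten on both measures by one of these. Swapping `cost_usd` for `speed_s` in `dominates` gives the equivalent accuracy and speed view.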
