Schema Markup & LLM Information Retrieval - Test Results

Test Date: December 2, 2025

LLM Tested: Google Gemini 3 Pro Preview

Test Objective: Determine whether structured data (JSON-LD schema) helps or hinders LLM information extraction from web pages.

Research Question

Do Large Language Models benefit from JSON-LD schema markup when extracting product information from web pages?

The Debate

Many developers believe that JSON-LD schema markup provides no value for LLMs, since these models can already extract information from visible text. This experiment tests that assumption by measuring extraction accuracy across four ways of presenting the same product data.

Test Design

We created controlled product pages with identical information presented in four different ways:

Variant A: Text Only

All product data visible in HTML tables. No JSON-LD schema markup included.

Variant B: Text + Schema Aligned

Visible text and JSON-LD contain identical information. Tests whether having the same data in two sources affects extraction.

Variant C: Vague Text, Detailed Schema

Visible text is vague or generic, while JSON-LD schema contains specific details. Tests whether LLMs retrieve precise information from structured markup.

Variant D: Conflict

Visible text and JSON-LD schema give conflicting values. Tests which source the model prioritizes.
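To make the four variants concrete, here is a minimal Python sketch of how such test pages could be generated. The study's actual templates are not published: the function names (`product_jsonld`, `visible_table`, `render_page`) and the vague placeholder strings are hypothetical, except "Multiple colors available", which is quoted in the model's explanation. The JSON-LD uses the standard schema.org `Product`, `Brand`, and `Offer` types with the attribute values from the results table. The study does not say which source carried the conflicting $49.99 price in Variant D; this sketch arbitrarily puts it in the visible text.

```python
import html
import json

# Attribute values from the Detailed Results table (FluxClean 2000).
PRODUCT = {
    "name": "FluxClean 2000 Tire Degreaser",
    "sku": "FC-2000-RED",
    "brand": "FluxClean",
    "colors": ["Red", "Black"],
    "size": "5L",
    "price": "47.99",
}

# Hypothetical vague wording for Variant C; only the Colors phrase is quoted in the study.
VAGUE = {
    "Price": "Competitively priced",
    "Colors": "Multiple colors available",
    "Size": "Large bottle",
}


def product_jsonld(p: dict) -> str:
    """Serialize the product as a schema.org Product inside a JSON-LD script tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": p["name"],
        "sku": p["sku"],
        "brand": {"@type": "Brand", "name": p["brand"]},
        "color": p["colors"],
        "size": p["size"],
        "offers": {"@type": "Offer", "price": p["price"], "priceCurrency": "USD"},
    }
    return '<script type="application/ld+json">' + json.dumps(data) + "</script>"


def visible_table(p: dict, vague: bool = False) -> str:
    """Render the visible HTML table; vague=True swaps in generic wording."""
    cells = {
        "SKU": p["sku"],
        "Brand": p["brand"],
        "Price": "$" + p["price"],
        "Colors": ", ".join(p["colors"]),
        "Size": p["size"],
    }
    if vague:
        cells.update(VAGUE)
    rows = "".join(
        f"<tr><th>{html.escape(k)}</th><td>{html.escape(v)}</td></tr>"
        for k, v in cells.items()
    )
    return f"<table>{rows}</table>"


def render_page(variant: str) -> str:
    """Build the page body for one of the four test variants."""
    if variant == "A":  # Text only: no JSON-LD at all
        return visible_table(PRODUCT)
    if variant == "B":  # Text and schema carry identical values
        return visible_table(PRODUCT) + product_jsonld(PRODUCT)
    if variant == "C":  # Vague text, specific schema
        return visible_table(PRODUCT, vague=True) + product_jsonld(PRODUCT)
    if variant == "D":  # Conflict (assumption: the differing price sits in the visible text)
        conflicting = dict(PRODUCT, price="49.99")
        return visible_table(conflicting) + product_jsonld(PRODUCT)
    raise ValueError(f"unknown variant: {variant!r}")
```

Keeping the product data in one dictionary and deriving both sources from it guarantees that Variants A to C differ only in presentation, which is the property the experiment depends on.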

Test Questions

Each variant was asked five questions about the product, covering its SKU, price, available colors, size, and brand.

Test Product

FluxClean 2000 Tire Degreaser - a fictional product with clearly defined attributes, which makes it well suited to controlled testing.

Results

Overall Performance

Overall Accuracy: 100% (18/18 questions correct)

By Variant:
Variant A: 5/5 correct
Variant B: 3/3 correct
Variant C: 5/5 correct
Variant D: 5/5 correct

Detailed Results

Variant | Question | Expected Answer | Model Answer | Result
--- | --- | --- | --- | ---
A | SKU | FC-2000-RED | FC-2000-RED | ✓ Correct
A | Price | $47.99 | $47.99 | ✓ Correct
A | Colors | Red, Black | Red, Black | ✓ Correct
A | Size | 5L | 5L | ✓ Correct
A | Brand | FluxClean | FluxClean | ✓ Correct
B | SKU | FC-2000-RED | FC-2000-RED | ✓ Correct
B | Size | 5L | 5L | ✓ Correct
B | Brand | FluxClean | FluxClean | ✓ Correct
C | SKU | FC-2000-RED | FC-2000-RED | ✓ Correct
C | Price | $47.99 | 47.99 USD | ✓ Correct
C | Colors | Red, Black | Red, Black | ✓ Correct
C | Size | 5L | 5L | ✓ Correct
C | Brand | FluxClean | FluxClean | ✓ Correct
D | SKU | FC-2000-RED | FC-2000-RED | ✓ Correct
D | Price | $49.99 | $49.99 | ✓ Correct
D | Colors | Red, Black | Red, Black | ✓ Correct
D | Size | 5L | 5L | ✓ Correct
D | Brand | FluxClean | FluxClean | ✓ Correct
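The table counts "47.99 USD" as a correct answer for "$47.99", so scoring evidently tolerated formatting differences. The study does not publish its scoring rule; the sketch below assumes a simple normalization (case-folding, stripping "$" and "USD", collapsing commas and whitespace) and recomputes per-variant and overall accuracy from the 18 rows above. `normalize` and `accuracy_by_variant` are hypothetical names.

```python
import re
from collections import defaultdict

# (variant, question, expected, model answer), copied from the Detailed Results table.
RESULTS = [
    ("A", "SKU", "FC-2000-RED", "FC-2000-RED"),
    ("A", "Price", "$47.99", "$47.99"),
    ("A", "Colors", "Red, Black", "Red, Black"),
    ("A", "Size", "5L", "5L"),
    ("A", "Brand", "FluxClean", "FluxClean"),
    ("B", "SKU", "FC-2000-RED", "FC-2000-RED"),
    ("B", "Size", "5L", "5L"),
    ("B", "Brand", "FluxClean", "FluxClean"),
    ("C", "SKU", "FC-2000-RED", "FC-2000-RED"),
    ("C", "Price", "$47.99", "47.99 USD"),
    ("C", "Colors", "Red, Black", "Red, Black"),
    ("C", "Size", "5L", "5L"),
    ("C", "Brand", "FluxClean", "FluxClean"),
    ("D", "SKU", "FC-2000-RED", "FC-2000-RED"),
    ("D", "Price", "$49.99", "$49.99"),
    ("D", "Colors", "Red, Black", "Red, Black"),
    ("D", "Size", "5L", "5L"),
    ("D", "Brand", "FluxClean", "FluxClean"),
]


def normalize(answer: str) -> str:
    """Assumed matching rule: case-fold, drop '$'/'USD', collapse commas and spaces."""
    text = answer.lower()
    text = re.sub(r"\$|\busd\b", "", text)
    text = re.sub(r"[\s,]+", " ", text)
    return text.strip()


def accuracy_by_variant(rows):
    """Return {variant: (correct, asked)} using normalized string equality."""
    totals = defaultdict(lambda: [0, 0])  # variant -> [correct, asked]
    for variant, _question, expected, actual in rows:
        totals[variant][1] += 1
        if normalize(expected) == normalize(actual):
            totals[variant][0] += 1
    return {v: (c, n) for v, (c, n) in sorted(totals.items())}


if __name__ == "__main__":
    scores = accuracy_by_variant(RESULTS)
    correct = sum(c for c, _ in scores.values())
    asked = sum(n for _, n in scores.values())
    print(scores)  # {'A': (5, 5), 'B': (3, 3), 'C': (5, 5), 'D': (5, 5)}
    print(f"Overall: {correct}/{asked}")  # Overall: 18/18
```

This rule is order-sensitive for list answers ("Black, Red" would not match "Red, Black"); a stricter harness might compare color answers as sets.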

Model Transparency

Gemini 3 Pro Preview provided confidence explanations for each answer. Example from Variant C (Vague text, detailed schema):

"While the visible text only states 'Multiple colors available', the JSON-LD schema explicitly lists the colors as 'Red' and 'Black'."

This suggests that the model read the JSON-LD block and distinguished it from the visible text, rather than relying on the visible text alone.
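How Gemini separates the two sources internally is not documented here. As a point of comparison, the sketch below shows how a retrieval pipeline could make the same split explicit using Python's standard `html.parser`: JSON-LD script contents are parsed as JSON, other scripts and styles are skipped, and the remaining text is treated as visible content. `SourceSplitter` and `split_sources` are hypothetical names.

```python
import json
from html.parser import HTMLParser


class SourceSplitter(HTMLParser):
    """Separate JSON-LD payloads from human-visible text in an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.jsonld: list[dict] = []
        self.visible: list[str] = []
        self._mode = "text"  # "text", "jsonld", or "skip" (other script/style)
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            is_jsonld = dict(attrs).get("type") == "application/ld+json"
            self._mode = "jsonld" if is_jsonld else "skip"
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            if self._mode == "jsonld":
                self.jsonld.append(json.loads("".join(self._buffer)))
            self._mode = "text"

    def handle_data(self, data):
        if self._mode == "jsonld":
            self._buffer.append(data)
        elif self._mode == "text" and data.strip():
            self.visible.append(data.strip())


def split_sources(page: str) -> tuple[list[dict], str]:
    """Return (parsed JSON-LD objects, visible text joined by spaces)."""
    parser = SourceSplitter()
    parser.feed(page)
    parser.close()
    return parser.jsonld, " ".join(parser.visible)


if __name__ == "__main__":
    # A Variant C-style fragment: vague visible text, specific schema.
    page = (
        "<p>Colors: Multiple colors available</p>"
        '<script type="application/ld+json">'
        '{"@type": "Product", "color": ["Red", "Black"]}'
        "</script>"
    )
    schema, text = split_sources(page)
    print(text)                # Colors: Multiple colors available
    print(schema[0]["color"])  # ['Red', 'Black']
```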


Test Methodology


Experiment Design & Testing: December 2025
Test code is available on request. The full dataset covers 5 products × 4 variants × 5 questions = 100 test cases; the results above cover only the FluxClean 2000 product.
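The 100-case figure is the full cross product of products, variants, and questions. A minimal sketch of enumerating that grid, assuming the same five question topics apply to every product; identifiers other than FluxClean 2000 are hypothetical placeholders, since the other four products are not named.

```python
import itertools

# Variant letters and question topics as used in the results table.
VARIANTS = ["A", "B", "C", "D"]
QUESTIONS = ["SKU", "Price", "Colors", "Size", "Brand"]
# Only FluxClean 2000 is named in this write-up; the other four IDs are placeholders.
PRODUCTS = ["fluxclean-2000"] + [f"product-{i}" for i in range(2, 6)]


def build_test_cases():
    """One case per (product, variant, question) combination."""
    return [
        {"product": p, "variant": v, "question": q}
        for p, v, q in itertools.product(PRODUCTS, VARIANTS, QUESTIONS)
    ]


if __name__ == "__main__":
    cases = build_test_cases()
    print(len(cases))  # 100
```

Generating the grid up front makes gaps easy to detect: each product should contribute 4 × 5 = 20 scored cases.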
