Test Summary: Text Formatting Impact on LLM Data Extraction

Test Date: December 2, 2025

LLM Tested: Google Gemini

Test Objective: Determine whether text structure and formatting significantly impact an LLM's ability to extract specific product information from web pages.

Test Setup

Two product pages with identical content were created for UltraShield Pro Ceramic Coating. Both contain the same information but are formatted differently:

[Screenshot: AI-optimized page]

Page A: AI-Optimized Structure

[Screenshot: long-form page]

Page B: Long-Form Text Blocks

Page A: AI-Optimized Structure

File: ultrashield/a.html

Format Characteristics:

Page B: Long-Form Text Blocks

File: ultrashield/b.html

Format Characteristics:

Data Extraction Request

Gemini was asked to extract the following information from both pages:

Results

[Screenshot: Gemini extraction from optimized page]

Gemini Output: Page A (Structured)

[Screenshot: Gemini extraction from long-form page]

Gemini Output: Page B (Long-Form)
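
The two Gemini outputs can be compared field by field once each is transcribed into a simple mapping. The sketch below is one possible way to do that, not the procedure used in this test: the function name compare_extractions, the field names, and the placeholder values are all hypothetical and do not reflect the actual results.

```python
from typing import Dict, Optional


def compare_extractions(
    page_a: Dict[str, Optional[str]],
    page_b: Dict[str, Optional[str]],
) -> Dict[str, str]:
    """Label each field 'match', 'mismatch', or 'missing' across two outputs."""
    report: Dict[str, str] = {}
    for field in sorted(set(page_a) | set(page_b)):
        a, b = page_a.get(field), page_b.get(field)
        if a is None or b is None:
            # The model returned nothing for this field on at least one page.
            report[field] = "missing"
        elif a.strip().lower() == b.strip().lower():
            # Ignore case and surrounding whitespace when comparing values.
            report[field] = "match"
        else:
            report[field] = "mismatch"
    return report


# Hypothetical placeholder data, NOT values from this test.
example_a = {"field_1": "value X", "field_2": "value Y"}
example_b = {"field_1": "Value X ", "field_2": None}
print(compare_extractions(example_a, example_b))
```

A report like this makes it easy to see whether the long-form page caused fields to be dropped ("missing") or extracted differently ("mismatch") relative to the structured page.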

Test Files

