Overview: This site contains controlled experiments testing how Large Language Models (LLMs) process and extract information from web pages. Each study examines different aspects of LLM behavior when interacting with structured data, formatting variations, and content presentation methods.
Research Question: Do LLMs benefit from JSON-LD schema markup when extracting product information, or can they extract equivalent data from plain text alone?
Methodology: Five products were tested across four variants (text only, aligned text+schema, schema with extra facts, and conflicting data) to measure extraction accuracy and which source the model prioritizes when text and schema differ.
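To make the four variants concrete, here is a minimal sketch of how one product's test pages might be generated. It assumes that "extra facts" means the schema carries a field absent from the text and that "conflicting data" means a schema value disagrees with the text. The product record, the `sku` field, the altered price, and the helper names (`json_ld`, `text_block`, `build_variants`) are all hypothetical and are not taken from the study.

```python
import json

# Hypothetical product record; the study's actual products and fields are not listed here.
PRODUCT = {"name": "Example Widget", "price": "49.99", "currency": "USD", "color": "blue"}


def json_ld(data: dict) -> str:
    """Serialize the record as a schema.org Product inside a JSON-LD <script> tag."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": data["name"],
        "color": data["color"],
        "offers": {"@type": "Offer", "price": data["price"], "priceCurrency": data["currency"]},
    }
    if "sku" in data:  # only present in the "extra facts" variant
        doc["sku"] = data["sku"]
    return '<script type="application/ld+json">' + json.dumps(doc) + "</script>"


def text_block(data: dict) -> str:
    """Render the same facts as a plain-text HTML paragraph."""
    return (f"<p>The {data['name']} comes in {data['color']} "
            f"and costs {data['price']} {data['currency']}.</p>")


def build_variants(product: dict) -> dict:
    """Return the four page variants for one product (assumed interpretation)."""
    text = text_block(product)
    extra = {**product, "sku": "EW-001"}   # assumption: a fact present only in the schema
    conflict = {**product, "price": "59.99"}  # assumption: schema price contradicts the text
    return {
        "text_only": text,
        "aligned": text + json_ld(product),
        "schema_extra": text + json_ld(extra),
        "conflicting": text + json_ld(conflict),
    }


variants = build_variants(PRODUCT)
```

Applied to five products, this yields 5 × 4 = 20 test pages. Note that the text paragraph is identical in every variant, so any change in what a model extracts can only come from the schema block.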
Research Question: How does text formatting (tables, lists, or plain paragraphs) affect the accuracy of LLM information extraction?
Methodology: Testing whether structured HTML formatting helps LLMs parse information more accurately than unformatted text blocks.