Context
I wanted to understand how AI-powered search tools — things like ChatGPT browsing, Perplexity, and Google’s AI Overviews — actually crawl and interpret a static site. Most SEO advice is still written for traditional search engines. Does structured data actually help AI crawlers, or are they just parsing the raw HTML?
The question: if I add JSON-LD schema markup to a simple Hugo site, does it measurably change how AI tools understand and reference the content?
Setup
I built a minimal Hugo site with three types of structured data:
- Person schema on the about page and site-wide
- BlogPosting schema on every post (headline, datePublished, author, keywords)
- BreadcrumbList schema for navigation context
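For reference, the BlogPosting markup has roughly this shape, embedded in a `<script type="application/ld+json">` tag in each page's head (all field values here are placeholders, not the real site's):

```json
{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "headline": "Example post title",
  "datePublished": "2025-01-01",
  "author": { "@type": "Person", "name": "Jane Doe" },
  "keywords": "hugo, structured data, ai crawlers"
}
```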
The site uses semantic HTML throughout — <article>, <nav>, <main>, <time> — and avoids JavaScript-rendered content entirely. Every page is plain server-rendered HTML.
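Concretely, each page body follows a shape like this simplified sketch (element choice from the list above; the placeholder content is illustrative):

```html
<main>
  <nav aria-label="breadcrumb"><a href="/">Home</a></nav>
  <article>
    <h1>Post title</h1>
    <time datetime="2025-01-01">1 January 2025</time>
    <p>Opening paragraph that doubles as an explicit summary.</p>
  </article>
</main>
```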
For testing, I used three approaches:
- Google’s Rich Results Test to validate the schema markup
- Manual queries in ChatGPT and Perplexity to see how they reference the content
- Checking server logs for crawler user-agent strings
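The log check can be scripted. This is a minimal sketch, assuming Apache/Nginx combined log format; the function name and sample line are hypothetical, and the user-agent substrings are the three crawlers named below:

```python
import re

# User-agent substrings for the AI crawlers observed in this experiment.
AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot")

def ai_bot_hits(log_lines):
    """Return (bot_name, request_path) pairs for requests from AI crawlers.

    Assumes combined log format, where the user-agent string is quoted
    at the end of each line.
    """
    hits = []
    for line in log_lines:
        # Extract the request path from the quoted request segment.
        m = re.search(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+"', line)
        if not m:
            continue
        for bot in AI_BOTS:
            if bot in line:
                hits.append((bot, m.group(1)))
    return hits

sample = [
    '203.0.113.9 - - [01/Jan/2025:00:00:00 +0000] '
    '"GET /sitemap.xml HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"'
]
print(ai_bot_hits(sample))  # [('GPTBot', '/sitemap.xml')]
```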
Observations
The structured data validated cleanly in Google’s testing tool. All three schema types were detected and parsed without errors.
AI crawlers showed up in server logs within 48 hours of deploying: GPTBot, ClaudeBot, and PerplexityBot all made requests. They crawled the sitemap first, then individual pages.
When I asked ChatGPT and Perplexity questions related to my content, the results were mixed. Perplexity was more likely to cite the site directly and pull accurate summaries. ChatGPT referenced the content but sometimes paraphrased loosely.
Pages with clearer heading structure and explicit summaries in the description meta tag were cited and summarized more accurately by both tools. The JSON-LD didn’t seem to hurt, but the plain HTML structure appeared to matter more for actual content extraction.
Takeaways
Structured data is table stakes — it helps, but it’s not magic. The real drivers of AI crawlability seem to be:
- Clean semantic HTML — headings, paragraphs, lists. No clever div-soup.
- Explicit descriptions — both in meta tags and in the opening paragraph of content.
- Fast, static delivery — AI crawlers respect robots.txt and seem to prefer sites that respond quickly.
- Sitemap availability — this is how crawlers discover pages. Without it, they rely on link following.
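The crawler names from the logs map directly onto robots.txt user-agent tokens. A minimal policy that explicitly allows them and advertises the sitemap might look like this (the sitemap URL is a placeholder):

```
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

Sitemap: https://example.com/sitemap.xml
```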
I’ll run this experiment again in a few months to see if the landscape changes. The AI search space is moving fast.