Why the encyclopedia became more important to the machine than the website

When a company thinks about its visibility on the internet, Wikipedia usually does not make the priority list. That is understandable: a Wikipedia article seems secondary compared with the company’s own website, blog, advertising, or SEO. But for answer systems, the hierarchy looks very different.

An analysis of 680 million ChatGPT citations from August 2024 through June 2025 showed that, among the top 10 most cited sources, Wikipedia accounts for nearly half: 47.9% [1]. This is not an accident. Wikipedia is a standard component of language-model training corpora, and the training mixes that have been published, such as those for GPT-3 and the first LLaMA, sample it more heavily than its raw size would imply. According to Status Labs, Google’s C4 dataset, one of the core training sets, likewise gives Wikipedia extra weight relative to other web sources [2]. The training mixes behind current versions of ChatGPT, Gemini, and Claude are not public, so for those models the extra weight is an inference rather than a documented fact. And in June 2025, ChatGPT reportedly became Wikipedia’s top traffic referrer, creating a symbiotic loop in which AI cites the encyclopedia and users click back through to it [3].

For a brand, that means something concrete: if the company has a high-quality Wikipedia page, the answer system gets a reliable, neutral, verified source for entity identification. If there is no page, the model is forced to assemble information from less structured and less authoritative sources — and the result will be less accurate.

Wikidata: the brand’s machine-readable passport

Wikipedia is a text encyclopedia for people. Wikidata is a structured database for machines. Every entry in Wikidata has a unique identifier (Q-ID), which is used to anchor an entity unambiguously. Google Knowledge Graph draws directly from Wikidata [4]. When an answer system encounters a brand name, it first checks whether there is an entry for it in the knowledge graph — and that is where Wikidata becomes a critical link.
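A Q-ID lookup can be sketched in a few lines of Python. This is a minimal sketch rather than a definitive tool: it assumes the public Wikidata search endpoint (action wbsearchentities), and the function names and the User-Agent string are hypothetical choices made for illustration. The network call is kept separate from the parsing so that the parsing can be checked offline.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_url(brand_name: str, language: str = "en", limit: int = 5) -> str:
    """Build a wbsearchentities request URL for a brand name."""
    params = {
        "action": "wbsearchentities",
        "search": brand_name,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return WIKIDATA_API + "?" + urllib.parse.urlencode(params)


def extract_candidates(response: dict) -> list[tuple[str, str, str]]:
    """Return (Q-ID, label, description) for each search hit."""
    return [
        (hit["id"], hit.get("label", ""), hit.get("description", ""))
        for hit in response.get("search", [])
    ]


def find_brand(brand_name: str) -> list[tuple[str, str, str]]:
    """Query Wikidata over the network (requires internet access)."""
    request = urllib.request.Request(
        build_search_url(brand_name),
        # Hypothetical User-Agent; identify your own script here.
        headers={"User-Agent": "brand-entity-check/0.1 (example script)"},
    )
    with urllib.request.urlopen(request, timeout=10) as reply:
        return extract_candidates(json.load(reply))
```

Calling find_brand("Your Brand") returns a list of candidate entries. An empty list suggests there is no entry yet; several hits with similar labels suggest the name is ambiguous and the descriptions need a closer look.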

Wikidata’s notability requirements are much looser than Wikipedia’s. A company that cannot get a Wikipedia article because it lacks sufficient media coverage can often still create a Wikidata entry, provided the entity is clearly identifiable and its statements can be backed by public references: specify the type of organization, industry, founder, products, and official website. That is enough to give the machine a stable identifier and a set of basic attributes.

Brands without a Wikidata entry face a structural disadvantage. The answer system first checks whether the entity exists in the knowledge graph, and only then decides whether the site’s content is worth citing. If that check fails, the model will be more cautious in recommendations — or bypass the brand entirely [5].

Knowledge Graph: the map AI uses to navigate

Google Knowledge Graph is not a standalone product but an infrastructure layer on which Knowledge Panel, AI Overviews, and AI Mode are built. It contains billions of entities and hundreds of billions of facts about them (in 2020 Google put the figures at 5 billion entities and 500 billion facts). When a user asks a question, AI does not simply search for relevant documents: it first identifies entities through the knowledge graph and then selects sources for the answer.

For a brand, that means inclusion in the Knowledge Graph is not a bonus but a foundation. Without it, the answer system has to spend additional compute resources just to understand who you are. LinkSurge calls this a “comprehension budget”: the cheaper it is for the machine to identify your entity, the higher the probability of citation [5].
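One way to see whether Google's graph already holds an entity for the brand is the public Knowledge Graph Search API. The sketch below only builds the request URL and summarizes a response; it assumes you have your own API key, the function names are hypothetical, and it is an assumption that this public API reflects what AI Overviews and AI Mode see internally.

```python
import urllib.parse

KG_SEARCH_ENDPOINT = "https://kgsearch.googleapis.com/v1/entities:search"


def build_kg_search_url(brand_name: str, api_key: str, limit: int = 3) -> str:
    """Build an entities:search URL restricted to organizations."""
    params = {
        "query": brand_name,
        "key": api_key,
        "limit": limit,
        "types": "Organization",
    }
    return KG_SEARCH_ENDPOINT + "?" + urllib.parse.urlencode(params)


def summarize_kg_results(response: dict) -> list[dict]:
    """Reduce a search response to id, name, types and score per hit."""
    rows = []
    for element in response.get("itemListElement", []):
        result = element.get("result", {})
        rows.append(
            {
                "id": result.get("@id"),
                "name": result.get("name"),
                "types": result.get("@type", []),
                "score": element.get("resultScore"),
            }
        )
    return rows
```

An empty result list for the exact brand name is a hint, not proof, that the entity is missing: the API returns a ranked search, not a complete dump of the graph.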

What to do right now

Check whether the brand is present in Wikidata (wikidata.org). If there is no entry, create one with the basic properties: P31 (instance of, which sets the entity type), P452 (industry), P856 (official website), P112 (founded by). This takes 15–30 minutes and requires no technical skills.
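For an entry that already exists, the four basic properties can be checked against the item's JSON. The sketch below assumes the JSON shape served by Wikidata for an item (for example at Special:EntityData/<Q-ID>.json), where statements sit under a "claims" key grouped by property ID; the helper names are hypothetical.

```python
# Property IDs named in the text, with their Wikidata labels.
REQUIRED_PROPERTIES = {
    "P31": "instance of (entity type)",
    "P452": "industry",
    "P856": "official website",
    "P112": "founded by",
}


def entity_from_payload(payload: dict, qid: str) -> dict:
    """Pick one item out of an EntityData-style payload."""
    return payload["entities"][qid]


def missing_properties(entity: dict) -> dict[str, str]:
    """Return the required properties that have no statements."""
    claims = entity.get("claims", {})
    return {
        pid: label
        for pid, label in REQUIRED_PROPERTIES.items()
        if not claims.get(pid)  # absent key or empty statement list
    }
```

An empty result means all four properties carry at least one statement. The check says nothing about whether the statements are correct or referenced, which still needs a human look.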

If the brand meets Wikipedia’s notability criteria, prepare or improve the article. If it does not, do not force it: Wikidata already provides a basic level of identification. Make sure the site’s Schema.org markup (Organization, sameAs) points to the Wikidata Q-ID and other official profiles. That creates a closed identification loop that is easiest for the knowledge graph to verify.
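The Schema.org markup described above can be generated rather than written by hand. This is a minimal sketch: the brand name, URLs, and the Q-ID Q999999999 used in the usage note are placeholders, not real identifiers, and the function names are hypothetical.

```python
import json
import re


def organization_jsonld(name: str, url: str, wikidata_qid: str, other_profiles=()) -> dict:
    """Build Organization markup whose sameAs starts with the Wikidata item."""
    if not re.fullmatch(r"Q[1-9]\d*", wikidata_qid):
        raise ValueError(f"not a Wikidata Q-ID: {wikidata_qid!r}")
    same_as = [f"https://www.wikidata.org/wiki/{wikidata_qid}", *other_profiles]
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "sameAs": same_as,
    }


def as_script_tag(data: dict) -> str:
    """Wrap the markup in a JSON-LD script tag for the page head."""
    body = json.dumps(data, indent=2, ensure_ascii=False)
    return f'<script type="application/ld+json">\n{body}\n</script>'
```

For example, as_script_tag(organization_jsonld("Acme Corp", "https://example.com", "Q999999999", ["https://www.linkedin.com/company/example"])) yields a tag ready to paste into the site template. The loop closes when the Wikidata entry's P856 value points back at the same site URL.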

Maintain consistency: the brand’s name, description, and category should be the same in Wikidata, on the website, in Google Business Profile, and across all external directories.
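Consistency across profiles can be spot-checked mechanically. The sketch below assumes you have already collected the name, description, and category from each source by hand or by export; the source names and field names are hypothetical. It compares values after light normalization (Unicode NFKC, case folding, collapsed whitespace), so differences in capitalization or spacing are ignored while differences in wording are reported.

```python
import unicodedata


def normalize(value: str) -> str:
    """Apply NFKC normalization, case folding and whitespace collapsing."""
    return " ".join(unicodedata.normalize("NFKC", value).casefold().split())


def find_inconsistencies(
    profiles: dict[str, dict[str, str]],
    fields: tuple[str, ...] = ("name", "description", "category"),
) -> dict[str, dict[str, list[str]]]:
    """Map each inconsistent field to its variants and the sources using them."""
    issues = {}
    for field in fields:
        variants: dict[str, list[str]] = {}
        for source, record in profiles.items():
            key = normalize(record.get(field, ""))
            variants.setdefault(key, []).append(source)
        if len(variants) > 1:  # more than one distinct value for this field
            issues[field] = variants
    return issues
```

A field that is missing from one source is treated as an empty string and therefore shows up as a variant, which is usually what you want to notice.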

What seems well established

Wikipedia is the most cited source in ChatGPT and the second most frequent across all LLMs. Wikidata feeds directly into the Google Knowledge Graph. Brands with a Wikidata entry have a structural advantage when answer systems identify an entity.

What still remains uncertain

The exact weight of Wikipedia and Wikidata relative to other trust signals varies by platform and is not fully disclosed. Having a Wikipedia page does not guarantee citation — the quality and freshness of the article also matter.

What this changes in practice

Creating or improving a Wikidata entry is one of the fastest and least expensive ways to strengthen a brand’s machine identification. It is a “15–30 minutes of work with potentially long-term effects” kind of action.

Sources

[1] Semrush / Status Labs. Analysis of 680M ChatGPT citations: Wikipedia at 47.9% of top-10. 2025
[2] Status Labs. How AI Models Use Wikipedia as a Truth Anchor. 2026
[3] ALLMO. Wikipedia-ChatGPT symbiotic loop: ChatGPT became Wikipedia's top referrer, June 2025
[4] Google. Knowledge Graph documentation; Wikidata as primary source. 2026
[5] LinkSurge. Entity Authority and AI Search Visibility. 2026
