The Walled Garden of the Surveilled Web

Abstract

The open web is not disappearing because publishing has become impossible; it is disappearing because discovery is being absorbed into vendor specific information environments. Google is the central case because of its dominance in web search, but the pattern is broader: crawlers, indexes, operating systems, browsers, assistants, DNS resolvers, VPNs, advertising systems, and policy processes are converging into private gardens that present themselves as the web. Findability inside these gardens depends less on public availability than on compatibility with their measurement, monetization, legal, and editorial machinery. Once LLMs train on and retrieve through those filtered layers, exclusion no longer affects only search traffic; it shapes the corpus from which future answers are generated.

The point is not to catalogue individual removals, demotions, or misclassifications. A catalogue would turn the argument into a list of symptoms. The concern here is the generic behavioral shift observed over decades: discovery moves from open publication toward measurable participation in infrastructure controlled by a small number of intermediaries.

The excluded layer is what is often called the small web: personal sites, independent archives, hobbyist documentation, technical notes, volunteer projects, and other noncommercial knowledge that can be public without being easily measured.

Googlecentrismus

This article’s Googlecentrismus is intentional because the risk is asymmetrical. Google does not merely operate one search interface among many; it dominates worldwide web search, with StatCounter reporting 90.04% global search engine share in April 2026, and it describes Search as a fully automated system whose crawlers regularly explore the web, discover pages, and add them to a large index¹. Exclusion from Google is therefore not a failure to appear in one directory. It is exclusion from the dominant public discovery layer.

Google is not unique in kind. A short list of other engines also tries to maintain broad web discovery infrastructure: Bing, Yandex, Baidu, and local engines with substantial national presence such as Naver and Seznam. They crawl, index, rank, filter, and govern documents through similar classes of mechanisms, although at a smaller global scale or within narrower linguistic, national, or commercial markets². The argument keeps this Googlecentrismus because Google is where these pressures are largest, not because the pattern belongs only to Google.

The recent rise of alternative search engines does not remove this structure. One family consists of front ends, metasearch engines, and hybrids: DuckDuckGo largely sources traditional links and images from Bing while combining them with its own crawler and specialised sources³; Kagi combines its own indexes, anonymised calls to major providers, specialised engines, and manually shaped ranking tools⁴; Brave Search uses its own index for web search results⁵.

The pattern also appears when the direction is reversed. Brave began from the browser side and added Brave Search in 2021⁶. DuckDuckGo ties search to its own browser and subscription VPN⁷. Kagi has Orion, a privacy focused browser⁸. Better incentives do not change the structural convergence: search, browser, and network surfaces move into one information environment.

These systems can improve incentives, privacy, and ranking taste, but they remain constrained by the same scarcity: there are only a few large web indexes to draw from.

The other family is independent crawling with smaller or deliberately scoped indexes. Yep maintains its own web index through Ahrefs infrastructure⁹; Wiby explicitly says it is not meant to index the entire web and prefers human seeded discovery¹⁰; Marginalia is an independent open source search engine that foregrounds obscure, noncommercial, and small web discovery¹¹.

These engines matter because they preserve alternative discovery routes. They do not yet change the fact that exclusion at Google scale still defines practical invisibility for most readers.

The hard problem for small web search is not coverage alone. At the signal level, a human niche website and an automated LLM generated site full of nonsense can look uncomfortably similar: small traffic, few backlinks, no institutional domain, irregular publication, weak metadata, and little external authority. Large search engines tend to resolve this ambiguity through popularity and authority proxies, which can bury both together.

Small web engines often resolve it through curation. Kagi’s public small web list accepts personal feeds under rules that exclude automated, LLM generated, and spam content¹²; Marginalia uses a public GitHub submission repository for sites to be added before crawl cycles¹³; Wiby accepts individual page submissions under broad qualitative rules and states that, in most cases, only the submitted page will be crawled¹⁴.

This is valuable, but it is a different model: public and community aided selection at small scale, not a general ranking system that can safely distinguish rare human work from synthetic filler across the whole web.

The Index as Measurement System

A search engine appears to be a catalogue: crawlers discover documents, ranking systems estimate relevance, and users receive an ordered list of results. That image is too simple for the modern web. Search ranking is also a measurement problem. A page is not only evaluated by its text, links, structure, and reputation; it is evaluated through observed user behavior, page experience data, engagement patterns, freshness signals, and the larger commercial surface around the site.

This changes the political economy of visibility. A document can exist, be publicly reachable, contain original expertise, and still fail to become visible if it does not participate in the measurement layer. The failure is not necessarily a manual act of suppression. It can be an ordinary consequence of a system in which the absence of telemetry is difficult to distinguish from the absence of value.

The central problem is therefore not that Google has built a crude content filter. The problem is subtler: Google has built an information environment in which the most measurable web becomes the most findable web, and the most findable web becomes the basis for future measurement. Once search, advertising, browser telemetry, and LLM grounding reinforce one another, the index ceases to represent the open web as such. It represents the portion of the web compatible with Google’s instruments.

Surveillance Compatibility

The first wall is surveillance compatibility. Modern ranking systems draw value from behavioral observation: clicks, long clicks, return behavior, navigational paths, popularity signals, and field data from real browsers. Public reporting from the Google search antitrust record and the 2024 Content API Warehouse leak describes systems such as NavBoost and Chrome derived fields that track user interaction and site level visibility¹⁵.

The Chrome UX Report shows the same structure in a more public form. CrUX is a dataset of real world Chrome user experience data. Google states that it is used by Google Search to inform the page experience ranking factor, and its methodology excludes origins and pages that do not meet a sufficient popularity threshold¹⁶. A site cannot submit itself into this dataset by being technically correct, carefully written, or socially useful. It must first be observed often enough by eligible Chrome users.

The client layer is not limited to the browser. In Google’s case, Android and the surrounding Google services create another operating system level telemetry surface. Android usage and diagnostics can report how the device is used and how it works, including app use frequency and network connection quality; Web & App Activity can save activity from Google services and, when enabled, include Chrome history and activity from sites, apps, and devices that use Google services¹⁷.

The surveillance surface is broader than browser telemetry. CDNs, shared JavaScript and CSS libraries, hosted fonts, analytics endpoints, performance probes, centralized DNS resolvers, and VPN operators all observe fragments of the user’s path before any site local tracker is considered.

PageSpeed Insights reinforces this infrastructure model by combining CrUX field data with Lighthouse diagnostics, while Lighthouse flags scripts and stylesheets that block first paint and points developers toward delivery, caching, deferral, and dependency changes¹⁸. The technical advice is often sound as performance engineering; the political consequence is that speed optimization normalizes dependence on shared infrastructure whose operators can observe traffic patterns.

This is network level surveillance rather than page level tracking. The vendor does not need JavaScript inside the document: control over resolution, delivery, or tunneling can reveal domains, requested assets, destination addresses, timing, and traffic volume. DNS sees names; CDNs see the requests they terminate or serve; VPNs see destinations and flow metadata. Even when page bodies remain encrypted in transit, these traces can be enough to infer what users read at site or topic granularity, and at document granularity wherever the operator terminates the request.

DNS is the clearest example because the resolver does not need the page body to learn the shape of browsing. A public resolver such as 1.1.1.1 is in a position to observe client address metadata and the names being resolved; encrypted DNS protects the query from the local network but can centralize the same information at the resolver operator.

The privacy concern is not theoretical: the earlier DNS over TLS article framed the problem as a shift of surveillance from the local provider to public DNS operators rather than as a complete removal of metadata leakage¹⁹.
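
A minimal sketch of why the resolver needs nothing beyond the query itself. It builds a standard DNS question in wire format and reads the name back out of the raw bytes, which is what any operator receiving the message can do. The hostname `notes.example.org`, the query ID, and the helper names `build_query` and `read_qname` are illustrative assumptions, and nothing is sent over the network.

```python
import struct


def build_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a DNS query message (header plus one question) in wire format."""
    # Header: ID, flags (recursion desired), 1 question, 0 answer/authority/additional.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name.
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels)
    qname += b"\x00"
    # QTYPE (1 = A record) and QCLASS (1 = IN).
    return header + qname + struct.pack("!HH", qtype, 1)


def read_qname(message: bytes) -> str:
    """Recover the queried name from a raw DNS query, as a resolver would."""
    labels = []
    pos = 12  # the fixed-size header is 12 bytes long
    while message[pos] != 0:
        length = message[pos]
        labels.append(message[pos + 1 : pos + 1 + length].decode("ascii"))
        pos += 1 + length
    return ".".join(labels)


message = build_query("notes.example.org")
print(read_qname(message))  # the full hostname is readable from the query alone
```

Encrypted transports wrap this same message in TLS. That hides the name from the local network, but the message still arrives intact at the resolver, which also sees the address it came from.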

OS integrated privacy relays and VPNs make the problem sharper because they can turn routing into credentialed routing. Apple’s iCloud Private Relay is not a general VPN, but it is sold as an easy privacy layer inside Apple operating systems; Apple tells site operators that Private Relay validates the client as an Apple device, validates that the customer has a valid iCloud+ subscription, and presents a coarse location through relay IP addresses²⁰. VPN by Google is similarly built into supported Pixel devices and depends on account and device eligibility²¹. In parallel, Google Play Integrity exposes app, device, and account verdicts²², while Apple App Attest certifies that a key belongs to a valid app instance²³. The user sees a privacy switch; the site may see traffic with claims backed by the vendor.

The same pattern can extend from device legitimacy to age and identity claims. Apple already exposes Wallet based age and identity verification for apps that need to prove a person’s age or identity²⁴. Google Wallet’s digital credentials flow can give apps and websites cryptographically signed identity and age attributes, including threshold claims such as 16+, 18+, or 21+, depending on the relevant content and jurisdiction²⁵. The extension is small: a relay or VPN built into an operating system and vendor account can be expanded beyond a privacy pipe. It can become a channel through which a visited site receives a vendor backed statement that the user is old enough, is not a minor in the relevant jurisdiction, or is located where the vendor says the user is located. The visited site can trust that statement because it is attached to the same account, device, platform, and network stack that already mediates access.

LLM mediation adds another surveillance surface. Search engines, browsers, and privacy products increasingly include chatbots, page summarizers, answer boxes, and browser assistants. The issue is not limited to the largest vendors: DuckDuckGo offers Duck.ai, Brave offers Leo, and Kagi offers Assistant and Summarize²⁶. Even when such services are designed with privacy constraints, the interaction changes what can be observed. A vendor can learn the query, the URL or text sent for summarization, the page selected for explanation, and the follow up questions that reveal what the reader considered important.

The conflict over ad blockers intensifies this sorting. Chrome’s Manifest V3 migration removed webRequestBlocking for most extensions and pushed blocking extensions toward declarativeNetRequest²⁷. Google’s extension timeline records Manifest V2 being disabled for ordinary users in 2025²⁸.

For technically savvy audiences, the practical response is migration toward privacy oriented Chromium forks, such as Ungoogled Chromium²⁹, or toward the Firefox ecosystem and its own forks.

The distinction barely matters for ranking telemetry: both paths make browsing less visible to Chrome derived datasets, and the sites these readers value are often precisely the small, noncommercial, and unique ones. A technical niche can therefore become doubly invisible: its publishers do not instrument users, and its readers leave the browser whose telemetry feeds the measurement layer.

This produces the privacy paradox in its shortest form: every choice can be reasonable, yet their combination creates a structural absence in a ranking system that treats observed behavior as evidence of usefulness.

Monetization Compatibility

The second wall is monetization compatibility. The commercial web is rich in signals: advertising impressions, analytics events, search engine optimization campaigns, sponsored distribution, public relations, social amplification, affiliate incentives, and institutional backlinks. These signals are not merely decorative. They create the surrounding evidence by which relevance, authority, popularity, and demand become legible to search systems.

The small web is structurally weaker on this axis. Independent research, hobbyist expertise, small technical archives, dissenting analysis, and volunteer documentation often lack budgets for SEO, publicity, paid discovery, or continuous content operations. Their value is concentrated in the document itself, not in the commercial machinery around it.

Google’s market position makes this more than an ordinary ranking imperfection. United States federal courts have found that Google unlawfully maintained monopolies in general search and search text advertising, and in open web advertising technology markets³⁰.

The legal findings matter because the same company controls the dominant search engine, the dominant browser, major advertising infrastructure, a major mobile platform, YouTube, and Gemini. When the measurement layer and the monetization layer sit inside the same corporate structure, the visible web tends to converge with the profitable web.

This does not require a simple claim that using Google advertising directly buys organic ranking. The mechanism is broader. Sites built for commercial capture usually generate the surrounding evidence that search systems can read: attention, links, returning users, structured marketing data, and institutional recognition. Sites built for independence generate less of that evidence. The ranking system can remain formally neutral while the signal economy is not neutral at all.

State Removal and Private Editorial Power

The third wall is state removal. Governments can ask Google to remove content from search, YouTube, Blogger, Play, and other services. Google publishes a Transparency Report for these requests, which makes this the most visible form of exclusion³¹.

Visibility, however, does not make the mechanism harmless. A search index with global reach must decide how to handle local illegality, national security claims, defamation complaints, hate speech laws, election rules, copyright demands, and court orders. Each decision shifts the practical boundary of accessible knowledge.

The fourth wall is private editorial power. Unlike government removal, private intervention is much harder to observe. Public reporting has documented manual adjustments, blacklists, and special handling for sensitive search surfaces³².

Some interventions may be justified as spam control, fraud prevention, election integrity, child safety, copyright compliance, or protection against coordinated manipulation. The problem is not that every intervention is illegitimate. The problem is that the public cannot see the criteria, the complainants, the internal deliberation, or the failed appeals.

This asymmetry creates a hierarchy of audibility. States have formal channels. Large corporations have counsel, lobbyists, and platform contacts. Major publishers can create reputational pressure. Organized activists can create public controversy. Independent authors, small scale researchers, and privacy preserving communities usually have none of these channels. They can be excluded by policy, by silence, or by the ordinary indifference of a platform too large to answer them.

The Convergence

These mechanisms do not operate as a checklist. They compound into a Catch-22: a page needs measurable traffic to earn visibility, but it needs visibility to produce the measurable traffic that ranking systems treat as evidence. Privacy preserving and noncommercial publication starts at the disadvantaged end of that loop: weak telemetry, weak commercial trace, and little institutional leverage.
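
The loop can be made concrete with a toy model. This is a sketch under loud assumptions, not a description of any real ranking system: each page has a fixed amount of direct traffic, visits below a popularity threshold are invisible to the measurement layer, and search traffic is split in proportion to measured visits only. The function name `simulate` and every number are invented for illustration.

```python
def simulate(direct, threshold, rounds, search_audience=1000.0):
    """Toy visibility loop (illustrative only).

    direct: baseline visits each page receives without search, assumed constant.
    threshold: visits below this value are not measured at all.
    search_audience: search visits per round, split by measured visits.
    """
    visits = list(direct)
    for _ in range(rounds):
        # Pages under the threshold contribute no telemetry.
        measured = [v if v >= threshold else 0.0 for v in visits]
        total = sum(measured)
        if total == 0:
            break  # nothing is measurable, so no search traffic is allocated
        # Search traffic follows measured visits; direct traffic is unchanged.
        visits = [d + search_audience * m / total for d, m in zip(direct, measured)]
    return visits


pages = simulate(direct=[40.0, 25.0, 5.0], threshold=20.0, rounds=10)
print(pages)
```

In this model the third page never leaves its direct traffic level, because it never produces the measurement that would earn it search traffic. Raising its direct traffic to the threshold is enough to let it join the loop, which is the Catch-22 in miniature.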

This is why the garden is difficult to perceive from the inside. A user who searches Google sees results, snippets, knowledge panels, maps, videos, and LLM summaries. The experience feels abundant. Missing material has no visual form. A search result page does not show the documents that were never indexed, the sites below telemetry thresholds, the pages removed after complaints, or the knowledge sources that never became visible enough to train future systems. Absence is rendered as completion.

LLM Amplification

The stakes change once search becomes an input to LLMs. LLMs are trained on web documents, and systems grounded by search use live search results to attach current information to generated answers. Google’s Gemini 1.5 technical report states that its pretraining data includes “web documents”, and Google Cloud documentation describes grounding Gemini responses with Google Search³³.

This is not merely dependence on inherited search indexes. LLM companies build their own crawlers around the same “crawl everything” ambition at web scale: discover as much public web as policy, robots rules, and economics permit; classify pages; decide which sources may be used to train models; decide which sources may ground answers; then expose the filtered corpus through a conversational interface.

OpenAI’s crawler documentation is a clean example because it defines separate robots for search visibility and training data discovery: OAI-SearchBot is used for ChatGPT search, while GPTBot is used for content that may enter foundation model training³⁴. OpenAI’s ChatGPT Atlas shows the same direction from the browser side: a Chromium based browser with ChatGPT built in, where browsed content can be used for model training when the user enables the relevant training controls, while pages that opt out of GPTBot are excluded from that training path³⁵.
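
The split between the two crawlers can be sketched from the publisher's side. The example parses a hypothetical robots.txt with Python's standard `urllib.robotparser` and checks what each user agent token may fetch. The policy text, the `example.org` URL, and the `crawler_access` helper are assumptions for illustration; how OpenAI's crawlers actually interpret a given file is defined by OpenAI's documentation, not by this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: opt out of training crawls, stay visible in search.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""


def crawler_access(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Return, per user agent token, whether the policy permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


access = crawler_access(
    ROBOTS_TXT,
    "https://example.org/notes/",
    ["GPTBot", "OAI-SearchBot"],
)
print(access)
```

Under this policy the page stays eligible for search visibility while being withheld from the training crawl. The choice only exists for publishers who know both tokens and maintain a robots.txt at all, which is itself a form of participation in the vendor's machinery.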

The search index and the training corpus are not identical, but search is a major discovery, filtering, and grounding layer. Systematic exclusions in search therefore become easier to reproduce inside LLM systems.

The effect is larger than one company’s model. Search engines have long served as discovery infrastructure for crawlers, researchers, ranking tools, and downstream information systems. When Google reduces access to deeper result sets, as happened with the widely reported removal or disablement of the num=100 search parameter in September 2025, the long tail becomes harder to inspect at scale³⁶. Systems that depend on Google Search as a discovery layer inherit the shape of Google’s visibility boundary.

For LLMs, the consequence is not only that some pages receive less traffic. The deeper consequence is epistemic homogenization. If models learn from and retrieve through a web filtered by surveillance compatibility, commercial signal production, state pressure, and private editorial judgment, then model outputs will overrepresent the same surfaces. The LLM mediated information layer will appear broad while narrowing toward mainstream, institutionally legible, commercially amplified sources.

This matters most for knowledge that is rare rather than popular: primary source reconstruction, independent technical research, unfashionable legal analysis, small language documentation, local history, dissenting institutional critique, and other work that may be valuable precisely because it is not produced inside the machinery of attention.

The Walled Garden

Google is the main case in this article, but the garden takes a separate form for each vendor. Each one combines its own discovery system, client software, network surface, assistant layer, monetization model, and policy process into a distinct information bubble. Google’s version has four walls: surveillance compatibility, monetization compatibility, state removal, and private editorial power.

The first rewards sites and audiences that can be observed, including through operating system telemetry, browser telemetry, DNS, CDN, VPN, browser assistant, and LLM summarization surfaces. The second rewards the commercial signal economy. The third reflects formal legal and political pressure. The fourth reflects opaque platform judgment and private influence.

Each wall can be defended in isolation. User behavior can improve relevance; commercial signals can correlate with legitimacy; state removal can enforce law; editorial intervention can reduce abuse. The danger lies in their combination under monopoly conditions. Together they transform the open web into a filtered environment whose boundaries are invisible to the ordinary user.

Afterword

The old promise of the web was that publication and findability could be separated from institutional permission. A person, small group, or independent organization could publish knowledge, and a search engine could make it discoverable to anyone who needed it. That promise did not require perfect equality of attention, but it required a meaningful path from public availability to public discovery.

The surveilled web breaks that path. Visibility increasingly belongs to content that produces behavioral telemetry, commercial traces, legal safety, and editorial acceptability. Content outside those conditions may remain online while becoming absent from the practical information layer. Once LLMs are trained on and grounded through that layer, the exclusion no longer affects only today’s search traffic³⁷. It shapes the knowledge that future systems can retrieve, summarize, and treat as real.

The walled garden is therefore not merely a search problem. It is a problem of epistemic infrastructure. A web ordered by surveillance, monetization, state pressure, and opaque editorial judgment cannot truthfully present itself as the open web. It is a curated garden with private walls.

The newer pressure is that legal and fraud risk can make the price of admission literal. A legitimate site that must prove age, detect bots, prevent fraud, satisfy payment rules, or respect jurisdictional duties may find it safer to accept only traffic that arrives with claims certified or notarized by a platform vendor. Utah’s S.B. 142 is a clean example of that direction because its definition of an app store is not limited to Apple or Google storefronts. It covers a publicly available website, software application, or electronic service that allows users to download third party apps onto a mobile device; within that distribution channel, the provider verifies age category and shares age category and parental consent status with developers, while developers verify that status through the app store’s data sharing methods³⁸. The exact duty varies by country, state, and content category, but the incentive points toward vendor authenticated access.

Nobody has to force users into one garden or several gardens for this to matter. The pressure can be indirect. Banks, government portals, airlines, booking sites, ticketing systems, marketplaces, and other services with high liability may decide that uncertified traffic is too expensive to handle because it carries legal, fraud, bot, or abuse risk. The result would be an Internet of gardens: not a formal ban on the open web, but a practical rule that important services work only through vendor controlled browsers, operating systems, wallets, relays, or attestation channels.

¹ StatCounter, Search Engine Market Share Worldwide, reported Google at 90.04% worldwide search engine share for April 2026; Google Search Central, In-depth guide to how Google Search works, describes Search as an automated crawler based system that discovers pages, downloads content, indexes it, and serves results.
² Microsoft Support, How Bing delivers search results, describes Bing crawling the web, building an index, and ranking results; Yandex Webmaster, How does Yandex search work?, describes crawling, indexing, database construction, and result generation; Baidu Help Center, About Baiduspider, describes Baiduspider as the crawler that creates Baidu’s index; IndexNow, Documentation for search engines, describes participating search engines as having noticeable presence in at least one market.
³ DuckDuckGo, Where do DuckDuckGo search results come from?, describes its use of Bing for traditional links and images, plus DuckDuckBot, internal indexes, and specialised sources.
⁴ Kagi, Search Sources, describes its own indexes, external provider calls, specialised engines, and small web initiatives.
⁵ Brave, Brave Search removes last remnant of Bing from search results page, describes Brave Search as using its own index for web search results.
⁶ Brave, Brave Search beta now available in Brave browser, June 22, 2021, describes Brave Search as available in Brave Browser and built on an independent index.
⁷ DuckDuckGo, Does DuckDuckGo make a browser?, describes the DuckDuckGo Browser for Mac, Windows, iOS, and Android; DuckDuckGo, What is DuckDuckGo VPN?, describes its subscription VPN and browser based management.
⁸ Kagi, Orion Browser by Kagi, describes Orion as a privacy focused WebKit browser with zero telemetry.
⁹ Ahrefs, Where do you get the data from?, states that Yep maintains its index using a separate crawler called YepBot.
¹⁰ Wiby, Build your own Search Engine, states that Wiby is not meant to index the entire web and prefers human submissions.
¹¹ Marginalia Search, About Marginalia Search, describes itself as an independent open source Internet search engine focused on discovery for the free and independent web.
¹² Kagi, Kagi Small Web, maintains a public feed list whose inclusion rules reject automated, LLM generated, and spam content.
¹³ Marginalia Search, Submit websites to be crawled by Marginalia Search, documents a public GitHub based site submission process.
¹⁴ Wiby, Submit to the Wiby Web, describes page level submission rules and states that in most cases only the submitted page will be crawled.
¹⁵ Rand Fishkin, SparkToro, An Anonymous Source Shared Thousands of Leaked Google Search API Documents with Me; Everyone in SEO Should See Them, May 27, 2024; Mike King, iPullRank, Secrets from the Google Algorithm Leak: Search’s Internal Engineering Documentation and What it Means, May 2024. These sources describe the leaked Content API Warehouse material, including NavBoost, chromeInTotal, and Chrome related fields; they are evidence of collection and system design, not a public statement of exact ranking weights.
¹⁶ Google Chrome Developers, Overview of CrUX and CrUX methodology, describe CrUX as field data from Chrome users, state that Google Search uses it for the page experience ranking factor, and note that origins and pages below the popularity threshold are not included.
¹⁷ Google Android Help, Share usage & diagnostics information with Google, describes Android usage and diagnostics data, including app use frequency and network connection quality; Google Account Help, Find & control your Web & App Activity, describes Web & App Activity and the optional inclusion of Chrome history and activity from sites, apps, and devices that use Google services.
¹⁸ Google for Developers, About PageSpeed Insights, describes PSI as using CrUX real user data and Lighthouse lab data; Chrome for Developers, Eliminate render-blocking resources, describes Lighthouse flagging render blocking scripts and stylesheets, and recommending inlining, deferring, or removing resources.
¹⁹ On centralized DNS privacy concerns, see Enforcing DNS-over-TLS on Local DNS Resolver with Random Upstream, especially the “Privacy Concerns” section, which cites Geoff Huston and Bert Hubert on the privacy consequences of browser and application DNS centralization.
²⁰ Apple Developer, Prepare your network or web server for iCloud Private Relay, describes Private Relay as validating that the client is an Apple device, that the customer has a valid iCloud+ subscription, and that relay IP addresses represent coarse location.
²¹ Google Pixel Phone Help, Connect to VPN by Google on your Pixel device, documents Pixel VPN account and device eligibility requirements.
²² Android Developers, Integrity verdicts, documents Play Integrity verdicts for app, device, and account state.
²³ Apple Developer, Establishing your app’s integrity, describes App Attest as certifying that a key belongs to a valid instance of an app.
²⁴ Apple Developer, Get started with the Verify with Wallet API, describes age and identity verification through IDs stored in Apple Wallet, including “Age Over N Flag” and age in years as available request data.
²⁵ Google for Developers, Verify with Google Wallet, describes online requests for verifiable proof of identity and age from Google Wallet or another compliant wallet; the Online Acceptance of Digital Credentials guide uses age_over_18 as an example credential request, not as a universal age threshold.
²⁶ DuckDuckGo, Duck.ai, describes private conversations with third party LLM chat models and text summarization; Brave, Brave Leo, describes a browser assistant that can summarize pages, translate, analyze text, and chat with a tab; Kagi, Kagi Assistant and Kagi Summarize, describe LLM assistant and summarization surfaces.
²⁷ Google Chrome Developers, chrome.webRequest, states that Manifest V3 no longer makes webRequestBlocking available to most extensions and points developers toward declarativeNetRequest.
²⁸ Google Chrome Developers, Manifest V2 support timeline, states that Manifest V2 was disabled for all Chrome channels with Chrome 138 on July 24, 2025 and ceases to function for users upgrading to Chrome 139 and later.
²⁹ The Ungoogled Chromium project describes itself as Chromium without dependency on Google web services and as a drop in Chromium replacement with privacy, control, and transparency changes; see ungoogled-software/ungoogled-chromium.
³⁰ The U.S. Department of Justice, Department of Justice Wins Significant Remedies Against Google, September 2, 2025, summarizes the search monopoly remedies after the August 2024 liability ruling; the Department also reported its ad tech victory in Department of Justice Prevails in Landmark Antitrust Case Against Google, April 17, 2025.
³¹ Google, Government requests to remove content, reports formal state requests and Google’s responses across product areas and jurisdictions.
³² The Wall Street Journal, How Google Interferes With Its Search Algorithms and Changes Your Results, November 15, 2019; Mike Wacker, Google’s Manual Interventions in Search Results, describes internal blacklists and manual intervention mechanisms using public reporting and leaked internal material.
³³ Gemini Team, Gemini 1.5: Unlocking multimodal understanding across millions of tokens, arXiv:2403.05530, describes web documents as part of the pretraining data mixture; Google Cloud, Grounding with Google Search, documents the use of Google Search as a grounding source for Gemini responses.
³⁴ OpenAI, Overview of OpenAI Crawlers, describes OAI-SearchBot as controlling appearance in ChatGPT search results and GPTBot as crawling content that may be used for foundation model training.
³⁵ OpenAI, Introducing ChatGPT Atlas, describes ChatGPT Atlas as a browser with ChatGPT built in; OpenAI Help Center, Setting up the Atlas browser, describes Atlas as a Mac browser built on Chromium; OpenAI Help Center, ChatGPT Atlas - Data Controls and Privacy, describes model training controls for browsed content and the GPTBot opt out.
³⁶ Search Engine Journal, Google Modifies Search Results Parameter, Affecting SEO Tools, September 15, 2025; Botify, What Google’s Removal of num=100 Means for Your Brand, October 15, 2025, describe the disruption caused by Google’s removal or disablement of the 100 results per page parameter.
³⁷ The practical trigger for writing this version was mundane: about a month before publication, this site disappeared from Google’s index entirely. That incident is not used here as proof; it is only a local example of the larger visibility problem.
³⁸ Utah Legislature, S.B. 142, App Store Accountability Act, signed March 26, 2025, defines an app store as a publicly available website, software application, or electronic service that allows users to download third party apps onto a mobile device; it requires app store providers to verify age categories, provide age category and parental consent status to developers, and requires developers to verify age category and consent status through the app store’s data sharing methods.