How much of the crawled web is a JavaScript shell?
Sample of 1,041,733 real (200 text/html) pages from CC-MAIN-2026-08 (88 WARC files, 12,000/file) · generated 2026-06-21
| empty app mount (threshold-free) | 0.45% |
| text-based shells (<300 chars + marker) | 1.21% |
| content in inline JSON (not a shell) | 0.9% |
A shell is still plenty of HTML: shells average 52.9 KB of raw HTML (bundles, markup) versus 137.8 KB across all pages, but almost none of it is readable text. Pages whose content actually sits in inline JSON (Next.js __NEXT_DATA__ and friends) are counted as content-present, not shells.
Year over year (same February crawl, 2025 vs 2026)
The threshold-free empty-mount rate rose from 0.38% (2025, 1,175,276 pages, 40 files) to 0.45% (2026, 1,041,733 pages, 88 files), +18% relative. The text-based estimate moved 0.94% → 1.21% over the same window. Both signals point the same way: JavaScript shells are a growing share of the crawled web. The jump is sharpest among popular sites: the top-1k shell rate went 1.6% → 2.47%.
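The relative changes quoted above can be reproduced with a few lines of arithmetic. This is only a sketch: the function name `relative_change` is illustrative, and the inputs are the rounded percentages printed in this report, so the results are approximate.

```python
def relative_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# (2025 rate, 2026 rate) in percent, as published in this report.
rates = {
    "empty app mount": (0.38, 0.45),
    "text-based": (0.94, 1.21),
    "top-1k shell rate": (1.6, 2.47),
}

for name, (y2025, y2026) in rates.items():
    change = relative_change(y2025, y2026)
    print(f"{name}: {y2025}% -> {y2026}% ({change:+.0f}% relative)")
```

The first line reproduces the +18% figure. The other two signals rose by roughly +29% and +54% relative on the same basis.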
How the text-based estimate depends on the cutoff (sensitivity)
The text-based rate moves with the cutoff, so a single "tiny text" cutoff isn't trustworthy on its own: raise it and you sweep in short real pages that use a framework. The headline number to trust is the empty app mount (0.45%), which needs no cutoff: a server-rendered page fills its mount, a client-rendered one leaves it empty.
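A minimal sketch of the empty-mount idea, using only Python's standard `html.parser`. It is not the survey's actual detector: the mount list is an assumption (the report names `<div id="root">` and `<app-root>`, and `id="app"` is added here for illustration), and the names `MountScanner` and `has_empty_mount` are hypothetical.

```python
from html.parser import HTMLParser

# Assumed mount points. The report names <div id="root"> and <app-root>;
# "app" is an illustrative addition, not taken from the survey.
MOUNT_IDS = {"root", "app"}
MOUNT_TAGS = {"app-root"}


class MountScanner(HTMLParser):
    """Counts mount containers that close without any child or text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.open_mount = None  # tag name of the mount currently open
        self.empty_mounts = 0
        self.filled_mounts = 0

    def handle_starttag(self, tag, attrs):
        if self.open_mount is not None:
            # A child element means the server put something in the mount.
            self.filled_mounts += 1
            self.open_mount = None
            return
        is_div_mount = tag == "div" and dict(attrs).get("id") in MOUNT_IDS
        if is_div_mount or tag in MOUNT_TAGS:
            self.open_mount = tag

    def handle_data(self, data):
        # Whitespace alone does not count as content.
        if self.open_mount is not None and data.strip():
            self.filled_mounts += 1
            self.open_mount = None

    def handle_endtag(self, tag):
        # The mount closed before any child or text appeared.
        if self.open_mount == tag:
            self.empty_mounts += 1
            self.open_mount = None


def has_empty_mount(html: str) -> bool:
    scanner = MountScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.empty_mounts > 0


shell = '<body><div id="root"></div><script src="/bundle.js"></script></body>'
rendered = '<body><div id="root"><h1>Hello</h1></div></body>'
print(has_empty_mount(shell), has_empty_mount(rendered))
```

Note that no text-length threshold appears anywhere in the check. This sketch treats any child element, including a `<noscript>` fallback, as filling the mount, which a production detector would likely need to handle differently.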
Confirmed shells by framework (% of all crawled pages)
| Unattributed SPA | 5,904 | 0.57% |
| jQuery (onload) | 2,962 | 0.28% |
Framework prevalence across all crawled pages (% of pages)
Shell rate by site popularity (Majestic rank of the domain)
| top 1k | 2.47% | 60,292 pages |
| 1k-10k | 2.76% | 65,157 pages |
| 10k-100k | 2.06% | 77,958 pages |
| 100k-1M | 1.21% | 139,061 pages |
| unranked | 0.86% | 699,265 pages |
% = share of that tier's crawled pages that are confirmed shells. The counts are pages, not sites: one popular domain contributes many crawled pages, so the top-1k tier still holds tens of thousands of pages. The tiers partition the whole sample by the Majestic rank of each page's domain.
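As a consistency check on the tier figures, the sketch below sums the tier page counts and computes the page-weighted shell rate. The variable names are illustrative, and because the per-tier rates are rounded to two decimals the estimated shell count is approximate.

```python
# (shell rate in percent, crawled pages) per Majestic-rank tier, from this report.
tiers = {
    "top 1k": (2.47, 60_292),
    "1k-10k": (2.76, 65_157),
    "10k-100k": (2.06, 77_958),
    "100k-1M": (1.21, 139_061),
    "unranked": (0.86, 699_265),
}

total_pages = sum(pages for _, pages in tiers.values())
est_shells = sum(rate / 100 * pages for rate, pages in tiers.values())
overall_rate = est_shells / total_pages * 100

print(f"pages across tiers: {total_pages:,}")
print(f"page-weighted shell rate: {overall_rate:.2f}%")
```

The tier counts add up to the full sample of 1,041,733 pages, and the weighted rate comes out at about 1.21%. That matches the text-based headline, which suggests (but does not by itself confirm) that "confirmed shells" in the tier breakdown refers to the text-based measure.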
Top registered domains among confirmed shells
| pixnet.net | 188 |
| gov.co | 112 |
| com.br | 105 |
| go.jp | 99 |
| qq.com | 92 |
| co.uk | 90 |
| tvp.pl | 83 |
| europa.eu | 81 |
| parktons.com | 79 |
| imgur.com | 79 |
| co.jp | 77 |
| discord.com | 76 |
| bsky.app | 74 |
| co.kr | 67 |
| siemens.com | 60 |
| gov.au | 58 |
| usda.gov | 56 |
| com.au | 53 |
| ligazakon.net | 48 |
| err.ee | 41 |
Example shells (visible-text chars · URL)
Method. The survey streams Common Crawl WARC files and keeps only 200 text/html responses. It reports two shell measures. (1) Empty app mount: the page's framework container (<div id="root">, <app-root>, etc.) is present but empty in the captured HTML. This is the definitive signature of a page rendered on the client rather than the server, it needs no text threshold, and it is the trustworthy headline. (2) Text-based: visible text falls under a cutoff AND the page carries a client-render signature; this measure is reported with a sensitivity sweep because the cutoff matters. Content held in inline JSON (Next.js __NEXT_DATA__) counts as content-present, not a shell. Detection works from raw HTML, so the by-framework split is approximate, and the rates are sample estimates that are conservative by design. The exact list of WARC files used is in cc-shell-survey.json for independent spot-checking.
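The text-based measure and its sensitivity sweep might look roughly like the sketch below. Everything specific here is an assumption made for illustration: the marker strings, the set of skipped tags, the sweep cutoffs other than 300, and the names `visible_chars`, `is_text_shell`, and `sweep`. The survey's actual signatures are not given in this report.

```python
from html.parser import HTMLParser

SKIP_TAGS = {"script", "style", "noscript", "template"}
# Hypothetical client-render signatures; the real list is not published here.
CLIENT_MARKERS = ('id="root"', 'id="app"', "<app-root")
# Inline-JSON payload marker named in the report (Next.js).
INLINE_JSON_MARKERS = ('id="__NEXT_DATA__"',)


class VisibleText(HTMLParser):
    """Collects text outside script/style/noscript/template elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def visible_chars(html: str) -> int:
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    # Collapse whitespace runs so indentation does not inflate the count.
    return len(" ".join("".join(parser.chunks).split()))


def is_text_shell(html: str, cutoff: int = 300) -> bool:
    # Simplification: presence of the inline-JSON marker counts as content.
    if any(marker in html for marker in INLINE_JSON_MARKERS):
        return False
    has_marker = any(marker in html for marker in CLIENT_MARKERS)
    return has_marker and visible_chars(html) < cutoff


def sweep(pages, cutoffs=(100, 300, 1000)):
    """Share of pages flagged as shells at each cutoff."""
    return {c: sum(is_text_shell(p, c) for p in pages) / len(pages) for c in cutoffs}


pages = [
    '<head><title>App</title></head><div id="root"></div><script src="/main.js"></script>',
    '<div id="app"><p>' + "word " * 100 + "</p></div>",  # short real page
    '<div id="root"></div><script id="__NEXT_DATA__" type="application/json">{"props":{}}</script>',
    "<p>hello</p>",  # static page, no client-render marker
]
print(sweep(pages))
```

On this toy input the flagged share stays at one page in four for cutoffs of 100 and 300, then doubles at 1000 because the short real page (499 visible characters) gets swept in. That is the cutoff sensitivity the method describes.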