How much of the crawled web is a JavaScript shell?

Sample of 1,041,733 real (200 text/html) pages from CC-MAIN-2026-08 (88 WARC files, up to 12,000/file) · generated 2026-06-21
pages sampled: 1,041,733
empty app mount (threshold-free): 0.45%
text-based shells (<300c + marker): 1.21%
content in inline JSON (not a shell): 0.9%
A shell is still plenty of HTML: shells average 52.9 KB of raw HTML (bundles, markup) versus 137.8 KB across all pages, but almost none of it is readable text. Pages whose content actually sits in inline JSON (Next.js __NEXT_DATA__ and friends) are counted as content-present, not shells.

Year over year (same February crawl, 2025 vs 2026)

empty mount, 2025: 0.38%
empty mount, 2026: 0.45%
change: +18%
The threshold-free empty-mount rate rose from 0.38% (2025: 1,175,276 pages, 40 files) to 0.45% (2026: 1,041,733 pages, 88 files), a relative increase of about 18%. The text-based estimate moved from 0.94% to 1.21% over the same window, about 29% relative. Both signals point the same way: JavaScript shells are a growing share of the crawled web. The jump is sharpest among popular sites: the top-1k shell rate went from 1.6% to 2.47%, about 54% relative.
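The relative changes can be recomputed from the published rates. The sketch below is only arithmetic on the rounded percentages quoted above; the function and variable names are illustrative, not part of the survey code.

```python
def relative_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0

# Rates as published (rounded), in percent of sampled pages.
empty_mount = relative_change(0.38, 0.45)  # threshold-free empty mount
text_based = relative_change(0.94, 1.21)   # text-based estimate (<300c + marker)
top_1k = relative_change(1.60, 2.47)       # shell rate in the top-1k tier

for label, value in [
    ("empty mount", empty_mount),
    ("text-based", text_based),
    ("top 1k", top_1k),
]:
    print(f"{label}: {value:+.0f}%")
```

Because the inputs are already rounded to two decimals, the results are good to roughly a percentage point, not more.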

How the text-based estimate depends on the cutoff (sensitivity)

< 50 chars: 0.44%
< 100 chars: 0.72%
< 200 chars: 0.99%
< 300 chars: 1.21%
< 500 chars: 1.76%
< 1000 chars: 3.57%
The estimate grows roughly eightfold between the 50- and 1,000-character cutoffs (0.44% to 3.57%). This is why a single "tiny text" cutoff isn't trustworthy on its own: raise it and you sweep in short real pages that use a framework. The headline number to trust is the empty app mount (0.45%), which needs no cutoff: a server-rendered page fills its mount, a client-rendered one leaves it empty.
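A cutoff sweep of this kind can be sketched in a few lines. This is a minimal illustration, assuming each page has already been reduced to a pair (visible-text length, has client-render marker); the `sweep` function and the toy data are hypothetical, not the survey's own code or results.

```python
from typing import Iterable

CUTOFFS = (50, 100, 200, 300, 500, 1000)

def sweep(pages: list[tuple[int, bool]],
          cutoffs: Iterable[int] = CUTOFFS) -> dict[int, float]:
    """Percent of pages with visible text under each cutoff AND a marker."""
    total = len(pages)
    return {
        cutoff: 100.0 * sum(1 for chars, marker in pages
                            if marker and chars < cutoff) / total
        for cutoff in cutoffs
    }

# Toy input: (visible-text chars, has client-render marker). Not real data.
pages = [
    (0, True), (40, True), (250, True), (800, True),
    (120, False), (5000, True), (9000, False), (30, False),
]
rates = sweep(pages)
for cutoff, rate in rates.items():
    print(f"< {cutoff} chars: {rate:.1f}%")
```

The rate can only grow as the cutoff rises, so the sweep shows how much of the estimate comes from the choice of cutoff rather than from the pages themselves.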

Confirmed shells by framework (% of all crawled pages)

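Unattributed SPA: 5,904 pages (0.57%)
jQuery (onload): 2,962 pages (0.28%)
Next.js: 1,272 pages (0.12%)
AngularJS: 774 pages (0.07%)
Angular: 766 pages (0.07%)
React: 464 pages (0.04%)
Vue: 213 pages (0.02%)
Svelte: 99 pages (0.01%)
Nuxt: 68 pages (0.01%)
Preact: 23 pages (<0.01%)
Ember: 13 pages (<0.01%)
SolidJS: 2 pages (<0.01%)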
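As a consistency check, the per-framework counts can be summed and compared with the headline text-based rate. The sketch assumes each confirmed shell is attributed to exactly one framework bucket, which the report does not state explicitly; the variable names are illustrative.

```python
SAMPLE_PAGES = 1_041_733

# Confirmed-shell counts per framework, as listed above.
confirmed_shells = {
    "Unattributed SPA": 5904,
    "jQuery (onload)": 2962,
    "Next.js": 1272,
    "AngularJS": 774,
    "Angular": 766,
    "React": 464,
    "Vue": 213,
    "Svelte": 99,
    "Nuxt": 68,
    "Preact": 23,
    "Ember": 13,
    "SolidJS": 2,
}

total_shells = sum(confirmed_shells.values())
share = 100.0 * total_shells / SAMPLE_PAGES
print(f"{total_shells:,} confirmed shells = {share:.2f}% of sampled pages")
```

The counts add up to 12,560 pages, or 1.21% of the sample, which matches the text-based shell rate at the 300-character cutoff.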

Framework prevalence across all crawled pages (% of pages)

jQuery: 698,299 pages (67.03%)
Bootstrap: 195,843 pages (18.8%)
Next.js: 24,988 pages (2.4%)
React: 21,902 pages (2.1%)
Vue: 16,384 pages (1.57%)
Nuxt: 7,183 pages (0.69%)
AngularJS: 5,363 pages (0.51%)
Angular: 4,800 pages (0.46%)
Svelte: 1,580 pages (0.15%)
Preact: 367 pages (0.04%)
Ember: 315 pages (0.03%)
SolidJS: 97 pages (0.01%)

Shell rate by site popularity (Majestic rank of the domain)

top 1k: 2.47% of 60,292 pages
1k-10k: 2.76% of 65,157 pages
10k-100k: 2.06% of 77,958 pages
100k-1M: 1.21% of 139,061 pages
unranked: 0.86% of 699,265 pages
% = share of that tier's crawled pages that are confirmed shells. The counts are pages, not sites: one popular domain contributes many crawled pages, so the top-1k tier still holds tens of thousands of pages. The tiers partition the whole sample by the Majestic rank of each page's domain.
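Because the tiers partition the sample, a page-weighted average of the tier rates should land close to the overall shell rate. The sketch below checks that using the rounded tier figures above; the names are illustrative.

```python
# (tier, shell rate in percent, pages in tier), as published.
tiers = [
    ("top 1k", 2.47, 60_292),
    ("1k-10k", 2.76, 65_157),
    ("10k-100k", 2.06, 77_958),
    ("100k-1M", 1.21, 139_061),
    ("unranked", 0.86, 699_265),
]

total_pages = sum(pages for _, _, pages in tiers)
# Estimated shell pages per tier, reconstructed from the rounded rates.
estimated_shells = sum(rate / 100.0 * pages for _, rate, pages in tiers)
overall_rate = 100.0 * estimated_shells / total_pages

print(f"pages across tiers: {total_pages:,}")
print(f"page-weighted shell rate: {overall_rate:.2f}%")
```

The tier page counts sum to the full sample of 1,041,733, and the weighted rate comes out at about 1.21%. The unranked tier holds roughly two thirds of the pages, so it dominates the overall figure even though its own rate is the lowest.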

Top registered domains among confirmed shells

pixnet.net: 188
gov.co: 112
com.br: 105
go.jp: 99
qq.com: 92
co.uk: 90
tvp.pl: 83
europa.eu: 81
parktons.com: 79
imgur.com: 79
co.jp: 77
discord.com: 76
bsky.app: 74
co.kr: 67
siemens.com: 60
gov.au: 58
usda.gov: 56
com.au: 53
ligazakon.net: 48
err.ee: 41

Example shells (visible-text chars · URL)

jQuery (onload)
119c http://blog.zorangagic.com/2018/12/investment-compound-interest-formula-to.html
107c http://jejutheatre.com/bbs/password.php?w=x&bo_table=tl_gallery&comment_id=35&page=
209c http://mathstem.pbf.hr/?page_id=493
83c http://remarketp.co.kr/bbs/board.php?bo_table=tel&wr_id=374&sst=wr_datetime&sod=asc&sop=and&page=12
Unattributed SPA
58c http://certusfoodsafety.com/product-finder128d.html?field_onsale=1&search_api_fulltext=&sort_by=nid&sort_order=ASC&states=0&search_key=1
0c http://kearnymesachryslerdodgejeepram.com/lander
117c http://www.donsvintagecats.com/groupsleds.html
227c http://www.tomdeater.com/photos/kathyha/
React
102c http://sophrometz.com/
66c https://my.matterport.com/show/?m=tgLa1QrHaSy
13c https://play.hubspotvideo.com/v/53/id/194333715110?playButtonColor=ff4800&renderContext=onload-placeholder&parentOrigin=https%3A%2F%2Fbr.hubspot.com&pageId=194851552376&locale=pt-br
23c https://powered-by-13676465520.us-west1.run.app/
AngularJS
8c http://www.esehospitaldesantotomas-atlantico.gov.co/noticias/ansiedad
133c https://bip.malopolska.pl/mopsiwrdt,m,300077,zarzadzenia-dyrektora-mopsiwr.html
210c https://code.google.com/archive/p/gdata-scala-client
44c https://fama.us.es/discovery/fulldisplay/alma991005475999704987/34CBUA_US:VU1
Angular
36c https://axiompublishers.scholasticahq.com/articles?tag=academic%20development
184c https://clinicaltrials.gov/study/NCT02846857
13c https://lasteekraan.err.ee/1609912169/luise-ja-oliver
12c https://michollo.com/chollo-xiaomi-mijia-quitapelusas-135879/
Next.js
0c https://clubecertosaude.com.br/clubecerto?from=goods.php%3Fid%3D1307940-12014%26name%3D%E6%96%B0%E5%93%81%E6%9C%AA%E4%BD%BF%E7%94%A8%E3%80%80THE%20NORTH%20FACE%20%E3%82%B9%E3%83%AD%E3%83%BC%E3%83%A1%E3%83%A2%E3%83%AA%E3%83%BC%E3%83%8F%E3%82%A4%E3%82%AF%E3%83%9F%E3%83%83%E3%83%89%E3%80%8029%E3%8E%9D&l=29901130794000&channel=87a3f6
30c https://dexkit.com/pt/our-token/contract-addresses
62c https://johnsuming.pixnet.net/albums/216158548/photos/2151242894
42c https://novgo11.pixnet.net/albums/508829606/photos/5169849300
Svelte
281c https://infosnel.nl/namen/naam/Kecey
0c https://clubcastingandalucia.es/foro/index.php?sid=4683bc2c5b2e5234f5385914b83cf4a4
294c https://commits.toino.pt/RO/reviews/
141c https://quranicaudio.com/stream.m3u/16900
Vue
32c https://lodz.tvp.pl/71735111/swieto-sasiada-w-gminie-bedkow
46c https://bialystok.tvp.pl/77113056/trwal-final-charytatywnej-akcji-pola-nadziei
274c https://image.delivery/page/pzunped
143c https://en.ways2help.com/charity/2399/centrepoint-soho
Preact
265c https://status.serverturk.net/en/maintenance
250c https://status.chatbro.cn/incidents
160c https://support.oracle.com/knowledgefs?docId=2782341
160c https://support.oracle.com/knowledgefs?docId=3050799
Nuxt
245c https://db.auto.sohu.com/model_1553/picture_id_29771479
100c https://auth.lapresselibre.info/realms/lpl/protocol/openid-connect/auth?response_type=code&scope=openid&client_id=rue89lyon&state=e21348fb4f3dac145deeec9f03ca0c33&redirect_uri=https%3A%2F%2Fwww.rue89lyon.fr%2Fopenid-connect-authorize
10c https://retailers.eleanorrigbyhome.com/en/find-custom-furniture-in-poway-246249
17c https://app.corvee.com/loginall?utm_source=corvee-website&utm_medium=blog&utm_content=federal-tax-planning&utm_campaign=corvee-blog
Ember
37c https://catalogue-hrm.opendata.arcgis.com/items/2ebec647e6ea42e893fbd41a5a77ecb6
18c https://cabarrus-county-america-thrives-here-cabarrus.opendata.arcgis.com/datasets/458802-sid-1
15c https://geoportal.gov.mb.ca/items/420d169c9349437789d202b302bc2732
149c https://subsplash.com/greatermetrochurch/media
SolidJS
37c https://solidcars.ae/tr/sports-car-rental/lamborghini-huracan-evo-gt-celebration-1-of-36
142c https://hackernews.ryansolid.workers.dev/users/trollbridge
Method. Streams Common Crawl WARC files and keeps 200 text/html responses. Two shell measures:
(1) empty app mount: the page's framework container (<div id="root">, <app-root>, etc.) is present but empty in the captured HTML. This is the definitive signature of a page rendered on the client rather than the server. It needs no text threshold, so it is the trustworthy headline.
(2) text-based: visible text under a cutoff AND a client-render signature; reported with a sensitivity sweep because the cutoff matters.
Content hiding in inline JSON (Next.js __NEXT_DATA__) counts as content-present, not a shell. Detection is from raw HTML, so the by-framework split is approximate and the rates are sample estimates, conservative by design. The exact list of WARC files used is in cc-shell-survey.json for independent spot-checking.
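The empty-mount check can be sketched with Python's standard html.parser. This is a minimal illustration, not the survey's detector: only <div id="root"> and <app-root> come from the method description, and the other mount ids, the class name, and the rule that any child element or non-whitespace text counts as content are assumptions.

```python
from html.parser import HTMLParser

# "root" and "app-root" are named in the method; the other ids are assumptions.
MOUNT_IDS = {"root", "app", "__next", "__nuxt"}
MOUNT_TAGS = {"app-root"}

class EmptyMountDetector(HTMLParser):
    """Flags a page whose framework mount element is present but empty."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.mount_tag = None        # tag name of the mount currently open
        self.depth = 0               # nesting depth of same-named tags
        self.has_content = False     # child element or text seen in the mount
        self.empty_mount_found = False

    def handle_starttag(self, tag, attrs):
        if self.mount_tag is not None:
            self.has_content = True  # any child element counts as content
            if tag == self.mount_tag:
                self.depth += 1
            return
        if tag in MOUNT_TAGS or dict(attrs).get("id") in MOUNT_IDS:
            self.mount_tag, self.depth, self.has_content = tag, 1, False

    def handle_endtag(self, tag):
        if self.mount_tag is not None and tag == self.mount_tag:
            self.depth -= 1
            if self.depth == 0:
                if not self.has_content:
                    self.empty_mount_found = True
                self.mount_tag = None

    def handle_data(self, data):
        # Whitespace-only text does not count as content.
        if self.mount_tag is not None and data.strip():
            self.has_content = True

def has_empty_mount(html: str) -> bool:
    detector = EmptyMountDetector()
    detector.feed(html)
    detector.close()
    return detector.empty_mount_found

shell = '<html><body><div id="root"></div><script src="/bundle.js"></script></body></html>'
rendered = '<html><body><div id="root"><h1>Hello</h1></div></body></html>'
print(has_empty_mount(shell), has_empty_mount(rendered))
```

Comments inside the mount are ignored, so a mount holding only a placeholder comment still counts as empty. A production detector would need a longer list of mount signatures and a decision on how to treat mounts that contain only a loading spinner.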