Archive.org sends out crawlers, often called spiders, that follow links from page to page and save each web page to a database. The database is searchable by a site's domain. Pages behind paywalls or password logins are not stored, but much of the public internet can still be reviewed through this service.
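To make the idea of a spider concrete, here is a minimal sketch of a link-following crawler in Python. It is not Archive.org's actual crawler. The names `crawl`, `LinkExtractor`, and the `fetch` callback are made up for illustration, and the `archive` dictionary only stands in for a real database.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl starting at start_url.

    fetch(url) is a hypothetical callback that returns the page's HTML
    as a string, or None when the page cannot be read (for example,
    because it sits behind a login or paywall).
    """
    archive = {}              # url -> saved HTML, standing in for the database
    seen = {start_url}        # every URL already queued, to avoid loops
    queue = deque([start_url])

    while queue and len(archive) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue          # unreadable pages are simply not stored
        archive[url] = html

        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" suffixes.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)

    return archive
```

A real spider would also need politeness delays, robots.txt handling, and a rule for staying on (or leaving) the starting domain; those are left out here to keep the sketch short.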

It's interesting to watch a particular site develop over time and to observe its design choices and feature upgrades.
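One way to step through a site's history is to build Wayback Machine addresses by hand. The sketch below assumes the familiar `https://web.archive.org/web/<timestamp>/<site>` address pattern, where the timestamp is written as `YYYYMMDDhhmmss` and the service redirects to the capture closest to that moment. The function names are hypothetical, and `example.com` is a placeholder.

```python
from datetime import datetime

WAYBACK_PREFIX = "https://web.archive.org/web"


def snapshot_url(site, when):
    """Build the address of the capture of `site` closest to `when`."""
    stamp = when.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{stamp}/{site}"


def yearly_snapshots(site, first_year, last_year):
    """One address per year (January 1st), handy for comparing designs."""
    return [
        snapshot_url(site, datetime(year, 1, 1))
        for year in range(first_year, last_year + 1)
    ]


if __name__ == "__main__":
    for url in yearly_snapshots("example.com", 2010, 2012):
        print(url)
```

Opening the printed addresses one after another in a browser gives a rough year-by-year view of how the site changed, provided captures exist for those years.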

Ultimately, it provides the best, and arguably the only, public source for the Internet's past.

If you want to save a site locally (that is, on your personal computer) as its users see it, you can use a tool such as HTTrack, which produces a carbon copy of the site.
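As a rough sketch of what mirroring looks like in practice, the Python snippet below assembles an HTTrack command line. It assumes the `httrack` program is installed and on your PATH, and it uses only the `-O` option, which sets the output directory. The helper name `httrack_command`, the URL, and the folder name are placeholders.

```python
import shlex
import subprocess


def httrack_command(url, output_dir):
    """Return the argument list for mirroring `url` into `output_dir`."""
    return ["httrack", url, "-O", output_dir]


if __name__ == "__main__":
    cmd = httrack_command("https://example.com/", "./example-mirror")
    print(shlex.join(cmd))           # show the command before running it
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Building the command as a list rather than a single string avoids shell quoting problems when the URL contains characters such as `&` or `?`.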

More advanced web sites, such as "progressive web sites", are not captured by either the Wayback Machine or HTTrack.

For those who don't want their site archived by this service, there is a way to delete your records and remove your site from future crawls. Amolith has written detailed instructions for opting out.
