howto: web page archiving rev 10 julo 2019 ....................................................... The Wayback Machine - web.archive.org * saving a page on the WayBack Machine: - go to web.archive.org and put the url in "Save Page Now" box (scroll down). - use bookmarklet from your browser. - in browser type web.archive.org/save/https://example.com more info: - https://help.archive.org/hc/en-us/articles/360001513491-Save-Pages-in-the-Wayback-Machine - https://blog.archive.org/2017/01/25/see-something-save-something/ (more info, and good comments.) * To see all archived pages for a site do like https://web.archive.org/web/*/tibetsun.com or enter a few words related to the website tibet sun news got a lot of hits, but tibetsun.com was the first hit. exile tibetans got a lot of exile tibetan websites. exiletibetans got nothing. exiletibetans.com got the full archive. * one archived page would look like: https://web.archive.org/web/20190516140606/https://indianexpress.com/elections/to-be-or-not-to-be-voting-rights-for-tibetans-in-india-decoded-2019-lok-sabha-elections-5728161/ * To see all archives of a page, do like https://web.archive.org/web/*/https://indianexpress.com/elections/to-be-or-not-to-be-voting-rights-for-tibetans-in-india-decoded-2019-lok-sabha-elections-5728161/ * Getting Wayback Machine to crawl a site: - doesn't provide a way on its own: "Internet Archive's crawls tend to find sites that are well linked from other sites." - There are ways to archive a whole site: o "If you wish to archive a small website, the Archive Team maintains the ArchiveBot, an IRC bot where you can request to crawl websites. The Archive Team will then submit the crawled pages to the Internet Archive's Wayback Machine." -- https://webapps.stackexchange.com/questions/115369/how-to-archive-the-whole-website https://www.archiveteam.org/ https://www.archiveteam.org/index.php?title=ArchiveBot o Download the site and upload all to wayback machine: $ wget -m https://example.com/ $ find . -name "*.html" -exec curl -v "https://web.archive.org/save/https://{}" ';' (or -type f if they aren't .html - these days!) -- https://webapps.stackexchange.com/questions/115369/how-to-archive-the-whole-website * Restoring pages or sites from the WayBack Machine: archive.org itself doesn't do this, but There are various services that do this. * Using the Wayback Machine: https://help.archive.org/hc/en-us/articles/360004651732-Using-The-Wayback-Machine * Searching the Wayback Machine: Only by url. There isn't yet full-text searching for archived web pages. * Search - A Basic Guide https://help.archive.org/hc/en-us/articles/360018359991-Search-A-Basic-Guide This is for other contents -like books, etc. * https://www.startpage.com/do/search?lui=english&language=english&cat=web&query=how+to+search+for+pages+on+web.archive.org&nj=&anticache=829784 how to search for pages on web.archive.org gets good hits. [29 jun 2019] * more info: - https://kit.exposingtheinvisible.org/how/web-archive.html - https://kit.exposingtheinvisible.org/how/web.html - https://help.archive.org/hc/en-us _______________________________________________________ begin 29 jun 2019 -- 0 --