howto: web page archiving              rev 10 jul 2019

....................................................... 
The Wayback Machine - web.archive.org 

    * Saving a page on the Wayback Machine:
         - go to web.archive.org, scroll down, and enter the URL in
             the "Save Page Now" box.
         - use a bookmarklet in your browser.
         - in the browser's address bar, type
             web.archive.org/save/https://example.com
         more info:
            - https://help.archive.org/hc/en-us/articles/360001513491-Save-Pages-in-the-Wayback-Machine
            - https://blog.archive.org/2017/01/25/see-something-save-something/
                (more info, and good comments.)
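        The address-bar method can also be scripted. A minimal sketch,
        assuming a POSIX shell with curl installed; wayback_save_url and
        wayback_save are hypothetical names, not part of any archive.org tool:

```shell
# Hypothetical helpers, not an official archive.org tool.

# Print the "Save Page Now" URL for a page: web.archive.org/save/<page url>
wayback_save_url() {
  printf 'https://web.archive.org/save/%s\n' "$1"
}

# Request a capture of the page (needs network access).
# -s: no progress output, -o /dev/null: discard the returned page,
# -w: print only the HTTP status code of the response.
wayback_save() {
  curl -s -o /dev/null -w '%{http_code}\n' "$(wayback_save_url "$1")"
}
```

        Usage: wayback_save https://example.com prints one status code;
        check the page's list of captures afterwards to confirm it was saved.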

    * To see all archived pages for a site, use a URL like
        https://web.archive.org/web/*/tibetsun.com
        or
        enter a few words related to the website in the search box:
          tibet sun news
            gave a lot of hits, with tibetsun.com as the first hit.
          exile tibetans
            gave a lot of exile Tibetan websites.
          exiletibetans
            gave nothing.
          exiletibetans.com
            gave the full archive.
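        The listing URL is just the Wayback prefix, a "*" where a timestamp
        would go, and the site. A small sketch, assuming a POSIX shell;
        wayback_captures_url and wayback_browse are hypothetical names, and
        xdg-open assumes a Linux desktop (macOS has open instead):

```shell
# Hypothetical helpers. The "*" in place of a timestamp asks the
# Wayback Machine for its list of captures of the given site.
wayback_captures_url() {
  printf 'https://web.archive.org/web/*/%s\n' "$1"
}

# Open that list in the default browser (Linux desktops with xdg-open).
wayback_browse() {
  xdg-open "$(wayback_captures_url "$1")"
}
```

        Usage: wayback_browse tibetsun.com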

    * An archived page's URL looks like:
        https://web.archive.org/web/20190516140606/https://indianexpress.com/elections/to-be-or-not-to-be-voting-rights-for-tibetans-in-india-decoded-2019-lok-sabha-elections-5728161/
    * To see all archived copies of a page, use a URL like
        https://web.archive.org/web/*/https://indianexpress.com/elections/to-be-or-not-to-be-voting-rights-for-tibetans-in-india-decoded-2019-lok-sabha-elections-5728161/
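        An archived page's URL is the Wayback prefix, a 14-digit capture time
        (20190516140606 reads as 2019-05-16 14:06:06), then the original URL.
        A minimal sketch that splits such a URL back into its parts, assuming
        a POSIX shell; wayback_split is a hypothetical name, and it only
        handles plain snapshot URLs like the one above:

```shell
# Hypothetical helper. Prints the capture timestamp, then the original URL,
# one per line, for a plain snapshot URL.
wayback_split() {
  case $1 in
    http://web.archive.org/web/*|https://web.archive.org/web/*) ;;
    *) echo "not a Wayback snapshot URL: $1" >&2; return 1 ;;
  esac
  local rest=${1#*://web.archive.org/web/}   # "<timestamp>/<original url>"
  local ts=${rest%%/*}                       # text before the first "/"
  case $ts in
    ''|*[!0-9]*) echo "no numeric timestamp in: $1" >&2; return 1 ;;
  esac
  printf '%s\n%s\n' "$ts" "${rest#*/}"      # text after the first "/"
}
```

        Usage: wayback_split 'https://web.archive.org/web/20190516140606/https://example.com/'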

    * Getting the Wayback Machine to crawl a site:
         - there is no way to request this from the Wayback Machine itself:
            "Internet Archive's crawls tend to find sites that are well
              linked from other sites."
         - There are ways to archive a whole site:
            o "If you wish to archive a small website, the Archive
               Team maintains the ArchiveBot, an IRC bot where you
               can request to crawl websites. The Archive Team will
               then submit the crawled pages to the Internet Archive's
               Wayback Machine." 
               -- https://webapps.stackexchange.com/questions/115369/how-to-archive-the-whole-website
               https://www.archiveteam.org/
               https://www.archiveteam.org/index.php?title=ArchiveBot

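            o Download the site, then ask the Wayback Machine to save each page
              (Save Page Now fetches each URL from the live site; nothing is uploaded):
                $ wget -m https://example.com/
                $ find example.com -name "*.html" -exec curl -v "https://web.archive.org/save/https://{}" ';'
                   (wget -m saves the site under ./example.com/, so find starts there
                    to produce paths like example.com/page.html; use -type f instead of
                    -name "*.html" if the pages don't end in .html, as is common these days)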
               -- https://webapps.stackexchange.com/questions/115369/how-to-archive-the-whole-website
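        A gentler version of the find/curl loop above, as a sketch: it
        assumes the mirror was made with wget -m https://example.com/ into
        ./example.com/, relies on wget saving directory URLs as index.html
        (its default page name), and ignores query strings and other file
        names wget rewrites. mirror_path_to_url, save_mirror and DELAY are
        hypothetical names:

```shell
# Hypothetical helpers for re-submitting a wget mirror, page by page.

# Map a mirrored file path back to the URL it was downloaded from.
mirror_path_to_url() {
  local path=${1#./}                    # drop the leading ./ that find adds
  case $path in
    */index.html) path=${path%index.html} ;;   # example.com/a/index.html -> example.com/a/
  esac
  printf 'https://%s\n' "$path"
}

# Ask Save Page Now to capture each page, pausing between requests.
# DELAY (seconds, default 5) is an arbitrary choice, not an archive.org limit.
save_mirror() {
  local delay=${DELAY:-5} file
  # One path per line: breaks on file names that contain newlines.
  find "$1" -type f | while IFS= read -r file; do
    curl -s -o /dev/null -w '%{http_code} ' \
      "https://web.archive.org/save/$(mirror_path_to_url "$file")"
    printf '%s\n' "$file"
    sleep "$delay"
  done
}
```

        Usage: run save_mirror example.com from the directory where wget -m
        was run; each line shows the HTTP status code and the file submitted.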


    * Restoring pages or sites from the Wayback Machine:
        archive.org itself doesn't do this, but
        various third-party services do.
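        For a single page, the archived copy can be downloaded by hand. A
        sketch, assuming a POSIX shell with curl: adding id_ right after the
        timestamp asks the Wayback Machine for the page as captured, without
        its toolbar and rewritten links. If there is no capture at exactly
        that timestamp, it normally redirects to the nearest one, hence
        curl -L. wayback_raw_url and wayback_fetch are hypothetical names.
        This does not rebuild a site: images, stylesheets and linked pages
        would each need fetching too, and links in the raw page point to the
        live site.

```shell
# Hypothetical helpers. $1 = timestamp (YYYYMMDDhhmmss), $2 = original URL.

# Raw-capture URL: "id_" after the timestamp skips the Wayback toolbar
# and link rewriting, returning the page as it was captured.
wayback_raw_url() {
  printf 'https://web.archive.org/web/%sid_/%s\n' "$1" "$2"
}

# Save the raw capture to file $3 (needs network access).
# -L follows redirects to the nearest capture; -f fails on HTTP errors.
wayback_fetch() {
  curl -sfL -o "$3" "$(wayback_raw_url "$1" "$2")"
}
```

        Usage: wayback_fetch 20190516140606 https://example.com/ page.html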

    * Using the Wayback Machine:
        https://help.archive.org/hc/en-us/articles/360004651732-Using-The-Wayback-Machine
    * Searching the Wayback Machine:
        Only by URL; there is no full-text search of archived
        web pages yet.
    * Search - A Basic Guide 
        https://help.archive.org/hc/en-us/articles/360018359991-Search-A-Basic-Guide
        This covers other content, such as books.
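        Searching by URL can also be done from the command line. A sketch,
        assuming the Wayback CDX server API at web.archive.org/cdx/search/cdx;
        url, matchType=prefix, fl=original and collapse=urlkey are parameters
        from that API's documentation, worth checking before relying on them.
        wayback_cdx_prefix_url and wayback_list_urls are hypothetical names.
        A prefix containing spaces, & or other special characters would need
        URL-encoding first.

```shell
# Hypothetical helpers. List archived URLs that start with a given prefix.
# matchType=prefix: everything under the prefix; fl=original: print only
# the original URL; collapse=urlkey: one line per distinct URL.
wayback_cdx_prefix_url() {
  printf 'https://web.archive.org/cdx/search/cdx?url=%s&matchType=prefix&fl=original&collapse=urlkey\n' "$1"
}

# Fetch the list, one archived URL per line (needs network access).
wayback_list_urls() {
  curl -s "$(wayback_cdx_prefix_url "$1")"
}
```

        Usage: wayback_list_urls tibetsun.com/news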

    * https://www.startpage.com/do/search?lui=english&language=english&cat=web&query=how+to+search+for+pages+on+web.archive.org&nj=&anticache=829784
        Searching for "how to search for pages on web.archive.org"
        gets good hits. [29 jun 2019]

    * more info:
       - https://kit.exposingtheinvisible.org/how/web-archive.html
       - https://kit.exposingtheinvisible.org/how/web.html
       - https://help.archive.org/hc/en-us


_______________________________________________________
begin 29 jun 2019
-- 0 --