HTTP
See also HTML/CSS
General
ugh. to sort.
- https://github.com/StevenBlack/hosts - Extending and consolidating hosts files from several well-curated sources like adaway.org, mvps.org, malwaredomainlist.com, someonewhocares.org, and potentially others. You can optionally invoke extensions to block additional sites by category.
- https://httpkit.com/resources/HTTP-from-the-Command-Line/
- http://news.ycombinator.com/item?id=4762886
- HTTPie is a CLI, cURL-like tool for humans
- httpbin(1) - HTTP Request & Response Service
- RIP HTTP - HTTP is fading away, and for good reasons. But occasionally a website that runs on HTTP can be useful, such as in captive portals. This site serves as a permanent memorial to HTTP and pledges never to switch to HTTPS. But be careful, it could have had ads or malware injected and likely was spied on, by ISPs, governments, hotspots, and other malicious actors. And it also suffers a ranking punishment.
Addressing
URI;
<scheme name> : <hierarchical part> [ ? <query> ] [ # <fragment> ]
- http://www.w3.org/Addressing/
- https://eager.io/blog/the-history-of-the-url-path-fragment-query-auth [3]
- https://en.wikipedia.org/wiki/Uniform_resource_identifier - URI = URL+URN, URL or URN
- http://www.ietf.org/rfc/rfc3987.txt - Internationalized Resource Identifiers (IRIs)
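As a rough illustration of the generic syntax above, Python's standard `urllib.parse.urlsplit` splits a URI into these components. The URL below is made up purely to exercise every part.

```python
from urllib.parse import urlsplit

# Hypothetical URL, chosen only so that every component is present.
uri = "https://user@www.example.com:8042/over/there?name=ferret#nose"
parts = urlsplit(uri)

print(parts.scheme)    # scheme name
print(parts.netloc)    # authority (start of the hierarchical part)
print(parts.path)      # path (rest of the hierarchical part)
print(parts.query)     # query, without the leading "?"
print(parts.fragment)  # fragment, without the leading "#"
```

`parts.hostname` and `parts.port` break the authority down further if needed.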
Headers
user-agent
auth
2012;
- https://datatracker.ietf.org/doc/draft-prodromou-dialback/ - pump.io
- https://github.com/evanp/dialback-example
URLs
- YOURLS stands for Your Own URL Shortener. It is a small set of PHP scripts that will allow you to run your own URL shortening service (a la TinyURL or bitly).
- ShadyURL - Don't just shorten your URL, make it suspicious and frightening.
- Heaps legit links - Turn any link into a suspicious looking one
Services
- expiring.link - Create Temporary Links that expire after n visits or on a date
well-known
API
- stackoverflow: What should a developer know before building a public web site? - 1240 votes
- ProgrammableWeb - API Dashboard
- Mashape - Free APIs
- http://developer.echonest.com/ - music related
Google Feed API
- https://developers.google.com/feed/v1/devguide
- https://developers.google.com/feed/v1/reference
- https://code.google.com/apis/ajax/playground/#load_feed
REST
- https://en.wikipedia.org/wiki/HATEOAS - Hypermedia As The Engine Of Application State (HATEOAS) is a constraint of the REST application architecture that distinguishes it from other network application architectures. With HATEOAS, a client interacts with a network application that application servers provide dynamically entirely through hypermedia. A REST client needs no prior knowledge about how to interact with an application or server beyond a generic understanding of hypermedia. By contrast, clients and servers in some service-oriented architectures (SOA) interact through a fixed interface shared through documentation or an interface description language (IDL). The way that the HATEOAS constraint decouples client and server enables the server functionality to evolve independently.
- http://www.infoq.com/articles/rest-introduction
- http://stackoverflow.com/questions/671118/what-exactly-is-restful-programming [7]
- http://jacobian.org/writing/rest-worst-practices/
- http://news.ycombinator.com/item?id=3538134
- http://stackoverflow.com/questions/1619152/how-to-create-rest-urls-without-verbs/1619677#1619677
- http://blog.steveklabnik.com/posts/2012-02-13-an-api-ontology
- http://blog.mugunthkumar.com/articles/restful-api-server-doing-it-the-right-way-part-1/
- http://www.spire.io/posts/rest-tutorial.html
- http://blog.apigee.com/detail/restful_api_design/
- http://restcookbook.com/
- http://shkspr.mobi/blog/index.php/2012/03/api-design-is-ui-for-developers/
- https://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm
- https://github.com/zapier/resthooks - pubsub
- https://github.com/Corvusoft/restbed - a comprehensive and consistent programming model for building applications that require seamless and secure communication over HTTP, with the ability to model a range of business processes, designed to target mobile, tablet, desktop and embedded production environments.
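The HATEOAS idea described above can be sketched without any network at all. Everything here is an assumption made for illustration: the in-memory `RESOURCES` table stands in for a server, and the link relation names (`orders`, `first`) are invented. The point is only that the client knows an entry URL and relation names, not the URL layout.

```python
# A made-up in-memory "server": each resource embeds links to related resources.
RESOURCES = {
    "/": {"links": {"orders": "/orders"}},
    "/orders": {"items": [17], "links": {"first": "/orders/17", "home": "/"}},
    "/orders/17": {"status": "shipped", "links": {"collection": "/orders"}},
}


def get(url):
    """Stand-in for an HTTP GET; returns the representation stored at url."""
    return RESOURCES[url]


def follow(start, *rels):
    """Walk from start by link relation names only, never by hard-coded URLs."""
    doc = get(start)
    for rel in rels:
        doc = get(doc["links"][rel])
    return doc


order = follow("/", "orders", "first")
print(order["status"])
```

If the server later moved orders to a different path, only the `links` values would change and `follow` would keep working.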
SOAP
Software
GraphQL
- GraphQL - a query language for APIs and a runtime for fulfilling those queries with your existing data. GraphQL provides a complete and understandable description of the data in your API, gives clients the power to ask for exactly what they need and nothing more, makes it easier to evolve APIs over time, and enables powerful developer tools.
JSON-API
- JSON:API - a specification for APIs that use JSON
Hydra
- Hydra Core Vocabulary - a lightweight vocabulary to create hypermedia-driven Web APIs. By specifying a number of concepts commonly used in Web APIs it enables the creation of generic API clients.
User and group
See also *nix#Users
Servers
- gist: web-servers.md - Big list of http static server one-liners
Python 3
python -m http.server 5674
PHP >= 5.4
php -S localhost:8000
Ruby
ruby -run -e httpd -- --port=8080 .
- https://github.com/rhardih/serve - as above with added gzip and http2
- h5ai - makes browsing directories on HTTP web servers more pleasant. Directory listings get styled in a modern way and browsing through the directories is enhanced by different views, a breadcrumb and a tree overview.
Nginx
- nginx [engine x] is a free, open-source, high-performance HTTP server and reverse proxy, as well as an IMAP/POP3 proxy server. Igor Sysoev started development of Nginx in 2002, with the first public release in 2004. Nginx now hosts nearly 12.18% (22.2M) of active sites across all domains. Nginx is known for its high performance, stability, rich feature set, simple configuration, and low resource consumption.
Guides
- Nginx Primer - Jul 19, 2010
- Nginx Primer 2: From Apache to Nginx - Feb 04th, 2011
- agentzh's Nginx Tutorials
- The Architecture of Open Source Applications (Volume 2): Nginx
- https://github.com/dslatten/nginsane
- Varnish as reverse proxy with nginx as web server and SSL terminator - Or, if you like, the nginx-Varnish-nginx sandwich. Jul 24th, 2012
Webfonts
Nginx uses a file’s mime type declaration to decide whether or not to apply compression to that file, and so we must first ensure that the four types of web font files have mime types configured.
/etc/nginx/mime.types
application/vnd.ms-fontobject eot;
application/x-font-ttf ttf;
font/opentype ott;
font/x-woff woff;
remove;
application/octet-stream eot;
application/vnd.oasis.opendocument.text-template ott;
Configuration
Listening on a specific IP address overrides a wildcard IP catch-all.
Site setup
- http://nginx.org/en/docs/http/server_names.html
- http://stackoverflow.com/questions/7947030/nginx-no-www-to-www-and-www-to-no-www
Default site folder location can vary.
/var/www # debian
/etc/nginx/html # arch linux
Create 'Server Block' (vhost) config file in
/etc/nginx/sites-available
and symlink to it in
/etc/nginx/sites-enabled
- ServerBlockExample - Basic examples
Enable logging in vhost conf;
error_log /var/log/nginx-vhostnamehere.log error;
Modules
- Nginx modules were originally selectable only at compile time, with no run-time selection. Since nginx 1.9.11, modules can also be built as dynamic modules and loaded with the load_module directive.
Compression
gzip_types text/plain text/css application/x-javascript text/xml application/xml application/xml+rss text/javascript application/json image/svg+xml application/vnd.ms-fontobject application/x-font-ttf font/opentype;
FastCGI
- HttpFastcgiModule
- FcgiExample - FastCGI example. Keep separate and include.
- PHPFcgiExample
- How to Solve “No input file specified” with PHP and Nginx - Jan 19, 2011
Connections
- HttpLimitConnModule - This module makes it possible to limit the number of concurrent connections for a defined key such as, for example, an ip address.
- HttpRewriteModule - rewriting urls
- HttpAuthBasicModule - for directory passwords
location / {
    auth_basic "Restricted";
    auth_basic_user_file /etc/nginx/conf.d/htpasswd;
}
printf "John:$(openssl passwd -crypt V3Ry)\n" >> .htpasswd # this example uses crypt encryption
printf "Mary:$(openssl passwd -apr1 SEcRe7)\n" >> .htpasswd # this example uses apr1 (Apache MD5) encryption
printf "Jane:$(openssl passwd -1 V3RySEcRe7)\n" >> .htpasswd # this example uses MD5 encryption
(PWD="SEcRe7PwD";SALT="$(openssl rand -base64 3)";SHA1=$(printf "$PWD$SALT" | openssl dgst -binary -sha1 | \
 sed 's#$#'"$SALT"'#' | base64);printf "Jim:{SSHA}$SHA1\n" >> .htpasswd) # this example uses SSHA encryption
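The SSHA one-liner above is hard to read, so here is a minimal Python sketch of the same scheme: the stored value is base64 of the SHA-1 digest of password plus salt, followed by the salt itself. The function name `ssha_entry` and the 4-byte salt length are assumptions made for this example, not anything nginx prescribes.

```python
import base64
import hashlib
import os
from typing import Optional


def ssha_entry(user: str, password: str, salt: Optional[bytes] = None) -> str:
    """Return an htpasswd-style line using the {SSHA} scheme."""
    if salt is None:
        salt = os.urandom(4)  # random salt; the length is an arbitrary choice here
    digest = hashlib.sha1(password.encode("utf-8") + salt).digest()
    # {SSHA} stores base64(sha1(password + salt) + salt)
    encoded = base64.b64encode(digest + salt).decode("ascii")
    return f"{user}:{{SSHA}}{encoded}"


# Made-up credentials, matching the style of the shell examples.
print(ssha_entry("Jim", "SEcRe7PwD"))
```

To verify a password later, decode the base64 value, take everything after the first 20 bytes as the salt, and recompute the digest.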
Security
SSL
- Create .key and .csr
openssl req -new -newkey rsa:4096 -nodes -keyout server.key -out server.csr
- Or using StartSSL control panel wizard, with additional key decrypt step after
- Provide .csr to certificate authority, certificate authority returns .crt
- .crt is concatenated with intermediate cert. .key can also be added; nginx will not send it
cat ssl.crt sub.class1.server.ca.pem ca.pem > /etc/nginx/conf/ssl-unified.crt
- Nginx config points to .crt and .key
- http://www.startssl.com/?app=42
- http://blog.chrismeller.com/creating-and-managing-ssl-certificates-with-nginx
Make sure wildcard SSL is in http stanza and that any specific server is listening on 443.
server {
    listen 80;
    listen [::]:80;
    listen 443 default ssl;
    server_name www.example.com;
    ssl_certificate /path/to/my/cert;
    ssl_certificate_key /path/to/my/key;
    if ($ssl_protocol = "") {
        rewrite ^ https://$server_name$request_uri? permanent;
    }
}
ssl_ciphers ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-RC4-SHA:ECDH-ECDSA-RC4-SHA:ECDH-RSA-RC4-SHA:ECDHE-RSA-AES256-SHA;
- http://serverfault.com/questions/375621/nginx-override-global-ssl-directives-for-specific-servers
- http://publications.jbfavre.org/web/nginx-vhosts-automatiques-avec-SSL-et-authentification.en
- http://stackoverflow.com/questions/8768946/dealing-with-nginx-400-the-plain-http-request-was-sent-to-https-port-error
- https://www.digitalocean.com/community/articles/how-to-set-up-multiple-ssl-certificates-on-one-ip-with-nginx-on-ubuntu-12-04
- http://www.igvita.com/2013/12/16/optimizing-nginx-tls-time-to-first-byte/
- https://blog.kamalnasser.net/post/nginx-and-ssl-root-key-security
- https://raymii.org/s/tutorials/Pass_the_SSL_Labs_Test_on_NGINX_(Mitigate_the_CRIME_and_BEAST_attack_-_Disable_SSLv2_-_Enable_PFS).html
- http://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/
Proxy
Info
- https://github.com/pagespeed/ngx_pagespeed - Automatic [Google] PageSpeed optimization module for Nginx
- https://developers.google.com/speed/pagespeed/ngx
Directory listing
autoindex on;
- browsepy - The simple web file browser.
Tools
- https://github.com/perusio/nginx_ensite - nginx_ensite and nginx_dissite for quick virtual host enabling and disabling
Lua
Forks / patches
Apache
- The 5G Blacklist helps reduce the number of malicious URL requests that hit your website. It’s one of many ways to improve the security of your site and protect against evil exploits, bad requests, and other nefarious garbage.
.htaccess
lighttpd
Other
- http://www.pugo.org:8080/ - postscript [11]
- Caddy - The HTTP/2 Web Server with Automatic HTTPS
- https://bitbucket.org/naviserver/naviserver - a versatile multiprotocol (httpd et al) server written in C/Tcl. It can be easily extended in either language to create interesting web sites and services.
- fserv - an HTTP server that can serve static files, act as a forward and reverse HTTP proxy server, stream media content (SHOUTcast). It has a very high speed on all supported platforms due to its asynchronous I/O event architecture. It can easily process thousands of requests in parallel. The functionality of fserv can be extended by adding new modules. When configured as a proxy server, it caches responses from upstream servers on a local filesystem, dramatically speeding up the response time when the same request is received again. fserv can act as an Internet radio transmitter or it can re-transmit other Internet radio stations. In the latter case it also can record songs into MP3 files on a local machine, which makes it a useful music-grabber.
- ServerMe.H - An embeddable single-file C++ web server
- webfs - a simple http server for mostly static content. You can use it to serve the content of a ftp server via http for example. It is also nice to export some files the quick way by starting a http server in a few seconds, without editing some config file first.
- https://github.com/skx/httpd - Trivial HTTP-server for static-files only, written in go
- https://github.com/DirectoryLister/DirectoryLister - easy way to expose the contents of any web-accessible folder for browsing and sharing, PHP7
unix-web
CORS
- https://developer.mozilla.org/en-US/docs/HTTP/Access_control_CORS
- http://en.wikipedia.org/wiki/Cross-origin_resource_sharing
- cors - "This post aims to demystify CORS and show its lighter side–as a specification that didn’t set out to hamper the aspirations of web developers everywhere, but instead to loose us from the grip of the same-origin policy. We’ll go through each of the headers necessary to properly satisfy CORS constraints, and also discuss a couple places where CORS is now relevant but which may surprise you." [15]
- https://cors-anywhere.herokuapp.com/
- https://github.com/Rob--W/cors-anywhere - a NodeJS reverse proxy which adds CORS headers to the proxied request.
Proxy
Privoxy
- Privoxy is a non-caching web proxy with advanced filtering capabilities for enhancing privacy, modifying web page data and HTTP headers, controlling access, and removing ads and other obnoxious Internet junk. Privoxy has a flexible configuration and can be customized to suit individual needs and tastes. It has application for both stand-alone systems and multi-user networks.
Glype
- Glype is a web-based proxy script written in PHP which focuses on features, functionality, and ease of use. Webmasters use Glype to quickly and easily set up their own proxy sites. Glype helps users to defeat Internet censorship and be anonymous while web browsing.
ngrok
- ngrok - Introspected tunnels to localhost - securely expose a local web server to the internet and capture all traffic for detailed inspection and replay
uProxy
- uProxy - was an open source project led by the University of Washington and seeded by Jigsaw. Although the project is no longer supported, the code is still available on GitHub.
Lantern
- Lantern - a software application for desktop and mobile that delivers fast, reliable and secure access to blocked websites and apps.
dispatch-proxy
- https://github.com/Morhaus/dispatch-proxy - Combine internet connections, increase your download speed [17]
peroxide
- https://github.com/creativemarket/peroxide - A simple, configurable proxy server that will hit a chain of sources before failing.
- https://github.com/elazarl/goproxy - An HTTP proxy library for Go
MITM
mitmproxy
Hyperfox
Extensions
px
- https://github.com/genotrance/px - An HTTP proxy server to automatically authenticate through an NTLM proxy
Testing
- Is it down? Check at Down for Everyone or Just Me - web service to check if a site is on-line
- httping - like 'ping' but for http-requests. Give it an url, and it'll show you how long it takes to connect, send a request and retrieve the reply (only the headers). Be aware that the transmission across the network also takes time! So it measures the latency of the webserver + network. It supports, of course, IPv6.
- RED - a robot that checks HTTP resources to see how they'll behave, pointing out common problems and suggesting improvements. Although it is not an HTTP conformance tester, it can find a number of HTTP-related issues.
- Browser SOA Debugger - Depending on the view of things this is just an enhanced HTTP output formatter for tcpdump streams, or the ultimate debugger for complex HTTP oriented SOA architectures which visualizes the full HTTP interactions in a readable, reproducible way so that you can see what is actually going on in your backend.
- Hyper - an experimental middleware architecture for HTTP servers written in PureScript. Its main focus is correctness and type-safety, using type-level information to enforce correct composition and abstraction for web servers. The Hyper project is also a breeding ground for higher-level web server constructs, which tend to fall under the “framework” category.
Clients
- Postman - a powerful HTTP client to help test web services easily and efficiently. Postman lets you craft simple as well as complex HTTP requests quickly. It also saves requests for future use so that you never have to repeat your keystrokes ever again. Postman is designed to save you and your team tons of time. Check out more features below or just install from the Chrome Web Store to get started. [20]
- https://github.com/hazbo/httpu - The terminal-first http client
- RESTClient - a debugger for RESTful web services.
Compression
Load
A/B
API
Performance
See also Server#Performance
- Use Server Cache Control to Improve Performance
- Cache Control Directives Demystified - July 2008
- The Importance of the WordPress Expires Header
Guides
- Google Dev: Make the Web Faster
- Even Faster Web Sites - Book by Steve Souders
- Caching Tutorial for Web Authors and Webmasters
- A Primer on Web Caching - Jun 23rd, 2012
- How we made Portent.com really freaking fast - May 23, 2012
- serverfault: The strange case of Mr. Time To First Byte
- One Sub Domain Doesn’t Make a CDN - Jan 3, 2011
- if not using www., use a subdomain CNAME on another domain
Cookieless
Serve static content from a cookie-free domain so that cookies aren't sent with each request. Cookies set on the root domain apply to all subdomains, though setting them on www. (ugh!) works. Use another domain's A record to point to the site. (?)
ETag
- stackoverflow: ETag vs Header Expires - Feb 1, 2009
- Speed Tips: Turn Off ETags - for multiserver
- http://joshua.schachter.org/2006/11/apache-etags.html
- http://davidwalsh.name/yslow-htaccess
- REST Tip: Deep etags give you more benefits. - Mar 2007
- ETags Revisited - 31 Jan 2011. best overview article.
Caching
- http://www.quora.com/What-is-the-fundamental-difference-between-varnish-and-squid-caching-architectures
- http://deserialized.com/caching/reverse-proxy-performance-varnish-vs-squid-part-1/
Varnish
- Varnish is a web application accelerator. You install it in front of your web application and it will speed it up significantly.
Load balancing
CDN
- http://www.jsdelivr.com/ - js
- backed by maxcdn.com
- CoralCDN is a decentralized, self-organizing, peer-to-peer web-content distribution network. CoralCDN leverages the aggregate bandwidth of volunteers running the software to absorb and dissipate most of the traffic for web sites using the system.
.nyud.net
p2p;
Testing
- YSlow
- YSlow: Yahoo's Problems Are Not Your Problems - Aug 15, 2007
- etc.
- GTmetrix uses Google Page Speed and Yahoo! YSlow to grade your site's performance and provides actionable recommendations to fix these issues.
- Blitz does cloud based load and performance testing using Sinatra, Rails and node.js.
- Free: Sprint all you want, Rush all you want, 250 concurrent users, 1 minute rushes
- Engulf is a scalable, distributed HTTP benchmarker, designed to let you spin up and coordinate a cluster of workers with nothing more than a single JAR. Engulf's backend is written in clojure, the frontend in javascript.
Logging
Combined Log Format
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined
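For quick analysis, a combined-format line can be pulled apart with a regular expression. This is a minimal sketch: the sample log line is made up, and the pattern assumes no escaped quotes inside the request, referer, or user-agent fields.

```python
import re

# One named group per field of the combined format:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Made-up sample line for illustration.
line = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
)

match = COMBINED.match(line)
fields = match.groupdict() if match else {}
for name, value in fields.items():
    print(f"{name:8} {value}")
```

The size field is kept as text because `%b` logs `-` when no bytes were sent.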
old skool;
Apache
Nginx
error_log /var/log/nginx/domain.name/error.log;
access_log /var/log/nginx/domain.name/access.log;
GoAccess
- GoAccess is an open source real-time web log analyzer and interactive viewer that runs in a terminal in *nix systems. It provides fast and valuable HTTP statistics for system administrators that require a visual server report on the fly.
HAR
- https://github.com/janodvarko/harviewer
- https://github.com/ericduran/chromeHAR
- https://github.com/jarib/har
Control panels
Analytics
Referer
Piwik
- Piwik is downloadable, Free/Libre (GPLv3 licensed) real time web analytics software. It provides you with detailed reports on your website visitors; the search engines and keywords they used, the language they speak, your popular pages, and much more.
- User Guide
- Log Analytics - combined format logs (apache, nginx)
WordPress
Clicky
GoatCounter
- GoatCounter - Simple web statistics. No tracking of personal data.
Other
Upload
Quick scripts for uploading;
- https://code.google.com/p/html5uploader/
- https://github.com/mihaild/jquery-html5-upload
- http://www.plupload.com/
- tus - resumable file uploads
WebDAV
- WebDAV stands for Web Distributed Authoring and Versioning; see RFC 2518
"Linux users can mount WebDAV shares using the davfs2 and the fusedav file system modules which mount them as Coda or FUSE filesystems. KDE has native WebDAV support as part of kio_http. This enables Dolphin, Konqueror, and every other KDE application to interact directly with WebDAV servers. Nautilus also has WebDAV support built in. Many Linux distributions also include the cadaver command-line client interface, which provides an FTP-like command set. The Apache HTTP Server provides WebDAV modules based on both davfs and Apache Subversion (svn)."
- WebDAV Resources - This site is being produced for the WebDAV community as a central resource for documentation, specifications, software, mailing lists, and other useful items.
- Use Linux and WebDAV to Facilitate Online Collaboration - Apache method
- http://linuxsagas.digitaleagle.net/2008/09/09/webdav-and-fstab/
File system
Davfs
- davfs2 provides the ability to access such resources like a typical filesystem, allowing for use by standard applications with no built-in support for WebDAV.
fusedav
- fusedav is a Linux userspace file system driver for mounting WebDAV shares. It makes use of FUSE as userspace file system API and neon as WebDAV API.
- http://owncloud.org/ - Version 5.0 Expected August 2012
Software
- neon is an HTTP and WebDAV client library, with a C interface
- jsDAV allows you to easily add WebDAV support to a NodeJS application.
- DAV-pocket Lab is a small project team developing WebDAV Server on Google App Engine.
- mod_dav is an Apache module to provide DAV capabilities (RFC 2518) for your Apache web server.
- http://www.webdav.org/goliath/
- cadaver is a command-line WebDAV client for Unix. It supports file upload, download, on-screen display, namespace operations (move/copy), collection creation and deletion, and locking operations.
Saving
Wget
- Wget - a free software package for retrieving files using HTTP, HTTPS, FTP and FTPS, the most widely used Internet protocols. It is a non-interactive command-line tool, so it may easily be called from scripts, cron jobs, terminals without X-Windows support, etc.
wget -O myzip.zip https://github.com/username/project/zipball/master
wget -m http://example.com # --mirror
wget -mk http://example.com # --convert-links
wget -mk -w 20 http://example.com # with delay of 20 seconds between requests
wget -E -H -k -K -p -nd http://example.com # to mirror a single page
# --adjust-extension --span-hosts --convert-links --backup-converted --page-requisites --no-directories
- httpd access permission issues. to try next time; -nH
wget -r -np -l 1 -Azip http://example.com/download/ # download all links to .zip files on a given web page [26]
- http://superuser.com/questions/55040/save-a-single-web-page-with-background-images-with-wget
- http://fosswire.com/post/2008/04/create-a-mirror-of-a-website-with-wget/
- http://stackoverflow.com/questions/6145641/wget-how-to-mirror-only-a-section-of-a-website
- http://stackoverflow.com/questions/10712344/mirror-http-website-excluding-certain-files - downloads /then/ filters, often not handy...
- https://code.google.com/p/wgetremote - php web interface for wget download manager. with wgetremote you can remotely access installed wget on your local computer.
cURL
- curl is a tool to transfer data from or to a server, using one of the supported protocols (DICT, FILE, FTP, FTPS, GOPHER, HTTP, HTTPS, IMAP, IMAPS, LDAP, LDAPS, POP3, POP3S, RTMP, RTSP, SCP, SFTP, SMTP, SMTPS, TELNET and TFTP). The command is designed to work without user interaction.
curl http://www.google.com/search.js -o /path/to/local/file.js
curl http://site.{one,two,three}.com
curl ftp://ftp.numericals.com/file[1-100].txt
curl ftp://ftp.numericals.com/file[001-100].txt # with leading zeros
curl ftp://ftp.letters.com/file[a-z].txt
# sequences of alphanumeric series by using []
curl http://any.org/archive[1996-1999]/vol[1-4]/part{a,b,c}.html
# nested sequences are not supported, but you can use several next to each other
curl http://www.numericals.com/file[1-100:10].txt
curl http://www.letters.com/file[a-z:2].txt
# specify a step counter for the ranges to get every Nth number or letter
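To see which URLs such a pattern covers before fetching anything, the expansion can be approximated in Python. This is a sketch of the globbing rules listed above, not curl's actual implementation: `expand` and `_range_values` are names invented here, the output order may differ from curl's, and bracketed IPv6 literals are not handled.

```python
import itertools
import re
from typing import List

# Matches either a {a,b,c} set or a [start-end:step] range.
TOKEN = re.compile(r"\{([^{}]*)\}|\[([^\[\]]*)\]")


def _range_values(spec: str) -> List[str]:
    """Expand 'START-END' or 'START-END:STEP' for numbers or single letters."""
    body, _, step_text = spec.partition(":")
    step = int(step_text) if step_text else 1
    start, end = body.split("-", 1)
    if start.isdigit() and end.isdigit():
        # A leading zero in the start value means zero-padded output.
        width = len(start) if len(start) > 1 and start.startswith("0") else 0
        return [str(n).zfill(width) for n in range(int(start), int(end) + 1, step)]
    # Otherwise treat it as a letter range such as a-z.
    return [chr(c) for c in range(ord(start), ord(end) + 1, step)]


def expand(url: str) -> List[str]:
    """Return every URL produced by the {..} sets and [..] ranges in url."""
    pieces = []  # one list of alternatives per segment of the pattern
    pos = 0
    for m in TOKEN.finditer(url):
        pieces.append([url[pos:m.start()]])           # literal text
        if m.group(1) is not None:
            pieces.append(m.group(1).split(","))      # {one,two,three}
        else:
            pieces.append(_range_values(m.group(2)))  # [1-100:10]
        pos = m.end()
    pieces.append([url[pos:]])
    return ["".join(combo) for combo in itertools.product(*pieces)]


for u in expand("http://site.{one,two,three}.com"):
    print(u)
```

Sets and ranges placed next to each other multiply, so the `archive[1996-1999]/vol[1-4]/part{a,b,c}.html` pattern expands to 48 URLs.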
- saldl is a lightweight well-featured CLI downloader optimized for speed and early preview. based on libcurl.
Other
- mulk - Multi-connection command line tool for downloading Internet sites with image filtering and Metalink support. Similar to wget and cURL, but it manages up to 50 simultaneous and parallel links. Main features are: HTML code parsing, recursive fetching, Metalink retrieving, segmented download and image filtering by width and height. It is based on libcurl, liburiparser, libtidy, libmetalink and libcrypto.
- aria2 is a lightweight multi-protocol & multi-source command-line download utility. It supports HTTP/HTTPS, FTP, BitTorrent and Metalink. aria2 can be manipulated via built-in JSON-RPC and XML-RPC interfaces.
- HTTrack allows you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer. HTTrack arranges the original site's relative link-structure. Simply open a page of the "mirrored" website in your browser, and you can browse the site from link to link, as if you were viewing it online. HTTrack can also update an existing mirrored site, and resume interrupted downloads.
httrack "https://example.com" -O ExampleMirrorDirectory \ "-*" \ "+https://example.com/images/*" \ "-*.swf"
- https://github.com/jjjake/internetarchive - A Python and Command-Line Interface to Archive.org
- https://github.com/kanishka-linux/reminiscence - Self-hosted Bookmark and Archive manager [27]
Scraping
See also Data#Scraping
- http://search.cpan.org/~ether/WWW-Mechanize-1.75/lib/WWW/Mechanize.pm
- https://pypi.python.org/pypi/mechanize/
- https://code.google.com/archive/p/flying-saucer/ - render html to pdf
- https://github.com/Y2Z/monolith - Save HTML pages with ease
404
Other
HTTP/2 (SPDY)
- http://www.mnot.net/blog/2012/08/04/http_vancouver
- http://bitsup.blogspot.co.uk/2012/08/the-road-to-http2.html - The Road to HTTP/2 - Aug 6, 2012
- http://blog.jgc.org/2012/12/speeding-up-http-with-minimal-protocol.html
HTTP/3 (QUIC)