HTTP


General

Addressing

URL Standard - The URL Standard defines URLs, domains, IP addresses, the application/x-www-form-urlencoded format, and their API.
- https://github.com/whatwg/url

URI;

<scheme name> : <hierarchical part> [ ? <query> ] [ # <fragment> ]

http://www.w3.org/Addressing
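A minimal sketch of how the generic `<scheme name> : <hierarchical part> [ ? <query> ] [ # <fragment> ]` form maps onto parsed components, using Python's standard `urllib.parse`. The helper name `split_uri` and the example URL are made up for illustration.

```python
from urllib.parse import urlsplit


def split_uri(uri: str) -> dict:
    """Split a URI into the pieces named in the generic syntax."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,       # text before the first ":"
        "authority": parts.netloc,    # host and optional port (hierarchical part)
        "path": parts.path,           # remainder of the hierarchical part
        "query": parts.query,         # text after "?", without the "?"
        "fragment": parts.fragment,   # text after "#", without the "#"
    }


if __name__ == "__main__":
    # Hypothetical URL chosen only to exercise every component.
    print(split_uri("https://example.org:8080/docs/page.html?lang=en#intro"))
```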

https://eager.io/blog/the-history-of-the-url-path-fragment-query-auth [3]

https://en.wikipedia.org/wiki/Uniform_resource_identifier - URI = URL+URN, URL or URN

https://en.wikipedia.org/wiki/URI_scheme

https://en.wikipedia.org/wiki/Uniform_resource_locator

https://en.wikipedia.org/wiki/Uniform_resource_name

https://en.wikipedia.org/wiki/Fragment_identifier

https://en.wikipedia.org/wiki/Extensible_Resource_Identifier

https://en.wikipedia.org/wiki/Dereferenceable_Uniform_Resource_Identifier

http://www.ietf.org/rfc/rfc3987.txt - Internationalized Resource Identifiers (IRIs)
- https://en.wikipedia.org/wiki/Internationalized_resource_identifier

http://www.w3.org/wiki/UriSchemes

http://url.spec.whatwg.org

https://en.wikipedia.org/wiki/Media_resource_locator

URNs, Namespaces and Registries

https://en.wikipedia.org/wiki/Web_server_directory_index - When an HTTP client (generally a web browser) requests a URL that points to a directory structure instead of an actual web page within the directory structure, the web server will generally serve a default page, which is often referred to as a main or "index" page. A common filename for such a page is index.html, but most modern HTTP servers offer a configurable list of filenames that the server can use as an index. If a server is configured to support server-side scripting, the list will usually include entries allowing dynamic content to be used as the index page (e.g. index.cgi, index.pl, index.php).
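The lookup described above can be sketched as a walk over a configurable list of filenames. This is only an illustration, not how any particular server implements it; `resolve_index` and `DEFAULT_INDEX_NAMES` are hypothetical names.

```python
from pathlib import Path
from typing import Optional, Sequence

# Assumed default order; real servers make this list configurable.
DEFAULT_INDEX_NAMES = ("index.html", "index.cgi", "index.pl", "index.php")


def resolve_index(
    directory: Path, index_names: Sequence[str] = DEFAULT_INDEX_NAMES
) -> Optional[Path]:
    """Return the first index file that exists in `directory`, or None."""
    for name in index_names:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    # No index file: a server would now list the directory or refuse.
    return None
```

The first match wins, so the order of the list decides which file is served when several candidates exist.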

Headers

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers

https://en.wikipedia.org/wiki/List_of_HTTP_header_fields

http://blog.keithcirkel.co.uk/the-ups-and-downs-of-the-http-header/ [4]

http://livehttpheaders.mozdev.org/

https://en.wikipedia.org/wiki/Content_negotiation - refers to mechanisms defined as a part of HTTP that make it possible to serve different versions of a document (or more generally, representations of a resource) at the same URI, so that user agents can specify which version fits their capabilities the best. One classical use of this mechanism is to serve an image in GIF or PNG format, so that a browser that cannot display PNG images (e.g. MS Internet Explorer 4) will be served the GIF version. A resource may be available in several different representations; for example, it might be available in different languages or different media types. One way of selecting the most appropriate choice is to give the user an index page and let them select the most appropriate choice; however it is often possible to automate the choice based on some selection criteria.
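A rough sketch of automated selection based on the `Accept` request header, assuming the server has a fixed list of representations on offer. It is deliberately simplified: it orders by q-value only and ignores the specificity rules of the full specification. The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    prefs = []
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        media_type = fields[0]
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when the parameter is absent
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so equal q-values keep their header order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def negotiate(accept_header: str, available: List[str]) -> Optional[str]:
    """Pick the offered media type the client prefers most, or None."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        main, _, sub = media_type.partition("/")
        for offered in available:
            if media_type in ("*/*", offered):
                return offered
            if sub == "*" and offered.startswith(main + "/"):
                return offered
    return None
```

For the GIF/PNG case above, `negotiate("image/png, image/gif;q=0.8", ["image/gif", "image/png"])` picks the PNG, while a client that only sends `image/gif` gets the GIF.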

Fetch

Fetch Standard - The Fetch standard defines requests, responses, and the process that binds them: fetching.
- https://github.com/whatwg/fetch

user-agent

https://en.wikipedia.org/wiki/User_agent

https://news.ycombinator.com/item?id=6372913

http://msdn.microsoft.com/en-us/library/ie/hh869301%28v=vs.85%29.aspx [5]

http://webaim.org/blog/user-agent-string-history/

Chrome Phasing out Support for User Agent - [6]

User-Agent Client Hints - This document defines a set of Client Hints that aim to provide developers with the ability to perform agent-based content negotiation when necessary, while avoiding the historical baggage and passive fingerprinting surface exposed by the venerable `User-Agent` header.
- https://github.com/WICG/ua-client-hints

auth

http://en.wikipedia.org/wiki/Basic_access_authentication

http://www.ietf.org/rfc/rfc2069.txt
- http://en.wikipedia.org/wiki/Digest_access_authentication
- http://en.wikipedia.org/wiki/Digest_access_authentication#Alternative_authentication_protocols

http://www.ietf.org/mail-archive/web/http-auth/current/maillist.html

2012;

http://tools.ietf.org/html/draft-ietf-httpauth-mutual-00

URI

RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax

https://en.wikipedia.org/wiki/Uniform_Resource_Identifier - a string of characters that unambiguously identifies a particular resource. To guarantee uniformity, all URIs follow a predefined set of syntax rules, but also maintain extensibility through a separately defined hierarchical naming scheme (e.g. http://). Such identification enables interaction with representations of the resource over a network, typically the World Wide Web, using specific protocols. Schemes specifying a concrete syntax and associated protocols define each URI. The most common form of URI is the Uniform Resource Locator (URL), frequently referred to informally as a web address. More rarely seen in usage is the Uniform Resource Name (URN), which was designed to complement URLs by providing a mechanism for the identification of resources in particular namespaces.

RFC 3987 - Internationalized Resource Identifiers (IRIs)

https://en.wikipedia.org/wiki/Internationalized_Resource_Identifier - an internet protocol standard which builds on the Uniform Resource Identifier (URI) protocol by greatly expanding the set of permitted characters. It was defined by the Internet Engineering Task Force (IETF) in 2005 in RFC 3987. While URIs are limited to a subset of the ASCII character set, IRIs may additionally contain most characters from the Universal Character Set (Unicode/ISO 10646), including Chinese, Japanese, Korean, and Cyrillic characters.

https://en.wikipedia.org/wiki/List_of_URI_schemes - This article lists common URI schemes. A Uniform Resource Identifier helps identify a source without ambiguity.

https://en.wikipedia.org/wiki/Uniform_Resource_Name - a Uniform Resource Identifier (URI) that uses the urn scheme. A URI is a string of characters used to identify a name or resource. URIs are used in many Internet protocols to refer to and access information resources. URI schemes include the familiar http, as well as hundreds of others.

URNs were originally conceived to be part of a three-part information architecture for the Internet, along with Uniform Resource Locators (URLs) and Uniform Resource Characteristics (URCs), a metadata framework. As described in the 1994 RFC 1737, and later in the 1997 RFC 2141, URNs were distinguished from URLs, which identify resources by specifying their locations in the context of a particular access protocol, such as HTTP or FTP. In contrast, URNs were conceived as persistent, location-independent identifiers assigned within defined namespaces, typically by an authority responsible for the namespace, so that they are globally unique and persistent over long periods of time, even after the resource which they identify ceases to exist or becomes unavailable.

URCs never progressed past the conceptual stage, and other technologies such as the Resource Description Framework later took their place. Since RFC 3986 in 2005, use of the terms "Uniform Resource Name" and "Uniform Resource Locator" has been deprecated in technical standards in favor of the term Uniform Resource Identifier (URI), which encompasses both, a view proposed in 2001 by a joint working group between the World Wide Web Consortium (W3C) and Internet Engineering Task Force (IETF).

https://en.wikipedia.org/wiki/URI_fragment - a string of characters that refers to a resource that is subordinate to another, primary resource. The primary resource is identified by a Uniform Resource Identifier (URI), and the fragment identifier points to the subordinate resource. The fragment identifier introduced by a hash mark # is the optional last part of a URL for a document. It is typically used to identify a portion of that document. The generic syntax is specified in RFC 3986. The hash mark separator in URIs is not part of the fragment identifier.

https://en.wikipedia.org/wiki/Fragment_identifier - a string of characters that refers to a resource that is subordinate to another, primary resource. The primary resource is identified by a Uniform Resource Identifier (URI), and the fragment identifier points to the subordinate resource. The fragment identifier introduced by a hash mark # is the optional last part of a URL for a document. It is typically used to identify a portion of that document. The generic syntax is specified in RFC 3986. The hash-mark separator in URIs is not part of the fragment identifier.
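As a small illustration of the point that the hash mark is a separator rather than part of the fragment, Python's standard `urllib.parse.urldefrag` splits the two. The wrapper name `split_fragment` and the example URL are assumptions for this sketch.

```python
from urllib.parse import urldefrag
from typing import Tuple


def split_fragment(url: str) -> Tuple[str, str]:
    """Return (primary resource URL, fragment identifier without '#')."""
    base, fragment = urldefrag(url)
    return base, fragment


if __name__ == "__main__":
    # Hypothetical document URL with a fragment pointing at one section.
    print(split_fragment("https://example.org/guide.html#install"))
```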

https://en.wikipedia.org/wiki/Percent-encoding - also known as URL encoding, is a mechanism for encoding information in a Uniform Resource Identifier (URI) under certain circumstances. Although it is known as URL encoding, it is, in fact, used more generally within the main Uniform Resource Identifier (URI) set, which includes both Uniform Resource Locator (URL) and Uniform Resource Name (URN). As such, it is also used in the preparation of data of the application/x-www-form-urlencoded media type, as is often used in the submission of HTML form data in HTTP requests.
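A short sketch of both uses mentioned above, with Python's standard `urllib.parse`: encoding a single URI component, and preparing `application/x-www-form-urlencoded` data. The helper names and sample strings are made up for illustration.

```python
from urllib.parse import quote, unquote, urlencode


def encode_component(text: str) -> str:
    """Percent-encode a single URI component, reserved characters included."""
    # safe="" means even "/" is encoded, as it should be inside one segment.
    return quote(text, safe="")


def encode_form(fields: dict) -> str:
    """Encode fields as application/x-www-form-urlencoded (spaces become '+')."""
    return urlencode(fields)


if __name__ == "__main__":
    encoded = encode_component("a b&c/d")
    print(encoded)            # percent-encoded component
    print(unquote(encoded))   # round-trips back to the original text
    print(encode_form({"q": "café au lait"}))
```

Non-ASCII characters are encoded as the percent-escaped bytes of their UTF-8 form, which is why `é` becomes two escapes.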

https://en.wikipedia.org/wiki/Internationalized_domain_name - an Internet domain name that contains at least one label that is displayed in software applications, in whole or in part, in a language-specific script or alphabet, such as Arabic, Chinese, Cyrillic, Devanagari, Hebrew or the Latin alphabet-based characters with diacritics or ligatures, such as French. These writing systems are encoded by computers in multibyte Unicode. Internationalized domain names are stored in the Domain Name System (DNS) as ASCII strings using Punycode transcription.

https://en.wikipedia.org/wiki/Punycode - a representation of Unicode with the limited ASCII character subset used for Internet hostnames. Using Punycode, host names containing Unicode characters are transcoded to a subset of ASCII consisting of letters, digits, and hyphens, which is called the Letter-Digit-Hyphen (LDH) subset. For example, München (German name for Munich) is encoded as Mnchen-3ya. While the Domain Name System (DNS) technically supports arbitrary sequences of octets in domain name labels, the DNS standards recommend the use of the LDH subset of ASCII conventionally used for host names, and require that string comparisons between DNS domain names should be case-insensitive. The Punycode syntax is a method of encoding strings containing Unicode characters, such as internationalized domain names (IDNA), into the LDH subset of ASCII favored by DNS. It is specified in IETF Request for Comments 3492.
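The München example can be reproduced with the `punycode` and `idna` codecs that ship with Python. This is a sketch only: the built-in `idna` codec follows the older IDNA 2003 rules, and the `.example` hostname is a placeholder.

```python
def to_punycode(label: str) -> str:
    """Encode one label with raw Punycode (no 'xn--' prefix)."""
    return label.encode("punycode").decode("ascii")


def to_idna_hostname(hostname: str) -> str:
    """Encode a full hostname the way it is stored in DNS (ACE form)."""
    # The codec normalises case and adds the 'xn--' prefix per label.
    return hostname.encode("idna").decode("ascii")


if __name__ == "__main__":
    print(to_punycode("München"))
    print(to_idna_hostname("münchen.example"))
```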

http://en.wikipedia.org/wiki/Web_resource - or simply resource, is any identifiable thing, whether digital, physical, or abstract. Resources are identified using Uniform Resource Identifiers. In the Semantic Web, web resources and their semantic properties are described using the Resource Description Framework.

Cool URIs for the Semantic Web - The Resource Description Framework RDF allows users to describe both Web documents and concepts from the real world—people, organisations, topics, things—in a computer-processable way. Publishing such descriptions on the Web creates the Semantic Web. URIs (Uniform Resource Identifiers) are very important, providing both the core of the framework itself and the link between RDF and the Web. This document presents guidelines for their effective use. It discusses two strategies, called 303 URIs and hash URIs. It gives pointers to several Web sites that use these solutions, and briefly discusses why several other proposals have problems.

https://news.ycombinator.com/item?id=23865484

Sense and Reference on the Web - Harry Halpin, 2009. "This thesis builds a foundation for the philosophy of the Web by examining the crucial question: What does a Uniform Resource Identifier (URI) mean? Does it have a sense, and can it refer to things? A philosophical and historical introduction to the Web explains the primary purpose of the Web as a universal information space for naming and accessing information via URIs. A terminology, based on distinctions in philosophy, is employed to define precisely what is meant by information, language, representation, and reference. These terms are then employed to create a foundational ontology and principles of Web architecture. From this perspective, the Semantic Web is then viewed as the application of the principles of Web architecture to knowledge representation. However, the classical philosophical problems of sense and reference that have been the source of debate within the philosophy of language return. Three main positions are inspected: the logicist position, as exemplified by the descriptivist theory of reference and the first-generation Semantic Web, the direct reference position, as exemplified by Putnam and Kripke's causal theory of reference and the second-generation Linked Data initiative, and a Wittgensteinian position that views the Semantic Web as yet another public language. After identifying the public language position as the most promising, a solution of using people's everyday use of search engines as relevance feedback is proposed as a Wittgensteinian way to determine sense of URIs. This solution is then evaluated on a sample of the Semantic Web discovered by via using queries from a hypertext search engine query log. The results are evaluated and the technique of using relevance feedback from hypertext Web searches to determine relevant Semantic Web URIs in response to user queries is shown to considerably improve baseline performance. 
Future work for the Web that follows from our argument and experiments is detailed, and outlines of a future philosophy of the Web laid out."

Using relative URLs - Design Issues

Any claim without a URL should be treated as suspicious – Terence Eden’s Blog - [7]

CURIE

https://en.wikipedia.org/wiki/CURIE - or Compact URI, defines a generic, abbreviated syntax for expressing Uniform Resource Identifiers (URIs). It is an abbreviated URI expressed in a compact syntax, and may be found in both XML and non-XML grammars. A CURIE may be considered a datatype. An example of CURIE syntax: [isbn:0393315703]
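A minimal sketch of expanding a CURIE into a full URI, assuming a prefix table is supplied by the host document. The `PREFIXES` mapping below is hypothetical (including the choice of `urn:isbn:` for the `isbn` prefix); real mappings are declared by whoever publishes the data.

```python
from typing import Dict

# Hypothetical prefix declarations, for illustration only.
PREFIXES: Dict[str, str] = {
    "isbn": "urn:isbn:",
    "wiki": "https://en.wikipedia.org/wiki/",
}


def expand_curie(curie: str, prefixes: Dict[str, str] = PREFIXES) -> str:
    """Expand 'prefix:reference' (optionally wrapped in [ ]) to a full URI."""
    # Square brackets mark a "safe CURIE"; strip them before parsing.
    if curie.startswith("[") and curie.endswith("]"):
        curie = curie[1:-1]
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return prefixes[prefix] + reference
```

With this table, `expand_curie("[isbn:0393315703]")` yields `urn:isbn:0393315703`.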

URL shorteners

www. is deprecated.

tantek / Whistle - an open source, algorithmically reversible, personal URL shortener.

YOURLS - stands for Your Own URL Shortener. It is a small set of PHP scripts that will allow you to run your own URL shortening service (a la TinyURL or bitly).
- https://github.com/YOURLS/YOURLS

ShadyURL - Mike Lacher
ShadyURL - Don't just shorten your URL, make it suspicious and frightening. Dead.

Heaps legit links - Turn any link into a suspicious looking one

https://daniel.haxx.se/blog/2017/01/30/one-url-standard-please/ [8]

Services

http://3.ly/

http://is.gd/

http://tinyurl.com/

https://bitly.com/

http://ow.ly/

http://tinyarrows.com/

https://github.com/ehamberg/9m

expiring.link - Create Temporary Links that expire after n visits or on a date

well-known

https://news.ycombinator.com/item?id=18618193

API

https://en.wikipedia.org/wiki/Web_API - an application programming interface for either a web server or a web browser. It is a web development concept, usually limited to a web application's client-side (including any web frameworks being used), and thus usually does not include web server or browser implementation details such as SAPIs or APIs unless publicly accessible by a remote web application.

https://en.wikipedia.org/wiki/Open_API - often referred to as a public API, is a publicly available application programming interface that provides developers with programmatic access to a proprietary software application or web service. APIs are sets of requirements that govern how one application can communicate and interact with another. APIs can also allow developers to access certain internal functions of a program, although this is not typically the case for web APIs. In the simplest terms, an API allows one piece of software to interact with another piece of software, whether within a single computer via a mechanism provided by the operating system or over an internal or external TCP/IP-based or non-TCP/IP-based network. In the late 2010s, many APIs are provided by organisations for access with HTTP. APIs may be used by both developers inside the organisation that published the API or by any developers outside that organisation who wish to register for access to the interface.

ProgrammableWeb - API Dashboard

http://developer.echonest.com - music related

Apify - Web Scraping, Data Extraction and Automation
- https://github.com/apifytech/apify-js - The scalable web crawling and scraping library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.

Diffbot - Knowledge Graph, AI Web Data Extraction and Crawling

Insomnia - The open-source, cross-platform API client for GraphQL, REST, WebSockets and gRPC. An easy way to design, debug, and test APIs. Build better APIs faster and collaboratively with a dev-friendly interface, built-in automation, and an extensible plugin ecosystem.
- https://github.com/Kong/insomnia

CGI

RFC 3875 - The Common Gateway Interface (CGI) Version 1.1 - a simple interface for running external programs, software or gateways under an information server in a platform-independent manner. Currently, the supported information servers are HTTP servers. The interface has been in use by the World-Wide Web (WWW) since 1993. This specification defines the 'current practice' parameters of the 'CGI/1.1' interface developed and documented at the U.S. National Centre for Supercomputing Applications. This document also defines the use of the CGI/1.1 interface on UNIX(R) and other, similar systems.

https://en.wikipedia.org/wiki/Common_Gateway_Interface - an interface specification that enables web servers to execute an external program to process HTTP/S user requests. Such programs are often written in a scripting language and are commonly referred to as CGI scripts, but they may include compiled programs.

A typical use case occurs when a web user submits a web form on a web page that uses CGI. The form's data is sent to the web server within an HTTP request with a URL denoting a CGI script. The web server then launches the CGI script in a new computer process, passing the form data to it. The output of the CGI script, usually in the form of HTML, is returned by the script to the Web server, and the server relays it back to the browser as its response to the browser's request. Developed in the early 1990s, CGI was the earliest common method available that allowed a web page to be interactive. Because CGI scripts have to run in a separate process every time a request comes in from a client, various alternatives were developed.
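The request flow above can be sketched as a tiny CGI script. It assumes the server passes GET form data in the `QUERY_STRING` environment variable and relays whatever the script writes to standard output; the `name` form field and the function name `cgi_response` are hypothetical.

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs


def cgi_response(environ) -> str:
    """Build the CGI output: header lines, a blank line, then the HTML body."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    # "name" is a made-up form field; fall back to a default when absent.
    name = params.get("name", ["world"])[0]
    body = f"<html><body><p>Hello, {escape(name)}!</p></body></html>\n"
    return "Content-Type: text/html; charset=utf-8\r\n\r\n" + body


if __name__ == "__main__":
    # The web server launches this file as a new process for every request.
    sys.stdout.write(cgi_response(os.environ))
```

Only GET-style query strings are handled here; a POSTed form would arrive on standard input instead.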

https://github.com/rwhaling/dinosaur - Web "framework" for Scala Native with the power of RFC 3875: The Common Gateway Interface

RESTful

https://en.wikipedia.org/wiki/Representational_State_Transfer - a software architectural style that defines a set of constraints to be used for creating Web services. Web services that conform to the REST architectural style, called RESTful Web services, provide interoperability between computer systems on the Internet. RESTful Web services allow the requesting systems to access and manipulate textual representations of Web resources by using a uniform and predefined set of stateless operations. Other kinds of Web services, such as SOAP Web services, expose their own arbitrary sets of operations.

"Web resources" were first defined on the World Wide Web as documents or files identified by their URLs. However, today they have a much more generic and abstract definition that encompasses every thing or entity that can be identified, named, addressed, or handled, in any way whatsoever, on the Web. In a RESTful Web service, requests made to a resource's URI will elicit a response with a payload formatted in HTML, XML, JSON, or some other format. The response can confirm that some alteration has been made to the stored resource, and the response can provide hypertext links to other related resources or collections of resources. When HTTP is used, as is most common, the operations (HTTP methods) available are GET, HEAD, POST, PUT, PATCH, DELETE, CONNECT, OPTIONS and TRACE. By using a stateless protocol and standard operations, RESTful systems aim for fast performance, reliability, and the ability to grow by reusing components that can be managed and updated without affecting the system as a whole, even while it is running.

https://chrome.google.com/webstore/detail/rest/flkpngnnmfhmdcoggeompbgbpocpfmgk

http://tomayko.com/writings/rest-to-my-wife

http://mikehadlow.blogspot.co.uk/2012/08/rest-epic-semantic-fail.html

http://broadcast.oreilly.com/2011/06/the-good-the-bad-the-ugly-of-rest-apis.html

http://ql.io/

http://gist.io/3169140

http://apify.heroku.com/resources

http://static.matthewlmcclure.com/s/2012/11/24/hypermedia-is-the-new-rest.html

http://www.vinaysahni.com/best-practices-for-a-pragmatic-restful-api [10]

http://www.onebigfluke.com/2013/08/throw-away-the-rest.html

https://github.com/zapier/resthooks - pubsub

https://news.ycombinator.com/item?id=10765148

https://news.ycombinator.com/item?id=18485978

https://github.com/Corvusoft/restbed - a comprehensive and consistent programming model for building applications that require seamless and secure communication over HTTP, with the ability to model a range of business processes, designed to target mobile, tablet, desktop and embedded production environments.

https://openresty.org/

https://github.com/openresty/lua-resty-balancer - A generic consistent hash implementation for OpenResty/Lua

http://ramses.tech/

HATEOAS

https://en.wikipedia.org/wiki/HATEOAS - Hypermedia As The Engine Of Application State (HATEOAS) is a constraint of the REST application architecture that distinguishes it from other network application architectures. With HATEOAS, a client interacts with a network application that application servers provide dynamically entirely through hypermedia. A REST client needs no prior knowledge about how to interact with an application or server beyond a generic understanding of hypermedia. By contrast, clients and servers in some service-oriented architectures (SOA) interact through a fixed interface shared through documentation or an interface description language (IDL). The way that the HATEOAS constraint decouples client and server enables the server functionality to evolve independently.
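A toy sketch of the idea: the client below knows only an entry point and the names of link relations, and discovers every other URL from the responses. The in-memory `DOCUMENTS` table stands in for a server, and the `_links`/`href` layout is an assumed convention, not something the constraint itself prescribes.

```python
from typing import Callable, Dict, Sequence

# Hypothetical "server": URL -> hypermedia document with outgoing links.
DOCUMENTS: Dict[str, dict] = {
    "/": {"_links": {"orders": {"href": "/orders"}}},
    "/orders": {"_links": {"latest": {"href": "/orders/42"}}},
    "/orders/42": {"status": "shipped", "_links": {}},
}


def fetch(url: str) -> dict:
    """Stand-in for an HTTP GET that returns a parsed document."""
    return DOCUMENTS[url]


def follow(
    entry_point: str,
    relations: Sequence[str],
    fetch_fn: Callable[[str], dict] = fetch,
) -> dict:
    """Start at the entry point and follow the named link relations in order."""
    document = fetch_fn(entry_point)
    for rel in relations:
        # The URL comes from the response, never from client-side knowledge.
        href = document["_links"][rel]["href"]
        document = fetch_fn(href)
    return document
```

Because the client never hard-codes `/orders/42`, the server could move that resource and only the links in its responses would need to change.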

https://github.com/badgateway/ketting - The HATEOAS client for javascript

SOAP

http://www.thetwowayweb.com/soapmeetsrss

Google Feed API

GraphQL

GraphQL - a query language for APIs and a runtime for fulfilling those queries with your existing data. GraphQL provides a complete and understandable description of the data in your API, gives clients the power to ask for exactly what they need and nothing more, makes it easier to evolve APIs over time, and enables powerful developer tools.

https://news.ycombinator.com/item?id=18045532

JSON-API

JSON:API - a specification for APIs that use JSON
- https://github.com/json-api/json-api

https://www.iana.org/assignments/media-types/application/vnd.api+json

Hydra

Hydra Core Vocabulary - a lightweight vocabulary to create hypermedia-driven Web APIs. By specifying a number of concepts commonly used in Web APIs it enables the creation of generic API clients.

User and group

Servers

gist: web-servers.md - Big list of http static server one-liners

Python 3

python -m http.server 5674

PHP >= 5.4

php -S localhost:8000

Ruby

ruby -run -e httpd -- --port=8080 .

https://github.com/rhardih/serve - as above with added gzip and http2
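For comparison with the one-liners, here is a rough programmatic equivalent of `python -m http.server` using the same standard-library module. It is a sketch for local use only; `start_static_server` is a hypothetical helper, and port 0 asks the OS for any free port.

```python
import functools
import http.server
import threading


def start_static_server(directory: str, port: int = 0) -> http.server.ThreadingHTTPServer:
    """Serve `directory` over HTTP on 127.0.0.1 in a background thread."""
    # Bind the handler to a directory instead of the current working directory.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


if __name__ == "__main__":
    srv = start_static_server(".", 5674)
    print("Serving on port", srv.server_address[1])
    try:
        threading.Event().wait()  # block until interrupted
    except KeyboardInterrupt:
        srv.shutdown()
        srv.server_close()
```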

http://socialcompare.com/en/comparison/comparison-of-web-servers

h5ai - makes browsing directories on HTTP web servers more pleasant. Directory listings get styled in a modern way and browsing through the directories is enhanced by different views, a breadcrumb and a tree overview.
- https://github.com/lrsjng/h5ai

https://github.com/fangfufu/httpdirfs - HTTP Directory Filesystem with a permanent cache, and Airsonic / Subsonic server support!

Nginx

nginx [engine x] is a free, open-source, high-performance HTTP server and reverse proxy, as well as an IMAP/POP3 proxy server. Igor Sysoev started development of Nginx in 2002, with the first public release in 2004. Nginx now hosts nearly 12.18% (22.2M) of active sites across all domains. Nginx is known for its high performance, stability, rich feature set, simple configuration, and low resource consumption.

http://wiki.nginx.org/WhyUseIt

Wiki
- Resources
- http://nginx.org/en/docs/http/request_processing.html

Guides

Nginx Primer - Jul 19, 2010
Nginx Primer 2: From Apache to Nginx - Feb 04th, 2011

http://carrot.is/coding/nginx_introduction

Varnish as reverse proxy with nginx as web server and SSL terminator - Or, if you like, the nginx-Varnish-nginx sandwich. Jul 24th, 2012

http://www.shkschneider.me/blog/1323/nginx-security-configuration

Webfonts

http://blog.bigdinosaur.org/gzipping-at-font-face-with-nginx/

Nginx uses a file’s mime type declaration to decide whether or not to apply compression to that file, and so we must first ensure that the four types of web font files have mime types configured.

/etc/nginx/mime.types

application/vnd.ms-fontobject      eot;
application/x-font-ttf             ttf;
font/opentype                      otf;
font/x-woff                        woff;

remove;

application/octet-stream          eot;

Configuration

Pitfalls

Listening on a specific IP overrides a wildcard IP catch-all.

https://github.com/h5bp/server-configs-nginx

nginxconfig.io
- https://github.com/digitalocean/nginxconfig.io

Site setup

Default site folder location can vary.

/var/www
  # debian

/etc/nginx/html
  # arch linux

Create 'Server Block' (vhost) config files in

/etc/nginx/sites-available

and symlink to them from

/etc/nginx/sites-enabled

ServerBlockExample - Basic examples

Enable logging in vhost conf;

error_log /var/log/nginx-vhostnamehere.log error;

https://github.com/perusio/drupal-with-nginx

Modules

Nginx modules were historically selected at compile time only. Since version 1.9.11, many modules can also be built as dynamic modules and loaded at startup with the load_module directive.

NginxHttpCoreModule

Compression

http://wiki.nginx.org/HttpGzipModule

gzip_types text/plain text/css application/x-javascript text/xml application/xml application/xml+rss
text/javascript application/json image/svg+xml application/vnd.ms-fontobject application/x-font-ttf
font/opentype;

FastCGI

HttpFastcgiModule
- FcgiExample - FastCGI example. Keep separate and include.
- PHPFcgiExample
- How to Solve “No input file specified” with PHP and Nginx - Jan 19, 2011

Connections

HttpLimitConnModule - This module makes it possible to limit the number of concurrent connections for a defined key such as, for example, an ip address.

HttpRewriteModule - rewriting urls

HttpAuthBasicModule - for directory passwords
- How do I generate an .htpasswd file without having Apache tools installed?

location / { 
   auth_basic            "Restricted";
   auth_basic_user_file  /etc/nginx/conf.d/htpasswd;
 }

printf "John:$(openssl passwd -crypt V3Ry)\n" >> .htpasswd # this example uses crypt encryption
printf "Mary:$(openssl passwd -apr1 SEcRe7)\n" >> .htpasswd # this example uses apr1 (Apache MD5) encryption
printf "Jane:$(openssl passwd -1 V3RySEcRe7)\n" >> .htpasswd # this example uses MD5 encryption
(PWD="SEcRe7PwD";SALT="$(openssl rand -base64 3)";SHA1=$(printf "$PWD$SALT" | openssl dgst -binary -sha1 | sed 's#$#'"$SALT"'#' | base64);printf "Jim:{SSHA}$SHA1\n" >> .htpasswd) # this example uses SSHA encryption
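The SSHA line can also be produced without shell plumbing. The sketch below follows the same recipe as the last command (SHA-1 over password plus salt, then Base64 of digest plus salt); the function names and the 4-byte random salt are assumptions for illustration.

```python
import base64
import hashlib
import hmac
import os
from typing import Optional


def ssha_entry(user: str, password: str, salt: Optional[bytes] = None) -> str:
    """Return an htpasswd line of the form 'user:{SSHA}base64(sha1 + salt)'."""
    if salt is None:
        salt = os.urandom(4)  # assumed salt length for this sketch
    digest = hashlib.sha1(password.encode("utf-8") + salt).digest()
    encoded = base64.b64encode(digest + salt).decode("ascii")
    return f"{user}:{{SSHA}}{encoded}"


def check_ssha(entry: str, password: str) -> bool:
    """Verify a password against a line produced by ssha_entry."""
    _, _, encoded = entry.partition("{SSHA}")
    raw = base64.b64decode(encoded)
    digest, salt = raw[:20], raw[20:]  # a SHA-1 digest is always 20 bytes
    candidate = hashlib.sha1(password.encode("utf-8") + salt).digest()
    return hmac.compare_digest(candidate, digest)


if __name__ == "__main__":
    print(ssha_entry("Jim", "SEcRe7PwD"))
```

Appending the printed line to the file named in `auth_basic_user_file` has the same effect as the shell version.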

https://github.com/taythebot/lightpath - CDN written in Lua using Openresty and Redis

https://github.com/ledgetech/ledge - An RFC compliant and ESI capable HTTP cache for Nginx / OpenResty, backed by Redis

https://github.com/openresty/lua-resty-limit-traffic - provides several Lua modules to help OpenResty/ngx_lua users to control and limit the traffic, either request rate or request concurrency (or both).

Security

http://pastebin.com/dBC7E8Jd

https://www.owasp.org/index.php/OWASP_NAXSI_Project
- https://github.com/nbs-system/naxsi

SSL

https://raymii.org/s/tutorials/Strong_SSL_Security_On_nginx.html

Create .key and .csr
1. openssl req -new -newkey rsa:4096 -nodes -keyout server.key -out server.csr
2. Or using StartSSL control panel wizard, with additional key decrypt step after
Provide .csr to certificate authority, certificate authority returns .crt
.crt is concatenated with intermediate cert. .key can also be added; nginx will not send it
1. cat ssl.crt sub.class1.server.ca.pem ca.pem > /etc/nginx/conf/ssl-unified.crt
Nginx config points to .crt and .key

Make sure wildcard SSL is in http stanza and that any specific server is listening on 443.

How to force or redirect to SSL in nginx?

server {
   listen   80;
   listen   [::]:80;
   listen   443 default ssl;

   server_name www.example.com;

   ssl_certificate        /path/to/my/cert;
   ssl_certificate_key  /path/to/my/key;

   if ($ssl_protocol = "") {
      rewrite ^   https://$server_name$request_uri? permanent;
   }
}

ssl_ciphers ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-RC4-SHA:ECDH-ECDSA-RC4-SHA:ECDH-RSA-RC4-SHA:ECDHE-RSA-AES256-SHA;

autoindex on;

http://wiki.nginx.org/NgxFancyIndex
- https://github.com/aperezdc/ngx-fancyindex

https://github.com/Naereen/Nginx-Fancyindex-Theme

http://larsjung.de/h5ai/

browsepy - The simple web file browser.
- https://gitlab.com/ergoithz/browsepy

Tools

https://github.com/perusio/nginx_ensite - nginx_ensite and nginx_dissite for quick virtual host enabling and disabling

http://www.anilcetin.com/convert-apache-htaccess-to-nginx/

https://github.com/perusio/nginx-cache-inspector

https://github.com/lebinh/ngxtop [11]

https://github.com/nginx/unit - a lightweight and versatile open-source server that has two primary capabilities: serves static media assets, runs application code in seven languages. Unit compresses several layers of the modern application stack into a potent, coherent solution with a focus on performance, low latency, and scalability. It is intended as a universal building block for any web architecture regardless of its complexity, from enterprise-scale deployments to your pet's homepage. Its native RESTful JSON API enables dynamic updates with zero interruptions and flexible configuration, while its out-of-the-box productivity reliably scales to production-grade workloads. We achieve that with a complex, asynchronous, multithreading architecture comprising multiple processes to ensure security and robustness while getting the most out of today's computing platforms.

Lua

http://devblog.mixlr.com/2012/09/01/nginx-lua/

Lapis - a framework for building web applications using MoonScript or Lua that runs inside of a customized version of Nginx called OpenResty.

Forks / patches

http://tengine.taobao.org/

https://github.com/yaoweibin/nginx_syslog_patch

OpenResty - a dynamic web platform based on NGINX and LuaJIT.
- https://github.com/openresty

Apache

Apache HTTP Server
- http://httpd.apache.org/docs/current/mod/

Issuing Correct HTTP Headers

The 5G Blacklist helps reduce the number of malicious URL requests that hit your website. It’s one of many ways to improve the security of your site and protect against evil exploits, bad requests, and other nefarious garbage.
- http://perishablepress.com/6g-beta/

http://www.askapache.com/

http://linux.die.net/man/1/ab

.htaccess

Stupid htaccess Tricks

http://httpd.apache.org/docs/2.2/mod/mod_expires.html
- http://www.askapache.com/hacking/speed-site-caching-cache-control.html
- http://www.xpertdeveloper.com/2011/08/set-expire-headers-using-htaccess/

http://wiki.dreamhost.com/Htaccess

lighttpd

lighttpd (lighty)
- lighttpd - wikipedia

Other

http://monkey-project.com/

http://cortesi.github.com/pathod

http://nibble.develsec.org/projects/sw.html

http://sintaxi.com/introducing-harp

https://news.ycombinator.com/item?id=7505535

https://github.com/cesanta/mongoose

https://github.com/nodeapps/http-server

https://github.com/moserrya/knod [12]

https://news.ycombinator.com/item?id=8130849 - gwan

http://www.pugo.org:8080/ - postscript [13]

https://github.com/kazuho/h2o [14]

https://github.com/lpereira/lwan [15]

Caddy - The HTTP/2 Web Server with Automatic HTTPS
- https://github.com/mholt/caddy

https://code.facebook.com/projects/676603015770415
- https://code.facebook.com/posts/1503205539947302/introducing-proxygen-facebook-s-c-http-framework/ [16]

http://corte.si/posts/devd/intro

https://github.com/parkomat/parkomat

https://github.com/phusion/passenger

https://github.com/diracdeltas/FastestWebsiteEver

https://github.com/cortesi/devd

https://bitbucket.org/naviserver/naviserver - a versatile multiprotocol (httpd et al) server written in C/Tcl. It can be easily extended in either language to create interesting web sites and services.
- http://wiki.tcl.tk/2090
- https://en.wikipedia.org/wiki/NaviServer

https://2ton.com.au/rwasa

fserv - an HTTP server that can serve static files, act as a forward and reverse HTTP proxy server, and stream media content (SHOUTcast). It has a very high speed on all supported platforms due to its asynchronous I/O event architecture. It can easily process thousands of requests in parallel. The functionality of fserv can be extended by adding new modules. When configured as a proxy server, it caches responses from upstream servers on a local filesystem, dramatically speeding up the response time when the same request is received again. fserv can act as an Internet radio transmitter or it can re-transmit other Internet radio stations. In the latter case it can also record songs into MP3 files on a local machine, which makes it a useful music-grabber.

ServerMe.H - An embeddable single-file C++ web server

webfs - a simple HTTP server for mostly static content. You can use it to serve the content of an FTP server via HTTP, for example. It is also a quick way to export some files: an HTTP server can be started in a few seconds, without editing a config file first.

https://github.com/skx/httpd - Trivial HTTP-server for static-files only, written in go

https://github.com/DirectoryLister/DirectoryLister - easy way to expose the contents of any web-accessible folder for browsing and sharing, PHP7

unix-web

https://github.com/tjgillies/unix-web

CORS

http://www.w3.org/TR/cors/

http://enable-cors.org/

cors - "This post aims to demystify CORS and show its lighter side–as a specification that didn’t set out to hamper the aspirations of web developers everywhere, but instead to loose us from the grip of the same-origin policy. We’ll go through each of the headers necessary to properly satisfy CORS constraints, and also discuss a couple places where CORS is now relevant but which may surprise you." [17]
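A minimal sketch of the response headers such a walkthrough usually covers, assuming a server that only trusts an explicit allow-list of origins. The function name, the allow-list and the example origin are hypothetical, and the allowed methods and request headers are placeholders rather than recommendations.

```python
from typing import Dict, Optional, Set

# Hypothetical allow-list; replace with the origins your site trusts.
ALLOWED_ORIGINS: Set[str] = {"https://app.example.com"}


def cors_headers(origin: Optional[str], method: str,
                 allowed_origins: Set[str] = ALLOWED_ORIGINS) -> Dict[str, str]:
    """Return the CORS response headers for one request, or {} if denied."""
    # No Origin header, or an origin we do not trust: add nothing, and the
    # browser's same-origin policy blocks the cross-origin read.
    if origin is None or origin not in allowed_origins:
        return {}
    headers = {
        # Echo the specific origin rather than "*" so credentials can work.
        "Access-Control-Allow-Origin": origin,
        # Tell caches that the response differs per Origin header.
        "Vary": "Origin",
    }
    if method.upper() == "OPTIONS":
        # Preflight: say which methods and request headers are acceptable,
        # and how long (in seconds) the browser may cache this answer.
        headers["Access-Control-Allow-Methods"] = "GET, POST, OPTIONS"
        headers["Access-Control-Allow-Headers"] = "Content-Type"
        headers["Access-Control-Max-Age"] = "600"
    return headers
```

Returning an empty dict for unknown origins keeps the decision in one place: the caller just merges whatever comes back into the response.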

http://docs.aws.amazon.com/AmazonS3/latest/dev/cors.html

https://github.com/jpillora/xdomain [18]

https://cors-anywhere.herokuapp.com/
- https://github.com/Rob--W/cors-anywhere - a NodeJS reverse proxy which adds CORS headers to the proxied request.

Testing

Is it down? Check at Down for Everyone or Just Me - web service to check if a site is on-line

httping - like 'ping' but for HTTP requests. Give it a URL, and it'll show you how long it takes to connect, send a request and retrieve the reply (only the headers). Be aware that the transmission across the network also takes time, so it measures the latency of the web server plus the network. It also supports IPv6.
- https://github.com/flok99/httping
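The same kind of measurement can be roughly sketched with Python's standard library: time how long it takes to connect, send a HEAD request and read back the response headers. This is not how httping itself is implemented; the function name and defaults are made up for illustration, and it only handles plain HTTP.

```python
import http.client
import time
from typing import Tuple


def time_head(host: str, port: int = 80, path: str = "/",
              timeout: float = 5.0) -> Tuple[int, float]:
    """Return (status code, elapsed milliseconds) for one HEAD request."""
    start = time.perf_counter()
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)      # connects, then sends the request
        response = conn.getresponse()   # returns once the headers are read
        status = response.status
    finally:
        conn.close()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return status, elapsed_ms
```

As with httping, the number includes network transit time as well as the server's own latency, so repeat the call a few times before reading much into it.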

RED - a robot that checks HTTP resources to see how they'll behave, pointing out common problems and suggesting improvements. Although it is not a HTTP conformance tester, it can find a number of HTTP-related issues.

Browser SOA Debugger - Depending on the view of things this is just an enhanced HTTP output formatter for tcpdump streams, or the ultimate debugger for complex HTTP oriented SOA architectures which visualizes the full HTTP interactions in a readable, reproducible way so that you can see what is actually going on in your backend.

https://en.wikipedia.org/wiki/HTTP_Debugger_(software)

http://www.telerik.com/fiddler - .NET, freeware
- https://en.wikipedia.org/wiki/Fiddler_(software)

Hyper - an experimental middleware architecture for HTTP servers written in PureScript. Its main focus is correctness and type-safety, using type-level information to enforce correct composition and abstraction for web servers. The Hyper project is also a breeding ground for higher-level web server constructs, which tend to fall under the “framework” category.
- https://github.com/owickstrom/hyper

Clients

Postman - a powerful HTTP client to help test web services easily and efficiently. Postman lets you craft simple as well as complex HTTP requests quickly. It also saves requests for future use so that you never have to repeat your keystrokes ever again. Postman is designed to save you and your team tons of time. Check out more features below or just install from the Chrome Web Store to get started. [19]

https://github.com/hazbo/httpu - The terminal-first http client

https://github.com/ffuf/ffuf - Fast web fuzzer written in Go

Insomnia REST Client

RESTClient - a debugger for RESTful web services.

Compression

http://www.whatsmyip.org/http-compression-test/

Load

http://httpd.apache.org/docs/2.0/programs/ab.html

http://grinder.sourceforge.net/

http://locust.io/

https://github.com/tsenart/vegeta

http://engulf-project.org/

A/B

http://visualwebsiteoptimizer.com/

API

WireMock - [20]

Performance

Guides

Google Dev: Make the Web Faster
- https://developers.google.com/speed/docs/best-practices/caching#LeverageBrowserCaching

YDN: Exceptional Performance

Even Faster Web Sites - Book by Steve Souders

Caching Tutorial for Web Authors and Webmasters
A Primer on Web Caching - Jun 23rd, 2012
How we made Portent.com really freaking fast - May 23, 2012

serverfault: The strange case of Mr. Time To First Byte

One Sub Domain Doesn’t Make a CDN - Jan 3, 2011
- if not using www., use a subdomain CNAME on another domain

http://www.askapache.com/hacking/speed-site-caching-cache-control.html

https://groups.google.com/forum/?fromgroups#!topic/page-speed-discuss/oBpoPPaeWXk/discussion

Cookies

https://news.ycombinator.com/item?id=23111109

Cookieless

Use a cookie-free domain for static content so cookies aren't sent with each request. Root domain cookies apply to all subdomains, though using www. (ugh!) works. Use an A record on another domain to point to the site. (?)

How to configure cookieless virtual host in Apache2?

ETag

http://en.wikipedia.org/wiki/HTTP_ETag

stackoverflow: ETag vs Header Expires - Feb 1, 2009
- http://httpd.apache.org/docs/2.2/mod/core.html#FileETag

Speed Tips: Turn Off ETags - for multiserver
http://joshua.schachter.org/2006/11/apache-etags.html
http://davidwalsh.name/yslow-htaccess
REST Tip: Deep etags give you more benefits. - Mar 2007
ETags Revisited - 31 Jan 2011. best overview article.
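The multiserver problem comes from ETags derived from per-machine file metadata: Apache 2.2's default FileETag includes the inode number, which differs between servers. A rough sketch of the alternative, an ETag derived only from the response body so every server computes the same value, with If-None-Match handling. The function names are hypothetical and the hash choice is arbitrary.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Build a quoted ETag from a hash of the content itself."""
    return '"' + hashlib.sha256(body).hexdigest()[:32] + '"'


def respond(body: bytes,
            if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body), answering 304 when the client's copy is current."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        # The header may list several tags. Splitting on commas is a
        # simplification that is safe for the hex tags generated above.
        candidates = [t.strip() for t in if_none_match.split(",")]
        # If-None-Match uses weak comparison, so ignore a W/ prefix.
        candidates = [t[2:] if t.startswith("W/") else t for t in candidates]
        if "*" in candidates or etag in candidates:
            return 304, headers, b""
    return 200, headers, body
```

Hashing the body costs CPU on every response, so in practice the tag would be cached alongside the content.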

Caching

http://en.wikipedia.org/wiki/Web_cache

http://www.squid-cache.org/

BBC Digital Media Distribution: How we improved throughput by 4x [21]

Varnish

Varnish is a web application accelerator. You install it in front of your web application and it will speed it up significantly.

http://www.mediawiki.org/wiki/Manual:Varnish_caching
- http://labs.creativecommons.org/2011/03/18/caching-mediawiki-with-varnish/
- Setting Up Varnish & Memcache with Aegir

http://asm89.github.com/2012/09/26/context-aware-http-caching.html

Load balancing

http://news.ycombinator.com/item?id=5055478

CDN

http://en.wikipedia.org/wiki/Content_delivery_network

http://www.cloudflare.com/
- http://blog.cloudflare.com/cloudflares-free-cdn-and-you
- http://www.quora.com/Web-Development/Are-Modernizr-Respond-js-Selectivizr-and-Categorizr-compatible-with-Cloudflare

http://www.jsdelivr.com/ - js
- backed by maxcdn.com

CoralCDN is a decentralized, self-organizing, peer-to-peer web-content distribution network. CoralCDN leverages the aggregate bandwidth of volunteers running the software to absorb and dissipate most of the traffic for web sites using the system.

.nyud.net

http://www.akamai.com/

p2p;

Testing

http://www.webpagetest.org/

YSlow
- YSlow: Yahoo's Problems Are Not Your Problems - Aug 15, 2007

Google Page Speed

etc.

GTmetrix uses Google Page Speed and Yahoo! YSlow to grade your site's performance and provides actionable recommendations to fix these issues.

Blitz does cloud-based load and performance testing using Sinatra, Rails and node.js.
- Free: Sprint all you want, Rush all you want, 250 concurrent users, 1 minute rushes

Engulf is a scalable, distributed HTTP benchmarker, designed to let you spin up and coordinate a cluster of workers with nothing more than a single JAR. Engulf's backend is written in clojure, the frontend in javascript.

http://www.gidnetwork.com/tools/gzip-test.php

Logging

Combined Log Format

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined
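A small sketch of reading lines written in that format back into named fields. The regular expression and field names are one illustrative choice, not part of Apache. Quoted fields may contain backslash-escaped quotes, which the pattern allows for.

```python
import re
from typing import Dict, Optional

# One group per LogFormat directive: %h %l %u %t "%r" %>s %b "Referer" "User-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>(?:[^"\\]|\\.)*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>(?:[^"\\]|\\.)*)" "(?P<agent>(?:[^"\\]|\\.)*)"'
)


def parse_combined(line: str) -> Optional[Dict[str, str]]:
    """Split one combined-format log line into named fields, or return None."""
    match = COMBINED.fullmatch(line.strip())
    return match.groupdict() if match else None


sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
          '"http://www.example.com/start.html" '
          '"Mozilla/4.08 [en] (Win98; I ;Nav)"')
fields = parse_combined(sample)
```

All values come back as strings. `size` can be "-" when no body was sent, so convert it to a number only after checking for that.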

old skool;

Apache

http://httpd.apache.org/docs/1.3/logs.html

Nginx

http://nginx.org/en/docs/ngx_core_module.html

http://wiki.nginx.org/HttpLogModule

error_log  /var/log/nginx/domain.name/error.log;
access_log  /var/log/nginx/domain.name/access.log;

http://articles.slicehost.com/2010/8/27/reading-nginx-web-logs

http://gadelkareem.com/2012/07/01/nginx-error-log-reader/

GoAccess

GoAccess is an open source real-time web log analyzer and interactive viewer that runs in a terminal in *nix systems. It provides fast and valuable HTTP statistics for system administrators that require a visual server report on the fly.
- man goaccess

HAR

https://en.wikipedia.org/wiki/.har

https://dvcs.w3.org/hg/webperf/raw-file/tip/specs/HAR/Overview.html
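A HAR file is a JSON document with a top-level "log" object holding a list of "entries", each with a "request", a "response" and a total "time" in milliseconds. A short sketch that pulls out the slowest requests. The function name is made up, and it assumes a well-formed file exported from browser dev tools.

```python
import json
from typing import Any, Dict, List


def slowest_requests(har_text: str, limit: int = 5) -> List[Dict[str, Any]]:
    """Return the `limit` slowest entries of a HAR document, slowest first."""
    har = json.loads(har_text)
    rows = [
        {
            "url": entry["request"]["url"],
            "status": entry["response"]["status"],
            "time_ms": entry["time"],
        }
        for entry in har["log"]["entries"]
    ]
    rows.sort(key=lambda row: row["time_ms"], reverse=True)
    return rows[:limit]
```

Typical use would be `slowest_requests(open("page.har", encoding="utf-8").read())`, where page.har is whatever file was saved from the network panel.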

Control panels

https://github.com/rtCamp/easyengine/

Analytics

https://github.com/alphagov/government-service-design-manual/blob/master/service-manual/making-software/analytics-tools.md

Referer

http://5f5.org/ruminations/dark-social-dubious.html

http://smerity.com/articles/2013/where_did_all_the_http_referrers_go.html [22]

Piwik

Piwik is downloadable, Free/Libre (GPLv3 licensed) real time web analytics software. It provides you with detailed reports on your website visitors; the search engines and keywords they used, the language they speak, your popular pages, and much more.
- http://piwik.org/latest.zip
- Trac

User Guide
- Log Analytics - combined format logs (apache, nginx)

Plugins

http://dev.piwik.org/trac/ticket/2041

WordPress

http://wordpress.org/extend/plugins/wp-piwik/

plausible

https://plausible.io/ - built for privacy-conscious website owners. Here’s what makes it a great Google Analytics alternative

Clicky

http://getclicky.com/

GoatCounter

GoatCounter - Simple web statistics. No tracking of personal data.
- https://github.com/zgoat/goatcounter

Google

Google Analytics
- What is the Google Analytics Asynchronous Tracking Code?

http://blog.arkency.com/2012/12/google-analytics-for-developers/

http://drawingablank.me/blog/fix-your-bounce-rate.html

https://github.com/shinnn/isogram

Other

http://segmentio.github.com/analytics.js/
- https://github.com/segmentio/analytics.js

https://sumall.com/

http://www.clicktale.com/

https://rakam.io/

E-mail

https://blog.mailchimp.com/how-gmails-image-caching-affects-open-tracking/

Upload

Quick scripts for uploading;

https://github.com/blueimp/jQuery-File-Upload

http://www.thecssninja.com/javascript/drag-and-drop-upload

tus - resumable file uploads
- https://github.com/tus

Saving

http://curl.haxx.se/docs/comparison-table.html

https://daniel.haxx.se/docs/curl-vs-wget.html [24]

Wget

Wget - a free software package for retrieving files using HTTP, HTTPS, FTP and FTPS, the most widely used Internet protocols. It is a non-interactive command-line tool, so it may easily be called from scripts, cron jobs, terminals without X-Windows support, etc.
- GNU Wget Manual
- http://en.wikipedia.org/wiki/Wget

wget -O myzip.zip https://github.com/username/project/zipball/master

wget -m http://example.com
  --mirror

wget -mk http://example.com
  --convert-links

wget -mk -w 20 http://example.com
  # with delay of 20 seconds between requests

wget -E -H -k -K -p -nd http://example.com
  to mirror a single page
  --adjust-extension
  --span-hosts
  --convert-links
  --backup-converted
  --page-requisites
  --no-directories - httpd access permission issues. to try next time; -nH

wget -r -np -l 1 -Azip http://example.com/download/
  # download all links to .zip files on a given web page [25]

http://www.unixmen.com/wget-command-line-cheatsheet/

https://code.google.com/p/wgetremote - php web interface for wget download manager. with wgetremote you can remotely access installed wget on your local computer.

cURL

curl is a tool to transfer data from or to a server, using one of the supported protocols (DICT, FILE, FTP, FTPS, GOPHER, HTTP, HTTPS, IMAP, IMAPS, LDAP, LDAPS, POP3, POP3S, RTMP, RTSP, SCP, SFTP, SMTP, SMTPS, TELNET and TFTP). The command is designed to work without user interaction.
- http://curl.haxx.se/docs/manpage.html

curl http://www.google.com/search.js -o /path/to/local/file.js

curl http://site.{one,two,three}.com

curl ftp://ftp.numericals.com/file[1-100].txt
     ftp://ftp.numericals.com/file[001-100].txt (with leading zeros)
     ftp://ftp.letters.com/file[a-z].txt
 sequences of alphanumeric series by using []

curl http://any.org/archive[1996-1999]/vol[1-4]/part{a,b,c}.html
  Nested sequences are not supported, but you can use several ones next to each other:

curl http://www.numericals.com/file[1-100:10].txt  http://www.letters.com/file[a-z:2].txt
  multiple urls + specify a step counter for the ranges to get every Nth number or letter:
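To see what a glob pattern will turn into before pointing curl at a server, the expansion rules above can be approximated in a few lines. This is an illustrative re-implementation, not curl's own code: it handles {a,b,c} sets and [N-M], [N-M:step] and [a-z] ranges, and treats every square bracket as a range, so IPv6 literal hosts are not supported. The order of the generated URLs may differ from curl's.

```python
import itertools
import re
from typing import List

_TOKEN = re.compile(r"\{([^{}]*)\}|\[([^\[\]]*)\]")
_RANGE = re.compile(
    r"(\d+)-(\d+)(?::(\d+))?|([A-Za-z])-([A-Za-z])(?::(\d+))?"
)


def _expand_range(spec: str) -> List[str]:
    """Expand the inside of [...] such as '1-100:10', '001-100' or 'a-z'."""
    m = _RANGE.fullmatch(spec)
    if m is None:
        raise ValueError(f"unsupported range: [{spec}]")
    if m.group(1) is not None:
        start, stop, step = m.group(1), m.group(2), int(m.group(3) or 1)
        # A leading zero on the start value means zero-padded output.
        width = len(start) if start.startswith("0") else 0
        return [str(n).zfill(width)
                for n in range(int(start), int(stop) + 1, step)]
    first, last, step = m.group(4), m.group(5), int(m.group(6) or 1)
    return [chr(c) for c in range(ord(first), ord(last) + 1, step)]


def expand(url: str) -> List[str]:
    """Return every URL that a curl-style glob pattern stands for."""
    parts: List[List[str]] = []
    pos = 0
    for m in _TOKEN.finditer(url):
        parts.append([url[pos:m.start()]])          # literal text before the glob
        if m.group(1) is not None:
            parts.append(m.group(1).split(","))     # {one,two,three}
        else:
            parts.append(_expand_range(m.group(2)))  # [1-100:10]
        pos = m.end()
    parts.append([url[pos:]])                        # trailing literal text
    return ["".join(combo) for combo in itertools.product(*parts)]
```

For example, `expand("http://site.{one,two,three}.com")` yields the three site URLs, and several globs side by side multiply out.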

saldl is a lightweight well-featured CLI downloader optimized for speed and early preview. based on libcurl.

wget2

https://github.com/rockdaboot/wget2 - the successor of GNU Wget, a file and recursive website downloader. Designed and written from scratch, it wraps around libwget, which provides the basic functions needed by a web client. Wget2 works multi-threaded and uses many features to allow fast operation. In many cases Wget2 downloads much faster than Wget1.x due to HTTP2, HTTP compression, parallel connections and use of the If-Modified-Since HTTP header.

gwget

Gwget - Download Manager for Gnome2. It uses wget as a backend.

mulk

mulk - Multi-connection command line tool for downloading Internet sites with image filtering and Metalink support. Similar to wget and cURL, but it manages up to 50 simultaneous and parallel links. Main features are: HTML code parsing, recursive fetching, Metalink retrieving, segmented download and image filtering by width and height. It is based on libcurl, liburiparser, libtidy, libmetalink and libcrypto.

aria2

aria2 - a lightweight multi-protocol & multi-source command-line download utility. It supports HTTP/HTTPS, FTP, BitTorrent and Metalink. aria2 can be manipulated via built-in JSON-RPC and XML-RPC interfaces.

HTTrack

HTTrack - allows you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer. HTTrack arranges the original site's relative link-structure. Simply open a page of the "mirrored" website in your browser, and you can browse the site from link to link, as if you were viewing it online. HTTrack can also update an existing mirrored site, and resume interrupted downloads.

httrack "https://example.com" -O ExampleMirrorDirectory \
"-*" \
"+https://example.com/images/*" \
"-*.swf"

Other

http://en.wikipedia.org/wiki/MHTML

https://gist.github.com/446302

https://github.com/jjjake/internetarchive - A Python and Command-Line Interface to Archive.org

https://github.com/kanishka-linux/reminiscence - Self-hosted Bookmark and Archive manager [26]

Scraping

See Scraping

404

http://notfound.org/

http://leftlogic.com/info/articles/404

https://github.com/ColdSauce/CosmosBrowserAndroid

Other

http://www.morevisibility.com/analyticsblog/from-__utma-to-__utmz-google-analytics-cookies.html

https://github.com/uams/geturl

http://nodeknockout.com/teams/waving

https://news.ycombinator.com/item?id=8758196

https://en.wikipedia.org/wiki/Hyper_Text_Coffee_Pot_Control_Protocol - a facetious communication protocol for controlling, monitoring, and diagnosing coffee pots. It is specified in RFC 2324, published on 1 April 1998 as an April Fools' Day RFC. An extension, HTCPCP-TEA, was published as RFC 7168 on 1 April 2014 to support brewing teas, and is also an April Fools' Day RFC.

HTTP Signatures

draft-cavage-http-signatures-10 - Signing HTTP Messages

HTTP Signatures Guide | HTTP Signatures Guide [master | Tribestream]

Ensuring Message Integrity with HTTP Signatures - Sathya Bandara - Medium

serverless-http-signatures.md

HTTP/2 (SPDY)

https://en.wikipedia.org/wiki/HTTP/2

http://www.mnot.net/blog/2012/08/04/http_vancouver
http://bitsup.blogspot.co.uk/2012/08/the-road-to-http2.html - The Road to HTTP/2 - Aug 6, 2012
http://blog.jgc.org/2012/12/speeding-up-http-with-minimal-protocol.html

http://blog.tabini.ca/the-7-bit-internet/

https://payswarm.com/specs/source/http-keys

https://news.ycombinator.com/item?id=7023033

https://news.ycombinator.com/item?id=8549348

https://github.com/http2/http2-spec/wiki/Implementations

http://nghttp2.org/

HTTP/3 (QUIC)

https://en.wikipedia.org/wiki/HTTP/3

HTTP/3 | daniel.haxx.se - [27] [28]

Introduction · HTTP/3 explained - [29]

Errata Security: Some notes about HTTP/3 - [30]
