access_log
Receives access logs in JSON format on a UNIX socket and stores them in a database. It intentionally doesn't collect IP addresses. It doesn't honor the Do Not Track (DNT) header, though, because no personally identifiable data is collected. Referrer collection is optional, but we strongly suggest using a referrer policy that doesn't store full addresses.
See the Rails migration for the database schema, the Nginx configuration, and the site configuration.
It supports SQLite3 and PostgreSQL databases :)
Sustainable Web Design
When enabled, you can track CO2 emissions using the Sustainable Web Design "Calculating Digital Emissions" method. The algorithm and data are based on CO2.js.
It follows the calculations with the added, optional feature of using the origin country of the visit for the "consumer device" segment. To enable this, see Nginx configuration.
# For a datacenter using renewable energy in Costa Rica
access_log --swd --renewable --datacenter CR
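As a rough illustration of what the calculation does with the bytes sent, the sketch below splits the transfer energy across the four system segments and applies a grid intensity to each one. It is written in Python for brevity and is not the code access_log runs. The function name, the constants (the Sustainable Web Design figures as we understand them) and the intensity values in the usage line are assumptions, so check CO2.js for the current numbers.

```python
# Assumed constants, not taken from access_log: verify against CO2.js.
KWH_PER_GB = 0.81
SEGMENT_SHARE = {
    "consumer_device": 0.52,
    "network": 0.14,
    "datacenter": 0.15,
    "production": 0.19,
}


def emissions_grams(bytes_sent, datacenter_intensity, device_intensity, global_intensity):
    """Estimate grams of CO2e for one response.

    Intensities are in gCO2e/kWh. In this sketch the datacenter and
    consumer device segments use their own intensity, while network
    and production use the global one.
    """
    energy_kwh = bytes_sent / 1_000_000_000 * KWH_PER_GB
    intensity = {
        "consumer_device": device_intensity,
        "network": global_intensity,
        "datacenter": datacenter_intensity,
        "production": global_intensity,
    }
    return sum(
        energy_kwh * share * intensity[segment]
        for segment, share in SEGMENT_SHARE.items()
    )


# Placeholder intensities: a low value for a renewable-powered
# datacenter and a higher one everywhere else.
grams = emissions_grams(
    250_000, datacenter_intensity=50, device_intensity=442, global_intensity=442
)
print(f"{grams:.6f} g CO2e")
```

Passing a country-specific value as device_intensity is how the origin country of the visit would change the result in this sketch.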
Average vs marginal intensity
CO2.js explains the difference better. In practice, average intensity data gives lower results and mostly falls back to the global intensity, since the per-country data is missing most countries.
access_log uses marginal data by default.
Update the data
- Go to the data/output directory on the co2.js repository for the latest released version (in this example 0.16.2).
- Download the average-intensities.json file.
- Run src/average_intensities_by_country.cr with this file as stdin.
crystal run src/average_intensities_by_country.cr < average-intensities.json >> src/swd/average_intensity.cr
- Modify src/swd/average_intensity.cr to fix the data.
Create database
sqlite3 access_log.sqlite3 < contrib/create.sql
Build
Install zlib, sqlite3 and ssl development files (package names vary between distributions).
Install Crystal and the development tools (these also vary).
Run:
make
Build for Alpine
make alpine-build
Database configuration
Create an access_logs table with the following schema:
| Field | Type | Reference | Index? |
|---|---|---|---|
| id | String | UUID | Unique |
| host | String | Host name | Yes |
| msec | Float | Unix timestamp of visit | ? |
| server_protocol | String | HTTP/Version | ? |
| request_method | String | GET/POST/etc. | ? |
| request_completion | String | "OK" | ? |
| uri | String | Request | Yes |
| query_string | String | Arguments | ? |
| status | Integer | HTTP status | ? |
| sent_http_content_type | String | MIME type of response | ? |
| sent_http_content_encoding | String | Compression | ? |
| sent_http_etag | String | ETag header | ? |
| sent_http_last_modified | String | Last modified date | ? |
| http_accept | String | MIME types requested | ? |
| http_accept_encoding | String | Compression accepted | ? |
| http_accept_language | String | Languages supported | ? |
| http_pragma | String | Pragma header | ? |
| http_cache_control | String | Cache requested | ? |
| http_if_none_match | String | ETag requested | ? |
| http_dnt | String | Do Not Track header | ? |
| http_user_agent | String | User Agent | Yes |
| http_origin | String | Request origin | Yes |
| http_referer | String | Referer (see Referrer Policy) | Yes |
| request_time | Float | Request duration | ? |
| bytes_sent | Integer | Bytes sent | ? |
| body_bytes_sent | Integer | Bytes sent not including headers | ? |
| request_length | Integer | Request length (request line, headers and body) | ? |
| http_connection | String | Connection header | ? |
| pipe | String | "p" if the request was pipelined, "." otherwise | ? |
| connection_requests | Integer | Requests done on the same connection | ? |
| geoip2_data_country_name | String | Country according to GeoIP | Yes |
| geoip2_data_city_name | String | City according to GeoIP | Yes |
| ssl_server_name | String | SNI | ? |
| ssl_protocol | String | SSL/TLS version used | ? |
| ssl_early_data | String | TLSv1.3 early data used | ? |
| ssl_session_reused | String | TLS session reused | ? |
| ssl_curves | String | Curves supported by the client | ? |
| ssl_ciphers | String | Ciphers available | ? |
| ssl_cipher | String | Cipher used | ? |
| sent_http_x_xss_protection | String | XSS Protection sent | ? |
| sent_http_x_frame_options | String | Frame protection sent | ? |
| sent_http_x_content_type_options | String | Content protection sent | ? |
| sent_http_strict_transport_security | String | HSTS sent | ? |
| nginx_version | String | Server version | ? |
| pid | Integer | Server PID | ? |
| crawler | Boolean | Web crawler detected | ? |
| remote_user | String | HTTP Basic auth user | ? |
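Once visits are stored, the table can be queried directly. The following is a minimal Python sketch, assuming a table named access_logs with the columns listed above. The demo creates an in-memory SQLite table with only a handful of those columns, and views_per_uri is a hypothetical helper, not part of access_log.

```python
import sqlite3


def views_per_uri(connection, host):
    """Count successful (2xx) responses per URI for one host, skipping crawlers."""
    rows = connection.execute(
        """
        SELECT uri, COUNT(*) AS views
        FROM access_logs
        WHERE host = ? AND status BETWEEN 200 AND 299 AND NOT crawler
        GROUP BY uri
        ORDER BY views DESC, uri ASC
        """,
        (host,),
    )
    return rows.fetchall()


# Demo data on a reduced version of the schema.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE access_logs "
    "(id TEXT PRIMARY KEY, host TEXT, uri TEXT, status INTEGER, crawler BOOLEAN)"
)
db.executemany(
    "INSERT INTO access_logs VALUES (?, ?, ?, ?, ?)",
    [
        ("1", "example.org", "/", 200, False),
        ("2", "example.org", "/", 200, False),
        ("3", "example.org", "/about/", 200, False),
        ("4", "example.org", "/", 200, True),
        ("5", "example.org", "/missing", 404, False),
        ("6", "other.org", "/", 200, False),
    ],
)
print(views_per_uri(db, "example.org"))  # [('/', 2), ('/about/', 1)]
```

The query filters on host and groups by uri, which is why the table marks both as indexed.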
Nginx configuration
Configure Nginx to format the access log as JSON. You can set
http_referer.policy to one of unsafe-url, no-referrer, origin,
origin-when-cross-origin, same-origin, strict-origin,
strict-origin-when-cross-origin, no-referrer-when-downgrade.
{
"http_referer": {
"referrer": "$http_referer",
"origin": "$http_origin",
"policy": "origin-when-cross-origin"
}
}
Note: The internal key is referrer but the parent is
http_referer (double and single "r" respectively; the second is a typo
in the HTTP specification).
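To make the policy values concrete, here is a minimal Python sketch of how each one could trim a referrer before it is stored. It follows the general meaning of the Referrer Policy values and is an assumption about the behavior, not a description of what access_log does internally. apply_referrer_policy and site_origin (the scheme and host of the visited site) are hypothetical names.

```python
from urllib.parse import urlsplit

POLICIES = {
    "unsafe-url", "no-referrer", "origin", "origin-when-cross-origin",
    "same-origin", "strict-origin", "strict-origin-when-cross-origin",
    "no-referrer-when-downgrade",
}


def origin_of(url):
    """Return scheme://host[:port] for a URL, dropping path and query."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def apply_referrer_policy(referrer, site_origin, policy):
    """Return the referrer value to keep, or None to keep nothing."""
    if policy not in POLICIES:
        raise ValueError(f"unknown referrer policy: {policy}")
    if not referrer or policy == "no-referrer":
        return None

    ref_origin = origin_of(referrer)
    same_origin = ref_origin == site_origin
    # A downgrade is a visit coming from an HTTPS page to an HTTP site.
    downgrade = referrer.startswith("https://") and site_origin.startswith("http://")

    if policy == "unsafe-url":
        return referrer
    if policy == "origin":
        return ref_origin
    if policy == "same-origin":
        return referrer if same_origin else None
    if policy == "origin-when-cross-origin":
        return referrer if same_origin else ref_origin
    # The remaining policies keep nothing on a downgrade.
    if downgrade:
        return None
    if policy == "strict-origin":
        return ref_origin
    if policy == "strict-origin-when-cross-origin":
        return referrer if same_origin else ref_origin
    return referrer  # no-referrer-when-downgrade


print(apply_referrer_policy(
    "https://other.example/private/page?q=1",
    "https://example.org",
    "origin-when-cross-origin",
))  # https://other.example
```

With origin-when-cross-origin, visits from other sites only record where they came from, while internal navigation keeps the full path.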
Install daemonize and run access_logd. By default it creates a UNIX
socket at /tmp/access_log.socket so Nginx can write to it using
its syslog support.
Check /var/log/nginx/error.log for debugging.
ACCESS_LOG_FLAGS is the env variable to pass flags to access_logd.
For a working example check our Nginx
container.
log_format main escape=json '{"host":"$host","msec":$msec,"server_protocol":"$server_protocol","request_method":"$request_method","request_completion":"$request_completion","uri":"$uri","query_string":"$query_string","status":$status,"sent_http_content_type":"$sent_http_content_type","sent_http_content_encoding":"$sent_http_content_encoding","sent_http_etag":"$sent_http_etag","sent_http_last_modified":"$sent_http_last_modified","http_accept":"$http_accept","http_accept_encoding":"$http_accept_encoding","http_accept_language":"$http_accept_language","http_pragma":"$http_pragma","http_cache_control":"$http_cache_control","http_if_none_match":"$http_if_none_match","http_dnt":"$http_dnt","http_user_agent":"$http_user_agent","http_origin":"$http_origin","http_referer":{"origin":"$http_origin","referrer":"$http_referer","policy":"origin-when-cross-origin"},"request_time":$request_time,"bytes_sent":$bytes_sent,"body_bytes_sent":$body_bytes_sent,"request_length":$request_length,"http_connection":"$http_connection","pipe":"$pipe","connection_requests":$connection_requests,"geoip2_data_country_name":"$geoip2_data_country_name","geoip2_data_city_name":"$geoip2_data_city_name","ssl_server_name":"$ssl_server_name","ssl_protocol":"$ssl_protocol","ssl_early_data":"$ssl_early_data","ssl_session_reused":"$ssl_session_reused","ssl_curves":"$ssl_curves","ssl_ciphers":"$ssl_ciphers","ssl_cipher":"$ssl_cipher","sent_http_x_xss_protection":"$sent_http_x_xss_protection","sent_http_x_frame_options":"$sent_http_x_frame_options","sent_http_x_content_type_options":"$sent_http_x_content_type_options","sent_http_strict_transport_security":"$sent_http_strict_transport_security","nginx_version":"$nginx_version","pid":"$pid","remote_user":""}';
access_log syslog=unix:/tmp/access_log.socket,nohostname main;
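For debugging the log format, it can help to see what arrives on the socket. The Python sketch below reads syslog messages from a UNIX datagram socket and extracts the JSON payload by skipping the syslog prefix. It is an illustration, not how access_log is implemented; parse_syslog_message and listen are hypothetical names, and the datagram socket type and the sample message are assumptions about what Nginx sends.

```python
import json
import os
import socket


def parse_syslog_message(datagram):
    """Drop the syslog prefix and decode the JSON object that follows it."""
    text = datagram.decode("utf-8", errors="replace")
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON payload found in syslog message")
    return json.loads(text[start:])


def listen(path="/tmp/access_log.socket"):
    """Print every visit received on the socket (blocks forever)."""
    if os.path.exists(path):
        os.unlink(path)
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as server:
        server.bind(path)
        while True:
            visit = parse_syslog_message(server.recv(65535))
            print(visit["host"], visit["status"], visit["uri"])


# Made-up sample message, shortened to a few fields.
sample = (
    b'<190>Oct  5 12:00:00 nginx: {"host":"example.org","status":200,'
    b'"uri":"/","http_referer":{"policy":"origin-when-cross-origin"}}'
)
print(parse_syslog_message(sample)["host"])  # example.org
```

Stop access_logd before calling listen() on the same path, since only one process can bind the socket.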
Add origin country of visit to SWD
Add a $geoip2_data_country_iso_code variable on Nginx and the
corresponding variable to the JSON log format.
geoip2 /usr/share/GeoIP/GeoLite2-Country.mmdb {
$geoip2_data_country_iso_code country iso_code;
}
log_format main escape=json '{"host":"$host","msec":$msec,"server_protocol":"$server_protocol","request_method":"$request_method","request_completion":"$request_completion","uri":"$uri","query_string":"$query_string","status":$status,"sent_http_content_type":"$sent_http_content_type","sent_http_content_encoding":"$sent_http_content_encoding","sent_http_etag":"$sent_http_etag","sent_http_last_modified":"$sent_http_last_modified","http_accept":"$http_accept","http_accept_encoding":"$http_accept_encoding","http_accept_language":"$http_accept_language","http_pragma":"$http_pragma","http_cache_control":"$http_cache_control","http_if_none_match":"$http_if_none_match","http_dnt":"$http_dnt","http_user_agent":"$http_user_agent","http_origin":"$http_origin","http_referer":{"origin":"$http_origin","referrer":"$http_referer","policy":"origin-when-cross-origin"},"request_time":$request_time,"bytes_sent":$bytes_sent,"body_bytes_sent":$body_bytes_sent,"request_length":$request_length,"http_connection":"$http_connection","pipe":"$pipe","connection_requests":$connection_requests,"geoip2_data_country_name":"$geoip2_data_country_name","geoip2_data_city_name":"$geoip2_data_city_name","ssl_server_name":"$ssl_server_name","ssl_protocol":"$ssl_protocol","ssl_early_data":"$ssl_early_data","ssl_session_reused":"$ssl_session_reused","ssl_curves":"$ssl_curves","ssl_ciphers":"$ssl_ciphers","ssl_cipher":"$ssl_cipher","sent_http_x_xss_protection":"$sent_http_x_xss_protection","sent_http_x_frame_options":"$sent_http_x_frame_options","sent_http_x_content_type_options":"$sent_http_x_content_type_options","sent_http_strict_transport_security":"$sent_http_strict_transport_security","nginx_version":"$nginx_version","pid":"$pid","remote_user":"","geoip2_data_country_iso_code":"$geoip2_data_country_iso_code"}';
Then run the program with the required flags enabled:
access_log --swd --device-country
ASN database
If you want to keep track of the ASN for each visitor, for instance to group possible attacks or AI crawls, create a database based on https://iptoasn.com/:
./contrib/asn_database.sh
And start the server with the --asn-database= flag.
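The lookup itself can be pictured as a search over sorted address ranges. The Python sketch below assumes tab-separated rows of range start, range end, AS number, country and description, and handles IPv4 only. The format produced by contrib/asn_database.sh may differ; load_ranges and lookup_asn are hypothetical names, and the sample rows use documentation addresses and AS numbers.

```python
import bisect
import ipaddress
import math


def load_ranges(lines):
    """Parse rows into sorted (start, end, asn) tuples of integers."""
    ranges = []
    for line in lines:
        start, end, asn = line.rstrip("\n").split("\t")[:3]
        ranges.append((
            int(ipaddress.ip_address(start)),
            int(ipaddress.ip_address(end)),
            int(asn),
        ))
    ranges.sort()
    return ranges


def lookup_asn(ranges, address):
    """Return the AS number whose range contains the address, or None."""
    value = int(ipaddress.ip_address(address))
    # Find the last range whose start is <= value.
    index = bisect.bisect_right(ranges, (value, math.inf, math.inf)) - 1
    if index >= 0 and ranges[index][1] >= value:
        return ranges[index][2]
    return None


rows = [
    "192.0.2.0\t192.0.2.255\t64496\tZZ\tEXAMPLE-A",
    "198.51.100.0\t198.51.100.255\t64497\tZZ\tEXAMPLE-B",
]
ranges = load_ranges(rows)
print(lookup_asn(ranges, "192.0.2.10"))  # 64496
```

Only the resulting AS number would need to be stored with the visit, so the address itself never reaches the database.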
Crawler user agents
Download the crawler user agents
database and pass it
as an argument to access_log. It'll try to detect whether a UA belongs to
a web crawler.
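Detection of this kind usually comes down to matching the User-Agent header against a list of patterns. The Python sketch below assumes the database is a JSON list of objects with a "pattern" key holding a regular expression. The two entries shown are made up for the example, and access_log may match differently; compile_patterns and is_crawler are hypothetical names.

```python
import re


def compile_patterns(entries):
    """Compile the "pattern" field of every database entry."""
    return [re.compile(entry["pattern"]) for entry in entries]


def is_crawler(patterns, user_agent):
    """Return True when any pattern is found in the user agent."""
    return any(pattern.search(user_agent) for pattern in patterns)


# With the real file: entries = json.load(open("crawler-user-agents.json"))
entries = [{"pattern": r"Googlebot\/"}, {"pattern": r"bingbot"}]
patterns = compile_patterns(entries)

bot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
browser = "Mozilla/5.0 (X11; Linux x86_64; rv:126.0) Gecko/20100101 Firefox/126.0"
print(is_crawler(patterns, bot), is_crawler(patterns, browser))  # True False
```

Compiling the patterns once at startup avoids repeating that work for every visit.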
TODO
- Make some fields optional