# access_log

Receives access logs in JSON format on a UNIX socket and stores them in a database.

It **intentionally** doesn't collect IP addresses. It doesn't respect the Do Not Track (DNT) header, though, because it doesn't collect personally identifiable data.

Referrer collection is optional, but we **strongly** suggest using a referrer policy that doesn't collect full addresses.

See the [Rails migration](https://0xacab.org/sutty/sutty/blob/rails/db/migrate/20200118155319_create_access_log.rb) for the database schema, the [Nginx configuration](https://0xacab.org/sutty/containers/nginx/blob/master/nginx/nginx.conf), and the [site configuration](https://0xacab.org/sutty/ansible-sutty/blob/master/templates/sites.conf.j2).

It supports SQLite3 and PostgreSQL databases :)

## Sustainable Web Design

When enabled, you can track CO2 emissions using the [Sustainable Web Design "Calculating Digital Emissions" method](https://sustainablewebdesign.org/calculating-digital-emissions/). The algorithm and data are based on [CO2.js](https://github.com/thegreenwebfoundation/co2.js).

It follows the same calculations, with the added (optional) feature of using the origin country of the visit for the "consumer device" segment. To enable it, see the Nginx configuration below.

```bash
# For a datacenter using renewable energy in Costa Rica
access_log --swd --renewable --datacenter CR
```

### Average vs marginal intensity

[CO2.js explains this better](https://developers.thegreenwebfoundation.org/co2js/data/). In practice, average intensity data gives lower results and mostly falls back to the global intensity, because per-country data is missing for most countries. `access_log` uses marginal data by default.

## Create database

```bash
sqlite3 access_log.sqlite3 < contrib/create.sql
```

## Build

Install the zlib, SQLite3 and SSL development files (package names vary between distributions). Install Crystal and its development tools (also varies).
Run:

```bash
make
```

## Build for Alpine

```bash
make alpine-build
```

## Database configuration

Create an `access_logs` table with the following schema:

| Field | Type | Description | Index? |
| ----- | ---- | ----------- | ------ |
| id | String | UUID | Unique |
| host | String | Host name | Yes |
| msec | Float | Unix timestamp of visit | ? |
| server_protocol | String | HTTP/Version | ? |
| request_method | String | GET/POST/etc. | ? |
| request_completion | String | "OK" if the request completed, empty otherwise | ? |
| uri | String | Request path, without arguments | Yes |
| query_string | String | Arguments | ? |
| status | Integer | HTTP status | ? |
| sent_http_content_type | String | MIME type of response | ? |
| sent_http_content_encoding | String | Response compression | ? |
| sent_http_etag | String | ETag header | ? |
| sent_http_last_modified | String | Last modified date | ? |
| http_accept | String | MIME types requested | ? |
| http_accept_encoding | String | Compression accepted | ? |
| http_accept_language | String | Languages accepted | ? |
| http_pragma | String | Pragma header | ? |
| http_cache_control | String | Cache-Control request header | ? |
| http_if_none_match | String | ETag requested | ? |
| http_dnt | String | Do Not Track header | ? |
| http_user_agent | String | User Agent | Yes |
| http_origin | String | Request origin | Yes |
| http_referer | String | Referer (see Referrer Policy) | Yes |
| request_time | Float | Request processing time, in seconds | ? |
| bytes_sent | Integer | Bytes sent | ? |
| body_bytes_sent | Integer | Bytes sent, not including response headers | ? |
| request_length | Integer | Request length, including request line, headers and body | ? |
| http_connection | String | Connection request header | ? |
| pipe | String | Request was pipelined | ? |
| connection_requests | Integer | Requests made on the same connection | ? |
| geoip2_data_country_name | String | Country according to GeoIP | Yes |
| geoip2_data_city_name | String | City according to GeoIP | Yes |
| ssl_server_name | String | SNI | ? |
| ssl_protocol | String | SSL/TLS version used | ? |
| ssl_early_data | String | TLSv1.3 early data used | ? |
| ssl_session_reused | String | TLS session reused | ? |
| ssl_curves | String | Curves supported by the client | ? |
| ssl_ciphers | String | Ciphers supported by the client | ? |
| ssl_cipher | String | Cipher used | ? |
| sent_http_x_xss_protection | String | X-XSS-Protection header sent | ? |
| sent_http_x_frame_options | String | X-Frame-Options header sent | ? |
| sent_http_x_content_type_options | String | X-Content-Type-Options header sent | ? |
| sent_http_strict_transport_security | String | HSTS header sent | ? |
| nginx_version | String | Server version | ? |
| pid | Integer | Nginx worker PID | ? |
| crawler | Boolean | Web crawler detected | ? |
| remote_user | String | HTTP Basic auth user | ? |

## Nginx configuration

Configure Nginx to format the access log as JSON.

You can set `http_referer.policy` to one of `unsafe-url`, `no-referrer`, `origin`, `origin-when-cross-origin`, `same-origin`, `strict-origin`, `strict-origin-when-cross-origin` or `no-referrer-when-downgrade`.

```json
{
  "http_referer": {
    "referrer": "$http_referer",
    "origin": "$http_origin",
    "policy": "origin-when-cross-origin"
  }
}
```

**Note:** The inner key is `referrer` (double "r") but the parent key is `http_referer` (single "r"), following the misspelling of the `Referer` header in the HTTP specification.

Install `daemonize` and run `access_logd`. By default it creates a UNIX socket at `/tmp/access_log.socket` so Nginx can write to it using its [syslog support](https://nginx.org/en/docs/syslog.html). Check `/var/log/nginx/error.log` for debugging.

Use the `ACCESS_LOG_FLAGS` environment variable to pass flags to `access_logd`.

For a working example, check our [Nginx container](https://0xacab.org/sutty/containers/nginx/).
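Before pointing Nginx at the socket, it can help to check that `access_logd` is listening by sending it one entry by hand. The following is a minimal Python sketch, assuming `access_logd` accepts the same datagrams that Nginx's syslog writer sends to a UNIX socket: `<PRI>timestamp tag: message`, with no hostname because of `nohostname`, and Nginx's defaults of facility `local7`, severity `info` and tag `nginx`. The helper names `syslog_line` and `send_test_entry` are hypothetical and not part of `access_log`.

```python
import json
import socket
import time

# Defaults nginx applies to syslog output when none are configured:
# facility local7 (23), severity info (6), tag "nginx".
FACILITY_LOCAL7 = 23
SEVERITY_INFO = 6
DEFAULT_SOCKET = "/tmp/access_log.socket"


def syslog_line(payload: dict, tag: str = "nginx") -> bytes:
    """Build a message shaped like nginx's `nohostname` syslog output:

        <PRI>Mon DD HH:MM:SS tag: {json}

    Hypothetical helper: how access_logd parses the prefix is an assumption.
    """
    pri = FACILITY_LOCAL7 * 8 + SEVERITY_INFO  # 190
    now = time.localtime()
    # BSD syslog pads the day of the month with a space, not a zero.
    stamp = f"{time.strftime('%b', now)} {now.tm_mday:2d} {time.strftime('%H:%M:%S', now)}"
    return f"<{pri}>{stamp} {tag}: {json.dumps(payload)}".encode("utf-8")


def send_test_entry(payload: dict, path: str = DEFAULT_SOCKET) -> None:
    """Send one datagram to the socket, as nginx's syslog writer does."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(syslog_line(payload), path)
```

For example, call `send_test_entry(entry)` with `entry` holding every key from the `log_format main` string (fields aren't optional yet, see TODO), then look for the new row in the database.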
```nginx
log_format main escape=json '{"host":"$host","msec":$msec,"server_protocol":"$server_protocol","request_method":"$request_method","request_completion":"$request_completion","uri":"$uri","query_string":"$query_string","status":$status,"sent_http_content_type":"$sent_http_content_type","sent_http_content_encoding":"$sent_http_content_encoding","sent_http_etag":"$sent_http_etag","sent_http_last_modified":"$sent_http_last_modified","http_accept":"$http_accept","http_accept_encoding":"$http_accept_encoding","http_accept_language":"$http_accept_language","http_pragma":"$http_pragma","http_cache_control":"$http_cache_control","http_if_none_match":"$http_if_none_match","http_dnt":"$http_dnt","http_user_agent":"$http_user_agent","http_origin":"$http_origin","http_referer":{"origin":"$http_origin","referrer":"$http_referer","policy":"origin-when-cross-origin"},"request_time":$request_time,"bytes_sent":$bytes_sent,"body_bytes_sent":$body_bytes_sent,"request_length":$request_length,"http_connection":"$http_connection","pipe":"$pipe","connection_requests":$connection_requests,"geoip2_data_country_name":"$geoip2_data_country_name","geoip2_data_city_name":"$geoip2_data_city_name","ssl_server_name":"$ssl_server_name","ssl_protocol":"$ssl_protocol","ssl_early_data":"$ssl_early_data","ssl_session_reused":"$ssl_session_reused","ssl_curves":"$ssl_curves","ssl_ciphers":"$ssl_ciphers","ssl_cipher":"$ssl_cipher","sent_http_x_xss_protection":"$sent_http_x_xss_protection","sent_http_x_frame_options":"$sent_http_x_frame_options","sent_http_x_content_type_options":"$sent_http_x_content_type_options","sent_http_strict_transport_security":"$sent_http_strict_transport_security","nginx_version":"$nginx_version","pid":"$pid","remote_user":""}';

access_log syslog=unix:/tmp/access_log.socket,nohostname main;
```

### Add origin country of visit to SWD

Add a `$geoip2_data_country_iso_code` variable to Nginx and the corresponding field to the JSON log format.
```nginx
geoip2 /usr/share/GeoIP/GeoLite2-Country.mmdb {
  $geoip2_data_country_iso_code country iso_code;
}

log_format main escape=json '{"host":"$host","msec":$msec,"server_protocol":"$server_protocol","request_method":"$request_method","request_completion":"$request_completion","uri":"$uri","query_string":"$query_string","status":$status,"sent_http_content_type":"$sent_http_content_type","sent_http_content_encoding":"$sent_http_content_encoding","sent_http_etag":"$sent_http_etag","sent_http_last_modified":"$sent_http_last_modified","http_accept":"$http_accept","http_accept_encoding":"$http_accept_encoding","http_accept_language":"$http_accept_language","http_pragma":"$http_pragma","http_cache_control":"$http_cache_control","http_if_none_match":"$http_if_none_match","http_dnt":"$http_dnt","http_user_agent":"$http_user_agent","http_origin":"$http_origin","http_referer":{"origin":"$http_origin","referrer":"$http_referer","policy":"origin-when-cross-origin"},"request_time":$request_time,"bytes_sent":$bytes_sent,"body_bytes_sent":$body_bytes_sent,"request_length":$request_length,"http_connection":"$http_connection","pipe":"$pipe","connection_requests":$connection_requests,"geoip2_data_country_name":"$geoip2_data_country_name","geoip2_data_city_name":"$geoip2_data_city_name","ssl_server_name":"$ssl_server_name","ssl_protocol":"$ssl_protocol","ssl_early_data":"$ssl_early_data","ssl_session_reused":"$ssl_session_reused","ssl_curves":"$ssl_curves","ssl_ciphers":"$ssl_ciphers","ssl_cipher":"$ssl_cipher","sent_http_x_xss_protection":"$sent_http_x_xss_protection","sent_http_x_frame_options":"$sent_http_x_frame_options","sent_http_x_content_type_options":"$sent_http_x_content_type_options","sent_http_strict_transport_security":"$sent_http_strict_transport_security","nginx_version":"$nginx_version","pid":"$pid","remote_user":"","geoip2_data_country_iso_code":"$geoip2_data_country_iso_code"}';
```

Then run the program with the required flags enabled:

```bash
access_log --swd --device-country
```

## Crawler user agents

Download the [crawler user agents database](https://github.com/monperrus/crawler-user-agents) and pass it as an argument to `access_log`. It will try to detect whether a user agent belongs to a web crawler.

## TODO

* [ ] Make some fields optional