access_log

Receives access logs in JSON format on a UNIX socket and stores them in a database. It intentionally doesn't collect IP addresses. It doesn't respect the Do Not Track (DNT) header, though, because we're not collecting personally identifiable data. Referrer collection is optional, but we strongly suggest using a referrer policy that doesn't expose full addresses.
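
To sketch the transport (in Python rather than the project's Crystal, with illustrative names): one datagram on the UNIX socket carries one JSON document.

```python
import json
import os
import socket
import tempfile

# Illustrative path; the real daemon defaults to /tmp/access_log.socket.
path = os.path.join(tempfile.mkdtemp(), "access_log.socket")

# Receiver: the daemon's side -- one datagram per access log entry.
server = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
server.bind(path)

# Sender: roughly what Nginx's syslog writer does (minus the syslog
# priority/tag prefix, which the daemon would have to strip).
client = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
entry = {"host": "example.org", "uri": "/", "status": 200}
client.sendto(json.dumps(entry).encode(), path)

data, _ = server.recvfrom(65536)
log = json.loads(data)
print(log["status"])  # → 200
```

In practice each datagram sent by Nginx carries a syslog header before the JSON payload; see the Nginx configuration section below.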

See the Rails migration for the database schema, the Nginx configuration, and the site configuration.

It supports SQLite3 and PostgreSQL databases :)

Sustainable Web Design

When enabled, you can track CO2 emissions using the Sustainable Web Design "Calculating Digital Emissions" method. The algorithm and data are based on CO2.js.

It follows those calculations, with the optional addition of using the visit's country of origin for the "consumer device" segment. To enable this, see the Nginx configuration.

# For a datacenter using renewable energy in Costa Rica
access_log --swd --renewable --datacenter CR
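
The shape of the calculation can be sketched in Python. The constants and segment shares below are illustrative approximations of the SWD figures; CO2.js and src/swd are the source of truth, and RENEWABLE_INTENSITY in particular is an assumed number.

```python
# Rough sketch of the Sustainable Web Design split. Constants are
# illustrative; the real values live in CO2.js / src/swd.
KWH_PER_GB = 0.81           # energy per gigabyte transferred
SEGMENTS = {                # share of that energy per segment
    "datacenter": 0.15,
    "network": 0.14,
    "consumer_device": 0.52,
    "production": 0.19,
}
GLOBAL_INTENSITY = 442.0    # gCO2e per kWh, world average (illustrative)
RENEWABLE_INTENSITY = 50.0  # assumed figure for a green datacenter

def co2_grams(bytes_sent, renewable=False, device_intensity=GLOBAL_INTENSITY):
    """Emissions for one response, in grams of CO2e, summed per segment."""
    kwh = bytes_sent / 1e9 * KWH_PER_GB
    grams = 0.0
    for segment, share in SEGMENTS.items():
        if segment == "datacenter" and renewable:
            intensity = RENEWABLE_INTENSITY
        elif segment == "consumer_device":
            # This is where the visit's origin country can swap in
            # the visitor's grid intensity.
            intensity = device_intensity
        else:
            intensity = GLOBAL_INTENSITY
        grams += kwh * share * intensity
    return grams

print(round(co2_grams(1_000_000), 3))  # → 0.358
```

The --renewable flag above corresponds to lowering the datacenter segment's intensity, and the per-country consumer device intensity is what "Add origin country of visit to SWD" below enables.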

Average vs marginal intensity

CO2.js explains this better. In practice, using average intensity data gives lower results and mostly falls back to the global intensity, since the per-country data is missing for most countries.

access_log uses marginal data by default.
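
The fallback itself is simple; a Python sketch with illustrative numbers and key names (the real table is generated into src/swd/average_intensity.cr):

```python
# Illustrative figures in gCO2e/kWh; the real table is generated from
# the CO2.js data (see "Update the data" below).
AVERAGE_INTENSITY = {"WORLD": 481.0, "CR": 53.0, "FR": 56.0}

def intensity_for(country_code):
    # Countries missing from the dataset fall back to the world average.
    return AVERAGE_INTENSITY.get(country_code, AVERAGE_INTENSITY["WORLD"])

print(intensity_for("CR"))  # → 53.0
print(intensity_for("ZZ"))  # → 481.0
```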

Update the data

  1. Go to the data/output directory in the co2.js repository for the latest released version (0.16.2 in this example)

  2. Download the average-intensities.json file.

  3. Run src/average_intensities_by_country.cr with this file as stdin:

crystal run src/average_intensities_by_country.cr < average-intensities.json >> src/swd/average_intensity.cr

  4. Modify src/swd/average_intensity.cr to fix the data.
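
The idea of the conversion step can be sketched in Python (the real converter is src/average_intensities_by_country.cr; the upstream file is assumed here to map country codes to gCO2e/kWh numbers -- check the release, the shape may differ):

```python
import json

# Rough Python equivalent of src/average_intensities_by_country.cr:
# turn the upstream JSON into a Crystal hash literal. Assumes the file
# maps country codes to gCO2e/kWh numbers.
def to_crystal_hash(text):
    data = json.loads(text)
    lines = ["AVERAGE_INTENSITY = {"]
    for code, grams in sorted(data.items()):
        lines.append(f'  "{code}" => {float(grams)},')
    lines.append("}")
    return "\n".join(lines)

print(to_crystal_hash('{"CR": 33.1, "FR": 56.0}'))
# → AVERAGE_INTENSITY = {
#     "CR" => 33.1,
#     "FR" => 56.0,
#   }
```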

Create database

sqlite3 access_log.sqlite3 < contrib/create.sql
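
For a feel of what ends up in the database, a minimal Python sketch with a made-up subset of columns (the real schema is contrib/create.sql and has many more, matching the log_format fields shown below):

```python
import sqlite3

# Minimal sketch only -- the real schema lives in contrib/create.sql.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE access_logs (
        host TEXT,
        uri TEXT,
        status INTEGER,
        bytes_sent INTEGER
    )
""")
conn.execute(
    "INSERT INTO access_logs VALUES (?, ?, ?, ?)",
    ("example.org", "/", 200, 1234),
)
status, = conn.execute("SELECT status FROM access_logs").fetchone()
print(status)  # → 200
```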

Build

Install the zlib, sqlite3 and ssl development files (package names vary between distributions).

Install Crystal and the development tools (also varies).

Run:

make

Build for Alpine

make alpine-build

Database configuration

Create an access_logs database with a schema similar to create.sql

Nginx configuration

Configure Nginx to format the access log as JSON. You can set http_referer.policy to one of unsafe-url, no-referrer, origin, origin-when-cross-origin, same-origin, strict-origin, strict-origin-when-cross-origin, or no-referrer-when-downgrade.

{
  "http_referer": {
    "referrer": "$http_referer",
    "origin": "$http_origin",
    "policy": "origin-when-cross-origin"
  }
}

Note: The inner key is referrer but the parent is http_referer (double and single "r" respectively; the one-"r" spelling is a typo inherited from the HTTP specification).

Install daemonize and run access_logd. By default it creates a UNIX socket at /tmp/access_log.socket so Nginx can write to it using its syslog support.

Check /var/log/nginx/error.log for debugging.

ACCESS_LOG_FLAGS is the environment variable for passing flags to access_logd. For a working example, check our Nginx container.

log_format main escape=json '{"host":"$host","msec":$msec,"server_protocol":"$server_protocol","request_method":"$request_method","request_completion":"$request_completion","uri":"$uri","query_string":"$query_string","status":$status,"sent_http_content_type":"$sent_http_content_type","sent_http_content_encoding":"$sent_http_content_encoding","sent_http_etag":"$sent_http_etag","sent_http_last_modified":"$sent_http_last_modified","http_accept":"$http_accept","http_accept_encoding":"$http_accept_encoding","http_accept_language":"$http_accept_language","http_pragma":"$http_pragma","http_cache_control":"$http_cache_control","http_if_none_match":"$http_if_none_match","http_dnt":"$http_dnt","http_user_agent":"$http_user_agent","http_origin":"$http_origin","http_referer":{"origin":"$http_origin","referrer":"$http_referer","policy":"origin-when-cross-origin"},"request_time":$request_time,"bytes_sent":$bytes_sent,"body_bytes_sent":$body_bytes_sent,"request_length":$request_length,"http_connection":"$http_connection","pipe":"$pipe","connection_requests":$connection_requests,"geoip2_data_country_name":"$geoip2_data_country_name","geoip2_data_city_name":"$geoip2_data_city_name","ssl_server_name":"$ssl_server_name","ssl_protocol":"$ssl_protocol","ssl_early_data":"$ssl_early_data","ssl_session_reused":"$ssl_session_reused","ssl_curves":"$ssl_curves","ssl_ciphers":"$ssl_ciphers","ssl_cipher":"$ssl_cipher","sent_http_x_xss_protection":"$sent_http_x_xss_protection","sent_http_x_frame_options":"$sent_http_x_frame_options","sent_http_x_content_type_options":"$sent_http_x_content_type_options","sent_http_strict_transport_security":"$sent_http_strict_transport_security","nginx_version":"$nginx_version","pid":"$pid","remote_user":""}';

access_log syslog=unix:/tmp/access_log.socket,nohostname main;

Add origin country of visit to SWD

Add a $geoip2_data_country_iso_code variable on Nginx and the corresponding variable to the JSON log format.

geoip2 /usr/share/GeoIP/GeoLite2-Country.mmdb {
  $geoip2_data_country_iso_code country iso_code;
}

log_format main escape=json '{"host":"$host","msec":$msec,"server_protocol":"$server_protocol","request_method":"$request_method","request_completion":"$request_completion","uri":"$uri","query_string":"$query_string","status":$status,"sent_http_content_type":"$sent_http_content_type","sent_http_content_encoding":"$sent_http_content_encoding","sent_http_etag":"$sent_http_etag","sent_http_last_modified":"$sent_http_last_modified","http_accept":"$http_accept","http_accept_encoding":"$http_accept_encoding","http_accept_language":"$http_accept_language","http_pragma":"$http_pragma","http_cache_control":"$http_cache_control","http_if_none_match":"$http_if_none_match","http_dnt":"$http_dnt","http_user_agent":"$http_user_agent","http_origin":"$http_origin","http_referer":{"origin":"$http_origin","referrer":"$http_referer","policy":"origin-when-cross-origin"},"request_time":$request_time,"bytes_sent":$bytes_sent,"body_bytes_sent":$body_bytes_sent,"request_length":$request_length,"http_connection":"$http_connection","pipe":"$pipe","connection_requests":$connection_requests,"geoip2_data_country_name":"$geoip2_data_country_name","geoip2_data_city_name":"$geoip2_data_city_name","ssl_server_name":"$ssl_server_name","ssl_protocol":"$ssl_protocol","ssl_early_data":"$ssl_early_data","ssl_session_reused":"$ssl_session_reused","ssl_curves":"$ssl_curves","ssl_ciphers":"$ssl_ciphers","ssl_cipher":"$ssl_cipher","sent_http_x_xss_protection":"$sent_http_x_xss_protection","sent_http_x_frame_options":"$sent_http_x_frame_options","sent_http_x_content_type_options":"$sent_http_x_content_type_options","sent_http_strict_transport_security":"$sent_http_strict_transport_security","nginx_version":"$nginx_version","pid":"$pid","remote_user":"","geoip2_data_country_iso_code":"$geoip2_data_country_iso_code"}';

Then run the program with the required flags enabled:

access_log --swd --device-country

ASN database

If you want to keep track of the ASN of each visit, for instance to group possible attacks or AI crawls, create a database based on https://iptoasn.com/:

./contrib/asn_database.sh

And start the server with the --asn-database= flag.
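
A lookup over such a database amounts to a binary search on sorted ranges. A Python sketch, assuming iptoasn.com-style rows of (range_start, range_end, AS number, country, description) -- verify against the real dump, the column layout is an assumption here:

```python
import bisect
import ipaddress

# Two illustrative ranges, as integers, sorted by range start.
RANGES = [
    (int(ipaddress.ip_address("1.0.0.0")),
     int(ipaddress.ip_address("1.0.0.255")), 13335, "CLOUDFLARENET"),
    (int(ipaddress.ip_address("8.8.8.0")),
     int(ipaddress.ip_address("8.8.8.255")), 15169, "GOOGLE"),
]
STARTS = [r[0] for r in RANGES]

def asn_for(ip):
    """Find the range containing ip, or None if no range matches."""
    n = int(ipaddress.ip_address(ip))
    i = bisect.bisect_right(STARTS, n) - 1
    if i >= 0 and RANGES[i][0] <= n <= RANGES[i][1]:
        return RANGES[i][2], RANGES[i][3]
    return None

print(asn_for("8.8.8.8"))  # → (15169, 'GOOGLE')
```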

Crawler user agents

Download the crawler user agents database and pass it as an argument to access_log. It will try to detect whether a user agent belongs to a web crawler.
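
Detection can be sketched as regex matching against the database entries. The database is assumed here to be a JSON array of objects with a pattern regex each -- check the shape of the file you download:

```python
import json
import re

# Two inlined entries standing in for the downloaded database.
DB = json.loads('[{"pattern": "Googlebot"}, {"pattern": "bingbot"}]')
PATTERNS = [re.compile(entry["pattern"]) for entry in DB]

def is_crawler(user_agent):
    """True if any crawler pattern matches the user agent string."""
    return any(p.search(user_agent) for p in PATTERNS)

print(is_crawler("Mozilla/5.0 (compatible; Googlebot/2.1)"))  # → True
print(is_crawler("Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0"))  # → False
```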

TODO

  • Make some fields optional