
access_log

Receives access logs in JSON format on a UNIX socket and stores them in a database. It intentionally doesn't collect IP addresses. It doesn't respect the Do Not Track (DNT) header, though, because it doesn't collect personally identifiable data. Referrer collection is optional, but we strongly suggest using a referrer policy that doesn't collect full addresses.

See the Rails migration for the database schema, the Nginx configuration, and the site configuration.

It supports SQLite3 and PostgreSQL databases :)
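As a sketch of what querying the collected data looks like, this counts the last day's visits per host. The two-column table is a made-up stand-in so the example is self-contained; the real schema lives in contrib/create.sql.

```shell
# Illustrative query over a minimal stand-in for the access_logs table:
# visits per host over the last day. Uses an in-memory SQLite3 database
# so it doesn't touch real data; see contrib/create.sql for the schema.
sqlite3 :memory: <<'SQL'
CREATE TABLE access_logs (host TEXT, msec REAL);
INSERT INTO access_logs VALUES ('example.org', strftime('%s', 'now'));
INSERT INTO access_logs VALUES ('example.org', strftime('%s', 'now') - 60);
SELECT host, COUNT(*) AS visits
FROM access_logs
WHERE msec >= strftime('%s', 'now', '-1 day')
GROUP BY host
ORDER BY visits DESC;
SQL
```

The same query works against access_log.sqlite3 (or the PostgreSQL equivalent) once the real schema is in place.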

Sustainable Web Design

When enabled, you can track CO2 emissions using the Sustainable Web Design "Calculating Digital Emissions" method. The algorithm and data are based on CO2.js.

It follows those calculations, with the added optional feature of using the origin country of the visit for the "consumer device" segment. To enable this, see the Nginx configuration section.

# For a datacenter using renewable energy in Costa Rica
access_log --swd --renewable --datacenter CR

Average vs marginal intensity

CO2.js explains this better. In practice, using average intensity data gives lower results and mostly falls back to the global intensity, since the per-country dataset is missing most countries.

access_log uses marginal data by default.

Update the data

  1. Go to the data/output directory in the co2.js repository for the latest released version (0.16.2 in this example).

  2. Download the average-intensities.json file.

  3. Run src/average_intensities_by_country.cr with this file as stdin:

crystal run src/average_intensities_by_country.cr < average-intensities.json >> src/swd/average_intensity.cr

  4. Modify src/swd/average_intensity.cr to fix the data.

Create database

sqlite3 access_log.sqlite3 < contrib/create.sql

Build

Install the zlib, sqlite3 and ssl development files (package names vary between distributions).

Install Crystal and the development tools (also varies).

Run:

make

Build for Alpine

make alpine-build

Database configuration

Create an access_logs database with the following schema:

Field                                Type     Reference                              Index?
id                                   String   UUID                                   Unique
host                                 String   Host name                              Yes
msec                                 Float    Unix timestamp of visit                ?
server_protocol                      String   HTTP/Version                           ?
request_method                       String   GET/POST/etc.                          ?
request_completion                   String   "OK"                                   ?
uri                                  String   Request                                Yes
query_string                         String   Arguments                              ?
status                               Integer  HTTP status                            ?
sent_http_content_type               String   MIME type of response                  ?
sent_http_content_encoding           String   Compression                            ?
sent_http_etag                       String   ETag header                            ?
sent_http_last_modified              String   Last modified date                     ?
http_accept                          String   MIME types requested                   ?
http_accept_encoding                 String   Compression accepted                   ?
http_accept_language                 String   Languages supported                    ?
http_pragma                          String   Pragma header                          ?
http_cache_control                   String   Cache requested                        ?
http_if_none_match                   String   ETag requested                         ?
http_dnt                             String   Do Not Track header                    ?
http_user_agent                      String   User Agent                             Yes
http_origin                          String   Request origin                         Yes
http_referer                         String   Referer (see Referrer Policy)          Yes
request_time                         Float    Request duration                       ?
bytes_sent                           Integer  Bytes sent                             ?
body_bytes_sent                      Integer  Bytes sent not including headers       ?
request_length                       Integer  Headers                                ?
http_connection                      String   Connection status                      ?
pipe                                 String   Connection was multiplexed             ?
connection_requests                  Integer  Requests done on the same connection   ?
geoip2_data_country_name             String   Country according to GeoIP             Yes
geoip2_data_city_name                String   City according to GeoIP                Yes
ssl_server_name                      String   SNI                                    ?
ssl_protocol                         String   SSL/TLS version used                   ?
ssl_early_data                       String   TLSv1.3 early data used                ?
ssl_session_reused                   String   TLS session reused                     ?
ssl_curves                           String   Curves used                            ?
ssl_ciphers                          String   Ciphers available                      ?
ssl_cipher                           String   Cipher used                            ?
sent_http_x_xss_protection           String   XSS Protection sent                    ?
sent_http_x_frame_options            String   Frame protection sent                  ?
sent_http_x_content_type_options     String   Content protection sent                ?
sent_http_strict_transport_security  String   HSTS sent                              ?
nginx_version                        String   Server version                         ?
pid                                  Integer  Server PID                             ?
crawler                              Boolean  Web crawler detected                   ?
remote_user                          String   HTTP Basic auth user                   ?
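The indexed columns from the schema can be realized in SQLite3 roughly like this. It's only a sketch: the index names are invented, the table here is a small stand-in (in memory), and the real definition belongs in contrib/create.sql.

```shell
# Sketch: a few columns from the schema with their indexes in SQLite3.
# Index names are invented for the example; this is not the real schema.
sqlite3 :memory: <<'SQL'
CREATE TABLE access_logs (
  id TEXT, host TEXT, msec REAL, uri TEXT, status INTEGER,
  http_user_agent TEXT, http_referer TEXT, crawler BOOLEAN
);
CREATE UNIQUE INDEX index_access_logs_on_id ON access_logs (id);
CREATE INDEX index_access_logs_on_host ON access_logs (host);
CREATE INDEX index_access_logs_on_uri ON access_logs (uri);
.indexes access_logs
SQL
```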

Nginx configuration

Configure Nginx to format the access log as JSON. Set http_referer.policy to one of unsafe-url, no-referrer, origin, origin-when-cross-origin, same-origin, strict-origin, strict-origin-when-cross-origin, no-referrer-when-downgrade.

{
  "http_referer": {
    "referrer": "$http_referer",
    "origin": "$http_origin",
    "policy": "origin-when-cross-origin"
  }
}

Note: The inner key is referrer but the parent is http_referer (double and single "r" respectively; the latter spelling comes from a typo in the HTTP specification).

Install daemonize and run access_logd. By default it creates a UNIX socket at /tmp/access_log.socket so Nginx can write to it using its syslog support.

Check /var/log/nginx/error.log for debugging.

ACCESS_LOG_FLAGS is the environment variable used to pass flags to access_logd. For a working example, check our Nginx container.

log_format main escape=json '{"host":"$host","msec":$msec,"server_protocol":"$server_protocol","request_method":"$request_method","request_completion":"$request_completion","uri":"$uri","query_string":"$query_string","status":$status,"sent_http_content_type":"$sent_http_content_type","sent_http_content_encoding":"$sent_http_content_encoding","sent_http_etag":"$sent_http_etag","sent_http_last_modified":"$sent_http_last_modified","http_accept":"$http_accept","http_accept_encoding":"$http_accept_encoding","http_accept_language":"$http_accept_language","http_pragma":"$http_pragma","http_cache_control":"$http_cache_control","http_if_none_match":"$http_if_none_match","http_dnt":"$http_dnt","http_user_agent":"$http_user_agent","http_origin":"$http_origin","http_referer":{"origin":"$http_origin","referrer":"$http_referer","policy":"origin-when-cross-origin"},"request_time":$request_time,"bytes_sent":$bytes_sent,"body_bytes_sent":$body_bytes_sent,"request_length":$request_length,"http_connection":"$http_connection","pipe":"$pipe","connection_requests":$connection_requests,"geoip2_data_country_name":"$geoip2_data_country_name","geoip2_data_city_name":"$geoip2_data_city_name","ssl_server_name":"$ssl_server_name","ssl_protocol":"$ssl_protocol","ssl_early_data":"$ssl_early_data","ssl_session_reused":"$ssl_session_reused","ssl_curves":"$ssl_curves","ssl_ciphers":"$ssl_ciphers","ssl_cipher":"$ssl_cipher","sent_http_x_xss_protection":"$sent_http_x_xss_protection","sent_http_x_frame_options":"$sent_http_x_frame_options","sent_http_x_content_type_options":"$sent_http_x_content_type_options","sent_http_strict_transport_security":"$sent_http_strict_transport_security","nginx_version":"$nginx_version","pid":"$pid","remote_user":""}';

access_log syslog=unix:/tmp/access_log.socket,nohostname main;

Add origin country of visit to SWD

Add a $geoip2_data_country_iso_code variable on Nginx and the corresponding variable to the JSON log format.

geoip2 /usr/share/GeoIP/GeoLite2-Country.mmdb {
  $geoip2_data_country_iso_code country iso_code;
}

log_format main escape=json '{"host":"$host","msec":$msec,"server_protocol":"$server_protocol","request_method":"$request_method","request_completion":"$request_completion","uri":"$uri","query_string":"$query_string","status":$status,"sent_http_content_type":"$sent_http_content_type","sent_http_content_encoding":"$sent_http_content_encoding","sent_http_etag":"$sent_http_etag","sent_http_last_modified":"$sent_http_last_modified","http_accept":"$http_accept","http_accept_encoding":"$http_accept_encoding","http_accept_language":"$http_accept_language","http_pragma":"$http_pragma","http_cache_control":"$http_cache_control","http_if_none_match":"$http_if_none_match","http_dnt":"$http_dnt","http_user_agent":"$http_user_agent","http_origin":"$http_origin","http_referer":{"origin":"$http_origin","referrer":"$http_referer","policy":"origin-when-cross-origin"},"request_time":$request_time,"bytes_sent":$bytes_sent,"body_bytes_sent":$body_bytes_sent,"request_length":$request_length,"http_connection":"$http_connection","pipe":"$pipe","connection_requests":$connection_requests,"geoip2_data_country_name":"$geoip2_data_country_name","geoip2_data_city_name":"$geoip2_data_city_name","ssl_server_name":"$ssl_server_name","ssl_protocol":"$ssl_protocol","ssl_early_data":"$ssl_early_data","ssl_session_reused":"$ssl_session_reused","ssl_curves":"$ssl_curves","ssl_ciphers":"$ssl_ciphers","ssl_cipher":"$ssl_cipher","sent_http_x_xss_protection":"$sent_http_x_xss_protection","sent_http_x_frame_options":"$sent_http_x_frame_options","sent_http_x_content_type_options":"$sent_http_x_content_type_options","sent_http_strict_transport_security":"$sent_http_strict_transport_security","nginx_version":"$nginx_version","pid":"$pid","remote_user":"","geoip2_data_country_iso_code":"$geoip2_data_country_iso_code"}';

Then run the program with the required flags enabled:

access_log --swd --device-country

ASN database

If you want to keep track of the ASN for each visitor, for instance to group possible attacks or AI crawls, create a database based on https://iptoasn.com/:

./contrib/asn_database.sh

And start the server with the --asn-database= flag.
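To see what an ASN lookup over an iptoasn.com dump involves, here's a rough sketch. The TSV layout (range_start, range_end, as_number, country, description) is iptoasn.com's published format; the helper names and file name are invented, and the database produced by contrib/asn_database.sh may well differ.

```shell
# Illustrative ASN lookup against an iptoasn.com-style TSV dump
# (range_start<TAB>range_end<TAB>as_number<TAB>country<TAB>description).
# Helper and file names are made up; this is not how access_logd does it.
ip_to_int() {
  echo "$1" | awk -F. '{ print $1 * 2^24 + $2 * 2^16 + $3 * 2^8 + $4 }'
}

lookup_asn() {  # usage: lookup_asn 1.0.0.1 ip2asn-v4.tsv
  n=$(ip_to_int "$1")
  awk -F'\t' -v n="$n" '
    function toint(ip, p) { split(ip, p, "."); return p[1] * 2^24 + p[2] * 2^16 + p[3] * 2^8 + p[4] }
    toint($1) <= n && n <= toint($2) { print "AS" $3, $5; exit }
  ' "$2"
}
```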

Crawler user agents

Download the crawler user agents database and pass it as an argument to access_log. It will try to detect whether a UA belongs to a web crawler.

TODO

  • Make some fields optional