
access_log

Receives access logs from stdin in JSON format and stores them in a database. It intentionally doesn't collect IP addresses. It doesn't honor the Do Not Track (DNT) header, though, because we're not collecting personally identifiable data. Referrer collection is optional, but we strongly suggest using a referrer policy that doesn't collect full addresses.
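A single input record looks something like this (a minimal sketch; field names follow the schema below, values are invented for illustration):

```shell
# One access-log record, one JSON object per line. Real records carry the
# full field set listed in the schema below; this is an abbreviated example.
record='{"host":"example.com","msec":1646363654.123,"request_method":"GET","uri":"/index.html","status":200,"crawler":false}'

# access_log consumes such lines from stdin, e.g. (hypothetical path to the
# binary produced by `make`):
#   printf '%s\n' "$record" | ./access_log
printf '%s\n' "$record"
```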

See the Rails migration for the database schema, the Nginx configuration, and the site configuration.

It supports SQLite3 and PostgreSQL databases :)

Build

Install zlib, sqlite3 and ssl development files (it varies between distributions).

Install Crystal and the development tools (also varies).
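On Debian or Ubuntu, the development files above map to roughly these packages (a sketch; package names differ between distributions, e.g. zlib-devel / sqlite-devel / openssl-devel on RPM-based systems):

```shell
# Debian/Ubuntu package names for the zlib, sqlite3 and ssl development files:
sudo apt-get install zlib1g-dev libsqlite3-dev libssl-dev

# Crystal itself is installed per-distribution; see crystal-lang.org for details.
```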

Run:

make

Build for Alpine

make alpine-build

Database configuration

Create an access_logs database with the following schema:

Field Type Reference Index?
id String UUID Unique
host String Host name Yes
msec Float Unix timestamp of visit ?
server_protocol String HTTP/Version ?
request_method String GET/POST/etc. ?
request_completion String "OK" ?
uri String Request Yes
query_string String Arguments ?
status Integer HTTP status ?
sent_http_content_type String MIME type of response ?
sent_http_content_encoding String Compression ?
sent_http_etag String ETag header ?
sent_http_last_modified String Last modified date ?
http_accept String MIME types requested ?
http_accept_encoding String Compression accepted ?
http_accept_language String Languages supported ?
http_pragma String Pragma header ?
http_cache_control String Cache requested ?
http_if_none_match String ETag requested ?
http_dnt String Do Not Track header ?
http_user_agent String User Agent Yes
http_origin String Request origin Yes
http_referer String Referer (see Referrer Policy) Yes
request_time Float Request duration ?
bytes_sent Integer Bytes sent ?
body_bytes_sent Integer Bytes sent not including headers ?
request_length Integer Headers ?
http_connection String Connection status ?
pipe String Connection was multiplexed ?
connection_requests Integer Requests done on the same connection ?
geoip2_data_country_name String Country according to GeoIP Yes
geoip2_data_city_name String City according to GeoIP Yes
ssl_server_name String SNI ?
ssl_protocol String SSL/TLS version used ?
ssl_early_data String TLSv1.3 early data used ?
ssl_session_reused String TLS session reused ?
ssl_curves String Curves used ?
ssl_ciphers String Ciphers available ?
ssl_cipher String Cipher used ?
sent_http_x_xss_protection String XSS Protection sent ?
sent_http_x_frame_options String Frame protection sent ?
sent_http_x_content_type_options String Content protection sent ?
sent_http_strict_transport_security String HSTS sent ?
nginx_version String Server version ?
pid Integer Server PID ?
crawler Boolean Web crawler detected ?
remote_user String HTTP Basic auth user ?
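As a sketch, the first few columns of the table above translate to roughly this SQLite DDL (an assumption for illustration only; the Rails migration in the repository is the authoritative schema):

```shell
# Abbreviated sketch of the access_logs table, fed to the sqlite3 CLI.
# Only a handful of the columns listed above are shown here.
sqlite3 access_logs.db <<'SQL'
CREATE TABLE access_logs (
  id TEXT PRIMARY KEY,   -- UUID; PRIMARY KEY gives the unique index
  host TEXT,
  msec REAL,             -- Unix timestamp of the visit
  uri TEXT,
  status INTEGER,
  http_user_agent TEXT,
  http_referer TEXT,
  crawler INTEGER        -- boolean stored as 0/1 in SQLite
  -- remaining columns follow the table above
);
CREATE INDEX idx_access_logs_host ON access_logs (host);
CREATE INDEX idx_access_logs_uri ON access_logs (uri);
CREATE INDEX idx_access_logs_http_user_agent ON access_logs (http_user_agent);
SQL
```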

Nginx configuration

Configure Nginx to format the access log as JSON. Set http_referer.policy to one of unsafe-url, no-referrer, origin, origin-when-cross-origin, same-origin, strict-origin, strict-origin-when-cross-origin, or no-referrer-when-downgrade.

{
  "http_referer": {
    "referrer": "$http_referer",
    "origin": "$http_origin",
    "policy": "origin-when-cross-origin"
  }
}

Note: the nested key is referrer (double "r", the correct spelling) while the parent key is http_referer (single "r", matching the misspelling in the HTTP specification).
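To illustrate what origin-when-cross-origin means for the stored data, here is a small sketch (not part of access_log) that reduces a referrer the way the policy prescribes: full URL when the referrer's origin matches the request origin, bare origin otherwise:

```shell
# Reduce a referrer per origin-when-cross-origin (illustrative helper only).
reduce_referrer() {
  referrer="$1"; origin="$2"
  # Extract scheme://host[:port] from the referrer URL.
  ref_origin=$(printf '%s' "$referrer" | sed -E 's#^([a-z]+://[^/]+).*#\1#')
  if [ "$ref_origin" = "$origin" ]; then
    printf '%s\n' "$referrer"    # same origin: keep the full URL
  else
    printf '%s\n' "$ref_origin"  # cross-origin: origin only
  fi
}

reduce_referrer 'https://example.com/page?q=1' 'https://example.com'
reduce_referrer 'https://other.example/page' 'https://example.com'
```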

Install daemonize and run access_logd to create access.log as a FIFO node, so Nginx writes to it and access_log can read from it. Check /var/log/nginx/error.log for debugging.

Use the ACCESS_LOG_FLAGS environment variable to pass flags to access_logd. For a working example, check our Nginx container.
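The FIFO plumbing can be sketched like this (a scratch directory stands in for /var/log/nginx, and the daemonize invocation at the end is a placeholder; check the Nginx container for the real setup):

```shell
# Create the access log as a named pipe (FIFO) instead of a regular file,
# so Nginx writes into one end and access_log reads from the other
# without the log ever touching disk.
logdir=$(mktemp -d)          # stand-in for /var/log/nginx
log="$logdir/access.log"
mkfifo "$log"

# A writer and a reader hand lines over through the pipe:
printf 'one line\n' > "$log" &   # Nginx's role: write a log line
IFS= read -r line < "$log"       # access_log's role: consume it

# Hypothetical production invocation (flag names are placeholders):
#   ACCESS_LOG_FLAGS="..." daemonize /usr/local/bin/access_logd
```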

log_format main escape=json
  '{'
    '"host":"$host",'
    '"msec":$msec,'
    '"server_protocol":"$server_protocol",'
    '"request_method":"$request_method",'
    '"request_completion":"$request_completion",'
    '"uri":"$uri",'
    '"query_string":"$query_string",'
    '"status":$status,'
    '"sent_http_content_type":"$sent_http_content_type",'
    '"sent_http_content_encoding":"$sent_http_content_encoding",'
    '"sent_http_etag":"$sent_http_etag",'
    '"sent_http_last_modified":"$sent_http_last_modified",'
    '"http_accept":"$http_accept",'
    '"http_accept_encoding":"$http_accept_encoding",'
    '"http_accept_language":"$http_accept_language",'
    '"http_pragma":"$http_pragma",'
    '"http_cache_control":"$http_cache_control",'
    '"http_if_none_match":"$http_if_none_match",'
    '"http_dnt":"$http_dnt",'
    '"http_user_agent":"$http_user_agent",'
    '"http_origin":"$http_origin",'
    '"http_referer":{"origin":"$http_origin","referrer":"$http_referer","policy":"origin-when-cross-origin"},'
    '"request_time":$request_time,'
    '"bytes_sent":$bytes_sent,'
    '"body_bytes_sent":$body_bytes_sent,'
    '"request_length":$request_length,'
    '"http_connection":"$http_connection",'
    '"pipe":"$pipe",'
    '"connection_requests":$connection_requests,'
    '"geoip2_data_country_name":"$geoip2_data_country_name",'
    '"geoip2_data_city_name":"$geoip2_data_city_name",'
    '"ssl_server_name":"$ssl_server_name",'
    '"ssl_protocol":"$ssl_protocol",'
    '"ssl_early_data":"$ssl_early_data",'
    '"ssl_session_reused":"$ssl_session_reused",'
    '"ssl_curves":"$ssl_curves",'
    '"ssl_ciphers":"$ssl_ciphers",'
    '"ssl_cipher":"$ssl_cipher",'
    '"sent_http_x_xss_protection":"$sent_http_x_xss_protection",'
    '"sent_http_x_frame_options":"$sent_http_x_frame_options",'
    '"sent_http_x_content_type_options":"$sent_http_x_content_type_options",'
    '"sent_http_strict_transport_security":"$sent_http_strict_transport_security",'
    '"nginx_version":"$nginx_version",'
    '"pid":"$pid",'
    '"remote_user":""'
  '}';

access_log /var/log/nginx/access.log main;

Crawler user agents

Download the crawler user agents database and pass it as an argument to access_log. It'll try to detect whether a user agent belongs to a web crawler.
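Detection of this kind usually amounts to substring matching against the downloaded list; a sketch of the idea (the plain one-pattern-per-line file format here is an assumption, real crawler lists are typically JSON):

```shell
# Minimal sketch of crawler detection by case-insensitive substring match.
patterns=$(mktemp)
printf '%s\n' 'Googlebot' 'bingbot' 'DuckDuckBot' > "$patterns"

is_crawler() {
  # Exit status 0 when the user agent matches any known crawler pattern.
  printf '%s' "$1" | grep -qiF -f "$patterns"
}

is_crawler 'Mozilla/5.0 (compatible; Googlebot/2.1)' && echo crawler
is_crawler 'Mozilla/5.0 (X11; Linux x86_64) Firefox/98.0' || echo human
```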

TODO

  • Make some fields optional