# access_log

Receives access logs in JSON format on a UNIX socket and stores them in
a database. It **intentionally** doesn't collect IP addresses. It
doesn't honor the Do Not Track (DNT) header, though, because we don't
collect personally identifiable data. Referrer collection is optional,
but we **strongly** suggest using a referrer policy that doesn't collect
full URLs.
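
`access_log` itself is built with Crystal. Purely to illustrate the receiving side, here is a minimal Python sketch that binds a UNIX datagram socket and extracts the JSON from each message. It assumes Nginx's syslog output arrives as one datagram per log line, with a header such as `<190>Mar  1 12:00:00 nginx: ` in front of the JSON; the function names are hypothetical and nothing is written to a database.

```python
import json
import os
import socket

SOCKET_PATH = "/tmp/access_log.socket"  # default path mentioned in this README


def parse_datagram(datagram: bytes) -> dict:
    """Extract the JSON payload from a syslog-framed datagram.

    Assumption: the payload starts at the first '{', i.e. right after
    a header like '<190>Mar  1 12:00:00 nginx: '.
    """
    text = datagram.decode("utf-8")
    start = text.index("{")  # raises ValueError if there is no JSON
    return json.loads(text[start:])


def serve(path: str = SOCKET_PATH) -> None:
    """Receive datagrams forever and print a few fields of each entry."""
    if os.path.exists(path):
        os.unlink(path)  # remove a stale socket left by a previous run
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.bind(path)
        while True:
            data, _ = sock.recvfrom(65535)
            try:
                entry = parse_datagram(data)
            except ValueError:  # covers bad UTF-8, missing '{' and bad JSON
                continue
            # A real receiver would insert `entry` into the database here.
            print(entry.get("host"), entry.get("uri"), entry.get("status"))
```

Calling `serve()` blocks and prints one line per request; note that the Nginx worker user needs write permission on the socket file.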

See the [Rails
migration](https://0xacab.org/sutty/sutty/blob/rails/db/migrate/20200118155319_create_access_log.rb)
for the database schema, the [Nginx
configuration](https://0xacab.org/sutty/containers/nginx/blob/master/nginx/nginx.conf),
and the [site
configuration](https://0xacab.org/sutty/ansible-sutty/blob/master/templates/sites.conf.j2).

It supports SQLite3 and PostgreSQL databases :)

## Sustainable Web Design

When enabled with `--swd`, `access_log` can track CO2 emissions using the
[Sustainable Web Design "Calculating Digital Emissions"
method](https://sustainablewebdesign.org/calculating-digital-emissions/).
The algorithm and data are based on
[CO2.js](https://github.com/thegreenwebfoundation/co2.js).

It follows the same calculations, with an added, optional feature: using
the visitor's country of origin for the "consumer device" segment. To
enable it, see the Nginx configuration section below.

```bash
# For a data center in Costa Rica that uses renewable energy
access_log --swd --renewable --datacenter CR
```
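
As a rough illustration of the per-request arithmetic, here is a Python sketch of the SWD v3 per-byte model as CO2.js implements it: 0.81 kWh per GB transferred, split into consumer device (52%), network (14%), data center (15%) and hardware production (19%) segments, each multiplied by a grid intensity in gCO2e/kWh. These constants are assumptions taken from the published SWD v3 figures (newer SWD versions use different ones), not from `access_log`'s source. How `--datacenter`, `--renewable` and the device country map onto the parameters is also an assumption.

```python
# Sketch of the SWD v3 per-byte model (constants assumed from CO2.js/SWD v3).
GB = 1000 ** 3     # decimal gigabytes, as in CO2.js (assumption)
KWH_PER_GB = 0.81  # energy per GB transferred

# Share of the energy attributed to each segment (sums to 1.0).
SEGMENT_SHARES = {
    "consumer_device": 0.52,
    "network": 0.14,
    "data_center": 0.15,
    "production": 0.19,
}

RENEWABLES_INTENSITY = 50.0  # gCO2e/kWh for green hosting (assumed CO2.js value)


def swd_grams(bytes_sent, global_intensity, datacenter_intensity=None,
              device_intensity=None, renewable=False):
    """Estimate grams of CO2e for a response of `bytes_sent` bytes.

    Intensities are in gCO2e/kWh. `datacenter_intensity` stands for the
    --datacenter country and `device_intensity` for the visitor's
    country (both assumptions). Missing values fall back to the global one.
    """
    energy_kwh = bytes_sent / GB * KWH_PER_GB

    def pick(value):
        return global_intensity if value is None else value

    intensities = {
        "consumer_device": pick(device_intensity),
        "network": global_intensity,
        "data_center": pick(datacenter_intensity),
        "production": global_intensity,  # always global in this sketch
    }
    if renewable:
        # Assumption: as with CO2.js green hosts, renewable energy only
        # affects the data center segment and overrides its intensity.
        intensities["data_center"] = RENEWABLES_INTENSITY

    return sum(energy_kwh * share * intensities[segment]
               for segment, share in SEGMENT_SHARES.items())


# 1 MB response, every segment at an example intensity of 442 gCO2e/kWh
print(round(swd_grams(1_000_000, 442), 4))  # 0.358
```

Because the shares sum to 1.0, a single intensity for every segment reduces to `GB transferred × 0.81 × intensity`; the per-segment split only matters once the data center or device intensities differ from the global value.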

### Average vs marginal intensity

[CO2.js explains this
better](https://developers.thegreenwebfoundation.org/co2js/data/). In
practice, average intensity data gives lower results and mostly falls
back to the global intensity, because the per-country data is missing
for most countries.

`access_log` uses marginal data by default.
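
A minimal Python sketch of why sparse per-country data pulls results toward the global figure: a lookup that returns the global intensity whenever a country is missing from the table. The function name, country codes and values are hypothetical placeholders, not CO2.js data.

```python
from typing import Mapping, Optional


def intensity_for(country: Optional[str], table: Mapping[str, float],
                  global_intensity: float) -> float:
    """Grid intensity (gCO2e/kWh) for a country code, or the global value.

    Countries missing from `table`, and visits with no country at all,
    silently get the global intensity.
    """
    if not country:
        return global_intensity
    return table.get(country.upper(), global_intensity)


# Placeholder data for illustration only, not real grid intensities.
SPARSE_TABLE = {"AA": 100.0, "BB": 300.0}
visits = ["aa", "CC", "DD", None, "BB"]
fallbacks = sum(1 for c in visits if not c or c.upper() not in SPARSE_TABLE)
print(f"{fallbacks} of {len(visits)} visits use the global intensity")  # 3 of 5
```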

## Create database

```bash
sqlite3 access_log.sqlite3 < contrib/create.sql
```

## Build

Install the zlib, SQLite3 and SSL development files (package names vary
between distributions).

Install Crystal and the development tools (this also varies).

Run:

```bash
make
```

## Build for Alpine

```bash
make alpine-build
```

## Database configuration

Create an `access_logs` table with the following schema:

| Field | Type | Description | Indexed? |
| ----- | ---- | ----------- | -------- |
| id | String | UUID | Unique |
| host | String | Host name | Yes |
| msec | Float | Unix timestamp of the visit (millisecond resolution) | ? |
| server_protocol | String | HTTP version (e.g. `HTTP/1.1`) | ? |
| request_method | String | GET, POST, etc. | ? |
| request_completion | String | `OK` if the request completed, empty otherwise | ? |
| uri | String | Request URI | Yes |
| query_string | String | Arguments | ? |
| status | Integer | HTTP status | ? |
| sent_http_content_type | String | MIME type of the response | ? |
| sent_http_content_encoding | String | Compression | ? |
| sent_http_etag | String | ETag header | ? |
| sent_http_last_modified | String | Last modified date | ? |
| http_accept | String | MIME types requested | ? |
| http_accept_encoding | String | Compression accepted | ? |
| http_accept_language | String | Languages accepted | ? |
| http_pragma | String | Pragma header | ? |
| http_cache_control | String | Cache-Control requested | ? |
| http_if_none_match | String | ETag requested | ? |
| http_dnt | String | Do Not Track header | ? |
| http_user_agent | String | User agent | Yes |
| http_origin | String | Request origin | Yes |
| http_referer | String | Referer (see Referrer Policy) | Yes |
| request_time | Float | Request processing time (seconds) | ? |
| bytes_sent | Integer | Bytes sent | ? |
| body_bytes_sent | Integer | Bytes sent, not including headers | ? |
| request_length | Integer | Request length (request line, headers and body) | ? |
| http_connection | String | Connection header | ? |
| pipe | String | `p` if the request was pipelined, `.` otherwise | ? |
| connection_requests | Integer | Requests made on the same connection | ? |
| geoip2_data_country_name | String | Country according to GeoIP | Yes |
| geoip2_data_city_name | String | City according to GeoIP | Yes |
| ssl_server_name | String | SNI | ? |
| ssl_protocol | String | SSL/TLS version used | ? |
| ssl_early_data | String | TLSv1.3 early data used | ? |
| ssl_session_reused | String | TLS session reused | ? |
| ssl_curves | String | Curves supported by the client | ? |
| ssl_ciphers | String | Ciphers supported by the client | ? |
| ssl_cipher | String | Cipher used | ? |
| sent_http_x_xss_protection | String | X-XSS-Protection sent | ? |
| sent_http_x_frame_options | String | X-Frame-Options sent | ? |
| sent_http_x_content_type_options | String | X-Content-Type-Options sent | ? |
| sent_http_strict_transport_security | String | HSTS sent | ? |
| nginx_version | String | Server version | ? |
| pid | Integer | Nginx worker PID | ? |
| crawler | Boolean | Web crawler detected | ? |
| remote_user | String | HTTP Basic auth user | ? |
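
For orientation, here is a Python sketch that creates a reduced version of this table in SQLite and stores one parsed entry. It only covers a handful of the columns above, generates `id` with `uuid4`, and its SQLite types and index names are assumptions; `contrib/create.sql` and the Rails migration remain the reference schema.

```python
import json
import sqlite3
import uuid

# Reduced schema: only a few of the columns above. Types and index
# names are assumptions; contrib/create.sql is the reference.
SCHEMA = """
CREATE TABLE IF NOT EXISTS access_logs (
  id TEXT PRIMARY KEY,        -- UUID
  host TEXT,
  msec REAL,                  -- Unix timestamp, millisecond resolution
  uri TEXT,
  status INTEGER,
  http_user_agent TEXT,
  bytes_sent INTEGER,
  crawler INTEGER DEFAULT 0   -- SQLite has no native boolean type
);
CREATE INDEX IF NOT EXISTS access_logs_host ON access_logs (host);
CREATE INDEX IF NOT EXISTS access_logs_uri ON access_logs (uri);
"""

COLUMNS = ("host", "msec", "uri", "status", "http_user_agent", "bytes_sent")


def store(conn: sqlite3.Connection, entry: dict) -> str:
    """Insert one parsed log entry and return its generated UUID."""
    row_id = str(uuid.uuid4())
    values = [row_id] + [entry.get(column) for column in COLUMNS]
    placeholders = ", ".join("?" * len(values))  # "?, ?, ..., ?"
    conn.execute(
        f"INSERT INTO access_logs (id, {', '.join(COLUMNS)}) "
        f"VALUES ({placeholders})",
        values,
    )
    conn.commit()
    return row_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    line = '{"host":"example.org","msec":1700000000.123,"uri":"/","status":200}'
    print(store(conn, json.loads(line)))
```

Missing keys are stored as `NULL` via `dict.get`, which is one simple way to tolerate log formats that omit some fields.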

## Nginx configuration

Configure Nginx to format the access log as JSON. You can set
`http_referer.policy` to one of `unsafe-url`, `no-referrer`, `origin`,
`origin-when-cross-origin`, `same-origin`, `strict-origin`,
`strict-origin-when-cross-origin` or `no-referrer-when-downgrade`.

```json
{
  "http_referer": {
    "referrer": "$http_referer",
    "origin": "$http_origin",
    "policy": "origin-when-cross-origin"
  }
}
```

**Note:** The inner key is `referrer` but the parent is
`http_referer` (double and single "r" respectively; the latter is a
misspelling in the HTTP specification).
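
The policy values above are the ones defined by the W3C Referrer Policy specification. To illustrate what each one keeps from a referrer, here is a simplified Python sketch of the browser-side rules. It illustrates the specification, not necessarily how `access_log` applies the `policy` field; the helper names are hypothetical, and a "downgrade" is approximated as going from `https` to a non-`https` URL.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def _origin(url: str) -> tuple:
    """(scheme, host, port) tuple used to compare origins."""
    p = urlsplit(url)
    return (p.scheme, p.hostname, p.port or DEFAULT_PORTS.get(p.scheme))


def _strip(url: str, origin_only: bool) -> str:
    """Drop credentials and fragment; with origin_only, also path and query."""
    p = urlsplit(url)
    host = p.hostname or ""
    if ":" in host:  # IPv6 literal: urlsplit removed the brackets
        host = f"[{host}]"
    netloc = host if p.port is None else f"{host}:{p.port}"
    if origin_only:
        return urlunsplit((p.scheme, netloc, "/", "", ""))
    return urlunsplit((p.scheme, netloc, p.path or "/", p.query, ""))


def apply_policy(referrer: str, target: str, policy: str) -> Optional[str]:
    """Referer a browser would send from `referrer` to `target` under `policy`."""
    same_origin = _origin(referrer) == _origin(target)
    downgrade = (urlsplit(referrer).scheme == "https"
                 and urlsplit(target).scheme != "https")
    full, origin = _strip(referrer, False), _strip(referrer, True)

    if policy == "no-referrer":
        return None
    if policy == "unsafe-url":
        return full
    if policy == "origin":
        return origin
    if policy == "same-origin":
        return full if same_origin else None
    if policy == "origin-when-cross-origin":
        return full if same_origin else origin
    if policy == "strict-origin":
        return None if downgrade else origin
    if policy == "strict-origin-when-cross-origin":
        if same_origin:
            return full
        return None if downgrade else origin
    if policy == "no-referrer-when-downgrade":
        return None if downgrade else full
    raise ValueError(f"unknown referrer policy: {policy}")


ref = "https://blog.example.org/post?id=1#comments"
print(apply_policy(ref, "https://example.org/", "origin-when-cross-origin"))
# https://blog.example.org/
```

This shows why `origin-when-cross-origin` is a reasonable default here: visits between pages of the same site keep the full path, while other sites only ever reveal their origin.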

Install `daemonize` and run `access_logd`. By default it creates a UNIX
socket at `/tmp/access_log.socket` so Nginx can write to it using its
[syslog support](https://nginx.org/en/docs/syslog.html).

Check `/var/log/nginx/error.log` for debugging.

Use the `ACCESS_LOG_FLAGS` environment variable to pass flags to
`access_logd`. For a working example, check our [Nginx
container](https://0xacab.org/sutty/containers/nginx/).

```nginx
log_format main escape=json '{"host":"$host","msec":$msec,"server_protocol":"$server_protocol","request_method":"$request_method","request_completion":"$request_completion","uri":"$uri","query_string":"$query_string","status":$status,"sent_http_content_type":"$sent_http_content_type","sent_http_content_encoding":"$sent_http_content_encoding","sent_http_etag":"$sent_http_etag","sent_http_last_modified":"$sent_http_last_modified","http_accept":"$http_accept","http_accept_encoding":"$http_accept_encoding","http_accept_language":"$http_accept_language","http_pragma":"$http_pragma","http_cache_control":"$http_cache_control","http_if_none_match":"$http_if_none_match","http_dnt":"$http_dnt","http_user_agent":"$http_user_agent","http_origin":"$http_origin","http_referer":{"origin":"$http_origin","referrer":"$http_referer","policy":"origin-when-cross-origin"},"request_time":$request_time,"bytes_sent":$bytes_sent,"body_bytes_sent":$body_bytes_sent,"request_length":$request_length,"http_connection":"$http_connection","pipe":"$pipe","connection_requests":$connection_requests,"geoip2_data_country_name":"$geoip2_data_country_name","geoip2_data_city_name":"$geoip2_data_city_name","ssl_server_name":"$ssl_server_name","ssl_protocol":"$ssl_protocol","ssl_early_data":"$ssl_early_data","ssl_session_reused":"$ssl_session_reused","ssl_curves":"$ssl_curves","ssl_ciphers":"$ssl_ciphers","ssl_cipher":"$ssl_cipher","sent_http_x_xss_protection":"$sent_http_x_xss_protection","sent_http_x_frame_options":"$sent_http_x_frame_options","sent_http_x_content_type_options":"$sent_http_x_content_type_options","sent_http_strict_transport_security":"$sent_http_strict_transport_security","nginx_version":"$nginx_version","pid":"$pid","remote_user":""}';

access_log syslog=unix:/tmp/access_log.socket,nohostname main;
```
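
To check that `access_logd` is listening without going through Nginx, you could send a hand-made datagram to the socket. The Python sketch below assumes the daemon accepts messages framed like Nginx's syslog output (`<190>`, i.e. facility `local7` and severity `info`, then a timestamp and the `nginx:` tag, with no hostname because of `nohostname`). The framing details and the helper name are assumptions, and the JSON only needs to contain the fields you want to test.

```python
import json
import socket
from datetime import datetime


def send_test_log(entry: dict, path: str = "/tmp/access_log.socket") -> bytes:
    """Send one syslog-framed JSON line over a UNIX datagram socket.

    Mimics Nginx's syslog output (assumption): <190> is facility local7
    with severity info, followed by a timestamp and the "nginx:" tag.
    """
    now = datetime.now()
    # RFC 3164-style timestamp, e.g. "Mar  1 12:00:00" (space-padded day)
    stamp = f"{now:%b} {now.day:2d} {now:%H:%M:%S}"
    message = f"<190>{stamp} nginx: {json.dumps(entry)}".encode("utf-8")
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, path)
    return message


# Usage, with access_logd listening on its default socket:
# send_test_log({"host": "example.org", "msec": 1700000000.0, "uri": "/", "status": 200})
```

If nothing shows up in the database, `/var/log/nginx/error.log` will not help here since Nginx is not involved; check the daemon's own output instead.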

### Add origin country of visit to SWD

Add a `$geoip2_data_country_iso_code` variable to Nginx (the `geoip2`
directive comes from the third-party `ngx_http_geoip2_module`) and the
corresponding key to the JSON log format:

```nginx
geoip2 /usr/share/GeoIP/GeoLite2-Country.mmdb {
  $geoip2_data_country_iso_code country iso_code;
}

log_format main escape=json '{"host":"$host","msec":$msec,"server_protocol":"$server_protocol","request_method":"$request_method","request_completion":"$request_completion","uri":"$uri","query_string":"$query_string","status":$status,"sent_http_content_type":"$sent_http_content_type","sent_http_content_encoding":"$sent_http_content_encoding","sent_http_etag":"$sent_http_etag","sent_http_last_modified":"$sent_http_last_modified","http_accept":"$http_accept","http_accept_encoding":"$http_accept_encoding","http_accept_language":"$http_accept_language","http_pragma":"$http_pragma","http_cache_control":"$http_cache_control","http_if_none_match":"$http_if_none_match","http_dnt":"$http_dnt","http_user_agent":"$http_user_agent","http_origin":"$http_origin","http_referer":{"origin":"$http_origin","referrer":"$http_referer","policy":"origin-when-cross-origin"},"request_time":$request_time,"bytes_sent":$bytes_sent,"body_bytes_sent":$body_bytes_sent,"request_length":$request_length,"http_connection":"$http_connection","pipe":"$pipe","connection_requests":$connection_requests,"geoip2_data_country_name":"$geoip2_data_country_name","geoip2_data_city_name":"$geoip2_data_city_name","ssl_server_name":"$ssl_server_name","ssl_protocol":"$ssl_protocol","ssl_early_data":"$ssl_early_data","ssl_session_reused":"$ssl_session_reused","ssl_curves":"$ssl_curves","ssl_ciphers":"$ssl_ciphers","ssl_cipher":"$ssl_cipher","sent_http_x_xss_protection":"$sent_http_x_xss_protection","sent_http_x_frame_options":"$sent_http_x_frame_options","sent_http_x_content_type_options":"$sent_http_x_content_type_options","sent_http_strict_transport_security":"$sent_http_strict_transport_security","nginx_version":"$nginx_version","pid":"$pid","remote_user":"","geoip2_data_country_iso_code":"$geoip2_data_country_iso_code"}';
```

Then run the program with the required flags enabled:

```bash
access_log --swd --device-country
```

## Crawler user agents

Download the [crawler user agents
database](https://github.com/monperrus/crawler-user-agents) and pass it
as an argument to `access_log`. It will then try to detect whether a
user agent belongs to a web crawler.
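
The crawler-user-agents project ships a `crawler-user-agents.json` file: a JSON array of objects whose `pattern` key holds a regular expression to search for in the User-Agent. A Python sketch of that kind of detection follows; the function names are hypothetical and this is not `access_log`'s own matcher.

```python
import json
import re
from typing import Iterable, List


def compile_patterns(entries: Iterable[dict]) -> List[re.Pattern]:
    """Compile the `pattern` field of each crawler-user-agents entry."""
    return [re.compile(entry["pattern"]) for entry in entries]


def load_patterns(path: str) -> List[re.Pattern]:
    """Read crawler-user-agents.json (a JSON array of objects)."""
    with open(path, encoding="utf-8") as f:
        return compile_patterns(json.load(f))


def is_crawler(user_agent: str, patterns: List[re.Pattern]) -> bool:
    """True if any pattern matches anywhere in the User-Agent string."""
    return any(pattern.search(user_agent) for pattern in patterns)


# Usage:
# patterns = load_patterns("crawler-user-agents.json")
# is_crawler(request_user_agent, patterns)
```

Compiling the patterns once at startup keeps the per-request cost to a series of searches; joining them into a single alternation would be faster still, at the cost of no longer knowing which entry matched.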

## TODO

* [ ] Make some fields optional