Add a nicer fallback page to Traefik #115

Open
opened 2021-08-11 13:49:11 +00:00 by 3wordchant · 12 comments
Owner

Currently, once you've successfully deployed Traefik, unless you've enabled the dashboard (and sometimes, even if you have), the default "is it up" check fails.

Then all URLs except the Traefik dashboard (if it's configured) give SSL errors.

Wondering if it's possible to have a default page up as a fallback? Could be tricky with SSL.

Thanks to @timewarp for the report

3wordchant added the question label 2021-08-11 13:49:24 +00:00
Owner

Yes! Would be such a huge usability win.

First guess: an additional nginx "sidecar" service running inside the traefik stack, which holds the root domain with a nice "this is co-op cloud" page (like Cloudron?). This would be disabled if you run `compose.headless.yml` with traefik. It looks like we can set a priority on a rule (see [docs](https://doc.traefik.io/traefik/routing/routers/#priority)), so the root domain could always be overruled by a "real" app?
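To make the priority idea concrete, here is a minimal Python model of how a low-priority fallback router could lose to a "real" app on the same host. It only illustrates the selection rule (highest priority among matching routers wins) and is not Traefik code; the router names, the `example.com` host, and the priority values are all made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Router:
    name: str
    matches: Callable[[str], bool]  # stands in for a Traefik rule such as Host(`...`)
    priority: int


def pick_router(routers: List[Router], host: str) -> Optional[str]:
    """Return the name of the highest-priority router whose rule matches host."""
    candidates = [r for r in routers if r.matches(host)]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.priority).name


# Hypothetical fallback page holding the root domain with the lowest priority.
fallback = Router("fallback-page", lambda host: host == "example.com", priority=1)
# Hypothetical real app later deployed on the same root domain.
real_app = Router("real-app", lambda host: host == "example.com", priority=100)

print(pick_router([fallback], "example.com"))            # fallback-page
print(pick_router([fallback, real_app], "example.com"))  # real-app
```

If this matches how Traefik behaves, the fallback page would never need to be removed by hand when a real app claims the root domain.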
decentral1se added this to the (deleted) milestone 2021-09-08 12:58:05 +00:00
decentral1se added this to the UI / UX testing milestone 2021-09-09 14:26:35 +00:00
decentral1se added this to the Beta release (software) project 2021-10-20 15:42:22 +00:00
Owner
- https://imandrea.me/blog/traefik-custom-404
- https://github.com/traefik/traefik/issues/4218
- https://github.com/tarampampam/error-pages
decentral1se added enhancement and removed question labels 2021-12-31 15:40:02 +00:00
decentral1se removed this from the UI / UX testing milestone 2022-02-09 17:48:57 +00:00
decentral1se removed this from the Beta release (software) project 2022-02-09 17:49:00 +00:00
decentral1se added this to the (deleted) project 2022-06-17 09:41:24 +00:00
Member

@decentral1se did you ever get the custom error pages working? I made an attempt at https://git.coopcloud.tech/coop-cloud/traefik/commit/a805f5de265a4cc5ff8715f0cd7e3be00d033087 (not working yet) before finding this. Would be useful to be able to display nicer errors when an app is down...
Owner

@mayel didn't, sadly. Please let us know if you get anywhere. Would be amazing to have this sorted.

Author
Owner

I also tried & failed. Maybe it'd be easier with Caddy? #388

Member

Found a way with Caddy to display an error when an app is down; I guess it needs to be translated for use with docker labels:

```
handle_errors {
    @502 expression `{http.error.status_code} == 502`
    handle @502 {
        respond 503 {
            body "Hello, unfortunately this instance seems to be down. Please try again in a few minutes!"
            close
        }
    }
}
```

For Traefik, maybe this plugin can be used: https://github.com/jdel/staticresponse
Member

Here's another attempt using that plugin, not sure what I'm missing to get it working: https://git.coopcloud.tech/coop-cloud/traefik/src/branch/error-messages-attempt/compose.error-pages.yml

Author
Owner

Some victories! 🎉

Working:

- Replacing default 502 page (e.g. when an app without a healthcheck is starting). Example: https://container.stream-test.coopcloud.tech/ (app with a deliberately-misconfigured traefik router pointing at a port it's not really listening on)
- Replacing default 404 page (e.g. when an app has been undeployed). Example: https://usedtoexist.stream-test.coopcloud.tech/
- Apps' own error pages aren't overridden. Example: https://whoami.stream-test.coopcloud.tech/health

This is using the same `tarampampam/error-pages` image as Mayel's first attempt.

@mayel, what @decentral1se and I found is that Traefik's custom error pages aren't used for anything unless the error `middleware` is assigned to an entrypoint or a router. This, of course, isn't mentioned in [their documentation](https://doc.traefik.io/traefik/middlewares/http/errorpages/) 🤬

It's possible to add an error `middleware` to the `web-secure` entrypoint (using e.g. `entrypoints.web-secure.http.middlewares` in `file-provider.yml`), BUT this has the very unfortunate side effect of overriding all app error pages, including ones that are meant to be machine-readable or that convey useful app-specific information.

It feels like Traefik should have a way to say "only override Traefik's own bare-bones default error pages", but I'm pretty sure it doesn't (as per [plaintive post on Traefik forums with zero replies in 2 years](https://community.traefik.io/t/distinguish-between-http-error-code-generated-by-traefik-itself-or-by-service/13362)).

So, I found [a custom Traefik plugin called `traefik-error-pages`](https://plugins.traefik.io/plugins/6569fc07ce37949adf28307f/error-pages) which allows conditionally overriding errors only if they're blank, [hacked it](https://github.com/3-w-c/traefik-error-page) to make showing errors conditional on an error response body matching specified text, and configured it to only override 502s which exactly match "Bad Gateway".

This is a bit of a cursed, roundabout way of targeting Traefik's built-in error (there are no Traefik-specific headers or other content which would allow that), but if the rule does match anything else, then that response is guaranteed not to be JSON or to include any more useful details.

404s are handled using the same low-priority-router approach from Mayel's attempts.

**What's not yet working** is showing a nice page instead of SSL errors (e.g. when Traefik is waiting for a newly-deployed app's healthcheck to pass, or when someone visits a random nonexistent domain/subdomain that points to the Co-op Cloud server).

My best suggestion for waiting-for-healthcheck apps is having a separate daemon with access to the Docker socket that listens for `health: starting` apps and creates temporary services with low-priority Traefik routing rules to display errors. I think it might be worth breaking this out into a separate ticket.

I can't think of any reasonable way of solving "random nonexistent domain/subdomain" right now 🤔 Even switching to Caddy wouldn't be a super-easy fix; the project that I'm involved in that switched to using on-demand SSL started facing rate-limiting issues from ZeroSSL / LetsEncrypt pretty quickly, due to folks scanning for subdomains. Another separate ticket for this case, maybe, or declaring defeat on it for now?

Last steps before closing this ticket, maybe:

- [ ] PR `traefik-error-page` changes upstream (@decentral1se pls halp with code review if you have a sec? no idea what I'm doing with golang, etc)
- [ ] Improve error pages to be more co-op-cloud-specific (maybe switch to `jdel/staticresponse`)
- [ ] Mourn the loss of so much of our lives on this ticket 🙃

Mega-thanks @mayel and @decentral1se for helping push on this 💪
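For anyone following along, the "only override 502s whose body is exactly `Bad Gateway`" rule boils down to something like the Python sketch below. This is not the plugin's code (the plugin is written in Go); the function and constant names are invented here, and it assumes the built-in body is the bare text with no trailing whitespace.

```python
# Assumption: the text of Traefik's built-in 502 body, as quoted in this thread.
BUILTIN_502_BODY = b"Bad Gateway"


def should_replace(status: int, body: bytes) -> bool:
    """Decide whether a response should be swapped for the styled error page.

    Only a 502 whose body is exactly the built-in text is replaced, so an
    app's own 502 (JSON, HTML, extra detail) passes through untouched.
    """
    return status == 502 and body == BUILTIN_502_BODY


print(should_replace(502, b"Bad Gateway"))               # True
print(should_replace(502, b'{"error": "upstream down"}'))  # False
print(should_replace(404, b"Bad Gateway"))               # False
```

The exact-match comparison is what keeps app-specific and machine-readable errors from being overridden.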
3wordchant referenced this issue from a commit 2024-04-01 02:49:26 +00:00
Owner

Unbelievable plumbing work here @3wordchant 👷‍♀️

> What's not yet working is showing a nice page instead of SSL errors (e.g. when Traefik is waiting for a newly-deployed app's healthcheck to pass ... My best suggestion for waiting-for-healthcheck apps is having a separate daemon with access to the Docker socket that listens for health: starting apps and creates temporary services with low-priority Traefik routing rules to display errors. I think it might be worth breaking this out into a separate ticket.

I can find literally 0 docs on this, but I thought that Traefik waits for the healthcheck to pass (`status: healthy`) before trying to auto-TLS. When the healthcheck is starting, it's a 404. When the healthcheck is up and it's auto-TLS'ing, it's a 301? Could we also hijack these codes inside the Traefik middleware?

> I can't think of any reasonable way of solving "random nonexistent domain/subdomain" right now 🤔 Even switching to Caddy wouldn't be a super-easy fix; the project that I'm involved in that switched to using on-demand SSL started facing rate-limiting issues from ZeroSSL / LetsEncrypt pretty quickly, due to folks scanning for subdomains. Another separate ticket for this case, maybe, or declaring defeat on it for now?

Is this not also 301 response hijacking?

I can't believe I'm saying this but I could co-hack on this again one day soon...

Author
Owner

> I can find literally 0 docs on this but I thought that traefik will wait for the healthcheck to work (status: healthy) before trying to auto-tls. when the healthcheck is starting, it's 404. when the healthcheck is up and it's auto-tls'in, it's 301? could we also hijack these codes inside the traefik middleware?

As an example, if someone runs `abra app deploy` for a new `gitea` instance (the recipe has a healthcheck on `app`) and hammers F5 in a browser on https://git.example.com, they'll see:

  1. SSL error (example: https://foobar.coopcloud.tech/)
  2. 200, app is up (once status: healthy)

Or a new wallabag (no healthcheck), hitting https://wallabag.example.com

  1. SSL error
  2. 502 (once Traefik sees the app and generates the cert, but app is still launching)
  3. 200 (app is up)
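The two sequences above can be summarised as a small Python model. It is only a sketch of the behaviour described in this comment, not of Traefik internals; the function name and the three boolean inputs are invented for illustration.

```python
def visitor_outcome(routed: bool, cert_issued: bool, app_listening: bool) -> str:
    """What a browser sees on https://<app domain> in the cases described above."""
    if not routed or not cert_issued:
        # Traefik has no router (and so no certificate) for this domain yet.
        return "SSL error"
    if not app_listening:
        # Router and certificate exist, but nothing answers on the app's port.
        return "502"
    return "200"


# gitea (has a healthcheck): as described above, it is only routed once it is
# healthy, so the 502 stage never shows up.
gitea = [visitor_outcome(False, False, False), visitor_outcome(True, True, True)]

# wallabag (no healthcheck): routed as soon as Traefik sees it, while the app
# is still launching.
wallabag = [
    visitor_outcome(False, False, False),
    visitor_outcome(True, True, False),
    visitor_outcome(True, True, True),
]

print(gitea)     # ['SSL error', '200']
print(wallabag)  # ['SSL error', '502', '200']
```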

I'm not aware of a situation where it's 301, unless you mean 301 from `http` to `https`, in which case I think Traefik will currently do that conditionally on all URLs (e.g. http://foobar.coopcloud.tech). Halp?

> I can't believe I'm saying this but I could co-hack on this again one day soon...

🤯

3wordchant referenced this issue from a commit 2024-04-02 01:56:53 +00:00
Owner

Fack this is gruelling. Thanks for that explanation. It does seem like this is just a real time sink and we could try to zoom back out again and do a design sprint on a "web app" portal type app which can provide a lot more information and be the stepping stone to the actual web interface. Because we could speak to traefik via the API or even just inspect the swarm ourselves for information about what's going on... to discuss!

Author
Owner

> inspect the swarm ourselves for information about what's going on

Yeah, this is the only way I can think of to be able to do anything except an SSL error while an app is deploying.

I still think it's worth finishing off this ticket to abolish the unstyled confusing Traefik errors that pop up, though the SSL errors definitely seem like what people are more likely to run into more often.
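As a rough sketch of what "inspecting the swarm ourselves" might feed into, here is a small Python function that turns a list of app states into low-priority placeholder router labels for anything still in `health: starting`. The input shape, the `-starting` suffix, and the `placeholder-page` service are hypothetical; the label keys follow Traefik's `traefik.http.routers.<name>.*` convention. A real daemon would also have to read these states from the Docker socket and sort out certificates, which this does not attempt.

```python
from typing import Dict, List


def placeholder_labels(apps: List[dict], priority: int = 1) -> Dict[str, Dict[str, str]]:
    """Build placeholder router labels for apps whose health is still 'starting'.

    Each app is a dict with 'name', 'domain' and 'health' keys (a simplified,
    made-up shape, not the Docker API's own).
    """
    result: Dict[str, Dict[str, str]] = {}
    for app in apps:
        if app.get("health") != "starting":
            continue
        router = f"{app['name']}-starting"
        result[router] = {
            f"traefik.http.routers.{router}.rule": f"Host(`{app['domain']}`)",
            # Low priority so the real app's router wins once it appears.
            f"traefik.http.routers.{router}.priority": str(priority),
            # Hypothetical service that serves the "still starting" page.
            f"traefik.http.routers.{router}.service": "placeholder-page",
        }
    return result


apps = [
    {"name": "gitea", "domain": "git.example.com", "health": "starting"},
    {"name": "wallabag", "domain": "wallabag.example.com", "health": "healthy"},
]
labels = placeholder_labels(apps)
print(sorted(labels))  # ['gitea-starting']
```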

Reference: coop-cloud/organising#115