Add detailed logging & log export #90

New Issue

glyph · 2022-03-26T13:09:40Z

glyph commented

2022-03-26 13:09:40 +00:00

I need to update peach-web in order to emit detailed logs at multiple levels (debug, info, warn and error).

I'd also like to learn how to neatly export the logs (using journalctl) so testers and operators can easily send them to use for debugging and analysis.

The ideal implementation might be a 'Download Logs' button in the Administrator Settings menu (or similar settings menu). Clicking the button would result in the following backend flow: copy the go-sbot and peach-web logs into a temporary directory, compress them using zip or similar, then return the file in the response.

I need to update `peach-web` in order to emit detailed logs at multiple levels (`debug`, `info`, `warn` and `error`). I'd also like to learn how to neatly export the logs (using `journalctl`) so testers and operators can easily send them to use for debugging and analysis. The ideal implementation might be a 'Download Logs' button in the Administrator Settings menu (or similar settings menu). Clicking the button would result in the following backend flow: copy the go-sbot and peach-web logs into a temporary directory, compress them using `zip` or similar, then return the file in the response.

glyph added the

peach-web

label 2022-03-26 13:09:40 +00:00

glyph changed title from ~~Add detailed logging~~ to Add detailed logging & log export

2022-03-26 13:28:38 +00:00

glyph added the

enhancement

label 2022-03-26 13:31:05 +00:00

notplants commented

2022-05-20 14:47:24 +00:00

I'm not sure if this is the right place to write this, but a related issue we discussed is how to surface error messages to the User (and developer) in a more helpful way.

I was just debugging some errors that came up in the peach-web interface while running in the yunohost deployment and I found the error messages were not particularly helpful:

For example an error message that said "Write error: unable to write /var/lib/peachcloud/config.yml"

It turned out this error wasn't actually coming from the code that writes the config, but somewhere else which was incorrectly being passed the path to the config file.

Intuitively, what would be great is if an error would tell you where it actually comes from in the code, or has a unique string in the error message, so we can easily search for that unique string in the error message and find where in the code it comes from. I imagine this is perhaps what some of the earlier error libraries we were using were providing -- a way to easily also provide a context string when we throw an error... something like
my_function().context("this is a unique string where this error comes from")?
... instead of just the question mark with no context.

we might be able to achieve something like this using .map_err, but this would also require adding a "context" String field (or something like this) to everyone one of our error types, and maybe a fair amount of boilerplate of typing .map_err and constructing the whole error object. Maybe this is what some of the error libraries are streamlining? I'm not sure.

Perhaps another way to achieve this clarity about where errors come from, is if there is some way to include a stacktrace or dump in Rust error printing in a debug mode that somehow includes the line of which file from which the error comes from. For instance in Django in python you can enable Debug mode, and then whenever you encounter an error in the web application (during testing) it will print a stack trace showing you exactly where the error comes from. The only reason I know of not to include these stack traces in production is they are a potential security vulnerability if variables which weren't supposed to be exposed are accidentally exposed through the stack trace.

So in short my ideas about errors:

if all of our errors contained a custom context (to provide unique text showing where they come from)
if when we display the error, we also show the error chain, or stack trace, showing a line number where the error comes from

or maybe there are additional ways to make our error messages more useful?

there are also places where I'm seeing 500 internal server error, which is even less informative. I imagine from the code reaching a panic, or not having a place to bubble it up to the user.

Tldr; cryptic errors could take a lot of our time to find the source of, especially when they are coming from users machines which are not our own.

@glyph

I'm not sure if this is the right place to write this, but a related issue we discussed is how to surface error messages to the User (and developer) in a more helpful way. I was just debugging some errors that came up in the peach-web interface while running in the yunohost deployment and I found the error messages were not particularly helpful: For example an error message that said "Write error: unable to write /var/lib/peachcloud/config.yml" It turned out this error wasn't actually coming from the code that writes the config, but somewhere else which was incorrectly being passed the path to the config file. Intuitively, what would be great is if an error would tell you where it actually comes from in the code, or has a unique string in the error message, so we can easily search for that unique string in the error message and find where in the code it comes from. I imagine this is perhaps what some of the earlier error libraries we were using were providing -- a way to easily also provide a context string when we throw an error... something like ```my_function().context("this is a unique string where this error comes from")?``` ... instead of just the question mark with no context. we might be able to achieve something like this using .map_err, but this would also require adding a "context" String field (or something like this) to everyone one of our error types, and maybe a fair amount of boilerplate of typing .map_err and constructing the whole error object. Maybe this is what some of the error libraries are streamlining? I'm not sure. Perhaps another way to achieve this clarity about where errors come from, is if there is some way to include a stacktrace or dump in Rust error printing in a debug mode that somehow includes the line of which file from which the error comes from. For instance in Django in python you can enable Debug mode, and then whenever you encounter an error in the web application (during testing) it will print a stack trace showing you exactly where the error comes from. The only reason I know of not to include these stack traces in production is they are a potential security vulnerability if variables which weren't supposed to be exposed are accidentally exposed through the stack trace. So in short my ideas about errors: - if all of our errors contained a custom context (to provide unique text showing where they come from) - if when we display the error, we also show the error chain, or stack trace, showing a line number where the error comes from or maybe there are additional ways to make our error messages more useful? there are also places where I'm seeing 500 internal server error, which is even less informative. I imagine from the code reaching a panic, or not having a place to bubble it up to the user. Tldr; cryptic errors could take a lot of our time to find the source of, especially when they are coming from users machines which are not our own. @glyph

notplants commented

2022-06-01 12:17:27 +00:00

just found this article which I found informative: https://nick.groenen.me/posts/rust-error-handling/

with some info on context for anyhow and use of RUST_BACKTRACE

just found this article which I found informative: https://nick.groenen.me/posts/rust-error-handling/ with some info on context for anyhow and use of RUST_BACKTRACE

notplants commented

2022-06-01 12:20:43 +00:00

this comment mentioned also has an interesting reason to not use thiserror, and manually write errors like we have been

but maybe anyhow would still be useful for context and cause chains, even if don't use thiserror
https://www.reddit.com/r/rust/comments/gj8inf/comment/fqlmknt/?utm_source=reddit&utm_medium=web2x&context=3

this comment mentioned also has an interesting reason to not use thiserror, and manually write errors like we have been but maybe anyhow would still be useful for context and cause chains, even if don't use thiserror https://www.reddit.com/r/rust/comments/gj8inf/comment/fqlmknt/?utm_source=reddit&utm_medium=web2x&context=3

glyph commented

2022-06-08 07:55:14 +00:00

@notplants

I see logging and error reporting as two separate but related problems to solve.

I agree that our error messages need improvement, both to improve the clarity of what went wrong and to assign some sort of unique contextual fingerprint so we know where the error came from. The debugging / stacktrace information you're describing should not be shown during normal user error reporting, in my opinion, but should be available as part of the log export feature. That provides two layers of detail: a concise error message that can be easily communicated and a full-on, in-depth report of all the details.

In terms of improved error reporting and knowing the source of an error, I'd suggest we first see if we can improve our custom error implementation:

https://doc.rust-lang.org/std/error/trait.Error.html

It could be that we're not making best use of the error information we already have at our disposal when doing the printing / formatting. This is still a learning area for me so I'm sure I've done some of it in a suboptimal way.

@notplants I see logging and error reporting as two separate but related problems to solve. I agree that our error messages need improvement, both to improve the clarity of what went wrong and to assign some sort of unique contextual fingerprint so we know where the error came from. The debugging / stacktrace information you're describing should not be shown during normal user error reporting, in my opinion, but should be available as part of the log export feature. That provides two layers of detail: a concise error message that can be easily communicated and a full-on, in-depth report of all the details. In terms of improved error reporting and knowing the source of an error, I'd suggest we first see if we can improve our custom error implementation: https://doc.rust-lang.org/std/error/trait.Error.html It could be that we're not making best use of the error information we already have at our disposal when doing the printing / formatting. This is still a learning area for me so I'm sure I've done some of it in a suboptimal way.

glyph commented

2022-09-07 10:52:00 +00:00

A great case-study of debugging in action, for which detailed log export would be extremely useful:

#134

A great case-study of debugging in action, for which detailed log export would be extremely useful: https://git.coopcloud.tech/PeachCloud/peach-workspace/issues/134

Sign in to join this conversation.

2 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: PeachCloud/peach-workspace#90