Add detailed logging & log export #90

Open
opened 2022-03-26 13:09:40 +00:00 by glyph · 5 comments
Owner

I need to update peach-web in order to emit detailed logs at multiple levels (debug, info, warn and error).

I'd also like to learn how to neatly export the logs (using journalctl) so testers and operators can easily send them to use for debugging and analysis.

The ideal implementation might be a 'Download Logs' button in the Administrator Settings menu (or similar settings menu). Clicking the button would result in the following backend flow: copy the go-sbot and peach-web logs into a temporary directory, compress them using zip or similar, then return the file in the response.

I need to update `peach-web` in order to emit detailed logs at multiple levels (`debug`, `info`, `warn` and `error`). I'd also like to learn how to neatly export the logs (using `journalctl`) so testers and operators can easily send them to use for debugging and analysis. The ideal implementation might be a 'Download Logs' button in the Administrator Settings menu (or similar settings menu). Clicking the button would result in the following backend flow: copy the go-sbot and peach-web logs into a temporary directory, compress them using `zip` or similar, then return the file in the response.
glyph added the
peach-web
label 2022-03-26 13:09:40 +00:00
glyph changed title from Add detailed logging to Add detailed logging & log export 2022-03-26 13:28:38 +00:00
glyph added the
enhancement
label 2022-03-26 13:31:05 +00:00
Owner

I'm not sure if this is the right place to write this, but a related issue we discussed is how to surface error messages to the User (and developer) in a more helpful way.

I was just debugging some errors that came up in the peach-web interface while running in the yunohost deployment and I found the error messages were not particularly helpful:

For example an error message that said "Write error: unable to write /var/lib/peachcloud/config.yml"

It turned out this error wasn't actually coming from the code that writes the config, but somewhere else which was incorrectly being passed the path to the config file.

Intuitively, what would be great is if an error would tell you where it actually comes from in the code, or has a unique string in the error message, so we can easily search for that unique string in the error message and find where in the code it comes from. I imagine this is perhaps what some of the earlier error libraries we were using were providing -- a way to easily also provide a context string when we throw an error... something like
my_function().context("this is a unique string where this error comes from")?
... instead of just the question mark with no context.

we might be able to achieve something like this using .map_err, but this would also require adding a "context" String field (or something like this) to everyone one of our error types, and maybe a fair amount of boilerplate of typing .map_err and constructing the whole error object. Maybe this is what some of the error libraries are streamlining? I'm not sure.

Perhaps another way to achieve this clarity about where errors come from, is if there is some way to include a stacktrace or dump in Rust error printing in a debug mode that somehow includes the line of which file from which the error comes from. For instance in Django in python you can enable Debug mode, and then whenever you encounter an error in the web application (during testing) it will print a stack trace showing you exactly where the error comes from. The only reason I know of not to include these stack traces in production is they are a potential security vulnerability if variables which weren't supposed to be exposed are accidentally exposed through the stack trace.

So in short my ideas about errors:

  • if all of our errors contained a custom context (to provide unique text showing where they come from)
  • if when we display the error, we also show the error chain, or stack trace, showing a line number where the error comes from

or maybe there are additional ways to make our error messages more useful?

there are also places where I'm seeing 500 internal server error, which is even less informative. I imagine from the code reaching a panic, or not having a place to bubble it up to the user.

Tldr; cryptic errors could take a lot of our time to find the source of, especially when they are coming from users machines which are not our own.

@glyph

I'm not sure if this is the right place to write this, but a related issue we discussed is how to surface error messages to the User (and developer) in a more helpful way. I was just debugging some errors that came up in the peach-web interface while running in the yunohost deployment and I found the error messages were not particularly helpful: For example an error message that said "Write error: unable to write /var/lib/peachcloud/config.yml" It turned out this error wasn't actually coming from the code that writes the config, but somewhere else which was incorrectly being passed the path to the config file. Intuitively, what would be great is if an error would tell you where it actually comes from in the code, or has a unique string in the error message, so we can easily search for that unique string in the error message and find where in the code it comes from. I imagine this is perhaps what some of the earlier error libraries we were using were providing -- a way to easily also provide a context string when we throw an error... something like ```my_function().context("this is a unique string where this error comes from")?``` ... instead of just the question mark with no context. we might be able to achieve something like this using .map_err, but this would also require adding a "context" String field (or something like this) to everyone one of our error types, and maybe a fair amount of boilerplate of typing .map_err and constructing the whole error object. Maybe this is what some of the error libraries are streamlining? I'm not sure. Perhaps another way to achieve this clarity about where errors come from, is if there is some way to include a stacktrace or dump in Rust error printing in a debug mode that somehow includes the line of which file from which the error comes from. For instance in Django in python you can enable Debug mode, and then whenever you encounter an error in the web application (during testing) it will print a stack trace showing you exactly where the error comes from. The only reason I know of not to include these stack traces in production is they are a potential security vulnerability if variables which weren't supposed to be exposed are accidentally exposed through the stack trace. So in short my ideas about errors: - if all of our errors contained a custom context (to provide unique text showing where they come from) - if when we display the error, we also show the error chain, or stack trace, showing a line number where the error comes from or maybe there are additional ways to make our error messages more useful? there are also places where I'm seeing 500 internal server error, which is even less informative. I imagine from the code reaching a panic, or not having a place to bubble it up to the user. Tldr; cryptic errors could take a lot of our time to find the source of, especially when they are coming from users machines which are not our own. @glyph
Owner

just found this article which I found informative: https://nick.groenen.me/posts/rust-error-handling/

with some info on context for anyhow and use of RUST_BACKTRACE

just found this article which I found informative: https://nick.groenen.me/posts/rust-error-handling/ with some info on context for anyhow and use of RUST_BACKTRACE
Owner

this comment mentioned also has an interesting reason to not use thiserror, and manually write errors like we have been

but maybe anyhow would still be useful for context and cause chains, even if don't use thiserror
https://www.reddit.com/r/rust/comments/gj8inf/comment/fqlmknt/?utm_source=reddit&utm_medium=web2x&context=3

this comment mentioned also has an interesting reason to not use thiserror, and manually write errors like we have been but maybe anyhow would still be useful for context and cause chains, even if don't use thiserror https://www.reddit.com/r/rust/comments/gj8inf/comment/fqlmknt/?utm_source=reddit&utm_medium=web2x&context=3
Author
Owner

@notplants

I see logging and error reporting as two separate but related problems to solve.

I agree that our error messages need improvement, both to improve the clarity of what went wrong and to assign some sort of unique contextual fingerprint so we know where the error came from. The debugging / stacktrace information you're describing should not be shown during normal user error reporting, in my opinion, but should be available as part of the log export feature. That provides two layers of detail: a concise error message that can be easily communicated and a full-on, in-depth report of all the details.

In terms of improved error reporting and knowing the source of an error, I'd suggest we first see if we can improve our custom error implementation:

https://doc.rust-lang.org/std/error/trait.Error.html

It could be that we're not making best use of the error information we already have at our disposal when doing the printing / formatting. This is still a learning area for me so I'm sure I've done some of it in a suboptimal way.

@notplants I see logging and error reporting as two separate but related problems to solve. I agree that our error messages need improvement, both to improve the clarity of what went wrong and to assign some sort of unique contextual fingerprint so we know where the error came from. The debugging / stacktrace information you're describing should not be shown during normal user error reporting, in my opinion, but should be available as part of the log export feature. That provides two layers of detail: a concise error message that can be easily communicated and a full-on, in-depth report of all the details. In terms of improved error reporting and knowing the source of an error, I'd suggest we first see if we can improve our custom error implementation: https://doc.rust-lang.org/std/error/trait.Error.html It could be that we're not making best use of the error information we already have at our disposal when doing the printing / formatting. This is still a learning area for me so I'm sure I've done some of it in a suboptimal way.
Author
Owner

A great case-study of debugging in action, for which detailed log export would be extremely useful:

#134

A great case-study of debugging in action, for which detailed log export would be extremely useful: https://git.coopcloud.tech/PeachCloud/peach-workspace/issues/134
Sign in to join this conversation.
No Milestone
No project
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: PeachCloud/peach-workspace#90
No description provided.