Borked PR: Add get_latest_about_message #7

Closed
notplants wants to merge 6 commits from get-latest-about-message into kvt
Owner

This PR adds a few related things:

  • SsbMessageContentType, an enum representing the different content types. This is used for type comparison, so that we don't have to do string matching (which I'm told is an anti-pattern) and can instead check that a message is a particular type by comparing enums. I also added a is_message_type helper function to help with this, which tests if a message is a particular type and simply returns a bool (this can be used for filtering).
  • get_about_messages function which uses get_subset underneath and a manual filter (I left some todos. The manual filter is just temporary so we can test out some other stuff until the sbot regression is fixed).
  • get_latest_about_message, a function which takes in a &str key, and returns the latest value of an about message with that key. this function actually nicely gets to take advantage of the unstructured content field, using the "get" method of Value and supplying the key name. I found this ended up with something pretty elegant, as then you can write a get_name and get_description function (etc.) with simple one liners. On the other hand, using typed data structures, I had to write something like this:
/// Get the latest description for a particular user from their about messages.
pub async fn get_description(&mut self, ssb_id: &str) -> Result<Option<String>, GolgiError> {
        // vector of about messages with most recent at the front of the vector
        let about_messages = self.get_about_messages(ssb_id).await?;
        // iterate through the vector looking for an about message with a description
        // the first one we find is th emost recnet
        for msg in about_messages {
            let about_message: SsbMessageContent = msg.into_ssb_message_content()?;
            match about_message {
                SsbMessageContent::About{description, ..} => {
                    match description {
                        Some(description_text) => {
                            return Ok(Some(description_text))
                        },
                        None => {
                            continue
                        }
                    }
                },
                _ => {}
            }
        }
        // if no about message with a description was found, then return the empty string
        Ok(None)
}

As you can see in the above, I was able to take advantage of the serde deserialization via msg.into_ssb_message_content, but then I think to use the fields of a particular variant, you still need to do a match back down into that particular variant... which is actually quite a lot of code compared to, msg.content.get("description"), and would require being written separately for each particular field. I think I prefer the generic get_latest_about_message function making use of Value.get.

  • I also added a publish_name function (so I could test publish and getting names and descriptions)

One thing I'm curious about, is how we will efficiently get all the fields needed for a profile. In the most inefficient way, we could call functions like this for each of the fields we need. But it would probably be more efficient to do one call to go-sbot get_subset, and then filter that result multiple times in rust, to get all the fields we need. However that could also be built separately and additionally to these individual field-getter functions, which still seem like nice functions to have.

This PR is on top of the previous PR, and is an experiment to see how this feels, but I think its also ready for code review or further thoughts.

This PR adds a few related things: - SsbMessageContentType, an enum representing the different content types. This is used for type comparison, so that we don't have to do string matching (which I'm told is an anti-pattern) and can instead check that a message is a particular type by comparing enums. I also added a is_message_type helper function to help with this, which tests if a message is a particular type and simply returns a bool (this can be used for filtering). - get_about_messages function which uses get_subset underneath and a manual filter (I left some todos. The manual filter is just temporary so we can test out some other stuff until the sbot regression is fixed). - get_latest_about_message, a function which takes in a &str key, and returns the latest value of an about message with that key. this function actually nicely gets to take advantage of the unstructured content field, using the "get" method of Value and supplying the key name. I found this ended up with something pretty elegant, as then you can write a get_name and get_description function (etc.) with simple one liners. On the other hand, using typed data structures, I had to write something like this: ``` rust /// Get the latest description for a particular user from their about messages. pub async fn get_description(&mut self, ssb_id: &str) -> Result<Option<String>, GolgiError> { // vector of about messages with most recent at the front of the vector let about_messages = self.get_about_messages(ssb_id).await?; // iterate through the vector looking for an about message with a description // the first one we find is th emost recnet for msg in about_messages { let about_message: SsbMessageContent = msg.into_ssb_message_content()?; match about_message { SsbMessageContent::About{description, ..} => { match description { Some(description_text) => { return Ok(Some(description_text)) }, None => { continue } } }, _ => {} } } // if no about message with a description was found, then return the empty string Ok(None) } ``` As you can see in the above, I was able to take advantage of the serde deserialization via msg.into_ssb_message_content, but then *I think* to use the fields of a particular variant, you still need to do a match back down into that particular variant... which is actually quite a lot of code compared to, msg.content.get("description"), and would require being written separately for each particular field. I think I prefer the generic get_latest_about_message function making use of Value.get. - I also added a publish_name function (so I could test publish and getting names and descriptions) One thing I'm curious about, is how we will efficiently get all the fields needed for a profile. In the most inefficient way, we could call functions like this for each of the fields we need. But it would probably be more efficient to do one call to go-sbot get_subset, and then filter that result multiple times in rust, to get all the fields we need. However that could also be built separately and additionally to these individual field-getter functions, which still seem like nice functions to have. This PR is on top of the previous PR, and is an experiment to see how this feels, but I think its also ready for code review or further thoughts.
notplants added 3 commits 2021-12-30 02:02:06 +00:00
notplants added 1 commit 2021-12-30 02:03:18 +00:00
notplants requested review from glyph 2021-12-30 02:16:38 +00:00
Author
Owner

next thing I'm going to look into is modifying get_source_until_eof to return an iterator instead of a vector

next thing I'm going to look into is modifying get_source_until_eof to return an iterator instead of a vector
Owner

@notplants

SsbMessageContentType enum and associated is_message_type() method are nice.

[get_latest_about_message()] actually nicely gets to take advantage of the unstructured content field, using the "get" method of Value and supplying the key name. I found this ended up with something pretty elegant, as then you can write a get_name and get_description function (etc.) with simple one liners

Interesting to hear that this approach feels more streamlined.

As a general code comment: I've been guilty of this many times in the past but basically any time you see a nested match in your code you should challenge yourself to de-nest. I'll make some relevant code comments on this point.

it would probably be more efficient to do one call to go-sbot get_subset, and then filter that result multiple times in rust, to get all the fields we need. However that could also be built separately and additionally to these individual field-getter functions, which still seem like nice functions to have.

I'm honestly not sure which path will be more efficient (making many calls to go-sbot or making one or two big ones and filtering in Rust). I totally agree that the individual field-getter functions are very important to have (the "make Lego blocks first, then build with them" approach).

@notplants `SsbMessageContentType` `enum` and associated `is_message_type()` method are nice. > [`get_latest_about_message()`] actually nicely gets to take advantage of the unstructured content field, using the "get" method of Value and supplying the key name. I found this ended up with something pretty elegant, as then you can write a get_name and get_description function (etc.) with simple one liners Interesting to hear that this approach feels more streamlined. As a general code comment: I've been guilty of this many times in the past but basically any time you see a nested match in your code you should challenge yourself to de-nest. I'll make some relevant code comments on this point. > it would probably be more efficient to do one call to go-sbot get_subset, and then filter that result multiple times in rust, to get all the fields we need. However that could also be built separately and additionally to these individual field-getter functions, which still seem like nice functions to have. I'm honestly not sure which path will be more efficient (making many calls to `go-sbot` or making one or two big ones and filtering in Rust). I totally agree that the individual field-getter functions are very important to have (the "make Lego blocks first, then build with them" approach).
Author
Owner

very curious to hear your suggestions about how to denest match statements.

I had also been having an intuition that wasn't the most streamlined way to do things,
but also happy to have a pattern that works for getting things working. Looking forward to learn some new patterns to employ to reduce clutter.

very curious to hear your suggestions about how to denest match statements. I had also been having an intuition that wasn't the most streamlined way to do things, but also happy to have a pattern that works for getting things working. Looking forward to learn some new patterns to employ to reduce clutter.
Author
Owner

Interesting to hear that this approach feels more streamlined.

seeing these two code examples makes me wonder if strict typing mostly benefits when your data is actually strictly typed

perhaps when your data is not strictly typed, using a HashMap (or json::Value) is sometimes an easier way to work with it

> Interesting to hear that this approach feels more streamlined. seeing these two code examples makes me wonder if strict typing mostly benefits when your data is actually strictly typed perhaps when your data is not strictly typed, using a HashMap (or json::Value) is sometimes an easier way to work with it
notplants added 2 commits 2021-12-30 13:09:06 +00:00
glyph requested changes 2021-12-30 13:35:47 +00:00
glyph left a comment
Owner

Coming together nicely!

I've requested a couple changes and offered some feedback on code solutions.

Coming together nicely! I've requested a couple changes and offered some feedback on code solutions.
@ -32,0 +36,4 @@
Vote,
Post,
Contact,
None
Owner

This feels like an anti-pattern (having an Option as a variant). It's better to wrap the SsbMessageContentType in an Option to express "might or might not exist".

This feels like an anti-pattern (having an `Option` as a variant). It's better to wrap the `SsbMessageContentType` in an `Option` to express "might or might not exist".
notplants marked this conversation as resolved
@ -42,0 +47,4 @@
pub fn get_message_type(&self) -> Result<SsbMessageContentType, GolgiError> {
let msg_type = self
.content
.get("type");
Owner

If our message doesn't have a type then we should return from the function right away.

In this case I'd recommend returning an error (a message without a type field in content is erroneous):

let msg_type = self
    .content
    .get("type")
    .ok_or(Err(GolgiError::ContentType))?;

This approach simplifies the matching because msg_type ends up as a &Value. You can then convert the value to &str and match on it straight away.

If our message doesn't have a type then we should return from the function right away. In this case I'd recommend returning an error (a message without a `type` field in `content` is erroneous): ```rust let msg_type = self .content .get("type") .ok_or(Err(GolgiError::ContentType))?; ``` This approach simplifies the matching because `msg_type` ends up as a `&Value`. You can then convert the value to `&str` and match on it straight away.
notplants marked this conversation as resolved
@ -42,0 +50,4 @@
.get("type");
let enum_type = match msg_type {
Some(mtype) => {
let mtype_str: &str = mtype.as_str().ok_or(GolgiError::ContentTypeDecode("type field with invalid format".to_string()))?;
Owner

This error variant is something like ContentTypeValue("type field value is not a string as expected").

This error variant is something like `ContentTypeValue("type field value is not a string as expected")`.
notplants marked this conversation as resolved
@ -42,0 +56,4 @@
"post" => SsbMessageContentType::Post,
"vote" => SsbMessageContentType::Vote,
"contact" => SsbMessageContentType::Contact,
_ => return Err(GolgiError::ContentTypeDecode("type field with unknown value".to_string()))
Owner

I can think of two approaches here to handle the "pattern not matched" case:

  1. Return an error (as you have), but use a different error variant (this error is not the same as "we expected a string but got some other value"). Something like GolgiError::UnknownContentType.

  2. Add a variant to the SsbMessageContentType enum to represent any unknown / unrecognized / unsupported content type:

pub enum SsbMessageContentType {
    About,
    Vote,
    Post,
    Contact,
    Unrecognized,
}

I prefer the second solution since it allows us to represent all valid (ie. string-type) message content types in our SsbMessageContentType type.

I can think of two approaches here to handle the "pattern not matched" case: 1) Return an error (as you have), but use a different error variant (this error is not the same as "we expected a string but got some other value"). Something like `GolgiError::UnknownContentType`. 2) Add a variant to the `SsbMessageContentType` `enum` to represent any unknown / unrecognized / unsupported content type: ```rust pub enum SsbMessageContentType { About, Vote, Post, Contact, Unrecognized, } ``` I prefer the second solution since it allows us to represent all valid (ie. string-type) message content types in our `SsbMessageContentType` type.
notplants marked this conversation as resolved
@ -149,0 +176,4 @@
};
let kvts: Vec<SsbMessageKVT> = self.get_subset(query).await?;
let messages: Vec<SsbMessageValue> = kvts.into_iter().map(|kvt| kvt.value).collect();
// TODO: after fixing sbot regression,
Owner

👍

👍
notplants marked this conversation as resolved
@ -149,0 +197,4 @@
let about_messages = self.get_about_messages(ssb_id).await?;
// iterate through the vector looking for an about message with the given key
// the first one we find is the most recent
for msg in about_messages {
Owner

I had some fun playing with a more compact solution (took me a few hours...I'm not great with iterators and transformation yet).

Here's what I came up with:

let about_messages = self.get_about_messages(ssb_id).await?;

// iterate through all the about messages
let latest_about = about_messages
    .iter()
    // find the first msg that contains the field `key`
    .find(|msg| msg.content.get(key).is_some())
    // map the found msg (`Some(SsbMessageValue)`) to a `Some(Some(&Value))`
    .map(|msg| msg.content.get(key))
    // flatten `Some(Some(&Value))` into `Some(&Value)`
    .flatten()
    // map `Some(&Value)` to `Some(Some(&str))`
    .map(|msg_val| msg_val.as_str())
    // flatten `Some(Some(&str))` to `Some(&str)`
    .flatten()
    // map `Some(&str))` to `Some(String)`
    .map(|msg_str| msg_str.to_string());

// return value is either `Ok(Some(String))` or `Ok(None)`
Ok(latest_about)
I had some fun playing with a more compact solution (took me a few hours...I'm not great with iterators and transformation yet). Here's what I came up with: ```rust let about_messages = self.get_about_messages(ssb_id).await?; // iterate through all the about messages let latest_about = about_messages .iter() // find the first msg that contains the field `key` .find(|msg| msg.content.get(key).is_some()) // map the found msg (`Some(SsbMessageValue)`) to a `Some(Some(&Value))` .map(|msg| msg.content.get(key)) // flatten `Some(Some(&Value))` into `Some(&Value)` .flatten() // map `Some(&Value)` to `Some(Some(&str))` .map(|msg_val| msg_val.as_str()) // flatten `Some(Some(&str))` to `Some(&str)` .flatten() // map `Some(&str))` to `Some(String)` .map(|msg_str| msg_str.to_string()); // return value is either `Ok(Some(String))` or `Ok(None)` Ok(latest_about) ```
Owner

The nice thing about these iterator methods is being able to avoid nested match statements, especially those that have multiple arms which don't do much (e.g. just return continue).

The nice thing about these iterator methods is being able to avoid nested `match` statements, especially those that have multiple arms which don't do much (e.g. just return `continue`).
Author
Owner

nice! the comments on this are great. makes sense. helpful to see flatten used like this too.

nice! the comments on this are great. makes sense. helpful to see flatten used like this too.
notplants closed this pull request 2021-12-30 18:01:53 +00:00
Author
Owner

having weird gitea issue where when I merged in the PR this PR is based off, it closes the PR...

having weird gitea issue where when I merged in the PR this PR is based off, it closes the PR...
Author
Owner

I think gitea doesn't like when I delete the branch a PR is into, while the PR is still open... even though I was planning on changing the branch to "main" after merging the first one.

I will open a new PR pointing to this one, with the same changes. Workaround for now.

I think gitea doesn't like when I delete the branch a PR is into, while the PR is still open... even though I was planning on changing the branch to "main" after merging the first one. I will open a new PR pointing to this one, with the same changes. Workaround for now.
notplants changed title from Add get_latest_about_message to Borked PR: Add get_latest_about_message 2021-12-30 18:38:18 +00:00

Pull request closed

Sign in to join this conversation.
No Reviewers
2 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: golgi-ssb/golgi#7
No description provided.