Add SsbKVT, SsbMessageValue and SsbMessageContent Types #6

Merged
notplants merged 11 commits from kvt into main 2021-12-30 18:01:52 +00:00
Owner

Serializing into serde_json::Value is working well. Also into Vec<Value>. See the function get_async_until_eof for where this takes place.

Before moving forward, more consideration of types is needed, as you started to discuss here #2

This PR looks at defining our own types in golgi:
- SsbMessageValue (directly pulled from ssb-validate)
- KVT (pulled from kuska, but instead of a value for content, it uses an SsbMessageValue for content)

One advantage of this approach, is that it makes it easy to make Vec<KVT> and Vec<SsbMessageValue> which are nicely typed and easier to work with than just json. Since we should be able to count on messages being validated and well-formatted, perhaps its good to take advantage of this, and convert them into nicely typed objects.

It also gives the small advantage that since we are defining our own type outside of, we can give it the name KVT (instead of Feed) no problem. Small but could make code more understandable to developers working with this library.

#[derive(Serialize, Deserialize, Debug)]
#[serde(deny_unknown_fields)]
pub struct SsbMessageValue {
    pub previous: Option<Multihash>,
    pub author: String,
    pub sequence: u64,
    pub timestamp: LegacyF64,
    pub hash: String,
    pub content: Value,
    pub signature: String,
}
#[derive(Serialize, Deserialize, Debug)]
#[serde(deny_unknown_fields)]
pub struct KVT {
    pub key: String,
    pub value: SsbMessageValue,
    pub timestamp: f64,
    pub rts: Option<f64>,
}

I also started to look into how we would work with the content field of SsbMessageValue and what patterns we could use for converting it into a typed object.

I didn't get super far with this, but did implement an into_post function for SsbMessageValue which was working and allowed for a call like:

let ssb_messages: Vec<SsbMessageValue> = sbot_client.create_history_stream(author).await?;
let posts: Vec<Post> = ssb_messages.iter().flat_map(|msg| msg.into_post()).collect();

(note the above code still thows an error due to an issue with the borrow checker, but I think it shows a possible pattern still).

for particular functions where we know what type of message they are supposed to return (because we are filtering by type in the query) we could also do this conversion within the method, so the user of the function just gets back a nice Vec<Post> and doesn't have to convert it themselves. Although I think Ideally, the user would not get back just a Vec<Post> but actually a variation of Vec<SsbMessageValue> where the content field is of type Post (so that they have all the info, nicely typed, to do whatever they want with it).

This PR meant to continue discussion, curious to hear your thoughts. Whether its preferable to use kuska::Feed , or if its worth it to have our own type, so we can have strongly typed objects with all fields easily accessible. And also if any patterns come to mind for how to have the content field be typed, but differently typed for different types of messages (is this something where we use type inheritance? or how to use enum variants for that? I'm still thinking through this).

Serializing into serde_json::Value is working well. Also into `Vec<Value>`. See the function get_async_until_eof for where this takes place. Before moving forward, more consideration of types is needed, as you started to discuss here https://git.coopcloud.tech/golgi-ssb/golgi/issues/2 This PR looks at defining our own types in golgi: - SsbMessageValue (directly pulled from ssb-validate) - KVT (pulled from kuska, but instead of a value for content, it uses an SsbMessageValue for content) One advantage of this approach, is that it makes it easy to make `Vec<KVT>` and `Vec<SsbMessageValue>` which are nicely typed and easier to work with than just json. Since we should be able to count on messages being validated and well-formatted, perhaps its good to take advantage of this, and convert them into nicely typed objects. It also gives the small advantage that since we are defining our own type outside of, we can give it the name KVT (instead of Feed) no problem. Small but could make code more understandable to developers working with this library. ``` #[derive(Serialize, Deserialize, Debug)] #[serde(deny_unknown_fields)] pub struct SsbMessageValue { pub previous: Option<Multihash>, pub author: String, pub sequence: u64, pub timestamp: LegacyF64, pub hash: String, pub content: Value, pub signature: String, } ``` ``` #[derive(Serialize, Deserialize, Debug)] #[serde(deny_unknown_fields)] pub struct KVT { pub key: String, pub value: SsbMessageValue, pub timestamp: f64, pub rts: Option<f64>, } ``` I also started to look into how we would work with the content field of SsbMessageValue and what patterns we could use for converting it into a typed object. I didn't get super far with this, but did implement an into_post function for SsbMessageValue which was working and allowed for a call like: ``` let ssb_messages: Vec<SsbMessageValue> = sbot_client.create_history_stream(author).await?; let posts: Vec<Post> = ssb_messages.iter().flat_map(|msg| msg.into_post()).collect(); ``` (note the above code still thows an error due to an issue with the borrow checker, but I think it shows a possible pattern still). for particular functions where we know what type of message they are supposed to return (because we are filtering by type in the query) we could also do this conversion within the method, so the user of the function just gets back a nice `Vec<Post>` and doesn't have to convert it themselves. Although I think Ideally, the user would not get back just a `Vec<Post>` but actually a variation of `Vec<SsbMessageValue>` where the content field is of type Post (so that they have all the info, nicely typed, to do whatever they want with it). This PR meant to continue discussion, curious to hear your thoughts. Whether its preferable to use kuska::Feed , or if its worth it to have our own type, so we can have strongly typed objects with all fields easily accessible. And also if any patterns come to mind for how to have the content field be typed, but differently typed for different types of messages (is this something where we use type inheritance? or how to use enum variants for that? I'm still thinking through this).
notplants added 4 commits 2021-12-28 20:08:11 +00:00
Author
Owner

cc @glyph

cc @glyph
notplants added 1 commit 2021-12-29 01:36:01 +00:00
notplants added 1 commit 2021-12-29 01:37:41 +00:00
Author
Owner

Found a feature in the serde docs which I was missing before:
https://serde.rs/enum-representations.html#internally-tagged

I see that adria was already using this tagging on TypedMessage. Cool.

Means you can simply deserialize a TypedMessage from content like this:
let typed_message: TypedMessage = serde_json::from_value(ssb_message.content)?;

where ssb_message.content has type serde_json::Value. Nice that we don't need any match statement on type - serde handles it for us using the type field as the tag to select the variant to deserialize into.

I added a commit here where I added a TypedSsbMessageValue struct, which is the same as SsbMessageValue, but contains an additional field, typed_message, which has type TypedMessage (and is built from content).

pub struct TypedSsbMessageValue {
    pub previous: Option<Multihash>,
    pub author: String,
    pub sequence: u64,
    pub timestamp: LegacyF64,
    pub hash: String,
    pub content: Value,
    pub typed_message: TypedMessage,
    pub signature: String,
}

Alternatively, we could also not have a struct like this, and we could just create a TypedMessage on the fly when needed from an SsbMessageValue, via the snippet included above. I decided to keep the content field in TypedSsbMessageValue even though typed_message is essentially the typed version of the same data, because I thought it might be helpful.

Not sure which of these types it makes the most sense for our functions to work with. A TypedSsbMessageValue has the most useful fields (has all the info one could want to work with) but maybe there are tradeoffs?

Found a feature in the serde docs which I was missing before: https://serde.rs/enum-representations.html#internally-tagged I see that adria was already using this tagging on TypedMessage. Cool. Means you can simply deserialize a TypedMessage from content like this: ```let typed_message: TypedMessage = serde_json::from_value(ssb_message.content)?;``` where ssb_message.content has type serde_json::Value. Nice that we don't need any match statement on type - serde handles it for us using the type field as the tag to select the variant to deserialize into. I added a commit here where I added a TypedSsbMessageValue struct, which is the same as SsbMessageValue, but contains an additional field, typed_message, which has type TypedMessage (and is built from content). ``` pub struct TypedSsbMessageValue { pub previous: Option<Multihash>, pub author: String, pub sequence: u64, pub timestamp: LegacyF64, pub hash: String, pub content: Value, pub typed_message: TypedMessage, pub signature: String, } ``` Alternatively, we could also not have a struct like this, and we could just create a TypedMessage on the fly when needed from an SsbMessageValue, via the snippet included above. I decided to keep the content field in TypedSsbMessageValue even though typed_message is essentially the typed version of the same data, because I thought it might be helpful. Not sure which of these types it makes the most sense for our functions to work with. A TypedSsbMessageValue has the most useful fields (has all the info one could want to work with) but maybe there are tradeoffs?
Owner

@notplants

It's great to hear that various deserialization strategies are working and I'm stoked to see the thoughtfulness in your approach! 🖤

On a procedural note, I would prefer to keep these sorts of open discussions in issues (we could have rather continued the discussion in issue #2). Then pull-request discussions stay focused on the specific code implementation under submission. We can keep the discussion going here in this case but just noting it as a request for the future.


Type names and definitions

I suggest our two main message types be named SsbMessageKVT and SsbMessageValue. KVT alone will still be opaque to many golgi users; prefixing both types with SsbMessage makes their nature clearer.

To avoid unnecessary duplication, we can use the Feed type from kuska and give it an alias to add clarity.

use kuska_ssb::feed::Feed;

pub type SsbMessageKVT = Feed;

Regarding SsbMessageValue, I suggest defining previous as Option<String> and time as f64 so that we don't have to depend on the ssb_multiformats and ssb_legacy_msg_data crates. The Multihash and LegacyF64 types would be more relevant if we were writing a server implementation and wanted to ensure strict interoperability with other implementations. In our case, go-sbot handles that for us.

These message types should not be defined in src/utils.rs (could be src/ssb.rs or src/messages.rs or something similar).

Deserialization flow

Means you can simply deserialize a TypedMessage from content like this:

let typed_message: TypedMessage = serde_json::from_value(ssb_message.content)?;

+1 This is rad.

I don't find the TypedSsbMessageValue approach particularly elegant. It seems like a case where we could rather implement a conversion method on SsbMessageValue for on-demand deserialization when greater type specificity is required. That avoids the need for an extra type which is ever-so-slightly different from another, while also following the pattern you established with .into_post().

So we could have into_typed_content() as the generic conversion implementation and into_post(), into_vote(), into_contact() etc. as the specific conversion implementations.

I'd also consider aliasing TypedMessage as SsbMessageContent to match the SsbMessage... naming pattern (like a matryoshka doll of SSB message types) and clarify what the type contains.

Usage might then end up looking something like this:

let query_results: Vec<SsbMessageKVT> = sbot_client.get_subset(query).await?;

let message_values: Vec<SsbMessageValue> = query_results.iter().map(|kvt| kvt.into_msg_value()).collect();

let message_contents: Vec<SsbMessageContent> = message_values.iter().map(|value| value.into_typed_content()).collect();

// or maybe we know the results are going to be of type `Post`
let post_contents: Vec<Post> = message_values.iter().map(|value| value.into_post()).collect();
@notplants It's great to hear that various deserialization strategies are working and I'm stoked to see the thoughtfulness in your approach! 🖤 On a procedural note, I would prefer to keep these sorts of open discussions in issues (we could have rather continued the discussion in issue #2). Then pull-request discussions stay focused on the specific code implementation under submission. We can keep the discussion going here in this case but just noting it as a request for the future. ----- **Type names and definitions** I suggest our two main message types be named `SsbMessageKVT` and `SsbMessageValue`. `KVT` alone will still be opaque to many golgi users; prefixing both types with `SsbMessage` makes their nature clearer. To avoid unnecessary duplication, we can use the `Feed` type from kuska and give it an alias to add clarity. ```rust use kuska_ssb::feed::Feed; pub type SsbMessageKVT = Feed; ``` Regarding `SsbMessageValue`, I suggest defining `previous` as `Option<String>` and `time` as `f64` so that we don't have to depend on the `ssb_multiformats` and `ssb_legacy_msg_data` crates. The `Multihash` and `LegacyF64` types would be more relevant if we were writing a server implementation and wanted to ensure strict interoperability with other implementations. In our case, `go-sbot` handles that for us. These message types should not be defined in `src/utils.rs` (could be `src/ssb.rs` or `src/messages.rs` or something similar). **Deserialization flow** > Means you can simply deserialize a TypedMessage from content like this: > ```rust > let typed_message: TypedMessage = serde_json::from_value(ssb_message.content)?; > ``` +1 This is rad. I don't find the `TypedSsbMessageValue` approach particularly elegant. It seems like a case where we could rather implement a conversion method on `SsbMessageValue` for on-demand deserialization when greater type specificity is required. That avoids the need for an extra type which is ever-so-slightly different from another, while also following the pattern you established with `.into_post()`. So we could have `into_typed_content()` as the generic conversion implementation and `into_post()`, `into_vote()`, `into_contact()` etc. as the specific conversion implementations. I'd also consider aliasing `TypedMessage` as `SsbMessageContent` to match the `SsbMessage...` naming pattern (like a matryoshka doll of SSB message types) and clarify what the type contains. Usage might then end up looking something like this: ```rust let query_results: Vec<SsbMessageKVT> = sbot_client.get_subset(query).await?; let message_values: Vec<SsbMessageValue> = query_results.iter().map(|kvt| kvt.into_msg_value()).collect(); let message_contents: Vec<SsbMessageContent> = message_values.iter().map(|value| value.into_typed_content()).collect(); // or maybe we know the results are going to be of type `Post` let post_contents: Vec<Post> = message_values.iter().map(|value| value.into_post()).collect(); ```
notplants added 1 commit 2021-12-29 15:59:17 +00:00
notplants added 1 commit 2021-12-29 16:32:09 +00:00
notplants changed title from WIP: Discussion Of Working With KVT to Add SsbKVT, SsbMessageValue and SsbMessageContent Types 2021-12-29 16:32:45 +00:00
Author
Owner

Per our discussion, I changed the types to: SsbKvt, SsbMessageValue and SsbMessageContent,

and implemented the other comments you suggested (remove dependency on multiformat, move from utils.rs to messages.rs)

This PR is now ready for a more detailed review

Per our discussion, I changed the types to: SsbKvt, SsbMessageValue and SsbMessageContent, and implemented the other comments you suggested (remove dependency on multiformat, move from utils.rs to messages.rs) This PR is now ready for a more detailed review
notplants added 1 commit 2021-12-29 16:38:13 +00:00
Author
Owner

I also removed whoami_res_parse and publish_res_parse and replaced them with generic string_res_parse which can be used for both.

I also removed whoami_res_parse and publish_res_parse and replaced them with generic string_res_parse which can be used for both.
glyph requested changes 2021-12-29 18:01:11 +00:00
glyph left a comment
Owner

I've left quite a few comments.

I'm not able to compile the code due to createLogStream issues.

Let me know when you're ready for a re-review and I'll take a second pass.

Cool to see this coming together!

I've left quite a few comments. I'm not able to compile the code due to `createLogStream` issues. Let me know when you're ready for a re-review and I'll take a second pass. Cool to see this coming together!
src/error.rs Outdated
@ -36,2 +36,4 @@
/// JSON serialization or deserialization error.
SerdeJson(JsonError),
/// Error decoding typed ssb message from content.
ContentTypeDecode(String),
Owner

I'm not clear where this error variant is being used. Maybe line 50 in src/messages.rs (the from_value method)?

If so, then we should rather use the existing SerdeJson error variant which encapsulates both serialization and deserialization failures.

I'm not clear where this error variant is being used. Maybe line 50 in `src/messages.rs` (the `from_value` method)? If so, then we should rather use the existing `SerdeJson` error variant which encapsulates both serialization and deserialization failures.
Author
Owner

I think I was using this error for something specifically, then removed it. But I needed another error like this in the next PR, specifically for when the JSON is valid (not a json serialization error) but the format of the content is invalid. Maybe needs a different name?

I think I was using this error for something specifically, then removed it. But I needed another error like this in the next PR, specifically for when the JSON is valid (not a json serialization error) but the format of the content is invalid. Maybe needs a different name?
Owner

If the variant is not used in this PR then it's best to remove it. It's difficult for me to reason about the naming if I can't see the code it's called from.

If the variant is not used in this PR then it's best to remove it. It's difficult for me to reason about the naming if I can't see the code it's called from.
glyph marked this conversation as resolved
src/error.rs Outdated
@ -48,6 +50,7 @@ impl std::error::Error for GolgiError {
GolgiError::Rpc(ref err) => Some(err),
GolgiError::Sbot(_) => None,
GolgiError::SerdeJson(ref err) => Some(err),
GolgiError::ContentTypeDecode(ref _err) => None,
Owner

In this case you can omit the ref _err:

GolgiError::ContentTypeDecode(_) => None,

In this case you can omit the `ref _err`: `GolgiError::ContentTypeDecode(_) => None,`
notplants marked this conversation as resolved
@ -0,0 +32,4 @@
impl SsbMessageValue {
/// Gets the type field of the message content if found,
/// and if not returns "none"
pub fn get_message_type(&self) -> String {
Owner

It seems like Option<String> is a better return type here.

Then L40 can just be .ok() instead of .unwrap_or_else(|| "none".to_string());

It seems like `Option<String>` is a better return type here. Then L40 can just be `.ok()` instead of `.unwrap_or_else(|| "none".to_string());`
Author
Owner

I made some more changes here, and changed this to an enum in the next PR

I made some more changes here, and changed this to an enum in the next PR
notplants marked this conversation as resolved
src/messages.rs Outdated
@ -0,0 +58,4 @@
#[derive(Serialize, Deserialize, Debug)]
#[serde(deny_unknown_fields)]
#[allow(missing_docs)]
pub struct SsbKVT {
Owner

I would rather name this SsbMessageKVT to match the other two types.

I would rather name this `SsbMessageKVT` to match the other two types.
notplants marked this conversation as resolved
src/sbot.rs Outdated
@ -100,0 +103,4 @@
/// Call the `createLogStream` RPC method and return a Vec<Message>
pub async fn log(&mut self) -> Result<String, GolgiError> {
let req_id = self.client.log_req_send().await?;
Owner

Compilation of your branch fails for me due to this line. I'm guessing you've added this in a local branch of kuska? If so, we should remove the createLogStream feature from this PR and only add that once I'm gotten the required functionality merged into kuska.

error[E0599]: no method named `log_req_send` found for struct `ApiCaller` in the current scope
   --> src/sbot.rs:106:34
    |
106 |         let req_id = self.client.log_req_send().await?;
    |                                  ^^^^^^^^^^^^ help: there is an associated function with a similar name: `get_req_send`
Compilation of your branch fails for me due to this line. I'm guessing you've added this in a local branch of `kuska`? If so, we should remove the `createLogStream` feature from this PR and only add that once I'm gotten the required functionality merged into `kuska`. ``` error[E0599]: no method named `log_req_send` found for struct `ApiCaller` in the current scope --> src/sbot.rs:106:34 | 106 | let req_id = self.client.log_req_send().await?; | ^^^^^^^^^^^^ help: there is an associated function with a similar name: `get_req_send` ```
Author
Owner

yup thats what happened. I'll remove this from golgi for now, was really just using it as a sanity check.

yup thats what happened. I'll remove this from golgi for now, was really just using it as a sanity check.
notplants marked this conversation as resolved
@ -26,2 +26,4 @@
/// * `body` - An array of u8 to be parsed.
pub fn string_res_parse(body: &[u8]) -> Result<String, GolgiError> {
// TODO: cleanup with proper error handling etc.
Ok(std::str::from_utf8(body).unwrap().to_string())
Owner

Don't forget to add an error variant to handle this properly ;)

Don't forget to add an error variant to handle this properly ;)
notplants marked this conversation as resolved
@ -54,0 +104,4 @@
/// * `f` - A function which takes in an array of u8 and returns a Result<T, GolgiError>.
/// This is a function which parses the response from the RpcReader. T is a generic type,
/// so this parse function can return multiple possible types (String, json, custom struct etc.)
pub async fn get_async_until_eof<'a, R, T, F>(
Owner

Since we're dealing with a stream here and not an async RPC call we should rather name this function get_source_until_eof.

Then we're sticking with the naming convention used by go-sbot which differentiates between async calls and source (aka stream) calls.

Since we're dealing with a stream here and not an async RPC call we should rather name this function `get_source_until_eof`. Then we're sticking with the naming convention used by `go-sbot` which differentiates between async calls and source (aka stream) calls.
notplants marked this conversation as resolved
notplants added 1 commit 2021-12-30 01:10:29 +00:00
Author
Owner

^ incorporated the code review in that last commit.

Only change I didn't include was removing ContentTypeDecode and get_message_type, which are changed in the next PR.

^ incorporated the code review in that last commit. Only change I didn't include was removing ContentTypeDecode and get_message_type, which are changed in the next PR.
Owner

@notplants

Thanks for the changes!

The last requested change is to remove the unused error variant and associated code (ContentTypeDecode(String)).

@notplants Thanks for the changes! The last requested change is to remove the unused error variant and associated code (`ContentTypeDecode(String)`).
notplants added 1 commit 2021-12-30 12:45:50 +00:00
Author
Owner

Ok I've removed the error variant.

Perhaps making small PRs is a bit awkward too, if the PRs are connected and there are multiple parts interacting. Will attempt to split PRs into meaningful coherent chunks as best I can, but sometimes its a bit of a shifting target.

Ok I've removed the error variant. Perhaps making small PRs is a bit awkward too, if the PRs are connected and there are multiple parts interacting. Will attempt to split PRs into meaningful coherent chunks as best I can, but sometimes its a bit of a shifting target.
Owner

@notplants

Thanks for the removal.

Yeah I totally understand how PR's can balloon, especially in the early phases of a project where you're operating in discovery mode.

Speaking from the perspective of a PR reviewer: the smaller, the better. It's really challenging for me to hold a coherent mental model of the changes when the PR touches many files and adds a lot of new functionality. One new feature per PR might be a nice goal to aim for (with the understanding that this isn't always possible or desirable).

This short guide on authoring PR's has been helpful for me in the past.

You're clear to merge when you're ready :) 🟢

@notplants Thanks for the removal. Yeah I totally understand how PR's can balloon, especially in the early phases of a project where you're operating in discovery mode. Speaking from the perspective of a PR reviewer: the smaller, the better. It's really challenging for me to hold a coherent mental model of the changes when the PR touches many files and adds a lot of new functionality. One new feature per PR might be a nice goal to aim for (with the understanding that this isn't always possible or desirable). This [short guide on authoring PR's](https://www.atlassian.com/blog/git/written-unwritten-guide-pull-requests) has been helpful for me in the past. You're clear to merge when you're ready :) 🟢
notplants merged commit 0fbd6f0931 into main 2021-12-30 18:01:52 +00:00
notplants deleted branch kvt 2021-12-30 18:01:53 +00:00
Sign in to join this conversation.
No reviewers
No Milestone
No project
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: golgi-ssb/golgi#6
No description provided.