Add SsbKVT, SsbMessageValue and SsbMessageContent Types #6

notplants · 2021-12-28T20:08:11Z

notplants commented

2021-12-28 20:08:11 +00:00

Serializing into serde_json::Value is working well. Also into Vec<Value>. See the function get_async_until_eof for where this takes place.

Before moving forward, more consideration of types is needed, as you started to discuss here #2

This PR looks at defining our own types in golgi:
- SsbMessageValue (directly pulled from ssb-validate)
- KVT (pulled from kuska, but instead of a value for content, it uses an SsbMessageValue for content)

One advantage of this approach, is that it makes it easy to make Vec<KVT> and Vec<SsbMessageValue> which are nicely typed and easier to work with than just json. Since we should be able to count on messages being validated and well-formatted, perhaps its good to take advantage of this, and convert them into nicely typed objects.

It also gives the small advantage that since we are defining our own type outside of, we can give it the name KVT (instead of Feed) no problem. Small but could make code more understandable to developers working with this library.

#[derive(Serialize, Deserialize, Debug)]
#[serde(deny_unknown_fields)]
pub struct SsbMessageValue {
    pub previous: Option<Multihash>,
    pub author: String,
    pub sequence: u64,
    pub timestamp: LegacyF64,
    pub hash: String,
    pub content: Value,
    pub signature: String,
}

#[derive(Serialize, Deserialize, Debug)]
#[serde(deny_unknown_fields)]
pub struct KVT {
    pub key: String,
    pub value: SsbMessageValue,
    pub timestamp: f64,
    pub rts: Option<f64>,
}

I also started to look into how we would work with the content field of SsbMessageValue and what patterns we could use for converting it into a typed object.

I didn't get super far with this, but did implement an into_post function for SsbMessageValue which was working and allowed for a call like:

let ssb_messages: Vec<SsbMessageValue> = sbot_client.create_history_stream(author).await?;
let posts: Vec<Post> = ssb_messages.iter().flat_map(|msg| msg.into_post()).collect();

(note the above code still thows an error due to an issue with the borrow checker, but I think it shows a possible pattern still).

for particular functions where we know what type of message they are supposed to return (because we are filtering by type in the query) we could also do this conversion within the method, so the user of the function just gets back a nice Vec<Post> and doesn't have to convert it themselves. Although I think Ideally, the user would not get back just a Vec<Post> but actually a variation of Vec<SsbMessageValue> where the content field is of type Post (so that they have all the info, nicely typed, to do whatever they want with it).

This PR meant to continue discussion, curious to hear your thoughts. Whether its preferable to use kuska::Feed , or if its worth it to have our own type, so we can have strongly typed objects with all fields easily accessible. And also if any patterns come to mind for how to have the content field be typed, but differently typed for different types of messages (is this something where we use type inheritance? or how to use enum variants for that? I'm still thinking through this).

Serializing into serde_json::Value is working well. Also into `Vec<Value>`. See the function get_async_until_eof for where this takes place. Before moving forward, more consideration of types is needed, as you started to discuss here https://git.coopcloud.tech/golgi-ssb/golgi/issues/2 This PR looks at defining our own types in golgi: - SsbMessageValue (directly pulled from ssb-validate) - KVT (pulled from kuska, but instead of a value for content, it uses an SsbMessageValue for content) One advantage of this approach, is that it makes it easy to make `Vec<KVT>` and `Vec<SsbMessageValue>` which are nicely typed and easier to work with than just json. Since we should be able to count on messages being validated and well-formatted, perhaps its good to take advantage of this, and convert them into nicely typed objects. It also gives the small advantage that since we are defining our own type outside of, we can give it the name KVT (instead of Feed) no problem. Small but could make code more understandable to developers working with this library. ``` #[derive(Serialize, Deserialize, Debug)] #[serde(deny_unknown_fields)] pub struct SsbMessageValue { pub previous: Option<Multihash>, pub author: String, pub sequence: u64, pub timestamp: LegacyF64, pub hash: String, pub content: Value, pub signature: String, } ``` ``` #[derive(Serialize, Deserialize, Debug)] #[serde(deny_unknown_fields)] pub struct KVT { pub key: String, pub value: SsbMessageValue, pub timestamp: f64, pub rts: Option<f64>, } ``` I also started to look into how we would work with the content field of SsbMessageValue and what patterns we could use for converting it into a typed object. I didn't get super far with this, but did implement an into_post function for SsbMessageValue which was working and allowed for a call like: ``` let ssb_messages: Vec<SsbMessageValue> = sbot_client.create_history_stream(author).await?; let posts: Vec<Post> = ssb_messages.iter().flat_map(|msg| msg.into_post()).collect(); ``` (note the above code still thows an error due to an issue with the borrow checker, but I think it shows a possible pattern still). for particular functions where we know what type of message they are supposed to return (because we are filtering by type in the query) we could also do this conversion within the method, so the user of the function just gets back a nice `Vec<Post>` and doesn't have to convert it themselves. Although I think Ideally, the user would not get back just a `Vec<Post>` but actually a variation of `Vec<SsbMessageValue>` where the content field is of type Post (so that they have all the info, nicely typed, to do whatever they want with it). This PR meant to continue discussion, curious to hear your thoughts. Whether its preferable to use kuska::Feed , or if its worth it to have our own type, so we can have strongly typed objects with all fields easily accessible. And also if any patterns come to mind for how to have the content field be typed, but differently typed for different types of messages (is this something where we use type inheritance? or how to use enum variants for that? I'm still thinking through this).

notplants added 4 commits 2021-12-28 20:08:11 +00:00

feae382ead implement getSubset call

807bb8700c Working on getsubset

37b4a21899 Working on getsubset

0952ad9220 Working with KVT types

notplants commented

2021-12-28 20:10:43 +00:00

cc @glyph

notplants added 1 commit 2021-12-29 01:36:01 +00:00

1b2b1db95f Deserialize TypedSsbMessageValue

notplants added 1 commit 2021-12-29 01:37:41 +00:00

69f080fe57 Remove unnecessary lookup of type

notplants commented

2021-12-29 01:40:58 +00:00

Found a feature in the serde docs which I was missing before:
https://serde.rs/enum-representations.html#internally-tagged

I see that adria was already using this tagging on TypedMessage. Cool.

Means you can simply deserialize a TypedMessage from content like this:
let typed_message: TypedMessage = serde_json::from_value(ssb_message.content)?;

where ssb_message.content has type serde_json::Value. Nice that we don't need any match statement on type - serde handles it for us using the type field as the tag to select the variant to deserialize into.

I added a commit here where I added a TypedSsbMessageValue struct, which is the same as SsbMessageValue, but contains an additional field, typed_message, which has type TypedMessage (and is built from content).

pub struct TypedSsbMessageValue {
    pub previous: Option<Multihash>,
    pub author: String,
    pub sequence: u64,
    pub timestamp: LegacyF64,
    pub hash: String,
    pub content: Value,
    pub typed_message: TypedMessage,
    pub signature: String,
}

Alternatively, we could also not have a struct like this, and we could just create a TypedMessage on the fly when needed from an SsbMessageValue, via the snippet included above. I decided to keep the content field in TypedSsbMessageValue even though typed_message is essentially the typed version of the same data, because I thought it might be helpful.

Not sure which of these types it makes the most sense for our functions to work with. A TypedSsbMessageValue has the most useful fields (has all the info one could want to work with) but maybe there are tradeoffs?

Found a feature in the serde docs which I was missing before: https://serde.rs/enum-representations.html#internally-tagged I see that adria was already using this tagging on TypedMessage. Cool. Means you can simply deserialize a TypedMessage from content like this: ```let typed_message: TypedMessage = serde_json::from_value(ssb_message.content)?;``` where ssb_message.content has type serde_json::Value. Nice that we don't need any match statement on type - serde handles it for us using the type field as the tag to select the variant to deserialize into. I added a commit here where I added a TypedSsbMessageValue struct, which is the same as SsbMessageValue, but contains an additional field, typed_message, which has type TypedMessage (and is built from content). ``` pub struct TypedSsbMessageValue { pub previous: Option<Multihash>, pub author: String, pub sequence: u64, pub timestamp: LegacyF64, pub hash: String, pub content: Value, pub typed_message: TypedMessage, pub signature: String, } ``` Alternatively, we could also not have a struct like this, and we could just create a TypedMessage on the fly when needed from an SsbMessageValue, via the snippet included above. I decided to keep the content field in TypedSsbMessageValue even though typed_message is essentially the typed version of the same data, because I thought it might be helpful. Not sure which of these types it makes the most sense for our functions to work with. A TypedSsbMessageValue has the most useful fields (has all the info one could want to work with) but maybe there are tradeoffs?

glyph commented

2021-12-29 14:02:49 +00:00

@notplants

It's great to hear that various deserialization strategies are working and I'm stoked to see the thoughtfulness in your approach! 🖤

On a procedural note, I would prefer to keep these sorts of open discussions in issues (we could have rather continued the discussion in issue #2). Then pull-request discussions stay focused on the specific code implementation under submission. We can keep the discussion going here in this case but just noting it as a request for the future.

Type names and definitions

I suggest our two main message types be named SsbMessageKVT and SsbMessageValue. KVT alone will still be opaque to many golgi users; prefixing both types with SsbMessage makes their nature clearer.

To avoid unnecessary duplication, we can use the Feed type from kuska and give it an alias to add clarity.

use kuska_ssb::feed::Feed;

pub type SsbMessageKVT = Feed;

Regarding SsbMessageValue, I suggest defining previous as Option<String> and time as f64 so that we don't have to depend on the ssb_multiformats and ssb_legacy_msg_data crates. The Multihash and LegacyF64 types would be more relevant if we were writing a server implementation and wanted to ensure strict interoperability with other implementations. In our case, go-sbot handles that for us.

These message types should not be defined in src/utils.rs (could be src/ssb.rs or src/messages.rs or something similar).

Deserialization flow

Means you can simply deserialize a TypedMessage from content like this:
let typed_message: TypedMessage = serde_json::from_value(ssb_message.content)?;

+1 This is rad.

I don't find the TypedSsbMessageValue approach particularly elegant. It seems like a case where we could rather implement a conversion method on SsbMessageValue for on-demand deserialization when greater type specificity is required. That avoids the need for an extra type which is ever-so-slightly different from another, while also following the pattern you established with .into_post().

So we could have into_typed_content() as the generic conversion implementation and into_post(), into_vote(), into_contact() etc. as the specific conversion implementations.

I'd also consider aliasing TypedMessage as SsbMessageContent to match the SsbMessage... naming pattern (like a matryoshka doll of SSB message types) and clarify what the type contains.

Usage might then end up looking something like this:

let query_results: Vec<SsbMessageKVT> = sbot_client.get_subset(query).await?;

let message_values: Vec<SsbMessageValue> = query_results.iter().map(|kvt| kvt.into_msg_value()).collect();

let message_contents: Vec<SsbMessageContent> = message_values.iter().map(|value| value.into_typed_content()).collect();

// or maybe we know the results are going to be of type `Post`
let post_contents: Vec<Post> = message_values.iter().map(|value| value.into_post()).collect();

@notplants It's great to hear that various deserialization strategies are working and I'm stoked to see the thoughtfulness in your approach! 🖤 On a procedural note, I would prefer to keep these sorts of open discussions in issues (we could have rather continued the discussion in issue #2). Then pull-request discussions stay focused on the specific code implementation under submission. We can keep the discussion going here in this case but just noting it as a request for the future. ----- **Type names and definitions** I suggest our two main message types be named `SsbMessageKVT` and `SsbMessageValue`. `KVT` alone will still be opaque to many golgi users; prefixing both types with `SsbMessage` makes their nature clearer. To avoid unnecessary duplication, we can use the `Feed` type from kuska and give it an alias to add clarity. ```rust use kuska_ssb::feed::Feed; pub type SsbMessageKVT = Feed; ``` Regarding `SsbMessageValue`, I suggest defining `previous` as `Option<String>` and `time` as `f64` so that we don't have to depend on the `ssb_multiformats` and `ssb_legacy_msg_data` crates. The `Multihash` and `LegacyF64` types would be more relevant if we were writing a server implementation and wanted to ensure strict interoperability with other implementations. In our case, `go-sbot` handles that for us. These message types should not be defined in `src/utils.rs` (could be `src/ssb.rs` or `src/messages.rs` or something similar). **Deserialization flow** > Means you can simply deserialize a TypedMessage from content like this: > ```rust > let typed_message: TypedMessage = serde_json::from_value(ssb_message.content)?; > ``` +1 This is rad. I don't find the `TypedSsbMessageValue` approach particularly elegant. It seems like a case where we could rather implement a conversion method on `SsbMessageValue` for on-demand deserialization when greater type specificity is required. That avoids the need for an extra type which is ever-so-slightly different from another, while also following the pattern you established with `.into_post()`. So we could have `into_typed_content()` as the generic conversion implementation and `into_post()`, `into_vote()`, `into_contact()` etc. as the specific conversion implementations. I'd also consider aliasing `TypedMessage` as `SsbMessageContent` to match the `SsbMessage...` naming pattern (like a matryoshka doll of SSB message types) and clarify what the type contains. Usage might then end up looking something like this: ```rust let query_results: Vec<SsbMessageKVT> = sbot_client.get_subset(query).await?; let message_values: Vec<SsbMessageValue> = query_results.iter().map(|kvt| kvt.into_msg_value()).collect(); let message_contents: Vec<SsbMessageContent> = message_values.iter().map(|value| value.into_typed_content()).collect(); // or maybe we know the results are going to be of type `Post` let post_contents: Vec<Post> = message_values.iter().map(|value| value.into_post()).collect(); ```

notplants added 1 commit 2021-12-29 15:59:17 +00:00

ebd2604805 Clean up types

notplants added 1 commit 2021-12-29 16:32:09 +00:00

3f2f4d9d59 Fix clippy warnings

notplants changed title from ~~WIP: Discussion Of Working With KVT~~ to Add SsbKVT, SsbMessageValue and SsbMessageContent Types

2021-12-29 16:32:45 +00:00

notplants commented

2021-12-29 16:34:39 +00:00

Per our discussion, I changed the types to: SsbKvt, SsbMessageValue and SsbMessageContent,

and implemented the other comments you suggested (remove dependency on multiformat, move from utils.rs to messages.rs)

This PR is now ready for a more detailed review

Per our discussion, I changed the types to: SsbKvt, SsbMessageValue and SsbMessageContent, and implemented the other comments you suggested (remove dependency on multiformat, move from utils.rs to messages.rs) This PR is now ready for a more detailed review

notplants added 1 commit 2021-12-29 16:38:13 +00:00

ec32643407 Fix example

notplants commented

2021-12-29 16:40:23 +00:00

I also removed whoami_res_parse and publish_res_parse and replaced them with generic string_res_parse which can be used for both.

👍 1

glyph requested changes 2021-12-29 18:01:11 +00:00

glyph left a comment

I've left quite a few comments.

I'm not able to compile the code due to createLogStream issues.

Let me know when you're ready for a re-review and I'll take a second pass.

Cool to see this coming together!

I've left quite a few comments. I'm not able to compile the code due to `createLogStream` issues. Let me know when you're ready for a re-review and I'll take a second pass. Cool to see this coming together!

src/error.rs Outdated

						
				@ -36,2 +36,4 @@

				    /// JSON serialization or deserialization error.

				    SerdeJson(JsonError),

				    /// Error decoding typed ssb message from content.

				    ContentTypeDecode(String),

glyph commented

2021-12-29 17:43:51 +00:00

I'm not clear where this error variant is being used. Maybe line 50 in src/messages.rs (the from_value method)?

If so, then we should rather use the existing SerdeJson error variant which encapsulates both serialization and deserialization failures.

I'm not clear where this error variant is being used. Maybe line 50 in `src/messages.rs` (the `from_value` method)? If so, then we should rather use the existing `SerdeJson` error variant which encapsulates both serialization and deserialization failures.

notplants commented

2021-12-30 01:08:18 +00:00

I think I was using this error for something specifically, then removed it. But I needed another error like this in the next PR, specifically for when the JSON is valid (not a json serialization error) but the format of the content is invalid. Maybe needs a different name?

glyph commented

2021-12-30 10:37:45 +00:00

If the variant is not used in this PR then it's best to remove it. It's difficult for me to reason about the naming if I can't see the code it's called from.

glyph marked this conversation as resolved

src/error.rs Outdated

						
				@ -48,6 +50,7 @@ impl std::error::Error for GolgiError {

				            GolgiError::Rpc(ref err) => Some(err),

				            GolgiError::Sbot(_) => None,

				            GolgiError::SerdeJson(ref err) => Some(err),

				            GolgiError::ContentTypeDecode(ref _err) => None,

glyph commented

2021-12-29 17:44:54 +00:00

In this case you can omit the ref _err:

GolgiError::ContentTypeDecode(_) => None,

In this case you can omit the `ref _err`: `GolgiError::ContentTypeDecode(_) => None,`

👍 1

notplants marked this conversation as resolved

src/messages.rs

						
				@ -0,0 +32,4 @@

				impl SsbMessageValue {

				    /// Gets the type field of the message content if found,

				    /// and if not returns "none"

				    pub fn get_message_type(&self) -> String {

glyph commented

2021-12-29 17:49:46 +00:00

It seems like Option<String> is a better return type here.

Then L40 can just be .ok() instead of .unwrap_or_else(|| "none".to_string());

It seems like `Option<String>` is a better return type here. Then L40 can just be `.ok()` instead of `.unwrap_or_else(|| "none".to_string());`

notplants commented

2021-12-30 00:51:07 +00:00

I made some more changes here, and changed this to an enum in the next PR

notplants marked this conversation as resolved

src/messages.rs Outdated

						
				@ -0,0 +58,4 @@

				#[derive(Serialize, Deserialize, Debug)]

				#[serde(deny_unknown_fields)]

				#[allow(missing_docs)]

				pub struct SsbKVT {

glyph commented

2021-12-29 17:50:31 +00:00

I would rather name this SsbMessageKVT to match the other two types.

I would rather name this `SsbMessageKVT` to match the other two types.

👍 1

notplants marked this conversation as resolved

src/sbot.rs Outdated

						
				@ -100,0 +103,4 @@

				    /// Call the `createLogStream` RPC method and return a Vec<Message>

				    pub async fn log(&mut self) -> Result<String, GolgiError> {

				        let req_id = self.client.log_req_send().await?;

glyph commented

2021-12-29 17:55:57 +00:00

Compilation of your branch fails for me due to this line. I'm guessing you've added this in a local branch of kuska? If so, we should remove the createLogStream feature from this PR and only add that once I'm gotten the required functionality merged into kuska.

error[E0599]: no method named `log_req_send` found for struct `ApiCaller` in the current scope
   --> src/sbot.rs:106:34
    |
106 |         let req_id = self.client.log_req_send().await?;
    |                                  ^^^^^^^^^^^^ help: there is an associated function with a similar name: `get_req_send`

Compilation of your branch fails for me due to this line. I'm guessing you've added this in a local branch of `kuska`? If so, we should remove the `createLogStream` feature from this PR and only add that once I'm gotten the required functionality merged into `kuska`. ``` error[E0599]: no method named `log_req_send` found for struct `ApiCaller` in the current scope --> src/sbot.rs:106:34 | 106 | let req_id = self.client.log_req_send().await?; | ^^^^^^^^^^^^ help: there is an associated function with a similar name: `get_req_send` ```

notplants commented

2021-12-30 01:05:36 +00:00

yup thats what happened. I'll remove this from golgi for now, was really just using it as a sanity check.

notplants marked this conversation as resolved

src/utils.rs

						
				@ -26,2 +26,4 @@

				/// * `body` - An array of u8 to be parsed.

				pub fn string_res_parse(body: &[u8]) -> Result<String, GolgiError> {

				    // TODO: cleanup with proper error handling etc.

				    Ok(std::str::from_utf8(body).unwrap().to_string())

glyph commented

2021-12-29 17:57:08 +00:00

Don't forget to add an error variant to handle this properly ;)

👍 1

notplants marked this conversation as resolved

src/utils.rs

						
				@ -54,0 +104,4 @@

				/// * `f` - A function which takes in an array of u8 and returns a Result<T, GolgiError>.

				/// This is a function which parses the response from the RpcReader. T is a generic type,

				/// so this parse function can return multiple possible types (String, json, custom struct etc.)

				pub async fn get_async_until_eof<'a, R, T, F>(

glyph commented

2021-12-29 17:59:03 +00:00

Since we're dealing with a stream here and not an async RPC call we should rather name this function get_source_until_eof.

Then we're sticking with the naming convention used by go-sbot which differentiates between async calls and source (aka stream) calls.

Since we're dealing with a stream here and not an async RPC call we should rather name this function `get_source_until_eof`. Then we're sticking with the naming convention used by `go-sbot` which differentiates between async calls and source (aka stream) calls.

notplants marked this conversation as resolved

notplants added 1 commit 2021-12-30 01:10:29 +00:00

a7041e31cc Response to code review

notplants commented

2021-12-30 01:11:36 +00:00

^ incorporated the code review in that last commit.

Only change I didn't include was removing ContentTypeDecode and get_message_type, which are changed in the next PR.

^ incorporated the code review in that last commit. Only change I didn't include was removing ContentTypeDecode and get_message_type, which are changed in the next PR.

👍 1

glyph commented

2021-12-30 10:43:58 +00:00

@notplants

Thanks for the changes!

The last requested change is to remove the unused error variant and associated code (ContentTypeDecode(String)).

@notplants Thanks for the changes! The last requested change is to remove the unused error variant and associated code (`ContentTypeDecode(String)`).

notplants added 1 commit 2021-12-30 12:45:50 +00:00

028e3a47b5 Remove ContentTypeDecode error variant

notplants commented

2021-12-30 12:49:34 +00:00

Ok I've removed the error variant.

Perhaps making small PRs is a bit awkward too, if the PRs are connected and there are multiple parts interacting. Will attempt to split PRs into meaningful coherent chunks as best I can, but sometimes its a bit of a shifting target.

Ok I've removed the error variant. Perhaps making small PRs is a bit awkward too, if the PRs are connected and there are multiple parts interacting. Will attempt to split PRs into meaningful coherent chunks as best I can, but sometimes its a bit of a shifting target.

glyph commented

2021-12-30 13:42:26 +00:00

@notplants

Thanks for the removal.

Yeah I totally understand how PR's can balloon, especially in the early phases of a project where you're operating in discovery mode.

Speaking from the perspective of a PR reviewer: the smaller, the better. It's really challenging for me to hold a coherent mental model of the changes when the PR touches many files and adds a lot of new functionality. One new feature per PR might be a nice goal to aim for (with the understanding that this isn't always possible or desirable).

This short guide on authoring PR's has been helpful for me in the past.

You're clear to merge when you're ready :) 🟢

@notplants Thanks for the removal. Yeah I totally understand how PR's can balloon, especially in the early phases of a project where you're operating in discovery mode. Speaking from the perspective of a PR reviewer: the smaller, the better. It's really challenging for me to hold a coherent mental model of the changes when the PR touches many files and adds a lot of new functionality. One new feature per PR might be a nice goal to aim for (with the understanding that this isn't always possible or desirable). This [short guide on authoring PR's](https://www.atlassian.com/blog/git/written-unwritten-guide-pull-requests) has been helpful for me in the past. You're clear to merge when you're ready :) 🟢

👍 1

notplants merged commit 0fbd6f0931 into main

2021-12-30 18:01:52 +00:00

notplants referenced this issue from a commit

2021-12-30 18:01:52 +00:00

Merge pull request 'Add SsbKVT, SsbMessageValue and SsbMessageContent Types' (#6) from kvt into main

notplants deleted branch kvt

2021-12-30 18:01:53 +00:00

Sign in to join this conversation.

No reviewers