Solpunk Pub Failing To Connect To Patchwork #134
4 Participants
Reference: PeachCloud/peach-workspace#134
@abekonge.punk is running peachpub on yunohost on a raspberry pi 3 at solpunk (%/Xt6yV60kKuZHOk1jblAgY3yAAwqcSY9MYjaqjvug+s=.sha256)
the first couple of days it worked, and now it's running into some issues
from @abekonge.punk
I wonder if this might be the issue https://github.com/ssbc/go-ssb/issues/81
I would try running this on the pi:
the creation of the symbolic link above is necessary because currently there is no way to tell sbotcli where the .ssb-go folder is (https://github.com/ssbc/go-ssb/issues/151)
You can see all the commands you can run with sbotcli (a go cli tool packaged with go-ssb) via /var/www/peachpub/sbotcli -h (although they are not fully documented)
some other debugging commands to try:
systemctl status peachpub-go-sbot
… if peachpub-go-sbot is running, but peach-web says it can’t connect to the database
perhaps we can try the repair command:
cc @glyph , in case you have any additional ideas or just want to follow along ... unclear yet if the issue is from something in go-ssb, or peachpub or the yunohost packaging
this itself is not necessarily a bug, as this number is actually the number of messages authored by the pub (including follow messages etc.) … but maybe this could be communicated more clearly in the UI… and a number which showed the total number of messages from all users also sounds very useful…
Hey @notplants and @abekonge.punk, thanks for the ping.
Echoing the advice from @notplants, I would first check the status of the go-sbot:
systemctl status peachpub-go-sbot
Assuming there is an error which is preventing replication, I'd run the go-sbot with the repair flag enabled. I include the --repair flag as default whenever I run go-sbot (I do this in a systemd unit file, though I understand the YunoHost context is different); if there are no repairs to be done then it simply starts up as normal.
I wonder what the hops count is set to? It might help to set it to 1 initially so that the Pi doesn't have to work too hard. That being said, the hops count is not related to the replication / connection error - just adding that as an extra thought.
@notplants @glyph
Okay, just to check where we were before linking, I checked the folders, and there was no .ssb-go link in /root/
And /var/www/peachpub/sbotcli call whoami gave
After creating the link, /var/www/peachpub/sbotcli call whoami gave:
Tried restarting sbot from peachpub ui, db still down, whoami had same error.
Tried rebooting the pi, same results
Some observations of weirdness:
socket directory in the .ssb-go folder
.ssb-go folder has a .ssb-go folder inside it, with some of the same files:
contents of .ssb-go:
contents of .ssb-go/.ssb-go/:
systemctl status peachpub-go-sbot gives
The repair command does give an error, not sure if it is significant:
But after restart, systemctl status is all green
db is still unavailable though :(
Ok, so .ssb-go was owned by root, after fixing that, the repair went better:
Still has that error, and db is still offline
Thanks for looking into it. I tried setting hops to 1, repairing, and restarting, still no dice though.
@abekonge
Thanks for the additional info.
This is indeed very weird, especially the nested .ssb-go directory. @notplants is the expert when it comes to YunoHost configuration of PeachPub. @notplants, can you think of any reason why the nested directory might have been created?
I hesitate to recommend this without further investigation but it might be worth deleting the inner .ssb-go directory (so that only the top-level directory remains). But let's wait until @notplants has a chance to respond.
Other thoughts:
It's a promising sign that the go-sbot service is active and running. The problem seems to be the communication between the peach application and the go-sbot. The DATABASE UNAVAILABLE text appears if the application fails in an attempt to get the latest sequence number for the local feed from the go-sbot. It calls the following methods to do that:
c83a22461d/peach-web/src/utils/sbot.rs (L126)
So a failure to execute any of those sbot methods will result in DATABASE UNAVAILABLE.
I'm going to give this some more thought and try to come up with some debugging advice.
@abekonge
What do you see when you go to the Profile page of the PeachCloud UI?
@glyph Thanks for looking into it!
@abekonge
I have a hypothesis:
The nested .ssb-go directories mean that there are two secret files, each one with a different public-private keypair. The go-sbot is running with one keypair and PeachCloud is trying to connect with the other keypair. This could explain why you received an error when trying to run sbotcli whoami: the keypair being used to make the connection is not authorised to call that method.
There is further evidence in the logs you shared:
Notice the public key: @oY8aLV/yEKs75d6TknxlwkhfiThB9HlN5FQY7WWTzho=.ed25519
Later, when you ran the go-sbot -fsck sequences -repair command, a different public key appeared in the output:
saved identity @Uzcv1AqCgXB... to .ssb-go/secret
You can confirm this by comparing the contents of .ssb-go/secret and .ssb-go/.ssb-go/secret. Are they the same or different?
Important: make sure NOT to post the contents here! We don't want to reveal your private key(s) to the world.
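One way to make that comparison without ever printing a key is to let cmp decide and only report the verdict. This is a minimal sketch, not something from the PeachCloud tooling: the function name same_secret is made up for illustration, and the paths in the usage comment are the ones mentioned in this thread and may differ on other installs.

```shell
#!/bin/sh
# Hypothetical helper: report whether two secret files are byte-for-byte
# identical without printing the contents of either file.
same_secret() {
    outer=$1
    inner=$2
    if [ ! -f "$outer" ] || [ ! -f "$inner" ]; then
        echo "missing"
        return 2
    fi
    # cmp -s writes nothing; it only sets the exit status.
    if cmp -s "$outer" "$inner"; then
        echo "same"
    else
        echo "different"
        return 1
    fi
}

# Usage (assumed paths, run as a user that can read both files):
# same_secret /home/yunohost.app/peachpub/.ssb-go/secret \
#             /home/yunohost.app/peachpub/.ssb-go/.ssb-go/secret
```

Only the words same, different or missing reach the terminal, so the output is safe to paste into the thread.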
I messaged this to @abekonge on ssb, but switching to here, so its all in this thread.
yup I would echo what glyph said to compare .ssb-go/secret and .ssb-go/.ssb-go/secret and see which has the pub key of the pub. If the inner one is a different pub key than the pub, then I would delete the inner folder.
After that, I would confirm
... if not green, I would try
... and see what logs you get there by running go-sbot directly
... and running /var/www/peachpub/sbotcli call whoami in a separate terminal
... if peachpub-go-sbot is green, then I would repeat the test again (now that there are not two .ssb-go folders)
That whoami call is run as root, that's why we needed to make the symbolic link before. Currently sbotcli can't be told where .ssb-go is, and just looks in ~/.ssb-go, so we trick it with the symbolic link.
ln -s /home/yunohost.app/peachpub/.ssb-go /root/.ssb-go
... I would also confirm that the symbolic link goes to the outer .ssb-go and not to the inner (I'm not totally sure what could have created the inner)
... if peachpub-go-sbot is green, there is no inner .ssb-go folder, and whoami is still giving an error... we still have a mystery
ps: sorry we don't have more streamlined tools for log reporting and debugging, but thanks for the patience. also still up for hopping on a call if this gets too messy
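To confirm where the symbolic link actually points, and whether the directory behind it contains an inner .ssb-go, something like the following could be used. It is a rough sketch: check_repo_link is a hypothetical helper name, and /root/.ssb-go in the usage comment is simply the link location used earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the target of a symlink and whether the
# directory it leads to contains a nested .ssb-go directory.
check_repo_link() {
    link=$1
    if [ ! -L "$link" ]; then
        echo "not-a-symlink"
        return 1
    fi
    target=$(readlink "$link")
    echo "target: $target"
    # The -d test follows the symlink, so this looks inside the real repo.
    if [ -d "$link/.ssb-go" ]; then
        echo "nested: yes"
    else
        echo "nested: no"
    fi
}

# Usage (assumed path from this thread):
# check_repo_link /root/.ssb-go
```

If the target ends in .ssb-go/.ssb-go, the link goes to the inner folder and should be recreated against the outer one.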
@notplants They are different!
I'm not sure how to see which public key is the pub's? The ui can't see it itself, in the profile view at least .... and I do not yet have a mental picture of what the difference between go-sbot and peachcloud pub is .... I get it that the pub is a ssb-profile like any other ... and go-sbot is the ssb-server doing all the things, but where is the peachpub's ".ssb" folder or equivalent?
This is an interesting problem. PeachCloud usually queries the sbot to find out the key and then displays it in the web interface. In this case, since the communication between PeachCloud and the sbot is not working, it can't display the key 😅
I looked over on Patchwork and there is an identity named solpunk pub with the following public key: @oY8aLV/yEKs75d6TknxlwkhfiThB9HlN5FQY7WWTzho=.ed25519
One request I have: @abekonge, could you open the homepage of the PeachCloud interface and then go Settings -> Scuttlebutt -> Configure Sbot and tell me what the DATABASE DIRECTORY input contains?
EDIT: I figured it out, see here: #134 (comment) if you do not want to follow my process minutely :)
Of course, through patchwork! Ok the outer .ssb-go secret is the same as solpunk pub.
db dir is .ssb-go
This took quite a while, like 3 minutes, but afterwards it was green
this still gives the warn and error:
Okay, I tried repair and fsck again, and now I can't start it, and inner .ssb-go IS BACK.
running just go-sbot gives
I'll try to find out which thing makes the evil inner .ssb-go :)
Ok, deleting the new inner .ssb-go, and running
creates a new inner .ssb-go ?!?
no worries, I love bug-fixing with expert support. And async fits me well right now, with my family and a bit under the weather - but if we get stuck, sure lets make a call ...
I wonder if the creation is an artifact of me running
inside the /home/yunohost.app/peachpub/.ssb-go/ folder.
In the config.toml the repo is:
So maybe it looks for .ssb-go, and not finding one in the relative context, creates a new one ... I'll test it by running it from the parent folder /home/yunohost.app/peachpub/
Ok that worked, now go-sbot is crunching!
So it seems that what happened was that when trying to run the repair command from the .ssb-go folder, it looked for a .ssb-go subfolder, did not find one and went right ahead and created one :)
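The effect can be illustrated with ordinary relative-path resolution. The sketch below assumes go-sbot resolves a relative repo value against the directory it was started from, which matches what was observed here but has not been checked against the go-ssb source; resolve_repo is a hypothetical helper, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: show which directory a repo setting refers to
# when a process is started from a given working directory.
resolve_repo() {
    cwd=$1
    repo=$2
    case $repo in
        /*) printf '%s\n' "$repo" ;;            # absolute: cwd is irrelevant
        *)  printf '%s\n' "${cwd%/}/$repo" ;;   # relative: joined onto cwd
    esac
}

# Started from the parent folder, the relative value finds the real repo:
resolve_repo /home/yunohost.app/peachpub .ssb-go
# Started from inside the repo, the same value names an inner folder:
resolve_repo /home/yunohost.app/peachpub/.ssb-go .ssb-go
```

The second call yields a path ending in .ssb-go/.ssb-go, the nested directory that kept reappearing.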
Now I'll try to see if I can get it to run as system service and see how it goes.
ok after 30 minutes, my pipe was broken, and the go-sbot was stopped along with my ssh session.
now going back I can start the service, and get green status, but db is still unavailable and profile gives.
Stopping and trying the repair command gives a new error:
Ok, running go-sbot again, quitting, running the repair again, seemed to work
But db is still unavail. etc.
ok `systemctl start peachpub-go-sbot` creates an inner .ssb-go ...
running go-sbot after repair works, and whoami works:
Ok I tried changing the peachpub ui DATABASE DIRECTORY to the absolute path of /home/yunohost.app/peachpub/.ssb-go but that fails to start with systemctl
also relative paths like ./ and ../.ssb-go fail to start. Systemctl logs say
I like the exit code: exit code, hahaha
I feel like I'm missing something:
so go-sbot works, whoami works.
But when I start the service that the peachpub ui depends on, it creates a new inner .ssb-go which has another secret ... and things do not work .... not sure what to do next - do you have ideas?
ahahahahaha, it works!
Ok, so I found out that go-ssb would make a new .ssb-go in any location it was run, so I figured that the systemd settings would point to the wrong dir somehow, so I went to the systemd unit file:
/etc/systemd/system/peachpub-go-sbot.service
and changed the following
to
I also found the log files, which was helpful, because when I tried to start, it failed as it did earlier when I changed the path to the db to be absolute:
StandardOutput=append:/var/log/peachpub/peachpub-go-sbot.log
Now when starting and failing, it logged this:
chowning that, it now gave
chowning that as well, and it starts, and peachpub ui runs - yay!
Now as to how these files both got root:root, I have no idea.
And whether the working dir should be changed, I'm not sure.
Also, should go-ssb create a new database when it does not find one?
If there is something I can do to help this not be a problem in the future, I'll be willing to help - this was fun!
Okay, now I have hit the original problem maybe.
So after everything works, I open up patchwork on my machine, solpunk pub shows up as a connection, and it starts indexing, the little green circle on the solpunk pub connection fills out.
And then it disappears from patchwork, and the pi is now unresponsive, the green led is on, all the time. I left it overnight, and it was still green, and unresponsive, so had to restart the pi.
Afterwards, I checked that peachpub was up in the ui, and that worked, but not showing up in patchwork.
I stopped go-sbot, repaired the feed, started again manually, and it connected to patchwork and started the indexing and froze yet again.
The following is the output from go-sbot when going into freeze:
So somehow, when trying to sync with my patchwork, go-sbot dies or goes into an infinite loop or something?
Wow, thanks @abekonge, this is some A-grade detective work! I really appreciate you tackling the problem methodically.
Seems like there are a few issues at play. One is essentially a configuration issue: an incorrect WorkingDirectory path in the unit file and incorrect permissions for some of the files in .ssb-go.
The second issue, the one which is crashing the go-sbot during replication, appears to be related to memory usage. See this issue comment:
https://github.com/ssbc/go-ssb/issues/124#issuecomment-1078357758
I noticed the memtables error in one of the logs you posted:
go-sbot: failed to instantiate ssb server: while opening memtables error: while opening fid: 1 error: while updating skiplist error: mremap size mismatch: requested: 3007282 got: 134217728
It seems that @boreq has a fix for the issue but we'll have to investigate further.
My current thinking is that the files became root:root when the symlink was created and go-sbot was run as root. Any files created by the go-sbot would then have root:root permissions. So I think this particular issue was a side-effect of the debugging process.
I think @notplants should have some insight on this one. Based on your experience, it seems the working dir should indeed be changed.
This is the default behaviour and I don't think it's possible to disable it "from the outside". The best we can do is look out for this dual-directory issue in the future.
One thing which puzzles me: the device was originally working and replicating with Patchwork. I suspect the sbot then encountered the memtables error and crashed (no longer showing up in local connections of Patchwork). Then, somehow, things got all mixed up. I think part of this was the symlink + root execution, but I'm not sure why the working directory was initially fine but then later it was not?
Are you willing and able to share the logfile?
/var/log/peachpub/peachpub-go-sbot.log
Then I can look through the full sequence of events.
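Before uploading, a quick count of the known error strings can show whether the memory issue appears in the log at all. A minimal sketch: the two search strings are copied from the memtables log line quoted in this thread, while the helper names count_log_matches and summarise_sbot_log are invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count the lines of a log file that contain a
# fixed string (no regular-expression interpretation).
count_log_matches() {
    logfile=$1
    pattern=$2
    grep -cF -- "$pattern" "$logfile"
}

# Hypothetical helper: summarise how often the memory-related errors
# seen in this thread occur in a go-sbot log.
summarise_sbot_log() {
    logfile=$1
    echo "memtables: $(count_log_matches "$logfile" 'while opening memtables')"
    echo "mremap: $(count_log_matches "$logfile" 'mremap size mismatch')"
}

# Usage (log path from the unit file mentioned in this thread):
# summarise_sbot_log /var/log/peachpub/peachpub-go-sbot.log
```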
Sure, how do you want it?
You could upload it here if it's not too large?
here you go
@abekonge cool to read, was a suspenseful report. I was right there with you with your excitement when you figured it out and got it working again
I reach the same conclusions as @glyph.
two issues, the original issue, which seems to be an issue with go-ssb, and we might not be able to fix without some help from our go friends.
the second issue, which was a side-effect of the debugging process. I'm not totally sure how the WorkingDirectory + database path configuration could have been originally working, and then switched to something not working.
in my peachpub running on yunohost on a digital ocean droplet,
/home/yunohost.app/peachpub/.ssb-go/config.toml
has an absolute path to the .ssb-go folder
repo = "/home/yunohost.app/peachpub/.ssb-go"
this absolute path is also what's initially configured in the yunohost package install (https://github.com/YunoHost-Apps/peachpub_ynh/blob/master/conf/config.toml),
so my best guess right now is that when you changed the hops to 1, it also inadvertently changed the repo= to be a relative path of just ".ssb-go", which didn't work with the rest of the configuration settings, and thus required that WorkingDirectory change to accommodate
I will try to confirm this, and possibly see if the go-sbot configuration page might not be reading the already set values and supplying its own default, or something like that
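A quick way to see which form the repo value currently has is to read it straight out of config.toml. This sketch assumes the file contains a line of the form repo = "..." as quoted above; it is not a TOML parser, and repo_setting and repo_is_absolute are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the first repo = "..." line
# in a config file. Assumes the simple one-line form shown in this thread.
repo_setting() {
    config=$1
    sed -n 's/^[[:space:]]*repo[[:space:]]*=[[:space:]]*"\(.*\)".*$/\1/p' "$config" | head -n 1
}

# Hypothetical helper: succeed only if the repo value is an absolute path.
repo_is_absolute() {
    case $(repo_setting "$1") in
        /*) return 0 ;;
        *)  return 1 ;;
    esac
}

# Usage (assumed path from this thread):
# repo_setting /home/yunohost.app/peachpub/.ssb-go/config.toml
# repo_is_absolute /home/yunohost.app/peachpub/.ssb-go/config.toml || echo "relative repo path"
```

Running it before and after saving the sbot configuration page would show whether saving rewrites the value.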
and then the root:root issue that is just that while debugging we ran as root. so after the debugging process, we always need to run chown -R peachpub:peachpub of relevant directories (and this is useful to remember for any future debugging)
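To spot leftovers from running as root before reaching for chown, find can list everything under a directory that is not owned by the expected user. A small sketch; not_owned_by is a hypothetical helper, and which directories count as relevant is an assumption (the repo and log locations named in this thread are used as examples).

```shell
#!/bin/sh
# Hypothetical helper: list every file or directory under a path that
# is NOT owned by the given user. Empty output means ownership is consistent.
not_owned_by() {
    dir=$1
    user=$2
    find "$dir" ! -user "$user" -print
}

# Usage (assumed locations and user from this thread):
# not_owned_by /home/yunohost.app/peachpub/.ssb-go peachpub
# not_owned_by /var/log/peachpub peachpub
```

Anything it prints is a candidate for the chown -R mentioned above.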
so in good news, assuming my guess is correct, through the debugging process we uncovered one specific bug we can fix in peachpub
but the root issue may not be immediately solvable. still really appreciate your trying it out and such detailed debugging. this is also another useful datapoint for plan making
maybe beyond the scope of this thread, but the possible paths I imagine,
... * I guess because you were seeing the inner .ssb-go before changing the hops, that might not have been when the value of repo was changed to be relative ... but maybe at some point that value was changed to be relative for some reason ...
@notplants
Thanks for your assistance with this. I certainly learned a lot from this exercise and it's great that we have a couple of targeted areas to improve on when it comes to PeachCloud.
Good thinking! I hadn't thought of that.
Do you know when you (or I) compiled the version of go-sbot that is being used in the YunoHost application? I notice that @cryptix merged a change to go-sbot from @boreq on 12 April which may improve the memory situation slightly (PR link).
Would love to have a call dedicated to exploring this together. I'm almost finished writing the lykin tutorial series and then I'll have time to work on PeachCloud again.
Cool, happy that this exercise had some value :)
I noticed that the other files had peachpub:www-data, is that significant?
And do you have ideas to how I can get this pub working again?
I'm up for resetting/reinstalling, if that is what it takes...
@abekonge, I think either peachpub:www-data or peachpub:peachpub would work as ownership for the files. go-sbot and peach-web both run as the peachpub user in the yunohost configuration, so it's not so important (I believe)
to get the pub working again,
one could uninstall peachpub via yunohost and then reinstall it -- I think this will work, and just give you a fresh install (with a new public key)
however, based on my understanding of this bug, unfortunately I guess that it will likely run into the same error condition at some point again
could reinstall and try again if you have the spoons and see if it reaches the same error condition... but may need to wait for an update to the underlying sbot for a real/stable working fix ... once the update comes though, I should be able to publish a new package to yunohost with the new binary, and then you should be able to simply update peachpub through the admin UI to get the new working code ... I will keep you updated... but from my perspective I can't say if that could be a few weeks, or a few months...
I will also look into what glyph suggested, and try to confirm the current peachpub package includes the latest go-sbot code at the very least
Epic reporting and back & forth debugging folks. I'm in awe.
Asking in golang ssb general (https://matrix.to/#/#golang-ssb-general:autonomic.zone) about builds for scuttlego.
Interesting idea...
Summary of scuttlego pub possibilities right now is that there is no CLI code in scuttlego that allows us to run it as a pub. boreq is saying there would need to be some things changed inside the lib to make that possible and it is not clear how to do it right now. I've asked to be told when things might change or if it is possible for me to take a stab at writing one... more in the matrix chat as it comes in i guess...
nice recon, seems like a promising direction ~ https://github.com/planetary-social/scuttlego/issues/54
Ok I'll try resetting. A question: the peachpub @notplants run, you do not run into this issue?
ASIDE: Hanging as hardware-installation protocol is teh best
@abekonge the peachpub I've been running is on a digital ocean droplet, and it hasn't run into this issue, probably because it's more powerful hardware
my peachpub on digital ocean droplet still has the issue of not connecting with manyverse, but not this issue you're seeing, and has otherwise been working
I had only tested running on a raspberry pi before with test accounts, and not a full amount of data, and naively assumed it was working, so I hadn't run into this issue myself yet
but I'm hopeful one way or another we'll be able to get a stable version at some point, and then share the update... so this may be currently more of a taster/speculative-motivator, than the real thing