From: Dmitry Vyukov <dvyukov@google.com>
To: Eric Wong <e@80x24.org>
Cc: workflows@vger.kernel.org,
Konstantin Ryabitsev <konstantin@linuxfoundation.org>,
Steven Rostedt <rostedt@goodmis.org>,
Thomas Gleixner <tglx@linutronix.de>,
"Theodore Ts'o" <tytso@mit.edu>,
David Miller <davem@davemloft.net>
Subject: Re: Fwd: SSB protocol thoughts
Date: Fri, 11 Oct 2019 08:20:22 +0200
Message-ID: <CACT4Y+YU78dQUeFob7NXaOU-gjnKHtxpceQj2c4=2aBV0_PSxg@mail.gmail.com>
In-Reply-To: <20191010204335.GB5440@dcvr>
On Thu, Oct 10, 2019 at 10:43 PM Eric Wong <e@80x24.org> wrote:
>
> Dmitry Vyukov <dvyukov@google.com> wrote:
> > Hi,
> >
> > I've spent some time reading about the SSB protocol, wrote a toy prototype
> > and played with it. I tried a binary protocol and a simpler
> > connection auth/encryption scheme. Here are some thoughts on the
> > protocol, how we can adopt it, and the associated problems. Some of the
> > problems we don't need to solve right now, but resolving others seems
> > to be required to get anything working. I am probably also
> > overthinking some aspects; I would appreciate it if you could stop me from
> > doing this :)
>
> <snip>
>
> > 4. DoS protection.
> > Given the completely distributed, p2p nature of
> > everything, it becomes very easy to DoS the system with new users (one
> > just needs to generate a key pair), with lots of messages from a
> > single user, or both. And then these messages will be synced to
> > everybody. Eventually we will need some protection from DoS. This is
> > a problem for email too, but it's harder to create trusted email
> > accounts, and email servers have some DoS/spam protections. If we move
> > away from email, it will become our responsibility.
>
> Right, every p2p or federated messaging system will have the
> same problems email has with spam, flooding and/or eventual
> centralization if it becomes popular.
>
> There can't be a forced migration on anybody. Using git isn't
> even a requirement for kernel development, after all.
>
> Instead of introducing a new system with the same problems as
> the old one, I still believe we can improve on the old one...
>
> > 5. Corrupted feeds.
> > Some feeds may become corrupted (intentionally or not). Doing it
> > intentionally is actually trivial: if you are at message sequence 10, you
> > push 2 different but correctly signed messages with sequence number 11 into
> > different parts of the p2p system. Then there is no way the whole
> > system will agree and recover on its own. Different parts
> > will keep pushing messages 11 and 11' to each other, each concluding
> > that the other one is invalid and rejecting it.
> > Konstantin also mentioned the possibility of injecting some illegal
> > content into the system, after which it becomes "poisoned".
> > The system needs to continue functioning in the presence of corrupted feeds.
> > A potential solution: periodically scan major pubs, detect
> > inconsistencies and corrupted feeds, and publish a list of such feeds.
> > E.g. "feed X is bad after message 42: drop all messages after that,
> > don't accept new ones and don't spread them". This may also help recover
> > after a potential DoS.
> > However, this may have implications at the application level. Suppose you
> > reply to a comment X on a patch review, and later the message with comment
> > X is dropped from the system.
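The following is a minimal Python sketch of the fork detection and "bad after message N" list described in point 5. It assumes signatures have already been verified. It models messages as a hypothetical `Message(author, sequence, content)` record. Real SSB messages also carry a hash of the previous message, and this sketch ignores it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    # Hypothetical record; the signature is assumed to be verified already.
    author: str    # feed id (public key)
    sequence: int
    content: str


def find_forks(messages):
    """Return {author: lowest forked sequence} for feeds that signed two
    different messages with the same sequence number."""
    seen = {}
    forks = {}
    for m in messages:
        first = seen.setdefault((m.author, m.sequence), m)
        if first != m:
            forks[m.author] = min(forks.get(m.author, m.sequence), m.sequence)
    return forks


def apply_bad_feed_list(messages, bad_after):
    """Keep messages at or before each listed feed's cutoff; the same filter
    also rejects new messages that arrive later past the cutoff."""
    return [
        m for m in messages
        if m.sequence <= bad_after.get(m.author, float("inf"))
    ]


msgs = [
    Message("X", 10, "ok"),
    Message("X", 11, "eleven"),
    Message("X", 11, "eleven-prime"),  # same sequence, different content
    Message("Y", 1, "hello"),
]
forks = find_forks(msgs)                          # {"X": 11}
bad_after = {a: s - 1 for a, s in forks.items()}  # "feed X is bad after message 10"
kept = apply_bad_feed_list(msgs, bad_after)       # X:10 and Y:1 survive
```

Publishing the cutoff rather than the fork point keeps the last known-good message. Whether 11 or 11' should ever be trusted is a policy question that the sketch leaves open.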
>
> Yup.
>
> > If we get to this point, then it seems to me we already have an email
> > replacement that is easier to setup, does not depend on any
> > centralized providers, properly authenticated and with strong user
> > identities.
>
> I'm not sure we can easily get past points 4., 5., 8. and 14.
>
> > Some additional points related to the transport layer:
> >
> > 6. I would consider compressing everything on the wire and on disk
> > with gzip/brotli.
> > I don't see any mention of compression in SSB layers, but it looks
> > very reasonable to me. Why not? At least something like brotli level 3
> > sounds like a pure win; we will have lots of text.
>
> Not sure about brotli, aside from the fact that it's less
> popular and available than zlib, adding to installation
> overhead.
>
> Does SSB hold persistent connections?
>
> Per-connection zlib contexts have a huge memory overhead.
>
> I got around it for NNTP COMPRESS by sharing the zlib context,
> saving a lot of RAM (at the cost of less-efficient compression):
> https://public-inbox.org/meta/20190705225339.5698-5-e@80x24.org/#Z30lib:PublicInbox:NNTPdeflate.pm
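Below is one way a single compressor could be shared across connections, sketched with Python's standard `zlib` module. This is an assumption about the technique, not necessarily what NNTPdeflate.pm does. Every write ends with `Z_FULL_FLUSH`, which byte-aligns the output and resets the match history. As a result, each client's chunks decode with that client's own decompressor and never refer to data sent to someone else. In this sketch, losing history at every write boundary is the price paid for sharing. For scale, zlib documents deflate memory use as (1 << (windowBits+2)) + (1 << (memLevel+9)) bytes, which is about 256 KiB per stream with the default settings. `SharedDeflate` is a hypothetical name.

```python
import zlib


class SharedDeflate:
    """Hypothetical helper: one raw-deflate compressor shared by all clients."""

    def __init__(self, level=6):
        # wbits=-15 selects a raw deflate stream (no zlib header or checksum).
        self._z = zlib.compressobj(level, zlib.DEFLATED, -15)

    def compress_for_client(self, data: bytes) -> bytes:
        # Z_FULL_FLUSH emits all pending output, byte-aligns it and resets the
        # match history, so this chunk is decodable without earlier chunks.
        return self._z.compress(data) + self._z.flush(zlib.Z_FULL_FLUSH)


shared = SharedDeflate()
a1 = shared.compress_for_client(b"article for A\r\n" * 20)
b1 = shared.compress_for_client(b"article for B\r\n" * 20)
a2 = shared.compress_for_client(b"more for A\r\n")

# Each client keeps its own inflater and only ever sees its own chunks.
client_a = zlib.decompressobj(-15)
client_b = zlib.decompressobj(-15)
got_a = client_a.decompress(a1) + client_a.decompress(a2)
got_b = client_b.decompress(b1)
```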
>
> <snip>
>
> > 8. Somebody mentioned the possibility of partial syncs (if the total
> > amount of data becomes too large, one may not want to download
> > everything for a new replica).
> > I hope we can postpone this problem until we actually have it.
> > Hopefully it's solvable retroactively. For now I would say:
> > everybody fetches everything; in the end, everybody fetches multiple
> > git repos in their entirety (shallow checkouts are not too useful).
>
> Right, this is a problem with git transports, too.
>
> Client tools for NNTP->(Maildir|POP3) and HTTP search->mboxrd.gz
> results can get around that for email so users can only download
> what they want.
>
> NNTP->POP3 would be an excellent way for kernel.org to get
> around delivery problems to big mail services since
> they all offer POP3 importers :)
>
> <snip>
>
> > 14. Consistency.
> > Suppose there is a bug/issue and 2 users post conflicting status
> > updates concurrently. As these updates propagate through the system,
> > it's hard to achieve a consistent final state. At least I fail to see
> > how it's possible in a reasonable manner. As a result some peers may
> > permanently disagree on the status of the bug.
> > May also affect patch reviews, if one user marks a patch as "abandon"
> > and another sets some "active" state. Then a "local patchwork" may
> > show it as open/pending for one user and closed/inactive for another.
> > May be even worse for some global configuration/settings data,
> > disagreement/inconsistency on these may be problematic.
> > There is a related problem with permission revocations. Consider
> > a malicious pub that does not propagate a particular "permission
> > revocation" message. For the rest of the participants everything looks
> > legit: they still sync with the pub, get other messages, etc.; it's
> > just as if the peer did not publish the revocation message at all. As
> > a result, the revocation will not take effect for an arbitrarily long time.
> > These problems seem to be semi-inherent to a fully distributed system.
>
> Yep. Email has this problem with lost/blocked/bounced messages, too.
>
> > The only practical solution that I see is to ignore the problem and
> > rely on everybody eventually getting all messages, with messages taking
> > effect when you receive them and in that order, and accept that some
> > inconsistencies are possible. However, it's a bit scary to
> > commit to the theoretical impossibility of any 100% consistent state in
> > the system...
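To make the divergence concrete, here is a minimal Python sketch. It assumes each peer simply applies status updates in the order it receives them, so the last update received wins. Two peers that receive the same two concurrent updates in opposite orders end up permanently disagreeing. The bug id and statuses are made up for illustration.

```python
def final_status(updates_in_receipt_order):
    """Apply (bug, status) updates in the order this peer received them."""
    status = {}
    for bug, new_status in updates_in_receipt_order:
        status[bug] = new_status  # the last update received wins
    return status


from_user_a = ("bug-1", "fixed")  # hypothetical concurrent updates
from_user_b = ("bug-1", "open")

peer_p = final_status([from_user_a, from_user_b])
peer_q = final_status([from_user_b, from_user_a])
# peer_p == {"bug-1": "open"}, peer_q == {"bug-1": "fixed"}, and no later
# message arrives that would make them converge.
```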
> > I see another potential solution, but it's actually half-centralized
> > and suffers from a SPOF. When a user issues a "bug update" message, it is
> > just a request to update the state; it's not yet committed. Then there is
> > a centralized server that acknowledges all such requests and assigns
> > them totally ordered sequence numbers (i.e. "I've received message X
> > first, so I assign it number 1", "then I received message Y and it
> > becomes number 2"). This ordering dictates the final globally
> > consistent state. This scheme can be used for any other state that
> > needs consistency, but it's a centralized server and a SPOF: if it's
> > down, the requests are still in the system, but they are not committed
> > and don't have sequence numbers assigned.
> > Obviously, all of this becomes infinitely simpler if we have a "forge"
> > solution...
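The sequencer scheme described above can be sketched the same way. The sketch assumes the server's acknowledgements reach every peer, modelled here as a shared dict from request id to sequence number. Peers apply only acknowledged requests, in sequence-number order, and hold the rest as pending. As a result, two peers that received the requests in different orders still compute the same state. `Sequencer` and `committed_state` are hypothetical names.

```python
class Sequencer:
    """Hypothetical central server that assigns a total order to requests."""

    def __init__(self):
        self.seq = {}  # request id -> assigned sequence number

    def acknowledge(self, request_id):
        # Idempotent: a request keeps the number it was first given.
        if request_id not in self.seq:
            self.seq[request_id] = len(self.seq) + 1
        return self.seq[request_id]


def committed_state(received, seq):
    """Apply acknowledged (id, bug, status) requests in sequence order.

    Requests without a sequence number yet (e.g. the server is down) are
    returned as pending and do not affect the state.
    """
    acked = sorted((r for r in received if r[0] in seq), key=lambda r: seq[r[0]])
    state = {}
    for _request_id, bug, status in acked:
        state[bug] = status
    pending = [r for r in received if r[0] not in seq]
    return state, pending


x = ("X", "bug-1", "fixed")
y = ("Y", "bug-1", "open")
z = ("Z", "bug-2", "fixed")  # never acknowledged: the server went down

server = Sequencer()
server.acknowledge("X")  # number 1
server.acknowledge("Y")  # number 2

peer_p = committed_state([x, y, z], server.seq)
peer_q = committed_state([z, y, x], server.seq)
# Both peers: ({"bug-1": "open"}, [z]) regardless of arrival order.
```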
> >
> > Kudos if you are still with me :)
>
> :>
>
> Anything about bridging with email?
Thanks for confirming some of my fears :)
I wrote a bit about email/github bridges here:
https://lore.kernel.org/workflows/d6e8f49e93ece6f208e806ece2aa85b4971f3d17.1569152718.git.dvyukov@google.com/
But mainly it just says that the bridges should be possible. Do you
foresee any potential problems with that?
Thread overview: 4+ messages
[not found] <CACT4Y+Y0_2rCnt3p69V2U2_F=t4nMOmAOL-RGwxSS-ufk41NAg@mail.gmail.com>
2019-10-10 17:39 ` Dmitry Vyukov
2019-10-10 20:43 ` Eric Wong
2019-10-11 6:20 ` Dmitry Vyukov [this message]
2019-10-13 23:19 ` Eric Wong