From: Dan Williams <dan.j.williams@intel.com>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Christopher Lameter <cl@linux.com>,
    linux-rdma <linux-rdma@vger.kernel.org>,
    Linux MM <linux-mm@kvack.org>, Michal Hocko <mhocko@kernel.org>
Subject: Re: [LSFMM] RDMA data corruption potential during FS writeback
Date: Fri, 18 May 2018 13:23:50 -0700
Message-ID: <CAPcyv4i_W94iXCyOd8gSSU6kWscncz5KUqnuzZ_RdVW9UT2U3w@mail.gmail.com>
In-Reply-To: <20180518173637.GF15611@ziepe.ca>

On Fri, May 18, 2018 at 10:36 AM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> On Fri, May 18, 2018 at 04:47:48PM +0000, Christopher Lameter wrote:
>> On Fri, 18 May 2018, Jason Gunthorpe wrote:
>>
>> > > The solution that was proposed at the meeting was that mmu notifiers can
>> > > remedy that situation by allowing callbacks to the RDMA device to ensure
>> > > that the RDMA device and the filesystem do not do concurrent writeback.
>> >
>> > This keeps coming up, and I understand why it seems appealing from the
>> > MM side, but the reality is that very little RDMA hardware supports
>> > this, and it carries with it a fairly big performance penalty so many
>> > users don't like using it.
>>
>> Ok so we have a latent data corruption issue that is not being addressed.
>>
>> > > But could we do more to prevent issues here? I think what may be useful is
>> > > to not allow the memory registrations of file-backed writable mappings
>> > > unless the device driver provides mmu callbacks or something like that.
>> >
>> > Why does every proposed solution to this involve crippling RDMA? Are
>> > there really no ideas to allow the FS side to accommodate
>> > this use case??
>>
>> The newcomer here is RDMA. The FS side is the mainstream use case and has
>> been there since Unix learned to do paging.
>
> Well, it has been this way for 12 years, so it isn't that new.
>
> Honestly it sounds like get_user_pages is just a broken Linux
> API??
>
> Nothing can use it to write to pages because the FS could explode -
> RDMA makes it particularly easy to trigger this due to the longer time
> windows, but presumably any get_user_pages could generate a race and
> hit this? Is that right?
>
> I am left with the impression that solving it in the FS is too
> performance costly so FS doesn't want that overhead? Was that also
> the conclusion?
>
> Could we take another crack at this during Linux Plumbers? Will the MM
> parties be there too? I'm sorry I wasn't able to attend LSFMM this
> year!

Yes, you and hch were missed, and I had to skip the last day due to a
family emergency.

Plumbers sounds good to resync on this topic, but we already have a
plan: use "break_layouts()" to coordinate a filesystem's need to move
dax blocks around relative to an active RDMA memory registration. If
you never punch a hole in the middle of your RDMA registration then
you never incur any performance penalty. Otherwise the layout break
notification is just there to tell the application "hey man, talk to
your friend that punched a hole in the middle of your mapping, but the
filesystem wants this block back now. Sorry, I'm kicking you out. Ok,
bye.".

In other words, get_user_pages_longterm() is just a short-term
band-aid for RDMA until we can get that infrastructure built. We don't
need to go down any mmu-notifier rabbit holes.
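
To make the layout-break idea above concrete, here is a minimal userspace
sketch of the control flow: registrations cover block ranges, and a hole
punch only notifies (and revokes) the registrations it overlaps. Every
name here (toy_registration, toy_register, toy_break_layouts,
toy_punch_hole, toy_notify) is hypothetical and made up for
illustration; this is a toy model under those assumptions, not the
kernel's break_layouts() implementation or any RDMA API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Toy model only: none of these names are kernel or RDMA APIs. */
struct toy_registration {
    size_t start;   /* first file block covered (inclusive) */
    size_t end;     /* end of covered range (exclusive) */
    bool active;    /* false once the layout has been broken */
    void (*on_break)(struct toy_registration *reg); /* notification hook */
};

#define TOY_MAX_REGS 8
static struct toy_registration *toy_regs[TOY_MAX_REGS];
static size_t toy_nr_regs;

/* Record a long-lived registration over a block range. */
static bool toy_register(struct toy_registration *reg)
{
    assert(reg->start < reg->end);
    if (toy_nr_regs == TOY_MAX_REGS)
        return false;
    reg->active = true;
    toy_regs[toy_nr_regs++] = reg;
    return true;
}

/*
 * Notify and revoke every active registration overlapping [start, end).
 * Returns how many registrations were broken. Registrations that do not
 * overlap the range are never touched, so they pay no cost.
 */
static int toy_break_layouts(size_t start, size_t end)
{
    int broken = 0;

    for (size_t i = 0; i < toy_nr_regs; i++) {
        struct toy_registration *reg = toy_regs[i];
        bool overlaps = reg->start < end && start < reg->end;

        if (!reg->active || !overlaps)
            continue;
        if (reg->on_break)
            reg->on_break(reg);   /* "the filesystem wants this block back" */
        reg->active = false;      /* "I'm kicking you out" */
        broken++;
    }
    return broken;
}

/* The filesystem side: break conflicting layouts before freeing blocks. */
static int toy_punch_hole(size_t start, size_t end)
{
    assert(start < end);
    return toy_break_layouts(start, end);
}

/* Example notification hook: just report which range was revoked. */
static void toy_notify(struct toy_registration *reg)
{
    printf("layout broken for blocks [%zu, %zu)\n", reg->start, reg->end);
}
```

In this model, a registration over blocks [0, 16) is left alone by a
hole punched at [16, 32), while a hole at [8, 12) fires its callback and
marks it inactive. The only work happens on the hole-punch path, which
is the point of the "no penalty unless you punch a hole" argument.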