From: Matthew Wilcox <willy@infradead.org>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>,
John Hubbard <jhubbard@nvidia.com>,
Christopher Lameter <cl@linux.com>,
linux-rdma <linux-rdma@vger.kernel.org>,
Linux MM <linux-mm@kvack.org>, Michal Hocko <mhocko@kernel.org>
Subject: Re: [LSFMM] RDMA data corruption potential during FS writeback
Date: Mon, 21 May 2018 07:38:30 -0700
Message-ID: <20180521143830.GA25109@bombadil.infradead.org>
In-Reply-To: <CAPcyv4iGmUg108O-s1h6_YxmjQgMcV_pFpciObHh3zJkTOKfKA@mail.gmail.com>

On Fri, May 18, 2018 at 08:51:38PM -0700, Dan Williams wrote:
> >> +1, and I am now super-interested in this conversation, because
> >> after tracking down a kernel BUG to this classic mistaken pattern:
> >>
> >> get_user_pages (on file-backed memory from ext4)
> >> ...do some DMA
> >> set_page_dirty
> >> put_page(s)
> >
> > Ummm, RDMA has done essentially that since 2005; since when did it
> > become wrong? Do you have some references? Is there some alternative?
> >
> > See __ib_umem_release
> >
> >> ...there is (rarely!) a backtrace from ext4, that disavows ownership of
> >> any such pages.
> >
> > Yes, I've seen that oops with RDMA; apparently it isn't actually that
> > rare if you tweak things just right.
> >
> > I thought it was an obscure ext4 bug :(
> >
> >> Because the obvious "fix" in device driver land is to use a dedicated
> >> buffer for DMA, and copy to the filesystem buffer, and of course I will
> >> get *killed* if I propose such a performance-killing approach. But a
> >> core kernel fix really is starting to sound attractive.
> >
> > Yeah, killed is right. That idea totally cripples RDMA.
> >
> > What is the point of get_user_pages FOLL_WRITE if you can't write to
> > and dirty the pages!?!
>
> You're oversimplifying the problem; here are the details:
>
> https://www.spinics.net/lists/linux-mm/msg142700.html
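A minimal userspace sketch of why the quoted get_user_pages / DMA / set_page_dirty / put_page sequence can go wrong. It assumes a simplified race (not taken verbatim from the linked description): while the page is pinned, writeback cleans it and the filesystem later releases its per-page private state, so the driver's final set_page_dirty() dirties a page the filesystem no longer expects to be dirty. All toy_* names and fields are hypothetical; no kernel API is used.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model of one file-backed page.
 * fs_private stands in for per-page filesystem state (for example
 * buffer heads) that the filesystem expects to exist on any dirty page. */
struct toy_page {
    bool dirty;
    bool fs_private;
    int  pins;              /* references held via get_user_pages() */
};

/* Step 1: get_user_pages() takes a reference; the page stays in the cache. */
static void toy_get_user_pages(struct toy_page *p)
{
    p->pins++;
}

/* Meanwhile (assumed race): writeback cleans the page and the filesystem
 * later drops its private state, unaware a device will still write to it. */
static void toy_writeback_and_release(struct toy_page *p)
{
    p->dirty = false;
    p->fs_private = false;
}

/* Step 2: the device DMAs into the page; nothing on the CPU side notices. */
static void toy_dma_write(struct toy_page *p)
{
    (void)p;
}

/* Step 3: the driver dirties the page. Returns false when the page is
 * dirtied without the filesystem state it expects. */
static bool toy_set_page_dirty(struct toy_page *p)
{
    p->dirty = true;
    return p->fs_private;
}

/* Step 4: drop the reference taken in step 1. */
static void toy_put_page(struct toy_page *p)
{
    assert(p->pins > 0);
    p->pins--;
}

/* Runs the pattern with writeback racing in between steps 1 and 3.
 * Returns true only if the final dirtying matched filesystem state. */
static bool toy_classic_pattern(void)
{
    struct toy_page page = { .dirty = true, .fs_private = true, .pins = 0 };

    toy_get_user_pages(&page);
    toy_writeback_and_release(&page);
    toy_dma_write(&page);
    bool consistent = toy_set_page_dirty(&page);
    toy_put_page(&page);
    return consistent;
}
```

In this model toy_classic_pattern() returns false: holding a page reference alone does not stop the filesystem from treating the page as clean in the meantime.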

Suggestion 1:
In get_user_pages_fast(), mark the page as dirty, but don't tag the radix
tree entry as dirty. Then vmscan() won't find it when it's looking to
write out dirty pages. Only mark it as dirty in the radix tree once we
call set_page_dirty_lock().
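A toy model of Suggestion 1, assuming that the scan for pages to write out looks only at entries carrying the dirty tag (standing in for the radix tree's dirty tag) and not at the per-page dirty flag. The fixed-size array, the toy_* functions, and the scan itself are hypothetical illustrations, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NSLOTS 8

/* PageDirty stand-in. */
struct toy_page { bool dirty; };

/* Toy page cache: slots plus a per-slot dirty tag, standing in for the
 * radix tree and its dirty tag. */
struct toy_cache {
    struct toy_page *slot[TOY_NSLOTS];
    bool dirty_tag[TOY_NSLOTS];
};

/* Suggestion 1, gup side: mark the page dirty but leave the tag clear. */
static struct toy_page *toy_gup_fast(struct toy_cache *c, size_t idx)
{
    struct toy_page *p = c->slot[idx];
    if (p)
        p->dirty = true;
    return p;
}

/* Suggestion 1, completion side: only now is the entry tagged dirty. */
static void toy_set_page_dirty_lock(struct toy_cache *c, size_t idx)
{
    assert(c->slot[idx] != NULL);
    c->slot[idx]->dirty = true;
    c->dirty_tag[idx] = true;
}

/* The writeout scan visits tagged entries only, so a page that is still
 * pinned for I/O is skipped. Returns the number of pages "written". */
static int toy_writeback_scan(struct toy_cache *c)
{
    int written = 0;
    for (size_t i = 0; i < TOY_NSLOTS; i++) {
        if (!c->dirty_tag[i] || !c->slot[i])
            continue;
        c->slot[i]->dirty = false;
        c->dirty_tag[i] = false;
        written++;
    }
    return written;
}
```

Usage: after toy_gup_fast() the page is dirty but toy_writeback_scan() writes nothing; once toy_set_page_dirty_lock() tags the entry, the next scan writes and cleans it.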

Suggestion 2:
In get_user_pages_fast(), replace the page in the radix tree with a special
entry that means "page under io". In set_page_dirty_lock(), replace the
"page under io" entry with the struct page pointer.
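A toy model of Suggestion 2, assuming a sentinel pointer can play the role of the special "page under io" entry and that every other page-cache user treats that sentinel as "page not available". The sentinel object and all toy_* names are hypothetical; the real radix tree would encode such an entry differently.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NSLOTS 8

struct toy_page { int id; };

/* Hypothetical sentinel meaning "page under io": any address that can
 * never be a real struct toy_page works for this model. */
static struct toy_page toy_under_io_marker;
#define TOY_UNDER_IO (&toy_under_io_marker)

struct toy_cache { struct toy_page *slot[TOY_NSLOTS]; };

/* Suggestion 2, gup side: hand the page to the caller and park the
 * sentinel in its slot. Returns NULL if the slot is empty or busy. */
static struct toy_page *toy_gup_fast(struct toy_cache *c, size_t idx)
{
    struct toy_page *p = c->slot[idx];
    if (p == NULL || p == TOY_UNDER_IO)
        return NULL;
    c->slot[idx] = TOY_UNDER_IO;
    return p;
}

/* Lookup used by everyone else (writeback, reclaim, read paths):
 * the sentinel hides the page while I/O is outstanding. */
static struct toy_page *toy_find_page(struct toy_cache *c, size_t idx)
{
    struct toy_page *p = c->slot[idx];
    return p == TOY_UNDER_IO ? NULL : p;
}

/* Suggestion 2, completion side: put the real page pointer back. */
static void toy_set_page_dirty_lock(struct toy_cache *c, size_t idx,
                                    struct toy_page *p)
{
    assert(c->slot[idx] == TOY_UNDER_IO);
    c->slot[idx] = p;
}
```

While the sentinel is installed, toy_find_page() returns NULL, so writeback in this model cannot reach the page. A second toy_gup_fast() on the same slot also fails until completion.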

Both of these suggestions have trouble with simultaneous sub-page IOs to the
same page. Do we care? I suspect we might, as pages get larger (see also:
supporting THP pages in the page cache).
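A sketch of the sub-page problem under Suggestion 1, assuming two I/Os target different halves of the same page: the first completion tags the entry dirty, so writeback can clean the page while the second I/O is still running. The inflight counter exists only to detect that situation in this hypothetical model; it is not part of either suggestion. (Under Suggestion 2 the analogous issue is presumably that the second get_user_pages_fast() finds the sentinel rather than the page.)

```c
#include <assert.h>
#include <stdbool.h>

/* One page under Suggestion 1: PageDirty flag plus the radix-tree dirty tag. */
struct toy_slot {
    bool page_dirty;
    bool dirty_tag;
    int  inflight;   /* outstanding sub-page I/Os, tracked only to expose the bug */
};

static void toy_gup_fast(struct toy_slot *s)
{
    s->page_dirty = true;      /* flag set, tag left clear */
    s->inflight++;
}

static void toy_set_page_dirty_lock(struct toy_slot *s)
{
    s->dirty_tag = true;       /* any completion tags the whole page */
    s->inflight--;
}

/* Returns true if writeback cleaned the page while some I/O was still
 * in flight, i.e. the situation Suggestion 1 was meant to prevent. */
static bool toy_writeback(struct toy_slot *s)
{
    if (!s->dirty_tag)
        return false;
    s->page_dirty = false;
    s->dirty_tag = false;
    return s->inflight > 0;
}

static bool toy_two_subpage_ios(void)
{
    struct toy_slot s = { false, false, 0 };

    toy_gup_fast(&s);              /* I/O A: first half of the page */
    toy_gup_fast(&s);              /* I/O B: second half of the page */
    toy_set_page_dirty_lock(&s);   /* I/O A completes */
    return toy_writeback(&s);      /* I/O B is still outstanding */
}
```

Here toy_two_subpage_ios() returns true: a single per-page tag cannot tell "all I/O done" from "some I/O done", and larger pages make overlapping sub-page I/Os more likely.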