From: Hillf Danton <hdanton@sina.com>
To: John Hubbard <jhubbard@nvidia.com>
Cc: Hillf Danton <hdanton@sina.com>, linux-mm <linux-mm@kvack.org>,
Andrew Morton <akpm@linux-foundation.org>,
linux-kernel <linux-kernel@vger.kernel.org>,
Vlastimil Babka <vbabka@suse.cz>, Jan Kara <jack@suse.cz>,
Mel Gorman <mgorman@suse.de>, Jerome Glisse <jglisse@redhat.com>,
Dan Williams <dan.j.williams@intel.com>,
Ira Weiny <ira.weiny@intel.com>, Christoph Hellwig <hch@lst.de>,
Jonathan Corbet <corbet@lwn.net>
Subject: Re: [RFC] mm: gup: add helper page_try_gup_pin(page)
Date: Mon, 4 Nov 2019 12:34:20 +0800 [thread overview]
Message-ID: <20191104043420.15648-1-hdanton@sina.com> (raw)
In-Reply-To: <20191103112113.8256-1-hdanton@sina.com>
On Sun, 3 Nov 2019 12:20:10 -0800 John Hubbard wrote:
> On 11/3/19 3:21 AM, Hillf Danton wrote:
> >
> > A helper is added for mitigating the gup issue described at
> > https://lwn.net/Articles/784574/. It is unsafe to write out
> > a dirty page that is already gup pinned for DMA.
> >
> > In the current writeback context, dirty pages are written out with
> > no detecting whether they have been gup pinned; Nor mark to keep
> > gupers off. In the gup context, file pages can be pinned with other
> > gupers and writeback ignored.
> >
> > The factor, that no room, supposedly even one bit, in the current
> > page struct can be used for tracking gupers, makes the issue harder
> > to tackle.
>
> Well, as long as we're counting bits, I've taken 21 bits (!) to track
> "gupers". :) More accurately, I'm sharing 31 bits with get_page()...please
Would you please specify the reasoning of tracking multiple gupers
for a dirty page? Do you mean that it is all fine for guper-A to add
changes to guper-B's data without warning and vice versa?
> see my recently posted patchset for tracking dma-pinned pages:
>
> https://lore.kernel.org/r/20191030224930.3990755-1-jhubbard@nvidia.com
>
> Once that is merged, you will have this available:
>
> static inline bool page_dma_pinned(struct page *page);
>
> ...which will reliably track dma-pinned pages.
>
> After that, we still need to convert a some more call sites (block/bio
> in particular) to the new pin_user_pages()/put_user_page() system, in
> order for filesystems to take advantage of it, but this approach has
> the advantage of actually tracking such pages, rather than faking it by
> hoping that there is only one gup caller at a time.
>
>
> >
> > The approach here is, because it makes no sense to allow a file page
> > to have multiple gupers at the same time, looking to make gupers
>
> ohhh...no, that's definitely not a claim you can make.
>
>
> > mutually exclusive, and then guper's singulairty helps to tell if a
> > guper is existing by staring at the change in page count.
> >
> > The result of that sigularity is not yet 100% correct but something
> > of "best effort" as the effect of random get_page() is perhaps also
> > folded in it.
> > It is assumed the best effort is feasible/acceptable in practice
> > without the the cost of growing the page struct size by one bit,
> > were it true that something similar has been applied to the page
> > migrate and reclaim contexts for a while.
> >
> > With the helper in place, we skip writing out a dirty page if a
> > guper is detected; On gupping, we give up pinning a file page due
> > to writeback or losing the race to become a guper.
> >
> > The end result is, no gup-pinned page will be put under writeback.
>
> I think you must have missed the many contentious debates about the
> tension between gup-pinned pages, and writeback. File systems can't
> just ignore writeback in all cases. This patch leads to either
> system hangs or filesystem corruption, in the presence of long-lasting
> gup pins.
The current risk of data corruption due to writeback with long-lived
gup references all ignored is zeroed out by detecting gup-pinned dirty
pages and skipping them; that may lead to problems you mention above.
Though I doubt anything helpful about it can be expected from fs in near
future, we have options for instance that gupers periodically release
their references and re-pin pages after data sync the same way as the
current flusher does.
> Really, this won't work. sorry.
>
>
> thanks,
>
> John Hubbard
> NVIDIA
next prev parent reply other threads:[~2019-11-04 4:34 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-11-03 11:21 Hillf Danton
2019-11-03 20:20 ` John Hubbard
2019-11-04 4:34 ` Hillf Danton [this message]
2019-11-04 6:09 ` John Hubbard
2019-11-04 8:13 ` Jan Kara
2019-11-04 10:20 ` Hillf Danton
2019-11-04 19:03 ` Jerome Glisse
2019-11-05 8:56 ` David Hildenbrand
2019-11-05 4:27 ` Hillf Danton
2019-11-05 15:54 ` Jerome Glisse
2019-11-06 9:22 ` Hillf Danton
2019-11-06 15:46 ` Jerome Glisse
2019-11-07 9:50 ` Hillf Danton
2019-11-07 14:57 ` Jerome Glisse
2019-11-08 9:38 ` Hillf Danton
2019-11-08 13:59 ` Jerome Glisse
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20191104043420.15648-1-hdanton@sina.com \
--to=hdanton@sina.com \
--cc=akpm@linux-foundation.org \
--cc=corbet@lwn.net \
--cc=dan.j.williams@intel.com \
--cc=hch@lst.de \
--cc=ira.weiny@intel.com \
--cc=jack@suse.cz \
--cc=jglisse@redhat.com \
--cc=jhubbard@nvidia.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox