Re: [RFC] mm: gup: add helper page_try_gup_pin(page)

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Hillf Danton <hdanton@sina.com>
To: John Hubbard <jhubbard@nvidia.com>
Cc: Hillf Danton <hdanton@sina.com>, linux-mm <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Vlastimil Babka <vbabka@suse.cz>, Jan Kara <jack@suse.cz>,
	Mel Gorman <mgorman@suse.de>, Jerome Glisse <jglisse@redhat.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Ira Weiny <ira.weiny@intel.com>, Christoph Hellwig <hch@lst.de>,
	Jonathan Corbet <corbet@lwn.net>
Subject: Re: [RFC] mm: gup: add helper page_try_gup_pin(page)
Date: Mon,  4 Nov 2019 12:34:20 +0800	[thread overview]
Message-ID: <20191104043420.15648-1-hdanton@sina.com> (raw)
In-Reply-To: <20191103112113.8256-1-hdanton@sina.com>


On Sun, 3 Nov 2019 12:20:10 -0800 John Hubbard wrote:
> On 11/3/19 3:21 AM, Hillf Danton wrote:
> > 
> > A helper is added for mitigating the gup issue described at
> > https://lwn.net/Articles/784574/. It is unsafe to write out
> > a dirty page that is already gup pinned for DMA.
> > 
> > In the current writeback context, dirty pages are written out with
> > no detecting whether they have been gup pinned; Nor mark to keep
> > gupers off. In the gup context, file pages can be pinned with other
> > gupers and writeback ignored.
> > 
> > The factor, that no room, supposedly even one bit, in the current
> > page struct can be used for tracking gupers, makes the issue harder
> > to tackle.
> 
> Well, as long as we're counting bits, I've taken 21 bits (!) to track
> "gupers". :)  More accurately, I'm sharing 31 bits with get_page()...please

Would you please specify the reasoning of tracking multiple gupers
for a dirty page? Do you mean that it is all fine for guper-A to add
changes to guper-B's data without warning and vice versa?

> see my recently posted patchset for tracking dma-pinned pages:
> 
> https://lore.kernel.org/r/20191030224930.3990755-1-jhubbard@nvidia.com
> 
> Once that is merged, you will have this available:
> 
>    static inline bool page_dma_pinned(struct page *page);
> 
> ...which will reliably track dma-pinned pages.
> 
> After that, we still need to convert a some more call sites (block/bio 
> in particular) to the new pin_user_pages()/put_user_page() system, in 
> order for filesystems to take advantage of it, but this approach has 
> the advantage of actually tracking such pages, rather than faking it by 
> hoping that there is only one gup caller at a time.
> 
> 
> > 
> > The approach here is, because it makes no sense to allow a file page
> > to have multiple gupers at the same time, looking to make gupers
> 
> ohhh...no, that's definitely not a claim you can make.
> 
> 
> > mutually exclusive, and then guper's singulairty helps to tell if a
> > guper is existing by staring at the change in page count.
> > 
> > The result of that sigularity is not yet 100% correct but something
> > of "best effort" as the effect of random get_page() is perhaps also
> > folded in it.
> > It is assumed the best effort is feasible/acceptable in practice
> > without the the cost of growing the page struct size by one bit,
> > were it true that something similar has been applied to the page
> > migrate and reclaim contexts for a while.
> > 
> > With the helper in place, we skip writing out a dirty page if a
> > guper is detected; On gupping, we give up pinning a file page due
> > to writeback or losing the race to become a guper.
> > 
> > The end result is, no gup-pinned page will be put under writeback.
> 
> I think you must have missed the many contentious debates about the
> tension between gup-pinned pages, and writeback. File systems can't
> just ignore writeback in all cases. This patch leads to either
> system hangs or filesystem corruption, in the presence of long-lasting
> gup pins.

The current risk of data corruption due to writeback with long-lived
gup references all ignored is zeroed out by detecting gup-pinned dirty
pages and skipping them; that may lead to problems you mention above.

Though I doubt anything helpful about it can be expected from fs in near
future, we have options for instance that gupers periodically release
their references and re-pin pages after data sync the same way as the
current flusher does.

> Really, this won't work. sorry.
> 
> 
> thanks,
> 
> John Hubbard
> NVIDIA

next prev parent reply	other threads:[~2019-11-04  4:34 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-03 11:21 Hillf Danton
2019-11-03 20:20 ` John Hubbard
2019-11-04  4:34 ` Hillf Danton [this message]
2019-11-04  6:09   ` John Hubbard
2019-11-04  8:13     ` Jan Kara
2019-11-04 10:20   ` Hillf Danton
2019-11-04 19:03     ` Jerome Glisse
2019-11-05  8:56       ` David Hildenbrand
2019-11-05  4:27     ` Hillf Danton
2019-11-05 15:54       ` Jerome Glisse
2019-11-06  9:22       ` Hillf Danton
2019-11-06 15:46         ` Jerome Glisse
2019-11-07  9:50         ` Hillf Danton
2019-11-07 14:57           ` Jerome Glisse
2019-11-08  9:38           ` Hillf Danton
2019-11-08 13:59             ` Jerome Glisse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191104043420.15648-1-hdanton@sina.com \
    --to=hdanton@sina.com \
    --cc=akpm@linux-foundation.org \
    --cc=corbet@lwn.net \
    --cc=dan.j.williams@intel.com \
    --cc=hch@lst.de \
    --cc=ira.weiny@intel.com \
    --cc=jack@suse.cz \
    --cc=jglisse@redhat.com \
    --cc=jhubbard@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox