Re: [RFC] mm: gup: add helper page_try_gup_pin(page)

From: John Hubbard <jhubbard@nvidia.com>
To: Hillf Danton <hdanton@sina.com>
Cc: linux-mm <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Vlastimil Babka <vbabka@suse.cz>, Jan Kara <jack@suse.cz>,
	Mel Gorman <mgorman@suse.de>, Jerome Glisse <jglisse@redhat.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Ira Weiny <ira.weiny@intel.com>, Christoph Hellwig <hch@lst.de>,
	Jonathan Corbet <corbet@lwn.net>
Subject: Re: [RFC] mm: gup: add helper page_try_gup_pin(page)
Date: Sun, 3 Nov 2019 22:09:03 -0800
Message-ID: <8df14660-2ce3-eda8-dc33-c4d092915656@nvidia.com>
In-Reply-To: <20191104043420.15648-1-hdanton@sina.com>

On 11/3/19 8:34 PM, Hillf Danton wrote:
...
>>
>> Well, as long as we're counting bits, I've taken 21 bits (!) to track
>> "gupers". :)  More accurately, I'm sharing 31 bits with get_page()...please
> 
> Would you please specify the reasoning of tracking multiple gupers
> for a dirty page? Do you mean that it is all fine for guper-A to add
> changes to guper-B's data without warning and vice versa?

It's generally OK to call get_user_pages() on a page more than once.
And even though we are seeing some work to reduce the number of places
in the kernel that call get_user_pages(), there are still lots of call sites.
That means lots of combinations and situations that could result in more
than one gup call per page.

Furthermore, there is no mechanism, convention, documentation, nor anything
at all that attempts to enforce "for each page, get_user_pages() may only
be called once."
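
To illustrate why a single "pinned" flag cannot represent this, here is a
minimal user-space sketch of counting several pins in the same counter
that ordinary references use. It is a toy model under stated assumptions,
not the kernel's implementation: struct toy_page, every toy_* function,
and the bias value of 1024 are hypothetical names and numbers chosen for
illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of a page reference counter; not kernel code. */

/* Assumption: each pin adds this much to the shared counter. */
#define TOY_PIN_BIAS 1024

struct toy_page {
	int refcount;	/* shared by ordinary references and pins */
};

/* Ordinary reference, in the spirit of get_page()/put_page(). */
static void toy_get_page(struct toy_page *p)
{
	p->refcount += 1;
}

static void toy_put_page(struct toy_page *p)
{
	assert(p->refcount >= 1);
	p->refcount -= 1;
}

/* A pin is counted in the same field, but in units of TOY_PIN_BIAS. */
static void toy_pin_page(struct toy_page *p)
{
	p->refcount += TOY_PIN_BIAS;
}

static void toy_unpin_page(struct toy_page *p)
{
	assert(p->refcount >= TOY_PIN_BIAS);
	p->refcount -= TOY_PIN_BIAS;
}

/*
 * Heuristic check: true if at least one pin is probably outstanding.
 * It can report a false positive if a page collects TOY_PIN_BIAS or
 * more ordinary references, which is why the name says "maybe".
 */
static bool toy_page_maybe_pinned(const struct toy_page *p)
{
	return p->refcount >= TOY_PIN_BIAS;
}
```

With two callers pinning the same page, the first unpin must leave the page
still reported as pinned. A counter gives that behavior; a one-bit flag
would be cleared by whichever caller finished first.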

...
>>
>> I think you must have missed the many contentious debates about the
>> tension between gup-pinned pages, and writeback. File systems can't
>> just ignore writeback in all cases. This patch leads to either
>> system hangs or filesystem corruption, in the presence of long-lasting
>> gup pins.
> 
> The current risk of data corruption due to writeback with long-lived
> gup references all ignored is zeroed out by detecting gup-pinned dirty
> pages and skipping them; that may lead to problems you mention above.
> 

Here, I believe you're pointing out that the current situation in the
kernel is already broken, with respect to fs interactions (especially
writeback) with gup. Yes, you are correct, there is a problem.

> Though I doubt anything helpful about it can be expected from fs in near

Actually, fs and mm folks are working together to solve this.

> future, we have options for instance that gupers periodically release
> their references and re-pin pages after data sync the same way as the
> current flusher does.
> 

That's one idea. I don't see it as viable, given the behavior of, say,
a compute process running OpenCL jobs on a GPU that is connected via
a network or InfiniBand card--the idea of "pause" really looks more like
"tear down the complicated multi-driver connection, do writeback, then set
it all up again", I suspect. (And if we could easily interrupt the job, we'd
probably really be running with a page-fault-capable GPU plus an IB card
that does ODP, plus HMM, and we wouldn't need to gup-pin anyway...)

Anyway, this is not amenable to quick fixes, because the problem is
a couple of missing design pieces. Which we're working on putting in.
But meanwhile, smaller changes such as this one are just going to move
the problems to different places, rather than solving them. So it's best
not to do that.

thanks,
-- 
John Hubbard
NVIDIA
