linux-mm.kvack.org archive mirror
From: Matthew Wilcox <willy@infradead.org>
To: Jan Kara <jack@suse.cz>
Cc: John Hubbard <jhubbard@nvidia.com>,
	linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH 14/17] gup: Convert for_each_compound_head() to gup_for_each_folio()
Date: Mon, 10 Jan 2022 21:10:36 +0000
Message-ID: <YdygzGbTYVyOfJeB@casper.infradead.org>
In-Reply-To: <20220110203611.7s2lg4cyejj5l5ah@quack3.lan>

On Mon, Jan 10, 2022 at 09:36:11PM +0100, Jan Kara wrote:
> On Mon 10-01-22 15:52:51, Matthew Wilcox wrote:
> > On Mon, Jan 10, 2022 at 04:22:08PM +0100, Jan Kara wrote:
> > > On Sun 09-01-22 00:01:49, John Hubbard wrote:
> > > > On 1/8/22 20:39, Matthew Wilcox wrote:
> > > > > On Wed, Jan 05, 2022 at 12:17:46AM -0800, John Hubbard wrote:
> > > > > > > +		if (!folio_test_dirty(folio)) {
> > > > > > > +			folio_lock(folio);
> > > > > > > +			folio_mark_dirty(folio);
> > > > > > > +			folio_unlock(folio);
> > > > > > 
> > > > > > At some point, maybe even here, I suspect that creating the folio
> > > > > > version of set_page_dirty_lock() would help. I'm sure you have
> > > > > > a better feel for whether it helps, after doing all of this conversion
> > > > > > work, but it just sort of jumped out at me as surprising to see it
> > > > > > in this form.
> > > > > 
> > > > > I really hate set_page_dirty_lock().  It smacks of "there is a locking
> > > > > rule here which we're violating, so we'll just take the lock to fix it"
> > > > > without understanding why there's a locking problem here.
> > > > > 
> > > > > As far as I can tell, originally, the intent was that you would lock
> > > > > the page before modifying any of the data in the page.  ie you would
> > > > > do:
> > > > > 
> > > > > 	gup()
> > > > > 	lock_page()
> > > > > 	addr = kmap_page()
> > > > > 	*addr = 1;
> > > > > 	kunmap_page()
> > > > > 	set_page_dirty()
> > > > > 	unlock_page()
> > > > > 	put_page()
> > > > > 
> > > > > and that would prevent races between modifying the page and (starting)
> > > > > writeback, not to mention truncate() and various other operations.
> > > > > 
> > > > > Clearly we can't do that for DMA-pinned pages.  There's only one lock
> > > > > bit.  But do we even need to take the lock if we have the page pinned?
> > > > > What are we protecting against?
> > > > 
> > > > This is a fun question, because you're asking it at a point when the
> > > > overall problem remains unsolved. That is, the interaction between
> > > > file-backed pages and gup/pup is still completely broken.
> > > > 
> > > > And I don't have an answer for you: it does seem like lock_page() is
> > > > completely pointless here. Looking back, there are some 25 callers of
> > > > unpin_user_pages_dirty_lock(), and during all those patch reviews, no
> > > > one noticed this point!
> > > 
> > > I'd say it is underdocumented but not obviously pointless :) AFAIR (and
> > > Christoph or Andrew may well correct me) the page lock in
> > > set_page_dirty_lock() is there to protect metadata associated with the page
> > > through page->private. Otherwise truncate could free these (e.g.
> > > block_invalidatepage()) while ->set_page_dirty() callback (e.g.
> > > __set_page_dirty_buffers()) works on this metadata.
> > 
> > Yes, but ... we have an inconsistency between DMA writes to the page and
> > CPU writes to the page.
> > 
> > 	fd = open(file)
> > 	write(fd, 1024 * 1024)
> > 	mmap(NULL, 1024 * 1024, PROT_RW, MAP_SHARED, fd, 0)
> > 	register-memory-with-RDMA
> > 	ftruncate(fd, 0);	// page is removed from page cache
> > 	ftruncate(fd, 1024 * 1024) 
> > 
> > Now if we do a store from the CPU, we instantiate a new page in the
> > page cache and the store will be written back to the file.  If we do
> > an RDMA-write, the write goes to the old page and will be lost.  Indeed,
> > it's no longer visible to the CPU (but is visible to other RDMA reads!)
> > 
> > Which is fine if the program did it itself because it's doing something
> > clearly bonkers, but another program might be the one doing the
> > two truncate() steps, and this would surprise an innocent program.
> > 
> > I still favour blocking the truncate-down (or holepunch) until there
> > are no pinned pages in the inode.  But I know this is a change in
> > behaviour since for some reason, truncate() gets to override mmap().
> 
> I agree although this is unrelated to the page lock discussion above. In
> principle we can consider such change (after all we chose this solution for
> DAX) but it has some consequences - e.g. that disk space cannot be
> reclaimed when someone has pagecache pages pinned (which may be unexpected
> from sysadmin POV) or that we have to be careful or eager application
> doing DIO (once it is converted to pinning) can block truncate
> indefinitely.

It's not unrelated ... the set_page_dirty() call happens while the page
is still DMA-pinned, so once we figure out how to solve the truncate vs
DMA-pin problem, the same solution may be applicable to both places.
Or maybe it won't be applicable to both ...
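
The CPU side of the truncate scenario quoted above can be reproduced from
userspace without any RDMA hardware.  What follows is only a rough sketch
under stated assumptions: the function name byte_after_truncate_cycle() and
the /tmp path are made up for illustration, and it shows only that the
mapping is backed by a fresh page after the two ftruncate() calls.  It does
not pin anything, so it cannot show the write to the old page being lost.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a 1MB file, map it MAP_SHARED, truncate it to
 * zero and extend it again, then read the first byte through the mapping.
 * Returns that byte, or -1 on error.
 */
int byte_after_truncate_cycle(void)
{
	char path[] = "/tmp/trunc-demo-XXXXXX";	/* assumed scratch location */
	const size_t len = 1024 * 1024;
	volatile unsigned char *map;
	void *addr;
	char *buf;
	int ret = -1;
	int fd;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);			/* the file lives only as long as fd */

	buf = malloc(len);
	if (!buf)
		goto out_close;
	memset(buf, 0x5a, len);
	if (write(fd, buf, len) != (ssize_t)len)
		goto out_free;

	addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (addr == MAP_FAILED)
		goto out_free;
	map = addr;

	/* Fault the original page in; it must hold the data just written. */
	if (map[0] != 0x5a)
		goto out_unmap;

	/* Truncate down (page leaves the page cache), then extend again. */
	if (ftruncate(fd, 0) != 0 || ftruncate(fd, len) != 0)
		goto out_unmap;

	/* This access faults in a new page from the now-sparse file. */
	ret = map[0];

out_unmap:
	munmap(addr, len);
out_free:
	free(buf);
out_close:
	close(fd);
	return ret;
}
```

On Linux this should return 0 rather than 0x5a: the CPU sees the new,
zero-filled page.  A device that had the original page DMA-pinned would
still be reading and writing the old one.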

As for badly behaved applications blocking truncate() by DMA-pinning
pages, have we considered the possibility of declining the DMA pin if the
process does not own the mmapped file?  That would limit the amount of
trouble such an application can cause, but maybe it would break some
interesting use cases.
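
To make the ownership idea a little more concrete, here is a userspace
sketch of the kind of check I have in mind.  It is an assumption-laden
illustration, not kernel code: caller_owns_file() is a made-up name, and
"owns" is taken to mean that the file's uid matches the caller's effective
uid.  A real implementation in gup would have to work on the inode and the
task's credentials and decide what to do about capabilities.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical policy helper, not an existing kernel or libc API.
 * Returns true if the file behind fd is owned by the caller's effective
 * uid, false otherwise (including when fstat() fails).
 */
bool caller_owns_file(int fd)
{
	struct stat st;

	if (fstat(fd, &st) != 0)
		return false;

	return st.st_uid == geteuid();
}
```

A caller would refuse to set up the long-term pin when this returns false,
which is exactly where the "interesting use cases" might break, e.g. a
process pinning pages of a file it can write to but which belongs to
another user.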

