linux-mm.kvack.org archive mirror
From: Hugh Dickins <hughd@google.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Hugh Dickins <hughd@google.com>, Qian Cai <cai@lca.pw>,
	js1304@gmail.com,  Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org,  linux-kernel@vger.kernel.org,
	kernel-team@lge.com,  Vlastimil Babka <vbabka@suse.cz>,
	Christoph Hellwig <hch@infradead.org>,
	 Roman Gushchin <guro@fb.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	 Michal Hocko <mhocko@suse.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>
Subject: Re: [PATCH v3 7/8] mm/mempolicy: use a standard migration target allocation callback
Date: Thu, 8 Oct 2020 22:50:16 -0700 (PDT)
Message-ID: <alpine.LSU.2.11.2010082216250.10228@eggly.anvils>
In-Reply-To: <cf715524-f30d-634f-2a05-e02c4e52e675@oracle.com>

On Thu, 8 Oct 2020, Mike Kravetz wrote:
> On 10/7/20 8:21 PM, Hugh Dickins wrote:
> > 
> > Mike, j'accuse... your 5.7 commit c0d0381ade79 ("hugetlbfs:
> > use i_mmap_rwsem for more pmd sharing synchronization"), in which
> > unmap_and_move_huge_page() now passes the TTU_RMAP_LOCKED flag to
> > try_to_unmap(), because it's already holding mapping->i_mmap_rwsem:
> > but that is not the right lock to secure an anon_vma lookup.
> 
> Thanks Hugh!  Your analysis is correct and the code in that commit is
> not correct.  I was so focused on the file mapping case, I overlooked
> (actually introduced) this issue for anon mappings.
> 
> Let me verify that this indeed is the root cause.  However, since
> move_pages12 migrated anon hugetlb pages it certainly does look to be
> the case.
> 
> > I intended to send a patch, passing TTU_RMAP_LOCKED only in the
> > !PageAnon case (and, see vma_adjust(), anon_vma lock conveniently
> > nests inside i_mmap_rwsem); but then wondered if i_mmap_rwsem was
> > needed in that case or not, so looked deeper into c0d0381ade79.
> > 
> > Hmm, not even you liked it!  But the worst of it looks simply
> > unnecessary to me, and I hope can be deleted - but better by you
> > than by me (in particular, you were trying to kill 1) and 2) birds
> > with one stone, and I've always given up on understanding hugetlb's
> > reservations: I suspect that side of it is irrelevant here,
> > but I wouldn't pretend to be sure).
> > 
> > How could you ever find a PageAnon page in a vma_shareable() area?
> > 
> > It is all rather confusing (vma_shareable() depending on VM_MAYSHARE,
> > whereas is_cow_mapping() on VM_SHARED and VM_MAYWRITE: they have to
> > be studied together with do_mmap()'s 
> > 			vm_flags |= VM_SHARED | VM_MAYSHARE;
> > 			if (!(file->f_mode & FMODE_WRITE))
> > 				vm_flags &= ~(VM_MAYWRITE | VM_SHARED);
> > 
> > (And let me add to the confusion by admitting that, prior to 3.15's
> > cda540ace6a1 "mm: get_user_pages(write,force) refuse to COW in
> > shared areas", maybe it was possible to find a PageAnon there.)
> > 
> > But my belief (best confirmed by you running your tests with a
> > suitably placed BUG_ON or WARN_ON) is that you'll never find a
> > PageAnon in a vma_shareable() area, so will never need try_to_unmap()
> > to unshare a pagetable in the PageAnon case, so won't need i_mmap_rwsem
> > for PageAnon there, and _get_hugetlb_page_mapping() (your function that
> > deduces an address_space from an anon_vma) can just be deleted.
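
To make the flag argument above easier to check, here is a small userspace
model, not kernel code.  The flag values and the is_cow_mapping() test are
copied from the kernel headers as I understand them; model_mmap_flags() and
model_may_share() are hypothetical helpers that keep only the flag logic
(the real vma_shareable() also checks PUD alignment of the range, which is
ignored here, and the starting vm_flags are simplified to VM_MAYWRITE).

```c
#include <stdbool.h>

/* Values as in include/linux/mm.h and include/linux/fs.h (assumed). */
#define VM_SHARED	0x08UL
#define VM_MAYWRITE	0x20UL
#define VM_MAYSHARE	0x80UL
#define FMODE_WRITE	0x2U

/* Hypothetical reduction of do_mmap()'s flag setup for a file mapping. */
static unsigned long model_mmap_flags(bool map_shared, unsigned int f_mode)
{
	unsigned long vm_flags = VM_MAYWRITE;	/* simplified starting point */

	if (map_shared) {
		vm_flags |= VM_SHARED | VM_MAYSHARE;
		if (!(f_mode & FMODE_WRITE))
			vm_flags &= ~(VM_MAYWRITE | VM_SHARED);
	}
	return vm_flags;
}

/* Same test as the kernel's is_cow_mapping(). */
static bool model_is_cow_mapping(unsigned long vm_flags)
{
	return (vm_flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}

/* Flag half of vma_shareable(); the alignment check is left out. */
static bool model_may_share(unsigned long vm_flags)
{
	return (vm_flags & VM_MAYSHARE) != 0;
}
```

In this model none of the three cases (MAP_SHARED on a writable file,
MAP_SHARED on a read-only file, MAP_PRIVATE) is both a COW mapping and
VM_MAYSHARE, which is the reasoning behind expecting no PageAnon in a
vma_shareable() area.
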
> 
> Yes, it is confusing.  Let me look into this.  I would be really happy
> to delete that ugly function.
> 
> > (And in passing, may I ask what hugetlb_page_mapping_lock_write()'s
> > hpage->_mapcount inc and dec are for?  You comment it as a hack,
> > but don't explain what needs that hack, and I don't see it.)
> 
> We are trying to lock the mapping (mapping->i_mmap_rwsem).  We know
> mapping is valid, because we obtained it from page_mapping() and it
> will remain valid because we have the page locked.  Page needs to be
> unlocked to unmap.  However, we have to drop page lock in order to
> acquire i_mmap_rwsem.  Once we drop page lock, mapping could become
> invalid.  So, the code artificially incs mapcount so that mapping
> will remain valid when unmapping page.

No, unless you can point me to some other hugetlbfs-does-it-differently
(and I didn't see it there in that commit), raising _mapcount does not
provide any such protection; but does add the possibility of a
"BUG: Bad page cache" and leak from unaccount_page_cache_page().

Earlier in the day I was trying to work out what to recommend instead,
but had to turn aside to something else: I'll try again tomorrow.

It's a problem I've faced before in tmpfs, keeping a hold on the
mapping while page lock is dropped.  Quite awkward: igrab() looks as
if it's the right thing to use, but turns out to give no protection
against umount.  Last time around, I ended up with a stop_eviction
count in the shmem inode, which shmem_evict_inode() waits on if
necessary.  Something like that could be done for hugetlbfs too,
but I'd prefer to do it without adding extra, if there is a way.
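
For illustration only, the shape of that stop_eviction idea can be sketched
in userspace C.  This is a model under assumptions, not the shmem code:
struct model_inode and the model_*() helpers are hypothetical names, and
the waiter polls with sched_yield() where the kernel would sleep and be
woken by the last holder.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-inode state. */
struct model_inode {
	atomic_int stop_eviction;	/* holders that eviction must wait for */
	bool evicted;
};

/*
 * Take a hold.  Must be called while something else (page lock, in the
 * case discussed here) still guarantees that the inode is alive.
 */
static void model_hold(struct model_inode *inode)
{
	atomic_fetch_add(&inode->stop_eviction, 1);
}

/* Drop the hold once the work needing the mapping is finished. */
static void model_release(struct model_inode *inode)
{
	atomic_fetch_sub(&inode->stop_eviction, 1);
}

/* Evict only if nobody holds the inode; report whether it happened. */
static bool model_try_evict(struct model_inode *inode)
{
	if (atomic_load(&inode->stop_eviction) != 0)
		return false;
	inode->evicted = true;
	return true;
}

/* Eviction path: wait, if necessary, until all holders have gone. */
static void model_evict(struct model_inode *inode)
{
	while (!model_try_evict(inode))
		sched_yield();
}
```

The point of the count is that the holder can drop page lock and still
rely on the inode, because eviction will not proceed until the hold is
released.
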

> 
> As mentioned above, I hope all this can be removed.

If you continue to nest page lock inside i_mmap_rwsem for hugetlbfs,
then I think that part of hugetlb_page_mapping_lock_write() has to
remain.  I'd much prefer that hugetlbfs did not reverse the usual
nesting, but accept that you had reasons for doing it that way.

Hugh

