Re: [PATCH v2] hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Mike Kravetz <mike.kravetz@oracle.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>,
	David Hildenbrand <david@redhat.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Mina Almasry <almasrymina@google.com>,
	Peter Xu <peterx@redhat.com>, Rik van Riel <riel@surriel.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Matthew Wilcox <willy@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Wei Chen <harperchen1110@gmail.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH v2] hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing
Date: Mon, 24 Oct 2022 14:55:27 -0700	[thread overview]
Message-ID: <Y1cJz6i5YMNfSeAm@monkey> (raw)
In-Reply-To: <20221023025047.470646-1-mike.kravetz@oracle.com>

On 10/22/22 19:50, Mike Kravetz wrote:
> madvise(MADV_DONTNEED) ends up calling zap_page_range() to clear the
> page tables associated with the address range.  For hugetlb vmas,
> zap_page_range will call __unmap_hugepage_range_final.  However,
> __unmap_hugepage_range_final assumes the passed vma is about to be
> removed and deletes the vma_lock to prevent pmd sharing as the vma is
> on the way out.  In the case of madvise(MADV_DONTNEED) the vma remains,
> but the missing vma_lock prevents pmd sharing and could potentially
> lead to issues with truncation/fault races.
> 
> This issue was originally reported here [1] as a BUG triggered in
> page_try_dup_anon_rmap.  Prior to the introduction of the hugetlb
> vma_lock, __unmap_hugepage_range_final cleared the VM_MAYSHARE flag to
> prevent pmd sharing.  Subsequent faults on this vma were confused as
> VM_MAYSHARE indicates a sharable vma, but was not set so page_mapping
> was not set in new pages added to the page table.  This resulted in
> pages that appeared anonymous in a VM_SHARED vma and triggered the BUG.
> 
> Create a new routine clear_hugetlb_page_range() that can be called from
> madvise(MADV_DONTNEED) for hugetlb vmas.  It has the same setup as
> zap_page_range, but does not delete the vma_lock.

After seeing a syzbot use after free report [2] that is also addressed by
this patch, I started thinking ... 

When __unmap_hugepage_range_final was created, the only time unmap_single_vma
was called for hugetlb vmas was during process exit time via exit_mmap.  I got
in trouble when I added a call via madvise(MADV_DONTNEED) which calls
zap_page_range.  This patch takes care of that calling path by having
madvise(MADV_DONTNEED) call a new routine clear_hugetlb_page_range instead of
zap_page_range for hugetlb vmas.  The use after free bug had me auditing code
paths to make sure __unmap_hugepage_range_final was REALLY only called at
process exit time.  If not, and we could fault on a vma after calling
__unmap_hugepage_range_final we would be in trouble.

My thought was, what if we had __unmap_hugepage_range_final check mm->mm_users
to determine if it was being called in the process exit path?  If !mm_users,
then we can delete the vma_lock to prevent pmd sharing as we know the process
is exiting.  If not, we do not delete the lock.  That seems to be more robust
and would prevent issues if someone accidentally introduces a new code path
where __unmap_hugepage_range_final (unmap_single_vma for a hugetlb vma)
could be called outside process exit context.

Thoughts?

[2] https://lore.kernel.org/linux-mm/000000000000d5e00a05e834962e@google.com/
-- 
Mike Kravetz

next prev parent reply	other threads:[~2022-10-24 21:55 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-23  2:50 Mike Kravetz
2022-10-24 21:55 ` Mike Kravetz [this message]
2022-10-24 23:14   ` Mike Kravetz
2022-10-26 21:42 ` Peter Xu
2022-10-26 23:54   ` Mike Kravetz
2022-10-27  1:12     ` Peter Xu
2022-10-28 15:23       ` Mike Kravetz
2022-10-28 16:13         ` Peter Xu
2022-10-28 21:17           ` Mike Kravetz
2022-10-28 23:20             ` Peter Xu
2022-10-30  0:15               ` Mike Kravetz
2022-10-30  0:54                 ` Nadav Amit
2022-10-30 18:43                   ` Peter Xu
2022-10-30 18:52                     ` Nadav Amit
2022-10-31  1:44                       ` Mike Kravetz
2022-11-02 19:24                         ` Peter Xu
2022-11-07 20:01                           ` Mike Kravetz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y1cJz6i5YMNfSeAm@monkey \
    --to=mike.kravetz@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=almasrymina@google.com \
    --cc=axelrasmussen@google.com \
    --cc=david@redhat.com \
    --cc=harperchen1110@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=naoya.horiguchi@linux.dev \
    --cc=peterx@redhat.com \
    --cc=riel@surriel.com \
    --cc=stable@vger.kernel.org \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox