linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Oscar Salvador <osalvador@suse.de>
To: Peter Xu <peterx@redhat.com>
Cc: riel@surriel.com, linux-kernel@vger.kernel.org,
	kernel-team@meta.com, linux-mm@kvack.org,
	akpm@linux-foundation.org, muchun.song@linux.dev,
	mike.kravetz@oracle.com, leit@meta.com, willy@infradead.org,
	stable@kernel.org, Ackerley Tng <ackerleytng@google.com>
Subject: Re: [PATCH 2/4] hugetlbfs: extend hugetlb_vma_lock to private VMAs
Date: Tue, 12 Nov 2024 12:24:00 +0100	[thread overview]
Message-ID: <ZzM60CLkWKwFzWqa@localhost.localdomain> (raw)
In-Reply-To: <Zy0gqwIw5Y3IuNTD@x1n>

On Thu, Nov 07, 2024 at 03:18:51PM -0500, Peter Xu wrote:
> +Ackerley +Oscar
> 
> I'm reading the resv code recently and just stumbled upon this. So want to
> raise this question.
> 
> IIUC __vma_private_lock() will return false for MAP_PRIVATE hugetlb vma if
> the vma is dup()ed from a fork(), with/without commit 187da0f8250a
> ("hugetlb: fix null-ptr-deref in hugetlb_vma_lock_write") which fixed a
> slightly different issue.
> 
> The problem is the current vma lock for private mmap() is based on the resv
> map, and the resv map only belongs to the process that mmap()ed this
> private vma.  E.g. dup_mmap() has:
> 
>                 if (is_vm_hugetlb_page(tmp))
>                         hugetlb_dup_vma_private(tmp);
> 
> Which does:
> 
> 	if (vma->vm_flags & VM_MAYSHARE) {
>                 ...
> 	} else
> 		vma->vm_private_data = NULL; <---------------------
> 
> So even if I don't know how many of us are even using hugetlb PRIVATE +
> fork(), assuming that's the most controversial use case that I'm aware of
> on hugetlb that people complains about.. with some tricky changes like
> 04f2cbe35699.. Just still want to raise this pure question, that after a
> fork() on private vma, and if I read it alright, lock/unlock operations may
> become noop..

I have been taking a look at this, and yes, __vma_private_lock will
return false for private hugetlb mappings that were forked .

I quickly checked what protects what and we currently have:

hugetlb_vma_lock_read - copy_hugetlb_page_range (only sharing)
hugetlb_vma_lock_read - hugetlb_wp (only for HPAGE_RESV_OWNER)
hugetlb_vma_lock_read - hugetlb_fault , protects huge_pmd_unshare?
hugetlb_vma_lock_read - pagewalks

hugetlb_vma_lock_write - hugetlb_change_protection
hugetlb_vma_lock_write - hugetlb_unshare_pmds
hugetlb_vma_lock_wirte - move_hugetlb_page_tables
hugetlb_vma_lock_wirte - _hugetlb_zap_begin (unmap_vmas)

the ones taking the hugetlb_vma_lock in write (so, the last four) also
take the i_mmap_lock_write (vma->vm_file->f_mapping), and AFAIK, hugetlb
mappings, private or not, should have vma->vm_file->f_mapping set.

Which means that technically we cannot race between hugetlb_change_protection
and move_hugetlb_page_tables etc.

But, checking 

 commit bf4916922c60f43efaa329744b3eef539aa6a2b2
 Author: Rik van Riel <riel@surriel.com>
 Date:   Thu Oct 5 23:59:07 2023 -0400
 
     hugetlbfs: extend hugetlb_vma_lock to private VMAs

which its motivation was to protect MADV_DONTNEED vs page_faults, I do
not see how it gets protected with private hugetlb mappings that were
dupped (forked).

 madvise_dontneed_single_vma
  zap_page_range_single
   _hugetlb_zap_begin
    hugetlb_vma_lock_write - noop for mappings that do not own the reservation
    i_mmap_lock_write

But the hugetlb_fault path only takes hugetlb_vma_lock_*, so theorically
we still could race between page_fault vs madvise_dontneed_single_vma?

A quick way to prove would be map a hugetlb private mapping, fork it and
have two threads tryong to madvise(MADV_DONTNEED) and the other trying
to write to it?

I do not know, maybe we are protected in some other way I cannot see
right now.

I will have a look.

 

-- 
Oscar Salvador
SUSE Labs


  reply	other threads:[~2024-11-12 11:24 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-06  3:59 [PATCH v7 0/4] hugetlbfs: close race between MADV_DONTNEED and page fault riel
2023-10-06  3:59 ` [PATCH 1/4] hugetlbfs: clear resv_map pointer if mmap fails riel
2023-10-06  3:59 ` [PATCH 2/4] hugetlbfs: extend hugetlb_vma_lock to private VMAs riel
2024-11-07 20:18   ` Peter Xu
2024-11-12 11:24     ` Oscar Salvador [this message]
2023-10-06  3:59 ` [PATCH 3/4] hugetlbfs: close race between MADV_DONTNEED and page fault riel
2023-10-06 17:57   ` Mike Kravetz
2023-10-06  3:59 ` [PATCH 4/4] hugetlbfs: replace hugetlb_vma_lock with invalidate_lock riel
2023-10-17  0:52   ` Mike Kravetz
2023-10-17 13:47     ` Rik van Riel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZzM60CLkWKwFzWqa@localhost.localdomain \
    --to=osalvador@suse.de \
    --cc=ackerleytng@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=kernel-team@meta.com \
    --cc=leit@meta.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=muchun.song@linux.dev \
    --cc=peterx@redhat.com \
    --cc=riel@surriel.com \
    --cc=stable@kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox