linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Baolin Wang <baolin.wang@linux.alibaba.com>,
	linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Muchun Song <muchun.song@linux.dev>, Peter Xu <peterx@redhat.com>,
	Oscar Salvador <osalvador@suse.de>,
	stable@vger.kernel.org
Subject: Re: [PATCH v1 2/2] mm/hugetlb: fix hugetlb vs. core-mm PT locking
Date: Fri, 26 Jul 2024 10:04:18 +0200	[thread overview]
Message-ID: <c16a731f-1029-4ede-bbea-af2218c566d1@redhat.com> (raw)
In-Reply-To: <0067dfe6-b9a6-4e98-9eef-7219299bfe58@linux.alibaba.com>

On 26.07.24 04:33, Baolin Wang wrote:
> 
> 
> On 2024/7/26 02:39, David Hildenbrand wrote:
>> We recently made GUP's common page table walking code to also walk
>> hugetlb VMAs without most hugetlb special-casing, preparing for the
>> future of having less hugetlb-specific page table walking code in the
>> codebase. Turns out that we missed one page table locking detail: page
>> table locking for hugetlb folios that are not mapped using a single
>> PMD/PUD.
>>
>> Assume we have hugetlb folio that spans multiple PTEs (e.g., 64 KiB
>> hugetlb folios on arm64 with 4 KiB base page size). GUP, as it walks the
>> page tables, will perform a pte_offset_map_lock() to grab the PTE table
>> lock.
>>
>> However, hugetlb that concurrently modifies these page tables would
>> actually grab the mm->page_table_lock: with USE_SPLIT_PTE_PTLOCKS, the
>> locks would differ. Something similar can happen right now with hugetlb
>> folios that span multiple PMDs when USE_SPLIT_PMD_PTLOCKS.
>>
>> Let's make huge_pte_lockptr() effectively uses the same PT locks as any
>> core-mm page table walker would.
> 
> Thanks for raising the issue again. I remember fixing this issue 2 years
> ago in commit fac35ba763ed ("mm/hugetlb: fix races when looking up a
> CONT-PTE/PMD size hugetlb page"), but it seems to be broken again.
> 

Ah, right! We fixed it by rerouting to hugetlb code that we then removed :D

Did we have a reproducer back then that would make my live easier?

>> There is one ugly case: powerpc 8xx, whereby we have an 8 MiB hugetlb
>> folio being mapped using two PTE page tables. While hugetlb wants to take
>> the PMD table lock, core-mm would grab the PTE table lock of one of both
>> PTE page tables. In such corner cases, we have to make sure that both
>> locks match, which is (fortunately!) currently guaranteed for 8xx as it
>> does not support SMP.
>>
>> Fixes: 9cb28da54643 ("mm/gup: handle hugetlb in the generic follow_page_mask code")
>> Cc: <stable@vger.kernel.org>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>    include/linux/hugetlb.h | 25 ++++++++++++++++++++++---
>>    1 file changed, 22 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
>> index c9bf68c239a01..da800e56fe590 100644
>> --- a/include/linux/hugetlb.h
>> +++ b/include/linux/hugetlb.h
>> @@ -944,10 +944,29 @@ static inline bool htlb_allow_alloc_fallback(int reason)
>>    static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
>>    					   struct mm_struct *mm, pte_t *pte)
>>    {
>> -	if (huge_page_size(h) == PMD_SIZE)
>> +	VM_WARN_ON(huge_page_size(h) == PAGE_SIZE);
>> +	VM_WARN_ON(huge_page_size(h) >= P4D_SIZE);
>> +
>> +	/*
>> +	 * hugetlb must use the exact same PT locks as core-mm page table
>> +	 * walkers would. When modifying a PTE table, hugetlb must take the
>> +	 * PTE PT lock, when modifying a PMD table, hugetlb must take the PMD
>> +	 * PT lock etc.
>> +	 *
>> +	 * The expectation is that any hugetlb folio smaller than a PMD is
>> +	 * always mapped into a single PTE table and that any hugetlb folio
>> +	 * smaller than a PUD (but at least as big as a PMD) is always mapped
>> +	 * into a single PMD table.
> 
> ARM64 also supports cont-PMD size hugetlb, which is 32MiB size with a 4
> KiB base page size. This means the PT locks for 32MiB hugetlb may race
> again, as we currently only hold one PMD lock for several PMD entries of
> a cont-PMD size hugetlb.

Exactly, that's the case where all cont-PMD entries fall into the same 
page table.

That's also what I will try reproducing with (migration racing with 
GUP). But the race window is small and a lot of other stuff is protected 
by the VMA lock.

-- 
Cheers,

David / dhildenb



  parent reply	other threads:[~2024-07-26  8:04 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-25 18:39 [PATCH v1 0/2] " David Hildenbrand
2024-07-25 18:39 ` [PATCH v1 1/2] mm: let pte_lockptr() consume a pte_t pointer David Hildenbrand
2024-07-26 15:36   ` Peter Xu
2024-07-26 16:02     ` David Hildenbrand
2024-07-26 21:28       ` Peter Xu
2024-07-26 21:48         ` David Hildenbrand
2024-07-29  6:19           ` Qi Zheng
2024-07-30  8:40             ` David Hildenbrand
2024-07-30  9:10               ` Qi Zheng
2024-07-29 16:26           ` Peter Xu
2024-07-29 16:39             ` Peter Xu
2024-07-29 17:46               ` David Hildenbrand
2024-07-30 18:44                 ` Peter Xu
2024-07-30 19:49                   ` David Hildenbrand
2024-07-29  7:48   ` Qi Zheng
2024-07-29  8:46     ` David Hildenbrand
2024-07-29  8:52       ` Qi Zheng
     [not found]   ` <CGME20240730153058eucas1p2319e4cc985dcdc6e98d08398c33fcfd3@eucas1p2.samsung.com>
2024-07-30 15:30     ` Marek Szyprowski
2024-07-30 15:45       ` David Hildenbrand
2024-07-30 15:49         ` David Hildenbrand
2024-07-30 16:08           ` Marek Szyprowski
2024-07-30 16:10             ` David Hildenbrand
2024-07-25 18:39 ` [PATCH v1 2/2] mm/hugetlb: fix hugetlb vs. core-mm PT locking David Hildenbrand
2024-07-26  2:33   ` Baolin Wang
2024-07-26  3:03     ` Baolin Wang
2024-07-26  8:04       ` David Hildenbrand
2024-07-26  8:04     ` David Hildenbrand [this message]
2024-07-26  9:38       ` Baolin Wang
2024-07-26 11:40         ` David Hildenbrand
2024-07-29  1:48           ` Baolin Wang
2024-07-26  8:18   ` Muchun Song
2024-07-26 15:26   ` Peter Xu
2024-07-26 15:32     ` David Hildenbrand
2024-07-29  4:51   ` Oscar Salvador
2024-07-25 20:41 ` [PATCH v1 0/2] " Andrew Morton
2024-07-26  9:19   ` David Hildenbrand
2024-07-26 14:45     ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c16a731f-1029-4ede-bbea-af2218c566d1@redhat.com \
    --to=david@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=muchun.song@linux.dev \
    --cc=osalvador@suse.de \
    --cc=peterx@redhat.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox