From: Mike Kravetz <mike.kravetz@oracle.com>
To: Peter Xu <peterx@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Andrew Morton <akpm@linux-foundation.org>,
Muchun Song <songmuchun@bytedance.com>,
Naoya Horiguchi <naoya.horiguchi@nec.com>
Subject: Re: [PATCH] mm/hugetlb: Fix pgtable lock on pmd sharing
Date: Mon, 12 Jun 2023 13:44:18 -0700 [thread overview]
Message-ID: <20230612204418.GD3704@monkey> (raw)
In-Reply-To: <20230612160420.809818-1-peterx@redhat.com>
On 06/12/23 12:04, Peter Xu wrote:
> Huge pmd sharing operates on PUD not PMD, huge_pte_lock() is not suitable
> in this case because it should only work for last level pte changes, while
> pmd sharing is always one level higher.
Right! That lock does not prevent someone else from concurrently modifying
the PUD.
> Meanwhile, here we're locking over the spte pgtable lock which is even not
> a lock for current mm but someone else's.
>
> It seems even racy on operating on the lock, as after put_page() of the
> spte pgtable page logically the page can be released, so at least the
> spin_unlock() needs to be done after the put_page().
Agree.
> No report I am aware, I'm not even sure whether it'll just work on taking
> the spte pmd lock, because while we're holding i_mmap read lock it probably
> means the vma interval tree is frozen, all pte allocators over this pud
> entry could always find the specific svma and spte page, so maybe they'll
> serialize on this spte page lock?
It seems they would serialize IF they were trying to instantiate the same
shared page. However, I suppose another thread could be trying to
instantiate something totally different in the VA range represented by that
PUD. In this case, it seems like there would be no synchronization.
> Even so, doesn't seem to be expected.
> It just seems to be an accident of cb900f412154.
>
> Fix it with the proper pud lock (which is the mm's page_table_lock).
>
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Fixes: cb900f412154 ("mm, hugetlb: convert hugetlbfs to use split pmd lock")
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> mm/hugetlb.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
Agree with this change. But, it does make one wonder if the pud_clear()
in huge_pmd_unshare is safe. Like here, we really should be holding the
higher level lock but are holding the PMD lock.
--
Mike Kravetz
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index dfa412d8cb30..270ec0ecd5a1 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -7133,7 +7133,6 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
> unsigned long saddr;
> pte_t *spte = NULL;
> pte_t *pte;
> - spinlock_t *ptl;
>
> i_mmap_lock_read(mapping);
> vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) {
> @@ -7154,7 +7153,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
> if (!spte)
> goto out;
>
> - ptl = huge_pte_lock(hstate_vma(vma), mm, spte);
> + spin_lock(&mm->page_table_lock);
> if (pud_none(*pud)) {
> pud_populate(mm, pud,
> (pmd_t *)((unsigned long)spte & PAGE_MASK));
> @@ -7162,7 +7161,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
> } else {
> put_page(virt_to_page(spte));
> }
> - spin_unlock(ptl);
> + spin_unlock(&mm->page_table_lock);
> out:
> pte = (pte_t *)pmd_alloc(mm, pud, addr);
> i_mmap_unlock_read(mapping);
> --
> 2.40.1
>
next prev parent reply other threads:[~2023-06-12 20:44 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-12 16:04 Peter Xu
2023-06-12 20:44 ` Mike Kravetz [this message]
2023-06-12 21:35 ` Peter Xu
2023-06-12 21:44 ` Mike Kravetz
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230612204418.GD3704@monkey \
--to=mike.kravetz@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=naoya.horiguchi@nec.com \
--cc=peterx@redhat.com \
--cc=songmuchun@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox