linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Qi Zheng <zhengqi.arch@bytedance.com>
To: David Hildenbrand <david@redhat.com>
Cc: Peter Xu <peterx@redhat.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Muchun Song <muchun.song@linux.dev>,
	Oscar Salvador <osalvador@suse.de>
Subject: Re: [PATCH v1 1/2] mm: let pte_lockptr() consume a pte_t pointer
Date: Tue, 30 Jul 2024 17:10:05 +0800	[thread overview]
Message-ID: <a3206cfe-34c4-4b43-9399-40a715f5fbaa@bytedance.com> (raw)
In-Reply-To: <47fe3480-a4a4-465b-8972-c6507c7a76ee@redhat.com>

Hi David,

On 2024/7/30 16:40, David Hildenbrand wrote:
>>> ... also because I still want to understand why the PTL of the PMD table
>>> is required at all. What if we lock it first and somebody else wants to
>>> lock it after us while we already ripped it out? Sure there must be some
>>> reason for the lock, I just don't understand it yet :/.
>>
>> For pmd lock, I think this is needed to clear the pmd entry
>> (pmdp_collapse_flush()). For pte lock, there should be the following two
>> reasons:
>>
> 
> Thanks for the details.
> 
> My current understanding correct that removing *empty* page tables is
> currently only possible if we can guarantee that nothing would try 
> modifying the
> page tables after we drop the PTE page table lock, but we would be happy 
> if someone
> would walk an empty page table while we remove it as long as the access 
> is read-only.
> 
> In retract_page_tables() I thought that would be guaranteed as we 
> prevent refaults
> from the page cache and exclude any uffd + anon handling using the big 
> hammer
> (if any could be active, disallow zapping the page table).
> 
> What I am still not quite getting is what happens if someone would grab 
> the PTE page
> table lock after we released it -- and how that really protects us here.
> 
> 
> I assume it's the
> 
> spin_lock(ptl);
> if (likely(pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
> ...
> 
> handling in __pte_offset_map_lock() that guarantees that.

Yes.

> 
> 
> That indeed means that pte_offset_map_nolock() requires similar checks 
> after
> obtaining the PTL (for the cases where we are not holding the PMD table 
> lock
> and can be sure that we are the one ripping out that table right now).

Yes, if the PTE page will be freed concurrently, then the pmd entry
needs to be rechecked after acquiring the PTL. So I made
pte_offset_map_nolock() return pmdval in my patch[1], and then we
can do the following:

start_pte = pte_offset_map_nolock(mm, pmd, &pmdval, addr, &ptl);
spin_lock(ptl)
if (likely(pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
...


[1]. 
https://lore.kernel.org/lkml/7f5233f9f612c7f58abf218852fb1042d764940b.1719570849.git.zhengqi.arch@bytedance.com/

> 
> 
>> 1. release it after clearing pmd entry, then we can capture the changed
>>      pmd in pte_offset_map_lock() etc after holding this pte lock.
>>      (This is also what I did in my patchset)
> 
> Yes, I get it now.
> 
>>
>> 2. As mentioned in the comments, we may be concurrent with
>>      userfaultfd_ioctl(), but we do not hold the read lock of mmap (or
>>      read lock of vma), so the VM_UFFD_WP may be set. Therefore, we need
>>      to hold the pte lock to check whether a new pte entry has been
>>      inserted.
>>      (See commit[1] for more details)
> 
> 
> Yes, I see we tried to document that magic and it is still absolutely 
> confusing :)
> 
> But at least now it's clearer to me why retract_page_tables() uses 
> slightly different
> locking than collapse_pte_mapped_thp().
> 
> Maybe we should look into letting collapse_pte_mapped_thp() use a 
> similar approach as
> retract_page_tables() to at least make it more consistent. That topic is 
> also touched
> in a98460494b16.

Agree, more consistency is better.

Thanks,
Qi

> 


  reply	other threads:[~2024-07-30  9:10 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-25 18:39 [PATCH v1 0/2] mm/hugetlb: fix hugetlb vs. core-mm PT locking David Hildenbrand
2024-07-25 18:39 ` [PATCH v1 1/2] mm: let pte_lockptr() consume a pte_t pointer David Hildenbrand
2024-07-26 15:36   ` Peter Xu
2024-07-26 16:02     ` David Hildenbrand
2024-07-26 21:28       ` Peter Xu
2024-07-26 21:48         ` David Hildenbrand
2024-07-29  6:19           ` Qi Zheng
2024-07-30  8:40             ` David Hildenbrand
2024-07-30  9:10               ` Qi Zheng [this message]
2024-07-29 16:26           ` Peter Xu
2024-07-29 16:39             ` Peter Xu
2024-07-29 17:46               ` David Hildenbrand
2024-07-30 18:44                 ` Peter Xu
2024-07-30 19:49                   ` David Hildenbrand
2024-07-29  7:48   ` Qi Zheng
2024-07-29  8:46     ` David Hildenbrand
2024-07-29  8:52       ` Qi Zheng
     [not found]   ` <CGME20240730153058eucas1p2319e4cc985dcdc6e98d08398c33fcfd3@eucas1p2.samsung.com>
2024-07-30 15:30     ` Marek Szyprowski
2024-07-30 15:45       ` David Hildenbrand
2024-07-30 15:49         ` David Hildenbrand
2024-07-30 16:08           ` Marek Szyprowski
2024-07-30 16:10             ` David Hildenbrand
2024-07-25 18:39 ` [PATCH v1 2/2] mm/hugetlb: fix hugetlb vs. core-mm PT locking David Hildenbrand
2024-07-26  2:33   ` Baolin Wang
2024-07-26  3:03     ` Baolin Wang
2024-07-26  8:04       ` David Hildenbrand
2024-07-26  8:04     ` David Hildenbrand
2024-07-26  9:38       ` Baolin Wang
2024-07-26 11:40         ` David Hildenbrand
2024-07-29  1:48           ` Baolin Wang
2024-07-26  8:18   ` Muchun Song
2024-07-26 15:26   ` Peter Xu
2024-07-26 15:32     ` David Hildenbrand
2024-07-29  4:51   ` Oscar Salvador
2024-07-25 20:41 ` [PATCH v1 0/2] " Andrew Morton
2024-07-26  9:19   ` David Hildenbrand
2024-07-26 14:45     ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a3206cfe-34c4-4b43-9399-40a715f5fbaa@bytedance.com \
    --to=zhengqi.arch@bytedance.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=muchun.song@linux.dev \
    --cc=osalvador@suse.de \
    --cc=peterx@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox