* [PATCH] mm/rmap: remove unnecessary page_table_lock
@ 2024-04-22 10:52 Yajun Deng
2024-04-22 11:24 ` Qi Zheng
[not found] ` <b848c431-deca-42e4-925c-673b3fa1f251@redhat.com>
0 siblings, 2 replies; 7+ messages in thread
From: Yajun Deng @ 2024-04-22 10:52 UTC (permalink / raw)
To: akpm; +Cc: linux-mm, linux-kernel, Yajun Deng
page_table_lock is a lock that for page table, we won't change page
table in __anon_vma_prepare(). As we can see, it works well in
anon_vma_clone(). They do the same operation.
Remove unnecessary page_table_lock in __anon_vma_prepare().
Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
---
mm/rmap.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/mm/rmap.c b/mm/rmap.c
index 2608c40dffad..e894640a9327 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -187,7 +187,6 @@ static void anon_vma_chain_link(struct vm_area_struct *vma,
*/
int __anon_vma_prepare(struct vm_area_struct *vma)
{
- struct mm_struct *mm = vma->vm_mm;
struct anon_vma *anon_vma, *allocated;
struct anon_vma_chain *avc;
@@ -208,8 +207,6 @@ int __anon_vma_prepare(struct vm_area_struct *vma)
}
anon_vma_lock_write(anon_vma);
- /* page_table_lock to protect against threads */
- spin_lock(&mm->page_table_lock);
if (likely(!vma->anon_vma)) {
vma->anon_vma = anon_vma;
anon_vma_chain_link(vma, avc, anon_vma);
@@ -217,7 +214,6 @@ int __anon_vma_prepare(struct vm_area_struct *vma)
allocated = NULL;
avc = NULL;
}
- spin_unlock(&mm->page_table_lock);
anon_vma_unlock_write(anon_vma);
if (unlikely(allocated))
--
2.25.1
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] mm/rmap: remove unnecessary page_table_lock
2024-04-22 10:52 [PATCH] mm/rmap: remove unnecessary page_table_lock Yajun Deng
@ 2024-04-22 11:24 ` Qi Zheng
[not found] ` <b848c431-deca-42e4-925c-673b3fa1f251@redhat.com>
1 sibling, 0 replies; 7+ messages in thread
From: Qi Zheng @ 2024-04-22 11:24 UTC (permalink / raw)
To: Yajun Deng; +Cc: akpm, linux-mm, linux-kernel
On 2024/4/22 18:52, Yajun Deng wrote:
> page_table_lock is a lock that for page table, we won't change page
> table in __anon_vma_prepare(). As we can see, it works well in
> anon_vma_clone(). They do the same operation.
>
> Remove unnecessary page_table_lock in __anon_vma_prepare().
IIUC, the page_table_lock here is not to protect page table, but as
mentioned in the comments, to prevent concurrent modification by
multiple threads.
>
> Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
> ---
> mm/rmap.c | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 2608c40dffad..e894640a9327 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -187,7 +187,6 @@ static void anon_vma_chain_link(struct vm_area_struct *vma,
> */
> int __anon_vma_prepare(struct vm_area_struct *vma)
> {
> - struct mm_struct *mm = vma->vm_mm;
> struct anon_vma *anon_vma, *allocated;
> struct anon_vma_chain *avc;
>
> @@ -208,8 +207,6 @@ int __anon_vma_prepare(struct vm_area_struct *vma)
> }
>
> anon_vma_lock_write(anon_vma);
> - /* page_table_lock to protect against threads */
> - spin_lock(&mm->page_table_lock);
> if (likely(!vma->anon_vma)) {
> vma->anon_vma = anon_vma;
> anon_vma_chain_link(vma, avc, anon_vma);
> @@ -217,7 +214,6 @@ int __anon_vma_prepare(struct vm_area_struct *vma)
> allocated = NULL;
> avc = NULL;
> }
> - spin_unlock(&mm->page_table_lock);
> anon_vma_unlock_write(anon_vma);
>
> if (unlikely(allocated))
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] mm/rmap: remove unnecessary page_table_lock
[not found] ` <b848c431-deca-42e4-925c-673b3fa1f251@redhat.com>
@ 2024-04-23 7:53 ` Yajun Deng
2024-04-23 8:18 ` David Hildenbrand
0 siblings, 1 reply; 7+ messages in thread
From: Yajun Deng @ 2024-04-23 7:53 UTC (permalink / raw)
To: David Hildenbrand, akpm; +Cc: linux-mm, linux-kernel
April 22, 2024 at 7:24 PM, "David Hildenbrand" <david@redhat.com> wrote:
>
> On 22.04.24 12:52, Yajun Deng wrote:
>
> >
> > page_table_lock is a lock that for page table, we won't change page
> >
> > table in __anon_vma_prepare(). As we can see, it works well in
> >
> > anon_vma_clone(). They do the same operation.
> >
>
> We are reusing mm->page_table_lock to serialize, not the *actual* low-level page table locks that really protect PTEs.
>
> With that locking gone, there would be nothing protection vma->anon_vma.
>
> Note that anon_vma_clone() is likely called with the mmap_lock held in write mode, which is not the case for __anon_vma_prepare() ...
Yes, anon_vma_clone() is called with the mmap_lock held. I added mmap_assert_write_locked(dst->vm_mm) to prove it.
I added mmap_assert_write_locked(vma->vm_mm) in __anon_vma_prepare() at the same time, it shows __anon_vma_prepare()
is also called with the mmap_lock held too.
>
> I think this change is wrong.
>
> -- Cheers,
>
> David / dhildenb
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] mm/rmap: remove unnecessary page_table_lock
2024-04-23 7:53 ` Yajun Deng
@ 2024-04-23 8:18 ` David Hildenbrand
2024-04-23 8:35 ` Yajun Deng
0 siblings, 1 reply; 7+ messages in thread
From: David Hildenbrand @ 2024-04-23 8:18 UTC (permalink / raw)
To: Yajun Deng, akpm; +Cc: linux-mm, linux-kernel
On 23.04.24 09:53, Yajun Deng wrote:
> April 22, 2024 at 7:24 PM, "David Hildenbrand" <david@redhat.com> wrote:
>
>
>
>>
>> On 22.04.24 12:52, Yajun Deng wrote:
>>
>>>
>>> page_table_lock is a lock that for page table, we won't change page
>>>
>>> table in __anon_vma_prepare(). As we can see, it works well in
>>>
>>> anon_vma_clone(). They do the same operation.
>>>
>>
>> We are reusing mm->page_table_lock to serialize, not the *actual* low-level page table locks that really protect PTEs.
>>
>> With that locking gone, there would be nothing protection vma->anon_vma.
>>
>> Note that anon_vma_clone() is likely called with the mmap_lock held in write mode, which is not the case for __anon_vma_prepare() ...
>
> Yes, anon_vma_clone() is called with the mmap_lock held. I added mmap_assert_write_locked(dst->vm_mm) to prove it.
> I added mmap_assert_write_locked(vma->vm_mm) in __anon_vma_prepare() at the same time, it shows __anon_vma_prepare()
> is also called with the mmap_lock held too.
Make sure you actually have lockdep built in and enabled.
__anon_vma_prepare() is for example called from do_anonymous_page()
where we might only hold the mmap_lock in read mode (or not at all IIRC
with VMA in read mode).
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] mm/rmap: remove unnecessary page_table_lock
2024-04-23 8:18 ` David Hildenbrand
@ 2024-04-23 8:35 ` Yajun Deng
2024-04-23 17:11 ` Liam R. Howlett
0 siblings, 1 reply; 7+ messages in thread
From: Yajun Deng @ 2024-04-23 8:35 UTC (permalink / raw)
To: David Hildenbrand, akpm; +Cc: linux-mm, linux-kernel
April 23, 2024 at 4:18 PM, "David Hildenbrand" <david@redhat.com> wrote:
>
> On 23.04.24 09:53, Yajun Deng wrote:
>
> >
> > April 22, 2024 at 7:24 PM, "David Hildenbrand" <david@redhat.com> wrote:
> >
> > > >>
> >
> > >
> > > On 22.04.24 12:52, Yajun Deng wrote:
> > >
> >
> > page_table_lock is a lock that for page table, we won't change page
> >
> > table in __anon_vma_prepare(). As we can see, it works well in
> >
> > anon_vma_clone(). They do the same operation.
> >
> > >
> > > We are reusing mm->page_table_lock to serialize, not the *actual* low-level page table locks that really protect PTEs.
> > >
> > > With that locking gone, there would be nothing protection vma->anon_vma.
> > >
> > > Note that anon_vma_clone() is likely called with the mmap_lock held in write mode, which is not the case for __anon_vma_prepare() ...
> > >
> >
> > Yes, anon_vma_clone() is called with the mmap_lock held. I added mmap_assert_write_locked(dst->vm_mm) to prove it.
> >
> > I added mmap_assert_write_locked(vma->vm_mm) in __anon_vma_prepare() at the same time, it shows __anon_vma_prepare()
> >
> > is also called with the mmap_lock held too.
> >
>
> Make sure you actually have lockdep built in and enabled.
>
This is my config.
CONFIG_LOCKDEP=n
CONFIG_DEBUG_VM=y
I did another test.
I put mmap_assert_write_locked(mm) before 'set_bit(MMF_OOM_SKIP, &mm->flags)' in mmap.c, it's outside the lock.
It will crash when on boot. I think mmap_assert_write_locked() works.
> __anon_vma_prepare() is for example called from do_anonymous_page() where we might only hold the mmap_lock in read mode (or not at all IIRC with VMA in read mode).
>
> -- Cheers,
>
> David / dhildenb
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] mm/rmap: remove unnecessary page_table_lock
2024-04-23 8:35 ` Yajun Deng
@ 2024-04-23 17:11 ` Liam R. Howlett
2024-04-24 3:04 ` Yajun Deng
0 siblings, 1 reply; 7+ messages in thread
From: Liam R. Howlett @ 2024-04-23 17:11 UTC (permalink / raw)
To: Yajun Deng; +Cc: David Hildenbrand, akpm, linux-mm, linux-kernel
* Yajun Deng <yajun.deng@linux.dev> [240423 04:35]:
> April 23, 2024 at 4:18 PM, "David Hildenbrand" <david@redhat.com> wrote:
>
>
>
> >
> > On 23.04.24 09:53, Yajun Deng wrote:
> >
> > >
> > > April 22, 2024 at 7:24 PM, "David Hildenbrand" <david@redhat.com> wrote:
> > >
> > > > >>
> > >
> > > >
> > > > On 22.04.24 12:52, Yajun Deng wrote:
> > > >
> > >
> > > page_table_lock is a lock that for page table, we won't change page
> > >
> > > table in __anon_vma_prepare(). As we can see, it works well in
> > >
> > > anon_vma_clone(). They do the same operation.
> > >
> > > >
> > > > We are reusing mm->page_table_lock to serialize, not the *actual* low-level page table locks that really protect PTEs.
> > > >
> > > > With that locking gone, there would be nothing protection vma->anon_vma.
> > > >
> > > > Note that anon_vma_clone() is likely called with the mmap_lock held in write mode, which is not the case for __anon_vma_prepare() ...
> > > >
> > >
> > > Yes, anon_vma_clone() is called with the mmap_lock held. I added mmap_assert_write_locked(dst->vm_mm) to prove it.
> > >
> > > I added mmap_assert_write_locked(vma->vm_mm) in __anon_vma_prepare() at the same time, it shows __anon_vma_prepare()
> > >
> > > is also called with the mmap_lock held too.
> > >
> >
> > Make sure you actually have lockdep built in and enabled.
> >
>
> This is my config.
> CONFIG_LOCKDEP=n
> CONFIG_DEBUG_VM=y
>
> I did another test.
> I put mmap_assert_write_locked(mm) before 'set_bit(MMF_OOM_SKIP, &mm->flags)' in mmap.c, it's outside the lock.
> It will crash when on boot. I think mmap_assert_write_locked() works.
If you are changing locks, then please test with lockdep on.
>
>
> > __anon_vma_prepare() is for example called from do_anonymous_page() where we might only hold the mmap_lock in read mode (or not at all IIRC with VMA in read mode).
Consider two concurrent readers getting to this function with the same
vma. There is no mergeable anon vma, so both create a new anon_vma.
You take the anon_vma lock to parallelize the linking to the vma - but
they are different locks because they are both new anon_vma structs
allocated by concurrent readers.
You now need a lock that you know cannot allow this to happen. Looking
at the top of mm/rmap.c and see which one works. The next in line is
the page table lock, so that one is used here.
What if we reverse the locks? We can deadlock.
What if we don't take the anon_vma lock? We can have two writers to the
anon_vma.
Thanks,
Liam
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] mm/rmap: remove unnecessary page_table_lock
2024-04-23 17:11 ` Liam R. Howlett
@ 2024-04-24 3:04 ` Yajun Deng
0 siblings, 0 replies; 7+ messages in thread
From: Yajun Deng @ 2024-04-24 3:04 UTC (permalink / raw)
To: Liam R. Howlett; +Cc: David Hildenbrand, akpm, linux-mm, linux-kernel
April 24, 2024 at 1:11 AM, "Liam R. Howlett" <Liam.Howlett@oracle.com> wrote:
>
> * Yajun Deng <yajun.deng@linux.dev> [240423 04:35]:
>
> >
> > April 23, 2024 at 4:18 PM, "David Hildenbrand" <david@redhat.com> wrote:
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > On 23.04.24 09:53, Yajun Deng wrote:
> >
> >
> >
> > >
> >
> > > April 22, 2024 at 7:24 PM, "David Hildenbrand" <david@redhat.com> wrote:
> >
> > >
> >
> > > > >>
> >
> > >
> >
> > > >
> >
> > > > On 22.04.24 12:52, Yajun Deng wrote:
> >
> > > >
> >
> > >
> >
> > > page_table_lock is a lock that for page table, we won't change page
> >
> > >
> >
> > > table in __anon_vma_prepare(). As we can see, it works well in
> >
> > >
> >
> > > anon_vma_clone(). They do the same operation.
> >
> > >
> >
> > > >
> >
> > > > We are reusing mm->page_table_lock to serialize, not the *actual* low-level page table locks that really protect PTEs.
> >
> > > >
> >
> > > > With that locking gone, there would be nothing protection vma->anon_vma.
> >
> > > >
> >
> > > > Note that anon_vma_clone() is likely called with the mmap_lock held in write mode, which is not the case for __anon_vma_prepare() ...
> >
> > > >
> >
> > >
> >
> > > Yes, anon_vma_clone() is called with the mmap_lock held. I added mmap_assert_write_locked(dst->vm_mm) to prove it.
> >
> > >
> >
> > > I added mmap_assert_write_locked(vma->vm_mm) in __anon_vma_prepare() at the same time, it shows __anon_vma_prepare()
> >
> > >
> >
> > > is also called with the mmap_lock held too.
> >
> > >
> >
> >
> >
> > Make sure you actually have lockdep built in and enabled.
> >
> >
> >
> >
> >
> > This is my config.
> >
> > CONFIG_LOCKDEP=n
> >
> > CONFIG_DEBUG_VM=y
> >
> >
> >
> > I did another test.
> >
> > I put mmap_assert_write_locked(mm) before 'set_bit(MMF_OOM_SKIP, &mm->flags)' in mmap.c, it's outside the lock.
> >
> > It will crash when on boot. I think mmap_assert_write_locked() works.
> >
>
> If you are changing locks, then please test with lockdep on.
>
It's my fault. It shows a warning with lockdep on.
> >
> > __anon_vma_prepare() is for example called from do_anonymous_page() where we might only hold the mmap_lock in read mode (or not at all IIRC with VMA in read mode).
> >
>
> Consider two concurrent readers getting to this function with the same
>
> vma. There is no mergeable anon vma, so both create a new anon_vma.
>
> You take the anon_vma lock to parallelize the linking to the vma - but
>
> they are different locks because they are both new anon_vma structs
>
> allocated by concurrent readers.
>
> You now need a lock that you know cannot allow this to happen. Looking
>
> at the top of mm/rmap.c and see which one works. The next in line is
>
> the page table lock, so that one is used here.
>
> What if we reverse the locks? We can deadlock.
>
> What if we don't take the anon_vma lock? We can have two writers to the
>
> anon_vma.
>
> Thanks,
>
> Liam
>
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-04-24 3:04 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-22 10:52 [PATCH] mm/rmap: remove unnecessary page_table_lock Yajun Deng
2024-04-22 11:24 ` Qi Zheng
[not found] ` <b848c431-deca-42e4-925c-673b3fa1f251@redhat.com>
2024-04-23 7:53 ` Yajun Deng
2024-04-23 8:18 ` David Hildenbrand
2024-04-23 8:35 ` Yajun Deng
2024-04-23 17:11 ` Liam R. Howlett
2024-04-24 3:04 ` Yajun Deng
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox