linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm: khugepaged: fix call hpage_collapse_scan_file() for anonymous vma
@ 2025-01-09  7:00 Liu Shixin
  2025-01-09 13:40 ` Matthew Wilcox
  2025-01-09 17:00 ` Yang Shi
  0 siblings, 2 replies; 12+ messages in thread
From: Liu Shixin @ 2025-01-09  7:00 UTC (permalink / raw)
  To: Andrew Morton, Chengming Zhou, Matthew Wilcox, Kefeng Wang,
	Nanyong Sun, Muchun Song, Qi Zheng, Johannes Weiner, Yang Shi
  Cc: linux-mm, linux-kernel, Liu Shixin

syzkaller reported such a BUG_ON():

 ------------[ cut here ]------------
 kernel BUG at mm/khugepaged.c:1835!
 Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
...
 CPU: 6 UID: 0 PID: 8009 Comm: syz.15.106 Kdump: loaded Tainted: G        W          6.13.0-rc6 #22
 Tainted: [W]=WARN
 Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
 pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : collapse_file+0xa44/0x1400
 lr : collapse_file+0x88/0x1400
 sp : ffff80008afe3a60
...
 Call trace:
  collapse_file+0xa44/0x1400 (P)
  hpage_collapse_scan_file+0x278/0x400
  madvise_collapse+0x1bc/0x678
  madvise_vma_behavior+0x32c/0x448
  madvise_walk_vmas.constprop.0+0xbc/0x140
  do_madvise.part.0+0xdc/0x2c8
  __arm64_sys_madvise+0x68/0x88
  invoke_syscall+0x50/0x120
  el0_svc_common.constprop.0+0xc8/0xf0
  do_el0_svc+0x24/0x38
  el0_svc+0x34/0x128
  el0t_64_sync_handler+0xc8/0xd0
  el0t_64_sync+0x190/0x198

This indicates that the pgoff is unaligned. After analysis, I confirm
the vma is mapped to /dev/zero. Such a vma certainly has vm_file, but
it is set to anonymous by mmap_zero(). So even if it's mmapped by
2m-unaligned, it can pass the check in thp_vma_allowable_order() as it
is an anonymous-mmap, but then be collapsed as a file-mmap.

It seems the problem has existed for a long time, but actually, since
we have khugepaged_max_ptes_none check before, we will skip collapse it
as it is /dev/zero and so has no present page. But commit d8ea7cc8547c
limit the check for only khugepaged, so the BUG_ON() can be triggered
by madvise_collapse().

Add vma_is_anonymous() check to make such vma be processed by
hpage_collapse_scan_pmd().

Fixes: d8ea7cc8547c ("mm/khugepaged: add flag to predicate khugepaged-only behavior")
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
---
 mm/khugepaged.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 653dbb1ff05c..eb9d240e42e8 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2422,7 +2422,8 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 			VM_BUG_ON(khugepaged_scan.address < hstart ||
 				  khugepaged_scan.address + HPAGE_PMD_SIZE >
 				  hend);
-			if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
+			if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file &&
+			    !vma_is_anonymous(vma)) {
 				struct file *file = get_file(vma->vm_file);
 				pgoff_t pgoff = linear_page_index(vma,
 						khugepaged_scan.address);
@@ -2768,7 +2769,8 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
 		mmap_assert_locked(mm);
 		memset(cc->node_load, 0, sizeof(cc->node_load));
 		nodes_clear(cc->alloc_nmask);
-		if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
+		if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file &&
+		    !vma_is_anonymous(vma)) {
 			struct file *file = get_file(vma->vm_file);
 			pgoff_t pgoff = linear_page_index(vma, addr);
 
-- 
2.34.1



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm: khugepaged: fix call hpage_collapse_scan_file() for anonymous vma
  2025-01-09  7:00 [PATCH] mm: khugepaged: fix call hpage_collapse_scan_file() for anonymous vma Liu Shixin
@ 2025-01-09 13:40 ` Matthew Wilcox
  2025-01-10  2:29   ` Liu Shixin
  2025-01-09 17:00 ` Yang Shi
  1 sibling, 1 reply; 12+ messages in thread
From: Matthew Wilcox @ 2025-01-09 13:40 UTC (permalink / raw)
  To: Liu Shixin
  Cc: Andrew Morton, Chengming Zhou, Kefeng Wang, Nanyong Sun,
	Muchun Song, Qi Zheng, Johannes Weiner, Yang Shi, linux-mm,
	linux-kernel

On Thu, Jan 09, 2025 at 03:00:59PM +0800, Liu Shixin wrote:
> Add vma_is_anonymous() check to make such vma be processed by
> hpage_collapse_scan_pmd().

Wouldn't it be better to replace the vm_file check with
vma_is_anonymous()?  ie:

> -			if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
> +			if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file &&
> +			    !vma_is_anonymous(vma)) {

+			if (IS_ENABLED(CONFIG_SHMEM) &&
+			    !vma_is_anonymous(vma)) {



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm: khugepaged: fix call hpage_collapse_scan_file() for anonymous vma
  2025-01-09  7:00 [PATCH] mm: khugepaged: fix call hpage_collapse_scan_file() for anonymous vma Liu Shixin
  2025-01-09 13:40 ` Matthew Wilcox
@ 2025-01-09 17:00 ` Yang Shi
  2025-01-10  2:08   ` Baolin Wang
                     ` (2 more replies)
  1 sibling, 3 replies; 12+ messages in thread
From: Yang Shi @ 2025-01-09 17:00 UTC (permalink / raw)
  To: Liu Shixin, Andrew Morton, Chengming Zhou, Matthew Wilcox,
	Kefeng Wang, Nanyong Sun, Muchun Song, Qi Zheng, Johannes Weiner
  Cc: linux-mm, linux-kernel




On 1/8/25 11:00 PM, Liu Shixin wrote:
> syzkaller reported such a BUG_ON():
>
>   ------------[ cut here ]------------
>   kernel BUG at mm/khugepaged.c:1835!
>   Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
> ...
>   CPU: 6 UID: 0 PID: 8009 Comm: syz.15.106 Kdump: loaded Tainted: G        W          6.13.0-rc6 #22
>   Tainted: [W]=WARN
>   Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
>   pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>   pc : collapse_file+0xa44/0x1400
>   lr : collapse_file+0x88/0x1400
>   sp : ffff80008afe3a60
> ...
>   Call trace:
>    collapse_file+0xa44/0x1400 (P)
>    hpage_collapse_scan_file+0x278/0x400
>    madvise_collapse+0x1bc/0x678
>    madvise_vma_behavior+0x32c/0x448
>    madvise_walk_vmas.constprop.0+0xbc/0x140
>    do_madvise.part.0+0xdc/0x2c8
>    __arm64_sys_madvise+0x68/0x88
>    invoke_syscall+0x50/0x120
>    el0_svc_common.constprop.0+0xc8/0xf0
>    do_el0_svc+0x24/0x38
>    el0_svc+0x34/0x128
>    el0t_64_sync_handler+0xc8/0xd0
>    el0t_64_sync+0x190/0x198
>
> This indicates that the pgoff is unaligned. After analysis, I confirm
> the vma is mapped to /dev/zero. Such a vma certainly has vm_file, but
> it is set to anonymous by mmap_zero(). So even if it's mmapped by
> 2m-unaligned, it can pass the check in thp_vma_allowable_order() as it
> is an anonymous-mmap, but then be collapsed as a file-mmap.
>
> It seems the problem has existed for a long time, but actually, since
> we have khugepaged_max_ptes_none check before, we will skip collapse it
> as it is /dev/zero and so has no present page. But commit d8ea7cc8547c
> limit the check for only khugepaged, so the BUG_ON() can be triggered
> by madvise_collapse().
>
> Add vma_is_anonymous() check to make such vma be processed by
> hpage_collapse_scan_pmd().
>
> Fixes: d8ea7cc8547c ("mm/khugepaged: add flag to predicate khugepaged-only behavior")
> Signed-off-by: Liu Shixin <liushixin2@huawei.com>
> ---
>   mm/khugepaged.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 653dbb1ff05c..eb9d240e42e8 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2422,7 +2422,8 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
>   			VM_BUG_ON(khugepaged_scan.address < hstart ||
>   				  khugepaged_scan.address + HPAGE_PMD_SIZE >
>   				  hend);
> -			if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
> +			if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file &&
> +			    !vma_is_anonymous(vma)) {

Thanks for catching this. It sounds a little bit weird to have vm_file 
for an anonymous VMA. I'm not sure why we should keep such special case. 
It seems shared mapping is treated as shmem file mapping. So can we set 
vm_file to NULL when mmap'ing /dev/zero for private mapping? Something like:

diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index 169eed162a7f..fc332efc5c11 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -527,6 +527,7 @@ static int mmap_zero(struct file *file, struct 
vm_area_struct *vma)
         if (vma->vm_flags & VM_SHARED)
                 return shmem_zero_setup(vma);
         vma_set_anonymous(vma);
+       vma->vm_file = NULL;
         return 0;
  }


>   				struct file *file = get_file(vma->vm_file);
>   				pgoff_t pgoff = linear_page_index(vma,
>   						khugepaged_scan.address);
> @@ -2768,7 +2769,8 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
>   		mmap_assert_locked(mm);
>   		memset(cc->node_load, 0, sizeof(cc->node_load));
>   		nodes_clear(cc->alloc_nmask);
> -		if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
> +		if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file &&
> +		    !vma_is_anonymous(vma)) {
>   			struct file *file = get_file(vma->vm_file);
>   			pgoff_t pgoff = linear_page_index(vma, addr);
>   



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm: khugepaged: fix call hpage_collapse_scan_file() for anonymous vma
  2025-01-09 17:00 ` Yang Shi
@ 2025-01-10  2:08   ` Baolin Wang
  2025-01-10  2:32   ` Liu Shixin
  2025-01-10  4:31   ` Matthew Wilcox
  2 siblings, 0 replies; 12+ messages in thread
From: Baolin Wang @ 2025-01-10  2:08 UTC (permalink / raw)
  To: Yang Shi, Liu Shixin, Andrew Morton, Chengming Zhou,
	Matthew Wilcox, Kefeng Wang, Nanyong Sun, Muchun Song, Qi Zheng,
	Johannes Weiner
  Cc: linux-mm, linux-kernel



On 2025/1/10 01:00, Yang Shi wrote:
> 
> 
> 
> On 1/8/25 11:00 PM, Liu Shixin wrote:
>> syzkaller reported such a BUG_ON():
>>
>>   ------------[ cut here ]------------
>>   kernel BUG at mm/khugepaged.c:1835!
>>   Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
>> ...
>>   CPU: 6 UID: 0 PID: 8009 Comm: syz.15.106 Kdump: loaded Tainted: 
>> G        W          6.13.0-rc6 #22
>>   Tainted: [W]=WARN
>>   Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
>>   pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>   pc : collapse_file+0xa44/0x1400
>>   lr : collapse_file+0x88/0x1400
>>   sp : ffff80008afe3a60
>> ...
>>   Call trace:
>>    collapse_file+0xa44/0x1400 (P)
>>    hpage_collapse_scan_file+0x278/0x400
>>    madvise_collapse+0x1bc/0x678
>>    madvise_vma_behavior+0x32c/0x448
>>    madvise_walk_vmas.constprop.0+0xbc/0x140
>>    do_madvise.part.0+0xdc/0x2c8
>>    __arm64_sys_madvise+0x68/0x88
>>    invoke_syscall+0x50/0x120
>>    el0_svc_common.constprop.0+0xc8/0xf0
>>    do_el0_svc+0x24/0x38
>>    el0_svc+0x34/0x128
>>    el0t_64_sync_handler+0xc8/0xd0
>>    el0t_64_sync+0x190/0x198
>>
>> This indicates that the pgoff is unaligned. After analysis, I confirm
>> the vma is mapped to /dev/zero. Such a vma certainly has vm_file, but
>> it is set to anonymous by mmap_zero(). So even if it's mmapped by
>> 2m-unaligned, it can pass the check in thp_vma_allowable_order() as it
>> is an anonymous-mmap, but then be collapsed as a file-mmap.
>>
>> It seems the problem has existed for a long time, but actually, since
>> we have khugepaged_max_ptes_none check before, we will skip collapse it
>> as it is /dev/zero and so has no present page. But commit d8ea7cc8547c
>> limit the check for only khugepaged, so the BUG_ON() can be triggered
>> by madvise_collapse().
>>
>> Add vma_is_anonymous() check to make such vma be processed by
>> hpage_collapse_scan_pmd().
>>
>> Fixes: d8ea7cc8547c ("mm/khugepaged: add flag to predicate 
>> khugepaged-only behavior")
>> Signed-off-by: Liu Shixin <liushixin2@huawei.com>
>> ---
>>   mm/khugepaged.c | 6 ++++--
>>   1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index 653dbb1ff05c..eb9d240e42e8 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -2422,7 +2422,8 @@ static unsigned int 
>> khugepaged_scan_mm_slot(unsigned int pages, int *result,
>>               VM_BUG_ON(khugepaged_scan.address < hstart ||
>>                     khugepaged_scan.address + HPAGE_PMD_SIZE >
>>                     hend);
>> -            if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
>> +            if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file &&
>> +                !vma_is_anonymous(vma)) {
> 
> Thanks for catching this. It sounds a little bit weird to have vm_file 
> for an anonymous VMA. I'm not sure why we should keep such special case. 
> It seems shared mapping is treated as shmem file mapping. So can we set 
> vm_file to NULL when mmap'ing /dev/zero for private mapping? Something 
> like:
> 
> diff --git a/drivers/char/mem.c b/drivers/char/mem.c
> index 169eed162a7f..fc332efc5c11 100644
> --- a/drivers/char/mem.c
> +++ b/drivers/char/mem.c
> @@ -527,6 +527,7 @@ static int mmap_zero(struct file *file, struct 
> vm_area_struct *vma)
>          if (vma->vm_flags & VM_SHARED)
>                  return shmem_zero_setup(vma);
>          vma_set_anonymous(vma);
> +       vma->vm_file = NULL;
>          return 0;
>   }

Yes, this makes more sense to me.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm: khugepaged: fix call hpage_collapse_scan_file() for anonymous vma
  2025-01-09 13:40 ` Matthew Wilcox
@ 2025-01-10  2:29   ` Liu Shixin
  0 siblings, 0 replies; 12+ messages in thread
From: Liu Shixin @ 2025-01-10  2:29 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, Chengming Zhou, Kefeng Wang, Nanyong Sun,
	Muchun Song, Qi Zheng, Johannes Weiner, Yang Shi, linux-mm,
	linux-kernel



On 2025/1/9 21:40, Matthew Wilcox wrote:
> On Thu, Jan 09, 2025 at 03:00:59PM +0800, Liu Shixin wrote:
>> Add vma_is_anonymous() check to make such vma be processed by
>> hpage_collapse_scan_pmd().
> Wouldn't it be better to replace the vm_file check with
> vma_is_anonymous()?  ie:
>
>> -			if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
>> +			if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file &&
>> +			    !vma_is_anonymous(vma)) {
> +			if (IS_ENABLED(CONFIG_SHMEM) &&
> +			    !vma_is_anonymous(vma)) {
>
>
> .
>
Thanks, replace is better.

Yang Shi suggest me to set vm_file to NULL when mmap /dev/zero as private.
It seems only /dev/zero is special which has vm_file but no vm_ops.
This way may solve more hidden similar problems, so I prefer it now.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm: khugepaged: fix call hpage_collapse_scan_file() for anonymous vma
  2025-01-09 17:00 ` Yang Shi
  2025-01-10  2:08   ` Baolin Wang
@ 2025-01-10  2:32   ` Liu Shixin
  2025-01-10  4:31   ` Matthew Wilcox
  2 siblings, 0 replies; 12+ messages in thread
From: Liu Shixin @ 2025-01-10  2:32 UTC (permalink / raw)
  To: Yang Shi, Andrew Morton, Chengming Zhou, Matthew Wilcox,
	Kefeng Wang, Nanyong Sun, Muchun Song, Qi Zheng, Johannes Weiner
  Cc: linux-mm, linux-kernel



On 2025/1/10 1:00, Yang Shi wrote:
>
>
>
> On 1/8/25 11:00 PM, Liu Shixin wrote:
>> syzkaller reported such a BUG_ON():
>>
>>   ------------[ cut here ]------------
>>   kernel BUG at mm/khugepaged.c:1835!
>>   Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
>> ...
>>   CPU: 6 UID: 0 PID: 8009 Comm: syz.15.106 Kdump: loaded Tainted: G        W          6.13.0-rc6 #22
>>   Tainted: [W]=WARN
>>   Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
>>   pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>   pc : collapse_file+0xa44/0x1400
>>   lr : collapse_file+0x88/0x1400
>>   sp : ffff80008afe3a60
>> ...
>>   Call trace:
>>    collapse_file+0xa44/0x1400 (P)
>>    hpage_collapse_scan_file+0x278/0x400
>>    madvise_collapse+0x1bc/0x678
>>    madvise_vma_behavior+0x32c/0x448
>>    madvise_walk_vmas.constprop.0+0xbc/0x140
>>    do_madvise.part.0+0xdc/0x2c8
>>    __arm64_sys_madvise+0x68/0x88
>>    invoke_syscall+0x50/0x120
>>    el0_svc_common.constprop.0+0xc8/0xf0
>>    do_el0_svc+0x24/0x38
>>    el0_svc+0x34/0x128
>>    el0t_64_sync_handler+0xc8/0xd0
>>    el0t_64_sync+0x190/0x198
>>
>> This indicates that the pgoff is unaligned. After analysis, I confirm
>> the vma is mapped to /dev/zero. Such a vma certainly has vm_file, but
>> it is set to anonymous by mmap_zero(). So even if it's mmapped by
>> 2m-unaligned, it can pass the check in thp_vma_allowable_order() as it
>> is an anonymous-mmap, but then be collapsed as a file-mmap.
>>
>> It seems the problem has existed for a long time, but actually, since
>> we have khugepaged_max_ptes_none check before, we will skip collapse it
>> as it is /dev/zero and so has no present page. But commit d8ea7cc8547c
>> limit the check for only khugepaged, so the BUG_ON() can be triggered
>> by madvise_collapse().
>>
>> Add vma_is_anonymous() check to make such vma be processed by
>> hpage_collapse_scan_pmd().
>>
>> Fixes: d8ea7cc8547c ("mm/khugepaged: add flag to predicate khugepaged-only behavior")
>> Signed-off-by: Liu Shixin <liushixin2@huawei.com>
>> ---
>>   mm/khugepaged.c | 6 ++++--
>>   1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index 653dbb1ff05c..eb9d240e42e8 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -2422,7 +2422,8 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
>>               VM_BUG_ON(khugepaged_scan.address < hstart ||
>>                     khugepaged_scan.address + HPAGE_PMD_SIZE >
>>                     hend);
>> -            if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
>> +            if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file &&
>> +                !vma_is_anonymous(vma)) {
>
> Thanks for catching this. It sounds a little bit weird to have vm_file for an anonymous VMA. I'm not sure why we should keep such special case. It seems shared mapping is treated as shmem file mapping. So can we set vm_file to NULL when mmap'ing /dev/zero for private mapping? Something like:
>
> diff --git a/drivers/char/mem.c b/drivers/char/mem.c
> index 169eed162a7f..fc332efc5c11 100644
> --- a/drivers/char/mem.c
> +++ b/drivers/char/mem.c
> @@ -527,6 +527,7 @@ static int mmap_zero(struct file *file, struct vm_area_struct *vma)
>         if (vma->vm_flags & VM_SHARED)
>                 return shmem_zero_setup(vma);
>         vma_set_anonymous(vma);
> +       vma->vm_file = NULL;
>         return 0;
>  }
>
Thanks for your advise, I'll look at it again.




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm: khugepaged: fix call hpage_collapse_scan_file() for anonymous vma
  2025-01-09 17:00 ` Yang Shi
  2025-01-10  2:08   ` Baolin Wang
  2025-01-10  2:32   ` Liu Shixin
@ 2025-01-10  4:31   ` Matthew Wilcox
  2025-01-10 18:04     ` Yang Shi
  2 siblings, 1 reply; 12+ messages in thread
From: Matthew Wilcox @ 2025-01-10  4:31 UTC (permalink / raw)
  To: Yang Shi
  Cc: Liu Shixin, Andrew Morton, Chengming Zhou, Kefeng Wang,
	Nanyong Sun, Muchun Song, Qi Zheng, Johannes Weiner, linux-mm,
	linux-kernel

On Thu, Jan 09, 2025 at 09:00:24AM -0800, Yang Shi wrote:
> Thanks for catching this. It sounds a little bit weird to have vm_file for
> an anonymous VMA. I'm not sure why we should keep such special case. It
> seems shared mapping is treated as shmem file mapping. So can we set vm_file
> to NULL when mmap'ing /dev/zero for private mapping? Something like:
> 
> diff --git a/drivers/char/mem.c b/drivers/char/mem.c
> index 169eed162a7f..fc332efc5c11 100644
> --- a/drivers/char/mem.c
> +++ b/drivers/char/mem.c
> @@ -527,6 +527,7 @@ static int mmap_zero(struct file *file, struct
> vm_area_struct *vma)
>         if (vma->vm_flags & VM_SHARED)
>                 return shmem_zero_setup(vma);
>         vma_set_anonymous(vma);
> +       vma->vm_file = NULL;
>         return 0;
>  }

I'm wary this might cause other bugs somewhere.  rc6 is a bit late to be
introducing such a subtle change.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm: khugepaged: fix call hpage_collapse_scan_file() for anonymous vma
  2025-01-10  4:31   ` Matthew Wilcox
@ 2025-01-10 18:04     ` Yang Shi
  2025-01-10 19:01       ` Matthew Wilcox
  0 siblings, 1 reply; 12+ messages in thread
From: Yang Shi @ 2025-01-10 18:04 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Liu Shixin, Andrew Morton, Chengming Zhou, Kefeng Wang,
	Nanyong Sun, Muchun Song, Qi Zheng, Johannes Weiner, linux-mm,
	linux-kernel




On 1/9/25 8:31 PM, Matthew Wilcox wrote:
> On Thu, Jan 09, 2025 at 09:00:24AM -0800, Yang Shi wrote:
>> Thanks for catching this. It sounds a little bit weird to have vm_file for
>> an anonymous VMA. I'm not sure why we should keep such special case. It
>> seems shared mapping is treated as shmem file mapping. So can we set vm_file
>> to NULL when mmap'ing /dev/zero for private mapping? Something like:
>>
>> diff --git a/drivers/char/mem.c b/drivers/char/mem.c
>> index 169eed162a7f..fc332efc5c11 100644
>> --- a/drivers/char/mem.c
>> +++ b/drivers/char/mem.c
>> @@ -527,6 +527,7 @@ static int mmap_zero(struct file *file, struct
>> vm_area_struct *vma)
>>          if (vma->vm_flags & VM_SHARED)
>>                  return shmem_zero_setup(vma);
>>          vma_set_anonymous(vma);
>> +       vma->vm_file = NULL;
>>          return 0;
>>   }
> I'm wary this might cause other bugs somewhere.  rc6 is a bit late to be
> introducing such a subtle change.

Thanks for the extra caution. Applying the proposed fix in khugepaged 
code is fine to me either. We can try to kill the special case later.

Looking at the code further, I think we should do more to make private 
/dev/zero mapping an anonymous mapping:

diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index 169eed162a7f..98cfac2bb01f 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -527,6 +527,10 @@ static int mmap_zero(struct file *file, struct 
vm_area_struct *vma)
         if (vma->vm_flags & VM_SHARED)
                 return shmem_zero_setup(vma);
         vma_set_anonymous(vma);
+       fput(vma->vm_file);
+       vma->vm_file = NULL;
+       vma->vm_pgoff = vma->vm_start >> PAGE_SHIFT;
+
         return 0;
  }


AFAICT, the user visible effect is we will have different entry in 
smaps/maps.

Before the change:
ffffb7190000-ffffb7590000 rw-p 00001000 00:06 8                          
/dev/zero
Size:               4096 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   4 kB
Pss_Dirty:             4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac

After the change:
ffffb6130000-ffffb6530000 rw-p 00000000 00:00 0
Size:               4096 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   4 kB
Pss_Dirty:             4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac

I'm not sure who really cares about the difference.




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm: khugepaged: fix call hpage_collapse_scan_file() for anonymous vma
  2025-01-10 18:04     ` Yang Shi
@ 2025-01-10 19:01       ` Matthew Wilcox
  2025-01-10 19:40         ` Yang Shi
  0 siblings, 1 reply; 12+ messages in thread
From: Matthew Wilcox @ 2025-01-10 19:01 UTC (permalink / raw)
  To: Yang Shi
  Cc: Liu Shixin, Andrew Morton, Chengming Zhou, Kefeng Wang,
	Nanyong Sun, Muchun Song, Qi Zheng, Johannes Weiner, linux-mm,
	linux-kernel

On Fri, Jan 10, 2025 at 10:04:42AM -0800, Yang Shi wrote:
> On 1/9/25 8:31 PM, Matthew Wilcox wrote:
> > On Thu, Jan 09, 2025 at 09:00:24AM -0800, Yang Shi wrote:
> > > Thanks for catching this. It sounds a little bit weird to have vm_file for
> > > an anonymous VMA. I'm not sure why we should keep such special case. It
> > > seems shared mapping is treated as shmem file mapping. So can we set vm_file
> > > to NULL when mmap'ing /dev/zero for private mapping? Something like:
> > > 
> > > diff --git a/drivers/char/mem.c b/drivers/char/mem.c
> > > index 169eed162a7f..fc332efc5c11 100644
> > > --- a/drivers/char/mem.c
> > > +++ b/drivers/char/mem.c
> > > @@ -527,6 +527,7 @@ static int mmap_zero(struct file *file, struct
> > > vm_area_struct *vma)
> > >          if (vma->vm_flags & VM_SHARED)
> > >                  return shmem_zero_setup(vma);
> > >          vma_set_anonymous(vma);
> > > +       vma->vm_file = NULL;
> > >          return 0;
> > >   }
> > I'm wary this might cause other bugs somewhere.  rc6 is a bit late to be
> > introducing such a subtle change.
> 
> Thanks for the extra caution. Applying the proposed fix in khugepaged code
> is fine to me either. We can try to kill the special case later.
> 
> Looking at the code further, I think we should do more to make private
> /dev/zero mapping an anonymous mapping:

I'm still nervous about this.  We map device inodes in a lot of places.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm: khugepaged: fix call hpage_collapse_scan_file() for anonymous vma
  2025-01-10 19:01       ` Matthew Wilcox
@ 2025-01-10 19:40         ` Yang Shi
  2025-01-11  3:54           ` Liu Shixin
  0 siblings, 1 reply; 12+ messages in thread
From: Yang Shi @ 2025-01-10 19:40 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Liu Shixin, Andrew Morton, Chengming Zhou, Kefeng Wang,
	Nanyong Sun, Muchun Song, Qi Zheng, Johannes Weiner, linux-mm,
	linux-kernel




On 1/10/25 11:01 AM, Matthew Wilcox wrote:
> On Fri, Jan 10, 2025 at 10:04:42AM -0800, Yang Shi wrote:
>> On 1/9/25 8:31 PM, Matthew Wilcox wrote:
>>> On Thu, Jan 09, 2025 at 09:00:24AM -0800, Yang Shi wrote:
>>>> Thanks for catching this. It sounds a little bit weird to have vm_file for
>>>> an anonymous VMA. I'm not sure why we should keep such special case. It
>>>> seems shared mapping is treated as shmem file mapping. So can we set vm_file
>>>> to NULL when mmap'ing /dev/zero for private mapping? Something like:
>>>>
>>>> diff --git a/drivers/char/mem.c b/drivers/char/mem.c
>>>> index 169eed162a7f..fc332efc5c11 100644
>>>> --- a/drivers/char/mem.c
>>>> +++ b/drivers/char/mem.c
>>>> @@ -527,6 +527,7 @@ static int mmap_zero(struct file *file, struct
>>>> vm_area_struct *vma)
>>>>           if (vma->vm_flags & VM_SHARED)
>>>>                   return shmem_zero_setup(vma);
>>>>           vma_set_anonymous(vma);
>>>> +       vma->vm_file = NULL;
>>>>           return 0;
>>>>    }
>>> I'm wary this might cause other bugs somewhere.  rc6 is a bit late to be
>>> introducing such a subtle change.
>> Thanks for the extra caution. Applying the proposed fix in khugepaged code
>> is fine to me either. We can try to kill the special case later.
>>
>> Looking at the code further, I think we should do more to make private
>> /dev/zero mapping an anonymous mapping:
> I'm still nervous about this.  We map device inodes in a lot of places.

Yes, we do. But I don't think this change actually changes the semantic 
of /dev/zero. Shared /dev/zero mapping is still treated as shmem 
mapping, private /dev/zero mapping is treated as anonymous mapping, but 
the current implementation is actually half-baked. It has NULL 
vma->vm_ops which is used to tell kernel whether it is an anonymous vma 
or not, but it also has valid vma->vm_file and vma->vm_pgoff as in file 
offset.

So this special case makes kernel has 3 types of VMA:
     - anonymous VMA: vm_ops is NULL, vm_file is NULL, vm_pgoff is the 
linear address pgoff
     - file VMA: vm_ops is *NOT* NULL, valid vm_file and vm_pgoff is 
index in file
     - private /dev/zero mapping VMA




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm: khugepaged: fix call hpage_collapse_scan_file() for anonymous vma
  2025-01-10 19:40         ` Yang Shi
@ 2025-01-11  3:54           ` Liu Shixin
  2025-01-13 18:51             ` Yang Shi
  0 siblings, 1 reply; 12+ messages in thread
From: Liu Shixin @ 2025-01-11  3:54 UTC (permalink / raw)
  To: Yang Shi, Matthew Wilcox
  Cc: Andrew Morton, Chengming Zhou, Kefeng Wang, Nanyong Sun,
	Muchun Song, Qi Zheng, Johannes Weiner, linux-mm, linux-kernel



On 2025/1/11 3:40, Yang Shi wrote:
>
>
>
> On 1/10/25 11:01 AM, Matthew Wilcox wrote:
>> On Fri, Jan 10, 2025 at 10:04:42AM -0800, Yang Shi wrote:
>>> On 1/9/25 8:31 PM, Matthew Wilcox wrote:
>>>> On Thu, Jan 09, 2025 at 09:00:24AM -0800, Yang Shi wrote:
>>>>> Thanks for catching this. It sounds a little bit weird to have vm_file for
>>>>> an anonymous VMA. I'm not sure why we should keep such special case. It
>>>>> seems shared mapping is treated as shmem file mapping. So can we set vm_file
>>>>> to NULL when mmap'ing /dev/zero for private mapping? Something like:
>>>>>
>>>>> diff --git a/drivers/char/mem.c b/drivers/char/mem.c
>>>>> index 169eed162a7f..fc332efc5c11 100644
>>>>> --- a/drivers/char/mem.c
>>>>> +++ b/drivers/char/mem.c
>>>>> @@ -527,6 +527,7 @@ static int mmap_zero(struct file *file, struct
>>>>> vm_area_struct *vma)
>>>>>           if (vma->vm_flags & VM_SHARED)
>>>>>                   return shmem_zero_setup(vma);
>>>>>           vma_set_anonymous(vma);
>>>>> +       vma->vm_file = NULL;
>>>>>           return 0;
>>>>>    }
>>>> I'm wary this might cause other bugs somewhere.  rc6 is a bit late to be
>>>> introducing such a subtle change.
>>> Thanks for the extra caution. Applying the proposed fix in khugepaged code
>>> is fine to me either. We can try to kill the special case later.
>>>
>>> Looking at the code further, I think we should do more to make private
>>> /dev/zero mapping an anonymous mapping:
>> I'm still nervous about this.  We map device inodes in a lot of places.
>
> Yes, we do. But I don't think this change actually changes the semantic of /dev/zero. Shared /dev/zero mapping is still treated as shmem mapping, private /dev/zero mapping is treated as anonymous mapping, but the current implementation is actually half-baked. It has NULL vma->vm_ops which is used to tell kernel whether it is an anonymous vma or not, but it also has valid vma->vm_file and vma->vm_pgoff as in file offset.
>
> So this special case makes kernel has 3 types of VMA:
>     - anonymous VMA: vm_ops is NULL, vm_file is NULL, vm_pgoff is the linear address pgoff
>     - file VMA: vm_ops is *NOT* NULL, valid vm_file and vm_pgoff is index in file
>     - private /dev/zero mapping VMA
>
I have posted v2 to fix it in a safe way. Link: https://lore.kernel.org/all/20250111034511.2223353-1-liushixin2@huawei.com/

Maybe we can also revisit commit bfd40eaff5ab ("mm: fix vma_is_anonymous() false-positives") and fix it by another way?

By the way, it seems we collpase the file even after cow for a private file mapping. Is that so?

Thanks,
>
> .
>



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm: khugepaged: fix call hpage_collapse_scan_file() for anonymous vma
  2025-01-11  3:54           ` Liu Shixin
@ 2025-01-13 18:51             ` Yang Shi
  0 siblings, 0 replies; 12+ messages in thread
From: Yang Shi @ 2025-01-13 18:51 UTC (permalink / raw)
  To: Liu Shixin, Matthew Wilcox
  Cc: Andrew Morton, Chengming Zhou, Kefeng Wang, Nanyong Sun,
	Muchun Song, Qi Zheng, Johannes Weiner, linux-mm, linux-kernel




On 1/10/25 7:54 PM, Liu Shixin wrote:
>
> On 2025/1/11 3:40, Yang Shi wrote:
>>
>>
>> On 1/10/25 11:01 AM, Matthew Wilcox wrote:
>>> On Fri, Jan 10, 2025 at 10:04:42AM -0800, Yang Shi wrote:
>>>> On 1/9/25 8:31 PM, Matthew Wilcox wrote:
>>>>> On Thu, Jan 09, 2025 at 09:00:24AM -0800, Yang Shi wrote:
>>>>>> Thanks for catching this. It sounds a little bit weird to have vm_file for
>>>>>> an anonymous VMA. I'm not sure why we should keep such special case. It
>>>>>> seems shared mapping is treated as shmem file mapping. So can we set vm_file
>>>>>> to NULL when mmap'ing /dev/zero for private mapping? Something like:
>>>>>>
>>>>>> diff --git a/drivers/char/mem.c b/drivers/char/mem.c
>>>>>> index 169eed162a7f..fc332efc5c11 100644
>>>>>> --- a/drivers/char/mem.c
>>>>>> +++ b/drivers/char/mem.c
>>>>>> @@ -527,6 +527,7 @@ static int mmap_zero(struct file *file, struct
>>>>>> vm_area_struct *vma)
>>>>>>            if (vma->vm_flags & VM_SHARED)
>>>>>>                    return shmem_zero_setup(vma);
>>>>>>            vma_set_anonymous(vma);
>>>>>> +       vma->vm_file = NULL;
>>>>>>            return 0;
>>>>>>     }
>>>>> I'm wary this might cause other bugs somewhere.  rc6 is a bit late to be
>>>>> introducing such a subtle change.
>>>> Thanks for the extra caution. Applying the proposed fix in khugepaged code
>>>> is fine to me either. We can try to kill the special case later.
>>>>
>>>> Looking at the code further, I think we should do more to make private
>>>> /dev/zero mapping an anonymous mapping:
>>> I'm still nervous about this.  We map device inodes in a lot of places.
>> Yes, we do. But I don't think this change actually changes the semantic of /dev/zero. Shared /dev/zero mapping is still treated as shmem mapping, private /dev/zero mapping is treated as anonymous mapping, but the current implementation is actually half-baked. It has NULL vma->vm_ops which is used to tell kernel whether it is an anonymous vma or not, but it also has valid vma->vm_file and vma->vm_pgoff as in file offset.
>>
>> So this special case makes kernel has 3 types of VMA:
>>      - anonymous VMA: vm_ops is NULL, vm_file is NULL, vm_pgoff is the linear address pgoff
>>      - file VMA: vm_ops is *NOT* NULL, valid vm_file and vm_pgoff is index in file
>>      - private /dev/zero mapping VMA
>>
> I have posted v2 to fix it in a safe way. Link: https://lore.kernel.org/all/20250111034511.2223353-1-liushixin2@huawei.com/
>
> Maybe we can also revisit commit bfd40eaff5ab ("mm: fix vma_is_anonymous() false-positives") and fix it by another way?

What do you mean about revisiting this commit? Reset vm_file and 
recalculate vm_pgoff in vma_set_anonymous()?

>
> By the way, it seems we collpase the file even after cow for a private file mapping. Is that so?

It seems so.

>
> Thanks,
>> .
>>



^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2025-01-13 18:51 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-01-09  7:00 [PATCH] mm: khugepaged: fix call hpage_collapse_scan_file() for anonymous vma Liu Shixin
2025-01-09 13:40 ` Matthew Wilcox
2025-01-10  2:29   ` Liu Shixin
2025-01-09 17:00 ` Yang Shi
2025-01-10  2:08   ` Baolin Wang
2025-01-10  2:32   ` Liu Shixin
2025-01-10  4:31   ` Matthew Wilcox
2025-01-10 18:04     ` Yang Shi
2025-01-10 19:01       ` Matthew Wilcox
2025-01-10 19:40         ` Yang Shi
2025-01-11  3:54           ` Liu Shixin
2025-01-13 18:51             ` Yang Shi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox