[PATCH] mm: hugetlb: fix HVO crash on s390

linux-mm.kvack.org archive mirror

* [PATCH] mm: hugetlb: fix HVO crash on s390
@ 2025-10-28 15:39 Luiz Capitulino
  2025-10-28 16:05 ` Joao Martins
  0 siblings, 1 reply; 9+ messages in thread
From: Luiz Capitulino @ 2025-10-28 15:39 UTC
  To: hca, borntraeger, joao.m.martins, mike.kravetz, linux-kernel,
	linux-mm, linux-s390
  Cc: osalvador, akpm, david, aneesh.kumar

A reproducible crash occurs when enabling HVO on s390. The crash was
reproduced and the proposed fix was tested on an s390 KVM guest running
on an older hypervisor, as I don't have access to an LPAR. However, the
same issue should occur on bare metal.

Reproducer (it may take a few runs to trigger):

 # sysctl vm.hugetlb_optimize_vmemmap=1
 # echo 1 > /proc/sys/vm/nr_hugepages
 # echo 0 > /proc/sys/vm/nr_hugepages

Crash log:

[   52.340369] list_del corruption. prev->next should be 000000d382110008, but was 000000d7116d3880. (prev=000000d7116d3910)
[   52.340420] ------------[ cut here ]------------
[   52.340424] kernel BUG at lib/list_debug.c:62!
[   52.340566] monitor event: 0040 ilc:2 [#1]SMP
[   52.340573] Modules linked in: ctcm fsm qeth ccwgroup zfcp scsi_transport_fc qdio dasd_fba_mod dasd_eckd_mod dasd_mod xfs ghash_s390 prng des_s390 libdes sha3_512_s390 sha3_256_s390 virtio_net virtio_blk net_failover sha_common failover dm_mirror dm_region_hash dm_log dm_mod paes_s390 crypto_engine pkey_cca pkey_ep11 zcrypt pkey_pckmo pkey aes_s390
[   52.340606] CPU: 1 UID: 0 PID: 1672 Comm: root-rep2 Kdump: loaded Not tainted 6.18.0-rc3 #1 NONE
[   52.340610] Hardware name: IBM 3931 LA1 400 (KVM/Linux)
[   52.340611] Krnl PSW : 0704c00180000000 0000015710cda7fe (__list_del_entry_valid_or_report+0xfe/0x128)
[   52.340619]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[   52.340622] Krnl GPRS: c0000000ffffefff 0000000100000027 000000000000006d 0000000000000000
[   52.340623]            000000d7116d35d8 000000d7116d35d0 0000000000000002 000000d7116d39b0
[   52.340625]            000000d7116d3880 000000d7116d3910 000000d7116d3910 000000d382110008
[   52.340626]            000003ffac1ccd08 000000d7116d39b0 0000015710cda7fa 000000d7116d37d0
[   52.340632] Krnl Code: 0000015710cda7ee: c020003e496f	larl	%r2,00000157114a3acc
           0000015710cda7f4: c0e5ffd5280e	brasl	%r14,000001571077f810
          #0000015710cda7fa: af000000		mc	0,0
          >0000015710cda7fe: b9040029		lgr	%r2,%r9
           0000015710cda802: c0e5ffe5e193	brasl	%r14,0000015710996b28
           0000015710cda808: e34090080004	lg	%r4,8(%r9)
           0000015710cda80e: b9040059		lgr	%r5,%r9
           0000015710cda812: b9040038		lgr	%r3,%r8
[   52.340643] Call Trace:
[   52.340645]  [<0000015710cda7fe>] __list_del_entry_valid_or_report+0xfe/0x128
[   52.340649] ([<0000015710cda7fa>] __list_del_entry_valid_or_report+0xfa/0x128)
[   52.340652]  [<0000015710a30b2e>] hugetlb_vmemmap_restore_folios+0x96/0x138
[   52.340655]  [<0000015710a268ac>] update_and_free_pages_bulk+0x64/0x150
[   52.340659]  [<0000015710a26f8a>] set_max_huge_pages+0x4ca/0x6f0
[   52.340662]  [<0000015710a273ba>] hugetlb_sysctl_handler_common+0xea/0x120
[   52.340665]  [<0000015710a27484>] hugetlb_sysctl_handler+0x44/0x50
[   52.340667]  [<0000015710b53ffa>] proc_sys_call_handler+0x17a/0x280
[   52.340672]  [<0000015710a90968>] vfs_write+0x2c8/0x3a0
[   52.340676]  [<0000015710a90bd2>] ksys_write+0x72/0x100
[   52.340679]  [<00000157111483a8>] __do_syscall+0x150/0x318
[   52.340682]  [<0000015711153a5e>] system_call+0x6e/0x90
[   52.340684] Last Breaking-Event-Address:
[   52.340684]  [<000001571077f85c>] _printk+0x4c/0x58
[   52.340690] Kernel panic - not syncing: Fatal exception: panic_on_oops

This issue was introduced by commit f13b83fdd996 ("hugetlb: batch TLB
flushes when freeing vmemmap"). Before that change, the HVO
implementation called flush_tlb_kernel_range() each time a vmemmap
PMD split and remapping was performed. The mentioned commit changed this
to issue a few flush_tlb_all() calls after performing all remappings.

However, on s390, flush_tlb_kernel_range() expands to
__tlb_flush_kernel() while flush_tlb_all() is not implemented. As a
result, we went from flushing the TLB for every remapping to no flushing
at all.

This commit fixes this by introducing vmemmap_flush_tlb_all(), which
expands to __tlb_flush_kernel() on s390 and to flush_tlb_all() on other
archs.

Fixes: f13b83fdd996 ("hugetlb: batch TLB flushes when freeing vmemmap")
Signed-off-by: Luiz Capitulino <luizcap@redhat.com>
---
 mm/hugetlb_vmemmap.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)
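
To illustrate the failure mode described in the commit message, here is a
toy user-space model (not kernel code) of the "remap in a batch, flush once
at the end" pattern. Every name in it is hypothetical. The no-op flush
stands in for what the commit message says about flush_tlb_all() on s390;
the model only shows why a missing flush leaves stale translations behind.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES 4

static int page_table[NR_PAGES];	/* current virtual -> physical mapping */
static int tlb[NR_PAGES];		/* cached translations */
static bool tlb_valid[NR_PAGES];

/* Translate through the TLB, filling it from the page table on a miss. */
int translate(size_t vpage)
{
	if (!tlb_valid[vpage]) {
		tlb[vpage] = page_table[vpage];
		tlb_valid[vpage] = true;
	}
	return tlb[vpage];
}

/* Change a mapping without touching the TLB; flushing is the caller's job. */
void remap(size_t vpage, int new_frame)
{
	page_table[vpage] = new_frame;
}

/* A working flush drops every cached translation. */
void flush_working(void)
{
	for (size_t i = 0; i < NR_PAGES; i++)
		tlb_valid[i] = false;
}

/* A flush that does nothing (assumption: models the s390 case above). */
void flush_noop(void)
{
}

/*
 * Warm the TLB, remap every page, flush once, then count how many
 * lookups still return the old translation.
 */
int remap_batch_then_flush(void (*flush)(void))
{
	int stale = 0;
	size_t i;

	for (i = 0; i < NR_PAGES; i++) {
		page_table[i] = (int)i;
		tlb_valid[i] = false;
	}
	for (i = 0; i < NR_PAGES; i++)
		(void)translate(i);	/* cache the old mappings */
	for (i = 0; i < NR_PAGES; i++)
		remap(i, 100 + (int)i);	/* batched remaps */
	flush();			/* single flush at the end */
	for (i = 0; i < NR_PAGES; i++)
		if (translate(i) != page_table[i])
			stale++;
	return stale;
}
```

Calling remap_batch_then_flush(flush_working) leaves no stale entries, while
remap_batch_then_flush(flush_noop) leaves every remapped page stale.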

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index ba0fb1b6a5a8..5819a3088850 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -48,6 +48,15 @@ struct vmemmap_remap_walk {
 	unsigned long		flags;
 };
 
+static inline void vmemmap_flush_tlb_all(void)
+{
+#ifdef CONFIG_S390
+	__tlb_flush_kernel();
+#else
+	flush_tlb_all();
+#endif
+}
+
 static int vmemmap_split_pmd(pmd_t *pmd, struct page *head, unsigned long start,
 			     struct vmemmap_remap_walk *walk)
 {
@@ -539,7 +548,7 @@ long hugetlb_vmemmap_restore_folios(const struct hstate *h,
 	}
 
 	if (restored)
-		flush_tlb_all();
+		vmemmap_flush_tlb_all();
 	if (!ret)
 		ret = restored;
 	return ret;
@@ -703,7 +712,7 @@ static void __hugetlb_vmemmap_optimize_folios(struct hstate *h,
 		 */
 		goto out;
 
-	flush_tlb_all();
+	vmemmap_flush_tlb_all();
 
 	list_for_each_entry(folio, folio_list, lru) {
 		int ret;
@@ -721,7 +730,7 @@ static void __hugetlb_vmemmap_optimize_folios(struct hstate *h,
 		 * allowing more vmemmap remaps to occur.
 		 */
 		if (ret == -ENOMEM && !list_empty(&vmemmap_pages)) {
-			flush_tlb_all();
+			vmemmap_flush_tlb_all();
 			free_vmemmap_page_list(&vmemmap_pages);
 			INIT_LIST_HEAD(&vmemmap_pages);
 			__hugetlb_vmemmap_optimize_folio(h, folio, &vmemmap_pages, flags);
@@ -729,7 +738,7 @@ static void __hugetlb_vmemmap_optimize_folios(struct hstate *h,
 	}
 
 out:
-	flush_tlb_all();
+	vmemmap_flush_tlb_all();
 	free_vmemmap_page_list(&vmemmap_pages);
 }
 
-- 
2.51.0




* Re: [PATCH] mm: hugetlb: fix HVO crash on s390
  2025-10-28 15:39 [PATCH] mm: hugetlb: fix HVO crash on s390 Luiz Capitulino
@ 2025-10-28 16:05 ` Joao Martins
  2025-10-28 16:13   ` David Hildenbrand
  2025-10-28 16:14   ` Heiko Carstens
  0 siblings, 2 replies; 9+ messages in thread
From: Joao Martins @ 2025-10-28 16:05 UTC
  To: Luiz Capitulino
  Cc: osalvador, akpm, david, aneesh.kumar, hca, borntraeger,
	mike.kravetz, linux-kernel, linux-mm, linux-s390

On 28/10/2025 15:39, Luiz Capitulino wrote:
> A reproducible crash occurs when enabling HVO on s390. The crash and the
> proposed fix were worked on an s390 KVM guest running on an older
> hypervisor, as I don't have access to an LPAR. However, the same
> issue should occur on bare-metal.
> 
> Reproducer (it may take a few runs to trigger):
> 
>  # sysctl vm.hugetlb_optimize_vmemmap=1
>  # echo 1 > /proc/sys/vm/nr_hugepages
>  # echo 0 > /proc/sys/vm/nr_hugepages
> 
> Crash log:
> 
> [   52.340369] list_del corruption. prev->next should be 000000d382110008, but was 000000d7116d3880. (prev=000000d7116d3910)
> [   52.340420] ------------[ cut here ]------------
> [   52.340424] kernel BUG at lib/list_debug.c:62!
> [   52.340566] monitor event: 0040 ilc:2 [#1]SMP
> [   52.340573] Modules linked in: ctcm fsm qeth ccwgroup zfcp scsi_transport_fc qdio dasd_fba_mod dasd_eckd_mod dasd_mod xfs ghash_s390 prng des_s390 libdes sha3_512_s390 sha3_256_s390 virtio_net virtio_blk net_failover sha_common failover dm_mirror dm_region_hash dm_log dm_mod paes_s390 crypto_engine pkey_cca pkey_ep11 zcrypt pkey_pckmo pkey aes_s390
> [   52.340606] CPU: 1 UID: 0 PID: 1672 Comm: root-rep2 Kdump: loaded Not tainted 6.18.0-rc3 #1 NONE
> [   52.340610] Hardware name: IBM 3931 LA1 400 (KVM/Linux)
> [   52.340611] Krnl PSW : 0704c00180000000 0000015710cda7fe (__list_del_entry_valid_or_report+0xfe/0x128)
> [   52.340619]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> [   52.340622] Krnl GPRS: c0000000ffffefff 0000000100000027 000000000000006d 0000000000000000
> [   52.340623]            000000d7116d35d8 000000d7116d35d0 0000000000000002 000000d7116d39b0
> [   52.340625]            000000d7116d3880 000000d7116d3910 000000d7116d3910 000000d382110008
> [   52.340626]            000003ffac1ccd08 000000d7116d39b0 0000015710cda7fa 000000d7116d37d0
> [   52.340632] Krnl Code: 0000015710cda7ee: c020003e496f	larl	%r2,00000157114a3acc
>            0000015710cda7f4: c0e5ffd5280e	brasl	%r14,000001571077f810
>           #0000015710cda7fa: af000000		mc	0,0
>           >0000015710cda7fe: b9040029		lgr	%r2,%r9
>            0000015710cda802: c0e5ffe5e193	brasl	%r14,0000015710996b28
>            0000015710cda808: e34090080004	lg	%r4,8(%r9)
>            0000015710cda80e: b9040059		lgr	%r5,%r9
>            0000015710cda812: b9040038		lgr	%r3,%r8
> [   52.340643] Call Trace:
> [   52.340645]  [<0000015710cda7fe>] __list_del_entry_valid_or_report+0xfe/0x128
> [   52.340649] ([<0000015710cda7fa>] __list_del_entry_valid_or_report+0xfa/0x128)
> [   52.340652]  [<0000015710a30b2e>] hugetlb_vmemmap_restore_folios+0x96/0x138
> [   52.340655]  [<0000015710a268ac>] update_and_free_pages_bulk+0x64/0x150
> [   52.340659]  [<0000015710a26f8a>] set_max_huge_pages+0x4ca/0x6f0
> [   52.340662]  [<0000015710a273ba>] hugetlb_sysctl_handler_common+0xea/0x120
> [   52.340665]  [<0000015710a27484>] hugetlb_sysctl_handler+0x44/0x50
> [   52.340667]  [<0000015710b53ffa>] proc_sys_call_handler+0x17a/0x280
> [   52.340672]  [<0000015710a90968>] vfs_write+0x2c8/0x3a0
> [   52.340676]  [<0000015710a90bd2>] ksys_write+0x72/0x100
> [   52.340679]  [<00000157111483a8>] __do_syscall+0x150/0x318
> [   52.340682]  [<0000015711153a5e>] system_call+0x6e/0x90
> [   52.340684] Last Breaking-Event-Address:
> [   52.340684]  [<000001571077f85c>] _printk+0x4c/0x58
> [   52.340690] Kernel panic - not syncing: Fatal exception: panic_on_oops
> 
> This issue was introduced by commit f13b83fdd996 ("hugetlb: batch TLB
> flushes when freeing vmemmap"). Before that change, the HVO
> implementation called flush_tlb_kernel_range() each time a vmemmap
> PMD split and remapping was performed. The mentioned commit changed this
> to issue a few flush_tlb_all() calls after performing all remappings.
> 
> However, on s390, flush_tlb_kernel_range() expands to
> __tlb_flush_kernel() while flush_tlb_all() is not implemented. As a
> result, we went from flushing the TLB for every remapping to no flushing
> at all.
> 
> This commit fixes this by introducing vmemmap_flush_tlb_all(), which
> expands to __tlb_flush_kernel() on s390 and to flush_tlb_all() on other
> archs.
> 
> Fixes: f13b83fdd996 ("hugetlb: batch TLB flushes when freeing vmemmap")
> Signed-off-by: Luiz Capitulino <luizcap@redhat.com>
> ---
>  mm/hugetlb_vmemmap.c | 17 +++++++++++++----
>  1 file changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index ba0fb1b6a5a8..5819a3088850 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -48,6 +48,15 @@ struct vmemmap_remap_walk {
>  	unsigned long		flags;
>  };
>  
> +static inline void vmemmap_flush_tlb_all(void)
> +{
> +#ifdef CONFIG_S390
> +	__tlb_flush_kernel();
> +#else
> +	flush_tlb_all();
> +#endif
> +}
> +

Wouldn't a better fix be to implement flush_tlb_all() in
s390/include/asm/tlbflush.h since that aliases to __tlb_flush_kernel()?

>  static int vmemmap_split_pmd(pmd_t *pmd, struct page *head, unsigned long start,
>  			     struct vmemmap_remap_walk *walk)
>  {
> @@ -539,7 +548,7 @@ long hugetlb_vmemmap_restore_folios(const struct hstate *h,
>  	}
>  
>  	if (restored)
> -		flush_tlb_all();
> +		vmemmap_flush_tlb_all();
>  	if (!ret)
>  		ret = restored;
>  	return ret;
> @@ -703,7 +712,7 @@ static void __hugetlb_vmemmap_optimize_folios(struct hstate *h,
>  		 */
>  		goto out;
>  
> -	flush_tlb_all();
> +	vmemmap_flush_tlb_all();
>  
>  	list_for_each_entry(folio, folio_list, lru) {
>  		int ret;
> @@ -721,7 +730,7 @@ static void __hugetlb_vmemmap_optimize_folios(struct hstate *h,
>  		 * allowing more vmemmap remaps to occur.
>  		 */
>  		if (ret == -ENOMEM && !list_empty(&vmemmap_pages)) {
> -			flush_tlb_all();
> +			vmemmap_flush_tlb_all();
>  			free_vmemmap_page_list(&vmemmap_pages);
>  			INIT_LIST_HEAD(&vmemmap_pages);
>  			__hugetlb_vmemmap_optimize_folio(h, folio, &vmemmap_pages, flags);
> @@ -729,7 +738,7 @@ static void __hugetlb_vmemmap_optimize_folios(struct hstate *h,
>  	}
>  
>  out:
> -	flush_tlb_all();
> +	vmemmap_flush_tlb_all();
>  	free_vmemmap_page_list(&vmemmap_pages);
>  }
>  




* Re: [PATCH] mm: hugetlb: fix HVO crash on s390
  2025-10-28 16:05 ` Joao Martins
@ 2025-10-28 16:13   ` David Hildenbrand
  2025-10-28 16:14   ` Heiko Carstens
  1 sibling, 0 replies; 9+ messages in thread
From: David Hildenbrand @ 2025-10-28 16:13 UTC
  To: Joao Martins, Luiz Capitulino
  Cc: osalvador, akpm, aneesh.kumar, hca, borntraeger, mike.kravetz,
	linux-kernel, linux-mm, linux-s390

On 28.10.25 17:05, Joao Martins wrote:
> On 28/10/2025 15:39, Luiz Capitulino wrote:
>> A reproducible crash occurs when enabling HVO on s390. The crash and the
>> proposed fix were worked on an s390 KVM guest running on an older
>> hypervisor, as I don't have access to an LPAR. However, the same
>> issue should occur on bare-metal.
>>
>> Reproducer (it may take a few runs to trigger):
>>
>>   # sysctl vm.hugetlb_optimize_vmemmap=1
>>   # echo 1 > /proc/sys/vm/nr_hugepages
>>   # echo 0 > /proc/sys/vm/nr_hugepages
>>
>> Crash log:
>>
>> [   52.340369] list_del corruption. prev->next should be 000000d382110008, but was 000000d7116d3880. (prev=000000d7116d3910)
>> [   52.340420] ------------[ cut here ]------------
>> [   52.340424] kernel BUG at lib/list_debug.c:62!
>> [   52.340566] monitor event: 0040 ilc:2 [#1]SMP
>> [   52.340573] Modules linked in: ctcm fsm qeth ccwgroup zfcp scsi_transport_fc qdio dasd_fba_mod dasd_eckd_mod dasd_mod xfs ghash_s390 prng des_s390 libdes sha3_512_s390 sha3_256_s390 virtio_net virtio_blk net_failover sha_common failover dm_mirror dm_region_hash dm_log dm_mod paes_s390 crypto_engine pkey_cca pkey_ep11 zcrypt pkey_pckmo pkey aes_s390
>> [   52.340606] CPU: 1 UID: 0 PID: 1672 Comm: root-rep2 Kdump: loaded Not tainted 6.18.0-rc3 #1 NONE
>> [   52.340610] Hardware name: IBM 3931 LA1 400 (KVM/Linux)
>> [   52.340611] Krnl PSW : 0704c00180000000 0000015710cda7fe (__list_del_entry_valid_or_report+0xfe/0x128)
>> [   52.340619]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
>> [   52.340622] Krnl GPRS: c0000000ffffefff 0000000100000027 000000000000006d 0000000000000000
>> [   52.340623]            000000d7116d35d8 000000d7116d35d0 0000000000000002 000000d7116d39b0
>> [   52.340625]            000000d7116d3880 000000d7116d3910 000000d7116d3910 000000d382110008
>> [   52.340626]            000003ffac1ccd08 000000d7116d39b0 0000015710cda7fa 000000d7116d37d0
>> [   52.340632] Krnl Code: 0000015710cda7ee: c020003e496f	larl	%r2,00000157114a3acc
>>             0000015710cda7f4: c0e5ffd5280e	brasl	%r14,000001571077f810
>>            #0000015710cda7fa: af000000		mc	0,0
>>            >0000015710cda7fe: b9040029		lgr	%r2,%r9
>>             0000015710cda802: c0e5ffe5e193	brasl	%r14,0000015710996b28
>>             0000015710cda808: e34090080004	lg	%r4,8(%r9)
>>             0000015710cda80e: b9040059		lgr	%r5,%r9
>>             0000015710cda812: b9040038		lgr	%r3,%r8
>> [   52.340643] Call Trace:
>> [   52.340645]  [<0000015710cda7fe>] __list_del_entry_valid_or_report+0xfe/0x128
>> [   52.340649] ([<0000015710cda7fa>] __list_del_entry_valid_or_report+0xfa/0x128)
>> [   52.340652]  [<0000015710a30b2e>] hugetlb_vmemmap_restore_folios+0x96/0x138
>> [   52.340655]  [<0000015710a268ac>] update_and_free_pages_bulk+0x64/0x150
>> [   52.340659]  [<0000015710a26f8a>] set_max_huge_pages+0x4ca/0x6f0
>> [   52.340662]  [<0000015710a273ba>] hugetlb_sysctl_handler_common+0xea/0x120
>> [   52.340665]  [<0000015710a27484>] hugetlb_sysctl_handler+0x44/0x50
>> [   52.340667]  [<0000015710b53ffa>] proc_sys_call_handler+0x17a/0x280
>> [   52.340672]  [<0000015710a90968>] vfs_write+0x2c8/0x3a0
>> [   52.340676]  [<0000015710a90bd2>] ksys_write+0x72/0x100
>> [   52.340679]  [<00000157111483a8>] __do_syscall+0x150/0x318
>> [   52.340682]  [<0000015711153a5e>] system_call+0x6e/0x90
>> [   52.340684] Last Breaking-Event-Address:
>> [   52.340684]  [<000001571077f85c>] _printk+0x4c/0x58
>> [   52.340690] Kernel panic - not syncing: Fatal exception: panic_on_oops
>>
>> This issue was introduced by commit f13b83fdd996 ("hugetlb: batch TLB
>> flushes when freeing vmemmap"). Before that change, the HVO
>> implementation called flush_tlb_kernel_range() each time a vmemmap
>> PMD split and remapping was performed. The mentioned commit changed this
>> to issue a few flush_tlb_all() calls after performing all remappings.
>>
>> However, on s390, flush_tlb_kernel_range() expands to
>> __tlb_flush_kernel() while flush_tlb_all() is not implemented. As a
>> result, we went from flushing the TLB for every remapping to no flushing
>> at all.
>>
>> This commit fixes this by introducing vmemmap_flush_tlb_all(), which
>> expands to __tlb_flush_kernel() on s390 and to flush_tlb_all() on other
>> archs.
>>
>> Fixes: f13b83fdd996 ("hugetlb: batch TLB flushes when freeing vmemmap")
>> Signed-off-by: Luiz Capitulino <luizcap@redhat.com>
>> ---
>>   mm/hugetlb_vmemmap.c | 17 +++++++++++++----
>>   1 file changed, 13 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>> index ba0fb1b6a5a8..5819a3088850 100644
>> --- a/mm/hugetlb_vmemmap.c
>> +++ b/mm/hugetlb_vmemmap.c
>> @@ -48,6 +48,15 @@ struct vmemmap_remap_walk {
>>   	unsigned long		flags;
>>   };
>>   
>> +static inline void vmemmap_flush_tlb_all(void)
>> +{
>> +#ifdef CONFIG_S390
>> +	__tlb_flush_kernel();
>> +#else
>> +	flush_tlb_all();
>> +#endif
>> +}
>> +
> 
> Wouldn't a better fix be to implement flush_tlb_all() in
> s390/include/asm/tlbflush.h since that aliases to __tlb_flush_kernel()?

Agreed, that feels cleaner and avoids this ifdef here.

-- 
Cheers

David / dhildenb




* Re: [PATCH] mm: hugetlb: fix HVO crash on s390
  2025-10-28 16:05 ` Joao Martins
  2025-10-28 16:13   ` David Hildenbrand
@ 2025-10-28 16:14   ` Heiko Carstens
  2025-10-28 16:48     ` Joao Martins
  1 sibling, 1 reply; 9+ messages in thread
From: Heiko Carstens @ 2025-10-28 16:14 UTC
  To: Joao Martins
  Cc: Luiz Capitulino, osalvador, akpm, david, aneesh.kumar,
	borntraeger, mike.kravetz, linux-kernel, linux-mm, linux-s390

On Tue, Oct 28, 2025 at 04:05:45PM +0000, Joao Martins wrote:
> > diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> > index ba0fb1b6a5a8..5819a3088850 100644
> > --- a/mm/hugetlb_vmemmap.c
> > +++ b/mm/hugetlb_vmemmap.c
> > @@ -48,6 +48,15 @@ struct vmemmap_remap_walk {
> >  	unsigned long		flags;
> >  };
> >  
> > +static inline void vmemmap_flush_tlb_all(void)
> > +{
> > +#ifdef CONFIG_S390
> > +	__tlb_flush_kernel();
> > +#else
> > +	flush_tlb_all();
> > +#endif
> > +}
> > +
> 
> Wouldn't a better fix be to implement flush_tlb_all() in
> s390/include/asm/tlbflush.h since that aliases to __tlb_flush_kernel()?

The question is rather what is flush_tlb_all() supposed to flush? Is
it supposed to flush only tlb entries corresponding to the kernel
address space, or should it flush just everything?

Within this context it looks like only tlb flushing for the kernel
address space is required(?)



* Re: [PATCH] mm: hugetlb: fix HVO crash on s390
  2025-10-28 16:14   ` Heiko Carstens
@ 2025-10-28 16:48     ` Joao Martins
  2025-10-28 17:02       ` Heiko Carstens
  0 siblings, 1 reply; 9+ messages in thread
From: Joao Martins @ 2025-10-28 16:48 UTC
  To: Heiko Carstens
  Cc: Luiz Capitulino, osalvador, akpm, david, aneesh.kumar,
	borntraeger, mike.kravetz, linux-kernel, linux-mm, linux-s390

On 28/10/2025 16:14, Heiko Carstens wrote:
> On Tue, Oct 28, 2025 at 04:05:45PM +0000, Joao Martins wrote:
>>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>>> index ba0fb1b6a5a8..5819a3088850 100644
>>> --- a/mm/hugetlb_vmemmap.c
>>> +++ b/mm/hugetlb_vmemmap.c
>>> @@ -48,6 +48,15 @@ struct vmemmap_remap_walk {
>>>  	unsigned long		flags;
>>>  };
>>>  
>>> +static inline void vmemmap_flush_tlb_all(void)
>>> +{
>>> +#ifdef CONFIG_S390
>>> +	__tlb_flush_kernel();
>>> +#else
>>> +	flush_tlb_all();
>>> +#endif
>>> +}
>>> +
>>
>> Wouldn't a better fix be to implement flush_tlb_all() in
>> s390/include/asm/tlbflush.h since that aliases to __tlb_flush_kernel()?
> 
> The question is rather what is flush_tlb_all() supposed to flush? Is
> it supposed to flush only tlb entries corresponding to the kernel
> address space, or should it flush just everything?
> 
The latter, i.e. everything, at least as far as I understand.

> Within this context it looks like only tlb flushing for the kernel
> address space is required(?)

That's correct. We are changing the vmemmap which is in the kernel address
space, so that's the intent.

flush_tlb_all(), however, is the *closest* equivalent behind an arch-generic
API, i.e. flushing the kernel address space in the TLBs of all CPUs. IIUC,
when x86 does flush_tlb_kernel_range() with enough pages, it switches to
flush_tlb_all() (these days on modern AMD CPUs that is even a single
instruction issued solely on the calling CPU).
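
A rough sketch of the kind of range-versus-full-flush decision mentioned
above, written as plain user-space C. This is not the x86 implementation:
the function name, the page shift, and the cutoff value are all assumptions
chosen for illustration only.

```c
#include <stdbool.h>

/* Assumed 4 KiB pages for this sketch. */
#define SKETCH_PAGE_SHIFT		12
/* Hypothetical cutoff; real kernels tune this per architecture. */
#define SKETCH_FLUSH_CEILING_PAGES	33UL

/*
 * Decide whether flushing [start, end) page by page is still worthwhile,
 * or whether a single full flush is the cheaper option.
 */
bool sketch_should_flush_all(unsigned long start, unsigned long end)
{
	unsigned long nr_pages = (end - start) >> SKETCH_PAGE_SHIFT;

	return nr_pages > SKETCH_FLUSH_CEILING_PAGES;
}
```

With these assumed values, a one-page range is flushed page by page and a
64-page range falls back to a full flush.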



* Re: [PATCH] mm: hugetlb: fix HVO crash on s390
  2025-10-28 16:48     ` Joao Martins
@ 2025-10-28 17:02       ` Heiko Carstens
  2025-10-28 17:15         ` Luiz Capitulino
  0 siblings, 1 reply; 9+ messages in thread
From: Heiko Carstens @ 2025-10-28 17:02 UTC
  To: Joao Martins
  Cc: Luiz Capitulino, osalvador, akpm, david, aneesh.kumar,
	borntraeger, mike.kravetz, linux-kernel, linux-mm, linux-s390

On Tue, Oct 28, 2025 at 04:48:57PM +0000, Joao Martins wrote:
> On 28/10/2025 16:14, Heiko Carstens wrote:
> > On Tue, Oct 28, 2025 at 04:05:45PM +0000, Joao Martins wrote:
> >>> +static inline void vmemmap_flush_tlb_all(void)
> >>> +{
> >>> +#ifdef CONFIG_S390
> >>> +	__tlb_flush_kernel();
> >>> +#else
> >>> +	flush_tlb_all();
> >>> +#endif
> >>> +}
> >>> +
> >>
> >> Wouldn't a better fix be to implement flush_tlb_all() in
> >> s390/include/asm/tlbflush.h since that aliases to __tlb_flush_kernel()?
> > 
> > The question is rather what is flush_tlb_all() supposed to flush? Is
> > it supposed to flush only tlb entries corresponding to the kernel
> > address space, or should it flush just everything?
> > 
> The latter i.e. everything
> 
> At least as far as I understand
> 
> > Within this context it looks like only tlb flushing for the kernel
> > address space is required(?)
> 
> That's correct. We are changing the vmemmap which is in the kernel address
> space, so that's the intent.
> 
> flush_tlb_all() however is the *closest* equivalent to this that's behind an
> arch generic API i.e. flushing kernel address space on all CPUs TLBs. IIUC, x86
> when doing flush_tlb_kernel_range with enough pages it switches to flush_tlb_all
> (these days on modern AMDs it's even one instruction solely in the calling CPU).

Considering that, flush_tlb_all() should be mapped to __tlb_flush_global()
and not __tlb_flush_kernel() on s390.

However if there is only a need to flush tlb entries for the complete(?)
kernel address space, then I'd rather propose a new tlb_flush_kernel()
instead of a big hammer. If I'm not mistaken flush_tlb_kernel_range()
exists for just avoiding that. And if architectures can avoid a global
flush of _all_ tlb entries then that should be made possible.
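
A minimal user-space sketch of how the two ideas above could fit together,
under stated assumptions: tlb_flush_kernel() is only a proposal in this
thread and does not exist as an API, MODEL_ARCH_HAS_TLB_FLUSH_KERNEL is a
made-up opt-in symbol, and the model_*() functions are counting stand-ins
for the s390 helpers __tlb_flush_global() and __tlb_flush_kernel().

```c
/* Opt-in symbol an architecture would define (hypothetical). */
#define MODEL_ARCH_HAS_TLB_FLUSH_KERNEL

int nr_global_flushes;	/* full flushes of all TLB entries */
int nr_kernel_flushes;	/* flushes of kernel address space entries only */

/* Stand-in for __tlb_flush_global(). */
void model_tlb_flush_global(void)
{
	nr_global_flushes++;
}

/* Stand-in for __tlb_flush_kernel(). */
void model_tlb_flush_kernel(void)
{
	nr_kernel_flushes++;
}

/* Small fix: flush_tlb_all() maps to the global flush. */
#define flush_tlb_all()		model_tlb_flush_global()

/*
 * Follow-up idea: a kernel-only flush that an architecture can provide,
 * with the big hammer as the generic fallback.
 */
#ifdef MODEL_ARCH_HAS_TLB_FLUSH_KERNEL
#define tlb_flush_kernel()	model_tlb_flush_kernel()
#else
#define tlb_flush_kernel()	flush_tlb_all()
#endif

/* Hypothetical caller: flush once after a batch of vmemmap remaps. */
void model_vmemmap_remap_done(void)
{
	tlb_flush_kernel();
}
```

In this model an architecture that opts in only pays for the kernel address
space flush, while one that does not still gets a correct, if heavier, flush.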



* Re: [PATCH] mm: hugetlb: fix HVO crash on s390
  2025-10-28 17:02       ` Heiko Carstens
@ 2025-10-28 17:15         ` Luiz Capitulino
  2025-10-28 19:37           ` Heiko Carstens
  0 siblings, 1 reply; 9+ messages in thread
From: Luiz Capitulino @ 2025-10-28 17:15 UTC
  To: Heiko Carstens, Joao Martins
  Cc: osalvador, akpm, david, aneesh.kumar, borntraeger, mike.kravetz,
	linux-kernel, linux-mm, linux-s390

On 2025-10-28 13:02, Heiko Carstens wrote:
> On Tue, Oct 28, 2025 at 04:48:57PM +0000, Joao Martins wrote:
>> On 28/10/2025 16:14, Heiko Carstens wrote:
>>> On Tue, Oct 28, 2025 at 04:05:45PM +0000, Joao Martins wrote:
>>>>> +static inline void vmemmap_flush_tlb_all(void)
>>>>> +{
>>>>> +#ifdef CONFIG_S390
>>>>> +	__tlb_flush_kernel();
>>>>> +#else
>>>>> +	flush_tlb_all();
>>>>> +#endif
>>>>> +}
>>>>> +
>>>>
>>>> Wouldn't a better fix be to implement flush_tlb_all() in
>>>> s390/include/asm/tlbflush.h since that aliases to __tlb_flush_kernel()?
>>>
>>> The question is rather what is flush_tlb_all() supposed to flush? Is
>>> it supposed to flush only tlb entries corresponding to the kernel
>>> address space, or should it flush just everything?
>>>
>> The latter i.e. everything
>>
>> At least as far as I understand
>>
>>> Within this context it looks like only tlb flushing for the kernel
>>> address space is required(?)
>>
>> That's correct. We are changing the vmemmap which is in the kernel address
>> space, so that's the intent.
>>
>> flush_tlb_all() however is the *closest* equivalent to this that's behind an
>> arch generic API i.e. flushing kernel address space on all CPUs TLBs. IIUC, x86
>> when doing flush_tlb_kernel_range with enough pages it switches to flush_tlb_all
>> (these days on modern AMDs it's even one instruction solely in the calling CPU).
> 
> Considering that flush_tlb_all() should be mapped to __tlb_flush_global()
> and not __tlb_flush_kernel() on s390.

You're right.

> However if there is only a need to flush tlb entries for the complete(?)
> kernel address space, then I'd rather propose a new tlb_flush_kernel()
> instead of a big hammer. If I'm not mistaken flush_tlb_kernel_range()
> exists for just avoiding that. And if architectures can avoid a global
> flush of _all_ tlb entries then that should be made possible.

Should we take a v2 doing your suggestion above for now and work on
the tlb_flush_kernel() idea as a follow up improvement? At least we
go from crashing to flushing more than we should...




* Re: [PATCH] mm: hugetlb: fix HVO crash on s390
  2025-10-28 17:15         ` Luiz Capitulino
@ 2025-10-28 19:37           ` Heiko Carstens
  2025-10-28 21:14             ` Luiz Capitulino
  0 siblings, 1 reply; 9+ messages in thread
From: Heiko Carstens @ 2025-10-28 19:37 UTC
  To: Luiz Capitulino
  Cc: Joao Martins, osalvador, akpm, david, aneesh.kumar, borntraeger,
	mike.kravetz, linux-kernel, linux-mm, linux-s390, Vasily Gorbik,
	Gerald Schaefer, Alexander Gordeev

On Tue, Oct 28, 2025 at 01:15:57PM -0400, Luiz Capitulino wrote:
> > > flush_tlb_all() however is the *closest* equivalent to this that's behind an
> > > arch generic API i.e. flushing kernel address space on all CPUs TLBs. IIUC, x86
> > > when doing flush_tlb_kernel_range with enough pages it switches to flush_tlb_all
> > > (these days on modern AMDs it's even one instruction solely in the calling CPU).
> > 
> > Considering that flush_tlb_all() should be mapped to __tlb_flush_global()
> > and not __tlb_flush_kernel() on s390.
> 
> You're right.
> 
> > However if there is only a need to flush tlb entries for the complete(?)
> > kernel address space, then I'd rather propose a new tlb_flush_kernel()
> > instead of a big hammer. If I'm not mistaken flush_tlb_kernel_range()
> > exists for just avoiding that. And if architectures can avoid a global
> > flush of _all_ tlb entries then that should be made possible.
> 
> Should we take a v2 doing your suggestion above for now and work on
> the tlb_flush_kernel() idea as a follow up improvement? At least we
> go from crashing to flushing more than we should...

That's of course fine. I guess for stable backports a small fix is the
best way forward anyway.



* Re: [PATCH] mm: hugetlb: fix HVO crash on s390
  2025-10-28 19:37           ` Heiko Carstens
@ 2025-10-28 21:14             ` Luiz Capitulino
  0 siblings, 0 replies; 9+ messages in thread
From: Luiz Capitulino @ 2025-10-28 21:14 UTC
  To: Heiko Carstens
  Cc: Joao Martins, osalvador, akpm, david, aneesh.kumar, borntraeger,
	mike.kravetz, linux-kernel, linux-mm, linux-s390, Vasily Gorbik,
	Gerald Schaefer, Alexander Gordeev

On 2025-10-28 15:37, Heiko Carstens wrote:
> On Tue, Oct 28, 2025 at 01:15:57PM -0400, Luiz Capitulino wrote:
>>>> flush_tlb_all() however is the *closest* equivalent to this that's behind an
>>>> arch generic API i.e. flushing kernel address space on all CPUs TLBs. IIUC, x86
>>>> when doing flush_tlb_kernel_range with enough pages it switches to flush_tlb_all
>>>> (these days on modern AMDs it's even one instruction solely in the calling CPU).
>>>
>>> Considering that flush_tlb_all() should be mapped to __tlb_flush_global()
>>> and not __tlb_flush_kernel() on s390.
>>
>> You're right.
>>
>>> However if there is only a need to flush tlb entries for the complete(?)
>>> kernel address space, then I'd rather propose a new tlb_flush_kernel()
>>> instead of a big hammer. If I'm not mistaken flush_tlb_kernel_range()
>>> exists for just avoiding that. And if architectures can avoid a global
>>> flush of _all_ tlb entries then that should be made possible.
>>
>> Should we take a v2 doing your suggestion above for now and work on
>> the tlb_flush_kernel() idea as a follow up improvement? At least we
>> go from crashing to flushing more than we should...
> 
> That's of course fine. I guess for stable backports a small fix is the
> best way forward anyway.

Exactly. I'll also see if I can find time to explore your API
improvement suggestion. I'll send v2 shortly.


