* [REGRESSION] NULL pointer dereference on ARM (AT91SAM9G25) during compaction
@ 2025-02-10 16:49 Ezra Buehler
2025-02-10 17:03 ` Russell King (Oracle)
0 siblings, 1 reply; 15+ messages in thread
From: Ezra Buehler @ 2025-02-10 16:49 UTC (permalink / raw)
To: linux-mm
Cc: Qi Zheng, Russell King, Andrew Morton, David Hildenbrand,
Russell King (Oracle), Mike Rapoport (Microsoft),
Muchun Song, Vlastimil Babka, Ryan Roberts, Vishal Moola (Oracle),
Hugh Dickins, Matthew Wilcox, Peter Xu, Nicolas Ferre,
Alexandre Belloni, Claudiu Beznea, open list, linux-arm-kernel
When running vanilla Linux 6.13 or newer (6.14-rc2) on the
AT91SAM9G25-based GARDENA smart Gateway, we are seeing a NULL pointer
dereference resulting in a kernel panic. The culprit seems to be commit
fc9c45b71f43 ("arm: adjust_pte() use pte_offset_map_rw_nolock()").
Reverting the commit apparently fixes the issue.
Any ideas what the root cause might be? Or any hints where to dig
deeper are highly appreciated.
After the system has been up for several minutes, we get the following:
[ 490.632656] Unable to handle kernel NULL pointer dereference at virtual address 00000030 when read
[ 490.641557] [00000030] *pgd=00000000
[ 490.645101] Internal error: Oops - BUG: 17 [#1] ARM
[ 490.649939] Modules linked in: nft_compat rtl8xxxu mac80211 libarc4 cfg80211 firmware_class
[ 490.658358] CPU: 0 UID: 0 PID: 17 Comm: kcompactd0 Not tainted 6.14.0-rc2-r0.0.16-yocto-tiny #1
[ 490.667010] Hardware name: Atmel AT91SAM9
[ 490.670986] PC is at update_mmu_cache_range+0x1e0/0x278
[ 490.676204] LR is at pte_offset_map_rw_nolock+0x18/0x2c
[ 490.681422] pc : [<c010faf4>] lr : [<c0205d38>] psr: a0000093
[ 490.687641] sp : c0d8bbf0 ip : 20000000 fp : 00000000
[ 490.692824] r10: 00000000 r9 : c201677c r8 : b61df000
[ 490.698009] r7 : 00000000 r6 : 00025c0d r5 : c14c3480 r4 : c14c3600
[ 490.704488] r3 : c207b77c r2 : 0000077c r1 : 002d877c r0 : c207b77c
[ 490.710970] Flags: NzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
[ 490.718147] Control: 0005317f Table: 22080000 DAC: 00000055
[ 490.723843] Register r0 information: non-slab/vmalloc memory
[ 490.729467] Register r1 information: non-paged memory
[ 490.734478] Register r2 information: non-paged memory
[ 490.739490] Register r3 information: non-slab/vmalloc memory
[ 490.745114] Register r4 information: slab vm_area_struct start c14c3600 pointer offset 0 size 64
[ 490.753936] Register r5 information: slab vm_area_struct start c14c3480 pointer offset 0 size 64
[ 490.762748] Register r6 information: non-paged memory
[ 490.767759] Register r7 information: NULL pointer
[ 490.772425] Register r8 information: non-paged memory
[ 490.777435] Register r9 information: non-slab/vmalloc memory
[ 490.783052] Register r10 information: NULL pointer
[ 490.787804] Register r11 information: NULL pointer
[ 490.792556] Register r12 information: non-paged memory
[ 490.797654] Process kcompactd0 (pid: 17, stack limit = 0x29a0a8ac)
[ 490.803790] Stack: (0xc0d8bbf0 to 0xc0d8c000)
[ 490.808126] bbe0: c0d8bc0c c021c680 c14d7250 b65df000
[ 490.816259] bc00: c1655c40 c2082d80 c14c3480 c1655c6c 2207b831 4fde8caf c7faa1d4 c7faa1d4
[ 490.824389] bc20: c14c3480 c7faa1d4 c0abf87c 38e38e39 c0a076e4 00000001 c0d8bccc c021a500
[ 490.832516] bc40: 00000001 c021aaa0 00000000 25c0d18d 00022711 00000001 00000000 c14c3480
[ 490.840648] bc60: b65df000 c2082d90 c201677c c1655c6c 00000003 4fde8caf 00000001 c14c3480
[ 490.848777] bc80: c0d8bcd4 c7faa1d4 c12309a8 00000000 00000000 00000000 b65df000 c0206348
[ 490.856906] bca0: c021a264 c7faa1d4 00000000 c0d8bdec 00000001 00000000 00000000 00000000
[ 490.865037] bcc0: 00000000 c021b0e0 c101c080 c7f32e64 00000000 c0d8bccc 00000000 c021a264
[ 490.873168] bce0: 00000000 00000000 00000000 4fde8caf 00000000 c7f32e64 c7faa1d4 c021b924
[ 490.881301] bd00: c0d8bdbc c0a0add0 c0a0a980 00000000 00000001 c7faa948 c7f32e40 c0d8be9c
[ 490.889430] bd20: c0d8bd54 c12309a8 c01f2b68 c0d8bdcc 00000000 00000000 c7f32e68 c12309a8
[ 490.897561] bd40: c01f2e34 c0c5a580 c0f7ab40 c7f32e68 c7f2fc10 c7faa94c c7fad01c 4fde8caf
[ 490.905691] bd60: c0c3c010 c0d8bef4 c7f34fe0 c0d8bdcc 00000000 c0d8bdbc c0d8bdc4 c7f2c000
[ 490.913820] bd80: c01f2e34 c021bdc8 00000000 00000000 c0d8bdcc c0d8bdc4 c0d8bdec 00000003
[ 490.921951] bda0: c0d8be9c c01f2b68 c0d8be9c c0d8be68 c0c13cd0 c0d8bdb4 c0d8bdb4 c0d8bdbc
[ 490.930079] bdc0: c0d8bdbc c0d8bdc4 c0d8bdc4 c0d8bdcc c0d8bdcc 00000000 00000000 00000000
[ 490.938209] bde0: 00000000 00000000 00000000 000000d2 00000000 00000000 00000000 00000000
[ 490.946340] be00: 00000000 4fde8caf c7f245a8 c0d8be9c 00000000 00022400 00022800 00000000
[ 490.954472] be20: c0d8bef4 c7f2c000 000003ff c01f4a10 00000001 00000000 c0d8be68 0010f3cf
[ 490.962598] be40: 00000000 00000000 00020000 00000000 00000001 0000000c c0a0a980 000000f1
[ 490.970732] be60: 00000041 00000000 00000020 4fde8caf c23cab48 c0a8ede4 00000001 c0a8f25c
[ 490.978861] be80: 00002001 c0a8f2dc 00000000 00000000 00000000 c01f4e88 c0f7ac00 c7fad040
[ 490.986993] bea0: c7fad040 c7fad184 c7fb2d7c c7fbf454 c7fb2b84 c7fbfde4 c7fae6e4 c0d8bebc
[ 490.995124] bec0: c0d8bebc c0d8bec4 c0d8bec4 c0d8becc c0d8becc c0d8bed4 c0d8bed4 c0d8bedc
[ 491.003254] bee0: c0d8bedc c0d8bee4 c0d8bee4 c0d8beec c0d8beec c7f2fbec c7f2c4f0 00000135
[ 491.011382] bf00: 00000135 00025ff5 00022800 00000000 c0a8ede4 00002800 000021f6 00000000
[ 491.019511] bf20: 00000cc0 ffffffff 00000000 00000000 00000000 00000001 00000001 01000101
[ 491.027643] bf40: 00000000 4fde8caf c0a8ede4 0000005c c0a0345c c01f53d8 00000000 c0c5a580
[ 491.035774] bf60: c014081c c0d8bf64 c0d8bf64 4fde8caf c0d8bf84 c0cdc680 c0ce4120 c0c5a580
[ 491.043903] bf80: c0ce4120 c01f5204 c0a8ede4 c0131ae0 c0cdc680 c01319c0 00000000 00000000
[ 491.052032] bfa0: 00000000 00000000 00000000 c01000fc 00000000 00000000 00000000 00000000
[ 491.060160] bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 491.068291] bfe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[ 491.076390] Call trace:
[ 491.076431] update_mmu_cache_range from remove_migration_pte+0x29c/0x2ec
[ 491.085774] remove_migration_pte from rmap_walk_file+0xcc/0x130
[ 491.091814] rmap_walk_file from remove_migration_ptes+0x90/0xa4
[ 491.097843] remove_migration_ptes from migrate_pages_batch+0x6d4/0x858
[ 491.104470] migrate_pages_batch from migrate_pages+0x188/0x488
[ 491.110405] migrate_pages from compact_zone+0x56c/0x954
[ 491.115737] compact_zone from compact_node+0x90/0xf0
[ 491.120799] compact_node from kcompactd+0x1d4/0x204
[ 491.125767] kcompactd from kthread+0x120/0x12c
[ 491.130322] kthread from ret_from_fork+0x14/0x38
[ 491.135031] Exception stack(0xc0d8bfb0 to 0xc0d8bff8)
[ 491.140056] bfa0: 00000000 00000000 00000000 00000000
[ 491.148185] bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 491.156310] bfe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 491.162888] Code: e58d1014 eb03d88c e2503000 0affffee (e59a0030)
[ 491.168919] ---[ end trace 0000000000000000 ]---
[ 491.173500] Kernel panic - not syncing: Fatal exception
[ 491.178701] ---[ end Kernel panic - not syncing: Fatal exception ]---
Cheers,
Ezra.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [REGRESSION] NULL pointer dereference on ARM (AT91SAM9G25) during compaction
2025-02-10 16:49 [REGRESSION] NULL pointer dereference on ARM (AT91SAM9G25) during compaction Ezra Buehler
@ 2025-02-10 17:03 ` Russell King (Oracle)
2025-02-11 3:45 ` Qi Zheng
0 siblings, 1 reply; 15+ messages in thread
From: Russell King (Oracle) @ 2025-02-10 17:03 UTC (permalink / raw)
To: Ezra Buehler
Cc: linux-mm, Qi Zheng, Andrew Morton, David Hildenbrand,
Mike Rapoport (Microsoft),
Muchun Song, Vlastimil Babka, Ryan Roberts, Vishal Moola (Oracle),
Hugh Dickins, Matthew Wilcox, Peter Xu, Nicolas Ferre,
Alexandre Belloni, Claudiu Beznea, open list, linux-arm-kernel
On Mon, Feb 10, 2025 at 05:49:38PM +0100, Ezra Buehler wrote:
> When running vanilla Linux 6.13 or newer (6.14-rc2) on the
> AT91SAM9G25-based GARDENA smart Gateway, we are seeing a NULL pointer
> dereference resulting in a kernel panic. The culprit seems to be commit
> fc9c45b71f43 ("arm: adjust_pte() use pte_offset_map_rw_nolock()").
> Reverting the commit apparently fixes the issue.
The blamed commit is buggy:
arch/arm/include/asm/tlbflush.h:
#define update_mmu_cache(vma, addr, ptep) \
update_mmu_cache_range(NULL, vma, addr, ptep, 1)
So vmf can be NULL. This did not matter before this commit, because
vmf was not used by ARM's update_mmu_cache_range(). However, the
commit introduced a dereference of it, which now causes a NULL pointer
dereference.
Not sure what the correct solution is, but at a guess, both:
if (ptl != vmf->ptl)
need to become:
if (!vmf || ptl != vmf->ptl)
but I haven't checked what the locking context actually is here
(I've been out of MM stuff too long to know this off the top of my
head.)
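FWIW, the reported fault address fits this explanation: 0x30 is
plausibly offsetof(struct vm_fault, ptl) on 32-bit ARM, so the oops is
the vmf->ptl load with vmf == NULL. A rough userspace sketch of the
arithmetic (the mock layout below only approximates struct vm_fault,
whose real offsets are config-dependent):

#include <stdio.h>
#include <stddef.h>

/* Stand-in for struct vm_fault: on ILP32 every member is 4 bytes, so
 * plain pointers approximate the layout; ->ptl is the 13th member,
 * i.e. offset 48 == 0x30. A 64-bit build prints 0x60 instead. */
struct mock_vm_fault {
	void *vma, *gfp_mask, *pgoff, *address, *real_address;
	void *flags, *pmd, *pud, *orig_pte, *cow_page, *page, *pte;
	void *ptl;
};

int main(void)
{
	/* "if (ptl != vmf->ptl)" with vmf == NULL reads from
	 * (char *)0 + offsetof(struct vm_fault, ptl). */
	printf("offsetof(ptl) = %#zx\n",
	       offsetof(struct mock_vm_fault, ptl));
	return 0;
}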
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [REGRESSION] NULL pointer dereference on ARM (AT91SAM9G25) during compaction
2025-02-10 17:03 ` Russell King (Oracle)
@ 2025-02-11 3:45 ` Qi Zheng
2025-02-11 9:14 ` David Hildenbrand
0 siblings, 1 reply; 15+ messages in thread
From: Qi Zheng @ 2025-02-11 3:45 UTC (permalink / raw)
To: Russell King (Oracle)
Cc: Ezra Buehler, linux-mm, Andrew Morton, David Hildenbrand,
Mike Rapoport (Microsoft),
Muchun Song, Vlastimil Babka, Ryan Roberts, Vishal Moola (Oracle),
Hugh Dickins, Matthew Wilcox, Peter Xu, Nicolas Ferre,
Alexandre Belloni, Claudiu Beznea, open list, linux-arm-kernel
Hi Russell,
On 2025/2/11 01:03, Russell King (Oracle) wrote:
> On Mon, Feb 10, 2025 at 05:49:38PM +0100, Ezra Buehler wrote:
>> When running vanilla Linux 6.13 or newer (6.14-rc2) on the
>> AT91SAM9G25-based GARDENA smart Gateway, we are seeing a NULL pointer
>> dereference resulting in a kernel panic. The culprit seems to be commit
>> fc9c45b71f43 ("arm: adjust_pte() use pte_offset_map_rw_nolock()").
>> Reverting the commit apparently fixes the issue.
>
> The blamed commit is buggy:
>
> arch/arm/include/asm/tlbflush.h:
> #define update_mmu_cache(vma, addr, ptep) \
> update_mmu_cache_range(NULL, vma, addr, ptep, 1)
>
> So vmf can be NULL. This did not matter before this commit, because
> vmf was not used by ARM's update_mmu_cache_range(). However, the
> commit introduced a dereference of it, which now causes a NULL
> pointer dereference.
>
> Not sure what the correct solution is, but at a guess, both:
>
> if (ptl != vmf->ptl)
>
> need to become:
>
> if (!vmf || ptl != vmf->ptl)
No, we can't do that, because without using split PTE locks, we would
use shared mm->page_table_lock, which would create a deadlock.
But it seems that we cannot simply bring back do_pte_lock() and
do_pte_unlock() either. In make_coherent(), we traverse the VMAs and
exclude the faulting VMA, but different VMAs may also map the same PTE
page, right? In that case, we still cannot directly take the pte lock.
But this part of the code is quite old, so maybe I missed something?
Thanks,
Qi
>
> but I haven't checked what the locking context actually is here
> (I've been out of MM stuff too long to know this off the top of my
> head.)
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [REGRESSION] NULL pointer dereference on ARM (AT91SAM9G25) during compaction
2025-02-11 3:45 ` Qi Zheng
@ 2025-02-11 9:14 ` David Hildenbrand
2025-02-11 9:29 ` Qi Zheng
0 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2025-02-11 9:14 UTC (permalink / raw)
To: Qi Zheng, Russell King (Oracle)
Cc: Ezra Buehler, linux-mm, Andrew Morton, Mike Rapoport (Microsoft),
Muchun Song, Vlastimil Babka, Ryan Roberts, Vishal Moola (Oracle),
Hugh Dickins, Matthew Wilcox, Peter Xu, Nicolas Ferre,
Alexandre Belloni, Claudiu Beznea, open list, linux-arm-kernel
On 11.02.25 04:45, Qi Zheng wrote:
> Hi Russell,
>
> On 2025/2/11 01:03, Russell King (Oracle) wrote:
>> On Mon, Feb 10, 2025 at 05:49:38PM +0100, Ezra Buehler wrote:
>>> When running vanilla Linux 6.13 or newer (6.14-rc2) on the
>>> AT91SAM9G25-based GARDENA smart Gateway, we are seeing a NULL pointer
>>> dereference resulting in a kernel panic. The culprit seems to be commit
>>> fc9c45b71f43 ("arm: adjust_pte() use pte_offset_map_rw_nolock()").
>>> Reverting the commit apparently fixes the issue.
>>
>> The blamed commit is buggy:
>>
>> arch/arm/include/asm/tlbflush.h:
>> #define update_mmu_cache(vma, addr, ptep) \
>> update_mmu_cache_range(NULL, vma, addr, ptep, 1)
>>
>> So vmf can be NULL. This did not matter before this commit, because
>> vmf was not used by ARM's update_mmu_cache_range(). However, the
>> commit introduced a dereference of it, which now causes a NULL
>> pointer dereference.
>>
>> Not sure what the correct solution is, but at a guess, both:
>>
>> if (ptl != vmf->ptl)
>>
>> need to become:
>>
>> if (!vmf || ptl != vmf->ptl)
>
> No, we can't do that, because without using split PTE locks, we would
> use shared mm->page_table_lock, which would create a deadlock.
Maybe we can simply special-case on CONFIG_SPLIT_PTE_PTLOCKS ?
if (IS_ENABLED(CONFIG_SPLIT_PTE_PTLOCKS)) {
...
}
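Spelled out against the current adjust_pte() (untested sketch, just to
show the shape; identifiers as in the existing code):

	if (IS_ENABLED(CONFIG_SPLIT_PTE_PTLOCKS)) {
		spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
		if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
			pte_unmap_unlock(pte, ptl);
			goto again;
		}
	}

	ret = do_adjust_pte(vma, address, pfn, pte);

	if (IS_ENABLED(CONFIG_SPLIT_PTE_PTLOCKS))
		spin_unlock(ptl);
	pte_unmap(pte);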
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [REGRESSION] NULL pointer dereference on ARM (AT91SAM9G25) during compaction
2025-02-11 9:14 ` David Hildenbrand
@ 2025-02-11 9:29 ` Qi Zheng
2025-02-11 9:37 ` David Hildenbrand
0 siblings, 1 reply; 15+ messages in thread
From: Qi Zheng @ 2025-02-11 9:29 UTC (permalink / raw)
To: David Hildenbrand
Cc: Russell King (Oracle),
Ezra Buehler, linux-mm, Andrew Morton, Mike Rapoport (Microsoft),
Muchun Song, Vlastimil Babka, Ryan Roberts, Vishal Moola (Oracle),
Hugh Dickins, Matthew Wilcox, Peter Xu, Nicolas Ferre,
Alexandre Belloni, Claudiu Beznea, open list, linux-arm-kernel
On 2025/2/11 17:14, David Hildenbrand wrote:
> On 11.02.25 04:45, Qi Zheng wrote:
>> Hi Russell,
>>
>> On 2025/2/11 01:03, Russell King (Oracle) wrote:
>>> On Mon, Feb 10, 2025 at 05:49:38PM +0100, Ezra Buehler wrote:
>>>> When running vanilla Linux 6.13 or newer (6.14-rc2) on the
>>>> AT91SAM9G25-based GARDENA smart Gateway, we are seeing a NULL pointer
>>>> dereference resulting in a kernel panic. The culprit seems to be commit
>>>> fc9c45b71f43 ("arm: adjust_pte() use pte_offset_map_rw_nolock()").
>>>> Reverting the commit apparently fixes the issue.
>>>
>>> The blamed commit is buggy:
>>>
>>> arch/arm/include/asm/tlbflush.h:
>>> #define update_mmu_cache(vma, addr, ptep) \
>>> update_mmu_cache_range(NULL, vma, addr, ptep, 1)
>>>
>>> So vmf can be NULL. This did not matter before this commit, because
>>> vmf was not used by ARM's update_mmu_cache_range(). However, the
>>> commit introduced a dereference of it, which now causes a NULL
>>> pointer dereference.
>>>
>>> Not sure what the correct solution is, but at a guess, both:
>>>
>>> if (ptl != vmf->ptl)
>>>
>>> need to become:
>>>
>>> if (!vmf || ptl != vmf->ptl)
>>
>> No, we can't do that, because without using split PTE locks, we would
>> use shared mm->page_table_lock, which would create a deadlock.
>
> Maybe we can simply special-case on CONFIG_SPLIT_PTE_PTLOCKS ?
>
> if (IS_ENABLED(CONFIG_SPLIT_PTE_PTLOCKS)) {
In this case, if two vmas map the same PTE page, then the same PTE lock
will be held repeatedly. Right?
This seems to be a problem that existed before commit fc9c45b71f43
("arm: adjust_pte() use pte_offset_map_rw_nolock()").
>
> ...
> }
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [REGRESSION] NULL pointer dereference on ARM (AT91SAM9G25) during compaction
2025-02-11 9:29 ` Qi Zheng
@ 2025-02-11 9:37 ` David Hildenbrand
2025-02-11 9:43 ` Qi Zheng
0 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2025-02-11 9:37 UTC (permalink / raw)
To: Qi Zheng
Cc: Russell King (Oracle),
Ezra Buehler, linux-mm, Andrew Morton, Mike Rapoport (Microsoft),
Muchun Song, Vlastimil Babka, Ryan Roberts, Vishal Moola (Oracle),
Hugh Dickins, Matthew Wilcox, Peter Xu, Nicolas Ferre,
Alexandre Belloni, Claudiu Beznea, open list, linux-arm-kernel
On 11.02.25 10:29, Qi Zheng wrote:
>
>
> On 2025/2/11 17:14, David Hildenbrand wrote:
>> On 11.02.25 04:45, Qi Zheng wrote:
>>> Hi Russell,
>>>
>>> On 2025/2/11 01:03, Russell King (Oracle) wrote:
>>>> On Mon, Feb 10, 2025 at 05:49:38PM +0100, Ezra Buehler wrote:
>>>>> When running vanilla Linux 6.13 or newer (6.14-rc2) on the
>>>>> AT91SAM9G25-based GARDENA smart Gateway, we are seeing a NULL pointer
>>>>> dereference resulting in a kernel panic. The culprit seems to be commit
>>>>> fc9c45b71f43 ("arm: adjust_pte() use pte_offset_map_rw_nolock()").
>>>>> Reverting the commit apparently fixes the issue.
>>>>
>>>> The blamed commit is buggy:
>>>>
>>>> arch/arm/include/asm/tlbflush.h:
>>>> #define update_mmu_cache(vma, addr, ptep) \
>>>> update_mmu_cache_range(NULL, vma, addr, ptep, 1)
>>>>
>>>> So vmf can be NULL. This did not matter before this commit, because
>>>> vmf was not used by ARM's update_mmu_cache_range(). However, the
>>>> commit introduced a dereference of it, which now causes a NULL
>>>> pointer dereference.
>>>>
>>>> Not sure what the correct solution is, but at a guess, both:
>>>>
>>>> if (ptl != vmf->ptl)
>>>>
>>>> need to become:
>>>>
>>>> if (!vmf || ptl != vmf->ptl)
>>>
>>> No, we can't do that, because without using split PTE locks, we would
>>> use shared mm->page_table_lock, which would create a deadlock.
>>
>> Maybe we can simply special-case on CONFIG_SPLIT_PTE_PTLOCKS ?
>>
>> if (IS_ENABLED(CONFIG_SPLIT_PTE_PTLOCKS)) {
>
> In this case, if two vmas map the same PTE page, then the same PTE lock
> will be held repeatedly. Right?
Hmm, the comment says:
/*
* This is called while another page table is mapped, so we
* must use the nested version. This also means we need to
* open-code the spin-locking.
*/
"another page table" implies that it cannot be the same. But maybe that
comment was also wrong?
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [REGRESSION] NULL pointer dereference on ARM (AT91SAM9G25) during compaction
2025-02-11 9:37 ` David Hildenbrand
@ 2025-02-11 9:43 ` Qi Zheng
2025-02-11 12:09 ` David Hildenbrand
0 siblings, 1 reply; 15+ messages in thread
From: Qi Zheng @ 2025-02-11 9:43 UTC (permalink / raw)
To: David Hildenbrand
Cc: Russell King (Oracle),
Ezra Buehler, linux-mm, Andrew Morton, Mike Rapoport (Microsoft),
Muchun Song, Vlastimil Babka, Ryan Roberts, Vishal Moola (Oracle),
Hugh Dickins, Matthew Wilcox, Peter Xu, Nicolas Ferre,
Alexandre Belloni, Claudiu Beznea, open list, linux-arm-kernel
On 2025/2/11 17:37, David Hildenbrand wrote:
> On 11.02.25 10:29, Qi Zheng wrote:
>>
>>
>> On 2025/2/11 17:14, David Hildenbrand wrote:
>>> On 11.02.25 04:45, Qi Zheng wrote:
>>>> Hi Russell,
>>>>
>>>> On 2025/2/11 01:03, Russell King (Oracle) wrote:
>>>>> On Mon, Feb 10, 2025 at 05:49:38PM +0100, Ezra Buehler wrote:
>>>>>> When running vanilla Linux 6.13 or newer (6.14-rc2) on the
>>>>>> AT91SAM9G25-based GARDENA smart Gateway, we are seeing a NULL pointer
>>>>>> dereference resulting in a kernel panic. The culprit seems to be
>>>>>> commit
>>>>>> fc9c45b71f43 ("arm: adjust_pte() use pte_offset_map_rw_nolock()").
>>>>>> Reverting the commit apparently fixes the issue.
>>>>>
>>>>> The blamed commit is buggy:
>>>>>
>>>>> arch/arm/include/asm/tlbflush.h:
>>>>> #define update_mmu_cache(vma, addr, ptep) \
>>>>> update_mmu_cache_range(NULL, vma, addr, ptep, 1)
>>>>>
>>>>> So vmf can be NULL. This did not matter before this commit, because
>>>>> vmf was not used by ARM's update_mmu_cache_range(). However, the
>>>>> commit introduced a dereference of it, which now causes a NULL
>>>>> pointer dereference.
>>>>>
>>>>> Not sure what the correct solution is, but at a guess, both:
>>>>>
>>>>> if (ptl != vmf->ptl)
>>>>>
>>>>> need to become:
>>>>>
>>>>> if (!vmf || ptl != vmf->ptl)
>>>>
>>>> No, we can't do that, because without using split PTE locks, we would
>>>> use shared mm->page_table_lock, which would create a deadlock.
>>>
>>> Maybe we can simply special-case on CONFIG_SPLIT_PTE_PTLOCKS ?
>>>
>>> if (IS_ENABLED(CONFIG_SPLIT_PTE_PTLOCKS)) {
>>
>> In this case, if two vmas map the same PTE page, then the same PTE lock
>> will be held repeatedly. Right?
>
> Hmm, the comment says:
>
> /*
> * This is called while another page table is mapped, so we
> * must use the nested version. This also means we need to
> * open-code the spin-locking.
> */
>
> "another page table" implies that it cannot be the same. But maybe that
> comment was also wrong?
I don't see make_coherent() ensuring this when traversing the VMAs. I
therefore propose the following changes:
diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
index 2bec87c3327d2..dddbca9a2597e 100644
--- a/arch/arm/mm/fault-armv.c
+++ b/arch/arm/mm/fault-armv.c
@@ -61,8 +61,41 @@ static int do_adjust_pte(struct vm_area_struct *vma, unsigned long address,
 	return ret;
 }
 
+#if defined(CONFIG_SPLIT_PTE_PTLOCKS)
+/*
+ * If we are using split PTE locks, then we need to take the pte
+ * lock here. Otherwise we are using shared mm->page_table_lock
+ * which is already locked, thus cannot take it.
+ */
+static inline bool do_pte_lock(spinlock_t *ptl, pmd_t pmdval, pmd_t *pmd)
+{
+	/*
+	 * Use nested version here to indicate that we are already
+	 * holding one similar spinlock.
+	 */
+	spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
+	if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
+		spin_unlock(ptl);
+		return false;
+	}
+
+	return true;
+}
+
+static inline void do_pte_unlock(spinlock_t *ptl)
+{
+	spin_unlock(ptl);
+}
+#else /* !defined(CONFIG_SPLIT_PTE_PTLOCKS) */
+static inline bool do_pte_lock(spinlock_t *ptl, pmd_t pmdval, pmd_t *pmd)
+{
+	return true;
+}
+static inline void do_pte_unlock(spinlock_t *ptl) {}
+#endif /* defined(CONFIG_SPLIT_PTE_PTLOCKS) */
+
 static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
-		      unsigned long pfn, struct vm_fault *vmf)
+		      unsigned long pfn)
 {
 	spinlock_t *ptl;
 	pgd_t *pgd;
@@ -99,23 +132,14 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
 	if (!pte)
 		return 0;
 
-	/*
-	 * If we are using split PTE locks, then we need to take the page
-	 * lock here. Otherwise we are using shared mm->page_table_lock
-	 * which is already locked, thus cannot take it.
-	 */
-	if (ptl != vmf->ptl) {
-		spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
-		if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
-			pte_unmap_unlock(pte, ptl);
-			goto again;
-		}
+	if (!do_pte_lock(ptl, pmdval, pmd)) {
+		pte_unmap(pte);
+		goto again;
 	}
 
 	ret = do_adjust_pte(vma, address, pfn, pte);
 
-	if (ptl != vmf->ptl)
-		spin_unlock(ptl);
+	do_pte_unlock(ptl);
 	pte_unmap(pte);
 
 	return ret;
@@ -123,16 +147,17 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
 
 static void
 make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
-	unsigned long addr, pte_t *ptep, unsigned long pfn,
-	struct vm_fault *vmf)
+	unsigned long addr, pte_t *ptep, unsigned long pfn)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	struct vm_area_struct *mpnt;
 	unsigned long offset;
+	unsigned long start;
 	pgoff_t pgoff;
 	int aliases = 0;
 
 	pgoff = vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);
+	start = ALIGN_DOWN(addr, PMD_SIZE);
 
 	/*
 	 * If we have any shared mappings that are in the same mm
@@ -141,6 +166,8 @@ make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
 	 */
 	flush_dcache_mmap_lock(mapping);
 	vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
+		unsigned long mpnt_addr;
+
 		/*
 		 * If this VMA is not in our MM, we can ignore it.
 		 * Note that we intentionally mask out the VMA
@@ -151,7 +178,14 @@ make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
 		if (!(mpnt->vm_flags & VM_MAYSHARE))
 			continue;
 		offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
-		aliases += adjust_pte(mpnt, mpnt->vm_start + offset, pfn, vmf);
+		mpnt_addr = mpnt->vm_start + offset;
+		/*
+		 * If mpnt_addr and addr are mapped to the same PTE page,
+		 * also skip this vma.
+		 */
+		if (mpnt_addr >= start && mpnt_addr - start < PMD_SIZE)
+			continue;
+		aliases += adjust_pte(mpnt, mpnt_addr, pfn);
 	}
 	flush_dcache_mmap_unlock(mapping);
 	if (aliases)
@@ -194,7 +228,7 @@ void update_mmu_cache_range(struct vm_fault *vmf, struct vm_area_struct *vma,
 		__flush_dcache_folio(mapping, folio);
 	if (mapping) {
 		if (cache_is_vivt())
-			make_coherent(mapping, vma, addr, ptep, pfn, vmf);
+			make_coherent(mapping, vma, addr, ptep, pfn);
 		else if (vma->vm_flags & VM_EXEC)
 			__flush_icache_all();
 	}
Make sense?
>
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [REGRESSION] NULL pointer dereference on ARM (AT91SAM9G25) during compaction
2025-02-11 9:43 ` Qi Zheng
@ 2025-02-11 12:09 ` David Hildenbrand
2025-02-11 12:41 ` Qi Zheng
0 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2025-02-11 12:09 UTC (permalink / raw)
To: Qi Zheng
Cc: Russell King (Oracle),
Ezra Buehler, linux-mm, Andrew Morton, Mike Rapoport (Microsoft),
Muchun Song, Vlastimil Babka, Ryan Roberts, Vishal Moola (Oracle),
Hugh Dickins, Matthew Wilcox, Peter Xu, Nicolas Ferre,
Alexandre Belloni, Claudiu Beznea, open list, linux-arm-kernel
On 11.02.25 10:43, Qi Zheng wrote:
>
>
> On 2025/2/11 17:37, David Hildenbrand wrote:
>> On 11.02.25 10:29, Qi Zheng wrote:
>>>
>>>
>>> On 2025/2/11 17:14, David Hildenbrand wrote:
>>>> On 11.02.25 04:45, Qi Zheng wrote:
>>>>> Hi Russell,
>>>>>
>>>>> On 2025/2/11 01:03, Russell King (Oracle) wrote:
>>>>>> On Mon, Feb 10, 2025 at 05:49:38PM +0100, Ezra Buehler wrote:
>>>>>>> When running vanilla Linux 6.13 or newer (6.14-rc2) on the
>>>>>>> AT91SAM9G25-based GARDENA smart Gateway, we are seeing a NULL pointer
>>>>>>> dereference resulting in a kernel panic. The culprit seems to be
>>>>>>> commit
>>>>>>> fc9c45b71f43 ("arm: adjust_pte() use pte_offset_map_rw_nolock()").
>>>>>>> Reverting the commit apparently fixes the issue.
>>>>>>
>>>>>> The blamed commit is buggy:
>>>>>>
>>>>>> arch/arm/include/asm/tlbflush.h:
>>>>>> #define update_mmu_cache(vma, addr, ptep) \
>>>>>> update_mmu_cache_range(NULL, vma, addr, ptep, 1)
>>>>>>
>>>>>> So vmf can be NULL. This did not matter before this commit, because
>>>>>> vmf was not used by ARM's update_mmu_cache_range(). However, the
>>>>>> commit introduced a dereference of it, which now causes a NULL
>>>>>> pointer dereference.
>>>>>>
>>>>>> Not sure what the correct solution is, but at a guess, both:
>>>>>>
>>>>>> if (ptl != vmf->ptl)
>>>>>>
>>>>>> need to become:
>>>>>>
>>>>>> if (!vmf || ptl != vmf->ptl)
>>>>>
>>>>> No, we can't do that, because without using split PTE locks, we would
>>>>> use shared mm->page_table_lock, which would create a deadlock.
>>>>
>>>> Maybe we can simply special-case on CONFIG_SPLIT_PTE_PTLOCKS ?
>>>>
>>>> if (IS_ENABLED(CONFIG_SPLIT_PTE_PTLOCKS)) {
>>>
>>> In this case, if two vmas map the same PTE page, then the same PTE lock
>>> will be held repeatedly. Right?
>>
>> Hmm, the comment says:
>>
>> /*
>> * This is called while another page table is mapped, so we
>> * must use the nested version. This also means we need to
>> * open-code the spin-locking.
>> */
>>
>> "another page table" implies that it cannot be the same. But maybe that
>> comment was also wrong?
>
> I don't see make_coherent() ensuring this when traversing the VMAs.
Right, we could just have the same file range mapped MAP_SHARED into the
same page table using two VMAs ... I suspect writing a reproducer for
the deadlock should be easy.
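Untested sketch of that reproducer idea (assumes an ARM machine with a
VIVT cache so that make_coherent() actually runs; the fixed address is
arbitrary but PMD-aligned):

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/tmp/repro", O_RDWR | O_CREAT, 0600);
	char *base = (char *)0x10000000;	/* 2 MiB aligned */

	ftruncate(fd, 4096);

	/* The same file page mapped twice, MAP_SHARED, both mappings
	 * inside one PMD-sized region: one PTE table, two VMAs. */
	char *a = mmap(base, 4096, PROT_READ | PROT_WRITE,
		       MAP_SHARED | MAP_FIXED, fd, 0);
	char *b = mmap(base + 4096, 4096, PROT_READ | PROT_WRITE,
		       MAP_SHARED | MAP_FIXED, fd, 0);

	/* A write fault here ends up in update_mmu_cache() ->
	 * make_coherent(), which walks the other VMA and would take
	 * the PTL of the same PTE table again. */
	a[0] = 1;
	b[0] = 2;
	return 0;
}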
> I therefore propose the following changes:
>
> diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
> index 2bec87c3327d2..dddbca9a2597e 100644
> --- a/arch/arm/mm/fault-armv.c
> +++ b/arch/arm/mm/fault-armv.c
> @@ -61,8 +61,41 @@ static int do_adjust_pte(struct vm_area_struct *vma, unsigned long address,
>  	return ret;
>  }
> 
> +#if defined(CONFIG_SPLIT_PTE_PTLOCKS)
> +/*
> + * If we are using split PTE locks, then we need to take the pte
> + * lock here. Otherwise we are using shared mm->page_table_lock
> + * which is already locked, thus cannot take it.
> + */
> +static inline bool do_pte_lock(spinlock_t *ptl, pmd_t pmdval, pmd_t *pmd)
> +{
> +	/*
> +	 * Use nested version here to indicate that we are already
> +	 * holding one similar spinlock.
> +	 */
> +	spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
> +	if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
> +		spin_unlock(ptl);
> +		return false;
> +	}
> +
> +	return true;
> +}
> +
> +static inline void do_pte_unlock(spinlock_t *ptl)
> +{
> +	spin_unlock(ptl);
> +}
> +#else /* !defined(CONFIG_SPLIT_PTE_PTLOCKS) */
> +static inline bool do_pte_lock(spinlock_t *ptl, pmd_t pmdval, pmd_t *pmd)
> +{
> +	return true;
> +}
> +static inline void do_pte_unlock(spinlock_t *ptl) {}
> +#endif /* defined(CONFIG_SPLIT_PTE_PTLOCKS) */
> +
>  static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
> -		      unsigned long pfn, struct vm_fault *vmf)
> +		      unsigned long pfn)
>  {
>  	spinlock_t *ptl;
>  	pgd_t *pgd;
> @@ -99,23 +132,14 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
>  	if (!pte)
>  		return 0;
> 
> -	/*
> -	 * If we are using split PTE locks, then we need to take the page
> -	 * lock here. Otherwise we are using shared mm->page_table_lock
> -	 * which is already locked, thus cannot take it.
> -	 */
> -	if (ptl != vmf->ptl) {
> -		spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
> -		if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
> -			pte_unmap_unlock(pte, ptl);
> -			goto again;
> -		}
> +	if (!do_pte_lock(ptl, pmdval, pmd)) {
> +		pte_unmap(pte);
> +		goto again;
>  	}
> 
>  	ret = do_adjust_pte(vma, address, pfn, pte);
> 
> -	if (ptl != vmf->ptl)
> -		spin_unlock(ptl);
> +	do_pte_unlock(ptl);
>  	pte_unmap(pte);
> 
>  	return ret;
> @@ -123,16 +147,17 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
> 
>  static void
>  make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
> -	unsigned long addr, pte_t *ptep, unsigned long pfn,
> -	struct vm_fault *vmf)
> +	unsigned long addr, pte_t *ptep, unsigned long pfn)
>  {
>  	struct mm_struct *mm = vma->vm_mm;
>  	struct vm_area_struct *mpnt;
>  	unsigned long offset;
> +	unsigned long start;
>  	pgoff_t pgoff;
>  	int aliases = 0;
> 
>  	pgoff = vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);
> +	start = ALIGN_DOWN(addr, PMD_SIZE);
> 
>  	/*
>  	 * If we have any shared mappings that are in the same mm
> @@ -141,6 +166,8 @@ make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
>  	 */
>  	flush_dcache_mmap_lock(mapping);
>  	vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
> +		unsigned long mpnt_addr;
> +
>  		/*
>  		 * If this VMA is not in our MM, we can ignore it.
>  		 * Note that we intentionally mask out the VMA
> @@ -151,7 +178,14 @@ make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
>  		if (!(mpnt->vm_flags & VM_MAYSHARE))
>  			continue;
>  		offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
> -		aliases += adjust_pte(mpnt, mpnt->vm_start + offset, pfn, vmf);
> +		mpnt_addr = mpnt->vm_start + offset;
> +		/*
> +		 * If mpnt_addr and addr are mapped to the same PTE page,
> +		 * also skip this vma.
> +		 */
> +		if (mpnt_addr >= start && mpnt_addr - start < PMD_SIZE)
> +			continue;
Hmm, but is skipping the right thing to do? Maybe you would want to
indicate to adjust_pte() whether it must take the PTL or not.
I did not study that code in detail, though ...
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [REGRESSION] NULL pointer dereference on ARM (AT91SAM9G25) during compaction
2025-02-11 12:09 ` David Hildenbrand
@ 2025-02-11 12:41 ` Qi Zheng
2025-02-12 6:40 ` [PATCH] arm: pgtable: fix NULL pointer dereference issue Qi Zheng
0 siblings, 1 reply; 15+ messages in thread
From: Qi Zheng @ 2025-02-11 12:41 UTC (permalink / raw)
To: David Hildenbrand
Cc: Russell King (Oracle),
Ezra Buehler, linux-mm, Andrew Morton, Mike Rapoport (Microsoft),
Muchun Song, Vlastimil Babka, Ryan Roberts, Vishal Moola (Oracle),
Hugh Dickins, Matthew Wilcox, Peter Xu, Nicolas Ferre,
Alexandre Belloni, Claudiu Beznea, open list, linux-arm-kernel
On 2025/2/11 20:09, David Hildenbrand wrote:
> On 11.02.25 10:43, Qi Zheng wrote:
>>
>>
>> On 2025/2/11 17:37, David Hildenbrand wrote:
>>> On 11.02.25 10:29, Qi Zheng wrote:
>>>>
>>>>
>>>> On 2025/2/11 17:14, David Hildenbrand wrote:
>>>>> On 11.02.25 04:45, Qi Zheng wrote:
>>>>>> Hi Russell,
>>>>>>
>>>>>> On 2025/2/11 01:03, Russell King (Oracle) wrote:
>>>>>>> On Mon, Feb 10, 2025 at 05:49:38PM +0100, Ezra Buehler wrote:
>>>>>>>> When running vanilla Linux 6.13 or newer (6.14-rc2) on the
>>>>>>>> AT91SAM9G25-based GARDENA smart Gateway, we are seeing a NULL
>>>>>>>> pointer
>>>>>>>> dereference resulting in a kernel panic. The culprit seems to be
>>>>>>>> commit
>>>>>>>> fc9c45b71f43 ("arm: adjust_pte() use pte_offset_map_rw_nolock()").
>>>>>>>> Reverting the commit apparently fixes the issue.
>>>>>>>
>>>>>>> The blamed commit is buggy:
>>>>>>>
>>>>>>> arch/arm/include/asm/tlbflush.h:
>>>>>>> #define update_mmu_cache(vma, addr, ptep) \
>>>>>>> update_mmu_cache_range(NULL, vma, addr, ptep, 1)
>>>>>>>
>>>>>>> So vmf can be NULL. This did not matter before this commit, because
>>>>>>> vmf was not used by ARM's update_mmu_cache_range(). However, the
>>>>>>> commit introduced a dereference of it, which now causes a NULL
>>>>>>> pointer dereference.
>>>>>>>
>>>>>>> Not sure what the correct solution is, but at a guess, both:
>>>>>>>
>>>>>>> if (ptl != vmf->ptl)
>>>>>>>
>>>>>>> need to become:
>>>>>>>
>>>>>>> if (!vmf || ptl != vmf->ptl)
>>>>>>
>>>>>> No, we can't do that, because without using split PTE locks, we would
>>>>>> use shared mm->page_table_lock, which would create a deadlock.
>>>>>
>>>>> Maybe we can simply special-case on CONFIG_SPLIT_PTE_PTLOCKS ?
>>>>>
>>>>> if (IS_ENABLED(CONFIG_SPLIT_PTE_PTLOCKS)) {
>>>>
>>>> In this case, if two vmas map the same PTE page, then the same PTE lock
>>>> will be held repeatedly. Right?
>>>
>>> Hmm, the comment says:
>>>
>>> /*
>>> * This is called while another page table is mapped, so we
>>> * must use the nested version. This also means we need to
>>> * open-code the spin-locking.
>>> */
>>>
>>> "another page table" implies that it cannot be the same. But maybe that
>>> comment was also wrong?
>>
>> I don't see make_coherent() ensuring this when traversing the VMAs.
>
> Right, we could just have the same file range mapped MAP_SHARED into the
> same page table using two VMAs ... I suspect writing a reproducer for
> the deadlock should be easy.
I guess so, but I don't have an arm32 test environment yet. :(
[...]
>> vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff,
>> pgoff) {
>> + unsigned long mpnt_addr;
>> +
>> /*
>> * If this VMA is not in our MM, we can ignore it.
>> * Note that we intentionally mask out the VMA
>> @@ -151,7 +178,14 @@ make_coherent(struct address_space *mapping, struct
>> vm_area_struct *vma,
>> if (!(mpnt->vm_flags & VM_MAYSHARE))
>> continue;
>> offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
>> - aliases += adjust_pte(mpnt, mpnt->vm_start + offset,
>> pfn, vmf);
>> + mpnt_addr = mpnt->vm_start + offset;
>> + /*
>> + * If mpnt_addr and addr are mapped to the same PTE page,
>> + * also skip this vma.
>> + */
>> + if (mpnt_addr >= start && mpnt_addr - start < PMD_SIZE)
>> + continue;
>
> Hmm, but is skipping the right thing to do? Maybe you would want to
Ah, your concern is that in this case we may also need to maintain cache
coherency. I'm not sure about this.
> indicate to adjust_pte() whether it must take the PTL or not.
Indeed, this is a better approach. Will do it, and hope arm people
can help test it.
Thanks!
>
> I did not study that code in detail, though ...
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH] arm: pgtable: fix NULL pointer dereference issue
2025-02-11 12:41 ` Qi Zheng
@ 2025-02-12 6:40 ` Qi Zheng
2025-02-12 7:27 ` Ezra Buehler
2025-02-12 8:20 ` David Hildenbrand
0 siblings, 2 replies; 15+ messages in thread
From: Qi Zheng @ 2025-02-12 6:40 UTC (permalink / raw)
To: linux, ezra, david, hughd, ryan.roberts, akpm, muchun.song
Cc: linux-arm-kernel, linux-mm, linux-kernel, Qi Zheng, stable
When update_mmu_cache_range() is called by update_mmu_cache(), the vmf
parameter is NULL, which will cause a NULL pointer dereference issue in
adjust_pte():
Unable to handle kernel NULL pointer dereference at virtual address 00000030 when read
Hardware name: Atmel AT91SAM9
PC is at update_mmu_cache_range+0x1e0/0x278
LR is at pte_offset_map_rw_nolock+0x18/0x2c
Call trace:
update_mmu_cache_range from remove_migration_pte+0x29c/0x2ec
remove_migration_pte from rmap_walk_file+0xcc/0x130
rmap_walk_file from remove_migration_ptes+0x90/0xa4
remove_migration_ptes from migrate_pages_batch+0x6d4/0x858
migrate_pages_batch from migrate_pages+0x188/0x488
migrate_pages from compact_zone+0x56c/0x954
compact_zone from compact_node+0x90/0xf0
compact_node from kcompactd+0x1d4/0x204
kcompactd from kthread+0x120/0x12c
kthread from ret_from_fork+0x14/0x38
Exception stack(0xc0d8bfb0 to 0xc0d8bff8)
To fix it, do not rely on comparing 'ptl' with vmf->ptl to decide
whether to take the pte lock; instead, decide based on whether
CONFIG_SPLIT_PTE_PTLOCKS is enabled. In addition, if two VMAs map to
the same PTE page, there is no need to take the pte lock again,
otherwise a deadlock will occur. Add a need_lock parameter to pass
this information to adjust_pte().
Reported-by: Ezra Buehler <ezra@easyb.ch>
Closes: https://lore.kernel.org/lkml/CAM1KZSmZ2T_riHvay+7cKEFxoPgeVpHkVFTzVVEQ1BO0cLkHEQ@mail.gmail.com/
Fixes: fc9c45b71f43 ("arm: adjust_pte() use pte_offset_map_rw_nolock()")
Cc: stable@vger.kernel.org
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
arch/arm/mm/fault-armv.c | 40 ++++++++++++++++++++++++++++------------
1 file changed, 28 insertions(+), 12 deletions(-)
diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
index 2bec87c3327d2..3627bf0957c75 100644
--- a/arch/arm/mm/fault-armv.c
+++ b/arch/arm/mm/fault-armv.c
@@ -62,7 +62,7 @@ static int do_adjust_pte(struct vm_area_struct *vma, unsigned long address,
 }
 
 static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
-		      unsigned long pfn, struct vm_fault *vmf)
+		      unsigned long pfn, bool need_lock)
 {
 	spinlock_t *ptl;
 	pgd_t *pgd;
@@ -99,12 +99,11 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
 	if (!pte)
 		return 0;
 
-	/*
-	 * If we are using split PTE locks, then we need to take the page
-	 * lock here. Otherwise we are using shared mm->page_table_lock
-	 * which is already locked, thus cannot take it.
-	 */
-	if (ptl != vmf->ptl) {
+	if (need_lock) {
+		/*
+		 * Use nested version here to indicate that we are already
+		 * holding one similar spinlock.
+		 */
 		spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
 		if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
 			pte_unmap_unlock(pte, ptl);
@@ -114,7 +113,7 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
 
 	ret = do_adjust_pte(vma, address, pfn, pte);
 
-	if (ptl != vmf->ptl)
+	if (need_lock)
 		spin_unlock(ptl);
 	pte_unmap(pte);
 
@@ -123,16 +122,17 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
 
 static void
 make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
-	unsigned long addr, pte_t *ptep, unsigned long pfn,
-	struct vm_fault *vmf)
+	unsigned long addr, pte_t *ptep, unsigned long pfn)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	struct vm_area_struct *mpnt;
 	unsigned long offset;
+	unsigned long start;
 	pgoff_t pgoff;
 	int aliases = 0;
 
 	pgoff = vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);
+	start = ALIGN_DOWN(addr, PMD_SIZE);
 
 	/*
 	 * If we have any shared mappings that are in the same mm
@@ -141,6 +141,14 @@ make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
 	 */
 	flush_dcache_mmap_lock(mapping);
 	vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
+		unsigned long mpnt_addr;
+		/*
+		 * If we are using split PTE locks, then we need to take the pte
+		 * lock. Otherwise we are using shared mm->page_table_lock which
+		 * is already locked, thus cannot take it.
+		 */
+		bool need_lock = IS_ENABLED(CONFIG_SPLIT_PTE_PTLOCKS);
+
 		/*
 		 * If this VMA is not in our MM, we can ignore it.
 		 * Note that we intentionally mask out the VMA
@@ -151,7 +159,15 @@ make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
 		if (!(mpnt->vm_flags & VM_MAYSHARE))
 			continue;
 		offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
-		aliases += adjust_pte(mpnt, mpnt->vm_start + offset, pfn, vmf);
+		mpnt_addr = mpnt->vm_start + offset;
+		/*
+		 * If mpnt_addr and addr are mapped to the same PTE page, there
+		 * is no need to hold the pte lock again, otherwise a deadlock
+		 * will occur.
+		 */
+		if (mpnt_addr >= start && mpnt_addr - start < PMD_SIZE)
+			need_lock = false;
+		aliases += adjust_pte(mpnt, mpnt_addr, pfn, need_lock);
 	}
 	flush_dcache_mmap_unlock(mapping);
 	if (aliases)
@@ -194,7 +210,7 @@ void update_mmu_cache_range(struct vm_fault *vmf, struct vm_area_struct *vma,
 		__flush_dcache_folio(mapping, folio);
 	if (mapping) {
 		if (cache_is_vivt())
-			make_coherent(mapping, vma, addr, ptep, pfn, vmf);
+			make_coherent(mapping, vma, addr, ptep, pfn);
 		else if (vma->vm_flags & VM_EXEC)
 			__flush_icache_all();
 	}
--
2.20.1
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] arm: pgtable: fix NULL pointer dereference issue
2025-02-12 6:40 ` [PATCH] arm: pgtable: fix NULL pointer dereference issue Qi Zheng
@ 2025-02-12 7:27 ` Ezra Buehler
2025-02-12 7:32 ` Qi Zheng
2025-02-12 8:20 ` David Hildenbrand
1 sibling, 1 reply; 15+ messages in thread
From: Ezra Buehler @ 2025-02-12 7:27 UTC (permalink / raw)
To: Qi Zheng
Cc: linux, david, hughd, ryan.roberts, akpm, muchun.song,
linux-arm-kernel, linux-mm, linux-kernel, stable
Hi Qi,
Thanks for the fix. I will test it as well as I can.
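In case it helps: writing 1 to /proc/sys/vm/compact_memory should
force compaction (assuming CONFIG_COMPACTION=y) and exercise the same
remove_migration_ptes() path as the kcompactd backtrace above.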
On Wed, Feb 12, 2025 at 7:41 AM Qi Zheng <zhengqi.arch@bytedance.com> wrote:
>
> When update_mmu_cache_range() is called by update_mmu_cache(), the vmf
> parameter is NULL, which will cause a NULL pointer dereference issue in
> adjust_pte():
>
> Unable to handle kernel NULL pointer dereference at virtual address 00000030 when read
> Hardware name: Atmel AT91SAM9
> PC is at update_mmu_cache_range+0x1e0/0x278
> LR is at pte_offset_map_rw_nolock+0x18/0x2c
> Call trace:
> update_mmu_cache_range from remove_migration_pte+0x29c/0x2ec
> remove_migration_pte from rmap_walk_file+0xcc/0x130
> rmap_walk_file from remove_migration_ptes+0x90/0xa4
> remove_migration_ptes from migrate_pages_batch+0x6d4/0x858
> migrate_pages_batch from migrate_pages+0x188/0x488
> migrate_pages from compact_zone+0x56c/0x954
> compact_zone from compact_node+0x90/0xf0
> compact_node from kcompactd+0x1d4/0x204
> kcompactd from kthread+0x120/0x12c
> kthread from ret_from_fork+0x14/0x38
> Exception stack(0xc0d8bfb0 to 0xc0d8bff8)
>
> To fix it, do not rely on comparing 'ptl' with vmf->ptl to decide
> whether to take the pte lock; instead, decide based on whether
> CONFIG_SPLIT_PTE_PTLOCKS is enabled. In addition, if two VMAs map to
> the same PTE page, there is no need to take the pte lock again,
> otherwise a deadlock will occur. Add a need_lock parameter to pass
> this information to adjust_pte().
>
> Reported-by: Ezra Buehler <ezra@easyb.ch>
A minor detail, but it might be better to use "Ezra Buehler
<ezra.buehler@husqvarnagroup.com>" here.
Cheers,
Ezra.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] arm: pgtable: fix NULL pointer dereference issue
2025-02-12 7:27 ` Ezra Buehler
@ 2025-02-12 7:32 ` Qi Zheng
0 siblings, 0 replies; 15+ messages in thread
From: Qi Zheng @ 2025-02-12 7:32 UTC (permalink / raw)
To: Ezra Buehler
Cc: linux, david, hughd, ryan.roberts, akpm, muchun.song,
linux-arm-kernel, linux-mm, linux-kernel, stable
Hi Ezra,
On 2025/2/12 15:27, Ezra Buehler wrote:
> Hi Qi,
>
> Thanks for the fix. I will test it as well as I can.
Thanks!
>
> On Wed, Feb 12, 2025 at 7:41 AM Qi Zheng <zhengqi.arch@bytedance.com> wrote:
>>
>> When update_mmu_cache_range() is called by update_mmu_cache(), the vmf
>> parameter is NULL, which will cause a NULL pointer dereference issue in
>> adjust_pte():
>>
>> Unable to handle kernel NULL pointer dereference at virtual address 00000030 when read
>> Hardware name: Atmel AT91SAM9
>> PC is at update_mmu_cache_range+0x1e0/0x278
>> LR is at pte_offset_map_rw_nolock+0x18/0x2c
>> Call trace:
>> update_mmu_cache_range from remove_migration_pte+0x29c/0x2ec
>> remove_migration_pte from rmap_walk_file+0xcc/0x130
>> rmap_walk_file from remove_migration_ptes+0x90/0xa4
>> remove_migration_ptes from migrate_pages_batch+0x6d4/0x858
>> migrate_pages_batch from migrate_pages+0x188/0x488
>> migrate_pages from compact_zone+0x56c/0x954
>> compact_zone from compact_node+0x90/0xf0
>> compact_node from kcompactd+0x1d4/0x204
>> kcompactd from kthread+0x120/0x12c
>> kthread from ret_from_fork+0x14/0x38
>> Exception stack(0xc0d8bfb0 to 0xc0d8bff8)
>>
>> To fix it, do not rely on comparing 'ptl' with vmf->ptl to decide
>> whether to take the pte lock; instead, decide based on whether
>> CONFIG_SPLIT_PTE_PTLOCKS is enabled. In addition, if two VMAs map to
>> the same PTE page, there is no need to take the pte lock again,
>> otherwise a deadlock will occur. Add a need_lock parameter to pass
>> this information to adjust_pte().
>>
>> Reported-by: Ezra Buehler <ezra@easyb.ch>
>
> Perhaps a detail but, maybe better use "Ezra Buehler
> <ezra.buehler@husqvarnagroup.com>" here.
Got it. Will wait for your test results first.
Thanks,
Qi
>
> Cheers,
> Ezra.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] arm: pgtable: fix NULL pointer dereference issue
2025-02-12 6:40 ` [PATCH] arm: pgtable: fix NULL pointer dereference issue Qi Zheng
2025-02-12 7:27 ` Ezra Buehler
@ 2025-02-12 8:20 ` David Hildenbrand
2025-02-12 8:28 ` Qi Zheng
1 sibling, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2025-02-12 8:20 UTC (permalink / raw)
To: Qi Zheng, linux, ezra, hughd, ryan.roberts, akpm, muchun.song
Cc: linux-arm-kernel, linux-mm, linux-kernel, stable
On 12.02.25 07:40, Qi Zheng wrote:
> When update_mmu_cache_range() is called by update_mmu_cache(), the vmf
> parameter is NULL, which will cause a NULL pointer dereference issue in
> adjust_pte():
>
> Unable to handle kernel NULL pointer dereference at virtual address 00000030 when read
> Hardware name: Atmel AT91SAM9
> PC is at update_mmu_cache_range+0x1e0/0x278
> LR is at pte_offset_map_rw_nolock+0x18/0x2c
> Call trace:
> update_mmu_cache_range from remove_migration_pte+0x29c/0x2ec
> remove_migration_pte from rmap_walk_file+0xcc/0x130
> rmap_walk_file from remove_migration_ptes+0x90/0xa4
> remove_migration_ptes from migrate_pages_batch+0x6d4/0x858
> migrate_pages_batch from migrate_pages+0x188/0x488
> migrate_pages from compact_zone+0x56c/0x954
> compact_zone from compact_node+0x90/0xf0
> compact_node from kcompactd+0x1d4/0x204
> kcompactd from kthread+0x120/0x12c
> kthread from ret_from_fork+0x14/0x38
> Exception stack(0xc0d8bfb0 to 0xc0d8bff8)
>
> To fix it, do not rely on comparing 'ptl' with vmf->ptl to decide
> whether to take the pte lock; instead, decide based on whether
> CONFIG_SPLIT_PTE_PTLOCKS is enabled. In addition, if two VMAs map to
> the same PTE page, there is no need to take the pte lock again,
> otherwise a deadlock will occur. Add a need_lock parameter to pass
> this information to adjust_pte().
>
> Reported-by: Ezra Buehler <ezra@easyb.ch>
> Closes: https://lore.kernel.org/lkml/CAM1KZSmZ2T_riHvay+7cKEFxoPgeVpHkVFTzVVEQ1BO0cLkHEQ@mail.gmail.com/
> Fixes: fc9c45b71f43 ("arm: adjust_pte() use pte_offset_map_rw_nolock()")
> Cc: stable@vger.kernel.org
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
> ---
> arch/arm/mm/fault-armv.c | 40 ++++++++++++++++++++++++++++------------
> 1 file changed, 28 insertions(+), 12 deletions(-)
>
> diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
> index 2bec87c3327d2..3627bf0957c75 100644
> --- a/arch/arm/mm/fault-armv.c
> +++ b/arch/arm/mm/fault-armv.c
> @@ -62,7 +62,7 @@ static int do_adjust_pte(struct vm_area_struct *vma, unsigned long address,
> }
>
> static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
> - unsigned long pfn, struct vm_fault *vmf)
> + unsigned long pfn, bool need_lock)
> {
> spinlock_t *ptl;
> pgd_t *pgd;
> @@ -99,12 +99,11 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
> if (!pte)
> return 0;
>
> - /*
> - * If we are using split PTE locks, then we need to take the page
> - * lock here. Otherwise we are using shared mm->page_table_lock
> - * which is already locked, thus cannot take it.
> - */
> - if (ptl != vmf->ptl) {
> + if (need_lock) {
> + /*
> + * Use nested version here to indicate that we are already
> + * holding one similar spinlock.
> + */
> spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
> if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
> pte_unmap_unlock(pte, ptl);
> @@ -114,7 +113,7 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
>
> ret = do_adjust_pte(vma, address, pfn, pte);
>
> - if (ptl != vmf->ptl)
> + if (need_lock)
> spin_unlock(ptl);
> pte_unmap(pte);
>
> @@ -123,16 +122,17 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
>
> static void
> make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
> - unsigned long addr, pte_t *ptep, unsigned long pfn,
> - struct vm_fault *vmf)
> + unsigned long addr, pte_t *ptep, unsigned long pfn)
> {
> struct mm_struct *mm = vma->vm_mm;
> struct vm_area_struct *mpnt;
> unsigned long offset;
> + unsigned long start;
> pgoff_t pgoff;
> int aliases = 0;
>
> pgoff = vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);
> + start = ALIGN_DOWN(addr, PMD_SIZE);
I assume you can come up with a better name than "start" :)
aligned_addr ... pmd_start_addr ...
Maybe simply
pmd_start_addr = ALIGN_DOWN(addr, PMD_SIZE);
pmd_end_addr = addr + PMD_SIZE;
Then the comparison below also becomes easier to read.
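I.e. something like (sketch; the follow-up below settles how
pmd_end_addr is derived):

	pmd_start_addr = ALIGN_DOWN(addr, PMD_SIZE);
	pmd_end_addr = pmd_start_addr + PMD_SIZE;
	...
	/* same PTE table as the faulting address? */
	if (mpnt_addr >= pmd_start_addr && mpnt_addr < pmd_end_addr)
		need_lock = false;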
>
> /*
> * If we have any shared mappings that are in the same mm
> @@ -141,6 +141,14 @@ make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
> */
> flush_dcache_mmap_lock(mapping);
> vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
> + unsigned long mpnt_addr;
> + /*
> + * If we are using split PTE locks, then we need to take the pte
> + * lock. Otherwise we are using shared mm->page_table_lock which
> + * is already locked, thus cannot take it.
> + */
> + bool need_lock = IS_ENABLED(CONFIG_SPLIT_PTE_PTLOCKS);
Nit: move "unsigned long mpnt_addr;" below this longer variable+init.
> +
> /*
> * If this VMA is not in our MM, we can ignore it.
> * Note that we intentionally mask out the VMA
> @@ -151,7 +159,15 @@ make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
> if (!(mpnt->vm_flags & VM_MAYSHARE))
> continue;
> offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
> - aliases += adjust_pte(mpnt, mpnt->vm_start + offset, pfn, vmf);
> + mpnt_addr = mpnt->vm_start + offset;
> + /*
> + * If mpnt_addr and addr are mapped to the same PTE page, there
> + * is no need to hold the pte lock again, otherwise a deadlock
> + * will occur.
/*
* Avoid deadlocks by not grabbing the PTE lock if we already hold the
* PTE lock of this PTE table in the caller.
*/
?
> + */
> + if (mpnt_addr >= start && mpnt_addr - start < PMD_SIZE)
> + need_lock = false;
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] arm: pgtable: fix NULL pointer dereference issue
2025-02-12 8:20 ` David Hildenbrand
@ 2025-02-12 8:28 ` Qi Zheng
2025-02-12 8:30 ` David Hildenbrand
0 siblings, 1 reply; 15+ messages in thread
From: Qi Zheng @ 2025-02-12 8:28 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux, ezra, hughd, ryan.roberts, akpm, muchun.song,
linux-arm-kernel, linux-mm, linux-kernel, stable
On 2025/2/12 16:20, David Hildenbrand wrote:
> On 12.02.25 07:40, Qi Zheng wrote:
>> When update_mmu_cache_range() is called by update_mmu_cache(), the vmf
>> parameter is NULL, which will cause a NULL pointer dereference issue in
>> adjust_pte():
>>
>> Unable to handle kernel NULL pointer dereference at virtual address
>> 00000030 when read
>> Hardware name: Atmel AT91SAM9
>> PC is at update_mmu_cache_range+0x1e0/0x278
>> LR is at pte_offset_map_rw_nolock+0x18/0x2c
>> Call trace:
>> update_mmu_cache_range from remove_migration_pte+0x29c/0x2ec
>> remove_migration_pte from rmap_walk_file+0xcc/0x130
>> rmap_walk_file from remove_migration_ptes+0x90/0xa4
>> remove_migration_ptes from migrate_pages_batch+0x6d4/0x858
>> migrate_pages_batch from migrate_pages+0x188/0x488
>> migrate_pages from compact_zone+0x56c/0x954
>> compact_zone from compact_node+0x90/0xf0
>> compact_node from kcompactd+0x1d4/0x204
>> kcompactd from kthread+0x120/0x12c
>> kthread from ret_from_fork+0x14/0x38
>> Exception stack(0xc0d8bfb0 to 0xc0d8bff8)
>>
>> To fix it, do not rely on comparing 'ptl' with vmf->ptl to decide
>> whether to take the pte lock; instead, decide based on whether
>> CONFIG_SPLIT_PTE_PTLOCKS is enabled. In addition, if two VMAs map to
>> the same PTE page, there is no need to take the pte lock again,
>> otherwise a deadlock will occur. Add a need_lock parameter to pass
>> this information to adjust_pte().
>>
>> Reported-by: Ezra Buehler <ezra@easyb.ch>
>> Closes:
>> https://lore.kernel.org/lkml/CAM1KZSmZ2T_riHvay+7cKEFxoPgeVpHkVFTzVVEQ1BO0cLkHEQ@mail.gmail.com/
>> Fixes: fc9c45b71f43 ("arm: adjust_pte() use pte_offset_map_rw_nolock()")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
>> ---
>> arch/arm/mm/fault-armv.c | 40 ++++++++++++++++++++++++++++------------
>> 1 file changed, 28 insertions(+), 12 deletions(-)
>>
>> diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
>> index 2bec87c3327d2..3627bf0957c75 100644
>> --- a/arch/arm/mm/fault-armv.c
>> +++ b/arch/arm/mm/fault-armv.c
>> @@ -62,7 +62,7 @@ static int do_adjust_pte(struct vm_area_struct *vma,
>> unsigned long address,
>> }
>> static int adjust_pte(struct vm_area_struct *vma, unsigned long
>> address,
>> - unsigned long pfn, struct vm_fault *vmf)
>> + unsigned long pfn, bool need_lock)
>> {
>> spinlock_t *ptl;
>> pgd_t *pgd;
>> @@ -99,12 +99,11 @@ static int adjust_pte(struct vm_area_struct *vma,
>> unsigned long address,
>> if (!pte)
>> return 0;
>> - /*
>> - * If we are using split PTE locks, then we need to take the page
>> - * lock here. Otherwise we are using shared mm->page_table_lock
>> - * which is already locked, thus cannot take it.
>> - */
>> - if (ptl != vmf->ptl) {
>> + if (need_lock) {
>> + /*
>> + * Use nested version here to indicate that we are already
>> + * holding one similar spinlock.
>> + */
>> spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
>> if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
>> pte_unmap_unlock(pte, ptl);
>> @@ -114,7 +113,7 @@ static int adjust_pte(struct vm_area_struct *vma,
>> unsigned long address,
>> ret = do_adjust_pte(vma, address, pfn, pte);
>> - if (ptl != vmf->ptl)
>> + if (need_lock)
>> spin_unlock(ptl);
>> pte_unmap(pte);
>> @@ -123,16 +122,17 @@ static int adjust_pte(struct vm_area_struct
>> *vma, unsigned long address,
>> static void
>> make_coherent(struct address_space *mapping, struct vm_area_struct
>> *vma,
>> - unsigned long addr, pte_t *ptep, unsigned long pfn,
>> - struct vm_fault *vmf)
>> + unsigned long addr, pte_t *ptep, unsigned long pfn)
>> {
>> struct mm_struct *mm = vma->vm_mm;
>> struct vm_area_struct *mpnt;
>> unsigned long offset;
>> + unsigned long start;
>> pgoff_t pgoff;
>> int aliases = 0;
>> pgoff = vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);
>> + start = ALIGN_DOWN(addr, PMD_SIZE);
>
> I assume you can come up with a better name than "start" :)
>
> aligned_addr ... pmd_start_addr ...
>
> Maybe simply
>
> pmd_start_addr = ALIGN_DOWN(addr, PMD_SIZE);
> pmd_end_addr = addr + PMD_SIZE;
you mean:
pmd_end_addr = pmd_start_addr + PMD_SIZE;
Right?
>
> Then the comparison below also becomes easier to read.
>
>> /*
>> * If we have any shared mappings that are in the same mm
>> @@ -141,6 +141,14 @@ make_coherent(struct address_space *mapping,
>> struct vm_area_struct *vma,
>> */
>> flush_dcache_mmap_lock(mapping);
>> vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
>> + unsigned long mpnt_addr;
>> + /*
>> + * If we are using split PTE locks, then we need to take the pte
>> + * lock. Otherwise we are using shared mm->page_table_lock which
>> + * is already locked, thus cannot take it.
>> + */
>> + bool need_lock = IS_ENABLED(CONFIG_SPLIT_PTE_PTLOCKS);
>
> Nit: move "unsigned long mpnt_addr;" below this longer variable+init.
OK, will do.
>
>> +
>> /*
>> * If this VMA is not in our MM, we can ignore it.
>> * Note that we intentionally mask out the VMA
>> @@ -151,7 +159,15 @@ make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
>> if (!(mpnt->vm_flags & VM_MAYSHARE))
>> continue;
>> offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
>> - aliases += adjust_pte(mpnt, mpnt->vm_start + offset, pfn, vmf);
>> + mpnt_addr = mpnt->vm_start + offset;
>> + /*
>> + * If mpnt_addr and addr are mapped to the same PTE page, there
>> + * is no need to hold the pte lock again, otherwise a deadlock
>> + * will occur.
>
> /*
> * Avoid deadlocks by not grabbing the PTE lock if we already hold the
> * PTE lock of this PTE table in the caller.
> */
Maybe just:
/* Avoid deadlocks by not grabbing the same PTE lock again. */
Thanks,
Qi
>
> ?
>
>> + */
>> + if (mpnt_addr >= start && mpnt_addr - start < PMD_SIZE)
>> + need_lock = false;
>
>
>
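The deadlock discussed above is the usual non-recursive locking problem:
with split PTE locks, two addresses that fall into the same PTE table
share one spinlock, and taking a spinlock a second time on the same CPU
hangs. A minimal userspace analogue of the need_lock idea (hypothetical
demo code built around a pthread mutex, not taken from the patch) looks
like:

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Stands in for the per-PTE-table lock (ptl). */
static pthread_mutex_t ptl = PTHREAD_MUTEX_INITIALIZER;

/* Analogue of adjust_pte(): lock only if the caller does not hold ptl. */
static void adjust_pte_demo(const char *what, bool need_lock)
{
        if (need_lock)
                pthread_mutex_lock(&ptl);
        printf("adjusting PTE (%s)\n", what);
        if (need_lock)
                pthread_mutex_unlock(&ptl);
}

int main(void)
{
        /* The caller already holds the lock, as the fault path holds
         * the lock of the faulting address's PTE table. */
        pthread_mutex_lock(&ptl);
        adjust_pte_demo("same PTE table", false);  /* re-locking would hang */
        pthread_mutex_unlock(&ptl);

        adjust_pte_demo("other PTE table", true);  /* safe to take the lock */
        return 0;
}

Build with e.g. "cc -pthread demo.c"; passing true in the first call
instead would deadlock, just as taking the same ptl twice would in the
kernel.
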
* Re: [PATCH] arm: pgtable: fix NULL pointer dereference issue
2025-02-12 8:28 ` Qi Zheng
@ 2025-02-12 8:30 ` David Hildenbrand
0 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2025-02-12 8:30 UTC (permalink / raw)
To: Qi Zheng
Cc: linux, ezra, hughd, ryan.roberts, akpm, muchun.song,
linux-arm-kernel, linux-mm, linux-kernel, stable
On 12.02.25 09:28, Qi Zheng wrote:
>
>
> On 2025/2/12 16:20, David Hildenbrand wrote:
>> On 12.02.25 07:40, Qi Zheng wrote:
>>> When update_mmu_cache_range() is called by update_mmu_cache(), the vmf
>>> parameter is NULL, which will cause a NULL pointer dereference issue in
>>> adjust_pte():
>>>
>>> Unable to handle kernel NULL pointer dereference at virtual address
>>> 00000030 when read
>>> Hardware name: Atmel AT91SAM9
>>> PC is at update_mmu_cache_range+0x1e0/0x278
>>> LR is at pte_offset_map_rw_nolock+0x18/0x2c
>>> Call trace:
>>> update_mmu_cache_range from remove_migration_pte+0x29c/0x2ec
>>> remove_migration_pte from rmap_walk_file+0xcc/0x130
>>> rmap_walk_file from remove_migration_ptes+0x90/0xa4
>>> remove_migration_ptes from migrate_pages_batch+0x6d4/0x858
>>> migrate_pages_batch from migrate_pages+0x188/0x488
>>> migrate_pages from compact_zone+0x56c/0x954
>>> compact_zone from compact_node+0x90/0xf0
>>> compact_node from kcompactd+0x1d4/0x204
>>> kcompactd from kthread+0x120/0x12c
>>> kthread from ret_from_fork+0x14/0x38
>>> Exception stack(0xc0d8bfb0 to 0xc0d8bff8)
>>>
>>> To fix it, do not rely on whether 'ptl' equals vmf->ptl to decide
>>> whether to hold the pte lock, but decide it by whether
>>> CONFIG_SPLIT_PTE_PTLOCKS is enabled. In addition, if two vmas map to
>>> the same PTE page, there is no need to hold the pte lock again,
>>> otherwise a deadlock will occur. Just add the need_lock parameter to
>>> let adjust_pte() know this information.
>>>
>>> Reported-by: Ezra Buehler <ezra@easyb.ch>
>>> Closes: https://lore.kernel.org/lkml/CAM1KZSmZ2T_riHvay+7cKEFxoPgeVpHkVFTzVVEQ1BO0cLkHEQ@mail.gmail.com/
>>> Fixes: fc9c45b71f43 ("arm: adjust_pte() use pte_offset_map_rw_nolock()")
>>> Cc: stable@vger.kernel.org
>>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
>>> ---
>>> arch/arm/mm/fault-armv.c | 40 ++++++++++++++++++++++++++++------------
>>> 1 file changed, 28 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
>>> index 2bec87c3327d2..3627bf0957c75 100644
>>> --- a/arch/arm/mm/fault-armv.c
>>> +++ b/arch/arm/mm/fault-armv.c
>>> @@ -62,7 +62,7 @@ static int do_adjust_pte(struct vm_area_struct *vma, unsigned long address,
>>> }
>>> static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
>>> - unsigned long pfn, struct vm_fault *vmf)
>>> + unsigned long pfn, bool need_lock)
>>> {
>>> spinlock_t *ptl;
>>> pgd_t *pgd;
>>> @@ -99,12 +99,11 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
>>> if (!pte)
>>> return 0;
>>> - /*
>>> - * If we are using split PTE locks, then we need to take the page
>>> - * lock here. Otherwise we are using shared mm->page_table_lock
>>> - * which is already locked, thus cannot take it.
>>> - */
>>> - if (ptl != vmf->ptl) {
>>> + if (need_lock) {
>>> + /*
>>> + * Use nested version here to indicate that we are already
>>> + * holding one similar spinlock.
>>> + */
>>> spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
>>> if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
>>> pte_unmap_unlock(pte, ptl);
>>> @@ -114,7 +113,7 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
>>> ret = do_adjust_pte(vma, address, pfn, pte);
>>> - if (ptl != vmf->ptl)
>>> + if (need_lock)
>>> spin_unlock(ptl);
>>> pte_unmap(pte);
>>> @@ -123,16 +122,17 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
>>> static void
>>> make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
>>> - unsigned long addr, pte_t *ptep, unsigned long pfn,
>>> - struct vm_fault *vmf)
>>> + unsigned long addr, pte_t *ptep, unsigned long pfn)
>>> {
>>> struct mm_struct *mm = vma->vm_mm;
>>> struct vm_area_struct *mpnt;
>>> unsigned long offset;
>>> + unsigned long start;
>>> pgoff_t pgoff;
>>> int aliases = 0;
>>> pgoff = vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);
>>> + start = ALIGN_DOWN(addr, PMD_SIZE);
>>
>> I assume you can come up with a better name than "start" :)
>>
>> aligned_addr ... pmd_start_addr ...
>>
>> Maybe simply
>>
>> pmd_start_addr = ALIGN_DOWN(addr, PMD_SIZE);
>> pmd_end_addr = addr + PMD_SIZE;
>
> you mean:
>
> pmd_end_addr = pmd_start_addr + PMD_SIZE;
>
> Right?
Yes :)
>>
>>> +
>>> /*
>>> * If this VMA is not in our MM, we can ignore it.
>>> * Note that we intentionally mask out the VMA
>>> @@ -151,7 +159,15 @@ make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
>>> if (!(mpnt->vm_flags & VM_MAYSHARE))
>>> continue;
>>> offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
>>> - aliases += adjust_pte(mpnt, mpnt->vm_start + offset, pfn, vmf);
>>> + mpnt_addr = mpnt->vm_start + offset;
>>> + /*
>>> + * If mpnt_addr and addr are mapped to the same PTE page, there
>>> + * is no need to hold the pte lock again, otherwise a deadlock
>>> + * will occur.
>>
>> /*
>> * Avoid deadlocks by not grabbing the PTE lock if we already hold the
>> * PTE lock of this PTE table in the caller.
>> */
>
> Maybe just:
>
> /* Avoid deadlocks by not grabbing the same PTE lock again. */
>
Agreed.
--
Cheers,
David / dhildenb
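
Putting the agreed points together (the rename, the declaration-order
nit and the shortened comment), the interval-tree walk in make_coherent()
would end up roughly as below. This is a sketch folded together from the
quoted hunks and the review discussion, so it may not match the committed
patch line for line:

        unsigned long pmd_start_addr = ALIGN_DOWN(addr, PMD_SIZE);
        unsigned long pmd_end_addr = pmd_start_addr + PMD_SIZE;

        vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
                /*
                 * If we are using split PTE locks, then we need to take
                 * the pte lock. Otherwise we are using shared
                 * mm->page_table_lock which is already locked, thus
                 * cannot take it.
                 */
                bool need_lock = IS_ENABLED(CONFIG_SPLIT_PTE_PTLOCKS);
                unsigned long mpnt_addr;

                /* Skip VMAs of other MMs and the VMA being fixed up. */
                if (mpnt->vm_mm != mm || mpnt == vma)
                        continue;
                if (!(mpnt->vm_flags & VM_MAYSHARE))
                        continue;

                offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
                mpnt_addr = mpnt->vm_start + offset;

                /* Avoid deadlocks by not grabbing the same PTE lock again. */
                if (mpnt_addr >= pmd_start_addr && mpnt_addr < pmd_end_addr)
                        need_lock = false;

                aliases += adjust_pte(mpnt, mpnt_addr, pfn, need_lock);
        }

The range test works because a PTE table covers exactly one
PMD_SIZE-aligned block of address space, so any mpnt_addr inside
[pmd_start_addr, pmd_end_addr) resolves to the same PTE table, and thus
the same lock, as addr.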
Thread overview: 15+ messages
2025-02-10 16:49 [REGRESSION] NULL pointer dereference on ARM (AT91SAM9G25) during compaction Ezra Buehler
2025-02-10 17:03 ` Russell King (Oracle)
2025-02-11 3:45 ` Qi Zheng
2025-02-11 9:14 ` David Hildenbrand
2025-02-11 9:29 ` Qi Zheng
2025-02-11 9:37 ` David Hildenbrand
2025-02-11 9:43 ` Qi Zheng
2025-02-11 12:09 ` David Hildenbrand
2025-02-11 12:41 ` Qi Zheng
2025-02-12 6:40 ` [PATCH] arm: pgtable: fix NULL pointer dereference issue Qi Zheng
2025-02-12 7:27 ` Ezra Buehler
2025-02-12 7:32 ` Qi Zheng
2025-02-12 8:20 ` David Hildenbrand
2025-02-12 8:28 ` Qi Zheng
2025-02-12 8:30 ` David Hildenbrand