* Potential Double Scanning of struct page Physical Memory by kmemleak
@ 2024-12-25 8:09 Weikang Guo
2024-12-31 13:16 ` Catalin Marinas
0 siblings, 1 reply; 2+ messages in thread
From: Weikang Guo @ 2024-12-25 8:09 UTC (permalink / raw)
To: Catalin Marinas, Andrew Morton; +Cc: Linux Memory Management List, linux-kernel
Dear Catalin and Andrew,
I hope this message finds you well.
While reviewing the implementation of kmemleak, I noticed a potential
issue where the physical memory occupied by struct page may be scanned
twice by kmemleak. To verify this hypothesis, I added some logging to
the code, and I would like to discuss my findings with you. Below are
the details of my environment, the logic I used, and the relevant log
output.
Test Platform:
Architecture: arm64
Platform: QEMU
Branch: local branch based on v6.12
Config: defconfig with CONFIG_DEBUG_KMEMLEAK enabled
Problem Description:
When CONFIG_SPARSEMEM is enabled, the memory for `struct page` objects
in the sections is allocated via the sparse buffer using the
`memblock_alloc_xxx` interface. Since the allocation does not pass
`MEMBLOCK_ALLOC_NOLEAKTRACE`, kmemleak treats this memory as a gray
object with min_count=0; in other words, the physical memory occupied
by `struct page` objects is scanned by the kmemleak scan thread.
Additionally, kmemleak also traverses each populated zone and scans
every valid `struct page` object. As a result, the physical memory
occupied by `struct page` may end up being scanned twice by the scan
thread.
Logging Added:
To investigate further, I added the following logs:
Log for the memory allocated for `struct page` objects through the sparse memblock buffer:
diff --git a/mm/sparse.c b/mm/sparse.c
index 3174b3ca3e9e..f2465b394344 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -362,6 +362,9 @@ static void __init sparse_buffer_init(unsigned long size, int nid)
 	 */
 	sparsemap_buf = memmap_alloc(size, section_map_size(), addr, nid, true);
 	sparsemap_buf_end = sparsemap_buf + size;
+	pr_info("sparse buf alloc VA: [0x%lx: 0x%lx] to PA: [0x%llx: 0x%llx]\n",
+		(unsigned long)sparsemap_buf, (unsigned long)sparsemap_buf_end,
+		(u64)virt_to_phys(sparsemap_buf), (u64)virt_to_phys(sparsemap_buf_end));
 #ifndef CONFIG_SPARSEMEM_VMEMMAP
 	memmap_boot_pages_add(DIV_ROUND_UP(size, PAGE_SIZE));
 #endif
Log showing the mapping between vmemmap VA and PA:
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index c0388b2e959d..f1a312f2a281 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -342,6 +342,8 @@ int __meminit vmemmap_populate_hugepages(unsigned long start, unsigned long end,
 			p = vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
 			if (p) {
+				pr_info("==== vmemap 2M: VA:0x%lx to PA:0x%llx\n",
+					start, (u64)virt_to_phys(p));
 				vmemmap_set_pmd(pmd, p, node, addr, next);
 				continue;
 			} else if (altmap) {
Logs showing the memblock objects and struct page objects scanned by kmemleak:
diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 0efa41d77bcc..a803825dcb18 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -1529,7 +1529,10 @@ static void scan_object(struct kmemleak_object *object)
 			(void *)object->pointer;
 		void *end = start + object->size;
 		void *next;
-
+		if (object->flags & OBJECT_PHYS) {
+			pr_info("======= scan memblock OBJECT_PHYS:[0x%lx:0x%lx]\n",
+				(unsigned long)start, (unsigned long)end);
+		}
 		do {
 			next = min(start + MAX_SCAN_SIZE, end);
 			scan_block(start, next, object);
@@ -1677,6 +1680,8 @@ static void kmemleak_scan(void)
 	 * Struct page scanning for each node.
 	 */
 	for_each_populated_zone(zone) {
+		static int print_scan_page;
+
 		unsigned long start_pfn = zone->zone_start_pfn;
 		unsigned long end_pfn = zone_end_pfn(zone);
 		unsigned long pfn;
@@ -1696,6 +1701,14 @@ static void kmemleak_scan(void)
 			/* only scan if page is in use */
 			if (page_count(page) == 0)
 				continue;
+
+			if (print_scan_page == 0) {
+				pr_info("======= scan page [0x%lx:0x%lx]\n",
+					(unsigned long)page,
+					(unsigned long)(page + 1));
+				print_scan_page++;
+			}
+
 			scan_block(page, page + 1, NULL);
 		}
 	}
The log output is as follows:
[    0.000000] sparse buf alloc VA: [0xffff39253b600000: 0xffff39253f600000] to PA: [0x13b600000: 0x13f600000]
[ 0.000000] ==== vmemap 2M: VA:0xfffffee451000000 to PA:0x13b600000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee451200000 to PA:0x13b800000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee451400000 to PA:0x13ba00000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee451600000 to PA:0x13bc00000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee451800000 to PA:0x13be00000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee451a00000 to PA:0x13c000000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee451c00000 to PA:0x13c200000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee451e00000 to PA:0x13c400000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee452000000 to PA:0x13c600000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee452200000 to PA:0x13c800000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee452400000 to PA:0x13ca00000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee452600000 to PA:0x13cc00000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee452800000 to PA:0x13ce00000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee452a00000 to PA:0x13d000000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee452c00000 to PA:0x13d200000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee452e00000 to PA:0x13d400000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee453000000 to PA:0x13d600000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee453200000 to PA:0x13d800000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee453400000 to PA:0x13da00000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee453600000 to PA:0x13dc00000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee453800000 to PA:0x13de00000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee453a00000 to PA:0x13e000000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee453c00000 to PA:0x13e200000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee453e00000 to PA:0x13e400000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee454000000 to PA:0x13e600000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee454200000 to PA:0x13e800000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee454400000 to PA:0x13ea00000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee454600000 to PA:0x13ec00000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee454800000 to PA:0x13ee00000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee454a00000 to PA:0x13f000000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee454c00000 to PA:0x13f200000
[ 0.000000] ==== vmemap 2M: VA:0xfffffee454e00000 to PA:0x13f400000
[   62.480979] kmemleak: ======= scan page [0xfffffee451008400:0xfffffee451008440]
[   62.705876] kmemleak: ======= scan memblock OBJECT_PHYS:[0xffff39253b600000:0xffff39253f600000]
scan_page: vmemmap VA is [0xfffffee451008400 - 0xfffffee451008440]
scan_page: real PA is [0x13b608400 - 0x13b608440]
scan OBJECT_PHYS: VA is [0xffff39253b600000:0xffff39253f600000]
scan OBJECT_PHYS: PA is [0x13b600000:0x13f600000]
The PA range of the first recorded page, [0x13b608400 - 0x13b608440],
lies entirely inside the OBJECT_PHYS PA range, so that memory was
scanned twice.
Could you please help confirm whether this issue exists and if it
could lead to any unintended consequences? I would appreciate any
insights or suggestions you may have.
Possible solution: specify `MEMBLOCK_ALLOC_NOLEAKTRACE` when allocating
the `struct page` memory.
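An untested sketch of that change, assuming memmap_alloc() in
mm/mm_init.c as of v6.12 (the exact context lines may differ on other
trees):

```diff
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ void *__init memmap_alloc(phys_addr_t size, phys_addr_t align,
 	if (exact_nid)
 		ptr = memblock_alloc_exact_nid_raw(size, align, min_addr,
-						   MEMBLOCK_ALLOC_ACCESSIBLE,
+						   MEMBLOCK_ALLOC_NOLEAKTRACE,
 						   nid);
 	else
 		ptr = memblock_alloc_try_nid_raw(size, align, min_addr,
-						 MEMBLOCK_ALLOC_ACCESSIBLE,
+						 MEMBLOCK_ALLOC_NOLEAKTRACE,
 						 nid);
```

With this, memblock never registers the memmap with kmemleak, and the
struct page memory is covered only once, by the zone walk in
kmemleak_scan().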
Thank you for your time and assistance.
Best regards,
Guo
* Re: Potential Double Scanning of struct page Physical Memory by kmemleak
2024-12-25 8:09 Potential Double Scanning of struct page Physical Memory by kmemleak Weikang Guo
@ 2024-12-31 13:16 ` Catalin Marinas
0 siblings, 0 replies; 2+ messages in thread
From: Catalin Marinas @ 2024-12-31 13:16 UTC (permalink / raw)
To: Weikang Guo; +Cc: Andrew Morton, Linux Memory Management List, linux-kernel
Hi Guo,
On Wed, Dec 25, 2024 at 04:09:12PM +0800, Weikang Guo wrote:
> Problem Description:
>
> When CONFIG_SPARSEMEM is enabled, the memory for `struct page` objects
> in the sections is allocated via the sparse buffer using the
> `memblock_alloc_xxx` interface. Since the allocation does not pass
> `MEMBLOCK_ALLOC_NOLEAKTRACE`, kmemleak treats this memory as a gray
> object with min_count=0; in other words, the physical memory occupied
> by `struct page` objects is scanned by the kmemleak scan thread.
>
> Additionally, kmemleak also traverses each populated zone and scans
> every valid `struct page` object. As a result, the physical memory
> occupied by `struct page` may end up being scanned twice by the scan
> thread.
Yes, I can see how this happens. I don't remember how we ended up like
this, maybe kmemleak did not track memblock allocations in the early
days.
> Possible solution: Specify `MEMBLOCK_ALLOC_NOLEAKTRACE` when alloc
> `struct page memory`
I think that's the easiest: just let kmemleak scan the mem_map
explicitly, whether it's in the linear map or vmemmap.
The way we ended up with marking 'noleaktrace' blocks in the memblock
API isn't great. This "flag" used to be called MEMBLOCK_ALLOC_KASAN and
only used by KASAN (implying accessible). But it's not an actual flag,
just some random value passed as the 'end' argument to memblock_alloc()
and friends. Luckily memmap_alloc() only needs MEMBLOCK_ALLOC_ACCESSIBLE
which is implied by MEMBLOCK_ALLOC_NOLEAKTRACE (though I can't find any
documentation about this).
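For reference, these are the relevant definitions in
include/linux/memblock.h (as of v6.12); note they are special 'end
address' values rather than real flags:

```c
/* Flags for memblock allocation APIs */
#define MEMBLOCK_ALLOC_ANYWHERE		(~(phys_addr_t)0)
#define MEMBLOCK_ALLOC_ACCESSIBLE	0
#define MEMBLOCK_ALLOC_NOLEAKTRACE	1
```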
Anyway, if you fix memmap_alloc(), please add a comment that kmemleak
scans this explicitly. Also add a comment where we define
MEMBLOCK_ALLOC_NOLEAKTRACE to state that it implies
MEMBLOCK_ALLOC_ACCESSIBLE. Ideally we should have named this
MEMBLOCK_ALLOC_ACCESSIBLE_NOLEAKTRACE but it's nearly half the
recommended line length.
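An untested sketch of the suggested comments (file locations assumed
from v6.12):

```diff
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@
 #define MEMBLOCK_ALLOC_ACCESSIBLE	0
+/*
+ * Tell memblock not to register the allocation with kmemleak.
+ * Implies MEMBLOCK_ALLOC_ACCESSIBLE.
+ */
 #define MEMBLOCK_ALLOC_NOLEAKTRACE	1
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ void *__init memmap_alloc(phys_addr_t size, phys_addr_t align,
+	/*
+	 * Pass MEMBLOCK_ALLOC_NOLEAKTRACE: kmemleak scans the memmap
+	 * explicitly via the struct page walk in kmemleak_scan().
+	 */
```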
Thanks.
--
Catalin