* [PATCH 0/3] selftests/mm: virtual_address_range: Two bugfixes and a cleanup
@ 2025-01-07 15:14 Thomas Weißschuh
2025-01-07 15:14 ` [PATCH 1/3] selftests/mm: virtual_address_range: Fix error when CommitLimit < 1GiB Thomas Weißschuh
` (2 more replies)
0 siblings, 3 replies; 18+ messages in thread
From: Thomas Weißschuh @ 2025-01-07 15:14 UTC (permalink / raw)
To: Andrew Morton, Shuah Khan, Dev Jain, Thomas Gleixner
Cc: linux-mm, linux-kselftest, linux-kernel, Thomas Weißschuh,
stable, kernel test robot
The selftest started failing since commit e93d2521b27f
("x86/vdso: Split virtual clock pages into dedicated mapping")
was merged. While debugging I stumbled upon another bug and potential
cleanup.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
---
Thomas Weißschuh (3):
selftests/mm: virtual_address_range: Fix error when CommitLimit < 1GiB
selftests/mm: virtual_address_range: Avoid reading VVAR mappings
selftests/mm: virtual_address_range: Dump to /dev/null
tools/testing/selftests/mm/virtual_address_range.c | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)
---
base-commit: fbfd64d25c7af3b8695201ebc85efe90be28c5a3
change-id: 20250107-virtual_address_range-tests-95843766fa97
Best regards,
--
Thomas Weißschuh <thomas.weissschuh@linutronix.de>
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 1/3] selftests/mm: virtual_address_range: Fix error when CommitLimit < 1GiB
2025-01-07 15:14 [PATCH 0/3] selftests/mm: virtual_address_range: Two bugfixes and a cleanup Thomas Weißschuh
@ 2025-01-07 15:14 ` Thomas Weißschuh
2025-01-08 6:16 ` Dev Jain
2025-01-07 15:14 ` [PATCH 2/3] selftests/mm: virtual_address_range: Avoid reading VVAR mappings Thomas Weißschuh
2025-01-07 15:14 ` [PATCH 3/3] selftests/mm: virtual_address_range: Dump to /dev/null Thomas Weißschuh
2 siblings, 1 reply; 18+ messages in thread
From: Thomas Weißschuh @ 2025-01-07 15:14 UTC (permalink / raw)
To: Andrew Morton, Shuah Khan, Dev Jain, Thomas Gleixner
Cc: linux-mm, linux-kselftest, linux-kernel, Thomas Weißschuh,
stable

If not enough physical memory is available the kernel may fail mmap();
see __vm_enough_memory() and vm_commit_limit().
In that case the logic in validate_complete_va_space() does not make
sense and will even incorrectly fail.
Instead skip the test if no mmap() succeeded.

Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()")
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>

---
The logic in __vm_enough_memory() seems weird.
It describes itself as "Check that a process has enough memory to
allocate a new virtual mapping", however it never checks the current
memory usage of the process.
So it only disallows large mappings. But many small mappings taking the
same amount of memory are allowed; and then even automatically merged
into one big mapping.
---
 tools/testing/selftests/mm/virtual_address_range.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
index 2a2b69e91950a37999f606847c9c8328d79890c2..d7bf8094d8bcd4bc96e2db4dc3fcb41968def859 100644
--- a/tools/testing/selftests/mm/virtual_address_range.c
+++ b/tools/testing/selftests/mm/virtual_address_range.c
@@ -178,6 +178,12 @@ int main(int argc, char *argv[])
 		validate_addr(ptr[i], 0);
 	}
 	lchunks = i;
+
+	if (!lchunks) {
+		ksft_test_result_skip("Not enough memory for a single chunk\n");
+		ksft_finished();
+	}
+
 	hptr = (char **) calloc(NR_CHUNKS_HIGH, sizeof(char *));
 	if (hptr == NULL) {
 		ksft_test_result_skip("Memory constraint not fulfilled\n");
-- 
2.47.1

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] selftests/mm: virtual_address_range: Fix error when CommitLimit < 1GiB 2025-01-07 15:14 ` [PATCH 1/3] selftests/mm: virtual_address_range: Fix error when CommitLimit < 1GiB Thomas Weißschuh @ 2025-01-08 6:16 ` Dev Jain 2025-01-08 8:05 ` Thomas Weißschuh 0 siblings, 1 reply; 18+ messages in thread From: Dev Jain @ 2025-01-08 6:16 UTC (permalink / raw) To: Thomas Weißschuh, Andrew Morton, Shuah Khan, Thomas Gleixner Cc: linux-mm, linux-kselftest, linux-kernel, stable, Ryan Roberts, David Hildenbrand On 07/01/25 8:44 pm, Thomas Weißschuh wrote: > If not enough physical memory is available the kernel may fail mmap(); > see __vm_enough_memory() and vm_commit_limit(). > In that case the logic in validate_complete_va_space() does not make > sense and will even incorrectly fail. > Instead skip the test if no mmap() succeeded. > > Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()") > Cc: stable@vger.kernel.org > Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> > > --- > The logic in __vm_enough_memory() seems weird. > It describes itself as "Check that a process has enough memory to > allocate a new virtual mapping", however it never checks the current > memory usage of the process. > So it only disallows large mappings. But many small mappings taking the > same amount of memory are allowed; and then even automatically merged > into one big mapping. 
> ---
>  tools/testing/selftests/mm/virtual_address_range.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
> index 2a2b69e91950a37999f606847c9c8328d79890c2..d7bf8094d8bcd4bc96e2db4dc3fcb41968def859 100644
> --- a/tools/testing/selftests/mm/virtual_address_range.c
> +++ b/tools/testing/selftests/mm/virtual_address_range.c
> @@ -178,6 +178,12 @@ int main(int argc, char *argv[])
>  		validate_addr(ptr[i], 0);
>  	}
>  	lchunks = i;
> +
> +	if (!lchunks) {
> +		ksft_test_result_skip("Not enough memory for a single chunk\n");
> +		ksft_finished();
> +	}
> +
>  	hptr = (char **) calloc(NR_CHUNKS_HIGH, sizeof(char *));
>  	if (hptr == NULL) {
>  		ksft_test_result_skip("Memory constraint not fulfilled\n");
>

I do not know about __vm_enough_memory(), but I am going by your description:
You say that the kernel may fail mmap() when enough physical memory is not
there, but it may happen that we have already done 100 mmap()'s, and then
the kernel fails mmap(), so if (!lchunks) won't be able to handle this case.
Basically, lchunks == 0 is not a complete indicator of kernel failing mmap().

The basic assumption of the test is that any process should be able to exhaust
its virtual address space, and running the test under memory pressure and the
kernel violating this behaviour defeats the point of the test I think?

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] selftests/mm: virtual_address_range: Fix error when CommitLimit < 1GiB 2025-01-08 6:16 ` Dev Jain @ 2025-01-08 8:05 ` Thomas Weißschuh 2025-01-08 13:36 ` David Hildenbrand 0 siblings, 1 reply; 18+ messages in thread From: Thomas Weißschuh @ 2025-01-08 8:05 UTC (permalink / raw) To: Dev Jain Cc: Andrew Morton, Shuah Khan, Thomas Gleixner, linux-mm, linux-kselftest, linux-kernel, stable, Ryan Roberts, David Hildenbrand On Wed, Jan 08, 2025 at 11:46:19AM +0530, Dev Jain wrote: > > On 07/01/25 8:44 pm, Thomas Weißschuh wrote: > > If not enough physical memory is available the kernel may fail mmap(); > > see __vm_enough_memory() and vm_commit_limit(). > > In that case the logic in validate_complete_va_space() does not make > > sense and will even incorrectly fail. > > Instead skip the test if no mmap() succeeded. > > > > Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()") > > Cc: stable@vger.kernel.org > > Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> > > > > --- > > The logic in __vm_enough_memory() seems weird. > > It describes itself as "Check that a process has enough memory to > > allocate a new virtual mapping", however it never checks the current > > memory usage of the process. > > So it only disallows large mappings. But many small mappings taking the > > same amount of memory are allowed; and then even automatically merged > > into one big mapping. 
> > ---
> >  tools/testing/selftests/mm/virtual_address_range.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
> > index 2a2b69e91950a37999f606847c9c8328d79890c2..d7bf8094d8bcd4bc96e2db4dc3fcb41968def859 100644
> > --- a/tools/testing/selftests/mm/virtual_address_range.c
> > +++ b/tools/testing/selftests/mm/virtual_address_range.c
> > @@ -178,6 +178,12 @@ int main(int argc, char *argv[])
> >  		validate_addr(ptr[i], 0);
> >  	}
> >  	lchunks = i;
> > +
> > +	if (!lchunks) {
> > +		ksft_test_result_skip("Not enough memory for a single chunk\n");
> > +		ksft_finished();
> > +	}
> > +
> >  	hptr = (char **) calloc(NR_CHUNKS_HIGH, sizeof(char *));
> >  	if (hptr == NULL) {
> >  		ksft_test_result_skip("Memory constraint not fulfilled\n");
> >
>
> I do not know about __vm_enough_memory(), but I am going by your description:
> You say that the kernel may fail mmap() when enough physical memory is not
> there, but it may happen that we have already done 100 mmap()'s, and then
> the kernel fails mmap(), so if (!lchunks) won't be able to handle this case.
> Basically, lchunks == 0 is not a complete indicator of kernel failing mmap().

__vm_enough_memory() only checks the size of each single mmap() on its
own. It does not actually check the current memory or address space
usage of the process.
This seems a bit weird, as indicated in my after-the-fold explanation.

> The basic assumption of the test is that any process should be able to exhaust
> its virtual address space, and running the test under memory pressure and the
> kernel violating this behaviour defeats the point of the test I think?

The assumption is correct, as soon as one mapping succeeds the others
will also succeed, until the actual address space is exhausted.

Looking at it again, __vm_enough_memory() is only called for writable
mappings, so it would be possible to use only readable mappings in the
test.
The test will still fail with OOM, as the many PTEs need more than
1GiB of physical memory anyways, but at least that produces a usable
error message.
However I'm not sure if this would violate other test assumptions.

^ permalink raw reply [flat|nested] 18+ messages in thread
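The subject line refers to CommitLimit being smaller than the 1GiB chunk the test maps. The following is a minimal sketch, assuming the limit is taken from a /proc/meminfo-style stream, of how the two sizes could be compared up front. The helper names read_commit_limit_kb() and chunk_fits_commit_limit() are hypothetical and not part of the patch. Whether the kernel actually compares a mapping against CommitLimit depends on the overcommit mode, so this only illustrates the size relationship, not the kernel's logic.

```c
#include <stdio.h>

/*
 * Hypothetical helper: return the CommitLimit value in kB found in a
 * /proc/meminfo-style stream, or -1 if no such line is present.
 */
static long read_commit_limit_kb(FILE *f)
{
	char line[256];
	long kb;

	while (fgets(line, sizeof(line), f)) {
		/* Whitespace in the format matches any run of blanks. */
		if (sscanf(line, "CommitLimit: %ld kB", &kb) == 1)
			return kb;
	}
	return -1;
}

/*
 * Hypothetical helper: report whether one chunk of chunk_kb would fit
 * under the given limit. This is a plain size comparison only.
 */
static int chunk_fits_commit_limit(long commit_limit_kb, long chunk_kb)
{
	return commit_limit_kb >= chunk_kb;
}
```

A caller would open /proc/meminfo with fopen(), pass the stream to read_commit_limit_kb(), and compare the result against the chunk size in kB (1048576 for a 1GiB chunk).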
* Re: [PATCH 1/3] selftests/mm: virtual_address_range: Fix error when CommitLimit < 1GiB 2025-01-08 8:05 ` Thomas Weißschuh @ 2025-01-08 13:36 ` David Hildenbrand 2025-01-08 16:13 ` Thomas Weißschuh 0 siblings, 1 reply; 18+ messages in thread From: David Hildenbrand @ 2025-01-08 13:36 UTC (permalink / raw) To: Thomas Weißschuh, Dev Jain Cc: Andrew Morton, Shuah Khan, Thomas Gleixner, linux-mm, linux-kselftest, linux-kernel, stable, Ryan Roberts On 08.01.25 09:05, Thomas Weißschuh wrote: > On Wed, Jan 08, 2025 at 11:46:19AM +0530, Dev Jain wrote: >> >> On 07/01/25 8:44 pm, Thomas Weißschuh wrote: >>> If not enough physical memory is available the kernel may fail mmap(); >>> see __vm_enough_memory() and vm_commit_limit(). >>> In that case the logic in validate_complete_va_space() does not make >>> sense and will even incorrectly fail. >>> Instead skip the test if no mmap() succeeded. >>> >>> Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()") >>> Cc: stable@vger.kernel.org CC stable on tests is ... odd. >>> Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> >>> >>> --- >>> The logic in __vm_enough_memory() seems weird. >>> It describes itself as "Check that a process has enough memory to >>> allocate a new virtual mapping", however it never checks the current >>> memory usage of the process. >>> So it only disallows large mappings. But many small mappings taking the >>> same amount of memory are allowed; and then even automatically merged >>> into one big mapping. 
>>> --- >>> tools/testing/selftests/mm/virtual_address_range.c | 6 ++++++ >>> 1 file changed, 6 insertions(+) >>> >>> diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c >>> index 2a2b69e91950a37999f606847c9c8328d79890c2..d7bf8094d8bcd4bc96e2db4dc3fcb41968def859 100644 >>> --- a/tools/testing/selftests/mm/virtual_address_range.c >>> +++ b/tools/testing/selftests/mm/virtual_address_range.c >>> @@ -178,6 +178,12 @@ int main(int argc, char *argv[]) >>> validate_addr(ptr[i], 0); >>> } >>> lchunks = i; >>> + >>> + if (!lchunks) { >>> + ksft_test_result_skip("Not enough memory for a single chunk\n"); >>> + ksft_finished(); >>> + } >>> + >>> hptr = (char **) calloc(NR_CHUNKS_HIGH, sizeof(char *)); >>> if (hptr == NULL) { >>> ksft_test_result_skip("Memory constraint not fulfilled\n"); >>> >> >> I do not know about __vm_enough_memory(), but I am going by your description: >> You say that the kernel may fail mmap() when enough physical memory is not >> there, but it may happen that we have already done 100 mmap()'s, and then >> the kernel fails mmap(), so if (!lchunks) won't be able to handle this case. >> Basically, lchunks == 0 is not a complete indicator of kernel failing mmap(). > > __vm_enough_memory() only checks the size of each single mmap() on its > own. It does not actually check the current memory or address space > usage of the process. > This seems a bit weird, as indicated in my after-the-fold explanation. > >> The basic assumption of the test is that any process should be able to exhaust >> its virtual address space, and running the test under memory pressure and the >> kernel violating this behaviour defeats the point of the test I think? > > The assumption is correct, as soon as one mapping succeeds the others > will also succeed, until the actual address space is exhausted. 
>
> Looking at it again, __vm_enough_memory() is only called for writable
> mappings, so it would be possible to use only readable mappings in the
> test. The test will still fail with OOM, as the many PTEs need more than
> 1GiB of physical memory anyways, but at least that produces a usable
> error message.
> However I'm not sure if this would violate other test assumptions.
>

Note that with MAP_NORESERVE, most setups we care about will allow mapping as
much as you want, but on access OOM will fire.

So one could require that /proc/sys/vm/overcommit_memory is set up properly
and use MAP_NORESERVE.

Reading from anonymous memory will populate the shared zeropage. To mitigate
OOM from "too many page tables", one could simply unmap the pieces as they
are verified (or MAP_FIXED over them, to free page tables).

-- 
Cheers,

David / dhildenb

^ permalink raw reply [flat|nested] 18+ messages in thread
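David's suggestion combines three steps: check the overcommit sysctl, map with MAP_NORESERVE, and unmap each piece once it has been read. The sketch below shows those steps for a single mapping. The function names read_overcommit_mode() and probe_noreserve() are hypothetical, and this is an illustration under those assumptions rather than the code that went into the selftest. Mode values follow the documented sysctl: 0 is heuristic, 1 is always overcommit, 2 is strict accounting (where MAP_NORESERVE is documented as being ignored).

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: return the overcommit mode (0, 1 or 2), or -1 on error. */
static int read_overcommit_mode(void)
{
	FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
	int mode = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &mode) != 1)
		mode = -1;
	fclose(f);
	return mode;
}

/*
 * Hypothetical helper: map len bytes of anonymous memory without commit
 * reservation, read one byte per page, then unmap right away so the page
 * tables are freed again.
 * Returns 0 if every byte read was zero, 1 if not, -1 if mmap() failed.
 */
static int probe_noreserve(size_t len)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	volatile unsigned char *p;
	size_t off;
	int bad = 0;

	p = mmap(NULL, len, PROT_READ,
		 MAP_NORESERVE | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	/* Reads of untouched anonymous memory return zeroes. */
	for (off = 0; off < len; off += page)
		if (p[off] != 0)
			bad = 1;

	munmap((void *)p, len);
	return bad;
}
```

A test following this idea could skip when read_overcommit_mode() returns 2 and otherwise treat a failing probe as a real failure.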
* Re: [PATCH 1/3] selftests/mm: virtual_address_range: Fix error when CommitLimit < 1GiB 2025-01-08 13:36 ` David Hildenbrand @ 2025-01-08 16:13 ` Thomas Weißschuh 2025-01-08 16:46 ` David Hildenbrand 2025-01-09 5:40 ` Dev Jain 0 siblings, 2 replies; 18+ messages in thread From: Thomas Weißschuh @ 2025-01-08 16:13 UTC (permalink / raw) To: David Hildenbrand Cc: Dev Jain, Andrew Morton, Shuah Khan, Thomas Gleixner, linux-mm, linux-kselftest, linux-kernel, stable, Ryan Roberts On Wed, Jan 08, 2025 at 02:36:57PM +0100, David Hildenbrand wrote: > On 08.01.25 09:05, Thomas Weißschuh wrote: > > On Wed, Jan 08, 2025 at 11:46:19AM +0530, Dev Jain wrote: > > > > > > On 07/01/25 8:44 pm, Thomas Weißschuh wrote: > > > > If not enough physical memory is available the kernel may fail mmap(); > > > > see __vm_enough_memory() and vm_commit_limit(). > > > > In that case the logic in validate_complete_va_space() does not make > > > > sense and will even incorrectly fail. > > > > Instead skip the test if no mmap() succeeded. > > > > > > > > Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()") > > > > Cc: stable@vger.kernel.org > > CC stable on tests is ... odd. I thought it was fairly common, but it isn't. Will drop it. > > > > Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> > > > > > > > > --- > > > > The logic in __vm_enough_memory() seems weird. > > > > It describes itself as "Check that a process has enough memory to > > > > allocate a new virtual mapping", however it never checks the current > > > > memory usage of the process. > > > > So it only disallows large mappings. But many small mappings taking the > > > > same amount of memory are allowed; and then even automatically merged > > > > into one big mapping. 
> > > > --- > > > > tools/testing/selftests/mm/virtual_address_range.c | 6 ++++++ > > > > 1 file changed, 6 insertions(+) > > > > > > > > diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c > > > > index 2a2b69e91950a37999f606847c9c8328d79890c2..d7bf8094d8bcd4bc96e2db4dc3fcb41968def859 100644 > > > > --- a/tools/testing/selftests/mm/virtual_address_range.c > > > > +++ b/tools/testing/selftests/mm/virtual_address_range.c > > > > @@ -178,6 +178,12 @@ int main(int argc, char *argv[]) > > > > validate_addr(ptr[i], 0); > > > > } > > > > lchunks = i; > > > > + > > > > + if (!lchunks) { > > > > + ksft_test_result_skip("Not enough memory for a single chunk\n"); > > > > + ksft_finished(); > > > > + } > > > > + > > > > hptr = (char **) calloc(NR_CHUNKS_HIGH, sizeof(char *)); > > > > if (hptr == NULL) { > > > > ksft_test_result_skip("Memory constraint not fulfilled\n"); > > > > > > > > > > I do not know about __vm_enough_memory(), but I am going by your description: > > > You say that the kernel may fail mmap() when enough physical memory is not > > > there, but it may happen that we have already done 100 mmap()'s, and then > > > the kernel fails mmap(), so if (!lchunks) won't be able to handle this case. > > > Basically, lchunks == 0 is not a complete indicator of kernel failing mmap(). > > > > __vm_enough_memory() only checks the size of each single mmap() on its > > own. It does not actually check the current memory or address space > > usage of the process. > > This seems a bit weird, as indicated in my after-the-fold explanation. > > > > > The basic assumption of the test is that any process should be able to exhaust > > > its virtual address space, and running the test under memory pressure and the > > > kernel violating this behaviour defeats the point of the test I think? 
> >
> > The assumption is correct, as soon as one mapping succeeds the others
> > will also succeed, until the actual address space is exhausted.
> >
> > Looking at it again, __vm_enough_memory() is only called for writable
> > mappings, so it would be possible to use only readable mappings in the
> > test. The test will still fail with OOM, as the many PTEs need more than
> > 1GiB of physical memory anyways, but at least that produces a usable
> > error message.
> > However I'm not sure if this would violate other test assumptions.
> >
>
> Note that with MAP_NORESERVE, most setups we care about will allow mapping as
> much as you want, but on access OOM will fire.

Thanks for the hint.

> So one could require that /proc/sys/vm/overcommit_memory is set up properly
> and use MAP_NORESERVE.

Isn't the check for lchunks == 0 essentially exactly this?

> Reading from anonymous memory will populate the shared zeropage. To mitigate
> OOM from "too many page tables", one could simply unmap the pieces as they
> are verified (or MAP_FIXED over them, to free page tables).

The code has to figure out if a verified region was created by mmap(),
otherwise an munmap() could crash the process.
As the entries from /proc/self/maps may have been merged and (I assume)
the ordering of mappings is not guaranteed, some bespoke logic to establish
the link will be needed.

Is it fine to rely on CONFIG_ANON_VMA_NAME?
That would make it much easier to implement.

Using MAP_NORESERVE and eager munmap()s, the testcase works nicely even
in very low physical memory conditions.

Thomas

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] selftests/mm: virtual_address_range: Fix error when CommitLimit < 1GiB 2025-01-08 16:13 ` Thomas Weißschuh @ 2025-01-08 16:46 ` David Hildenbrand 2025-01-09 7:47 ` Thomas Weißschuh 2025-01-09 5:40 ` Dev Jain 1 sibling, 1 reply; 18+ messages in thread From: David Hildenbrand @ 2025-01-08 16:46 UTC (permalink / raw) To: Thomas Weißschuh Cc: Dev Jain, Andrew Morton, Shuah Khan, Thomas Gleixner, linux-mm, linux-kselftest, linux-kernel, stable, Ryan Roberts On 08.01.25 17:13, Thomas Weißschuh wrote: > On Wed, Jan 08, 2025 at 02:36:57PM +0100, David Hildenbrand wrote: >> On 08.01.25 09:05, Thomas Weißschuh wrote: >>> On Wed, Jan 08, 2025 at 11:46:19AM +0530, Dev Jain wrote: >>>> >>>> On 07/01/25 8:44 pm, Thomas Weißschuh wrote: >>>>> If not enough physical memory is available the kernel may fail mmap(); >>>>> see __vm_enough_memory() and vm_commit_limit(). >>>>> In that case the logic in validate_complete_va_space() does not make >>>>> sense and will even incorrectly fail. >>>>> Instead skip the test if no mmap() succeeded. >>>>> >>>>> Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()") >>>>> Cc: stable@vger.kernel.org >> >> CC stable on tests is ... odd. > > I thought it was fairly common, but it isn't. > Will drop it. As it's not really a "kernel BUG", it's rather uncommon. >> >> Note that with MAP_NORESRVE, most setups we care about will allow mapping as >> much as you want, but on access OOM will fire. > > Thanks for the hint. > >> So one could require that /proc/sys/vm/overcommit_memory is setup properly >> and use MAP_NORESRVE. > > Isn't the check for lchunks == 0 essentially exactly this? I assume paired with MAP_NORESERVE? Maybe, but it could be better to have something that says "if overcommit_memory is not setup properly I will SKIP this test", but otherwise I expect this to work and will FAIL if it doesn't". 
Or would you expect to run into lchunks == 0 even if overcommit_memory is
set up properly and MAP_NORESERVE is used? (very very low memory that we
cannot even create all the VMAs?)

>
>> Reading from anonymous memory will populate the shared zeropage. To mitigate
>> OOM from "too many page tables", one could simply unmap the pieces as they
>> are verified (or MAP_FIXED over them, to free page tables).
>
> The code has to figure out if a verified region was created by mmap(),
> otherwise an munmap() could crash the process.
> As the entries from /proc/self/maps may have been merged and (I assume)

Yes, and partial unmap (in chunk granularity?) would split them again.

> the ordering of mappings is not guaranteed, some bespoke logic to establish
> the link will be needed.

My thinking was that you simply process one /proc/self/maps entry in some
chunks. After processing a chunk, you munmap() it.

So you would process + munmap in chunks.

>
> Is it fine to rely on CONFIG_ANON_VMA_NAME?
> That would make it much easier to implement.

Can you elaborate how you would do it?

>
> Using MAP_NORESERVE and eager munmap()s, the testcase works nicely even
> in very low physical memory conditions.

Cool.

-- 
Cheers,

David / dhildenb

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] selftests/mm: virtual_address_range: Fix error when CommitLimit < 1GiB 2025-01-08 16:46 ` David Hildenbrand @ 2025-01-09 7:47 ` Thomas Weißschuh 2025-01-09 13:05 ` David Hildenbrand 0 siblings, 1 reply; 18+ messages in thread From: Thomas Weißschuh @ 2025-01-09 7:47 UTC (permalink / raw) To: David Hildenbrand Cc: Dev Jain, Andrew Morton, Shuah Khan, Thomas Gleixner, linux-mm, linux-kselftest, linux-kernel, stable, Ryan Roberts On Wed, Jan 08, 2025 at 05:46:37PM +0100, David Hildenbrand wrote: > On 08.01.25 17:13, Thomas Weißschuh wrote: > > On Wed, Jan 08, 2025 at 02:36:57PM +0100, David Hildenbrand wrote: > > > On 08.01.25 09:05, Thomas Weißschuh wrote: > > > > On Wed, Jan 08, 2025 at 11:46:19AM +0530, Dev Jain wrote: > > > > > > > > > > On 07/01/25 8:44 pm, Thomas Weißschuh wrote: > > > > > > If not enough physical memory is available the kernel may fail mmap(); > > > > > > see __vm_enough_memory() and vm_commit_limit(). > > > > > > In that case the logic in validate_complete_va_space() does not make > > > > > > sense and will even incorrectly fail. > > > > > > Instead skip the test if no mmap() succeeded. > > > > > > > > > > > > Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()") > > > > > > Cc: stable@vger.kernel.org > > > > > > CC stable on tests is ... odd. > > > > I thought it was fairly common, but it isn't. > > Will drop it. > > As it's not really a "kernel BUG", it's rather uncommon. I also used it on patch 2, which is now reproducibly broken on x86 mainline since my commit mentioned in that patch. But I'll drop it there, too. > > > Note that with MAP_NORESRVE, most setups we care about will allow mapping as > > > much as you want, but on access OOM will fire. > > > > Thanks for the hint. > > > > > So one could require that /proc/sys/vm/overcommit_memory is setup properly > > > and use MAP_NORESRVE. > > > > Isn't the check for lchunks == 0 essentially exactly this? 
>
> I assume paired with MAP_NORESERVE?

Yes.

> Maybe, but it could be better to have something that says "if
> overcommit_memory is not setup properly I will SKIP this test", but
> otherwise I expect this to work and will FAIL if it doesn't".

Ok, I'll validate the sysctl value.

> Or would you expect to run into lchunks == 0 even if overcommit_memory is
> setup properly and MAP_NORESERVE is used? (very very low memory that we
> cannot even create all the VMAs?)

No.

> > > Reading from anonymous memory will populate the shared zeropage. To mitigate
> > > OOM from "too many page tables", one could simply unmap the pieces as they
> > > are verified (or MAP_FIXED over them, to free page tables).
> >
> > The code has to figure out if a verified region was created by mmap(),
> > otherwise an munmap() could crash the process.
> > As the entries from /proc/self/maps may have been merged and (I assume)
>
> Yes, and partial unmap (in chunk granularity?) would split them again.
>
> > the ordering of mappings is not guaranteed, some bespoke logic to establish
> > the link will be needed.
>
> My thinking was that you simply process one /proc/self/maps entry in some
> chunks. After processing a chunk, you munmap() it.
>
> So you would process + munmap in chunks.

That is clear. The issue would be to figure which chunks are valid to
unmap. If something critical like the executable file is unmapped,
the process crashes. But see below.

> > Is it fine to rely on CONFIG_ANON_VMA_NAME?
> > That would make it much easier to implement.
>
> Can you elaborate how you would do it?
First set the VMA name after mmap():

	for (i = 0; i < NR_CHUNKS_LOW; i++) {
		ptr[i] = mmap(NULL, MAP_CHUNK_SIZE, PROT_READ | PROT_WRITE,
			      MAP_NORESERVE | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (ptr[i] == MAP_FAILED) {
			if (validate_lower_address_hint())
				ksft_exit_fail_msg("mmap unexpectedly succeeded with hint\n");
			break;
		}

		validate_addr(ptr[i], 0);
		if (prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, ptr[i], MAP_CHUNK_SIZE, "virtual_address_range"))
			ksft_exit_fail_msg("prctl(PR_SET_VMA_ANON_NAME) failed: %s\n", strerror(errno));
	}

During validation:

	hop = 0;
	while (start_addr + hop < end_addr) {
		if (write(fd, (void *)(start_addr + hop), 1) != 1)
			return 1;
		lseek(fd, 0, SEEK_SET);

		if (!strncmp(line + path_offset, "[anon:virtual_address_range]", 28))
			munmap((char *)(start_addr + hop), MAP_CHUNK_SIZE);

		hop += MAP_CHUNK_SIZE;
	}

It is done for each chunk, as all chunks may have been merged into a
single VMA and a per-VMA unmap would not happen before OOM.

> > Using MAP_NORESERVE and eager munmap()s, the testcase works nicely even
> > in very low physical memory conditions.
>
> Cool.

^ permalink raw reply [flat|nested] 18+ messages in thread
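The naming approach described in this message can be tried in isolation. The following is a small sketch, assuming a kernel built with CONFIG_ANON_VMA_NAME, that names one anonymous mapping and then looks for the resulting "[anon:...]" tag in /proc/self/maps. The function name_and_find_vma() and the name passed to it are hypothetical, and the fallback definitions of PR_SET_VMA and PR_SET_VMA_ANON_NAME are only there for older userspace headers.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/prctl.h>

/* Fallbacks for userspace headers that predate these constants. */
#ifndef PR_SET_VMA
#define PR_SET_VMA 0x53564d41
#endif
#ifndef PR_SET_VMA_ANON_NAME
#define PR_SET_VMA_ANON_NAME 0
#endif

/*
 * Hypothetical helper: map len bytes, name the mapping, and search
 * /proc/self/maps for the "[anon:<name>]" tag.
 * Returns 1 if the tag is visible, 0 if it is not, and -1 if mmap() or
 * prctl() failed (for example without CONFIG_ANON_VMA_NAME).
 */
static int name_and_find_vma(const char *name, size_t len)
{
	char needle[128], line[512];
	int found = 0;
	FILE *f;
	void *p;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_NORESERVE | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	if (prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME,
		  (unsigned long)p, len, (unsigned long)name)) {
		munmap(p, len);
		return -1;
	}

	snprintf(needle, sizeof(needle), "[anon:%s]", name);
	f = fopen("/proc/self/maps", "r");
	if (f) {
		while (fgets(line, sizeof(line), f)) {
			if (strstr(line, needle)) {
				found = 1;
				break;
			}
		}
		fclose(f);
	}

	munmap(p, len);
	return found;
}
```

A return value of -1 is the case where a test relying on this mechanism would probably have to skip rather than fail.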
* Re: [PATCH 1/3] selftests/mm: virtual_address_range: Fix error when CommitLimit < 1GiB
2025-01-09 7:47 ` Thomas Weißschuh
@ 2025-01-09 13:05 ` David Hildenbrand
2025-01-09 13:19 ` David Hildenbrand
2025-01-09 13:38 ` Thomas Weißschuh
0 siblings, 2 replies; 18+ messages in thread
From: David Hildenbrand @ 2025-01-09 13:05 UTC (permalink / raw)
To: Thomas Weißschuh
Cc: Dev Jain, Andrew Morton, Shuah Khan, Thomas Gleixner, linux-mm,
linux-kselftest, linux-kernel, stable, Ryan Roberts

>
> That is clear. The issue would be to figure which chunks are valid to
> unmap. If something critical like the executable file is unmapped,
> the process crashes. But see below.

Ah, now I see what you mean. Yes, also the stack etc. will be
problematic. So IIUC, you want to limit the munmap optimization only to
the manually mmap()ed parts.

>
>>> Is it fine to rely on CONFIG_ANON_VMA_NAME?
>>> That would make it much easier to implement.
>>
>> Can you elaborate how you would do it?
>
> First set the VMA name after mmap():
>
> 	for (i = 0; i < NR_CHUNKS_LOW; i++) {
> 		ptr[i] = mmap(NULL, MAP_CHUNK_SIZE, PROT_READ | PROT_WRITE,
> 			      MAP_NORESERVE | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>
> 		if (ptr[i] == MAP_FAILED) {
> 			if (validate_lower_address_hint())
> 				ksft_exit_fail_msg("mmap unexpectedly succeeded with hint\n");
> 			break;
> 		}
>
> 		validate_addr(ptr[i], 0);
> 		if (prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, ptr[i], MAP_CHUNK_SIZE, "virtual_address_range"))
> 			ksft_exit_fail_msg("prctl(PR_SET_VMA_ANON_NAME) failed: %s\n", strerror(errno));

Likely this would prevent merging of VMAs.

With a 1 GiB chunk size, and NR_CHUNKS_LOW == 128TiB, you'd already require
128k VMAs. The default limit is frequently 64k.

We could just scan the ptr / hptr array to see if this is a manual mmap area
or not. If this takes too long, one could sort the arrays by address and
perform a binary search.

Not the most efficient way of doing it, but maybe good enough for this test?
Alternatively, store the pointer in a xarray-like tree instead of two
arrays. Requires a bit more memory ... and we'd have to find a simple
implementation we could just reuse in this test. So maybe there is a simpler
way to get it done.

-- 
Cheers,

David / dhildenb

^ permalink raw reply [flat|nested] 18+ messages in thread
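The sort-and-binary-search alternative mentioned in this message can be sketched as follows. This is an illustration only: sort_chunks() and addr_in_chunks() are hypothetical names, the chunk size is passed in as a parameter rather than taken from MAP_CHUNK_SIZE, and all chunks are assumed to have the same size and not to overlap.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Order two chunk start pointers by address. */
static int cmp_chunk_start(const void *a, const void *b)
{
	uintptr_t x = (uintptr_t)*(char * const *)a;
	uintptr_t y = (uintptr_t)*(char * const *)b;

	return (x > y) - (x < y);
}

/* Hypothetical helper: sort the array of chunk start addresses in place. */
static void sort_chunks(char **chunks, size_t n)
{
	qsort(chunks, n, sizeof(*chunks), cmp_chunk_start);
}

/*
 * Hypothetical helper: return 1 if addr lies inside one of the n sorted
 * chunks of chunk_size bytes each, otherwise 0.
 */
static int addr_in_chunks(char **chunks, size_t n, uintptr_t addr,
			  size_t chunk_size)
{
	size_t lo = 0, hi = n;

	/* Binary search for the first chunk that starts after addr. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if ((uintptr_t)chunks[mid] <= addr)
			lo = mid + 1;
		else
			hi = mid;
	}

	/* No chunk starts at or below addr. */
	if (lo == 0)
		return 0;

	/* The candidate is the last chunk starting at or below addr. */
	return addr - (uintptr_t)chunks[lo - 1] < chunk_size;
}
```

Each lookup costs O(log n) after a one-time O(n log n) sort, compared with O(n) per lookup for a plain scan of the array.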
* Re: [PATCH 1/3] selftests/mm: virtual_address_range: Fix error when CommitLimit < 1GiB
2025-01-09 13:05 ` David Hildenbrand
@ 2025-01-09 13:19 ` David Hildenbrand
2025-01-09 13:38 ` Thomas Weißschuh
1 sibling, 0 replies; 18+ messages in thread
From: David Hildenbrand @ 2025-01-09 13:19 UTC (permalink / raw)
To: Thomas Weißschuh
Cc: Dev Jain, Andrew Morton, Shuah Khan, Thomas Gleixner, linux-mm,
linux-kselftest, linux-kernel, stable, Ryan Roberts

On 09.01.25 14:05, David Hildenbrand wrote:
>
>
>> That is clear. The issue would be to figure which chunks are valid to
>> unmap. If something critical like the executable file is unmapped,
>> the process crashes. But see below.
>
> Ah, now I see what you mean. Yes, also the stack etc. will be
> problematic. So IIUC, you want to limit the munmap optimization only to
> the manually mmap()ed parts.
>
>>
>>>> Is it fine to rely on CONFIG_ANON_VMA_NAME?
>>>> That would make it much easier to implement.
>>>
>>> Can you elaborate how you would do it?
>>
>> First set the VMA name after mmap():

I took a look at the implementation, and VMA merging seems to be able to
merge such VMAs that share the same name (even when set separately).

So assuming you use the same name for all, that should indeed also work.

-- 
Cheers,

David / dhildenb

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] selftests/mm: virtual_address_range: Fix error when CommitLimit < 1GiB
2025-01-09 13:05 ` David Hildenbrand
2025-01-09 13:19 ` David Hildenbrand
@ 2025-01-09 13:38 ` Thomas Weißschuh
1 sibling, 0 replies; 18+ messages in thread
From: Thomas Weißschuh @ 2025-01-09 13:38 UTC (permalink / raw)
To: David Hildenbrand
Cc: Dev Jain, Andrew Morton, Shuah Khan, Thomas Gleixner, linux-mm,
linux-kselftest, linux-kernel, stable, Ryan Roberts

On Thu, Jan 09, 2025 at 02:05:43PM +0100, David Hildenbrand wrote:
> >
> > That is clear. The issue would be to figure which chunks are valid to
> > unmap. If something critical like the executable file is unmapped,
> > the process crashes. But see below.
>
> Ah, now I see what you mean. Yes, also the stack etc. will be problematic.
> So IIUC, you want to limit the munmap optimization only to the manually
> mmap()ed parts.

Correct.

> > > > Is it fine to rely on CONFIG_ANON_VMA_NAME?
> > > > That would make it much easier to implement.
> > >
> > > Can you elaborate how you would do it?
> >
> > First set the VMA name after mmap():
> >
> > 	for (i = 0; i < NR_CHUNKS_LOW; i++) {
> > 		ptr[i] = mmap(NULL, MAP_CHUNK_SIZE, PROT_READ | PROT_WRITE,
> > 			      MAP_NORESERVE | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> >
> > 		if (ptr[i] == MAP_FAILED) {
> > 			if (validate_lower_address_hint())
> > 				ksft_exit_fail_msg("mmap unexpectedly succeeded with hint\n");
> > 			break;
> > 		}
> >
> > 		validate_addr(ptr[i], 0);
> > 		if (prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, ptr[i], MAP_CHUNK_SIZE, "virtual_address_range"))
> > 			ksft_exit_fail_msg("prctl(PR_SET_VMA_ANON_NAME) failed: %s\n", strerror(errno));
>
> Likely this would prevent merging of VMAs.
>
> With a 1 GiB chunk size, and NR_CHUNKS_LOW == 128TiB, you'd already require
> 128k VMAs. The default limit is frequently 64k.

They are merged for me, as they all share the same name.
PR_SET_VMA(2const) even mentions merging:

	Note that assigning an attribute to a virtual memory area might
	prevent it from being merged with adjacent virtual memory areas
	due to the difference in that attribute's value.

is_mergeable_vma() has an explicit check using anon_vma_name_eq().

> We could just scan the ptr / hptr array to see if this is a manual mmap area
> or not. If this takes too long, one could sort the arrays by address and
> perform a binary search.
>
> Not the most efficient way of doing it, but maybe good enough for this test?

A naive loop is what I tried first, but it took forever.

> Alternatively, store the pointer in a xarray-like tree instead of two
> arrays. Requires a bit more memory ... and we'd have to find a simple
> implementation we could just reuse in this test. So maybe there is a simpler
> way to get it done.

IMO the prctl() is that simpler way. The only real drawback is the
dependency on CONFIG_ANON_VMA_NAME.
We can add an entry to tools/testing/selftests/mm/config for it.


Thomas

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] selftests/mm: virtual_address_range: Fix error when CommitLimit < 1GiB
2025-01-08 16:13 ` Thomas Weißschuh
2025-01-08 16:46 ` David Hildenbrand
@ 2025-01-09 5:40 ` Dev Jain
1 sibling, 0 replies; 18+ messages in thread
From: Dev Jain @ 2025-01-09 5:40 UTC (permalink / raw)
To: Thomas Weißschuh, David Hildenbrand
Cc: Andrew Morton, Shuah Khan, Thomas Gleixner, linux-mm, linux-kselftest,
linux-kernel, stable, Ryan Roberts

On 08/01/25 9:43 pm, Thomas Weißschuh wrote:
> On Wed, Jan 08, 2025 at 02:36:57PM +0100, David Hildenbrand wrote:
>> On 08.01.25 09:05, Thomas Weißschuh wrote:
>>> On Wed, Jan 08, 2025 at 11:46:19AM +0530, Dev Jain wrote:
>>>> On 07/01/25 8:44 pm, Thomas Weißschuh wrote:
>>>>> If not enough physical memory is available the kernel may fail mmap();
>>>>> see __vm_enough_memory() and vm_commit_limit().
>>>>> In that case the logic in validate_complete_va_space() does not make
>>>>> sense and will even incorrectly fail.
>>>>> Instead skip the test if no mmap() succeeded.
>>>>>
>>>>> Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()")
>>>>> Cc: stable@vger.kernel.org
>> CC stable on tests is ... odd.
> I thought it was fairly common, but it isn't.
> Will drop it.

Oh, well...
https://lore.kernel.org/all/20240521074358.675031-4-dev.jain@arm.com/
I have done that before :) although the change I was making was fixing
a fundamental flaw in the test and your change is fixing the test for a
specific case (memory pressure), so I tend to concur with David.

>
>>>>> Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
>>>>>
>>>>> ---
>>>>> The logic in __vm_enough_memory() seems weird.
>>>>> It describes itself as "Check that a process has enough memory to
>>>>> allocate a new virtual mapping", however it never checks the current
>>>>> memory usage of the process.
>>>>> So it only disallows large mappings. But many small mappings taking the
>>>>> same amount of memory are allowed; and then even automatically merged
>>>>> into one big mapping.
>>>>> ---
>>>>> tools/testing/selftests/mm/virtual_address_range.c | 6 ++++++
>>>>> 1 file changed, 6 insertions(+)
>>>>>
>>>>> diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
>>>>> index 2a2b69e91950a37999f606847c9c8328d79890c2..d7bf8094d8bcd4bc96e2db4dc3fcb41968def859 100644
>>>>> --- a/tools/testing/selftests/mm/virtual_address_range.c
>>>>> +++ b/tools/testing/selftests/mm/virtual_address_range.c
>>>>> @@ -178,6 +178,12 @@ int main(int argc, char *argv[])
>>>>>  		validate_addr(ptr[i], 0);
>>>>>  	}
>>>>>  	lchunks = i;
>>>>> +
>>>>> +	if (!lchunks) {
>>>>> +		ksft_test_result_skip("Not enough memory for a single chunk\n");
>>>>> +		ksft_finished();
>>>>> +	}
>>>>> +
>>>>>  	hptr = (char **) calloc(NR_CHUNKS_HIGH, sizeof(char *));
>>>>>  	if (hptr == NULL) {
>>>>>  		ksft_test_result_skip("Memory constraint not fulfilled\n");
>>>>>
>>>> I do not know about __vm_enough_memory(), but I am going by your description:
>>>> You say that the kernel may fail mmap() when enough physical memory is not
>>>> there, but it may happen that we have already done 100 mmap()'s, and then
>>>> the kernel fails mmap(), so if (!lchunks) won't be able to handle this case.
>>>> Basically, lchunks == 0 is not a complete indicator of kernel failing mmap().
>>> __vm_enough_memory() only checks the size of each single mmap() on its
>>> own. It does not actually check the current memory or address space
>>> usage of the process.
>>> This seems a bit weird, as indicated in my after-the-fold explanation.
>>>
>>>> The basic assumption of the test is that any process should be able to exhaust
>>>> its virtual address space, and running the test under memory pressure and the
>>>> kernel violating this behaviour defeats the point of the test I think?
>>> The assumption is correct, as soon as one mapping succeeds the others
>>> will also succeed, until the actual address space is exhausted.
>>>
>>> Looking at it again, __vm_enough_memory() is only called for writable
>>> mappings, so it would be possible to use only readable mappings in the
>>> test. The test will still fail with OOM, as the many PTEs need more than
>>> 1GiB of physical memory anyways, but at least that produces a usable
>>> error message.
>>> However I'm not sure if this would violate other test assumptions.
>>>
>> Note that with MAP_NORESERVE, most setups we care about will allow mapping as
>> much as you want, but on access OOM will fire.
> Thanks for the hint.
>
>> So one could require that /proc/sys/vm/overcommit_memory is setup properly
>> and use MAP_NORESERVE.
> Isn't the check for lchunks == 0 essentially exactly this?
>
>> Reading from anonymous memory will populate the shared zeropage. To mitigate
>> OOM from "too many page tables", one could simply unmap the pieces as they
>> are verified (or MAP_FIXED over them, to free page tables).
> The code has to figure out if a verified region was created by mmap(),
> otherwise an munmap() could crash the process.
> As the entries from /proc/self/maps may have been merged and (I assume)
> the ordering of mappings is not guaranteed, some bespoke logic to establish
> the link will be needed.
>
> Is it fine to rely on CONFIG_ANON_VMA_NAME?
> That would make it much easier to implement.
>
> Using MAP_NORESERVE and eager munmap()s, the testcase works nicely even
> in very low physical memory conditions.
>
> Thomas

^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 2/3] selftests/mm: virtual_address_range: Avoid reading VVAR mappings
2025-01-07 15:14 [PATCH 0/3] selftests/mm: virtual_address_range: Two bugfixes and a cleanup Thomas Weißschuh
2025-01-07 15:14 ` [PATCH 1/3] selftests/mm: virtual_address_range: Fix error when CommitLimit < 1GiB Thomas Weißschuh
@ 2025-01-07 15:14 ` Thomas Weißschuh
2025-01-07 15:14 ` [PATCH 3/3] selftests/mm: virtual_address_range: Dump to /dev/null Thomas Weißschuh
2 siblings, 0 replies; 18+ messages in thread
From: Thomas Weißschuh @ 2025-01-07 15:14 UTC (permalink / raw)
To: Andrew Morton, Shuah Khan, Dev Jain, Thomas Gleixner
Cc: linux-mm, linux-kselftest, linux-kernel, Thomas Weißschuh,
stable, kernel test robot

The virtual_address_range selftest reads from the start of each mapping
listed in /proc/self/maps.
However not all mappings are valid to be arbitrarily accessed.
For example the vvar data used for virtual clocks on x86 can only be
accessed if 1) the kernel configuration enables virtual clocks and 2)
the hypervisor provided the data for it, which can only be determined by
the VDSO code itself.
Since commit e93d2521b27f ("x86/vdso: Split virtual clock pages into dedicated mapping")
the virtual clock data was split out into its own mapping, triggering
faulting accesses by virtual_address_range.

Skip the various vvar mappings in virtual_address_range to avoid errors.

Fixes: e93d2521b27f ("x86/vdso: Split virtual clock pages into dedicated mapping")
Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()")
Cc: stable@vger.kernel.org
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202412271148.2656e485-lkp@intel.com
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
---
tools/testing/selftests/mm/virtual_address_range.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
index d7bf8094d8bcd4bc96e2db4dc3fcb41968def859..484f82c7b7c871f82a7d9ec6d6c649f2ab1eb0cd 100644
--- a/tools/testing/selftests/mm/virtual_address_range.c
+++ b/tools/testing/selftests/mm/virtual_address_range.c
@@ -116,10 +116,11 @@ static int validate_complete_va_space(void)

 	prev_end_addr = 0;
 	while (fgets(line, sizeof(line), file)) {
+		int path_offset = 0;
 		unsigned long hop;

-		if (sscanf(line, "%lx-%lx %s[rwxp-]",
-			   &start_addr, &end_addr, prot) != 3)
+		if (sscanf(line, "%lx-%lx %4s %*s %*s %*s %n",
+			   &start_addr, &end_addr, prot, &path_offset) != 3)
 			ksft_exit_fail_msg("cannot parse /proc/self/maps\n");

 		/* end of userspace mappings; ignore vsyscall mapping */
@@ -135,6 +136,10 @@ static int validate_complete_va_space(void)
 		if (prot[0] != 'r')
 			continue;

+		/* Only the VDSO can know if a VVAR mapping is really readable */
+		if (path_offset && !strncmp(line + path_offset, "[vvar", 5))
+			continue;
+
 		/*
 		 * Confirm whether MAP_CHUNK_SIZE chunk can be found or not.
 		 * If write succeeds, no need to check MAP_CHUNK_SIZE - 1
--
2.47.1

^ permalink raw reply [flat|nested] 18+ messages in thread
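The parsing change in the patch above can be exercised on its own. The sketch below reuses the patch's sscanf() format string inside a hypothetical helper, maps_line_is_vvar(), which is not part of the selftest; the sample lines in the usage note are made up for illustration. The %*s conversions skip the offset, device and inode columns without storing them, and %n records the offset at which the pathname column starts. Neither counts towards the return value, which is why the comparison stays at 3.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if a /proc/self/maps line names a [vvar*] mapping, 0 if it does
 * not, and -1 if the line cannot be parsed.
 */
static int maps_line_is_vvar(const char *line)
{
	unsigned long start, end;
	char prot[5];		/* "rwxp" plus the terminating NUL */
	int path_offset = 0;	/* stays 0 if %n is never reached */

	/* Same format as the patch: three stored fields, three skipped */
	if (sscanf(line, "%lx-%lx %4s %*s %*s %*s %n",
		   &start, &end, prot, &path_offset) != 3)
		return -1;

	/* Prefix match, so "[vvar]" and "[vvar_vclock]" are both caught */
	return path_offset && !strncmp(line + path_offset, "[vvar", 5);
}
```

For example, a line ending in "[vvar]" or "[vvar_vclock]" yields 1, while a line whose pathname is "/usr/bin/cat" yields 0.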
* [PATCH 3/3] selftests/mm: virtual_address_range: Dump to /dev/null
2025-01-07 15:14 [PATCH 0/3] selftests/mm: virtual_address_range: Two bugfixes and a cleanup Thomas Weißschuh
2025-01-07 15:14 ` [PATCH 1/3] selftests/mm: virtual_address_range: Fix error when CommitLimit < 1GiB Thomas Weißschuh
2025-01-07 15:14 ` [PATCH 2/3] selftests/mm: virtual_address_range: Avoid reading VVAR mappings Thomas Weißschuh
@ 2025-01-07 15:14 ` Thomas Weißschuh
2025-01-08 6:09 ` Dev Jain
2 siblings, 1 reply; 18+ messages in thread
From: Thomas Weißschuh @ 2025-01-07 15:14 UTC (permalink / raw)
To: Andrew Morton, Shuah Khan, Dev Jain, Thomas Gleixner
Cc: linux-mm, linux-kselftest, linux-kernel, Thomas Weißschuh

During the execution of validate_complete_va_space() a lot of memory is
on the VM subsystem. When running on a low memory subsystem an OOM may
be triggered, when writing to the dump file as the filesystem may also
require memory.

On my test system with 1100MiB physical memory:

Tasks state (memory values in pages):
[ pid ] uid tgid total_vm rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
[ 57] 0 57 34359215953 695 256 0 439 1064390656 0 0 virtual_address

Out of memory: Killed process 57 (virtual_address) total-vm:137436863812kB, anon-rss:1024kB, file-rss:0kB, shmem-rss:1756kB, UID:0 pgtables:1039444kB oom_score_adj:0
<snip>
fault_in_iov_iter_readable+0x4a/0xd0
generic_perform_write+0x9c/0x280
shmem_file_write_iter+0x86/0x90
vfs_write+0x29c/0x480
ksys_write+0x6c/0xe0
do_syscall_64+0x9e/0x1a0
entry_SYSCALL_64_after_hwframe+0x77/0x7f

Write the dumped data into /dev/null instead which does not require
additional memory during write(), making the code simpler as a
side-effect.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
---
tools/testing/selftests/mm/virtual_address_range.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
index 484f82c7b7c871f82a7d9ec6d6c649f2ab1eb0cd..4042fd878acd702d23da2c3293292de33bd48143 100644
--- a/tools/testing/selftests/mm/virtual_address_range.c
+++ b/tools/testing/selftests/mm/virtual_address_range.c
@@ -103,10 +103,9 @@ static int validate_complete_va_space(void)
 	FILE *file;
 	int fd;

-	fd = open("va_dump", O_CREAT | O_WRONLY, 0600);
-	unlink("va_dump");
+	fd = open("/dev/null", O_WRONLY);
 	if (fd < 0) {
-		ksft_test_result_skip("cannot create or open dump file\n");
+		ksft_test_result_skip("cannot create or open /dev/null\n");
 		ksft_finished();
 	}

@@ -152,7 +151,6 @@ static int validate_complete_va_space(void)
 		while (start_addr + hop < end_addr) {
 			if (write(fd, (void *)(start_addr + hop), 1) != 1)
 				return 1;
-			lseek(fd, 0, SEEK_SET);

 			hop += MAP_CHUNK_SIZE;
 		}
--
2.47.1

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/3] selftests/mm: virtual_address_range: Dump to /dev/null
2025-01-07 15:14 ` [PATCH 3/3] selftests/mm: virtual_address_range: Dump to /dev/null Thomas Weißschuh
@ 2025-01-08 6:09 ` Dev Jain
2025-01-08 7:38 ` Thomas Weißschuh
2025-01-08 13:30 ` David Hildenbrand
0 siblings, 2 replies; 18+ messages in thread
From: Dev Jain @ 2025-01-08 6:09 UTC (permalink / raw)
To: Thomas Weißschuh, Andrew Morton, Shuah Khan, Thomas Gleixner
Cc: linux-mm, linux-kselftest, linux-kernel, Ryan Roberts, David Hildenbrand

[-- Attachment #1: Type: text/plain, Size: 3162 bytes --]

On 07/01/25 8:44 pm, Thomas Weißschuh wrote:
> During the execution of validate_complete_va_space() a lot of memory is
> on the VM subsystem. When running on a low memory subsystem an OOM may
> be triggered, when writing to the dump file as the filesystem may also
> require memory.
>
> On my test system with 1100MiB physical memory:
>
> Tasks state (memory values in pages):
> [ pid ] uid tgid total_vm rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
> [ 57] 0 57 34359215953 695 256 0 439 1064390656 0 0 virtual_address
>
> Out of memory: Killed process 57 (virtual_address) total-vm:137436863812kB, anon-rss:1024kB, file-rss:0kB, shmem-rss:1756kB, UID:0 pgtables:1039444kB oom_score_adj:0
> <snip>
> fault_in_iov_iter_readable+0x4a/0xd0
> generic_perform_write+0x9c/0x280
> shmem_file_write_iter+0x86/0x90
> vfs_write+0x29c/0x480
> ksys_write+0x6c/0xe0
> do_syscall_64+0x9e/0x1a0
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> Write the dumped data into /dev/null instead which does not require
> additional memory during write(), making the code simpler as a
> side-effect.
>
> Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
> ---
> tools/testing/selftests/mm/virtual_address_range.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
> index 484f82c7b7c871f82a7d9ec6d6c649f2ab1eb0cd..4042fd878acd702d23da2c3293292de33bd48143 100644
> --- a/tools/testing/selftests/mm/virtual_address_range.c
> +++ b/tools/testing/selftests/mm/virtual_address_range.c
> @@ -103,10 +103,9 @@ static int validate_complete_va_space(void)
>  	FILE *file;
>  	int fd;
>
> -	fd = open("va_dump", O_CREAT | O_WRONLY, 0600);
> -	unlink("va_dump");
> +	fd = open("/dev/null", O_WRONLY);
>  	if (fd < 0) {
> -		ksft_test_result_skip("cannot create or open dump file\n");
> +		ksft_test_result_skip("cannot create or open /dev/null\n");
>  		ksft_finished();
>  	}
>
> @@ -152,7 +151,6 @@ static int validate_complete_va_space(void)
>  		while (start_addr + hop < end_addr) {
>  			if (write(fd, (void *)(start_addr + hop), 1) != 1)
>  				return 1;
> -			lseek(fd, 0, SEEK_SET);
>
>  			hop += MAP_CHUNK_SIZE;
>  		}
>

The reason I had not used /dev/null was that write() was succeeding to
/dev/null even from an address not in my VA space. I was puzzled about this
behaviour of /dev/null and I chose to ignore it and just use a real file.

To test this behaviour, run the following program:

	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>
	#include <fcntl.h>
	#include <sys/mman.h>

	int main()
	{
		int fd;

		fd = open("va_dump", O_CREAT | O_WRONLY, 0600);
		unlink("va_dump");
		// fd = open("/dev/null", O_WRONLY);

		int ret = munmap((void *)(1UL << 30), 100);
		if (!ret)
			printf("munmap succeeded\n");

		int res = write(fd, (void *)(1UL << 30), 1);
		if (res == 1)
			printf("write succeeded\n");

		return 0;
	}

The write will fail as expected, but if you comment out the va_dump lines
and use /dev/null, the write will succeed.

[-- Attachment #2: Type: text/html, Size: 7148 bytes --]

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/3] selftests/mm: virtual_address_range: Dump to /dev/null
2025-01-08 6:09 ` Dev Jain
@ 2025-01-08 7:38 ` Thomas Weißschuh
2025-01-08 13:30 ` David Hildenbrand
1 sibling, 0 replies; 18+ messages in thread
From: Thomas Weißschuh @ 2025-01-08 7:38 UTC (permalink / raw)
To: Dev Jain
Cc: Andrew Morton, Shuah Khan, Thomas Gleixner, linux-mm, linux-kselftest,
linux-kernel, Ryan Roberts, David Hildenbrand

On Wed, Jan 08, 2025 at 11:39:40AM +0530, Dev Jain wrote:
>
> On 07/01/25 8:44 pm, Thomas Weißschuh wrote:
> > During the execution of validate_complete_va_space() a lot of memory is
> > on the VM subsystem. When running on a low memory subsystem an OOM may
> > be triggered, when writing to the dump file as the filesystem may also
> > require memory.
> >
> > On my test system with 1100MiB physical memory:
> >
> > Tasks state (memory values in pages):
> > [ pid ] uid tgid total_vm rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
> > [ 57] 0 57 34359215953 695 256 0 439 1064390656 0 0 virtual_address
> >
> > Out of memory: Killed process 57 (virtual_address) total-vm:137436863812kB, anon-rss:1024kB, file-rss:0kB, shmem-rss:1756kB, UID:0 pgtables:1039444kB oom_score_adj:0
> > <snip>
> > fault_in_iov_iter_readable+0x4a/0xd0
> > generic_perform_write+0x9c/0x280
> > shmem_file_write_iter+0x86/0x90
> > vfs_write+0x29c/0x480
> > ksys_write+0x6c/0xe0
> > do_syscall_64+0x9e/0x1a0
> > entry_SYSCALL_64_after_hwframe+0x77/0x7f
> >
> > Write the dumped data into /dev/null instead which does not require
> > additional memory during write(), making the code simpler as a
> > side-effect.
> >
> > Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
> > ---
> > tools/testing/selftests/mm/virtual_address_range.c | 6 ++----
> > 1 file changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
> > index 484f82c7b7c871f82a7d9ec6d6c649f2ab1eb0cd..4042fd878acd702d23da2c3293292de33bd48143 100644
> > --- a/tools/testing/selftests/mm/virtual_address_range.c
> > +++ b/tools/testing/selftests/mm/virtual_address_range.c
> > @@ -103,10 +103,9 @@ static int validate_complete_va_space(void)
> >  	FILE *file;
> >  	int fd;
> > -	fd = open("va_dump", O_CREAT | O_WRONLY, 0600);
> > -	unlink("va_dump");
> > +	fd = open("/dev/null", O_WRONLY);
> >  	if (fd < 0) {
> > -		ksft_test_result_skip("cannot create or open dump file\n");
> > +		ksft_test_result_skip("cannot create or open /dev/null\n");
> >  		ksft_finished();
> >  	}
> > @@ -152,7 +151,6 @@ static int validate_complete_va_space(void)
> >  		while (start_addr + hop < end_addr) {
> >  			if (write(fd, (void *)(start_addr + hop), 1) != 1)
> >  				return 1;
> > -			lseek(fd, 0, SEEK_SET);
> >  			hop += MAP_CHUNK_SIZE;
> >  		}
> >
>
> The reason I had not used /dev/null was that write() was succeeding to /dev/null
> even from an address not in my VA space. I was puzzled about this behaviour of
> /dev/null and I chose to ignore it and just use a real file.

That makes sense and I can reproduce your example.
Switching to another dummy file which reads the written data like
/dev/random also leads to OOM, so wouldn't help either.

Thanks for the explanation.

@Andrew, could you drop this patch?

> To test this behaviour, run the following program:

[..]

PS: Your mail contained HTML and did not make it to the list archives.
(And the text variant of the example program was corrupted)

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/3] selftests/mm: virtual_address_range: Dump to /dev/null
2025-01-08 6:09 ` Dev Jain
2025-01-08 7:38 ` Thomas Weißschuh
@ 2025-01-08 13:30 ` David Hildenbrand
2025-01-09 5:32 ` Dev Jain
1 sibling, 1 reply; 18+ messages in thread
From: David Hildenbrand @ 2025-01-08 13:30 UTC (permalink / raw)
To: Dev Jain, Thomas Weißschuh, Andrew Morton, Shuah Khan, Thomas Gleixner
Cc: linux-mm, linux-kselftest, linux-kernel, Ryan Roberts

On 08.01.25 07:09, Dev Jain wrote:
>
> On 07/01/25 8:44 pm, Thomas Weißschuh wrote:
>> During the execution of validate_complete_va_space() a lot of memory is
>> on the VM subsystem. When running on a low memory subsystem an OOM may
>> be triggered, when writing to the dump file as the filesystem may also
>> require memory.
>>
>> On my test system with 1100MiB physical memory:
>>
>> Tasks state (memory values in pages):
>> [ pid ] uid tgid total_vm rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
>> [ 57] 0 57 34359215953 695 256 0 439 1064390656 0 0 virtual_address
>>
>> Out of memory: Killed process 57 (virtual_address) total-vm:137436863812kB, anon-rss:1024kB, file-rss:0kB, shmem-rss:1756kB, UID:0 pgtables:1039444kB oom_score_adj:0
>> <snip>
>> fault_in_iov_iter_readable+0x4a/0xd0
>> generic_perform_write+0x9c/0x280
>> shmem_file_write_iter+0x86/0x90
>> vfs_write+0x29c/0x480
>> ksys_write+0x6c/0xe0
>> do_syscall_64+0x9e/0x1a0
>> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>>
>> Write the dumped data into /dev/null instead which does not require
>> additional memory during write(), making the code simpler as a
>> side-effect.
>>
>> Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
>> ---
>> tools/testing/selftests/mm/virtual_address_range.c | 6 ++----
>> 1 file changed, 2 insertions(+), 4 deletions(-)
>>
>> diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
>> index 484f82c7b7c871f82a7d9ec6d6c649f2ab1eb0cd..4042fd878acd702d23da2c3293292de33bd48143 100644
>> --- a/tools/testing/selftests/mm/virtual_address_range.c
>> +++ b/tools/testing/selftests/mm/virtual_address_range.c
>> @@ -103,10 +103,9 @@ static int validate_complete_va_space(void)
>>  	FILE *file;
>>  	int fd;
>>
>> -	fd = open("va_dump", O_CREAT | O_WRONLY, 0600);
>> -	unlink("va_dump");
>> +	fd = open("/dev/null", O_WRONLY);
>>  	if (fd < 0) {
>> -		ksft_test_result_skip("cannot create or open dump file\n");
>> +		ksft_test_result_skip("cannot create or open /dev/null\n");
>>  		ksft_finished();
>>  	}
>>
>> @@ -152,7 +151,6 @@ static int validate_complete_va_space(void)
>>  		while (start_addr + hop < end_addr) {
>>  			if (write(fd, (void *)(start_addr + hop), 1) != 1)
>>  				return 1;
>> -			lseek(fd, 0, SEEK_SET);
>>
>>  			hop += MAP_CHUNK_SIZE;
>>  		}
>>
>
> The reason I had not used /dev/null was that write() was succeeding to /dev/null
> even from an address not in my VA space. I was puzzled about this behaviour of
> /dev/null and I chose to ignore it and just use a real file.
>
> To test this behaviour, run the following program:
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <fcntl.h>
> #include <sys/mman.h>
> int main()
> {
> 	int fd;
> 	fd = open("va_dump", O_CREAT | O_WRONLY, 0600);
> 	unlink("va_dump");
> 	// fd = open("/dev/null", O_WRONLY);
> 	int ret = munmap((void *)(1UL << 30), 100);
> 	if (!ret)
> 		printf("munmap succeeded\n");
> 	int res = write(fd, (void *)(1UL << 30), 1);
> 	if (res == 1)
> 		printf("write succeeded\n");
> 	return 0;
> }
> The write will fail as expected, but if you comment out the va_dump
> lines and use /dev/null, the write will succeed.

What exactly do we want to achieve with the write? Verify that the
output of /proc/self/maps is reasonable and we can actually resolve a
fault / map a page?

Why not access the memory directly+signal handler or using
/proc/self/mem, so you can avoid the temp file completely?

--
Cheers,

David / dhildenb

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/3] selftests/mm: virtual_address_range: Dump to /dev/null
2025-01-08 13:30 ` David Hildenbrand
@ 2025-01-09 5:32 ` Dev Jain
0 siblings, 0 replies; 18+ messages in thread
From: Dev Jain @ 2025-01-09 5:32 UTC (permalink / raw)
To: David Hildenbrand, Thomas Weißschuh, Andrew Morton, Shuah Khan,
Thomas Gleixner
Cc: linux-mm, linux-kselftest, linux-kernel, Ryan Roberts

On 08/01/25 7:00 pm, David Hildenbrand wrote:
> On 08.01.25 07:09, Dev Jain wrote:
>>
>> On 07/01/25 8:44 pm, Thomas Weißschuh wrote:
>>> During the execution of validate_complete_va_space() a lot of memory is
>>> on the VM subsystem. When running on a low memory subsystem an OOM may
>>> be triggered, when writing to the dump file as the filesystem may also
>>> require memory.
>>>
>>> On my test system with 1100MiB physical memory:
>>>
>>> Tasks state (memory values in pages):
>>> [ pid ] uid tgid total_vm rss rss_anon rss_file
>>> rss_shmem pgtables_bytes swapents oom_score_adj name
>>> [ 57] 0 57 34359215953 695 256 0 439
>>> 1064390656 0 0 virtual_address
>>>
>>> Out of memory: Killed process 57 (virtual_address)
>>> total-vm:137436863812kB, anon-rss:1024kB, file-rss:0kB,
>>> shmem-rss:1756kB, UID:0 pgtables:1039444kB oom_score_adj:0
>>> <snip>
>>> fault_in_iov_iter_readable+0x4a/0xd0
>>> generic_perform_write+0x9c/0x280
>>> shmem_file_write_iter+0x86/0x90
>>> vfs_write+0x29c/0x480
>>> ksys_write+0x6c/0xe0
>>> do_syscall_64+0x9e/0x1a0
>>> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>>>
>>> Write the dumped data into /dev/null instead which does not require
>>> additional memory during write(), making the code simpler as a
>>> side-effect.
>>>
>>> Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
>>> ---
>>> tools/testing/selftests/mm/virtual_address_range.c | 6 ++----
>>> 1 file changed, 2 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/tools/testing/selftests/mm/virtual_address_range.c
>>> b/tools/testing/selftests/mm/virtual_address_range.c
>>> index
>>> 484f82c7b7c871f82a7d9ec6d6c649f2ab1eb0cd..4042fd878acd702d23da2c3293292de33bd48143
>>> 100644
>>> --- a/tools/testing/selftests/mm/virtual_address_range.c
>>> +++ b/tools/testing/selftests/mm/virtual_address_range.c
>>> @@ -103,10 +103,9 @@ static int validate_complete_va_space(void)
>>>  	FILE *file;
>>>  	int fd;
>>> -	fd = open("va_dump", O_CREAT | O_WRONLY, 0600);
>>> -	unlink("va_dump");
>>> +	fd = open("/dev/null", O_WRONLY);
>>>  	if (fd < 0) {
>>> -		ksft_test_result_skip("cannot create or open dump file\n");
>>> +		ksft_test_result_skip("cannot create or open /dev/null\n");
>>>  		ksft_finished();
>>>  	}
>
>> >> @@ -152,7 +151,6 @@ static int validate_complete_va_space(void)
>>>  		while (start_addr + hop < end_addr) {
>>>  			if (write(fd, (void *)(start_addr + hop), 1) != 1)
>>>  				return 1;
>>> -			lseek(fd, 0, SEEK_SET);
>>>  			hop += MAP_CHUNK_SIZE;
>>>  		}
>>>
>>
>> The reason I had not used /dev/null was that write() was succeeding
>> to /dev/null
>> even from an address not in my VA space. I was puzzled about this
>> behaviour of
>> /dev/null and I chose to ignore it and just use a real file.
>>
>> To test this behaviour, run the following program:
>>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <unistd.h>
>> #include <fcntl.h>
>> #include <sys/mman.h>
>> int main()
>> {
>> 	int fd;
>> 	fd = open("va_dump", O_CREAT | O_WRONLY, 0600);
>> 	unlink("va_dump");
>> 	// fd = open("/dev/null", O_WRONLY);
>> 	int ret = munmap((void *)(1UL << 30), 100);
>> 	if (!ret)
>> 		printf("munmap succeeded\n");
>> 	int res = write(fd, (void *)(1UL << 30), 1);
>> 	if (res == 1)
>> 		printf("write succeeded\n");
>> 	return 0;
>> }
>> The write will fail as expected, but if you comment out the va_dump
>> lines and use /dev/null, the write will succeed.
>
> What exactly do we want to achieve with the write? Verify that the
> output of /proc/self/maps is reasonable and we can actually resolve a
> fault / map a page?
>
> Why not access the memory directly+signal handler or using
> /proc/self/mem, so you can avoid the temp file completely?
>

We want to determine whether an address belongs to our address space. The
proper way to do that is to access the memory, get a segfault and jump to
signal handler. I wanted to avoid this code churn, so chose to use write()
so that I can validate the address without getting a segfault.

^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2025-01-09 13:38 UTC | newest]

Thread overview: 18+ messages
2025-01-07 15:14 [PATCH 0/3] selftests/mm: virtual_address_range: Two bugfixes and a cleanup Thomas Weißschuh
2025-01-07 15:14 ` [PATCH 1/3] selftests/mm: virtual_address_range: Fix error when CommitLimit < 1GiB Thomas Weißschuh
2025-01-08 6:16 ` Dev Jain
2025-01-08 8:05 ` Thomas Weißschuh
2025-01-08 13:36 ` David Hildenbrand
2025-01-08 16:13 ` Thomas Weißschuh
2025-01-08 16:46 ` David Hildenbrand
2025-01-09 7:47 ` Thomas Weißschuh
2025-01-09 13:05 ` David Hildenbrand
2025-01-09 13:19 ` David Hildenbrand
2025-01-09 13:38 ` Thomas Weißschuh
2025-01-09 5:40 ` Dev Jain
2025-01-07 15:14 ` [PATCH 2/3] selftests/mm: virtual_address_range: Avoid reading VVAR mappings Thomas Weißschuh
2025-01-07 15:14 ` [PATCH 3/3] selftests/mm: virtual_address_range: Dump to /dev/null Thomas Weißschuh
2025-01-08 6:09 ` Dev Jain
2025-01-08 7:38 ` Thomas Weißschuh
2025-01-08 13:30 ` David Hildenbrand
2025-01-09 5:32 ` Dev Jain