linux-mm.kvack.org archive mirror
* [PATCH 0/3] selftests/mm: virtual_address_range: Two bugfixes and a cleanup
@ 2025-01-07 15:14 Thomas Weißschuh
  2025-01-07 15:14 ` [PATCH 1/3] selftests/mm: virtual_address_range: Fix error when CommitLimit < 1GiB Thomas Weißschuh
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Thomas Weißschuh @ 2025-01-07 15:14 UTC (permalink / raw)
  To: Andrew Morton, Shuah Khan, Dev Jain, Thomas Gleixner
  Cc: linux-mm, linux-kselftest, linux-kernel, Thomas Weißschuh,
	stable, kernel test robot

The selftest started failing after commit e93d2521b27f
("x86/vdso: Split virtual clock pages into dedicated mapping")
was merged. While debugging I stumbled upon another bug and a potential
cleanup.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
---
Thomas Weißschuh (3):
      selftests/mm: virtual_address_range: Fix error when CommitLimit < 1GiB
      selftests/mm: virtual_address_range: Avoid reading VVAR mappings
      selftests/mm: virtual_address_range: Dump to /dev/null

 tools/testing/selftests/mm/virtual_address_range.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)
---
base-commit: fbfd64d25c7af3b8695201ebc85efe90be28c5a3
change-id: 20250107-virtual_address_range-tests-95843766fa97

Best regards,
-- 
Thomas Weißschuh <thomas.weissschuh@linutronix.de>




* [PATCH 1/3] selftests/mm: virtual_address_range: Fix error when CommitLimit < 1GiB
  2025-01-07 15:14 [PATCH 0/3] selftests/mm: virtual_address_range: Two bugfixes and a cleanup Thomas Weißschuh
@ 2025-01-07 15:14 ` Thomas Weißschuh
  2025-01-08  6:16   ` Dev Jain
  2025-01-07 15:14 ` [PATCH 2/3] selftests/mm: virtual_address_range: Avoid reading VVAR mappings Thomas Weißschuh
  2025-01-07 15:14 ` [PATCH 3/3] selftests/mm: virtual_address_range: Dump to /dev/null Thomas Weißschuh
  2 siblings, 1 reply; 18+ messages in thread
From: Thomas Weißschuh @ 2025-01-07 15:14 UTC (permalink / raw)
  To: Andrew Morton, Shuah Khan, Dev Jain, Thomas Gleixner
  Cc: linux-mm, linux-kselftest, linux-kernel, Thomas Weißschuh, stable

If not enough physical memory is available the kernel may fail mmap();
see __vm_enough_memory() and vm_commit_limit().
In that case the logic in validate_complete_va_space() does not make
sense and will even incorrectly fail.
Instead skip the test if no mmap() succeeded.

Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()")
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>

---
The logic in __vm_enough_memory() seems weird.
It describes itself as "Check that a process has enough memory to
allocate a new virtual mapping", however it never checks the current
memory usage of the process.
So it only disallows large mappings. But many small mappings taking the
same amount of memory are allowed; and then even automatically merged
into one big mapping.
---
 tools/testing/selftests/mm/virtual_address_range.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
index 2a2b69e91950a37999f606847c9c8328d79890c2..d7bf8094d8bcd4bc96e2db4dc3fcb41968def859 100644
--- a/tools/testing/selftests/mm/virtual_address_range.c
+++ b/tools/testing/selftests/mm/virtual_address_range.c
@@ -178,6 +178,12 @@ int main(int argc, char *argv[])
 		validate_addr(ptr[i], 0);
 	}
 	lchunks = i;
+
+	if (!lchunks) {
+		ksft_test_result_skip("Not enough memory for a single chunk\n");
+		ksft_finished();
+	}
+
 	hptr = (char **) calloc(NR_CHUNKS_HIGH, sizeof(char *));
 	if (hptr == NULL) {
 		ksft_test_result_skip("Memory constraint not fulfilled\n");

-- 
2.47.1




* [PATCH 2/3] selftests/mm: virtual_address_range: Avoid reading VVAR mappings
  2025-01-07 15:14 [PATCH 0/3] selftests/mm: virtual_address_range: Two bugfixes and a cleanup Thomas Weißschuh
  2025-01-07 15:14 ` [PATCH 1/3] selftests/mm: virtual_address_range: Fix error when CommitLimit < 1GiB Thomas Weißschuh
@ 2025-01-07 15:14 ` Thomas Weißschuh
  2025-01-07 15:14 ` [PATCH 3/3] selftests/mm: virtual_address_range: Dump to /dev/null Thomas Weißschuh
  2 siblings, 0 replies; 18+ messages in thread
From: Thomas Weißschuh @ 2025-01-07 15:14 UTC (permalink / raw)
  To: Andrew Morton, Shuah Khan, Dev Jain, Thomas Gleixner
  Cc: linux-mm, linux-kselftest, linux-kernel, Thomas Weißschuh,
	stable, kernel test robot

The virtual_address_range selftest reads from the start of each mapping
listed in /proc/self/maps.
However, not all mappings can safely be accessed arbitrarily.
For example, the vvar data used for virtual clocks on x86 can only be
accessed if 1) the kernel configuration enables virtual clocks and 2)
the hypervisor provided the data for it, which can only be determined by
the VDSO code itself.
Since commit e93d2521b27f ("x86/vdso: Split virtual clock pages into dedicated mapping")
the virtual clock data was split out into its own mapping, triggering
faulting accesses by virtual_address_range.

Skip the various vvar mappings in virtual_address_range to avoid errors.

Fixes: e93d2521b27f ("x86/vdso: Split virtual clock pages into dedicated mapping")
Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()")
Cc: stable@vger.kernel.org
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202412271148.2656e485-lkp@intel.com
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
---
 tools/testing/selftests/mm/virtual_address_range.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
index d7bf8094d8bcd4bc96e2db4dc3fcb41968def859..484f82c7b7c871f82a7d9ec6d6c649f2ab1eb0cd 100644
--- a/tools/testing/selftests/mm/virtual_address_range.c
+++ b/tools/testing/selftests/mm/virtual_address_range.c
@@ -116,10 +116,11 @@ static int validate_complete_va_space(void)
 
 	prev_end_addr = 0;
 	while (fgets(line, sizeof(line), file)) {
+		int path_offset = 0;
 		unsigned long hop;
 
-		if (sscanf(line, "%lx-%lx %s[rwxp-]",
-			   &start_addr, &end_addr, prot) != 3)
+		if (sscanf(line, "%lx-%lx %4s %*s %*s %*s %n",
+			   &start_addr, &end_addr, prot, &path_offset) != 3)
 			ksft_exit_fail_msg("cannot parse /proc/self/maps\n");
 
 		/* end of userspace mappings; ignore vsyscall mapping */
@@ -135,6 +136,10 @@ static int validate_complete_va_space(void)
 		if (prot[0] != 'r')
 			continue;
 
+		/* Only the VDSO can know if a VVAR mapping is really readable */
+		if (path_offset && !strncmp(line + path_offset, "[vvar", 5))
+			continue;
+
 		/*
 		 * Confirm whether MAP_CHUNK_SIZE chunk can be found or not.
 		 * If write succeeds, no need to check MAP_CHUNK_SIZE - 1

-- 
2.47.1
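
The parsing approach used in this patch can be tried in isolation. What
follows is a minimal sketch, not the selftest code: struct maps_entry,
parse_maps_line() and is_vvar_mapping() are hypothetical names introduced
only for illustration. It assumes the usual /proc/self/maps line layout
(range, permissions, offset, device, inode, optional pathname) and uses the
same sscanf() format as the patch, where three suppressed %*s fields skip
offset, device and inode and %n records where the pathname starts.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields the selftest cares about. */
struct maps_entry {
	unsigned long start;
	unsigned long end;
	char prot[5];		/* "rwxp" plus terminating NUL */
	int path_offset;	/* offset of the pathname in the line, 0 if unknown */
};

/* Parse one /proc/<pid>/maps line. Returns 0 on success, -1 on failure. */
static int parse_maps_line(const char *line, struct maps_entry *e)
{
	/* %n is not guaranteed to be reached, so start from a known value. */
	e->path_offset = 0;

	/* Only the three assigned conversions count towards the return value. */
	if (sscanf(line, "%lx-%lx %4s %*s %*s %*s %n",
		   &e->start, &e->end, e->prot, &e->path_offset) != 3)
		return -1;
	return 0;
}

/* Nonzero if the pathname of the parsed line starts with "[vvar". */
static int is_vvar_mapping(const char *line, const struct maps_entry *e)
{
	return e->path_offset && !strncmp(line + e->path_offset, "[vvar", 5);
}
```

A caller would run parse_maps_line() on each line read with fgets() and skip
the read check whenever is_vvar_mapping() returns nonzero. The prefix match
on "[vvar" covers every pathname that begins with that string, which is what
lets a single check handle more than one vvar mapping.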




* [PATCH 3/3] selftests/mm: virtual_address_range: Dump to /dev/null
  2025-01-07 15:14 [PATCH 0/3] selftests/mm: virtual_address_range: Two bugfixes and a cleanup Thomas Weißschuh
  2025-01-07 15:14 ` [PATCH 1/3] selftests/mm: virtual_address_range: Fix error when CommitLimit < 1GiB Thomas Weißschuh
  2025-01-07 15:14 ` [PATCH 2/3] selftests/mm: virtual_address_range: Avoid reading VVAR mappings Thomas Weißschuh
@ 2025-01-07 15:14 ` Thomas Weißschuh
  2025-01-08  6:09   ` Dev Jain
  2 siblings, 1 reply; 18+ messages in thread
From: Thomas Weißschuh @ 2025-01-07 15:14 UTC (permalink / raw)
  To: Andrew Morton, Shuah Khan, Dev Jain, Thomas Gleixner
  Cc: linux-mm, linux-kselftest, linux-kernel, Thomas Weißschuh

During the execution of validate_complete_va_space() a lot of memory
pressure is put on the VM subsystem. When running on a low memory system
an OOM may be triggered when writing to the dump file, as the filesystem
may also require memory.

On my test system with 1100MiB physical memory:

	Tasks state (memory values in pages):
	[  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
	[     57]     0    57 34359215953      695      256        0       439 1064390656        0             0 virtual_address

	Out of memory: Killed process 57 (virtual_address) total-vm:137436863812kB, anon-rss:1024kB, file-rss:0kB, shmem-rss:1756kB, UID:0 pgtables:1039444kB oom_score_adj:0
	<snip>
	fault_in_iov_iter_readable+0x4a/0xd0
	generic_perform_write+0x9c/0x280
	shmem_file_write_iter+0x86/0x90
	vfs_write+0x29c/0x480
	ksys_write+0x6c/0xe0
	do_syscall_64+0x9e/0x1a0
	entry_SYSCALL_64_after_hwframe+0x77/0x7f

Write the dumped data into /dev/null instead which does not require
additional memory during write(), making the code simpler as a
side-effect.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
---
 tools/testing/selftests/mm/virtual_address_range.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
index 484f82c7b7c871f82a7d9ec6d6c649f2ab1eb0cd..4042fd878acd702d23da2c3293292de33bd48143 100644
--- a/tools/testing/selftests/mm/virtual_address_range.c
+++ b/tools/testing/selftests/mm/virtual_address_range.c
@@ -103,10 +103,9 @@ static int validate_complete_va_space(void)
 	FILE *file;
 	int fd;
 
-	fd = open("va_dump", O_CREAT | O_WRONLY, 0600);
-	unlink("va_dump");
+	fd = open("/dev/null", O_WRONLY);
 	if (fd < 0) {
-		ksft_test_result_skip("cannot create or open dump file\n");
+		ksft_test_result_skip("cannot create or open /dev/null\n");
 		ksft_finished();
 	}
 
@@ -152,7 +151,6 @@ static int validate_complete_va_space(void)
 		while (start_addr + hop < end_addr) {
 			if (write(fd, (void *)(start_addr + hop), 1) != 1)
 				return 1;
-			lseek(fd, 0, SEEK_SET);
 
 			hop += MAP_CHUNK_SIZE;
 		}

-- 
2.47.1




* Re: [PATCH 3/3] selftests/mm: virtual_address_range: Dump to /dev/null
  2025-01-07 15:14 ` [PATCH 3/3] selftests/mm: virtual_address_range: Dump to /dev/null Thomas Weißschuh
@ 2025-01-08  6:09   ` Dev Jain
  2025-01-08  7:38     ` Thomas Weißschuh
  2025-01-08 13:30     ` David Hildenbrand
  0 siblings, 2 replies; 18+ messages in thread
From: Dev Jain @ 2025-01-08  6:09 UTC (permalink / raw)
  To: Thomas Weißschuh, Andrew Morton, Shuah Khan, Thomas Gleixner
  Cc: linux-mm, linux-kselftest, linux-kernel, Ryan Roberts, David Hildenbrand

[-- Attachment #1: Type: text/plain, Size: 3162 bytes --]


On 07/01/25 8:44 pm, Thomas Weißschuh wrote:
> During the execution of validate_complete_va_space() a lot of memory is
> on the VM subsystem. When running on a low memory subsystem an OOM may
> be triggered, when writing to the dump file as the filesystem may also
> require memory.
>
> On my test system with 1100MiB physical memory:
>
> 	Tasks state (memory values in pages):
> 	[  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
> 	[     57]     0    57 34359215953      695      256        0       439 1064390656        0             0 virtual_address
>
> 	Out of memory: Killed process 57 (virtual_address) total-vm:137436863812kB, anon-rss:1024kB, file-rss:0kB, shmem-rss:1756kB, UID:0 pgtables:1039444kB oom_score_adj:0
> 	<snip>
> 	fault_in_iov_iter_readable+0x4a/0xd0
> 	generic_perform_write+0x9c/0x280
> 	shmem_file_write_iter+0x86/0x90
> 	vfs_write+0x29c/0x480
> 	ksys_write+0x6c/0xe0
> 	do_syscall_64+0x9e/0x1a0
> 	entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> Write the dumped data into /dev/null instead which does not require
> additional memory during write(), making the code simpler as a
> side-effect.
>
> Signed-off-by: Thomas Weißschuh<thomas.weissschuh@linutronix.de>
> ---
>   tools/testing/selftests/mm/virtual_address_range.c | 6 ++----
>   1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
> index 484f82c7b7c871f82a7d9ec6d6c649f2ab1eb0cd..4042fd878acd702d23da2c3293292de33bd48143 100644
> --- a/tools/testing/selftests/mm/virtual_address_range.c
> +++ b/tools/testing/selftests/mm/virtual_address_range.c
> @@ -103,10 +103,9 @@ static int validate_complete_va_space(void)
>   	FILE *file;
>   	int fd;
>   
> -	fd = open("va_dump", O_CREAT | O_WRONLY, 0600);
> -	unlink("va_dump");
> +	fd = open("/dev/null", O_WRONLY);
>   	if (fd < 0) {
> -		ksft_test_result_skip("cannot create or open dump file\n");
> +		ksft_test_result_skip("cannot create or open /dev/null\n");
>   		ksft_finished();
>   	}
>   
> @@ -152,7 +151,6 @@ static int validate_complete_va_space(void)
>   		while (start_addr + hop < end_addr) {
>   			if (write(fd, (void *)(start_addr + hop), 1) != 1)
>   				return 1;
> -			lseek(fd, 0, SEEK_SET);
>   
>   			hop += MAP_CHUNK_SIZE;
>   		}
>

The reason I had not used /dev/null was that write() was succeeding to /dev/null
even from an address not in my VA space. I was puzzled about this behaviour of
/dev/null and I chose to ignore it and just use a real file.

To test this behaviour, run the following program:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>

int main()
{
	int fd;

	fd = open("va_dump", O_CREAT | O_WRONLY, 0600);
	unlink("va_dump");
	// fd = open("/dev/null", O_WRONLY);
	int ret = munmap((void *)(1UL << 30), 100);
	if (!ret)
		printf("munmap succeeded\n");
	int res = write(fd, (void *)(1UL << 30), 1);
	if (res == 1)
		printf("write succeeded\n");
	return 0;
}

The write will fail as expected, but if you comment out the va_dump
lines and use /dev/null, the write will succeed.
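
The two behaviours can also be compared side by side. The sketch below is a
hedged illustration for Linux only: unmapped_address(), write_byte_to_null()
and write_byte_to_pipe() are hypothetical helpers, and a pipe stands in for
the regular file used above, on the assumption that a pipe write, like a
file write, has to copy the user buffer. The observed /dev/null result is
consistent with its write path never reading the buffer at all, but that
explanation is an assumption here, not something established in this thread.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Return an address known to be unmapped: map one page, then unmap it. */
static void *unmapped_address(void)
{
	void *p = mmap(NULL, 4096, PROT_READ,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;
	munmap(p, 4096);
	return p;
}

/* Write one byte from addr to /dev/null; returns the write() result. */
static ssize_t write_byte_to_null(const void *addr)
{
	int fd = open("/dev/null", O_WRONLY);
	ssize_t ret;

	if (fd < 0)
		return -1;
	ret = write(fd, addr, 1);
	close(fd);
	return ret;
}

/* Write one byte from addr into a pipe; errno from write() is preserved. */
static ssize_t write_byte_to_pipe(const void *addr)
{
	int fds[2];
	int saved_errno;
	ssize_t ret;

	if (pipe(fds))
		return -1;
	ret = write(fds[1], addr, 1);
	saved_errno = errno;
	close(fds[0]);
	close(fds[1]);
	errno = saved_errno;
	return ret;
}
```

With an address from unmapped_address(), write_byte_to_null() is expected to
report one byte written, while write_byte_to_pipe() is expected to fail with
EFAULT. This is why /dev/null cannot serve as a readability probe.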

[-- Attachment #2: Type: text/html, Size: 7148 bytes --]


* Re: [PATCH 1/3] selftests/mm: virtual_address_range: Fix error when CommitLimit < 1GiB
  2025-01-07 15:14 ` [PATCH 1/3] selftests/mm: virtual_address_range: Fix error when CommitLimit < 1GiB Thomas Weißschuh
@ 2025-01-08  6:16   ` Dev Jain
  2025-01-08  8:05     ` Thomas Weißschuh
  0 siblings, 1 reply; 18+ messages in thread
From: Dev Jain @ 2025-01-08  6:16 UTC (permalink / raw)
  To: Thomas Weißschuh, Andrew Morton, Shuah Khan, Thomas Gleixner
  Cc: linux-mm, linux-kselftest, linux-kernel, stable, Ryan Roberts,
	David Hildenbrand


On 07/01/25 8:44 pm, Thomas Weißschuh wrote:
> If not enough physical memory is available the kernel may fail mmap();
> see __vm_enough_memory() and vm_commit_limit().
> In that case the logic in validate_complete_va_space() does not make
> sense and will even incorrectly fail.
> Instead skip the test if no mmap() succeeded.
>
> Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()")
> Cc: stable@vger.kernel.org
> Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
>
> ---
> The logic in __vm_enough_memory() seems weird.
> It describes itself as "Check that a process has enough memory to
> allocate a new virtual mapping", however it never checks the current
> memory usage of the process.
> So it only disallows large mappings. But many small mappings taking the
> same amount of memory are allowed; and then even automatically merged
> into one big mapping.
> ---
>   tools/testing/selftests/mm/virtual_address_range.c | 6 ++++++
>   1 file changed, 6 insertions(+)
>
> diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
> index 2a2b69e91950a37999f606847c9c8328d79890c2..d7bf8094d8bcd4bc96e2db4dc3fcb41968def859 100644
> --- a/tools/testing/selftests/mm/virtual_address_range.c
> +++ b/tools/testing/selftests/mm/virtual_address_range.c
> @@ -178,6 +178,12 @@ int main(int argc, char *argv[])
>   		validate_addr(ptr[i], 0);
>   	}
>   	lchunks = i;
> +
> +	if (!lchunks) {
> +		ksft_test_result_skip("Not enough memory for a single chunk\n");
> +		ksft_finished();
> +	}
> +
>   	hptr = (char **) calloc(NR_CHUNKS_HIGH, sizeof(char *));
>   	if (hptr == NULL) {
>   		ksft_test_result_skip("Memory constraint not fulfilled\n");
>

I do not  know about __vm_enough_memory(), but I am going by your description:
You say that the kernel may fail mmap() when enough physical memory is not
there, but it may happen that we have already done 100 mmap()'s, and then
the kernel fails mmap(), so if (!lchunks) won't be able to handle this case.
Basically, lchunks == 0 is not a complete indicator of kernel failing mmap().

The basic assumption of the test is that any process should be able to exhaust
its virtual address space, and running the test under memory pressure and the
kernel violating this behaviour defeats the point of the test I think?




* Re: [PATCH 3/3] selftests/mm: virtual_address_range: Dump to /dev/null
  2025-01-08  6:09   ` Dev Jain
@ 2025-01-08  7:38     ` Thomas Weißschuh
  2025-01-08 13:30     ` David Hildenbrand
  1 sibling, 0 replies; 18+ messages in thread
From: Thomas Weißschuh @ 2025-01-08  7:38 UTC (permalink / raw)
  To: Dev Jain
  Cc: Andrew Morton, Shuah Khan, Thomas Gleixner, linux-mm,
	linux-kselftest, linux-kernel, Ryan Roberts, David Hildenbrand

On Wed, Jan 08, 2025 at 11:39:40AM +0530, Dev Jain wrote:
> 
> On 07/01/25 8:44 pm, Thomas Weißschuh wrote:
> > During the execution of validate_complete_va_space() a lot of memory is
> > on the VM subsystem. When running on a low memory subsystem an OOM may
> > be triggered, when writing to the dump file as the filesystem may also
> > require memory.
> > 
> > On my test system with 1100MiB physical memory:
> > 
> > 	Tasks state (memory values in pages):
> > 	[  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
> > 	[     57]     0    57 34359215953      695      256        0       439 1064390656        0             0 virtual_address
> > 
> > 	Out of memory: Killed process 57 (virtual_address) total-vm:137436863812kB, anon-rss:1024kB, file-rss:0kB, shmem-rss:1756kB, UID:0 pgtables:1039444kB oom_score_adj:0
> > 	<snip>
> > 	fault_in_iov_iter_readable+0x4a/0xd0
> > 	generic_perform_write+0x9c/0x280
> > 	shmem_file_write_iter+0x86/0x90
> > 	vfs_write+0x29c/0x480
> > 	ksys_write+0x6c/0xe0
> > 	do_syscall_64+0x9e/0x1a0
> > 	entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > 
> > Write the dumped data into /dev/null instead which does not require
> > additional memory during write(), making the code simpler as a
> > side-effect.
> > 
> > Signed-off-by: Thomas Weißschuh<thomas.weissschuh@linutronix.de>
> > ---
> >   tools/testing/selftests/mm/virtual_address_range.c | 6 ++----
> >   1 file changed, 2 insertions(+), 4 deletions(-)
> > 
> > diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
> > index 484f82c7b7c871f82a7d9ec6d6c649f2ab1eb0cd..4042fd878acd702d23da2c3293292de33bd48143 100644
> > --- a/tools/testing/selftests/mm/virtual_address_range.c
> > +++ b/tools/testing/selftests/mm/virtual_address_range.c
> > @@ -103,10 +103,9 @@ static int validate_complete_va_space(void)
> >   	FILE *file;
> >   	int fd;
> > -	fd = open("va_dump", O_CREAT | O_WRONLY, 0600);
> > -	unlink("va_dump");
> > +	fd = open("/dev/null", O_WRONLY);
> >   	if (fd < 0) {
> > -		ksft_test_result_skip("cannot create or open dump file\n");
> > +		ksft_test_result_skip("cannot create or open /dev/null\n");
> >   		ksft_finished();
> >   	}
> > @@ -152,7 +151,6 @@ static int validate_complete_va_space(void)
> >   		while (start_addr + hop < end_addr) {
> >   			if (write(fd, (void *)(start_addr + hop), 1) != 1)
> >   				return 1;
> > -			lseek(fd, 0, SEEK_SET);
> >   			hop += MAP_CHUNK_SIZE;
> >   		}
> > 
> 
> The reason I had not used /dev/null was that write() was succeeding to /dev/null
> even from an address not in my VA space. I was puzzled about this behaviour of
> /dev/null and I chose to ignore it and just use a real file.

That makes sense and I can reproduce your example.
Switching to another dummy file which reads the written data like
/dev/random also leads to OOM, so wouldn't help either.

Thanks for the explanation.

@Andrew, could you drop this patch?

> To test this behaviour, run the following program:

[..]

PS: Your mail contained HTML and did not make it to the list archives.
(And the text variant of the example program was corrupted)



* Re: [PATCH 1/3] selftests/mm: virtual_address_range: Fix error when CommitLimit < 1GiB
  2025-01-08  6:16   ` Dev Jain
@ 2025-01-08  8:05     ` Thomas Weißschuh
  2025-01-08 13:36       ` David Hildenbrand
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Weißschuh @ 2025-01-08  8:05 UTC (permalink / raw)
  To: Dev Jain
  Cc: Andrew Morton, Shuah Khan, Thomas Gleixner, linux-mm,
	linux-kselftest, linux-kernel, stable, Ryan Roberts,
	David Hildenbrand

On Wed, Jan 08, 2025 at 11:46:19AM +0530, Dev Jain wrote:
> 
> On 07/01/25 8:44 pm, Thomas Weißschuh wrote:
> > If not enough physical memory is available the kernel may fail mmap();
> > see __vm_enough_memory() and vm_commit_limit().
> > In that case the logic in validate_complete_va_space() does not make
> > sense and will even incorrectly fail.
> > Instead skip the test if no mmap() succeeded.
> > 
> > Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
> > 
> > ---
> > The logic in __vm_enough_memory() seems weird.
> > It describes itself as "Check that a process has enough memory to
> > allocate a new virtual mapping", however it never checks the current
> > memory usage of the process.
> > So it only disallows large mappings. But many small mappings taking the
> > same amount of memory are allowed; and then even automatically merged
> > into one big mapping.
> > ---
> >   tools/testing/selftests/mm/virtual_address_range.c | 6 ++++++
> >   1 file changed, 6 insertions(+)
> > 
> > diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
> > index 2a2b69e91950a37999f606847c9c8328d79890c2..d7bf8094d8bcd4bc96e2db4dc3fcb41968def859 100644
> > --- a/tools/testing/selftests/mm/virtual_address_range.c
> > +++ b/tools/testing/selftests/mm/virtual_address_range.c
> > @@ -178,6 +178,12 @@ int main(int argc, char *argv[])
> >   		validate_addr(ptr[i], 0);
> >   	}
> >   	lchunks = i;
> > +
> > +	if (!lchunks) {
> > +		ksft_test_result_skip("Not enough memory for a single chunk\n");
> > +		ksft_finished();
> > +	}
> > +
> >   	hptr = (char **) calloc(NR_CHUNKS_HIGH, sizeof(char *));
> >   	if (hptr == NULL) {
> >   		ksft_test_result_skip("Memory constraint not fulfilled\n");
> > 
> 
> I do not  know about __vm_enough_memory(), but I am going by your description:
> You say that the kernel may fail mmap() when enough physical memory is not
> there, but it may happen that we have already done 100 mmap()'s, and then
> the kernel fails mmap(), so if (!lchunks) won't be able to handle this case.
> Basically, lchunks == 0 is not a complete indicator of kernel failing mmap().

__vm_enough_memory() only checks the size of each single mmap() on its
own. It does not actually check the current memory or address space
usage of the process.
This seems a bit weird, as indicated in my after-the-fold explanation.

> The basic assumption of the test is that any process should be able to exhaust
> its virtual address space, and running the test under memory pressure and the
> kernel violating this behaviour defeats the point of the test I think?

The assumption is correct, as soon as one mapping succeeds the others
will also succeed, until the actual address space is exhausted.

Looking at it again, __vm_enough_memory() is only called for writable
mappings, so it would be possible to use only readable mappings in the
test. The test will still fail with OOM, as the many PTEs need more than
1GiB of physical memory anyways, but at least that produces a usable
error message.
However I'm not sure if this would violate other test assumptions.



* Re: [PATCH 3/3] selftests/mm: virtual_address_range: Dump to /dev/null
  2025-01-08  6:09   ` Dev Jain
  2025-01-08  7:38     ` Thomas Weißschuh
@ 2025-01-08 13:30     ` David Hildenbrand
  2025-01-09  5:32       ` Dev Jain
  1 sibling, 1 reply; 18+ messages in thread
From: David Hildenbrand @ 2025-01-08 13:30 UTC (permalink / raw)
  To: Dev Jain, Thomas Weißschuh, Andrew Morton, Shuah Khan,
	Thomas Gleixner
  Cc: linux-mm, linux-kselftest, linux-kernel, Ryan Roberts

On 08.01.25 07:09, Dev Jain wrote:
> 
> On 07/01/25 8:44 pm, Thomas Weißschuh wrote:
>> During the execution of validate_complete_va_space() a lot of memory is
>> on the VM subsystem. When running on a low memory subsystem an OOM may
>> be triggered, when writing to the dump file as the filesystem may also
>> require memory.
>>
>> On my test system with 1100MiB physical memory:
>>
>> 	Tasks state (memory values in pages):
>> 	[  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
>> 	[     57]     0    57 34359215953      695      256        0       439 1064390656        0             0 virtual_address
>>
>> 	Out of memory: Killed process 57 (virtual_address) total-vm:137436863812kB, anon-rss:1024kB, file-rss:0kB, shmem-rss:1756kB, UID:0 pgtables:1039444kB oom_score_adj:0
>> 	<snip>
>> 	fault_in_iov_iter_readable+0x4a/0xd0
>> 	generic_perform_write+0x9c/0x280
>> 	shmem_file_write_iter+0x86/0x90
>> 	vfs_write+0x29c/0x480
>> 	ksys_write+0x6c/0xe0
>> 	do_syscall_64+0x9e/0x1a0
>> 	entry_SYSCALL_64_after_hwframe+0x77/0x7f
>>
>> Write the dumped data into /dev/null instead which does not require
>> additional memory during write(), making the code simpler as a
>> side-effect.
>>
>> Signed-off-by: Thomas Weißschuh<thomas.weissschuh@linutronix.de>
>> ---
>>   tools/testing/selftests/mm/virtual_address_range.c | 6 ++----
>>   1 file changed, 2 insertions(+), 4 deletions(-)
>>
>> diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
>> index 484f82c7b7c871f82a7d9ec6d6c649f2ab1eb0cd..4042fd878acd702d23da2c3293292de33bd48143 100644
>> --- a/tools/testing/selftests/mm/virtual_address_range.c
>> +++ b/tools/testing/selftests/mm/virtual_address_range.c
>> @@ -103,10 +103,9 @@ static int validate_complete_va_space(void)
>>   	FILE *file;
>>   	int fd;
>>   
>> -	fd = open("va_dump", O_CREAT | O_WRONLY, 0600);
>> -	unlink("va_dump");
>> +	fd = open("/dev/null", O_WRONLY);
>>   	if (fd < 0) {
>> -		ksft_test_result_skip("cannot create or open dump file\n");
>> +		ksft_test_result_skip("cannot create or open /dev/null\n");
>>   		ksft_finished();
>>   	}
>>   
>> @@ -152,7 +151,6 @@ static int validate_complete_va_space(void)
>>   		while (start_addr + hop < end_addr) {
>>   			if (write(fd, (void *)(start_addr + hop), 1) != 1)
>>   				return 1;
>> -			lseek(fd, 0, SEEK_SET);
>>   
>>   			hop += MAP_CHUNK_SIZE;
>>   		}
>>
> 
> The reason I had not used /dev/null was that write() was succeeding to /dev/null
> even from an address not in my VA space. I was puzzled about this behaviour of
> /dev/null and I chose to ignore it and just use a real file.
> 
> To test this behaviour, run the following program:
> 
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <fcntl.h>
> #include <sys/mman.h>
>
> int main()
> {
> 	int fd;
>
> 	fd = open("va_dump", O_CREAT | O_WRONLY, 0600);
> 	unlink("va_dump");
> 	// fd = open("/dev/null", O_WRONLY);
> 	int ret = munmap((void *)(1UL << 30), 100);
> 	if (!ret)
> 		printf("munmap succeeded\n");
> 	int res = write(fd, (void *)(1UL << 30), 1);
> 	if (res == 1)
> 		printf("write succeeded\n");
> 	return 0;
> }
>
> The write will fail as expected, but if you comment out the va_dump
> lines and use /dev/null, the write will succeed.

What exactly do we want to achieve with the write? Verify that the
output of /proc/self/maps is reasonable and that we can actually resolve
a fault / map a page?

Why not access the memory directly (with a signal handler) or use
/proc/self/mem, so you can avoid the temp file completely?
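
A direct access guarded by a signal handler could look roughly like the
sketch below. It is only an illustration under stated assumptions, not a
proposed patch: probe_readable(), probe_fault_handler() and probe_demo() are
hypothetical names, the code is single-threaded, and it treats both SIGSEGV
and SIGBUS as "not readable".

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>

static sigjmp_buf probe_env;

/* Jump back into probe_readable() when the access faults. */
static void probe_fault_handler(int sig)
{
	(void)sig;
	siglongjmp(probe_env, 1);
}

/* Returns 1 if one byte at addr can be read, 0 if it faults, -1 on error. */
static int probe_readable(const volatile char *addr)
{
	struct sigaction sa, old_segv, old_bus;
	int readable;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = probe_fault_handler;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGSEGV, &sa, &old_segv))
		return -1;
	if (sigaction(SIGBUS, &sa, &old_bus)) {
		sigaction(SIGSEGV, &old_segv, NULL);
		return -1;
	}

	/* savemask=1 so the signal mask is restored after siglongjmp(). */
	if (sigsetjmp(probe_env, 1) == 0) {
		char c = *addr;	/* faults if the page is not readable */

		(void)c;
		readable = 1;
	} else {
		readable = 0;
	}

	sigaction(SIGSEGV, &old_segv, NULL);
	sigaction(SIGBUS, &old_bus, NULL);
	return readable;
}

/* Usage: 0 if a mapped page probes readable and, once unmapped, does not. */
static int probe_demo(void)
{
	char *p = mmap(NULL, 4096, PROT_READ,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return -1;
	if (probe_readable(p) != 1)
		return -1;
	munmap(p, 4096);
	return probe_readable(p) == 0 ? 0 : -1;
}
```

Compared with write() to a file, this needs no file descriptor and no
filesystem memory, at the cost of installing and restoring handlers around
each probe.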

-- 
Cheers,

David / dhildenb




* Re: [PATCH 1/3] selftests/mm: virtual_address_range: Fix error when CommitLimit < 1GiB
  2025-01-08  8:05     ` Thomas Weißschuh
@ 2025-01-08 13:36       ` David Hildenbrand
  2025-01-08 16:13         ` Thomas Weißschuh
  0 siblings, 1 reply; 18+ messages in thread
From: David Hildenbrand @ 2025-01-08 13:36 UTC (permalink / raw)
  To: Thomas Weißschuh, Dev Jain
  Cc: Andrew Morton, Shuah Khan, Thomas Gleixner, linux-mm,
	linux-kselftest, linux-kernel, stable, Ryan Roberts

On 08.01.25 09:05, Thomas Weißschuh wrote:
> On Wed, Jan 08, 2025 at 11:46:19AM +0530, Dev Jain wrote:
>>
>> On 07/01/25 8:44 pm, Thomas Weißschuh wrote:
>>> If not enough physical memory is available the kernel may fail mmap();
>>> see __vm_enough_memory() and vm_commit_limit().
>>> In that case the logic in validate_complete_va_space() does not make
>>> sense and will even incorrectly fail.
>>> Instead skip the test if no mmap() succeeded.
>>>
>>> Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()")
>>> Cc: stable@vger.kernel.org

CC stable on tests is ... odd.

>>> Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
>>>
>>> ---
>>> The logic in __vm_enough_memory() seems weird.
>>> It describes itself as "Check that a process has enough memory to
>>> allocate a new virtual mapping", however it never checks the current
>>> memory usage of the process.
>>> So it only disallows large mappings. But many small mappings taking the
>>> same amount of memory are allowed; and then even automatically merged
>>> into one big mapping.
>>> ---
>>>    tools/testing/selftests/mm/virtual_address_range.c | 6 ++++++
>>>    1 file changed, 6 insertions(+)
>>>
>>> diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
>>> index 2a2b69e91950a37999f606847c9c8328d79890c2..d7bf8094d8bcd4bc96e2db4dc3fcb41968def859 100644
>>> --- a/tools/testing/selftests/mm/virtual_address_range.c
>>> +++ b/tools/testing/selftests/mm/virtual_address_range.c
>>> @@ -178,6 +178,12 @@ int main(int argc, char *argv[])
>>>    		validate_addr(ptr[i], 0);
>>>    	}
>>>    	lchunks = i;
>>> +
>>> +	if (!lchunks) {
>>> +		ksft_test_result_skip("Not enough memory for a single chunk\n");
>>> +		ksft_finished();
>>> +	}
>>> +
>>>    	hptr = (char **) calloc(NR_CHUNKS_HIGH, sizeof(char *));
>>>    	if (hptr == NULL) {
>>>    		ksft_test_result_skip("Memory constraint not fulfilled\n");
>>>
>>
>> I do not  know about __vm_enough_memory(), but I am going by your description:
>> You say that the kernel may fail mmap() when enough physical memory is not
>> there, but it may happen that we have already done 100 mmap()'s, and then
>> the kernel fails mmap(), so if (!lchunks) won't be able to handle this case.
>> Basically, lchunks == 0 is not a complete indicator of kernel failing mmap().
> 
> __vm_enough_memory() only checks the size of each single mmap() on its
> own. It does not actually check the current memory or address space
> usage of the process.
> This seems a bit weird, as indicated in my after-the-fold explanation.
> 
>> The basic assumption of the test is that any process should be able to exhaust
>> its virtual address space, and running the test under memory pressure and the
>> kernel violating this behaviour defeats the point of the test I think?
> 
> The assumption is correct, as soon as one mapping succeeds the others
> will also succeed, until the actual address space is exhausted.
> 
> Looking at it again, __vm_enough_memory() is only called for writable
> mappings, so it would be possible to use only readable mappings in the
> test. The test will still fail with OOM, as the many PTEs need more than
> 1GiB of physical memory anyways, but at least that produces a usable
> error message.
> However I'm not sure if this would violate other test assumptions.
> 

Note that with MAP_NORESERVE, most setups we care about will allow
mapping as much as you want, but on access OOM will fire.

So one could require that /proc/sys/vm/overcommit_memory is set up
properly and use MAP_NORESERVE.
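
A minimal sketch of such a precondition check, in C. It assumes that
"set up properly" means the overcommit mode is 0 (heuristic) or 1
(always), since as far as I know mode 2 (never) makes the kernel ignore
MAP_NORESERVE. The helper names are hypothetical and not part of the
selftest:

```c
#include <stdio.h>
#include <stdlib.h>

/* Parse the text of overcommit_memory; returns 0..2, or -1 if invalid. */
int parse_overcommit_mode(const char *text)
{
	char *end;
	long mode = strtol(text, &end, 10);

	if (end == text || mode < 0 || mode > 2)
		return -1;
	return (int)mode;
}

/* Assumption: MAP_NORESERVE is only honoured in modes 0 and 1. */
int noreserve_is_honoured(int mode)
{
	return mode == 0 || mode == 1;
}

/* Read the current mode from procfs; returns -1 if it cannot be read. */
int read_overcommit_mode(void)
{
	char buf[16];
	int mode = -1;
	FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");

	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		mode = parse_overcommit_mode(buf);
	fclose(f);
	return mode;
}
```

A test could then SKIP when noreserve_is_honoured(read_overcommit_mode())
is false, and treat any later mmap() failure as a real FAIL.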

Reading from anonymous memory will populate the shared zeropage. To 
mitigate OOM from "too many page tables", one could simply unmap the 
pieces as they are verified (or MAP_FIXED over them, to free page tables).

-- 
Cheers,

David / dhildenb




* Re: [PATCH 1/3] selftests/mm: virtual_address_range: Fix error when CommitLimit < 1GiB
  2025-01-08 13:36       ` David Hildenbrand
@ 2025-01-08 16:13         ` Thomas Weißschuh
  2025-01-08 16:46           ` David Hildenbrand
  2025-01-09  5:40           ` Dev Jain
  0 siblings, 2 replies; 18+ messages in thread
From: Thomas Weißschuh @ 2025-01-08 16:13 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Dev Jain, Andrew Morton, Shuah Khan, Thomas Gleixner, linux-mm,
	linux-kselftest, linux-kernel, stable, Ryan Roberts

On Wed, Jan 08, 2025 at 02:36:57PM +0100, David Hildenbrand wrote:
> On 08.01.25 09:05, Thomas Weißschuh wrote:
> > On Wed, Jan 08, 2025 at 11:46:19AM +0530, Dev Jain wrote:
> > > 
> > > On 07/01/25 8:44 pm, Thomas Weißschuh wrote:
> > > > If not enough physical memory is available the kernel may fail mmap();
> > > > see __vm_enough_memory() and vm_commit_limit().
> > > > In that case the logic in validate_complete_va_space() does not make
> > > > sense and will even incorrectly fail.
> > > > Instead skip the test if no mmap() succeeded.
> > > > 
> > > > Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()")
> > > > Cc: stable@vger.kernel.org
> 
> CC stable on tests is ... odd.

I thought it was fairly common, but it isn't.
Will drop it.

> > > > Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
> > > > 
> > > > ---
> > > > The logic in __vm_enough_memory() seems weird.
> > > > It describes itself as "Check that a process has enough memory to
> > > > allocate a new virtual mapping", however it never checks the current
> > > > memory usage of the process.
> > > > So it only disallows large mappings. But many small mappings taking the
> > > > same amount of memory are allowed; and then even automatically merged
> > > > into one big mapping.
> > > > ---
> > > >    tools/testing/selftests/mm/virtual_address_range.c | 6 ++++++
> > > >    1 file changed, 6 insertions(+)
> > > > 
> > > > diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
> > > > index 2a2b69e91950a37999f606847c9c8328d79890c2..d7bf8094d8bcd4bc96e2db4dc3fcb41968def859 100644
> > > > --- a/tools/testing/selftests/mm/virtual_address_range.c
> > > > +++ b/tools/testing/selftests/mm/virtual_address_range.c
> > > > @@ -178,6 +178,12 @@ int main(int argc, char *argv[])
> > > >    		validate_addr(ptr[i], 0);
> > > >    	}
> > > >    	lchunks = i;
> > > > +
> > > > +	if (!lchunks) {
> > > > +		ksft_test_result_skip("Not enough memory for a single chunk\n");
> > > > +		ksft_finished();
> > > > +	}
> > > > +
> > > >    	hptr = (char **) calloc(NR_CHUNKS_HIGH, sizeof(char *));
> > > >    	if (hptr == NULL) {
> > > >    		ksft_test_result_skip("Memory constraint not fulfilled\n");
> > > > 
> > > 
> > > I do not  know about __vm_enough_memory(), but I am going by your description:
> > > You say that the kernel may fail mmap() when enough physical memory is not
> > > there, but it may happen that we have already done 100 mmap()'s, and then
> > > the kernel fails mmap(), so if (!lchunks) won't be able to handle this case.
> > > Basically, lchunks == 0 is not a complete indicator of kernel failing mmap().
> > 
> > __vm_enough_memory() only checks the size of each single mmap() on its
> > own. It does not actually check the current memory or address space
> > usage of the process.
> > This seems a bit weird, as indicated in my after-the-fold explanation.
> > 
> > > The basic assumption of the test is that any process should be able to exhaust
> > > its virtual address space, and running the test under memory pressure and the
> > > kernel violating this behaviour defeats the point of the test I think?
> > 
> > The assumption is correct, as soon as one mapping succeeds the others
> > will also succeed, until the actual address space is exhausted.
> > 
> > Looking at it again, __vm_enough_memory() is only called for writable
> > mappings, so it would be possible to use only readable mappings in the
> > test. The test will still fail with OOM, as the many PTEs need more than
> > 1GiB of physical memory anyways, but at least that produces a usable
> > error message.
> > However I'm not sure if this would violate other test assumptions.
> > 
> 
> Note that with MAP_NORESERVE, most setups we care about will allow mapping as
> much as you want, but on access OOM will fire.

Thanks for the hint.

> So one could require that /proc/sys/vm/overcommit_memory is set up properly
> and use MAP_NORESERVE.

Isn't the check for lchunks == 0 essentially exactly this?

> Reading from anonymous memory will populate the shared zeropage. To mitigate
> OOM from "too many page tables", one could simply unmap the pieces as they
> are verified (or MAP_FIXED over them, to free page tables).

The code has to figure out if a verified region was created by mmap(),
otherwise an munmap() could crash the process.
As the entries from /proc/self/maps may have been merged and (I assume)
the ordering of mappings is not guaranteed, some bespoke logic to establish
the link will be needed.

Is it fine to rely on CONFIG_ANON_VMA_NAME?
That would make it much easier to implement.

Using MAP_NORESERVE and eager munmap()s, the testcase works nicely even
in very low physical memory conditions.

Thomas



* Re: [PATCH 1/3] selftests/mm: virtual_address_range: Fix error when CommitLimit < 1GiB
  2025-01-08 16:13         ` Thomas Weißschuh
@ 2025-01-08 16:46           ` David Hildenbrand
  2025-01-09  7:47             ` Thomas Weißschuh
  2025-01-09  5:40           ` Dev Jain
  1 sibling, 1 reply; 18+ messages in thread
From: David Hildenbrand @ 2025-01-08 16:46 UTC (permalink / raw)
  To: Thomas Weißschuh
  Cc: Dev Jain, Andrew Morton, Shuah Khan, Thomas Gleixner, linux-mm,
	linux-kselftest, linux-kernel, stable, Ryan Roberts

On 08.01.25 17:13, Thomas Weißschuh wrote:
> On Wed, Jan 08, 2025 at 02:36:57PM +0100, David Hildenbrand wrote:
>> On 08.01.25 09:05, Thomas Weißschuh wrote:
>>> On Wed, Jan 08, 2025 at 11:46:19AM +0530, Dev Jain wrote:
>>>>
>>>> On 07/01/25 8:44 pm, Thomas Weißschuh wrote:
>>>>> If not enough physical memory is available the kernel may fail mmap();
>>>>> see __vm_enough_memory() and vm_commit_limit().
>>>>> In that case the logic in validate_complete_va_space() does not make
>>>>> sense and will even incorrectly fail.
>>>>> Instead skip the test if no mmap() succeeded.
>>>>>
>>>>> Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()")
>>>>> Cc: stable@vger.kernel.org
>>
>> CC stable on tests is ... odd.
> 
> I thought it was fairly common, but it isn't.
> Will drop it.

As it's not really a "kernel BUG", it's rather uncommon.

>>
>> Note that with MAP_NORESERVE, most setups we care about will allow mapping as
>> much as you want, but on access OOM will fire.
> 
> Thanks for the hint.
> 
>> So one could require that /proc/sys/vm/overcommit_memory is set up properly
>> and use MAP_NORESERVE.
> 
> Isn't the check for lchunks == 0 essentially exactly this?

I assume paired with MAP_NORESERVE?

Maybe, but it could be better to have something that says "if
overcommit_memory is not set up properly I will SKIP this test, but
otherwise I expect this to work and will FAIL if it doesn't".

Or would you expect to run into lchunks == 0 even if overcommit_memory
is set up properly and MAP_NORESERVE is used? (very very low memory that
we cannot even create all the VMAs?)

> 
>> Reading from anonymous memory will populate the shared zeropage. To mitigate
>> OOM from "too many page tables", one could simply unmap the pieces as they
>> are verified (or MAP_FIXED over them, to free page tables).
> 
> The code has to figure out if a verified region was created by mmap(),
> otherwise an munmap() could crash the process.
> As the entries from /proc/self/maps may have been merged and (I assume)

Yes, and partial unmap (in chunk granularity?) would split them again.

> the ordering of mappings is not guaranteed, some bespoke logic to establish
> the link will be needed.


My thinking was that you simply process one /proc/self/maps entry in 
some chunks. After processing a chunk, you munmap() it.

So you would process + munmap in chunks.
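
A rough sketch of that process + munmap loop, in C. It assumes the
caller owns the whole range and that len is a multiple of chunk; the
function name is made up for illustration and this is not the
selftest's code:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Touch the first byte of every chunk in [base, base + len) and unmap
 * that chunk right away. Returns the number of chunks processed, or -1
 * if an munmap() fails. len is assumed to be a multiple of chunk.
 */
int verify_and_unmap(char *base, size_t len, size_t chunk)
{
	size_t off;
	int processed = 0;

	for (off = 0; off < len; off += chunk) {
		/* Read access; for untouched anon memory this maps the zeropage. */
		(void)*(volatile char *)(base + off);

		/* Drop the chunk, and with it its page tables, before moving on. */
		if (munmap(base + off, chunk))
			return -1;
		processed++;
	}
	return processed;
}
```

Unmapping inside the loop keeps the number of populated page tables
bounded by one chunk at a time instead of growing with the whole range.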

> 
> Is it fine to rely on CONFIG_ANON_VMA_NAME?
> That would make it much easier to implement.

Can you elaborate how you would do it?

> 
> Using MAP_NORESERVE and eager munmap()s, the testcase works nicely even
> in very low physical memory conditions.

Cool.

-- 
Cheers,

David / dhildenb




* Re: [PATCH 3/3] selftests/mm: virtual_address_range: Dump to /dev/null
  2025-01-08 13:30     ` David Hildenbrand
@ 2025-01-09  5:32       ` Dev Jain
  0 siblings, 0 replies; 18+ messages in thread
From: Dev Jain @ 2025-01-09  5:32 UTC (permalink / raw)
  To: David Hildenbrand, Thomas Weißschuh, Andrew Morton,
	Shuah Khan, Thomas Gleixner
  Cc: linux-mm, linux-kselftest, linux-kernel, Ryan Roberts


On 08/01/25 7:00 pm, David Hildenbrand wrote:
> On 08.01.25 07:09, Dev Jain wrote:
>>
>> On 07/01/25 8:44 pm, Thomas Weißschuh wrote:
>>> During the execution of validate_complete_va_space() a lot of memory
>>> pressure is put on the VM subsystem. When running on a low-memory
>>> system an OOM may be triggered when writing to the dump file, as the
>>> filesystem may also require memory.
>>>
>>> On my test system with 1100MiB physical memory:
>>>
>>>     Tasks state (memory values in pages):
>>>     [  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
>>>     [     57]     0    57 34359215953      695      256 0       439 1064390656        0             0 virtual_address
>>>
>>>     Out of memory: Killed process 57 (virtual_address) total-vm:137436863812kB, anon-rss:1024kB, file-rss:0kB, shmem-rss:1756kB, UID:0 pgtables:1039444kB oom_score_adj:0
>>>     <snip>
>>>     fault_in_iov_iter_readable+0x4a/0xd0
>>>     generic_perform_write+0x9c/0x280
>>>     shmem_file_write_iter+0x86/0x90
>>>     vfs_write+0x29c/0x480
>>>     ksys_write+0x6c/0xe0
>>>     do_syscall_64+0x9e/0x1a0
>>>     entry_SYSCALL_64_after_hwframe+0x77/0x7f
>>>
>>> Write the dumped data into /dev/null instead which does not require
>>> additional memory during write(), making the code simpler as a
>>> side-effect.
>>>
>>> Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
>>> ---
>>>   tools/testing/selftests/mm/virtual_address_range.c | 6 ++----
>>>   1 file changed, 2 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
>>> index 484f82c7b7c871f82a7d9ec6d6c649f2ab1eb0cd..4042fd878acd702d23da2c3293292de33bd48143 100644
>>> --- a/tools/testing/selftests/mm/virtual_address_range.c
>>> +++ b/tools/testing/selftests/mm/virtual_address_range.c
>>> @@ -103,10 +103,9 @@ static int validate_complete_va_space(void)
>>>       FILE *file;
>>>       int fd;
>>>   -    fd = open("va_dump", O_CREAT | O_WRONLY, 0600);
>>> -    unlink("va_dump");
>>> +    fd = open("/dev/null", O_WRONLY);
>>>       if (fd < 0) {
>>> -        ksft_test_result_skip("cannot create or open dump file\n");
>>> +        ksft_test_result_skip("cannot create or open /dev/null\n");
>>>           ksft_finished();
>>>       }
>>>   @@ -152,7 +151,6 @@ static int validate_complete_va_space(void)
>>>           while (start_addr + hop < end_addr) {
>>>               if (write(fd, (void *)(start_addr + hop), 1) != 1)
>>>                   return 1;
>>> -            lseek(fd, 0, SEEK_SET);
>>>                 hop += MAP_CHUNK_SIZE;
>>>           }
>>>
>>
>> The reason I had not used /dev/null was that write() was succeeding to
>> /dev/null even from an address not in my VA space. I was puzzled about
>> this behaviour of /dev/null and I chose to ignore it and just use a
>> real file.
>>
>> To test this behaviour, run the following program:
>>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <unistd.h>
>> #include <fcntl.h>
>> #include <sys/mman.h>
>>
>> int main(void)
>> {
>> 	int fd;
>>
>> 	fd = open("va_dump", O_CREAT | O_WRONLY, 0600);
>> 	unlink("va_dump");
>> 	// fd = open("/dev/null", O_WRONLY);
>> 	int ret = munmap((void *)(1UL << 30), 100);
>> 	if (!ret)
>> 		printf("munmap succeeded\n");
>> 	int res = write(fd, (void *)(1UL << 30), 1);
>> 	if (res == 1)
>> 		printf("write succeeded\n");
>> 	return 0;
>> }
>> The write will fail as expected, but if you comment out the va_dump
>> lines and use /dev/null, the write will succeed.
>
> What exactly do we want to achieve with the write? Verify that the 
> output of /proc/self/maps is reasonable and we can actually resolve a 
> fault / map a page?
>
> Why not access the memory directly+signal handler or using 
> /proc/self/mem, so you can avoid the temp file completely?
>

We want to determine whether an address belongs to our address space.
The proper way to do that is to access the memory, get a segfault and
jump to a signal handler. I wanted to avoid this code churn, so I chose
to use write() so that I can validate the address without getting a
segfault.




* Re: [PATCH 1/3] selftests/mm: virtual_address_range: Fix error when CommitLimit < 1GiB
  2025-01-08 16:13         ` Thomas Weißschuh
  2025-01-08 16:46           ` David Hildenbrand
@ 2025-01-09  5:40           ` Dev Jain
  1 sibling, 0 replies; 18+ messages in thread
From: Dev Jain @ 2025-01-09  5:40 UTC (permalink / raw)
  To: Thomas Weißschuh, David Hildenbrand
  Cc: Andrew Morton, Shuah Khan, Thomas Gleixner, linux-mm,
	linux-kselftest, linux-kernel, stable, Ryan Roberts


On 08/01/25 9:43 pm, Thomas Weißschuh wrote:
> On Wed, Jan 08, 2025 at 02:36:57PM +0100, David Hildenbrand wrote:
>> On 08.01.25 09:05, Thomas Weißschuh wrote:
>>> On Wed, Jan 08, 2025 at 11:46:19AM +0530, Dev Jain wrote:
>>>> On 07/01/25 8:44 pm, Thomas Weißschuh wrote:
>>>>> If not enough physical memory is available the kernel may fail mmap();
>>>>> see __vm_enough_memory() and vm_commit_limit().
>>>>> In that case the logic in validate_complete_va_space() does not make
>>>>> sense and will even incorrectly fail.
>>>>> Instead skip the test if no mmap() succeeded.
>>>>>
>>>>> Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()")
>>>>> Cc: stable@vger.kernel.org
>> CC stable on tests is ... odd.
> I thought it was fairly common, but it isn't.
> Will drop it.

Oh, well...
https://lore.kernel.org/all/20240521074358.675031-4-dev.jain@arm.com/
I have done that before :) although the change I was making was fixing a
fundamental flaw in the test and your change is fixing the test for a
specific case (memory pressure), so I tend to concur with David.

>
>>>>> Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
>>>>>
>>>>> ---
>>>>> The logic in __vm_enough_memory() seems weird.
>>>>> It describes itself as "Check that a process has enough memory to
>>>>> allocate a new virtual mapping", however it never checks the current
>>>>> memory usage of the process.
>>>>> So it only disallows large mappings. But many small mappings taking the
>>>>> same amount of memory are allowed; and then even automatically merged
>>>>> into one big mapping.
>>>>> ---
>>>>>     tools/testing/selftests/mm/virtual_address_range.c | 6 ++++++
>>>>>     1 file changed, 6 insertions(+)
>>>>>
>>>>> diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
>>>>> index 2a2b69e91950a37999f606847c9c8328d79890c2..d7bf8094d8bcd4bc96e2db4dc3fcb41968def859 100644
>>>>> --- a/tools/testing/selftests/mm/virtual_address_range.c
>>>>> +++ b/tools/testing/selftests/mm/virtual_address_range.c
>>>>> @@ -178,6 +178,12 @@ int main(int argc, char *argv[])
>>>>>     		validate_addr(ptr[i], 0);
>>>>>     	}
>>>>>     	lchunks = i;
>>>>> +
>>>>> +	if (!lchunks) {
>>>>> +		ksft_test_result_skip("Not enough memory for a single chunk\n");
>>>>> +		ksft_finished();
>>>>> +	}
>>>>> +
>>>>>     	hptr = (char **) calloc(NR_CHUNKS_HIGH, sizeof(char *));
>>>>>     	if (hptr == NULL) {
>>>>>     		ksft_test_result_skip("Memory constraint not fulfilled\n");
>>>>>
>>>> I do not  know about __vm_enough_memory(), but I am going by your description:
>>>> You say that the kernel may fail mmap() when enough physical memory is not
>>>> there, but it may happen that we have already done 100 mmap()'s, and then
>>>> the kernel fails mmap(), so if (!lchunks) won't be able to handle this case.
>>>> Basically, lchunks == 0 is not a complete indicator of kernel failing mmap().
>>> __vm_enough_memory() only checks the size of each single mmap() on its
>>> own. It does not actually check the current memory or address space
>>> usage of the process.
>>> This seems a bit weird, as indicated in my after-the-fold explanation.
>>>
>>>> The basic assumption of the test is that any process should be able to exhaust
>>>> its virtual address space, and running the test under memory pressure and the
>>>> kernel violating this behaviour defeats the point of the test I think?
>>> The assumption is correct, as soon as one mapping succeeds the others
>>> will also succeed, until the actual address space is exhausted.
>>>
>>> Looking at it again, __vm_enough_memory() is only called for writable
>>> mappings, so it would be possible to use only readable mappings in the
>>> test. The test will still fail with OOM, as the many PTEs need more than
>>> 1GiB of physical memory anyways, but at least that produces a usable
>>> error message.
>>> However I'm not sure if this would violate other test assumptions.
>>>
>> Note that with MAP_NORESERVE, most setups we care about will allow mapping as
>> much as you want, but on access OOM will fire.
> Thanks for the hint.
>
>> So one could require that /proc/sys/vm/overcommit_memory is set up properly
>> and use MAP_NORESERVE.
> Isn't the check for lchunks == 0 essentially exactly this?
>
>> Reading from anonymous memory will populate the shared zeropage. To mitigate
>> OOM from "too many page tables", one could simply unmap the pieces as they
>> are verified (or MAP_FIXED over them, to free page tables).
> The code has to figure out if a verified region was created by mmap(),
> otherwise an munmap() could crash the process.
> As the entries from /proc/self/maps may have been merged and (I assume)
> the ordering of mappings is not guaranteed, some bespoke logic to establish
> the link will be needed.
>
> Is it fine to rely on CONFIG_ANON_VMA_NAME?
> That would make it much easier to implement.
>
> Using MAP_NORESERVE and eager munmap()s, the testcase works nicely even
> in very low physical memory conditions.
>
> Thomas



* Re: [PATCH 1/3] selftests/mm: virtual_address_range: Fix error when CommitLimit < 1GiB
  2025-01-08 16:46           ` David Hildenbrand
@ 2025-01-09  7:47             ` Thomas Weißschuh
  2025-01-09 13:05               ` David Hildenbrand
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Weißschuh @ 2025-01-09  7:47 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Dev Jain, Andrew Morton, Shuah Khan, Thomas Gleixner, linux-mm,
	linux-kselftest, linux-kernel, stable, Ryan Roberts

On Wed, Jan 08, 2025 at 05:46:37PM +0100, David Hildenbrand wrote:
> On 08.01.25 17:13, Thomas Weißschuh wrote:
> > On Wed, Jan 08, 2025 at 02:36:57PM +0100, David Hildenbrand wrote:
> > > On 08.01.25 09:05, Thomas Weißschuh wrote:
> > > > On Wed, Jan 08, 2025 at 11:46:19AM +0530, Dev Jain wrote:
> > > > > 
> > > > > On 07/01/25 8:44 pm, Thomas Weißschuh wrote:
> > > > > > If not enough physical memory is available the kernel may fail mmap();
> > > > > > see __vm_enough_memory() and vm_commit_limit().
> > > > > > In that case the logic in validate_complete_va_space() does not make
> > > > > > sense and will even incorrectly fail.
> > > > > > Instead skip the test if no mmap() succeeded.
> > > > > > 
> > > > > > Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()")
> > > > > > Cc: stable@vger.kernel.org
> > > 
> > > CC stable on tests is ... odd.
> > 
> > I thought it was fairly common, but it isn't.
> > Will drop it.
> 
> As it's not really a "kernel BUG", it's rather uncommon.

I also used it on patch 2, which is now reproducibly broken on x86
mainline since my commit mentioned in that patch.
But I'll drop it there, too.

> > > Note that with MAP_NORESERVE, most setups we care about will allow mapping as
> > > much as you want, but on access OOM will fire.
> > 
> > Thanks for the hint.
> > 
> > > So one could require that /proc/sys/vm/overcommit_memory is set up properly
> > > and use MAP_NORESERVE.
> > 
> > Isn't the check for lchunks == 0 essentially exactly this?
> 
> I assume paired with MAP_NORESERVE?

Yes.

> Maybe, but it could be better to have something that says "if
> overcommit_memory is not set up properly I will SKIP this test, but
> otherwise I expect this to work and will FAIL if it doesn't".

Ok, I'll validate the sysctl value.

> Or would you expect to run into lchunks == 0 even if overcommit_memory is
> set up properly and MAP_NORESERVE is used? (very very low memory that we
> cannot even create all the VMAs?)

No.

> > > Reading from anonymous memory will populate the shared zeropage. To mitigate
> > > OOM from "too many page tables", one could simply unmap the pieces as they
> > > are verified (or MAP_FIXED over them, to free page tables).
> > 
> > The code has to figure out if a verified region was created by mmap(),
> > otherwise an munmap() could crash the process.
> > As the entries from /proc/self/maps may have been merged and (I assume)
> 
> Yes, and partial unmap (in chunk granularity?) would split them again.
> 
> > the ordering of mappings is not guaranteed, some bespoke logic to establish
> > the link will be needed.
> 
> My thinking was that you simply process one /proc/self/maps entry in some
> chunks. After processing a chunk, you munmap() it.
> 
> So you would process + munmap in chunks.

That is clear. The issue would be to figure out which chunks are valid to
unmap. If something critical like the executable file is unmapped,
the process crashes. But see below.

> > Is it fine to rely on CONFIG_ANON_VMA_NAME?
> > That would make it much easier to implement.
> 
> Can you elaborate how you would do it?

First set the VMA name after mmap():

for (i = 0; i < NR_CHUNKS_LOW; i++) {
	ptr[i] = mmap(NULL, MAP_CHUNK_SIZE, PROT_READ | PROT_WRITE,
		     MAP_NORESERVE | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (ptr[i] == MAP_FAILED) {
		if (validate_lower_address_hint())
			ksft_exit_fail_msg("mmap unexpectedly succeeded with hint\n");
		break;
	}

	validate_addr(ptr[i], 0);
	if (prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, ptr[i], MAP_CHUNK_SIZE, "virtual_address_range"))
		ksft_exit_fail_msg("prctl(PR_SET_VMA_ANON_NAME) failed: %s\n", strerror(errno));
}

During validation:

hop = 0;
while (start_addr + hop < end_addr) {
	if (write(fd, (void *)(start_addr + hop), 1) != 1)
		return 1;
	lseek(fd, 0, SEEK_SET);

	if (!strncmp(line + path_offset, "[anon:virtual_address_range]", 28))
		munmap((char *)(start_addr + hop), MAP_CHUNK_SIZE);

	hop += MAP_CHUNK_SIZE;

}

It is done for each chunk, as all chunks may have been merged into a
single VMA and a per-VMA unmap would not happen before OOM.
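
The name check in the loop above can be isolated into a small helper.
The following C sketch is illustrative only: it assumes the usual
/proc/self/maps line layout (range, perms, offset, dev, inode, name)
and the helper name is made up:

```c
#include <stdio.h>
#include <string.h>

#define TEST_VMA_NAME "[anon:virtual_address_range]"

/* Returns 1 if a /proc/self/maps line describes a VMA named by the test. */
int is_named_test_vma(const char *line)
{
	unsigned long start, end;
	char perms[5];
	int path_offset = 0;

	/* %n records where the name column starts, after skipping whitespace. */
	if (sscanf(line, "%lx-%lx %4s %*x %*s %*u %n",
		   &start, &end, perms, &path_offset) != 3)
		return 0;
	if (!path_offset)
		return 0;

	return !strncmp(line + path_offset, TEST_VMA_NAME,
			strlen(TEST_VMA_NAME));
}
```

Only lines for which this returns 1 would be candidates for the eager
munmap(); the executable, stack and library mappings are left alone.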

> > Using MAP_NORESERVE and eager munmap()s, the testcase works nicely even
> > in very low physical memory conditions.
> 
> Cool.



* Re: [PATCH 1/3] selftests/mm: virtual_address_range: Fix error when CommitLimit < 1GiB
  2025-01-09  7:47             ` Thomas Weißschuh
@ 2025-01-09 13:05               ` David Hildenbrand
  2025-01-09 13:19                 ` David Hildenbrand
  2025-01-09 13:38                 ` Thomas Weißschuh
  0 siblings, 2 replies; 18+ messages in thread
From: David Hildenbrand @ 2025-01-09 13:05 UTC (permalink / raw)
  To: Thomas Weißschuh
  Cc: Dev Jain, Andrew Morton, Shuah Khan, Thomas Gleixner, linux-mm,
	linux-kselftest, linux-kernel, stable, Ryan Roberts

 >
> That is clear. The issue would be to figure which chunks are valid to
> unmap. If something critical like the executable file is unmapped,
> the process crashes. But see below.

Ah, now I see what you mean. Yes, also the stack etc. will be 
problematic. So IIUC, you want to limit the munmap optimization only to 
the manually mmap()ed parts.

> 
>>> Is it fine to rely on CONFIG_ANON_VMA_NAME?
>>> That would make it much easier to implement.
>>
>> Can you elaborate how you would do it?
> 
> First set the VMA name after mmap():
> 
> for (i = 0; i < NR_CHUNKS_LOW; i++) {
> 	ptr[i] = mmap(NULL, MAP_CHUNK_SIZE, PROT_READ | PROT_WRITE,
> 		     MAP_NORESERVE | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> 
> 	if (ptr[i] == MAP_FAILED) {
> 		if (validate_lower_address_hint())
> 			ksft_exit_fail_msg("mmap unexpectedly succeeded with hint\n");
> 		break;
> 	}
> 
> 	validate_addr(ptr[i], 0);
> 	if (prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, ptr[i], MAP_CHUNK_SIZE, "virtual_address_range"))
> 		ksft_exit_fail_msg("prctl(PR_SET_VMA_ANON_NAME) failed: %s\n", strerror(errno));

Likely this would prevent merging of VMAs.

With a 1 GiB chunk size, and NR_CHUNKS_LOW == 128TiB, you'd already 
require 128k VMAs. The default limit is frequently 64k.
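
The arithmetic behind that number, as a small C sketch. It only holds
if the chunks stay separate VMAs, and it assumes the common default of
65530 for /proc/sys/vm/max_map_count; the macro and function names are
illustrative:

```c
#include <stdint.h>

#define SZ_1G (UINT64_C(1) << 30)
#define SZ_1T (UINT64_C(1) << 40)

/* Assumed default value of /proc/sys/vm/max_map_count on many systems. */
#define ASSUMED_MAX_MAP_COUNT 65530

/* Number of chunk-sized mappings needed to cover range_bytes. */
uint64_t chunks_in_range(uint64_t range_bytes, uint64_t chunk_bytes)
{
	return range_bytes / chunk_bytes;
}
```

Covering 128 TiB with 1 GiB chunks gives 131072 (128k) mappings, about
twice the assumed limit, which is why unmerged VMAs would be a problem.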

We could just scan the ptr / hptr array to see if this is a manual mmap 
area or not. If this takes too long, one could sort the arrays by 
address and perform a binary search.

Not the most efficient way of doing it, but maybe good enough for this test?

Alternatively, store the pointer in a xarray-like tree instead of two 
arrays. Requires a bit more memory ... and we'd have to find a simple 
implementation we could just reuse in this test. So maybe there is a 
simpler way to get it done.

-- 
Cheers,

David / dhildenb




* Re: [PATCH 1/3] selftests/mm: virtual_address_range: Fix error when CommitLimit < 1GiB
  2025-01-09 13:05               ` David Hildenbrand
@ 2025-01-09 13:19                 ` David Hildenbrand
  2025-01-09 13:38                 ` Thomas Weißschuh
  1 sibling, 0 replies; 18+ messages in thread
From: David Hildenbrand @ 2025-01-09 13:19 UTC (permalink / raw)
  To: Thomas Weißschuh
  Cc: Dev Jain, Andrew Morton, Shuah Khan, Thomas Gleixner, linux-mm,
	linux-kselftest, linux-kernel, stable, Ryan Roberts

On 09.01.25 14:05, David Hildenbrand wrote:
>   >
>> That is clear. The issue would be to figure which chunks are valid to
>> unmap. If something critical like the executable file is unmapped,
>> the process crashes. But see below.
> 
> Ah, now I see what you mean. Yes, also the stack etc. will be
> problematic. So IIUC, you want to limit the munmap optimization only to
> the manually mmap()ed parts.
> 
>>
>>>> Is it fine to rely on CONFIG_ANON_VMA_NAME?
>>>> That would make it much easier to implement.
>>>
>>> Can you elaborate how you would do it?
>>
>> First set the VMA name after mmap():

I took a look at the implementation, and VMA merging seems to be able to 
merge such VMAs that share the same name (even when set separately).

So assuming you use the same name for all, that should indeed also work.

-- 
Cheers,

David / dhildenb




* Re: [PATCH 1/3] selftests/mm: virtual_address_range: Fix error when CommitLimit < 1GiB
  2025-01-09 13:05               ` David Hildenbrand
  2025-01-09 13:19                 ` David Hildenbrand
@ 2025-01-09 13:38                 ` Thomas Weißschuh
  1 sibling, 0 replies; 18+ messages in thread
From: Thomas Weißschuh @ 2025-01-09 13:38 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Dev Jain, Andrew Morton, Shuah Khan, Thomas Gleixner, linux-mm,
	linux-kselftest, linux-kernel, stable, Ryan Roberts

On Thu, Jan 09, 2025 at 02:05:43PM +0100, David Hildenbrand wrote:
> >
> > That is clear. The issue would be to figure which chunks are valid to
> > unmap. If something critical like the executable file is unmapped,
> > the process crashes. But see below.
> 
> Ah, now I see what you mean. Yes, also the stack etc. will be problematic.
> So IIUC, you want to limit the munmap optimization only to the manually
> mmap()ed parts.

Correct.

> > > > Is it fine to rely on CONFIG_ANON_VMA_NAME?
> > > > That would make it much easier to implement.
> > > 
> > > Can you elaborate how you would do it?
> > 
> > First set the VMA name after mmap():
> > 
> > for (i = 0; i < NR_CHUNKS_LOW; i++) {
> > 	ptr[i] = mmap(NULL, MAP_CHUNK_SIZE, PROT_READ | PROT_WRITE,
> > 		     MAP_NORESERVE | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> > 
> > 	if (ptr[i] == MAP_FAILED) {
> > 		if (validate_lower_address_hint())
> > 			ksft_exit_fail_msg("mmap unexpectedly succeeded with hint\n");
> > 		break;
> > 	}
> > 
> > 	validate_addr(ptr[i], 0);
> > 	if (prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, ptr[i], MAP_CHUNK_SIZE, "virtual_address_range"))
> > 		ksft_exit_fail_msg("prctl(PR_SET_VMA_ANON_NAME) failed: %s\n", strerror(errno));
> 
> Likely this would prevent merging of VMAs.
>
> With a 1 GiB chunk size, and NR_CHUNKS_LOW == 128TiB, you'd already require
> 128k VMAs. The default limit is frequently 64k.

They are merged for me, as they all share the same name.

PR_SET_VMA(2const) even mentions merging:

	Note that assigning an attribute to a virtual memory area might
	prevent it from being merged with adjacent virtual memory areas
	due to the difference in that attribute's value.

is_mergeable_vma() has an explicit check using anon_vma_name_eq().

> We could just scan the ptr / hptr array to see if this is a manual mmap area
> or not. If this takes too long, one could sort the arrays by address and
> perform a binary search.
>
> Not the most efficient way of doing it, but maybe good enough for this test?

A naive loop is what I tried first, but it took forever.

> Alternatively, store the pointer in a xarray-like tree instead of two
> arrays. Requires a bit more memory ... and we'd have to find a simple
> implementation we could just reuse in this test. So maybe there is a simpler
> way to get it done.

IMO the prctl() is that simpler way.
The only real drawback is the dependency on CONFIG_ANON_VMA_NAME.
We can add an entry to tools/testing/selftests/mm/config for it.


Thomas



end of thread, other threads:[~2025-01-09 13:38 UTC | newest]

Thread overview: 18+ messages
2025-01-07 15:14 [PATCH 0/3] selftests/mm: virtual_address_range: Two bugfixes and a cleanup Thomas Weißschuh
2025-01-07 15:14 ` [PATCH 1/3] selftests/mm: virtual_address_range: Fix error when CommitLimit < 1GiB Thomas Weißschuh
2025-01-08  6:16   ` Dev Jain
2025-01-08  8:05     ` Thomas Weißschuh
2025-01-08 13:36       ` David Hildenbrand
2025-01-08 16:13         ` Thomas Weißschuh
2025-01-08 16:46           ` David Hildenbrand
2025-01-09  7:47             ` Thomas Weißschuh
2025-01-09 13:05               ` David Hildenbrand
2025-01-09 13:19                 ` David Hildenbrand
2025-01-09 13:38                 ` Thomas Weißschuh
2025-01-09  5:40           ` Dev Jain
2025-01-07 15:14 ` [PATCH 2/3] selftests/mm: virtual_address_range: Avoid reading VVAR mappings Thomas Weißschuh
2025-01-07 15:14 ` [PATCH 3/3] selftests/mm: virtual_address_range: Dump to /dev/null Thomas Weißschuh
2025-01-08  6:09   ` Dev Jain
2025-01-08  7:38     ` Thomas Weißschuh
2025-01-08 13:30     ` David Hildenbrand
2025-01-09  5:32       ` Dev Jain
