On 23/06/25 11:02 pm, Donet Tom wrote:
> On Mon, Jun 23, 2025 at 10:23:02AM +0530, Dev Jain wrote:
>> On 21/06/25 11:25 pm, Donet Tom wrote:
>>> On Fri, Jun 20, 2025 at 08:15:25PM +0530, Dev Jain wrote:
>>>> On 19/06/25 1:53 pm, Donet Tom wrote:
>>>>> On Wed, Jun 18, 2025 at 08:13:54PM +0530, Dev Jain wrote:
>>>>>> On 18/06/25 8:05 pm, Lorenzo Stoakes wrote:
>>>>>>> On Wed, Jun 18, 2025 at 07:47:18PM +0530, Dev Jain wrote:
>>>>>>>> On 18/06/25 7:37 pm, Lorenzo Stoakes wrote:
>>>>>>>>> On Wed, Jun 18, 2025 at 07:28:16PM +0530, Dev Jain wrote:
>>>>>>>>>> On 18/06/25 5:27 pm, Lorenzo Stoakes wrote:
>>>>>>>>>>> On Wed, Jun 18, 2025 at 05:15:50PM +0530, Dev Jain wrote:
>>>>>>>>>>> Are you accounting for vm.max_map_count? If not, then you'll be hitting that
>>>>>>>>>>> first.
>>>>>>>>>> run_vmtests.sh will run the test in overcommit mode, so that won't be an issue.
>>>>>>>>> Umm, what? You mean overcommit-always mode, and that has no bearing on the max
>>>>>>>>> mapping count check.
>>>>>>>>>
>>>>>>>>> In do_mmap():
>>>>>>>>>
>>>>>>>>> 	/* Too many mappings? */
>>>>>>>>> 	if (mm->map_count > sysctl_max_map_count)
>>>>>>>>> 		return -ENOMEM;
>>>>>>>>>
>>>>>>>>> As well as numerous other checks in mm/vma.c.
>>>>>>>> Ah sorry, I didn't look at the code properly and just assumed that
>>>>>>>> overcommit_always meant overriding this.
>>>>>>> No problem! It's hard to be aware of everything in mm :)
>>>>>>>
>>>>>>>>> I'm not sure why an overcommit toggle is even necessary when you could use
>>>>>>>>> MAP_NORESERVE or simply map PROT_NONE to avoid the OVERCOMMIT_GUESS limits?
>>>>>>>>>
>>>>>>>>> I'm pretty confused as to what this test is really achieving, honestly. This
>>>>>>>>> isn't a useful way of asserting mmap() behaviour as far as I can tell.
>>>>>>>> Well, it seems like a useful way to me at least :) Not sure if you are in the mood
>>>>>>>> to discuss that, but if you'd like me to explain from start to end what the test
>>>>>>>> is doing, I can do that :)
>>>>>>>>
>>>>>>> I just don't have time right now, I guess I'll have to come back to it
>>>>>>> later... it's not the end of the world for it to be iffy in my view as long as
>>>>>>> it passes, but it might just not be of great value.
>>>>>>>
>>>>>>> Philosophically I'd rather we didn't assert internal implementation details like
>>>>>>> where we place mappings in userland memory. At no point do we promise not to
>>>>>>> leave larger gaps if we feel like it :)
>>>>>> You have a fair point. Anyhow, a debate for another day.
>>>>>>
>>>>>>> I'm guessing, reading more, the _real_ test here is some mathematical assertion
>>>>>>> about layout from HIGH_ADDR_SHIFT -> end of address space when using hints.
>>>>>>>
>>>>>>> But again, I'm not sure that achieves much, and it is also asserting internal
>>>>>>> implementation details.
>>>>>>>
>>>>>>> Correct behaviour of this kind of thing probably better belongs to tests in the
>>>>>>> userland VMA testing, I'd say.
>>>>>>>
>>>>>>> Sorry, I don't mean to do down work you've done before, just giving an honest
>>>>>>> technical appraisal!
>>>>>> Nah, it will be rather hilarious to see it all go down the drain xD
>>>>>>
>>>>>>> Anyway, don't let this block work to fix the test if it's failing. We can revisit
>>>>>>> this later.
>>>>>> Sure. @Aboorva and Donet, I still believe that the correct approach is to elide
>>>>>> the gap check at the crossing boundary. What do you think?
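
To make the elision concrete: the idea is to exempt exactly one gap - the one
crossing into the high VA region - from the contiguity check. Below is a
minimal, self-contained sketch of the idea; the constants are illustrative,
and the loop is a simplification of the /proc maps walk that
validate_complete_va_space() does, not the exact test code:

#include <stdbool.h>
#include <stdio.h>

/* illustrative values - the test derives these per-arch */
#define HIGH_ADDR_MARK	(1UL << 47)	/* 128TB */
#define MAP_CHUNK_SIZE	(16UL << 30)	/* 16GB */

int main(void)
{
	FILE *maps = fopen("/proc/self/maps", "r");
	unsigned long start_addr, end_addr, prev_end_addr = 0;
	bool found = false;	/* crossed the low/high boundary yet? */
	char line[400];

	if (!maps)
		return 1;

	while (fgets(line, sizeof(line), maps)) {
		if (sscanf(line, "%lx-%lx", &start_addr, &end_addr) != 2)
			break;

		if (start_addr >= HIGH_ADDR_MARK && !found) {
			/* elide the gap check exactly once, at the crossing */
			found = true;
		} else if (prev_end_addr &&
			   start_addr - prev_end_addr > MAP_CHUNK_SIZE) {
			printf("unexpected gap before %lx-%lx\n",
			       start_addr, end_addr);
		}
		prev_end_addr = end_addr;
	}

	fclose(maps);
	return 0;
}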
>>>>>>
>>>>> One problem I am seeing with this approach is that, since the hint address
>>>>> is generated randomly, the VMAs are also being created randomly, based on
>>>>> the hint address. So, for the VMAs created at high addresses, we cannot
>>>>> guarantee that the gaps between them will be aligned to MAP_CHUNK_SIZE.
>>>>>
>>>>> High address VMAs
>>>>> -----------------
>>>>> 1000000000000-1000040000000 r--p 00000000 00:00 0
>>>>> 2000000000000-2000040000000 r--p 00000000 00:00 0
>>>>> 4000000000000-4000040000000 r--p 00000000 00:00 0
>>>>> 8000000000000-8000040000000 r--p 00000000 00:00 0
>>>>> e80009d260000-fffff9d260000 r--p 00000000 00:00 0
>>>>>
>>>>> I have a different approach to solve this issue.
>>>> It is really weird that such a large amount of VA space
>>>> is left between the two VMAs, yet mmap() is failing.
>>>>
>>>> Can you please do the following:
>>>> Set /proc/sys/vm/max_map_count to the highest value possible.
>>>> If running without run_vmtests.sh, set /proc/sys/vm/overcommit_memory to 1.
>>>> In validate_complete_va_space:
>>>>
>>>> 	if (start_addr >= HIGH_ADDR_MARK && found == false) {
>>>> 		found = true;
>>>> 		continue;
>>>> 	}
>>> Thanks, Dev, for the suggestion. I set max_map_count, set overcommit_memory
>>> to 1, added this code change as well, and then tried. Still, the test is
>>> failing.
>>>
>>>> where found is initialized to false. This will skip the check
>>>> for the boundary.
>>>>
>>>> After this, can you tell me whether the test is still failing?
>>>>
>>>> Also, can you give me the complete output of /proc/pid/maps
>>>> after putting a sleep at the end of the test?
>>>>
>>> On powerpc, DEFAULT_MAP_WINDOW is 128TB and the total address space
>>> size is 4PB. With a hint, it can map up to 4PB. Since the hint address
>>> is random in this test, random high VMAs are getting created. IIUC, this
>>> is expected.
>>>
>>>
>>> 10000000-10010000 r-xp 00000000 fd:05 134226638 /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
>>> 10010000-10020000 r--p 00000000 fd:05 134226638 /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
>>> 10020000-10030000 rw-p 00010000 fd:05 134226638 /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
>>> 30000000-10030000000 r--p 00000000 00:00 0 [anon:virtual_address_range]
>>> 10030770000-100307a0000 rw-p 00000000 00:00 0 [heap]
>>> 1004f000000-7fff8f000000 r--p 00000000 00:00 0 [anon:virtual_address_range]
>>> 7fff8faf0000-7fff8fe00000 rw-p 00000000 00:00 0
>>> 7fff8fe00000-7fff90030000 r-xp 00000000 fd:00 792355 /usr/lib64/libc.so.6
>>> 7fff90030000-7fff90040000 r--p 00230000 fd:00 792355 /usr/lib64/libc.so.6
>>> 7fff90040000-7fff90050000 rw-p 00240000 fd:00 792355 /usr/lib64/libc.so.6
>>> 7fff90050000-7fff90130000 r-xp 00000000 fd:00 792358 /usr/lib64/libm.so.6
>>> 7fff90130000-7fff90140000 r--p 000d0000 fd:00 792358 /usr/lib64/libm.so.6
>>> 7fff90140000-7fff90150000 rw-p 000e0000 fd:00 792358 /usr/lib64/libm.so.6
>>> 7fff90160000-7fff901a0000 r--p 00000000 00:00 0 [vvar]
>>> 7fff901a0000-7fff901b0000 r-xp 00000000 00:00 0 [vdso]
>>> 7fff901b0000-7fff90200000 r-xp 00000000 fd:00 792351 /usr/lib64/ld64.so.2
>>> 7fff90200000-7fff90210000 r--p 00040000 fd:00 792351 /usr/lib64/ld64.so.2
>>> 7fff90210000-7fff90220000 rw-p 00050000 fd:00 792351 /usr/lib64/ld64.so.2
>>> 7fffc9770000-7fffc9880000 rw-p 00000000 00:00 0 [stack]
>>> 1000000000000-1000040000000 r--p 00000000 00:00 0 [anon:virtual_address_range]
>>> 2000000000000-2000040000000 r--p 00000000 00:00 0 [anon:virtual_address_range]
>>> 4000000000000-4000040000000 r--p 00000000 00:00 0 [anon:virtual_address_range]
>>> 8000000000000-8000040000000 r--p 00000000 00:00 0 [anon:virtual_address_range]
>>> eb95410220000-fffff90220000 r--p 00000000 00:00 0 [anon:virtual_address_range]
>>>
>>> If I give the hint address serially from 128TB, then the address space is
>>> contiguous, the gap is also MAP_CHUNK_SIZE, and the test is passing.
>>>
>>> 10000000-10010000 r-xp 00000000 fd:05 134226638 /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
>>> 10010000-10020000 r--p 00000000 fd:05 134226638 /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
>>> 10020000-10030000 rw-p 00010000 fd:05 134226638 /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
>>> 33000000-10033000000 r--p 00000000 00:00 0 [anon:virtual_address_range]
>>> 10033380000-100333b0000 rw-p 00000000 00:00 0 [heap]
>>> 1006f0f0000-10071000000 rw-p 00000000 00:00 0
>>> 10071000000-7fffb1000000 r--p 00000000 00:00 0 [anon:virtual_address_range]
>>> 7fffb15d0000-7fffb1800000 r-xp 00000000 fd:00 792355 /usr/lib64/libc.so.6
>>> 7fffb1800000-7fffb1810000 r--p 00230000 fd:00 792355 /usr/lib64/libc.so.6
>>> 7fffb1810000-7fffb1820000 rw-p 00240000 fd:00 792355 /usr/lib64/libc.so.6
>>> 7fffb1820000-7fffb1900000 r-xp 00000000 fd:00 792358 /usr/lib64/libm.so.6
>>> 7fffb1900000-7fffb1910000 r--p 000d0000 fd:00 792358 /usr/lib64/libm.so.6
>>> 7fffb1910000-7fffb1920000 rw-p 000e0000 fd:00 792358 /usr/lib64/libm.so.6
>>> 7fffb1930000-7fffb1970000 r--p 00000000 00:00 0 [vvar]
>>> 7fffb1970000-7fffb1980000 r-xp 00000000 00:00 0 [vdso]
>>> 7fffb1980000-7fffb19d0000 r-xp 00000000 fd:00 792351 /usr/lib64/ld64.so.2
>>> 7fffb19d0000-7fffb19e0000 r--p 00040000 fd:00 792351 /usr/lib64/ld64.so.2
>>> 7fffb19e0000-7fffb19f0000 rw-p 00050000 fd:00 792351 /usr/lib64/ld64.so.2
>>> 7fffc5470000-7fffc5580000 rw-p 00000000 00:00 0 [stack]
>>> 800000000000-2aab000000000 r--p 00000000 00:00 0 [anon:virtual_address_range]
>>>
>> Thank you for this output. I can't wrap my head around why this behaviour changes
>> when you generate the hint sequentially. The mmap() syscall is supposed to do the
>> following (irrespective of high VA space or not) - if the allocation at the hint
> Yes, it is working as expected. On PowerPC, DEFAULT_MAP_WINDOW is
> 128TB, and the system can map up to 4PB.
>
> In the test, the first mmap() call maps memory up to 128TB without any
> hint, so the VMAs are created below the 128TB boundary.
>
> In the second mmap() call, we provide a hint starting from 256TB, and
> the hint address is generated randomly above 256TB. The mappings are
> correctly created at these hint addresses. Since the hint addresses
> are random, the resulting VMAs are also created at random locations.
>
> So, what I tried is: map from 0 to 128TB without any hint, and then,
> for the second mmap(), start the hint from 128TB instead of 256TB.
> Instead of using random hint addresses, I used sequential hint
> addresses from 128TB up to 512TB. With this change, the VMAs are
> created in order, and the test passes.
>
> 800000000000-2aab000000000 r--p 00000000 00:00 0    <- 128TB to 512TB VMA
>
> I think we will see the same behaviour on x86 with X86_FEATURE_LA57.
>
> I will send the updated patch in V2.

Since you say it fails on both radix and hash, it means that the generic code
path is failing. I see that on my system, when I run the test with the LPA2
config, write() fails with errno set to ENOMEM.
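
For context on that failure: the check removed by the diff below probes each
address by write()-ing one byte from the mapping out to a scratch file, so an
unreadable page surfaces as a failed write() with errno set, rather than a
segfault. A reduced, self-contained sketch of that pattern (the file name and
probe() helper here are illustrative, not the test's exact code):

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* probe addr by writing one byte from it to a scratch fd: the kernel
 * must read the source page, so a bad address fails the write() with
 * an errno instead of faulting the process */
static void probe(int fd, void *addr, const char *tag)
{
	if (write(fd, addr, 1) == 1)
		printf("%s: readable\n", tag);
	else
		printf("%s: write() failed: %s\n", tag, strerror(errno));
	lseek(fd, 0, SEEK_SET);
}

int main(void)
{
	int fd = open("probe_dump", O_CREAT | O_WRONLY, 0600);
	char *ok = mmap(NULL, 4096, PROT_READ,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	char *bad = mmap(NULL, 4096, PROT_NONE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (fd < 0 || ok == MAP_FAILED || bad == MAP_FAILED)
		return 1;
	unlink("probe_dump");

	probe(fd, ok, "PROT_READ page");	/* succeeds */
	probe(fd, bad, "PROT_NONE page");	/* fails, typically EFAULT */

	close(fd);
	return 0;
}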
Can you apply the following diff and check whether the test still fails?
Doing this fixed it for arm64.

diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
index b380e102b22f..3032902d01f2 100644
--- a/tools/testing/selftests/mm/virtual_address_range.c
+++ b/tools/testing/selftests/mm/virtual_address_range.c
@@ -173,10 +173,6 @@ static int validate_complete_va_space(void)
 	 */
 	hop = 0;
 	while (start_addr + hop < end_addr) {
-		if (write(fd, (void *)(start_addr + hop), 1) != 1)
-			return 1;
-		lseek(fd, 0, SEEK_SET);
-
 		if (is_marked_vma(vma_name))
 			munmap((char *)(start_addr + hop), MAP_CHUNK_SIZE);

>
>> addr succeeds, then all is well; otherwise, do a top-down search for a large
>> enough gap. I am not aware of the nuances in powerpc, but I really suspect
>> a bug in the powerpc mmap code. Can you try to do some tracing - which
>> function eventually fails to find the empty gap?
>>
>> Through my limited code tracing, we should end up in slice_find_area_topdown(),
>> which then asks the generic code to find the gap using vm_unmapped_area(). So
>> I suspect something is going wrong in between, probably in
>> slice_scan_available().
>>
>>>>> From 0 to 128TB, we map memory directly without using any hint. For the
>>>>> range above 256TB up to 512TB, we perform the mapping using hint addresses.
>>>>> In the current test, we use random hint addresses, but I have modified it
>>>>> to generate hint addresses linearly, starting from 128TB.
>>>>>
>>>>> With this change:
>>>>>
>>>>> The 0-128TB range is mapped without hints and verified accordingly.
>>>>>
>>>>> The 128TB-512TB range is mapped using linear hint addresses and then verified.
>>>>>
>>>>> Below are the VMAs obtained with this approach:
>>>>>
>>>>> 10000000-10010000 r-xp 00000000 fd:05 135019531
>>>>> 10010000-10020000 r--p 00000000 fd:05 135019531
>>>>> 10020000-10030000 rw-p 00010000 fd:05 135019531
>>>>> 20000000-10020000000 r--p 00000000 00:00 0
>>>>> 10020800000-10020830000 rw-p 00000000 00:00 0
>>>>> 1004bcf0000-1004c000000 rw-p 00000000 00:00 0
>>>>> 1004c000000-7fff8c000000 r--p 00000000 00:00 0
>>>>> 7fff8c130000-7fff8c360000 r-xp 00000000 fd:00 792355
>>>>> 7fff8c360000-7fff8c370000 r--p 00230000 fd:00 792355
>>>>> 7fff8c370000-7fff8c380000 rw-p 00240000 fd:00 792355
>>>>> 7fff8c380000-7fff8c460000 r-xp 00000000 fd:00 792358
>>>>> 7fff8c460000-7fff8c470000 r--p 000d0000 fd:00 792358
>>>>> 7fff8c470000-7fff8c480000 rw-p 000e0000 fd:00 792358
>>>>> 7fff8c490000-7fff8c4d0000 r--p 00000000 00:00 0
>>>>> 7fff8c4d0000-7fff8c4e0000 r-xp 00000000 00:00 0
>>>>> 7fff8c4e0000-7fff8c530000 r-xp 00000000 fd:00 792351
>>>>> 7fff8c530000-7fff8c540000 r--p 00040000 fd:00 792351
>>>>> 7fff8c540000-7fff8c550000 rw-p 00050000 fd:00 792351
>>>>> 7fff8d000000-7fffcd000000 r--p 00000000 00:00 0
>>>>> 7fffe9c80000-7fffe9d90000 rw-p 00000000 00:00 0
>>>>> 800000000000-2000000000000 r--p 00000000 00:00 0  -> High Address (128TB to 512TB)
>>>>>
>>>>> diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
>>>>> index 4c4c35eac15e..0be008cba4b0 100644
>>>>> --- a/tools/testing/selftests/mm/virtual_address_range.c
>>>>> +++ b/tools/testing/selftests/mm/virtual_address_range.c
>>>>> @@ -56,21 +56,21 @@
>>>>>  #ifdef __aarch64__
>>>>>  #define HIGH_ADDR_MARK ADDR_MARK_256TB
>>>>> -#define HIGH_ADDR_SHIFT 49
>>>>> +#define HIGH_ADDR_SHIFT 48
>>>>>  #define NR_CHUNKS_LOW NR_CHUNKS_256TB
>>>>>  #define NR_CHUNKS_HIGH NR_CHUNKS_3840TB
>>>>>  #else
>>>>>  #define HIGH_ADDR_MARK ADDR_MARK_128TB
>>>>> -#define HIGH_ADDR_SHIFT 48
>>>>> +#define HIGH_ADDR_SHIFT 47
>>>>>  #define NR_CHUNKS_LOW NR_CHUNKS_128TB
>>>>>  #define NR_CHUNKS_HIGH NR_CHUNKS_384TB
>>>>>  #endif
>>>>>
>>>>> -static char *hint_addr(void)
>>>>> +static char *hint_addr(int hint)
>>>>>  {
>>>>> -	int bits = HIGH_ADDR_SHIFT + rand() % (63 - HIGH_ADDR_SHIFT);
>>>>> +	unsigned long addr = ((1UL << HIGH_ADDR_SHIFT) + (hint * MAP_CHUNK_SIZE));
>>>>>
>>>>> -	return (char *) (1UL << bits);
>>>>> +	return (char *) (addr);
>>>>>  }
>>>>>
>>>>>  static void validate_addr(char *ptr, int high_addr)
>>>>> @@ -217,7 +217,7 @@ int main(int argc, char *argv[])
>>>>>  	}
>>>>>
>>>>>  	for (i = 0; i < NR_CHUNKS_HIGH; i++) {
>>>>> -		hint = hint_addr();
>>>>> +		hint = hint_addr(i);
>>>>>  		hptr[i] = mmap(hint, MAP_CHUNK_SIZE, PROT_READ,
>>>>>  			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>>>>>
>>>>> Can we fix it this way?
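
As a sanity check on the linear-hint arithmetic, here is a quick standalone
sketch; MAP_CHUNK_SIZE and NR_CHUNKS_HIGH are assumed to match the test's
16GB chunks over the 384TB high range, so treat the constants as
illustrative. It should print 128TB for the first hint and 512TB for the end
of coverage, matching the 800000000000-2000000000000 VMA above:

#include <stdio.h>

#define SZ_1GB		(1UL << 30)
#define SZ_1TB		(1UL << 40)
/* assumed to match the test's values */
#define MAP_CHUNK_SIZE	(16 * SZ_1GB)
#define HIGH_ADDR_SHIFT	47	/* powerpc value from the patch above */
#define NR_CHUNKS_HIGH	((384 * SZ_1TB) / MAP_CHUNK_SIZE)

int main(void)
{
	unsigned long first = 1UL << HIGH_ADDR_SHIFT;
	unsigned long end = first + NR_CHUNKS_HIGH * MAP_CHUNK_SIZE;

	/* expect 0x800000000000 (128TB) and 0x2000000000000 (512TB) */
	printf("first hint  : 0x%lx (%lu TB)\n", first, first / SZ_1TB);
	printf("coverage end: 0x%lx (%lu TB)\n", end, end / SZ_1TB);
	return 0;
}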