linux-mm.kvack.org archive mirror
From: Gavin Shan <gshan@redhat.com>
To: Avi Kivity <avi@scylladb.com>, linux-mm <linux-mm@kvack.org>
Subject: Re: Possible regression with file madvise(MADV_COLLAPSE)
Date: Fri, 11 Oct 2024 16:32:50 +1000
Message-ID: <108c788a-0742-4957-aaa3-6e2e257d11bd@redhat.com>
In-Reply-To: <8ac28fb858a2394cc72c3dc5924f1fd031fc6fe0.camel@scylladb.com>

Hi Avi,

On 10/10/24 1:54 AM, Avi Kivity wrote:
> On Linux 6.10.10 with CONFIG_READ_ONLY_THP_FOR_FS=y,
> madvise(MADV_COLLAPSE) on program text fails with EINVAL.
> 
> To reproduce, compile the reproducer with
> 
> clang -g -o text-hugepage  text-hugepage.c \
> 	-fuse-ld=lld \
> 	-Wl,-zcommon-page-size=2097152 -Wl,-zmax-page-size=2097152 \
>          -Wl,-z,separate-loadable-segments
> 
> and run:
> 
> $ strace -e trace=madvise ./text-hugepage
> madvise(0x400000, 2097152, MADV_HUGEPAGE) = 0
> madvise(0x400000, 2097152, MADV_POPULATE_READ) = 0
> madvise(0x400000, 2097152, MADV_COLLAPSE) = -1 EINVAL (Invalid argument)
> 
> (the funky linker options are needed to make sure the .text vma spans a
> hugepage).
> 
> 
> I say "possible regression" since I haven't tried it with an older
> kernel, but I believe it worked at some point or other seeing that
> others managed to get it to work.
> 
> ==== text-hugepage.c ====
> #include <stdlib.h>
> #include <stdint.h>
> #include <stdio.h>
> #include <string.h>
> 
> #include <sys/mman.h>
> 
> static
> void
> try_remap_text_segment() {
>      FILE *fp = fopen("/proc/self/maps", "r");
>      if (!fp) {
>          return;
>      }
>      char *buf = NULL;
>      size_t n = 0;
>      while (getline(&buf, &n, fp) >= 0) {
>          char *lstart = buf;
>          char *lmid = strchr(lstart, '-');
>          if (!lmid) {
>              continue;
>          }
>          *lmid++ = '\0';
>          char *lend = strchr(lmid, ' ');
>          if (!lend) {
>              continue;
>          }
>          *lend = '\0';
>          
>          size_t start = strtoul(lstart, NULL, 16);
>          size_t end = strtoul(lmid, NULL, 16);
>          uintptr_t some_text_addr = (uintptr_t)&try_remap_text_segment;
>          if (some_text_addr >= start && some_text_addr < end) {
>              end &= ~(uintptr_t)0x1fffff;
>              madvise((void*)start, end - start, MADV_HUGEPAGE);
>              madvise((void*)start, end - start, MADV_POPULATE_READ);
>              madvise((void*)start, end - start, MADV_COLLAPSE);
>              break;
>          }
>      }
>      free(buf);
>      fclose(fp);
> }
> 
> void
> huge_function() {
>      // Make sure .text has a huge page full of stuff
>      asm volatile (".fill 4000000, 1, 0x90");
> }
> 
> int
> main() {
>      try_remap_text_segment();
> }
> ==== end text-hugepage.c ====
> 

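The reproducer assumes the start of the text mapping is already 2 MB aligned (that is what the linker flags arrange) and only rounds the end down. A slightly more defensive variant, sketched here with a hypothetical helper name and assuming a 2 MB PMD size (4KB base pages), rounds the start up as well, so the range passed to madvise() never covers a partial huge page.

```c
#include <stdint.h>

/* Assumption: 2 MB PMD size, i.e. 4KB base pages. */
#define HUGE_2M ((uintptr_t)2 << 20)

/* Hypothetical helper: shrink [start, end) to the largest sub-range whose
 * bounds are both 2 MB aligned. Stores the aligned start in *aligned_start
 * and returns the length in bytes, or 0 if no full huge page fits. */
uintptr_t hugepage_subrange(uintptr_t start, uintptr_t end,
                            uintptr_t *aligned_start)
{
    uintptr_t mask = HUGE_2M - 1;
    uintptr_t s = (start + mask) & ~mask; /* round start up */
    uintptr_t e = end & ~mask;            /* round end down */

    *aligned_start = s;
    return e > s ? e - s : 0;
}
```

In the reproducer this would be used as `len = hugepage_subrange(start, end, &s);` followed by the three madvise() calls on `(void *)s, len` only when `len` is non-zero.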
I'm able to reproduce the issue with the upstream kernel (v6.12-rc2) on ARM64 with a 4KB base page size.
I looked into the issue because of commit d659b715e94a ("mm/huge_memory: avoid PMD-size page cache
if needed"), which makes madvise(MADV_COLLAPSE) fail with -EINVAL on ARM64 when the base page size
is 64KB.
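The practical difference between the two configurations is the PMD size. Assuming 8-byte page-table entries, as on arm64 and x86-64, one page-table page holds page_size / 8 entries, so a single PMD entry covers page_size * (page_size / 8) bytes: 2 MB with 4KB pages, but 512 MB with 64KB pages. A small sketch of that arithmetic (helper names are hypothetical):

```c
#include <stddef.h>
#include <unistd.h>

/* Bytes mapped by one PMD entry, assuming 8-byte page-table entries:
 * a page-table page holds page_size / 8 PTEs, each mapping one base page. */
size_t pmd_size_for_page_size(size_t page_size)
{
    return page_size * (page_size / 8);
}

/* PMD size for the base page size of the running system (0 on error). */
size_t current_pmd_size(void)
{
    long ps = sysconf(_SC_PAGESIZE);
    return ps > 0 ? pmd_size_for_page_size((size_t)ps) : 0;
}
```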

To reproduce the issue, I have to drop the clean page cache and recompile the test program
before every run.

[root@dhcp-10-26-1-237 issue]# cat Makefile
default:
	@echo 1 > /proc/sys/vm/drop_caches
	@gcc test.c -o test
	./test
[root@dhcp-10-26-1-237 issue]# make
./test
test: test.c:54: try_remap_text_segment: Assertion `ret == 0' failed.      <<< Error from madvise(MADV_COLLAPSE)
make: *** [Makefile:4: default] Aborted (core dumped)

I traced it a bit and found that SCAN_FAIL is returned, as the following call trace shows.
However, the program ("test") is opened read-only, so I don't understand how PG_dirty
gets set.

Backtrace
=========
sys_madvise
   do_madvise
     madvise_behavior_valid
     madvise_walk_vmas
       madvise_vma_behavior
         can_modify_vma_madv
         madvise_collapse
           thp_vma_allowable_order
           hpage_collapse_scan_file
             collapse_file
               folio_test_dirty          # SCAN_FAIL returned here

Snapshot of /proc/`pidof test`/smaps before the call to madvise(MADV_COLLAPSE):

[root@dhcp-10-26-1-237 issue]# cat /proc/`pidof test`/smaps | head -n 25
00400000-00600000 r-xp 00000000 fd:05 101812754                          /home/gavin/sandbox/issue/test
Size:               2048 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                2048 kB
Pss:                2048 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:      2048 kB
Private_Dirty:         0 kB
Referenced:         2048 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           1
VmFlags: rd ex mr mw me hg
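In this snapshot the text VMA is THP-eligible (THPeligible: 1), but FilePmdMapped is 0 kB, so the 2048 kB of text is still mapped with 4 kB PTEs. Assuming a successful collapse of this range would show up as FilePmdMapped: 2048 kB, a small parser such as the hypothetical smaps_field_kb() below can compare snapshots taken before and after the call instead of checking them by eye.

```c
#include <stdlib.h>
#include <string.h>

/* Return the numeric value of 'field' (e.g. "FilePmdMapped") from the
 * text of one smaps entry, or -1 if the field is absent. Requires an
 * exact field name followed by ':', so "Pss" does not match "Pss_Dirty". */
long smaps_field_kb(const char *smaps, const char *field)
{
    size_t flen = strlen(field);
    const char *p = smaps;

    while (p && *p) {
        if (strncmp(p, field, flen) == 0 && p[flen] == ':')
            return strtol(p + flen + 1, NULL, 10); /* skips the padding */
        p = strchr(p, '\n');                       /* advance to next line */
        if (p)
            p++;
    }
    return -1;
}
```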

Thanks,
Gavin
