* Possible regression with file madvise(MADV_COLLAPSE)
@ 2024-10-09 15:54 Avi Kivity
2024-10-11 6:32 ` Gavin Shan
2024-10-11 22:29 ` Yang Shi
0 siblings, 2 replies; 10+ messages in thread
From: Avi Kivity @ 2024-10-09 15:54 UTC
To: linux-mm
On Linux 6.10.10 with CONFIG_READ_ONLY_THP_FOR_FS=y,
madvise(MADV_COLLAPSE) on program text fails with EINVAL.
To reproduce, compile the reproducer with
clang -g -o text-hugepage text-hugepage.c \
-fuse-ld=lld \
-Wl,-zcommon-page-size=2097152 -Wl,-zmax-page-size=2097152 \
-Wl,-z,separate-loadable-segments
and run:
$ strace -e trace=madvise ./text-hugepage
madvise(0x400000, 2097152, MADV_HUGEPAGE) = 0
madvise(0x400000, 2097152, MADV_POPULATE_READ) = 0
madvise(0x400000, 2097152, MADV_COLLAPSE) = -1 EINVAL (Invalid argument)
(the funky linker options are needed to make sure the .text vma spans a
hugepage).
I say "possible regression" since I haven't tried it with an older
kernel, but I believe it worked at some point or other seeing that
others managed to get it to work.
==== text-hugepage.c ====
#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#include <sys/mman.h>

static
void
try_remap_text_segment() {
    FILE *fp = fopen("/proc/self/maps", "r");
    if (!fp) {
        return;
    }
    char *buf = NULL;
    size_t n;
    while (getline(&buf, &n, fp) >= 0) {
        // Each line starts with "start-end perms ..."; split out both addresses
        char *lstart = buf;
        char *lmid = strchr(lstart, '-');
        if (!lmid) {
            continue;
        }
        *lmid++ = '\0';
        char *lend = strchr(lmid, ' ');
        if (!lend) {
            continue;
        }
        *lend = '\0';

        size_t start = strtoul(lstart, NULL, 16);
        size_t end = strtoul(lmid, NULL, 16);
        uintptr_t some_text_addr = (uintptr_t)&try_remap_text_segment;
        // Only act on the mapping that contains this function, i.e. .text
        if (some_text_addr >= start && some_text_addr < end) {
            // Round the end down to a 2MB boundary
            end &= ~(uintptr_t)0x1fffff;
            madvise((void*)start, end - start, MADV_HUGEPAGE);
            madvise((void*)start, end - start, MADV_POPULATE_READ);
            madvise((void*)start, end - start, MADV_COLLAPSE);
            break;
        }
    }
    free(buf);
    fclose(fp);
}

void
huge_function() {
    // Make sure .text has a huge page full of stuff
    asm volatile (".fill 4000000, 1, 0x90");
}

int
main() {
    try_remap_text_segment();
}
==== end text-hugepage.c ====
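The reproducer only rounds the end of the mapping down and relies on the linker options to align the start. As a minimal sketch of the address arithmetic (the helper name `hpage_aligned_range` and the `HPAGE_SIZE` macro are made up for illustration, and a 2MB PMD size is assumed), both bounds can be aligned like this:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 2MB PMD-sized huge pages (x86-64, or arm64 with 4KB pages). */
#define HPAGE_SIZE ((uintptr_t)2 * 1024 * 1024)

/*
 * Hypothetical helper: shrink [start, end) to the largest sub-range whose
 * bounds are both huge-page aligned. Returns the length in bytes, or 0
 * when the range does not contain a whole huge page.
 */
static size_t hpage_aligned_range(uintptr_t start, uintptr_t end,
                                  uintptr_t *aligned_start)
{
    uintptr_t lo = (start + HPAGE_SIZE - 1) & ~(HPAGE_SIZE - 1); /* round up */
    uintptr_t hi = end & ~(HPAGE_SIZE - 1);                      /* round down */

    if (lo >= hi)
        return 0;
    *aligned_start = lo;
    return (size_t)(hi - lo);
}
```

With the layout from the strace output (0x400000 to 0x600000) this returns the full 2097152 bytes unchanged; for a mapping that does not start on a 2MB boundary it skips the partial leading page instead.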
* Re: Possible regression with file madvise(MADV_COLLAPSE)
2024-10-09 15:54 Possible regression with file madvise(MADV_COLLAPSE) Avi Kivity
@ 2024-10-11 6:32 ` Gavin Shan
2024-10-11 22:29 ` Yang Shi
1 sibling, 0 replies; 10+ messages in thread
From: Gavin Shan @ 2024-10-11 6:32 UTC
To: Avi Kivity, linux-mm
Hi Avi,
On 10/10/24 1:54 AM, Avi Kivity wrote:
> On Linux 6.10.10 with CONFIG_READ_ONLY_THP_FOR_FS=y,
> madvise(MADV_COLLAPSE) on program text fails with EINVAL.
>
> To reproduce, compile the reproducer with
>
> clang -g -o text-hugepage text-hugepage.c \
> -fuse-ld=lld \
> -Wl,-zcommon-page-size=2097152 -Wl,-zmax-page-size=2097152 \
> -Wl,-z,separate-loadable-segments
>
> and run:
>
> $ strace -e trace=madvise ./text-hugepage
> madvise(0x400000, 2097152, MADV_HUGEPAGE) = 0
> madvise(0x400000, 2097152, MADV_POPULATE_READ) = 0
> madvise(0x400000, 2097152, MADV_COLLAPSE) = -1 EINVAL (Invalid
> argument)
>
> (the funky linker options are needed to make sure the .text vma spans a
> hugepage).
>
>
> I say "possible regression" since I haven't tried it with an older
> kernel, but I believe it worked at some point or other seeing that
> others managed to get it to work.
>
I'm able to reproduce the issue with the upstream kernel (v6.12-rc2) on ARM64 with
a 4KB base page size. I looked into it because of commit d659b715e94a
("mm/huge_memory: avoid PMD-size page cache if needed"), which makes
madvise(MADV_COLLAPSE) return -EINVAL on ARM64 when the base page size is 64KB.
To reproduce the issue, I have to drop the clean page cache and recompile
the test program every time.
[root@dhcp-10-26-1-237 issue]# cat Makefile
default:
	@echo 1 > /proc/sys/vm/drop_caches
	@gcc test.c -o test
	./test
[root@dhcp-10-26-1-237 issue]# make
./test
test: test.c:54: try_remap_text_segment: Assertion `ret == 0' failed. <<< Error from madvise(MADV_COLLAPSE)
make: *** [Makefile:4: default] Aborted (core dumped)
I traced it a bit and found that SCAN_FAIL is returned, as the call trace below
indicates. However, the program ("test") is opened read-only, so I don't
understand how PG_dirty gets set.
Backtrace
=========
sys_madvise
  do_madvise
    madvise_behavior_valid
    madvise_walk_vmas
      madvise_vma_behavior
        can_modify_vma_madv
        madvise_collapse
          thp_vma_allowable_order
          hpage_collapse_scan_file
            collapse_file
              folio_test_dirty      # SCAN_FAIL returned here
Snapshot of /proc/`pidof test`/smaps before calling madvise(MADV_COLLAPSE):
[root@dhcp-10-26-1-237 issue]# cat /proc/`pidof test`/smaps | head -n 25
00400000-00600000 r-xp 00000000 fd:05 101812754 /home/gavin/sandbox/issue/test
Size: 2048 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 2048 kB
Pss: 2048 kB
Pss_Dirty: 0 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 2048 kB
Private_Dirty: 0 kB
Referenced: 2048 kB
Anonymous: 0 kB
KSM: 0 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
FilePmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 0 kB
THPeligible: 1
VmFlags: rd ex mr mw me hg
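The FilePmdMapped field in the dump above is the one that should become non-zero once a file-backed range has been collapsed and mapped with a PMD. A rough sketch of reading it programmatically follows; the function name `file_pmd_mapped_kb` is invented for illustration, and it simply sums the field over every mapping in whatever smaps-formatted stream it is given:

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: sum the "FilePmdMapped:" fields (in kB) of every
 * mapping in an smaps-formatted stream. Returns -1 if the field never
 * appears (for example on a kernel that does not report it).
 */
static long file_pmd_mapped_kb(FILE *fp)
{
    char *line = NULL;
    size_t cap = 0;
    long total = -1;

    while (getline(&line, &cap, fp) >= 0) {
        long kb;
        /* Lines look like "FilePmdMapped:      2048 kB" */
        if (sscanf(line, "FilePmdMapped: %ld kB", &kb) == 1)
            total = (total < 0 ? 0 : total) + kb;
    }
    free(line);
    return total;
}
```

Calling it on `fopen("/proc/self/smaps", "r")` before and after the madvise() calls would show whether the collapse took effect for the process as a whole; narrowing it to a single mapping would need extra parsing of the address-range header lines.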
Thanks,
Gavin
* Re: Possible regression with file madvise(MADV_COLLAPSE)
2024-10-09 15:54 Possible regression with file madvise(MADV_COLLAPSE) Avi Kivity
2024-10-11 6:32 ` Gavin Shan
@ 2024-10-11 22:29 ` Yang Shi
2024-10-12 15:38 ` Avi Kivity
1 sibling, 1 reply; 10+ messages in thread
From: Yang Shi @ 2024-10-11 22:29 UTC
To: Avi Kivity; +Cc: linux-mm
On Wed, Oct 9, 2024 at 9:04 AM Avi Kivity <avi@scylladb.com> wrote:
>
> On Linux 6.10.10 with CONFIG_READ_ONLY_THP_FOR_FS=y,
> madvise(MADV_COLLAPSE) on program text fails with EINVAL.
>
> To reproduce, compile the reproducer with
>
> clang -g -o text-hugepage text-hugepage.c \
> -fuse-ld=lld \
> -Wl,-zcommon-page-size=2097152 -Wl,-zmax-page-size=2097152 \
> -Wl,-z,separate-loadable-segments
>
> and run:
Didn't clang make the page cache dirty?
Running sync between the clang build and the execution made the problem go away
for me.
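For anyone who wants the write-back to happen from inside a test harness rather than via the sync command, here is a minimal sketch. The function name `flush_file` is made up, and whether flushing actually leaves the page cache clean enough for the collapse is an assumption that depends on the filesystem and kernel:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to write back any dirty page-cache
 * pages of the file at `path`. Returns 0 on success or a negative errno.
 */
static int flush_file(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -errno;

    int ret = (fsync(fd) == 0) ? 0 : -errno;
    close(fd);
    return ret;
}
```

Running `flush_file("./text-hugepage")` right after the build step would be the per-file counterpart of a system-wide sync.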
* Re: Possible regression with file madvise(MADV_COLLAPSE)
2024-10-11 22:29 ` Yang Shi
@ 2024-10-12 15:38 ` Avi Kivity
2024-10-12 20:05 ` Yang Shi
0 siblings, 1 reply; 10+ messages in thread
From: Avi Kivity @ 2024-10-12 15:38 UTC
To: Yang Shi; +Cc: linux-mm
On Fri, 2024-10-11 at 15:29 -0700, Yang Shi wrote:
> On Wed, Oct 9, 2024 at 9:04 AM Avi Kivity <avi@scylladb.com> wrote:
> >
> > On Linux 6.10.10 with CONFIG_READ_ONLY_THP_FOR_FS=y,
> > madvise(MADV_COLLAPSE) on program text fails with EINVAL.
> >
> > To reproduce, compile the reproducer with
> >
> > clang -g -o text-hugepage text-hugepage.c \
> > -fuse-ld=lld \
> > -Wl,-zcommon-page-size=2097152 -Wl,-zmax-page-size=2097152
> > \
> > -Wl,-z,separate-loadable-segments
> >
> > and run:
>
> Didn't clang make the page cache dirty?
>
> Having sync between clang and the execution made the problem go away
> for me.
>
I see it even with sync (and msync just before the madvise calls).
Tracing shows this (last lines before syscall exit):
 | hpage_collapse_scan_file() {
 |   __rcu_read_lock();
 |   __rcu_read_unlock();
 | }
So it's not clear what the root cause is.
* Re: Possible regression with file madvise(MADV_COLLAPSE)
2024-10-12 15:38 ` Avi Kivity
@ 2024-10-12 20:05 ` Yang Shi
2024-10-12 20:24 ` Avi Kivity
0 siblings, 1 reply; 10+ messages in thread
From: Yang Shi @ 2024-10-12 20:05 UTC
To: Avi Kivity; +Cc: linux-mm
On Sat, Oct 12, 2024 at 8:38 AM Avi Kivity <avi@scylladb.com> wrote:
>
> On Fri, 2024-10-11 at 15:29 -0700, Yang Shi wrote:
> > On Wed, Oct 9, 2024 at 9:04 AM Avi Kivity <avi@scylladb.com> wrote:
> > >
> > > On Linux 6.10.10 with CONFIG_READ_ONLY_THP_FOR_FS=y,
> > > madvise(MADV_COLLAPSE) on program text fails with EINVAL.
> > >
> > > To reproduce, compile the reproducer with
> > >
> > > clang -g -o text-hugepage text-hugepage.c \
> > > -fuse-ld=lld \
> > > -Wl,-zcommon-page-size=2097152 -Wl,-zmax-page-size=2097152
> > > \
> > > -Wl,-z,separate-loadable-segments
> > >
> > > and run:
> >
> > Didn't clang make the page cache dirty?
> >
> > Having sync between clang and the execution made the problem go away
> > for me.
> >
>
> I see it even with sync (and msync just before the madvise calls).
Did you stop khugepaged? It may race with MADV_COLLAPSE. If it failed
due to a race with khugepaged, though, you should see -EAGAIN instead of
-EINVAL.
I ran the commands below in a loop 1000 times and they never failed (I
modified the test program slightly to print a message if
MADV_COLLAPSE fails). I had khugepaged stopped and ran the
test on a v6.12-rc1 kernel on my AmpereOne machine.
rm text-hugepage
clang -g -o text-hugepage text-hugepage.c -fuse-ld=lld \
    -Wl,-zcommon-page-size=2097152 -Wl,-zmax-page-size=2097152 \
    -Wl,-z,separate-loadable-segments
sync
./text-hugepage
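The modified test program is not shown in this thread, so the following is only a guess at its shape: a wrapper (the name `try_collapse` and the retry count are invented) that prints a message when MADV_COLLAPSE fails and retries a bounded number of times on EAGAIN, which is the errno a race with khugepaged would be expected to produce.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25 /* value from the Linux uapi headers, for older libc */
#endif

/*
 * Hypothetical wrapper: call madvise(MADV_COLLAPSE), retrying up to
 * max_retries times on EAGAIN. Returns 0 on success or a negative errno
 * after printing the failure to stderr.
 */
static int try_collapse(void *addr, size_t len, int max_retries)
{
    for (int attempt = 0; ; attempt++) {
        if (madvise(addr, len, MADV_COLLAPSE) == 0)
            return 0;

        int err = errno;
        if (err == EAGAIN && attempt < max_retries)
            continue; /* possibly transient; try again */

        fprintf(stderr, "MADV_COLLAPSE(%p, %zu) failed: %s\n",
                addr, len, strerror(err));
        return -err;
    }
}
```

In the reproducer this would replace the bare `madvise(..., MADV_COLLAPSE)` call, so that an -EINVAL like the one reported here shows up without needing strace.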
>
>
> Tracing shows this (last lines before syscall exit):
>
> | hpage_collapse_scan_file() {
> | __rcu_read_lock();
> | __rcu_read_unlock();
> | }
That means collapse_file() was not called at all;
hpage_collapse_scan_file() itself failed. Several things can make it fail,
for example an unexpected refcount or a page that is not on the LRU. You can
trace huge_memory:mm_khugepaged_scan_file to get more information about the
failure.
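One way to switch that tracepoint on from a test program is to write to its enable file in tracefs. This is a sketch only: the mount point /sys/kernel/tracing is an assumption (some systems expose it under /sys/kernel/debug/tracing instead), the helper name `set_trace_event` is made up, and root privileges are normally required.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Assumed tracefs location of the event's enable file. */
#define SCAN_FILE_EVENT_ENABLE \
    "/sys/kernel/tracing/events/huge_memory/mm_khugepaged_scan_file/enable"

/*
 * Hypothetical helper: write "1" or "0" to a trace event's enable file.
 * Returns 0 on success or a negative errno.
 */
static int set_trace_event(const char *enable_path, int on)
{
    int fd = open(enable_path, O_WRONLY);
    if (fd < 0)
        return -errno;

    ssize_t n = write(fd, on ? "1" : "0", 1);
    int ret = (n == 1) ? 0 : (n < 0 ? -errno : -EIO);
    close(fd);
    return ret;
}
```

After `set_trace_event(SCAN_FILE_EVENT_ENABLE, 1)` and a run of the reproducer, the event should appear in the trace buffer (the `trace` file in the same tracefs mount).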
>
>
> so, it's not clear what the root cause is.
>
* Re: Possible regression with file madvise(MADV_COLLAPSE)
2024-10-12 20:05 ` Yang Shi
@ 2024-10-12 20:24 ` Avi Kivity
2024-10-12 23:50 ` Yang Shi
0 siblings, 1 reply; 10+ messages in thread
From: Avi Kivity @ 2024-10-12 20:24 UTC
To: Yang Shi; +Cc: linux-mm
On Sat, 2024-10-12 at 13:05 -0700, Yang Shi wrote:
> On Sat, Oct 12, 2024 at 8:38 AM Avi Kivity <avi@scylladb.com> wrote:
> >
> > On Fri, 2024-10-11 at 15:29 -0700, Yang Shi wrote:
> > > On Wed, Oct 9, 2024 at 9:04 AM Avi Kivity <avi@scylladb.com>
> > > wrote:
> > > >
> > > > On Linux 6.10.10 with CONFIG_READ_ONLY_THP_FOR_FS=y,
> > > > madvise(MADV_COLLAPSE) on program text fails with EINVAL.
> > > >
> > > > To reproduce, compile the reproducer with
> > > >
> > > > clang -g -o text-hugepage text-hugepage.c \
> > > > -fuse-ld=lld \
> > > > -Wl,-zcommon-page-size=2097152 -Wl,-zmax-page-
> > > > size=2097152
> > > > \
> > > > -Wl,-z,separate-loadable-segments
> > > >
> > > > and run:
> > >
> > > Didn't clang make the page cache dirty?
> > >
> > > Having sync between clang and the execution made the problem go
> > > away
> > > for me.
> > >
> >
> > I see it even with sync (and msync just before the madvise calls).
>
> Did you stop khugepaged? It may race with MADV_COLLAPSE. If it failed
> due to race with khugepaged, you should see -EAGAIN instead of
> -EINVAL.
I did not, but I don't imagine I hit the race in all my attempts.
>
> I did the below commands in a loop for 1000 times, it never failed (I
> modified the test program a little bit to print out failure if
> MADV_COLLAPSE returns failure). I had khugepaged stopped and ran the
> test on v6.12-rc1 kernel on my AmpereOne machine.
>
> rm text-hugepage
> clang -g -o text-hugepage text-hugepage.c -fuse-ld=lld
> -Wl,-zcommon-page-size=2097152 -Wl,-zmax-page-size=2097152
> -Wl,-z,separate-loadable-segments
> sync
> ./text-hugepage
>
> >
> >
> > Tracing shows this (last lines before syscall exit):
> >
> > > hpage_collapse_scan_file() {
> > > __rcu_read_lock();
> > > __rcu_read_unlock();
> > > }
>
> It meant collapse_file() was not called at all.
> hpage_collapse_scan_file() failed. A couple of reasons may fail it,
> for example, refcount is not expected, not on lru, etc. You can trace
> huge_memory:mm_khugepaged_scan_file to get more information about the
> failure.
text-hugepage-689146 [023] 200457.073794: mm_khugepaged_scan_file:
mm=0xffff92fc512aac00, scan_pfn=0x5a4310, filename=text-hugepage,
present=0, swap=0, result=page_compound
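The interesting part of that event is the result= field. If the trace output is being post-processed in a test harness, a small sketch like the one below can pull it out; `trace_result` is a made-up name, and the parsing assumes the comma-separated key=value layout shown above.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the value of the "result=" field of an
 * mm_khugepaged_scan_file trace line into `out`. Returns 0 on success,
 * -1 if the field is missing or does not fit.
 */
static int trace_result(const char *event, char *out, size_t outsz)
{
    const char *p = strstr(event, "result=");
    if (!p || outsz == 0)
        return -1;

    p += strlen("result=");
    size_t n = strcspn(p, ", \t\r\n"); /* value ends at a separator */
    if (n == 0 || n >= outsz)
        return -1;

    memcpy(out, p, n);
    out[n] = '\0';
    return 0;
}
```

A file name that itself contains the text "result=" would confuse this simple search, so it is only suitable for controlled test setups.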
* Re: Possible regression with file madvise(MADV_COLLAPSE)
2024-10-12 20:24 ` Avi Kivity
@ 2024-10-12 23:50 ` Yang Shi
2024-10-13 11:04 ` Avi Kivity
0 siblings, 1 reply; 10+ messages in thread
From: Yang Shi @ 2024-10-12 23:50 UTC
To: Avi Kivity; +Cc: linux-mm, Baolin Wang
On Sat, Oct 12, 2024 at 1:24 PM Avi Kivity <avi@scylladb.com> wrote:
>
> On Sat, 2024-10-12 at 13:05 -0700, Yang Shi wrote:
> > On Sat, Oct 12, 2024 at 8:38 AM Avi Kivity <avi@scylladb.com> wrote:
> > >
> > > On Fri, 2024-10-11 at 15:29 -0700, Yang Shi wrote:
> > > > On Wed, Oct 9, 2024 at 9:04 AM Avi Kivity <avi@scylladb.com>
> > > > wrote:
> > > > >
> > > > > On Linux 6.10.10 with CONFIG_READ_ONLY_THP_FOR_FS=y,
> > > > > madvise(MADV_COLLAPSE) on program text fails with EINVAL.
> > > > >
> > > > > To reproduce, compile the reproducer with
> > > > >
> > > > > clang -g -o text-hugepage text-hugepage.c \
> > > > > -fuse-ld=lld \
> > > > > -Wl,-zcommon-page-size=2097152 -Wl,-zmax-page-
> > > > > size=2097152
> > > > > \
> > > > > -Wl,-z,separate-loadable-segments
> > > > >
> > > > > and run:
> > > >
> > > > Didn't clang make the page cache dirty?
> > > >
> > > > Having sync between clang and the execution made the problem go
> > > > away
> > > > for me.
> > > >
> > >
> > > I see it even with sync (and msync just before the madvise calls).
> >
> > Did you stop khugepaged? It may race with MADV_COLLAPSE. If it failed
> > due to race with khugepaged, you should see -EAGAIN instead of
> > -EINVAL.
>
>
> I did not, but I don't imagine I hit the race in all my attempts.
>
> >
> > I did the below commands in a loop for 1000 times, it never failed (I
> > modified the test program a little bit to print out failure if
> > MADV_COLLAPSE returns failure). I had khugepaged stopped and ran the
> > test on v6.12-rc1 kernel on my AmpereOne machine.
> >
> > rm text-hugepage
> > clang -g -o text-hugepage text-hugepage.c -fuse-ld=lld
> > -Wl,-zcommon-page-size=2097152 -Wl,-zmax-page-size=2097152
> > -Wl,-z,separate-loadable-segments
> > sync
> > ./text-hugepage
> >
> > >
> > >
> > > Tracing shows this (last lines before syscall exit):
> > >
> > > > hpage_collapse_scan_file() {
> > > > __rcu_read_lock();
> > > > __rcu_read_unlock();
> > > > }
> >
> > It meant collapse_file() was not called at all.
> > hpage_collapse_scan_file() failed. A couple of reasons may fail it,
> > for example, refcount is not expected, not on lru, etc. You can trace
> > huge_memory:mm_khugepaged_scan_file to get more information about the
> > failure.
>
>
> text-hugepage-689146 [023] 200457.073794: mm_khugepaged_scan_file:
> mm=0xffff92fc512aac00, scan_pfn=0x5a4310, filename=text-hugepage,
> present=0, swap=0, result=page_compound
Aha, that is because v6.10 doesn't support collapsing non-PMD-order large
folios. This has been fixed in v6.12-rc1. The patch series is:
https://lore.kernel.org/all/cover.1724140601.git.baolin.wang@linux.alibaba.com/
The subject says "shmem", but it actually works for regular files too.
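A program that wants to decide at startup whether attempting the collapse is worthwhile could compare the running kernel's version against 6.12. This is only a heuristic sketch with invented helper names (`parse_release`, `kernel_at_least`); distribution kernels may backport the series, so the version number alone is not proof either way.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Hypothetical helper: parse "major.minor" from a release string such as
 * "6.10.10" or "6.12.0-rc2". Returns 0 on success, -1 on failure. */
static int parse_release(const char *release, int *major, int *minor)
{
    return sscanf(release, "%d.%d", major, minor) == 2 ? 0 : -1;
}

/* Hypothetical helper: 1 if the running kernel is at least
 * want_major.want_minor, 0 if it is older, -1 if that cannot be told. */
static int kernel_at_least(int want_major, int want_minor)
{
    struct utsname u;
    int major, minor;

    if (uname(&u) != 0 || parse_release(u.release, &major, &minor) != 0)
        return -1;
    return major > want_major ||
           (major == want_major && minor >= want_minor);
}
```

For this thread the check would be `kernel_at_least(6, 12)`; on a negative answer the program can skip MADV_COLLAPSE or just treat -EINVAL as expected.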
>
>
* Re: Possible regression with file madvise(MADV_COLLAPSE)
2024-10-12 23:50 ` Yang Shi
@ 2024-10-13 11:04 ` Avi Kivity
2024-10-13 13:25 ` Avi Kivity
0 siblings, 1 reply; 10+ messages in thread
From: Avi Kivity @ 2024-10-13 11:04 UTC
To: Yang Shi; +Cc: linux-mm, Baolin Wang
On Sat, 2024-10-12 at 16:50 -0700, Yang Shi wrote:
> On Sat, Oct 12, 2024 at 1:24 PM Avi Kivity <avi@scylladb.com> wrote:
> >
> > On Sat, 2024-10-12 at 13:05 -0700, Yang Shi wrote:
> > > On Sat, Oct 12, 2024 at 8:38 AM Avi Kivity <avi@scylladb.com>
> > > wrote:
> > > >
> > > > On Fri, 2024-10-11 at 15:29 -0700, Yang Shi wrote:
> > > > > On Wed, Oct 9, 2024 at 9:04 AM Avi Kivity <avi@scylladb.com>
> > > > > wrote:
> > > > > >
> > > > > > On Linux 6.10.10 with CONFIG_READ_ONLY_THP_FOR_FS=y,
> > > > > > madvise(MADV_COLLAPSE) on program text fails with EINVAL.
> > > > > >
> > > > > > To reproduce, compile the reproducer with
> > > > > >
> > > > > > clang -g -o text-hugepage text-hugepage.c \
> > > > > > -fuse-ld=lld \
> > > > > > -Wl,-zcommon-page-size=2097152 -Wl,-zmax-page-
> > > > > > size=2097152
> > > > > > \
> > > > > > -Wl,-z,separate-loadable-segments
> > > > > >
> > > > > > and run:
> > > > >
> > > > > Didn't clang make the page cache dirty?
> > > > >
> > > > > Having sync between clang and the execution made the problem
> > > > > go
> > > > > away
> > > > > for me.
> > > > >
> > > >
> > > > I see it even with sync (and msync just before the madvise
> > > > calls).
> > >
> > > Did you stop khugepaged? It may race with MADV_COLLAPSE. If it
> > > failed
> > > due to race with khugepaged, you should see -EAGAIN instead of
> > > -EINVAL.
> >
> >
> > I did not, but I don't imagine I hit the race in all my attempts.
> >
> > >
> > > I did the below commands in a loop for 1000 times, it never
> > > failed (I
> > > modified the test program a little bit to print out failure if
> > > MADV_COLLAPSE returns failure). I had khugepaged stopped and ran
> > > the
> > > test on v6.12-rc1 kernel on my AmpereOne machine.
> > >
> > > rm text-hugepage
> > > clang -g -o text-hugepage text-hugepage.c -fuse-ld=lld
> > > -Wl,-zcommon-page-size=2097152 -Wl,-zmax-page-size=2097152
> > > -Wl,-z,separate-loadable-segments
> > > sync
> > > ./text-hugepage
> > >
> > > >
> > > >
> > > > Tracing shows this (last lines before syscall exit):
> > > >
> > > > > hpage_collapse_scan_file() {
> > > > > __rcu_read_lock();
> > > > > __rcu_read_unlock();
> > > > > }
> > >
> > > It meant collapse_file() was not called at all.
> > > hpage_collapse_scan_file() failed. A couple of reasons may fail
> > > it,
> > > for example, refcount is not expected, not on lru, etc. You can
> > > trace
> > > huge_memory:mm_khugepaged_scan_file to get more information about
> > > the
> > > failure.
> >
> >
> > text-hugepage-689146 [023] 200457.073794:
> > mm_khugepaged_scan_file:
> > mm=0xffff92fc512aac00, scan_pfn=0x5a4310, filename=text-hugepage,
> > present=0, swap=0, result=page_compound
>
> Aha, it is because v6.10 doesn't support collapse non-PMD order large
> folios. It has been fixed in v6.12-rc1. The patch series is:
> https://lore.kernel.org/all/cover.1724140601.git.baolin.wang@linux.al
> ibaba.com/
>
> The subject says "shmem", but it actually works for regular files
> too.
Thanks a lot. I will retest when 6.12 reaches Fedora testing.
* Re: Possible regression with file madvise(MADV_COLLAPSE)
2024-10-13 11:04 ` Avi Kivity
@ 2024-10-13 13:25 ` Avi Kivity
2024-10-14 22:06 ` Yang Shi
0 siblings, 1 reply; 10+ messages in thread
From: Avi Kivity @ 2024-10-13 13:25 UTC
To: Yang Shi; +Cc: linux-mm, Baolin Wang
On Sun, 2024-10-13 at 14:04 +0300, Avi Kivity wrote:
> > >
> > > text-hugepage-689146 [023] 200457.073794:
> > > mm_khugepaged_scan_file:
> > > mm=0xffff92fc512aac00, scan_pfn=0x5a4310, filename=text-hugepage,
> > > present=0, swap=0, result=page_compound
> >
> > Aha, it is because v6.10 doesn't support collapse non-PMD order
> > large
> > folios. It has been fixed in v6.12-rc1. The patch series is:
> > https://lore.kernel.org/all/cover.1724140601.git.baolin.wang@linux.
> > al
> > ibaba.com/
> >
> > The subject says "shmem", but it actually works for regular files
> > too.
>
>
>
> Thanks a lot. I will retest when 6.12 reaches Fedora testing.
>
It is available, so I retested it (6.12-rc2). I confirm it works (and
delivers a nice performance improvement).