linux-mm.kvack.org archive mirror
From: Fumin Gao <Fumin.Gao@viavisolutions.com>
To: "mike.kravetz@oracle.com" <mike.kravetz@oracle.com>,
	"songmuchun@bytedance.com" <songmuchun@bytedance.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Report a huge page issue in kernel version v5.19.xx
Date: Mon, 3 Jul 2023 03:07:39 +0000
Message-ID: <MW4PR18MB50831DE59BEA3DF1374DE3798029A@MW4PR18MB5083.namprd18.prod.outlook.com>


[-- Attachment #1.1: Type: text/plain, Size: 3072 bytes --]

Hi,

What's the issue?
Recently, in our product, I found an issue in kernel v5.19.xx. It has been fixed in kernel v6.xx.
The issue is that I cannot use the move_pages system call to find out which NUMA node a huge page is on.

How to reproduce this issue?
My test program is attached to this email (test_programme.c). The relevant part is:
virtaddr = mmap(NULL, ONE_GIG, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
*(char *)virtaddr = 0;
if (syscall(SYS_move_pages, 0, 1, &virtaddr, NULL, &NumaNode, 0) != 0)
{
   printf("Get virtual address %p on NumaNode failed\n", virtaddr);
}
printf("mapped huge page at virtaddr %p on Node %d, errno %d\n", virtaddr, NumaNode, errno);

When tested on kernel v5.19.xx, the value of NumaNode is -2 (-ENOENT), although move_pages itself returns 0.
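For reference: when nodes is NULL, move_pages() moves nothing and fills status[] with either the NUMA node of each page (>= 0) or a negative errno value. The system call itself can still return 0, as the sys_exit line in the trace below shows. So the result has to be read from the status entry, not from the return value. Below is a small sketch of decoding one entry. describe_move_pages_status() is a hypothetical helper for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format one entry of the move_pages() status[]
 * array filled in query mode (nodes == NULL). A value >= 0 is the NUMA
 * node of the page; a negative value is -errno (e.g. -ENOENT, -EFAULT).
 */
static const char *describe_move_pages_status(int status, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	if (status >= 0)
		snprintf(buf, len, "node %d", status);
	else
		snprintf(buf, len, "error %d (%s)", status, strerror(-status));
	return buf;
}
```

With the v5.19.xx result above, describe_move_pages_status(-2, ...) reports the ENOENT message instead of a node number.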

My analysis of this issue:
Based on the trace below and the kernel source code, the call path is:
kernel_move_pages -> do_pages_stat -> do_pages_stat_array -> follow_page -> follow_page_mask
  -> follow_p4d_mask -> follow_pud_mask -> follow_huge_pud

[001] ..... 510329749178328: sys_move_pages(pid: 0, nr_pages: 1, pages: 7fffa23a2c90, nodes: 0, status: 7fffa23a2c9c, flags: 0)
[001] ..... 510329749179360: sys_enter: NR 279 (0, 1, 7fffa23a2c90, 0, 7fffa23a2c9c, 0)
[001] ...1. 510329749185448: mmap_lock_start_locking: mm=00000000e0f35bcd memcg_path=/user.slice/user-1000.slice/session-1.scope write=false
[001] ...1. 510329749187872: mmap_lock_acquire_returned: mm=00000000e0f35bcd memcg_path=/user.slice/user-1000.slice/session-1.scope write=false success=true
[001] ..... 510329749196628: p_follow_page_0: (follow_page+0x0/0xe0)
[001] ..... 510329749199690: p_vma_is_secretmem_0: (vma_is_secretmem+0x0/0x20)
[001] ..... 510329749202194: p_follow_page_mask_0: (follow_page_mask+0x0/0x160)
[001] ..... 510329749206928: p_follow_huge_addr_0: (follow_huge_addr+0x0/0x20)
[001] ..... 510329749210628: myretprobe: (follow_page_mask+0x38/0x160 <- follow_huge_addr) ret=0xffffffffffffffea
[001] ..... 510329749216464: p_follow_pud_mask_isra_0_0: (follow_pud_mask.isra.0+0x0/0x1e0)
[001] ..... 510329749221108: p_follow_huge_pud_0: (follow_huge_pud+0x0/0x80)
[001] ..... 510329749221902: myretprobe: (follow_pud_mask.isra.0+0x1c8/0x1e0 <- follow_huge_pud) ret=0x0
[001] ..... 510329749223462: myretprobe: (follow_page_mask+0x147/0x160 <- follow_pud_mask.isra.0) ret=0x0
[001] ..... 510329749224838: myretprobe: (do_pages_stat+0x18b/0x330 <- follow_page) ret=0x0
[001] ...1. 510329749226096: mmap_lock_released: mm=00000000e0f35bcd memcg_path=/user.slice/user-1000.slice/session-1.scope write=false
[001] ..... 510329749228348: sys_move_pages -> 0x0
[001] ..... 510329749229224: sys_exit: NR 279 = 0
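The ret= values in the retprobe lines use the kernel's ERR_PTR encoding. follow_huge_addr returned 0xffffffffffffffea, which is -22 (-EINVAL), and the walk then continued into follow_pud_mask. follow_huge_pud, follow_pud_mask and follow_page all returned 0x0, i.e. NULL. The sketch below shows the decoding step, assuming the kernel's MAX_ERRNO bound of 4095. retprobe_errno() is a hypothetical helper, not a kernel or ftrace function.

```c
#include <assert.h>
#include <stdint.h>

/* Same bound as the kernel's MAX_ERRNO used by IS_ERR_VALUE(). */
#define MODEL_MAX_ERRNO 4095

/*
 * Hypothetical helper: interpret a raw 64-bit retprobe "ret=" value.
 * Values in the top MAX_ERRNO range are ERR_PTR-encoded negative errno
 * codes; anything else (a valid pointer or NULL) is not an error -> 0.
 */
static int retprobe_errno(uint64_t raw)
{
	if (raw >= (uint64_t)-MODEL_MAX_ERRNO)
		return (int)(int64_t)raw;	/* e.g. 0xffffffffffffffea -> -22 */
	return 0;
}
```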

Compared with v5.18.xx, v5.19.xx adds the FOLL_GET flag to the follow_page() call in do_pages_stat_array:
        page = follow_page(vma, addr, FOLL_GET | FOLL_DUMP);

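However, follow_huge_pud returns NULL whenever flags contains FOLL_GET. do_pages_stat_array turns a NULL page into -ENOENT, so move_pages reports a status of -2 (-ENOENT) for the huge page, even though the page is mapped and has just been touched.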
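The control flow described above can be modelled in a few lines of userspace C. This is a sketch, not kernel code. struct model_page, the MODEL_FOLL_* bits and both functions are hypothetical stand-ins for struct page, FOLL_GET/FOLL_DUMP, follow_huge_pud and do_pages_stat_array, and the bit values are not the kernel's.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative flag bits for this model only, NOT the kernel's FOLL_* values. */
#define MODEL_FOLL_GET  (1u << 0)
#define MODEL_FOLL_DUMP (1u << 1)

/* Hypothetical stand-in for struct page: only the node id matters here. */
struct model_page {
	int nid;
};

/* Models the reported v5.19 follow_huge_pud() behaviour: a lookup that
 * asks for a page reference (FOLL_GET) on a PUD-mapped page gets NULL. */
static struct model_page *model_follow_huge_pud(struct model_page *pud_page,
						unsigned int flags)
{
	if (flags & MODEL_FOLL_GET)
		return NULL;
	return pud_page;
}

/* Models how do_pages_stat_array() turns the lookup result into a status:
 * the page's node id on success, -ENOENT when no page was returned. */
static int model_page_status(struct model_page *pud_page, unsigned int flags)
{
	struct model_page *page;

	assert(pud_page != NULL);	/* the huge page has been faulted in */
	page = model_follow_huge_pud(pud_page, flags);
	return page ? page->nid : -ENOENT;
}
```

In this model, passing FOLL_DUMP alone (as in the v5.18.xx call) yields the node id. Passing FOLL_GET | FOLL_DUMP (the v5.19.xx call) yields -ENOENT, which matches the -2 seen in the reproducer.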


Is my analysis correct?

Thanks
Fumin



[-- Attachment #2: test_programme.c --]
[-- Type: text/plain, Size: 1003 bytes --]


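#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/syscall.h>

/* Request 1 GiB (PUD-level) huge pages explicitly, in case the libc
 * headers do not define these (values from the Linux UAPI encoding). */
#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT        26
#endif
#ifndef MAP_HUGE_1GB
#define MAP_HUGE_1GB          (30 << MAP_HUGE_SHIFT)
#endif

#define ONE_MEG               (1024UL * 1024UL)
#define ONE_GIG               (1024UL * ONE_MEG)

int main(void)
{
	int   NumaNode;	/* move_pages status: node id, or -errno */
	void *virtaddr;

	/* Needs free 1 GiB huge pages, e.g. reserved via
	 * /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages */
	virtaddr = mmap(NULL, ONE_GIG, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_1GB, -1, 0);
	if (virtaddr == MAP_FAILED) {
		perror("mmap");
		exit(1);
	}

	/* Touch the page so that it is actually faulted in. */
	*(char *)virtaddr = 0;

	/* nodes == NULL: query only, the node of the page goes to NumaNode. */
	if (syscall(SYS_move_pages, 0, 1UL, &virtaddr, NULL, &NumaNode, 0) != 0) {
		perror("move_pages");
		exit(1);
	}

	/* The syscall can return 0 while the per-page status is an error. */
	if (NumaNode < 0)
		printf("virtaddr %p: status %d (%s)\n", virtaddr, NumaNode, strerror(-NumaNode));
	else
		printf("virtaddr %p is on Node %d\n", virtaddr, NumaNode);

	munmap(virtaddr, ONE_GIG);
	return 0;
}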

Thread overview: 3+ messages
2023-07-03  3:07 Fumin Gao [this message]
2023-07-03  8:51 ` Muchun Song
2023-07-03  8:58   ` Fumin Gao
