From: "Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com>
To: "linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>
Cc: "dan.j.williams@intel.com" <dan.j.williams@intel.com>,
"Yasunori Gotou (Fujitsu)" <y-goto@fujitsu.com>,
"david@redhat.com >> David Hildenbrand" <david@redhat.com>,
Oscar Salvador <osalvador@suse.de>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"Xingtao Yao (Fujitsu)" <yaoxt.fnst@fujitsu.com>
Subject: Re: [BUG ?] Offline Memory gets stuck in offline_pages()
Date: Thu, 4 Jul 2024 07:43:55 +0000
Message-ID: <5a4ef056-73c7-42e9-a839-43d42f8b7eab@fujitsu.com>
In-Reply-To: <6a07125f-e720-404c-b2f9-e55f3f166e85@fujitsu.com>
All,
Some progress updates:
When the issue occurs, calling __drain_all_pages() lets offline_pages() escape from the outer loop.
>
> Jun 28 15:29:26 linux kernel: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7980dd
> Jun 28 15:29:26 linux kernel: flags: 0x9fffffc0000000(node=2|zone=3|lastcpupid=0x1fffff)
> Jun 28 15:29:26 linux kernel: raw: 009fffffc0000000 ffffdfbd9e603788 ffffd4f0ffd97ef0 0000000000000000
> Jun 28 15:29:26 linux kernel: raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
> Jun 28 15:29:26 linux kernel: page dumped because: trouble page...
>
Judging from the contents of this problematic page structure, the
list_head = {ffffdfbd9e603788, ffffd4f0ffd97ef0} appears to be valid.
I guessed that it was linked on a pcp_list, so I dumped per_cpu_pages[cpu].count
at each of the critical points.
An example is shown below:
offline_pages()
{
    // per_cpu_pages[1].count = 0
    zone_pcp_disable() // will call __drain_all_pages()
    // per_cpu_pages[1].count = 188
    do {
        do {
            scan_movable_pages()
            ret = do_migrate_range()
        } while (!ret)
        ret = test_pages_isolated()
        if (is the 1st iteration)
            // per_cpu_pages[1].count = 182
        if (issue occurs) { /* if the loop takes longer than 10 seconds */
            // per_cpu_pages[1].count = 61
            __drain_all_pages()
            // per_cpu_pages[1].count = 0
            /* will escape from the outer loop in later iterations */
        }
    } while (ret)
}
Some interesting points:
- After the 1st __drain_all_pages(), per_cpu_pages[1].count increased from 0 to 188.
  Does that mean it is racing with something?
- per_cpu_pages[1].count decreases during the iterations, but never reaches 0.
- When the issue occurs, calling __drain_all_pages() brings per_cpu_pages[1].count down to 0.
So I wonder whether it is fine to call __drain_all_pages() inside the loop?
Looking forward to your insights.
Thanks
Zhijian
On 01/07/2024 09:25, Zhijian Li (Fujitsu) wrote:
> Hi all
>
>
> Overview:
> While testing CXL memory hot-remove, we noticed that `daxctl offline-memory dax0.0`
> would sometimes get stuck forever. `daxctl offline-memory dax0.0` writes "offline" to
> /sys/devices/system/memory/memoryNNN/state.
>
> Workaround:
> When it happens, we can type Ctrl-C to abort it and then retry.
> The CXL memory can then be offlined successfully.
>
> Where the kernel gets stuck:
> After digging into the kernel, we found that when the issue occurs, the kernel
> is stuck in the outer loop of offline_pages(). Below is a piece of the
> highlighted offline_pages():
>
> ```
> int __ref offline_pages()
> {
>     do { // outer loop
>         pfn = start_pfn;
>         do {
>             ret = scan_movable_pages(pfn, end_pfn, &pfn); // It returns -ENOENT
>             if (!ret)
>                 do_migrate_range(pfn, end_pfn); // Not reached
>         } while (!ret);
>         ret = test_pages_isolated(start_pfn, end_pfn, MEMORY_OFFLINE);
>     } while (ret); // ret is -EBUSY
> }
> ```
>
> In this case, we dumped the first page that cannot be isolated (see dump_page below). Its
> content does not change between iterations:
> ```
> Jun 28 15:29:26 linux kernel: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7980dd
> Jun 28 15:29:26 linux kernel: flags: 0x9fffffc0000000(node=2|zone=3|lastcpupid=0x1fffff)
> Jun 28 15:29:26 linux kernel: raw: 009fffffc0000000 ffffdfbd9e603788 ffffd4f0ffd97ef0 0000000000000000
> Jun 28 15:29:26 linux kernel: raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
> Jun 28 15:29:26 linux kernel: page dumped because: trouble page...
> ```
>
> Every time the issue occurs, the content of the page structure is similar.
>
> Questions:
> Q1. Is this behavior expected? At least for an OS administrator, it should return
> promptly (success or failure) instead of hanging indefinitely.
> Q2. Regarding the offline_pages() function, encountering such a page indeed causes
> an endless loop. Shouldn't another part of the kernel have changed the state
> of this page in a timely manner?
>
> When I use the workaround mentioned above (Ctrl-C and try offline again), I find
> that the page state changes (see dump_page below):
> ```
> Jun 28 15:33:12 linux kernel: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7980dd
> Jun 28 15:33:12 linux kernel: flags: 0x9fffffc0000000(node=2|zone=3|lastcpupid=0x1fffff)
> Jun 28 15:33:12 linux kernel: raw: 009fffffc0000000 dead000000000100 dead000000000122 0000000000000000
> Jun 28 15:33:12 linux kernel: raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
> Jun 28 15:33:12 linux kernel: page dumped because: previous trouble page
> ```
>
> What our test does:
> We have a CXL memory device, which is configured as kmem and onlined into the MOVABLE
> zone as NUMA node 2. We run two processes, consume-memory and offline-memory, in parallel;
> see the pseudo code below:
>
> ```
> main()
> {
>     if (fork() == 0)
>         numactl -m 2 ./consume-memory
>     else {
>         daxctl offline-memory dax0.0
>         wait()
>     }
> }
> ```
>
> Attached is the process information (when it gets stuck):
> ```
> root 25716 0.0 0.0 2460 1408 pts/0 S+ 15:28 0:00 ./main
> root 25719 0.0 0.0 0 0 pts/0 Z+ 15:28 0:00 [consume-memory] <defunct>
> root 25720 98.6 0.0 9476 3740 pts/0 R+ 15:28 0:26 daxctl offline-memory /dev/dax0.0
> ```
>
> Feel free to let me know if you need more details.
> Thank you for your attention to this issue. Looking forward to your insights.
>
> Thanks
> Zhijian