* The process hangs during memory-hotplug.
@ 2022-02-13 11:07 Yang Yanchao
2022-02-13 12:24 ` Matthew Wilcox
2022-02-14 11:55 ` David Hildenbrand
0 siblings, 2 replies; 3+ messages in thread
From: Yang Yanchao @ 2022-02-13 11:07 UTC
To: linux-mm; +Cc: yangyanchao6, wuxu.wu
Hello,
I found a hang during memory hotplug on kernel 4.18.
Steps to reproduce:
1. malloc() all of system memory, write 'x' to it, then free it
2. for each removable memory block:
     echo offline > /sys/devices/system/memory/memoryXXX/state
During offlining, there is a high probability that the operation gets stuck for anywhere from 20 minutes to five hours. Meanwhile,
     cat /sys/devices/system/memory/memoryXXX/state
reports "going-offline".
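Step 2 of the reproduction can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name `offline_removable_blocks` and its optional root-directory argument are hypothetical (the argument exists so the loop can be tried against a scratch directory), and it assumes the per-block `removable` file reads 1 for removable blocks.

```shell
#!/bin/sh
# Sketch: offline every memory block marked removable, then print its state.
# offline_removable_blocks and its optional root argument are hypothetical names.
offline_removable_blocks() {
    root=${1:-/sys/devices/system/memory}
    for block in "$root"/memory[0-9]*; do
        [ -d "$block" ] || continue
        # Assumption: "removable" reads 1 when the block is considered removable.
        [ "$(cat "$block/removable" 2>/dev/null)" = "1" ] || continue
        # Only blocks that are currently online can be offlined.
        [ "$(cat "$block/state")" = "online" ] || continue
        # On real sysfs this write blocks until offlining finishes or fails.
        echo offline > "$block/state" || echo "${block##*/}: offline failed" >&2
        printf '%s: %s\n' "${block##*/}" "$(cat "$block/state")"
    done
}
```

Calling `offline_removable_blocks` with no argument walks the real sysfs tree and needs root; a block that hangs as described above would stall the loop at the `echo offline` line.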
I tried to understand it by adding some prints to the kernel. I found that the process cannot exit this loop:
__offline_pages
  do_migrate_range
    migrate_pages
      unmap_and_move
        move_to_new_page
          fallback_migrate_page --> return EAGAIN
I tried dropping the caches, but that does not seem to solve the problem:
echo 3 > /proc/sys/vm/drop_caches
Can I fix this problem with other settings? Or is there a way to see why it is stuck?
System configuration information:
This is a physical machine, not a virtual machine.
[root@localhost ~]# free -h
              total        used        free      shared  buff/cache   available
Mem:          502Gi       201Gi       300Gi        10Mi       172Mi       299Gi
Swap:         4.0Gi          0B       4.0Gi
[root@localhost ~]# uname -i
x86_64
Regards,
Yang Yanchao
* Re: The process hangs during memory-hotplug.
From: Matthew Wilcox @ 2022-02-13 12:24 UTC
To: Yang Yanchao; +Cc: linux-mm, wuxu.wu
On Sun, Feb 13, 2022 at 07:07:03PM +0800, Yang Yanchao wrote:
> Hello,
>
> I found a hang during memory hotplug on kernel 4.18.
4.18 was released over three years ago and is long past its support
lifespan. 4.19.229 is the closest kernel which is still receiving
any kind of support, and even then, I'd recommend trying to reproduce
the problem on 5.16 before reporting this kind of bug.
* Re: The process hangs during memory-hotplug.
From: David Hildenbrand @ 2022-02-14 11:55 UTC
To: Yang Yanchao, linux-mm; +Cc: wuxu.wu
On 13.02.22 12:07, Yang Yanchao wrote:
> Hello,
>
Hi,
> I found a hang during memory hotplug on kernel 4.18.
You actually mean memory hotunplug / memory offlining, IIUC.
> Steps to reproduce:
> 1. malloc() all of system memory, write 'x' to it, then free it
> 2. for each removable memory block:
Note that "removable=yes" was always racy and upstream Linux nowadays
only keeps that property around to not break older user space --
upstream Linux always reports "removable=yes" if memory offlining is
supported.
> echo offline > /sys/devices/system/memory/memoryXXX/state
> During offlining, there is a high probability that the operation gets stuck for anywhere from 20 minutes to five hours. Meanwhile,
> cat /sys/devices/system/memory/memoryXXX/state
> reports "going-offline".
> I tried to understand it by adding some prints to the kernel. I found that the process cannot exit this loop:
> __offline_pages
>   do_migrate_range
>     migrate_pages
>       unmap_and_move
>         move_to_new_page
>           fallback_migrate_page --> return EAGAIN
> I tried dropping the caches, but that does not seem to solve the problem:
> echo 3 > /proc/sys/vm/drop_caches
> Can I fix this problem with other settings? Or is there a way to see why it is stuck?
There are no real guarantees about what will happen when trying to offline a
memory block that's not onlined to ZONE_MOVABLE.
You can observe the zone e.g., via
$ cat /sys/devices/system/memory/memory40/valid_zones
Normal
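To apply the same check across all blocks, a loop like the following could list every online block that is not in ZONE_MOVABLE. This is a rough sketch: the function name `blocks_outside_movable` and its optional root-directory argument are hypothetical, and it assumes that `valid_zones` of an online block names the zone the block currently belongs to, which may not hold on every kernel version.

```shell
#!/bin/sh
# Sketch: print online memory blocks whose valid_zones is not exactly "Movable".
# blocks_outside_movable and its optional root argument are hypothetical names.
blocks_outside_movable() {
    root=${1:-/sys/devices/system/memory}
    for block in "$root"/memory[0-9]*; do
        [ -r "$block/valid_zones" ] || continue
        [ "$(cat "$block/state")" = "online" ] || continue
        # Assumption: for an online block this names the zone it belongs to.
        zones=$(cat "$block/valid_zones")
        [ "$zones" = "Movable" ] || printf '%s %s\n' "${block##*/}" "$zones"
    done
}
```

Blocks printed by this loop are the ones for which offlining comes without guarantees, so they are the likely candidates for long or endless offline attempts.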
Even with ZONE_MOVABLE, it can take quite a while (and in corner cases
possibly forever) until offlining succeeds.
Now, 20 minutes are a bit extreme. User space can always cancel
offlining -- in your example, by killing the "echo offline >
/sys/devices/system/memory/memoryXXX/state" process.
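One way to automate that cancellation is to run the write under the coreutils `timeout` utility, which signals the writer once a time limit expires. A minimal sketch, assuming `timeout` is available; the function name `offline_with_timeout` and the 60-second default are hypothetical choices, not recommendations.

```shell
#!/bin/sh
# Sketch: try to offline one memory block, giving up after a time limit.
# offline_with_timeout and the 60 second default are hypothetical choices.
offline_with_timeout() {
    block_dir=$1
    limit=${2:-60}
    # timeout sends SIGTERM to the writer when the limit expires, which is
    # the same effect as killing the "echo offline" process by hand.
    timeout "$limit" sh -c 'echo offline > "$1/state"' sh "$block_dir"
    status=$?
    if [ "$status" -eq 124 ]; then
        # 124 is the exit status timeout uses when the limit was reached.
        echo "${block_dir##*/}: offlining cancelled after ${limit}s" >&2
    fi
    return "$status"
}
```

For example, `offline_with_timeout /sys/devices/system/memory/memoryXXX 120` would give a single block two minutes before moving on.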
That said, as raised by Matthew, a lot changed since 4.18, so you
should try reproducing upstream. But even there, you can just cancel
offlining if it takes too long. If you observe similar behavior on
ZONE_MOVABLE, it would be interesting to find out how to better handle
that to make offlining succeed faster.
--
Thanks,
David / dhildenb