* MGLRU OOM problem
@ 2024-06-24 17:32 Waiman Long
From: Waiman Long @ 2024-06-24 17:32 UTC (permalink / raw)
  To: Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
	Andrew Morton, Johannes Weiner, Chris Down, Yu Zhao,
	Axel Rasmussen
  Cc: Linux Kernel Mailing List, Linux Memory Management List,
	Rafael Aquini, cgroups

Hi,

We are hitting an OOM issue with our OpenShift middleware, which is
based on Kubernetes. Currently, it sets only memory.max when applying
a memory limit. OOM kills occur rather frequently when we try to write
a large data file that exceeds memory.max to an NFS-mounted filesystem.
I have bisected the problem down to commit 14aa8b2d5c2e
("mm/mglru: don't sync disk for each aging cycle").

The following command can be used to trigger an OOM kill when run in a
memory cgroup with a memory.max limit of 600M, writing to an NFS-mounted
filesystem.

  # dd if=/dev/urandom of=/disk/2G.bin bs=32K count=65536 \
       status=progress iflag=fullblock
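
For anyone who wants to try this, a minimal sketch of the cgroup setup
might look like the following. It assumes cgroup v2 mounted at
/sys/fs/cgroup, root privileges, and the memory controller enabled for
child cgroups; the cgroup name "oomtest" and both helper functions are
hypothetical. Only the dd command and the 600M limit come from the
report itself.

```shell
#!/bin/sh
# Hypothetical helpers: the names and paths below are assumptions.
CG_ROOT="${CG_ROOT:-/sys/fs/cgroup}"   # assumed cgroup v2 mount point
CG_NAME="${CG_NAME:-oomtest}"          # hypothetical cgroup name

# Create the cgroup and set only memory.max (default 600M).
setup_cgroup() {
    mkdir -p "$CG_ROOT/$CG_NAME" &&
    echo "${1:-600M}" > "$CG_ROOT/$CG_NAME/memory.max"
}

# Move the current shell into the cgroup, then run the reproducer.
# The output file should live on the NFS mount (default /disk/2G.bin).
run_reproducer() {
    echo $$ > "$CG_ROOT/$CG_NAME/cgroup.procs" &&
    dd if=/dev/urandom of="${1:-/disk/2G.bin}" bs=32K count=65536 \
       status=progress iflag=fullblock
}
```

Running "setup_cgroup 600M && run_reproducer /disk/2G.bin" as root
should then write 2 GiB (32K x 65536) against the 600M limit.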

In my case, I can trigger an OOM by running the reproducer a second time
on a test system.

In the first successful run, the reported data rate was:

   2147483648 bytes (2.1 GB, 2.0 GiB) copied, 57.5474 s, 37.3 MB/s

After reverting commit 14aa8b2d5c2e ("mm/mglru: don't sync disk for each
aging cycle"), OOM can no longer be reproduced and the new data rate was:

   2147483648 bytes (2.1 GB, 2.0 GiB) copied, 25.694 s, 83.6 MB/s

If I disabled MGLRU (echo 0 > /sys/kernel/mm/lru_gen/enabled), the data
rate was:

   2147483648 bytes (2.1 GB, 2.0 GiB) copied, 21.184 s, 101 MB/s
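
When comparing runs like these, it helps to confirm which state MGLRU
is actually in. The sketch below assumes the enabled file reports a hex
bitmask (for example 0x0007) whose lowest bit is the main switch, as
described in the kernel's multigen_lru admin guide; the helper name
mglru_is_on is hypothetical.

```shell
#!/bin/sh
# Sysfs path used in the report; override LRU_GEN_ENABLED for testing.
LRU_GEN_ENABLED="${LRU_GEN_ENABLED:-/sys/kernel/mm/lru_gen/enabled}"

# Hypothetical helper: succeeds if bit 0 (assumed to be the main MGLRU
# switch) is set in the mask read from the given file.
mglru_is_on() {
    mask=$(cat "${1:-$LRU_GEN_ENABLED}") || return 2
    [ $(( ${mask} & 1 )) -ne 0 ]
}
```

For example, "mglru_is_on && echo MGLRU enabled" can be run before each
dd invocation.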

I know that the purpose of commit 14aa8b2d5c2e is to prevent premature
aging of SSDs. However, I would like to find a way to wake up the flusher
whenever the cgroup is under memory pressure and has a lot of dirty
pages, but I don't have a solid clue yet.
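
To observe the dirty-page buildup from userspace while the reproducer
runs, something like the sketch below could be used. It only reads
counters and does not wake the flusher. It assumes cgroup v2, where
memory.stat exposes file_dirty and file_writeback and memory.events
exposes oom_kill; the helper names and the example cgroup path are
hypothetical.

```shell
#!/bin/sh
# Print the value of one "key value" line from a cgroup v2 flat-keyed
# file; exits non-zero if the key is not present.
cg_field() {
    awk -v k="$2" '$1 == k { print $2; found = 1 } END { exit !found }' "$1"
}

# Hypothetical monitor: print dirty/writeback bytes and the OOM-kill
# count for one cgroup directory, e.g. /sys/fs/cgroup/oomtest (assumed).
cg_snapshot() {
    cg="$1"
    printf 'file_dirty=%s file_writeback=%s oom_kill=%s\n' \
        "$(cg_field "$cg/memory.stat" file_dirty)" \
        "$(cg_field "$cg/memory.stat" file_writeback)" \
        "$(cg_field "$cg/memory.events" oom_kill)"
}
```

Calling cg_snapshot in a loop with a short sleep should show whether
file_dirty approaches memory.max shortly before oom_kill increments.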

I am aware that there was a previous discussion about this commit in
[1], so I would like to engage the same community to see if there can
be a proper solution to this problem.

[1] https://lore.kernel.org/lkml/ZcWOh9u3uqZjNFMa@chrisdown.name/

Cheers,
Longman


