* [RFC PATCH] memcontrol: Wait for draining of remote stocks to avoid OOM when charging
@ 2025-05-30 15:18 Michal Koutný
2025-06-12 11:37 ` Michal Koutný
0 siblings, 1 reply; 2+ messages in thread
From: Michal Koutný @ 2025-05-30 15:18 UTC (permalink / raw)
To: cgroups, linux-mm, linux-kernel
Cc: Michal Koutný,
Martin Doucha, Johannes Weiner, Michal Hocko, Roman Gushchin,
Shakeel Butt, Muchun Song, Andrew Morton
The LTP test memcontrol03.c checks the behavior of memory.min protection
under relatively tight conditions -- the allocating task has only a 2MiB
margin below the test's memory.max.
Up to MEMCG_CHARGE_BATCH pages may be temporarily over-charged to
page_counters, but this alone should not lead to OOM because the
over-charged amount is returned when the stock is drained. Or is it?
I suspect this may cause trouble when a charge larger than
MEMCG_CHARGE_BATCH is preceded by a small charge:
  try_charge_memcg(memcg, ..., 1);
    // counter->usage += 64
    // local stock = 63
    // no OOM but counter->usage > counter->max
    // running on different CPU
    try_charge_memcg(memcg, ..., 65);
      // 4M in stock + 148M new charge, only 150M w/out hard protection to reclaim
      try_to_free_mem_cgroup_pages
      if (cpu == curcpu)
        drain_local_stock // this would be ok
      else
        schedule_work_on(cpu, &stock->work); // this is asynchronous
      // charging+(no more)reclaim is retried MAX_RECLAIM_RETRIES = 16 times
      // if other cpu stock aren't flushed by now, this may cause OOM
This effect is more pronounced on machines with 64k page size, where
MEMCG_CHARGE_BATCH is worth a whopping 4MiB (per CPU).
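The 4MiB figure is simply the batch size in pages multiplied by the page size. A minimal userspace sketch of that arithmetic follows; CHARGE_BATCH_PAGES and batch_bytes() are illustrative names, and the value 64 is an assumption taken from the "counter->usage += 64" step in the trace above.

```c
/* Assumption: one charge batch is 64 pages, as in the trace above. */
#define CHARGE_BATCH_PAGES 64UL

/* Bytes that one full per-CPU stock can hold for a given page size. */
static unsigned long batch_bytes(unsigned long page_size)
{
	return CHARGE_BATCH_PAGES * page_size;
}
```

With 4k pages, batch_bytes(4096) is 256KiB per CPU. With 64k pages, batch_bytes(65536) is 4MiB per CPU, which is twice the 2MiB margin the test leaves below memory.max.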
Prevent the premature OOM by waiting for stock flushing (even) from remote
CPUs.
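To make the difference between asynchronous and synchronous draining concrete, here is a heavily simplified userspace model, not kernel code. It assumes a two-CPU machine and a single counter, and it omits reclaim and retries entirely. struct model, drain_cpu(), drain_all(), counter_try_charge() and try_charge() are hypothetical names for this sketch only.

```c
#include <stdbool.h>

#define NR_CPUS 2	/* assumption: two CPUs are enough to show the effect */

struct model {
	unsigned long usage;		/* pages charged to the counter, stocks included */
	unsigned long max;		/* hard limit in pages */
	unsigned long stock[NR_CPUS];	/* pre-charged pages cached per CPU */
	bool drain_pending[NR_CPUS];	/* stands in for a queued stock->work */
};

/* Return the pages cached on one CPU to the counter. */
static void drain_cpu(struct model *m, int cpu)
{
	m->usage -= m->stock[cpu];
	m->stock[cpu] = 0;
	m->drain_pending[cpu] = false;
}

/* The local CPU is drained at once; remote CPUs only if sync is set. */
static void drain_all(struct model *m, int curcpu, bool sync)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpu == curcpu || sync)
			drain_cpu(m, cpu);
		else
			m->drain_pending[cpu] = true;	/* queued, not yet run */
	}
}

/* Charge the counter directly; fail if the limit would be exceeded. */
static bool counter_try_charge(struct model *m, unsigned long nr)
{
	if (m->usage + nr > m->max)
		return false;
	m->usage += nr;
	return true;
}

/*
 * One charge attempt with a single drain in between. A false return marks
 * the point where the real code would continue with reclaim retries and
 * possibly end in OOM.
 */
static bool try_charge(struct model *m, int curcpu, unsigned long nr, bool sync)
{
	if (counter_try_charge(m, nr))
		return true;
	drain_all(m, curcpu, sync);
	return counter_try_charge(m, nr);
}
```

Take a limit of 100 pages, a usage of 64 and a stock of 63 pages cached on CPU 1. A 65-page charge from CPU 0 fails when the remote drain is only queued, and succeeds when the drain is waited for, because usage drops to 1 before the retry.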
Link: https://lore.kernel.org/ltp/144b6bac-edba-470a-bf87-abf492d85ef5@suse.cz/
Reported-by: Martin Doucha <mdoucha@suse.cz>
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Tested-by: Martin Doucha <mdoucha@suse.cz>
---
mm/memcontrol-v1.h | 2 +-
mm/memcontrol.c | 15 ++++++++++-----
2 files changed, 11 insertions(+), 6 deletions(-)
My reason(s) for RFC:
1) I'm not sure whether there is a simpler way than flushing stocks on
all CPUs (also, the gfpflags_allow_blocking() guard is there only for
explicitness, in case the code gets moved).
2) It requires specific scheduling over CPUs, so it may not be so common
and severe in practice.
diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h
index 6358464bb4160..3e57645d0c175 100644
--- a/mm/memcontrol-v1.h
+++ b/mm/memcontrol-v1.h
@@ -24,7 +24,7 @@
unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap);
-void drain_all_stock(struct mem_cgroup *root_memcg);
+void drain_all_stock(struct mem_cgroup *root_memcg, bool sync);
unsigned long memcg_events(struct mem_cgroup *memcg, int event);
unsigned long memcg_page_state_output(struct mem_cgroup *memcg, int item);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2d4d65f25fecd..ddf905baab12d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1911,7 +1911,7 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
* Drains all per-CPU charge caches for given root_memcg resp. subtree
* of the hierarchy under it.
*/
-void drain_all_stock(struct mem_cgroup *root_memcg)
+void drain_all_stock(struct mem_cgroup *root_memcg, bool sync)
{
int cpu, curcpu;
@@ -1948,6 +1948,11 @@ void drain_all_stock(struct mem_cgroup *root_memcg)
schedule_work_on(cpu, &stock->work);
}
}
+ if (sync)
+ for_each_online_cpu(cpu) {
+ struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
+ flush_work(&stock->work);
+ }
migrate_enable();
mutex_unlock(&percpu_charge_mutex);
}
@@ -2307,7 +2312,7 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
goto retry;
if (!drained) {
- drain_all_stock(mem_over_limit);
+ drain_all_stock(mem_over_limit, gfpflags_allow_blocking(gfp_mask));
drained = true;
goto retry;
}
@@ -3773,7 +3778,7 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
wb_memcg_offline(memcg);
lru_gen_offline_memcg(memcg);
- drain_all_stock(memcg);
+ drain_all_stock(memcg, false);
mem_cgroup_id_put(memcg);
}
@@ -4205,7 +4210,7 @@ static ssize_t memory_high_write(struct kernfs_open_file *of,
break;
if (!drained) {
- drain_all_stock(memcg);
+ drain_all_stock(memcg, false);
drained = true;
continue;
}
@@ -4253,7 +4258,7 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
break;
if (!drained) {
- drain_all_stock(memcg);
+ drain_all_stock(memcg, false);
drained = true;
continue;
}
base-commit: 0ff41df1cb268fc69e703a08a57ee14ae967d0ca
--
2.49.0
* Re: [RFC PATCH] memcontrol: Wait for draining of remote stocks to avoid OOM when charging
2025-05-30 15:18 [RFC PATCH] memcontrol: Wait for draining of remote stocks to avoid OOM when charging Michal Koutný
@ 2025-06-12 11:37 ` Michal Koutný
0 siblings, 0 replies; 2+ messages in thread
From: Michal Koutný @ 2025-06-12 11:37 UTC (permalink / raw)
To: cgroups, linux-mm, linux-kernel
Cc: Martin Doucha, Johannes Weiner, Michal Hocko, Roman Gushchin,
Shakeel Butt, Muchun Song, Andrew Morton
On Fri, May 30, 2025 at 05:18:57PM +0200, Michal Koutný <mkoutny@suse.com> wrote:
> 2) It requires specific scheduling over CPUs, so it may not be so common
> and severe in practice.
This means that in practice there would likely be a _different_ memcg
running on the other CPUs, and that would implicitly flush those stocks.
I conclude there's no big issue to fix.
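The implicit flush can be pictured with a simplified single-slot model of a per-CPU stock, written as a userspace sketch rather than kernel code. The assumption, which may not match the actual implementation at this base commit, is that each CPU caches pages for one memcg at a time. The struct and function names below mirror kernel names for readability only.

```c
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for the kernel structures. */
struct memcg {
	unsigned long usage;	/* pages charged, cached stock included */
};

struct stock {
	struct memcg *cached;	/* memcg the cached pages belong to */
	unsigned long nr_pages;	/* pre-charged pages held on this CPU */
};

/* Give the cached pages back to the memcg that owns them. */
static void drain_stock(struct stock *s)
{
	if (s->cached)
		s->cached->usage -= s->nr_pages;
	s->nr_pages = 0;
	s->cached = NULL;
}

/* Caching pages for another memcg first flushes what was cached before. */
static void refill_stock(struct stock *s, struct memcg *memcg,
			 unsigned long nr_pages)
{
	if (s->cached != memcg) {
		drain_stock(s);
		s->cached = memcg;
	}
	s->nr_pages += nr_pages;
}
```

In this model, as soon as a task from another memcg refills the stock on that CPU, the pages cached for the first memcg go back to its counter without any explicit drain request.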
Michal