From: Michal Koutný
To: cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Michal Koutný, Martin Doucha, Johannes Weiner, Michal Hocko,
    Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton
Subject: [RFC PATCH] memcontrol: Wait for draining of remote stocks to avoid OOM when charging
Date: Fri, 30 May 2025 17:18:57 +0200
Message-ID: <20250530151858.672391-1-mkoutny@suse.com>

The LTP memcontrol03.c checks the behavior of memory.min protection
under relatively tight conditions -- there is a 2MiB margin for the
allocating task below the test's memory.max.

MEMCG_CHARGE_BATCH might be over-charged to page_counters temporarily
but this alone should not lead to OOM because the overcharged amount is
retrieved by draining the stock. Or is it?

I suspect this may cause trouble when a charge larger than
MEMCG_CHARGE_BATCH is preceded by a small charge:

  try_charge_memcg(memcg, ..., 1);
    // counter->usage += 64
    // local stock = 63
    // no OOM but counter->usage > counter->max
  // running on different CPU
  try_charge_memcg(memcg, ..., 65);
    // 4M in stock + 148M new charge, only 150M w/out hard protection to reclaim
    try_to_free_mem_cgroup_pages
    if (cpu == curcpu)
      drain_local_stock // this would be ok
    else
      schedule_work_on(cpu, &stock->work); // this is asynchronous
    // charging+(no more)reclaim is retried MAX_RECLAIM_RETRIES = 16 times
    // if other CPUs' stocks aren't flushed by now, this may cause OOM

This effect is pronounced on machines with 64k page size, where
MEMCG_CHARGE_BATCH is worth a whopping 4MiB (per CPU).

Prevent the premature OOM by waiting for stock flushing (even) from
remote CPUs.

Link: https://lore.kernel.org/ltp/144b6bac-edba-470a-bf87-abf492d85ef5@suse.cz/
Reported-by: Martin Doucha
Signed-off-by: Michal Koutný
Tested-by: Martin Doucha
---
 mm/memcontrol-v1.h |  2 +-
 mm/memcontrol.c    | 15 ++++++++++-----
 2 files changed, 11 insertions(+), 6 deletions(-)

My reason(s) for RFC:
1) I'm not sure whether there is a simpler way than flushing stocks
   over all CPUs (also, the guard with gfpflags_allow_blocking() is
   there only for explicitness, in case the code is moved elsewhere).
2) It requires specific scheduling over CPUs, so it may not be so
   common and severe in practice.

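The arithmetic behind the scenario in the commit message can be replayed in user space. The following is a minimal sketch, not kernel code: every `model_*` name, the two-CPU setup and the 100-page limit are hypothetical choices made for illustration, and the batch of 64 pages is taken from the `counter->usage += 64` comment above. It shows how a 1-page charge parks 63 pages in one CPU's stock, and how a later 65-page charge on another CPU fails until that stock is returned to the counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 64 pages per batch, as in "counter->usage += 64" above. */
#define CHARGE_BATCH 64UL
/* Hypothetical: two CPUs are enough to show the effect. */
#define NR_CPUS_MODEL 2

/* Hypothetical stand-in for a memcg page counter (units: pages). */
struct model_counter {
	unsigned long usage;	/* charged pages, including pages parked in stocks */
	unsigned long max;	/* hard limit */
};

/* Hypothetical stand-in for the per-CPU charge cache. */
static unsigned long model_stock[NR_CPUS_MODEL];

/* Bytes held by one full batch for a given page size. */
static unsigned long batch_bytes(unsigned long page_size)
{
	return CHARGE_BATCH * page_size;
}

/*
 * Charge nr_pages on the given CPU. The local stock is consumed first;
 * otherwise a whole batch is charged to the counter and the surplus is
 * parked in the stock. If the batch does not fit, fall back to the exact
 * amount. Returns false where the kernel would go on to reclaim and drain.
 */
static bool model_charge(struct model_counter *c, int cpu, unsigned long nr_pages)
{
	unsigned long batch = nr_pages > CHARGE_BATCH ? nr_pages : CHARGE_BATCH;

	if (model_stock[cpu] >= nr_pages) {
		model_stock[cpu] -= nr_pages;
		return true;
	}
	if (c->usage + batch > c->max) {
		batch = nr_pages;
		if (c->usage + batch > c->max)
			return false;
	}
	c->usage += batch;
	model_stock[cpu] += batch - nr_pages;
	return true;
}

/* Return one CPU's parked pages to the counter. */
static void model_drain_cpu(struct model_counter *c, int cpu)
{
	c->usage -= model_stock[cpu];
	model_stock[cpu] = 0;
}

/*
 * Replay the scenario: a 1-page charge on CPU 0, then a 65-page charge on
 * CPU 1. Returns whether the second charge succeeds.
 */
static bool model_scenario(bool drain_remote_first)
{
	struct model_counter c = { .usage = 0, .max = 100 };

	model_stock[0] = 0;
	model_stock[1] = 0;
	model_charge(&c, 0, 1);
	if (drain_remote_first)
		model_drain_cpu(&c, 0);
	return model_charge(&c, 1, 65);
}
```

With these numbers, `model_scenario(false)` returns false because the counter still holds the 63 parked pages (64 + 65 > 100), while `model_scenario(true)` returns true (1 + 65 <= 100). `batch_bytes(65536)` evaluates to 4194304, the 4MiB per CPU mentioned above; with 4k pages the same batch is 256KiB.
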
diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h
index 6358464bb4160..3e57645d0c175 100644
--- a/mm/memcontrol-v1.h
+++ b/mm/memcontrol-v1.h
@@ -24,7 +24,7 @@
 
 unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap);
 
-void drain_all_stock(struct mem_cgroup *root_memcg);
+void drain_all_stock(struct mem_cgroup *root_memcg, bool sync);
 
 unsigned long memcg_events(struct mem_cgroup *memcg, int event);
 unsigned long memcg_page_state_output(struct mem_cgroup *memcg, int item);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2d4d65f25fecd..ddf905baab12d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1911,7 +1911,7 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
  * Drains all per-CPU charge caches for given root_memcg resp. subtree
  * of the hierarchy under it.
  */
-void drain_all_stock(struct mem_cgroup *root_memcg)
+void drain_all_stock(struct mem_cgroup *root_memcg, bool sync)
 {
 	int cpu, curcpu;
 
@@ -1948,6 +1948,11 @@ void drain_all_stock(struct mem_cgroup *root_memcg)
 				schedule_work_on(cpu, &stock->work);
 		}
 	}
+	if (sync)
+		for_each_online_cpu(cpu) {
+			struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
+			flush_work(&stock->work);
+		}
 	migrate_enable();
 	mutex_unlock(&percpu_charge_mutex);
 }
@@ -2307,7 +2312,7 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 		goto retry;
 
 	if (!drained) {
-		drain_all_stock(mem_over_limit);
+		drain_all_stock(mem_over_limit, gfpflags_allow_blocking(gfp_mask));
 		drained = true;
 		goto retry;
 	}
@@ -3773,7 +3778,7 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 	wb_memcg_offline(memcg);
 	lru_gen_offline_memcg(memcg);
 
-	drain_all_stock(memcg);
+	drain_all_stock(memcg, false);
 
 	mem_cgroup_id_put(memcg);
 }
@@ -4205,7 +4210,7 @@ static ssize_t memory_high_write(struct kernfs_open_file *of,
 			break;
 
 		if (!drained) {
-			drain_all_stock(memcg);
+			drain_all_stock(memcg, false);
 			drained = true;
 			continue;
 		}
@@ -4253,7 +4258,7 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
 			break;
 
 		if (!drained) {
-			drain_all_stock(memcg);
+			drain_all_stock(memcg, false);
 			drained = true;
 			continue;
 		}

base-commit: 0ff41df1cb268fc69e703a08a57ee14ae967d0ca
-- 
2.49.0
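
The difference between calling drain_all_stock() with sync false and sync true can be illustrated with a deterministic user-space model. This is only a sketch under loud assumptions: all `model_*` names are hypothetical, there are no real workqueues or threads, and the model takes the worst case in which queued remote drain work never gets to run during the retry loop. The retry bound mirrors MAX_RECLAIM_RETRIES = 16 from the commit message; reclaim itself is left out.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4		/* hypothetical CPU count */
#define MODEL_MAX_RETRIES 16	/* mirrors MAX_RECLAIM_RETRIES in the commit message */

struct model_cpu {
	unsigned long stock;	/* pages parked in this CPU's charge cache */
	bool work_pending;	/* drain work queued but not yet executed */
};

struct model_memcg {
	unsigned long usage;	/* charged pages, including parked ones */
	unsigned long max;	/* hard limit in pages */
	struct model_cpu cpus[MODEL_NR_CPUS];
};

/* Execute the drain work of one CPU: give its stock back to the counter. */
static void model_run_work(struct model_memcg *m, int cpu)
{
	m->usage -= m->cpus[cpu].stock;
	m->cpus[cpu].stock = 0;
	m->cpus[cpu].work_pending = false;
}

/*
 * Model of the drain: the local CPU is drained inline, remote CPUs only
 * get work queued. With sync set, the queued work is also waited for,
 * which is the role flush_work() plays in the patch.
 */
static void model_drain_all(struct model_memcg *m, int curcpu, bool sync)
{
	int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (!m->cpus[cpu].stock)
			continue;
		if (cpu == curcpu)
			model_run_work(m, cpu);
		else
			m->cpus[cpu].work_pending = true;
	}
	if (sync)
		for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
			if (m->cpus[cpu].work_pending)
				model_run_work(m, cpu);
}

/*
 * Model of the charge retry loop. Returns 0 on success and -1 at the
 * point where the kernel would declare OOM. Assumption: queued remote
 * work never runs on its own while the loop is retrying.
 */
static int model_try_charge(struct model_memcg *m, int curcpu,
			    unsigned long nr_pages, bool sync)
{
	bool drained = false;
	int retry;

	for (retry = 0; retry <= MODEL_MAX_RETRIES; retry++) {
		if (m->usage + nr_pages <= m->max) {
			m->usage += nr_pages;
			return 0;
		}
		if (!drained) {
			model_drain_all(m, curcpu, sync);
			drained = true;
		}
	}
	return -1;
}
```

Starting from usage = 64, max = 100 and 63 pages parked on CPU 0, a 65-page charge from CPU 1 ends in -1 with sync false, because the remote stock is only queued for draining. With sync true the parked pages are returned before the first retry, the charge succeeds, and usage ends at 66.
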