Date: Wed, 26 Jan 2022 15:56:52 +0100
From: Michal Hocko <mhocko@suse.com>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: cgroups@vger.kernel.org, linux-mm@kvack.org, Andrew Morton,
	Johannes Weiner, Michal Koutný, Peter Zijlstra, Thomas Gleixner,
	Vladimir Davydov, Waiman Long
Subject: Re: [PATCH 2/4] mm/memcg: Protect per-CPU counter by disabling preemption on PREEMPT_RT where needed.
References: <20220125164337.2071854-1-bigeasy@linutronix.de>
	<20220125164337.2071854-3-bigeasy@linutronix.de>
In-Reply-To: <20220125164337.2071854-3-bigeasy@linutronix.de>

On Tue 25-01-22 17:43:35, Sebastian Andrzej Siewior wrote:
> The per-CPU counters are modified with the non-atomic modifier. The
> consistency is ensured by disabling interrupts for the update. On
> non-PREEMPT_RT configurations this works because acquiring a
> spinlock_t typed lock with the _irq() suffix disables interrupts. On
> PREEMPT_RT configurations the RMW operation can be interrupted.
>
> Another problem is that mem_cgroup_swapout() expects to be invoked with
> disabled interrupts because the caller has to acquire a spinlock_t which
> is acquired with disabled interrupts. Since spinlock_t never disables
> interrupts on PREEMPT_RT, the interrupts are never disabled at this
> point.
>
> The code is never called from in_irq() context on PREEMPT_RT, therefore
> disabling preemption during the update is sufficient on PREEMPT_RT. The
> sections which explicitly disable interrupts can remain on PREEMPT_RT
> because the sections remain short and they don't involve sleeping locks
> (memcg_check_events() does nothing on PREEMPT_RT).
>
> Disable preemption during updates of the per-CPU variables which do not
> explicitly disable interrupts.
>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

I can see that you have already discussed the choice of the open-coded
version with Vlastimil. I do not have a strong opinion, but I have to say
I dislike the construct because its purpose is not really clear just from
reading the code. A wrapper could explain the underlying problem - and
could include potential asserts for !PREEMPT_RT. One way or the other,
the change looks correct AFAICS.
Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/memcontrol.c | 21 +++++++++++++++++++--
>  1 file changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 36d27db673ca9..3d1b7cdd83db0 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -667,6 +667,8 @@ void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
>  	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
>  	memcg = pn->memcg;
>  
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +		preempt_disable();
>  	/* Update memcg */
>  	__this_cpu_add(memcg->vmstats_percpu->state[idx], val);
>  
> @@ -674,6 +676,8 @@ void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
>  	__this_cpu_add(pn->lruvec_stats_percpu->state[idx], val);
>  
>  	memcg_rstat_updated(memcg, val);
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +		preempt_enable();
>  }
>  
>  /**
> @@ -756,8 +760,12 @@ void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
>  	if (mem_cgroup_disabled())
>  		return;
>  
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +		preempt_disable();
>  	__this_cpu_add(memcg->vmstats_percpu->events[idx], count);
>  	memcg_rstat_updated(memcg, count);
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +		preempt_enable();
>  }
>  
>  static unsigned long memcg_events(struct mem_cgroup *memcg, int event)
> @@ -7194,9 +7202,18 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
>  	 * i_pages lock which is taken with interrupts-off. It is
>  	 * important here to have the interrupts disabled because it is the
>  	 * only synchronisation we have for updating the per-CPU variables.
> +	 * On PREEMPT_RT interrupts are never disabled and the updates to per-CPU
> +	 * variables are synchronised by keeping preemption disabled.
>  	 */
> -	VM_BUG_ON(!irqs_disabled());
> -	mem_cgroup_charge_statistics(memcg, -nr_entries);
> +	if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
> +		VM_BUG_ON(!irqs_disabled());
> +		mem_cgroup_charge_statistics(memcg, -nr_entries);
> +	} else {
> +		preempt_disable();
> +		mem_cgroup_charge_statistics(memcg, -nr_entries);
> +		preempt_enable();
> +	}
> +
>  	memcg_check_events(memcg, page_to_nid(page));
>  
>  	css_put(&memcg->css);
> -- 
> 2.34.1

-- 
Michal Hocko
SUSE Labs