Date: Mon, 21 Feb 2022 15:26:49 +0100
From: Michal Hocko
To: Sebastian Andrzej Siewior
Cc: cgroups@vger.kernel.org, linux-mm@kvack.org, Andrew Morton,
 Johannes Weiner, Michal Koutný, Peter Zijlstra, Thomas Gleixner,
 Vladimir Davydov, Waiman Long, Roman Gushchin
Subject: Re: [PATCH v3 1/5] mm/memcg: Revert ("mm/memcg: optimize user context object stock access")
References: <20220217094802.3644569-1-bigeasy@linutronix.de>
 <20220217094802.3644569-2-bigeasy@linutronix.de>
In-Reply-To: <20220217094802.3644569-2-bigeasy@linutronix.de>

On Thu 17-02-22 10:47:58, Sebastian Andrzej Siewior wrote:
> From: Michal Hocko
>
> The optimisation is based on a micro benchmark where local_irq_save() is
> more expensive than a preempt_disable(). There is no evidence that it is
> visible in a real-world workload and there are CPUs where the opposite is
> true (local_irq_save() is cheaper than preempt_disable()).
>
> Based on micro benchmarks, the optimisation makes sense on PREEMPT_NONE
> where preempt_disable() is optimized away. There is no improvement with
> PREEMPT_DYNAMIC since the preemption counter is always available.
>
> The optimisation also makes the PREEMPT_RT integration more complicated,
> since most of its assumptions are not true on PREEMPT_RT.
>
> Revert the optimisation since it complicates the PREEMPT_RT integration
> and the improvement is hardly visible.
>
> [ bigeasy: Patch body around Michal's diff ]

Thanks for preparing the changelog for this. I was planning to post mine
but I was waiting for feedback from Waiman. Anyway this looks good to me.
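For anyone following along, the difference between the two schemes boils
down to the pattern below. This is a minimal standalone sketch with
made-up names (pcp_stock, stock_add_split, stock_add_plain), not the
actual memcontrol.c code:

#include <linux/irqflags.h>
#include <linux/percpu.h>
#include <linux/preempt.h>

/* Illustrative only; the real memcg stock carries much more state. */
struct pcp_stock {
	unsigned int nr_bytes;
};

static DEFINE_PER_CPU(struct pcp_stock, task_stock);
static DEFINE_PER_CPU(struct pcp_stock, irq_stock);
static DEFINE_PER_CPU(struct pcp_stock, the_stock);

/* Reverted scheme: two stocks, the cheaper guard from task context. */
static void stock_add_split(unsigned int nr)
{
	unsigned long flags;

	if (in_task()) {
		/* compiles down to a mere barrier on PREEMPT_NONE */
		preempt_disable();
		this_cpu_ptr(&task_stock)->nr_bytes += nr;
		preempt_enable();
	} else {
		/* interrupt context needs the full IRQ disable */
		local_irq_save(flags);
		this_cpu_ptr(&irq_stock)->nr_bytes += nr;
		local_irq_restore(flags);
	}
}

/* After the revert: one stock, one guard, works from any context. */
static void stock_add_plain(unsigned int nr)
{
	unsigned long flags;

	local_irq_save(flags);
	this_cpu_ptr(&the_stock)->nr_bytes += nr;
	local_irq_restore(flags);
}

The split scheme only pays off where preempt_disable() compiles down to
(almost) nothing, which is what the changelog above is getting at.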
>
> Link: https://lore.kernel.org/all/YgOGkXXCrD%2F1k+p4@dhcp22.suse.cz
> Link: https://lkml.kernel.org/r/YdX+INO9gQje6d0S@linutronix.de
> Signed-off-by: Michal Hocko
> Signed-off-by: Sebastian Andrzej Siewior
> Acked-by: Roman Gushchin
> Acked-by: Johannes Weiner
> ---
>  mm/memcontrol.c | 94 ++++++++++++++-----------------------------------
>  1 file changed, 27 insertions(+), 67 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 3c4816147273a..8ab2dc75e70ec 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2078,23 +2078,17 @@ void unlock_page_memcg(struct page *page)
>  	folio_memcg_unlock(page_folio(page));
>  }
>  
> -struct obj_stock {
> +struct memcg_stock_pcp {
> +	struct mem_cgroup *cached; /* this never be root cgroup */
> +	unsigned int nr_pages;
> +
>  #ifdef CONFIG_MEMCG_KMEM
>  	struct obj_cgroup *cached_objcg;
>  	struct pglist_data *cached_pgdat;
>  	unsigned int nr_bytes;
>  	int nr_slab_reclaimable_b;
>  	int nr_slab_unreclaimable_b;
> -#else
> -	int dummy[0];
>  #endif
> -};
> -
> -struct memcg_stock_pcp {
> -	struct mem_cgroup *cached; /* this never be root cgroup */
> -	unsigned int nr_pages;
> -	struct obj_stock task_obj;
> -	struct obj_stock irq_obj;
>  
>  	struct work_struct work;
>  	unsigned long flags;
> @@ -2104,13 +2098,13 @@ static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);
>  static DEFINE_MUTEX(percpu_charge_mutex);
>  
>  #ifdef CONFIG_MEMCG_KMEM
> -static void drain_obj_stock(struct obj_stock *stock);
> +static void drain_obj_stock(struct memcg_stock_pcp *stock);
>  static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
>  				     struct mem_cgroup *root_memcg);
>  static void memcg_account_kmem(struct mem_cgroup *memcg, int nr_pages);
>  
>  #else
> -static inline void drain_obj_stock(struct obj_stock *stock)
> +static inline void drain_obj_stock(struct memcg_stock_pcp *stock)
>  {
>  }
>  static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
> @@ -2190,9 +2184,7 @@ static void drain_local_stock(struct work_struct *dummy)
>  	local_irq_save(flags);
>  
>  	stock = this_cpu_ptr(&memcg_stock);
> -	drain_obj_stock(&stock->irq_obj);
> -	if (in_task())
> -		drain_obj_stock(&stock->task_obj);
> +	drain_obj_stock(stock);
>  	drain_stock(stock);
>  	clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags);
>  
> @@ -2767,41 +2759,6 @@ static struct mem_cgroup *get_mem_cgroup_from_objcg(struct obj_cgroup *objcg)
>   */
>  #define OBJCGS_CLEAR_MASK (__GFP_DMA | __GFP_RECLAIMABLE | __GFP_ACCOUNT)
>  
> -/*
> - * Most kmem_cache_alloc() calls are from user context. The irq disable/enable
> - * sequence used in this case to access content from object stock is slow.
> - * To optimize for user context access, there are now two object stocks for
> - * task context and interrupt context access respectively.
> - *
> - * The task context object stock can be accessed by disabling preemption only
> - * which is cheap in non-preempt kernel. The interrupt context object stock
> - * can only be accessed after disabling interrupt. User context code can
> - * access interrupt object stock, but not vice versa.
> - */
> -static inline struct obj_stock *get_obj_stock(unsigned long *pflags)
> -{
> -	struct memcg_stock_pcp *stock;
> -
> -	if (likely(in_task())) {
> -		*pflags = 0UL;
> -		preempt_disable();
> -		stock = this_cpu_ptr(&memcg_stock);
> -		return &stock->task_obj;
> -	}
> -
> -	local_irq_save(*pflags);
> -	stock = this_cpu_ptr(&memcg_stock);
> -	return &stock->irq_obj;
> -}
> -
> -static inline void put_obj_stock(unsigned long flags)
> -{
> -	if (likely(in_task()))
> -		preempt_enable();
> -	else
> -		local_irq_restore(flags);
> -}
> -
>  /*
>   * mod_objcg_mlstate() may be called with irq enabled, so
>   * mod_memcg_lruvec_state() should be used.
> @@ -3082,10 +3039,13 @@ void __memcg_kmem_uncharge_page(struct page *page, int order)
>  void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
>  		     enum node_stat_item idx, int nr)
>  {
> +	struct memcg_stock_pcp *stock;
>  	unsigned long flags;
> -	struct obj_stock *stock = get_obj_stock(&flags);
>  	int *bytes;
>  
> +	local_irq_save(flags);
> +	stock = this_cpu_ptr(&memcg_stock);
> +
>  	/*
>  	 * Save vmstat data in stock and skip vmstat array update unless
>  	 * accumulating over a page of vmstat data or when pgdat or idx
> @@ -3136,26 +3096,29 @@ void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
>  	if (nr)
>  		mod_objcg_mlstate(objcg, pgdat, idx, nr);
>  
> -	put_obj_stock(flags);
> +	local_irq_restore(flags);
>  }
>  
>  static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
>  {
> +	struct memcg_stock_pcp *stock;
>  	unsigned long flags;
> -	struct obj_stock *stock = get_obj_stock(&flags);
>  	bool ret = false;
>  
> +	local_irq_save(flags);
> +
> +	stock = this_cpu_ptr(&memcg_stock);
>  	if (objcg == stock->cached_objcg && stock->nr_bytes >= nr_bytes) {
>  		stock->nr_bytes -= nr_bytes;
>  		ret = true;
>  	}
>  
> -	put_obj_stock(flags);
> +	local_irq_restore(flags);
>  
>  	return ret;
>  }
>  
> -static void drain_obj_stock(struct obj_stock *stock)
> +static void drain_obj_stock(struct memcg_stock_pcp *stock)
>  {
>  	struct obj_cgroup *old = stock->cached_objcg;
>  
> @@ -3211,13 +3174,8 @@ static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
>  {
>  	struct mem_cgroup *memcg;
>  
> -	if (in_task() && stock->task_obj.cached_objcg) {
> -		memcg = obj_cgroup_memcg(stock->task_obj.cached_objcg);
> -		if (memcg && mem_cgroup_is_descendant(memcg, root_memcg))
> -			return true;
> -	}
> -	if (stock->irq_obj.cached_objcg) {
> -		memcg = obj_cgroup_memcg(stock->irq_obj.cached_objcg);
> +	if (stock->cached_objcg) {
> +		memcg = obj_cgroup_memcg(stock->cached_objcg);
>  		if (memcg && mem_cgroup_is_descendant(memcg, root_memcg))
>  			return true;
>  	}
> @@ -3228,10 +3186,13 @@ static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
>  static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
>  			     bool allow_uncharge)
>  {
> +	struct memcg_stock_pcp *stock;
>  	unsigned long flags;
> -	struct obj_stock *stock = get_obj_stock(&flags);
>  	unsigned int nr_pages = 0;
>  
> +	local_irq_save(flags);
> +
> +	stock = this_cpu_ptr(&memcg_stock);
>  	if (stock->cached_objcg != objcg) { /* reset if necessary */
>  		drain_obj_stock(stock);
>  		obj_cgroup_get(objcg);
> @@ -3247,7 +3208,7 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
>  		stock->nr_bytes &= (PAGE_SIZE - 1);
>  	}
>  
> -	put_obj_stock(flags);
> +	local_irq_restore(flags);
>  
>  	if (nr_pages)
>  		obj_cgroup_uncharge_pages(objcg, nr_pages);
> @@ -6812,7 +6773,6 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug)
>  	long nr_pages;
>  	struct mem_cgroup *memcg;
>  	struct obj_cgroup *objcg;
> -	bool use_objcg = folio_memcg_kmem(folio);
>  
>  	VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
>  
> @@ -6821,7 +6781,7 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug)
>  	 * folio memcg or objcg at this point, we have fully
>  	 * exclusive access to the folio.
>  	 */
> -	if (use_objcg) {
> +	if (folio_memcg_kmem(folio)) {
>  		objcg = __folio_objcg(folio);
>  		/*
>  		 * This get matches the put at the end of the function and
> @@ -6849,7 +6809,7 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug)
>  
>  	nr_pages = folio_nr_pages(folio);
>  
> -	if (use_objcg) {
> +	if (folio_memcg_kmem(folio)) {
>  		ug->nr_memory += nr_pages;
>  		ug->nr_kmem += nr_pages;
>  
> --
> 2.34.1

-- 
Michal Hocko
SUSE Labs