From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Fri, 20 Jan 2023 09:57:05 +0100
From: Michal Hocko <mhocko@suse.com>
To: "Paul E. McKenney"
Cc: Suren Baghdasaryan, akpm@linux-foundation.org, michel@lespinasse.org,
 jglisse@google.com, vbabka@suse.cz, hannes@cmpxchg.org,
 mgorman@techsingularity.net, dave@stgolabs.net, willy@infradead.org,
 liam.howlett@oracle.com, peterz@infradead.org, ldufour@linux.ibm.com,
 laurent.dufour@fr.ibm.com, luto@kernel.org, songliubraving@fb.com,
 peterx@redhat.com, david@redhat.com, dhowells@redhat.com, hughd@google.com,
 bigeasy@linutronix.de, kent.overstreet@linux.dev, punit.agrawal@bytedance.com,
 lstoakes@gmail.com, peterjung1337@gmail.com, rientjes@google.com,
 axelrasmussen@google.com, joelaf@google.com, minchan@google.com,
 jannh@google.com, shakeelb@google.com, tatashin@google.com,
 edumazet@google.com, gthelen@google.com, gurua@google.com,
 arjunroy@google.com, soheil@google.com, hughlynch@google.com,
 leewalsh@google.com, posk@google.com, linux-mm@kvack.org,
 linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org,
 x86@kernel.org, linux-kernel@vger.kernel.org, kernel-team@android.com
Subject: Re: [PATCH 39/41] kernel/fork: throttle call_rcu() calls in vm_area_free
References: <20230109205336.3665937-1-surenb@google.com>
 <20230109205336.3665937-40-surenb@google.com>
 <20230118183447.GG2948950@paulmck-ThinkPad-P17-Gen-1>
 <20230119191707.GW2948950@paulmck-ThinkPad-P17-Gen-1>
In-Reply-To: <20230119191707.GW2948950@paulmck-ThinkPad-P17-Gen-1>
On Thu 19-01-23 11:17:07, Paul E. McKenney wrote:
> On Thu, Jan 19, 2023 at 01:52:14PM +0100, Michal Hocko wrote:
> > On Wed 18-01-23 11:01:08, Suren Baghdasaryan wrote:
> > > On Wed, Jan 18, 2023 at 10:34 AM Paul E. McKenney wrote:
> > [...]
> > > > There are a couple of possibilities here.
> > > >
> > > > First, if I am remembering correctly, the time between the call_rcu()
> > > > and invocation of the corresponding callback was taking multiple
> > > > seconds, but that was because the kernel was built with
> > > > CONFIG_RCU_LAZY=y in order to save power by batching RCU work over
> > > > multiple call_rcu() invocations. If this is causing a problem for a
> > > > given call site, the shiny new call_rcu_hurry() can be used instead.
> > > > Doing this gets back to the old-school non-laziness, but can of
> > > > course consume more power.
> > >
> > > That would not be the case because CONFIG_RCU_LAZY was not an option
> > > at the time I was profiling this issue.
> > > Lazy RCU would be a great option to replace this patch but
> > > unfortunately it's not the default behavior, so I would still have to
> > > implement this batching in case lazy RCU is not enabled.
> > >
> > > > Second, there is a much shorter one-jiffy delay between the
> > > > call_rcu() and the invocation of the corresponding callback in
> > > > kernels built with either CONFIG_NO_HZ_FULL=y (but only on CPUs
> > > > mentioned in the nohz_full or rcu_nocbs kernel boot parameters) or
> > > > CONFIG_RCU_NOCB_CPU=y (but only on CPUs mentioned in the rcu_nocbs
> > > > kernel boot parameters). The purpose of this delay is to avoid lock
> > > > contention, and so this delay is incurred only on CPUs that are
> > > > queuing callbacks at a rate exceeding 16K/second. This is reduced
> > > > to a per-jiffy limit, so on a HZ=1000 system, a CPU invoking
> > > > call_rcu() at least 16 times within a given jiffy will incur the
> > > > added delay. The reason for this delay is the use of a separate
> > > > ->nocb_bypass list. As Suren says, this bypass list is used to
> > > > reduce lock contention on the main ->cblist. This is not needed in
> > > > old-school kernels built without either CONFIG_NO_HZ_FULL=y or
> > > > CONFIG_RCU_NOCB_CPU=y (including most datacenter kernels) because
> > > > in that case the callbacks enqueued by call_rcu() are touched only
> > > > by the corresponding CPU, so there is no need for locks.
> > >
> > > I believe this is the reason in my profiled case.
> > >
> > > > Third, if you are instead seeing multiple milliseconds of CPU
> > > > consumed by call_rcu() in the common case (for example, without the
> > > > aid of interrupts, NMIs, or SMIs), please do let me know. That
> > > > sounds to me like a bug.
> > >
> > > I don't think I've seen such a case.
> > > Thanks for the clarifications, Paul!
> >
> > Thanks for the explanation, Paul. I have to say this has caught me by
> > surprise. There are just not enough details about the benchmark to
> > understand what is going on, but I find it rather surprising that
> > call_rcu can induce a higher overhead than the actual kmem_cache_free
> > which is the callback. My naive understanding has been that call_rcu
> > is a really fast way to defer execution to an RCU-safe context to do
> > the final cleanup.
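For context, the deferral pattern the series uses boils down to
something like the sketch below. This is only an illustration, not the
exact patch; the vm_rcu field, the callback name, and the vm_area_cachep
cache reflect my reading of Suren's series and of kernel/fork.c:

#include <linux/mm.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

/* Runs after a grace period has elapsed; does the actual freeing. */
static void vm_area_free_rcu_cb(struct rcu_head *head)
{
	struct vm_area_struct *vma = container_of(head,
					struct vm_area_struct, vm_rcu);

	kmem_cache_free(vm_area_cachep, vma);
}

void vm_area_free(struct vm_area_struct *vma)
{
	/*
	 * Defer the free past a grace period so lockless readers cannot
	 * see the vma reused under them. call_rcu() may batch (and, with
	 * CONFIG_RCU_LAZY, delay) the callback; call_rcu_hurry() would
	 * opt out of that laziness at the cost of extra wakeups.
	 */
	call_rcu(&vma->vm_rcu, vm_area_free_rcu_cb);
}

Replacing call_rcu() with call_rcu_hurry() in the sketch is the knob
Paul describes above for getting the old non-lazy timing back.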
> If I am following along correctly (ha!), then your "induce a higher
> overhead" should be something like "induce a higher to-kfree() latency".

Yes, this is expected.

> Of course, there already is a higher latency-to-kfree via call_rcu()
> than via a direct call to kfree(), and callback-offload CPUs that are
> being flooded with callbacks raise that latency a jiffy or so more in
> order to avoid lock contention.
>
> If this becomes a problem, the callback-offloading code can be a bit
> smarter about avoiding lock contention, but I need to see a real
> problem before I make that change. But if there is a real problem I
> will of course fix it.

I believe Suren is claiming that call_rcu is really visible in the
exit_mmap case. The time to free the actual vmas shouldn't really be
material for that path. If the freeing happens much later, there could
be some side effects from increased memory consumption, but those should
be marginal. How fast exit_mmap really is should depend only on the
direct calls from that path.

But I guess we need some specific numbers from Suren to be sure what is
going on here.

Thanks!
-- 
Michal Hocko
SUSE Labs