Date: Tue, 18 Apr 2023 11:46:13 -0700
From: Luis Chamberlain
To: "Edgecombe, Rick P"
Cc: "keescook@chromium.org", "hch@infradead.org", "prarit@redhat.com",
 "rppt@kernel.org", "catalin.marinas@arm.com", "Torvalds, Linus",
 "willy@infradead.org", "song@kernel.org", "patches@lists.linux.dev",
 "pmladek@suse.com", "david@redhat.com", "colin.i.king@gmail.com",
 "linux-kernel@vger.kernel.org", "dave.hansen@linux.intel.com",
 "jim.cromie@gmail.com", "vbabka@suse.cz", "christophe.leroy@csgroup.eu",
 "linux-mm@kvack.org", "tglx@linutronix.de", "jbaron@akamai.com",
 "peterz@infradead.org", "linux-modules@vger.kernel.org",
 "gregkh@linuxfoundation.org", "petr.pavlu@suse.com", "rafael@kernel.org",
 "Hocko, Michal", "dave@stgolabs.net"
Subject: Re: [RFC 2/2] kread: avoid duplicates
References: <20230414052840.1994456-1-mcgrof@kernel.org>
 <20230414052840.1994456-3-mcgrof@kernel.org>

On Mon, Apr 17, 2023 at 03:08:34PM -0700, Luis Chamberlain wrote:
> On Mon, Apr 17, 2023 at 05:33:49PM +0000, Edgecombe, Rick P wrote:
> > On Sat, 2023-04-15 at 23:41 -0700, Luis Chamberlain wrote:
> > > On Sat, Apr 15, 2023 at 11:04:12PM -0700, Christoph Hellwig wrote:
> > > > On Thu, Apr 13, 2023 at 10:28:40PM -0700, Luis Chamberlain wrote:
> > > > > With this we run into 0 wasted virtual memory bytes.
> > > >
> > > > Avoid what duplicates?
> > >
> > > David Hildenbrand had reported that with over 400 CPUs vmap space
> > > runs out and it seems it was related to module loading. I took a
> > > look and confirmed it. Module loading ends up requiring in the
> > > worst case 3 vmalloc allocations, so typically at least twice
> > > the size of the module size and in the worst case just add
> > > the decompressed module size:
> > >
> > > a) initial kernel_read*() call
> > > b) optional module decompression
> > > c) the actual module data copy we will keep
> > >
> > > Duplicate module requests that come from userspace end up being
> > > thrown in the trash bin, as only one module will be allocated.
> > > Although there are checks for a module prior to requesting a
> > > module udev still doesn't do the best of a job to avoid that and
> > > so we end up with tons of duplicate module requests.
> > > We're talking about gigabytes of vmalloc bytes just lost because
> > > of this for large systems and megabytes for average systems. So
> > > for example with just 255 CPUs we can lose about 13.58 GiB, and
> > > for 8 CPUs about 226.53 MiB.
> > >
> > > I have patches to curtail 1/2 of that space by doing a check in
> > > kernel before we do the allocation in c) if the module is already
> > > present. For a) it is harder because userspace just passes a file
> > > descriptor. But since we can get the file path without the vmalloc
> > > this RFC suggests maybe we can add a new kernel_read*() for module
> > > loading where it makes sense to have only one read happen at a
> > > time.
> >
> > I'm wondering how difficult it would be to just try to remove the
> > vmallocs in (a) and (b) and operate on a list of pages.
>
> Yes I think it's worth long term to do that, if possible with seq reads.

OK here's what I suggest we do then:

I'll resubmit the first patch, which lets us prove or disprove whether
module autoloading is the culprit. With that in place folks can debug
their setup and verify how udev is to blame.

I'll drop the second kernel_read*() patch / effort and punt this as a
userspace problem, as this is also not extremely pressing.

Long term we should evaluate how we can avoid vmalloc for the kread and
module decompression. If this really becomes a pressing issue we can
revisit whether we want an in-kernel solution, but at this point that
would likely be systems with over 400-500 CPUs with KASAN enabled.
Without KASAN the issue should eventually trigger if you're enabling
modules, but it's hard to say at what point you'd hit it.

  Luis