From: Ryan Roberts <ryan.roberts@arm.com>
Date: Tue, 26 Nov 2024 15:33:01 +0000
Subject: Re: [RFC PATCH] mm/slab: Avoid build bug for calls to kmalloc with a large constant
To: Vlastimil Babka, Dave Kleikamp, Andrew Morton, Christoph Lameter, David Rientjes, Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Message-ID: <569f1417-0b5a-4360-930f-f7e8c9c6e605@arm.com>
In-Reply-To: <36577539-bff6-476e-8d6b-ca20e3de2391@suse.cz>
References: <20241014105514.3206191-1-ryan.roberts@arm.com> <20241014105912.3207374-1-ryan.roberts@arm.com> <20241014105912.3207374-6-ryan.roberts@arm.com> <44312f4a-8b9c-49ce-9277-5873a94ca1bb@oracle.com> <7fb6c5a2-b9ae-4a29-a871-2f0bdc636e41@arm.com> <9675f4f0-6290-43aa-bf17-6b9c2b461485@suse.cz> <69746c3a-72af-4c28-8f04-bcfae7a78107@arm.com> <36577539-bff6-476e-8d6b-ca20e3de2391@suse.cz>
On 26/11/2024 15:27, Vlastimil Babka wrote:
> On 11/26/24 16:09, Vlastimil Babka wrote:
>> On 11/26/24 15:53, Ryan Roberts wrote:
>>> On 26/11/2024 12:36, Vlastimil Babka wrote:
>>>> On 11/26/24 13:18, Ryan Roberts wrote:
>>>>> On 14/11/2024 10:09, Vlastimil Babka wrote:
>>>>>> On 11/1/24 21:16, Dave Kleikamp wrote:
>>>>>>> When boot-time page size is enabled, the test against KMALLOC_MAX_CACHE_SIZE
>>>>>>> is no longer optimized out with a constant size, so a build bug may
>>>>>>> occur on a path that won't be reached.
>>>>>>
>>>>>> That's rather unfortunate, the __builtin_constant_p(size) part of
>>>>>> kmalloc_noprof() really expects things to resolve at compile time and it
>>>>>> would be better to keep it that way.
>>>>>>
>>>>>> I think it would be better if we based KMALLOC_MAX_CACHE_SIZE itself on
>>>>>> PAGE_SHIFT_MAX and kept it constant, instead of introducing
>>>>>> KMALLOC_SHIFT_HIGH_MAX only for some sanity checks.
>>>>>>
>>>>>> So if the kernel was built to support 4k to 64k, but booted as 4k, it would
>>>>>> still create and use kmalloc caches up to 128k. SLUB should handle that fine
>>>>>> (if not, please report it :)
>>>>>
>>>>> So when PAGE_SIZE_MAX=64K and PAGE_SIZE=4K, kmalloc will support up to 128K
>>>>> whereas before it only supported up to 8K. I was trying to avoid that since
>>>>> I assumed that would be costly in terms of extra memory allocated for those
>>>>> higher-order buckets that will never be used. But I have no idea how SLUB
>>>>> works in practice. Perhaps memory for the cache is only lazily allocated so
>>>>> we won't see an issue in practice?
>>>>
>>>> Yes, the e.g. 128k slabs themselves will be lazily allocated. There will be
>>>> some overhead with the management structures (struct kmem_cache etc) but
>>>> much smaller.
>>>> To be completely honest, some extra overhead might come to be when the slabs
>>>> are allocated and later the user frees those allocations. kmalloc_large()
>>>> would return them immediately, while a regular kmem_cache will keep one or
>>>> more per cpu for reuse. But if that becomes a visible problem we can tune
>>>> those caches to discard slabs more aggressively.
>>>
>>> Sorry to keep pushing on this; now that I've actually looked at the code, I
>>> feel I have a slightly better understanding:
>>>
>>> void *kmalloc_noprof(size_t size, gfp_t flags)
>>> {
>>> 	if (__builtin_constant_p(size) && size) {
>>> 		if (size > KMALLOC_MAX_CACHE_SIZE)
>>> 			return __kmalloc_large_noprof(size, flags);	<<< (1)
>>>
>>> 		index = kmalloc_index(size);
>>> 		return __kmalloc_cache_noprof(...);			<<< (2)
>>> 	}
>>> 	return __kmalloc_noprof(size, flags);				<<< (3)
>>> }
>>>
>>> So if size and KMALLOC_MAX_CACHE_SIZE are constant, we end up with this
>>> resolving either to a call to (1) or (2), decided at compile time. If
>>> KMALLOC_MAX_CACHE_SIZE is not constant, (1), (2) and the runtime conditional
>>> need to be kept in the function.
>>>
>>> But intuitively, I would have guessed that given the choice between the
>>> overhead of keeping that runtime conditional vs keeping per-cpu slab caches
>>> for the extra sizes between 16K and 128K, the runtime conditional would be
>>> preferable. I would guess that quite a bit of memory could get tied up in
>>> those caches?
>>>
>>> Why is your preference the opposite? What am I not understanding?
>>
>> +CC more slab people.
>>
>> So the above is an inline function, but constructed in a way that it should,
>> without further inline code, become
>> - a call to __kmalloc_large_noprof() for a build-time-constant size larger
>>   than KMALLOC_MAX_CACHE_SIZE
>> - a call to __kmalloc_cache_noprof() for a build-time-constant size smaller
>>   than KMALLOC_MAX_CACHE_SIZE, where the cache is picked from an array with
>>   a compile-time-calculated index
>> - a call to __kmalloc_noprof() for non-constant sizes otherwise
>>
>> If KMALLOC_MAX_CACHE_SIZE stops being a build-time constant, the sensible
>> way to handle it would be to #ifdef or otherwise compile away the whole
>> "if __builtin_constant_p(size)" part and just call __kmalloc_noprof()
>> always, so we don't bloat the inline paths with a KMALLOC_MAX_CACHE_SIZE
>> check leading to a choice between calling __kmalloc_large_noprof() or
>> __kmalloc_cache_noprof().
>
> Or maybe we could have a PAGE_SIZE_MAX-derived KMALLOC_MAX_CACHE_SIZE_MAX
> behave as the code above currently does with KMALLOC_MAX_CACHE_SIZE, and
> additionally have a PAGE_SIZE_MIN-derived KMALLOC_MAX_CACHE_SIZE_MIN, where
> a build-time-constant size larger than KMALLOC_MAX_CACHE_SIZE_MIN (which is
> a compile-time test) is redirected to __kmalloc_noprof() for a run-time test.
>
> That seems like the optimum solution :)

Yes; that feels like the better approach to me. I'll implement this by
default unless anyone else objects.

>
>> I just don't believe we would waste so much memory with caches for the
>> extra sizes between 16K and 128K, so I would do that suggestion only if
>> proven wrong. But I wouldn't mind it that much if you chose it right away.
>> The solution earlier in this thread to patch __kmalloc_index() would be
>> worse than either of those two alternatives though.
>
>