Message-ID: <0e45a2f2-6dd5-5a43-c1a0-7520c1ed2675@opensource.wdc.com>
Date: Tue, 15 Nov 2022 13:28:14 +0900
Subject: Re: Deprecating and removing SLOB
From: Damien Le Moal
To: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Vlastimil Babka, Conor Dooley, Pasha Tatashin, Christoph Lameter,
 David Rientjes, Joonsoo Kim, Pekka Enberg, Matthew Wilcox,
 Roman Gushchin, Linus Torvalds, "linux-mm@kvack.org",
 "linux-kernel@vger.kernel.org", Catalin Marinas, Rustam Kovhaev,
 Andrew Morton, Josh Triplett, Arnd Bergmann, Russell King,
 Alexander Shiyan, Aaro Koskinen, Janusz Krzysztofik, Tony Lindgren,
 Yoshinori Sato, Rich Felker, Jonas Bonn, Stefan Kristiansson,
 Stafford Horne, "linux-arm-kernel@lists.infradead.org",
 openrisc@lists.librecores.org, linux-riscv@lists.infradead.org,
 linux-sh@vger.kernel.org, Geert Uytterhoeven,
 Conor.Dooley@microchip.com, Paul Cercueil
References: <93079aba-362e-5d1e-e9b4-dfe3a84da750@opensource.wdc.com>
 <44da078c-b630-a249-bf50-67df83cd8347@suse.cz>
 <35650fd4-3152-56db-7c27-b9997e31cfc7@opensource.wdc.com>
 <97c0735c-3127-83d5-30ff-8e57c6634f6e@opensource.wdc.com>
Organization: Western Digital Research
In-Reply-To: <97c0735c-3127-83d5-30ff-8e57c6634f6e@opensource.wdc.com>

On 11/15/22 13:24, Damien Le Moal wrote:
> On 11/14/22 23:47, Hyeonggon Yoo wrote:
>> On Mon, Nov 14, 2022 at 08:35:31PM +0900, Damien Le Moal wrote:
>>> On 11/14/22 18:36, Vlastimil Babka wrote:
>>>> On 11/14/22 06:48, Damien Le Moal wrote:
>>>>> On 11/14/22 10:55, Damien Le Moal wrote:
>>>>>> On 11/12/22 05:46, Conor Dooley wrote:
>>>>>>> On Fri, Nov 11, 2022 at 11:33:30AM +0100, Vlastimil Babka wrote:
>>>>>>>> On 11/8/22 22:44, Pasha Tatashin wrote:
>>>>>>>>> On Tue, Nov 8, 2022 at 10:55 AM Vlastimil Babka wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> as we all know, we currently have three slab allocators. As we discussed
>>>>>>>>>> at LPC [1], it is my hope that one of these allocators has a future, and
>>>>>>>>>> two of them do not.
>>>>>>>>>>
>>>>>>>>>> The unsurprising reasons include code maintenance burden, other features
>>>>>>>>>> compatible with only a subset of allocators (or more effort spent on the
>>>>>>>>>> features), blocking API improvements (more on that below), and my
>>>>>>>>>> inability to pronounce SLAB and SLUB in a properly distinguishable way,
>>>>>>>>>> without resorting to spelling out the letters.
>>>>>>>>>>
>>>>>>>>>> I think (but may be proven wrong) that SLOB is the easier target of the
>>>>>>>>>> two to be removed, so I'd like to focus on it first.
>>>>>>>>>>
>>>>>>>>>> I believe SLOB can be removed because:
>>>>>>>>>>
>>>>>>>>>> - AFAIK nobody really uses it? It strives for minimal memory footprint
>>>>>>>>>> by putting all objects together, which has its CPU performance costs
>>>>>>>>>> (locking, lack of percpu caching, searching for free space...). I'm not
>>>>>>>>>> aware of any "tiny linux" deployment that opts for this. For example,
>>>>>>>>>> OpenWRT seems to use SLUB and the devices these days have e.g. 128MB
>>>>>>>>>> RAM, not up to 16 MB anymore. I've heard anecdotes that the performance
>>>>>>>>>> SLOB impact is too much for those who tried.
>>>>>>>>>> Googling for "CONFIG_SLOB=y" yielded nothing useful.
>>>>>>>>>
>>>>>>>>> I am all for removing SLOB.
>>>>>>>>>
>>>>>>>>> There are some devices with configs where SLOB is enabled by default.
>>>>>>>>> Perhaps, the owners/maintainers of those devices/configs should be
>>>>>>>>> included into this thread:
>>>>>>>>>
>>>>>>>>> tatashin@soleen:~/x/linux$ git grep SLOB=y
>>>>>>>
>>>>>>>>> arch/riscv/configs/nommu_k210_defconfig:CONFIG_SLOB=y
>>>>>>>>> arch/riscv/configs/nommu_k210_sdcard_defconfig:CONFIG_SLOB=y
>>>>>>>>> arch/riscv/configs/nommu_virt_defconfig:CONFIG_SLOB=y
>>>>>>>
>>>>>>>>
>>>>>>>> Turns out that since SLOB depends on EXPERT, many of those lack it so
>>>>>>>> running make defconfig ends up with SLUB anyway, unless I miss something.
>>>>>>>> Only a subset has both SLOB and EXPERT:
>>>>>>>>
>>>>>>>>> git grep CONFIG_EXPERT `git grep -l "CONFIG_SLOB=y"`
>>>>>>>
>>>>>>>> arch/riscv/configs/nommu_virt_defconfig:CONFIG_EXPERT=y
>>>>>>>
>>>>>>> I suppose there's not really a concern with the virt defconfig, but I
>>>>>>> did check the output of `make nommu_k210_defconfig" and despite not
>>>>>>> having expert it seems to end up CONFIG_SLOB=y in the generated .config.
>>>>>>>
>>>>>>> I do have a board with a k210 so I checked with s/SLOB/SLUB and it still
>>>>>>> boots etc, but I have no workloads or w/e to run on it.
>>>>>>
>>>>>> I sent a patch to change the k210 defconfig to using SLUB. However...
>>>>
>>>> Thanks!
>>>>
>>>>>> The current default config using SLOB gives about 630 free memory pages
>>>>>> after boot (cat /proc/vmstat). Switching to SLUB, this is down to about
>>>>>> 400 free memory pages (CONFIG_SLUB_CPU_PARTIAL is off).
>>>>
>>>> Thanks for the testing! How much RAM does the system have btw? I found 8MB
>>>> somewhere, is that correct?
>>>
>>> Yep, 8MB, that's it.
>>>
>>>> So 230 pages that's a ~920 kB difference. Last time we saw less dramatic
>>>> difference [1]. But that was looking at Slab pages, not free pages.
>>>> The extra overhead could be also in percpu allocations, code etc.
>>>>
>>>>>> This is with a buildroot kernel 5.19 build including a shell and sd-card
>>>>>> boot. With SLUB, I get clean boots and a shell prompt as expected. But I
>>>>>> definitely see more errors with shell commands failing due to allocation
>>>>>> failures for the shell process fork. So as far as the K210 is concerned,
>>>>>> switching to SLUB is not ideal.
>>>>>>
>>>>>> I would not want to hold on kernel mm improvements because of this toy
>>>>>> k210 though, so I am not going to prevent SLOB deprecation. I just wish
>>>>>> SLUB itself used less memory :)
>>>>>
>>>>> Did further tests with kernel 6.0.1:
>>>>> * SLOB: 630 free pages after boot, shell working (occasional shell fork
>>>>> failure happen though)
>>>>> * SLAB: getting memory allocation for order 7 failures on boot already
>>>>> (init process). Shell barely working (high frequency of shell command fork
>>>>> failures)
>>>
>>> I forgot to add here that the system was down to about 500 free pages
>>> after boot (again from the shell with "cat /proc/vmstat").
>>>
>>>>> * SLUB: getting memory allocation for order 7 failures on boot. I do get a
>>>>> shell prompt but cannot run any shell command that involves forking a new
>>>>> process.
>>>
>>> For both slab and slub, I had cpu partial off, debug off and slab merge
>>> on, as I suspected that would lead to less memory overhead.
>>> I suspected memory fragmentation may be an issue but doing
>>>
>>> echo 3 > /proc/sys/vm/drop_caches
>>>
>>> before trying a shell command did not help much at all (it usually does on
>>> that board with SLOB). Note that this is all with buildroot, so this echo
>>> & redirect always works as it does not cause a shell fork.
>>>
>>>>>
>>>>> So if we want to keep the k210 support functional with a shell, we need
>>>>> slob. If we reduce that board support to only one application started as
>>>>> the init process, then I guess anything is OK.
>>>>
>>>> In [1] it was possible to save some more memory with more tuning. Some of
>>>> that required boot parameters and other code changes. In another reply [2] I
>>>> considered adding something like SLUB_TINY to take care of all that, so
>>>> looks like it would make sense to proceed with that.
>>>
>>> If you want me to test something, let me know.
>>
>> Would you try this please?
>>
>> diff --git a/mm/slub.c b/mm/slub.c
>> index a24b71041b26..1c36c4b9aaa0 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -4367,9 +4367,7 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
>>  	 * The larger the object size is, the more slabs we want on the partial
>>  	 * list to avoid pounding the page allocator excessively.
>>  	 */
>> -	s->min_partial = min_t(unsigned long, MAX_PARTIAL, ilog2(s->size) / 2);
>> -	s->min_partial = max_t(unsigned long, MIN_PARTIAL, s->min_partial);
>> -
>> +	s->min_partial = 0;
>>  	set_cpu_partial(s);
>>
>>  #ifdef CONFIG_NUMA
>>
>>
>> and booting with and without boot parameter slub_max_order=0?
>
> Test notes: I used Linus 6.1-rc5 as the base. That is the only thing I
> changed in buildroot default config for the sipeed maix bit card, booting
> with SD card. The test is: booting and run "cat /proc/vmstat" and register
> the nr_free_pages value. I repeated the boot + cat 3 to 4 times for each case.
>
> Here are the results:
>
> 6.1-rc5, SLOB:
> - 623 free pages
> - 629 free pages
> - 629 free pages
> 6.1-rc5, SLUB:
> - 448 free pages
> - 448 free pages
> - 429 free pages
> 6.1-rc5, SLUB + slub_max_order=0:
> - Init error, shell prompt but no shell command working
> - Init error, no shell prompt
> - 508 free pages
> - Init error, shell prompt but no shell command working
> 6.1-rc5, SLUB + patch:
> - Init error, shell prompt but no shell command working
> - 433 free pages
> - 448 free pages
> - 423 free pages
> 6.1-rc5, SLUB + slub_max_order=0 + patch:
> - Init error, no shell prompt
> - Init error, shell prompt, 499 free pages
> - Init error, shell prompt but no shell command working
> - Init error, no shell prompt
>
> No changes for SLOB results, expected.
>
> For default SLUB, I did get all clean boots this time and could run the
> cat command. But I do see shell fork failures if I keep running commands.
>
> For SLUB + slub_max_order=0, I only got one clean boot with 508 free
> pages. Remaining runs failed to give a shell prompt or allow running cat
> command. For the clean boot, I do see higher number of free pages.
>
> SLUB with the patch was nearly identical to SLUB without the patch.
>
> And SLUB+patch+slub_max_order=0 gave again a lot of errors/bad boot. I
> could run the cat command only once, giving 499 free pages, so better than
> regular SLUB. But it seems that the memory is more fragmented as
> allocations fail more often.

Note about the last case (SLUB+patch+slub_max_order=0).
Here are the messages I got when the init shell process fork failed:

[    1.217998] nommu: Allocation of length 491520 from process 1 (sh) failed
[    1.224098] active_anon:0 inactive_anon:0 isolated_anon:0
[    1.224098]  active_file:5 inactive_file:12 isolated_file:0
[    1.224098]  unevictable:0 dirty:0 writeback:0
[    1.224098]  slab_reclaimable:38 slab_unreclaimable:459
[    1.224098]  mapped:0 shmem:0 pagetables:0
[    1.224098]  sec_pagetables:0 bounce:0
[    1.224098]  kernel_misc_reclaimable:0
[    1.224098]  free:859 free_pcp:0 free_cma:0
[    1.260419] Node 0 active_anon:0kB inactive_anon:0kB active_file:20kB inactive_file:48kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB writeback_tmp:0kB kernel_stack:576kB pagetables:0kB sec_pagetables:0kB all_unreclaimable? no
[    1.285147] DMA32 free:3436kB boost:0kB min:312kB low:388kB high:464kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:28kB unevictable:0kB writepending:0kB present:8192kB managed:6240kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[    1.310654] lowmem_reserve[]: 0 0 0
[    1.314089] DMA32: 17*4kB (U) 10*8kB (U) 7*16kB (U) 6*32kB (U) 11*64kB (U) 6*128kB (U) 6*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3460kB
[    1.326883] 33 total pagecache pages
[    1.330420] binfmt_flat: Unable to allocate RAM for process text/data, errno -12
[    1.337858] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

-- 
Damien Le Moal
Western Digital Research
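
A note for reading the trace above: the failed request is 491520 bytes, and the DMA32 free-list line shows no free block of 512kB or larger, even though about 3.4MB is free in total. The sketch below redoes that arithmetic from the two log lines. It assumes 4kB pages (matching the "17*4kB" order-0 entry) and assumes the nommu allocation has to come from a single power-of-two block of contiguous pages. The helper names required_order() and parse_free_list() are made up for this illustration and are not kernel functions.

```python
import re

PAGE_SIZE_KB = 4  # assumption: 4 kB pages, as the "17*4kB" order-0 entry suggests


def required_order(length_bytes, page_size_kb=PAGE_SIZE_KB):
    """Smallest power-of-two page order whose block covers length_bytes."""
    page_bytes = page_size_kb * 1024
    pages = -(-length_bytes // page_bytes)  # ceiling division
    return (pages - 1).bit_length()         # ceil(log2(pages)) for pages >= 1


def parse_free_list(line):
    """Return {block size in kB: number of free blocks} from a free-list line."""
    return {int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}


# The two relevant values, copied from the log in this mail.
request_bytes = 491520
free_list = parse_free_list(
    "DMA32: 17*4kB (U) 10*8kB (U) 7*16kB (U) 6*32kB (U) 11*64kB (U) "
    "6*128kB (U) 6*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3460kB"
)

order = required_order(request_bytes)
needed_kb = PAGE_SIZE_KB << order
total_free_kb = sum(size * count for size, count in free_list.items())
largest_free_kb = max(size for size, count in free_list.items() if count > 0)

print(f"request needs order {order}, i.e. one {needed_kb} kB block")
print(f"total free: {total_free_kb} kB, largest free block: {largest_free_kb} kB")
print("request can be satisfied:", largest_free_kb >= needed_kb)
```

Under those assumptions the request rounds up to an order 7 block (512kB), which is the same order as the allocation failures reported earlier in the thread, while the largest free block in the log is 256kB. That is consistent with the failure being caused by fragmentation and not by the total amount of free memory.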