From: Jann Horn <jannh@google.com>
Date: Thu, 1 May 2025 00:23:47 +0200
Subject: Re: [DISCUSSION] Revisiting Slab Movable Objects
To: Harry Yoo
Cc: Dave Chinner, Christoph Lameter, David Rientjes, Andrew Morton,
 Roman Gushchin, "Tobin C. Harding", Alexander Viro, Matthew Wilcox,
 Vlastimil Babka, Rik van Riel, Andrea Arcangeli, "Liam R. Howlett",
 Lorenzo Stoakes, Pedro Falcato, David Hildenbrand, Oscar Salvador,
 Michal Hocko, Byungchul Park, linux-mm@kvack.org,
 linux-fsdevel@vger.kernel.org

On Wed, Apr 30, 2025 at 3:11 PM Harry Yoo wrote:
> On Mon, Apr 28, 2025 at 05:31:35PM +0200, Jann Horn wrote:
> > On Fri, Apr 25, 2025 at
> > 1:09 PM Harry Yoo wrote:
> > > On Tue, Apr 22, 2025 at 07:54:08AM +1000, Dave Chinner wrote:
> > > > On Mon, Apr 21, 2025 at 10:47:39PM +0900, Harry Yoo wrote:
> > > > > Hi folks,
> > > > >
> > > > > As a long-term project, I'm starting to look into resurrecting
> > > > > Slab Movable Objects. The goal is to make certain types of slab memory
> > > > > movable and thus enable targeted reclamation, migration, and
> > > > > defragmentation.
> > > > >
> > > > > The main purpose of this posting is to briefly review what's been tried
> > > > > in the past, ask people why prior efforts have stalled (due to lack of
> > > > > time, or insufficient justification for the additional complexity?),
> > > > > and discuss what's feasible today.
> > > > >
> > > > > Please add anyone I may have missed to Cc. :)
> > > >
> > > > Adding -fsdevel because the dentry/inode cache discussion needs to be
> > > > visible to all the fs/VFS developers.
> > > >
> > > > I'm going to cut straight to the chase here, but I'll leave the rest
> > > > of the original email quoted below for -fsdevel readers.
> > > >
> > > > > Previous Work on Slab Movable Objects
> > > > > =====================================
> > > > >
> > > >
> > > > Without including any sort of viable proposal for dentry/inode
> > > > relocation (i.e. the showstopper for past attempts), what is the
> > > > point of trying to resurrect this?
> > >
> > > Migrating slabs still makes sense for other objects such as xarray /
> > > maple tree nodes, and VMAs.
> >
> > Do we have examples of how much memory is actually wasted on
> > sparsely-used slabs, and which slabs this happens in, from some real
> > workloads?
>
> Workloads that use a large amount of reclaimable slab memory (inode,
> dentry, etc.) and trigger reclamation can observe this problem.
>
> On my laptop, I can reproduce the problem by running the 'updatedb'
> command, which touches many files, and then triggering reclamation by
> running programs that consume a large amount of memory. As slab memory
> is reclaimed, it becomes sparsely populated (since slab memory is not
> reclaimed folio by folio).
>
> During reclamation, the total slab memory utilization drops from 95% to 50%.
> For very sparsely populated caches, the cache utilization is between
> 12% and 33% (ext4_inode_cache, radix_tree_node, dentry, trace_event_file,
> and some kmalloc caches on my machine).
>
> At the time the OOM killer is invoked, about 50% of slab memory is wasted
> on sparsely populated slabs, which is about 236 MiB on my laptop.
> I would say it's a sufficiently big problem to solve.
>
> I wonder how much worse this problem would be on large file servers,
> but I don't run such servers :-)
>
> > If sparsely-used slabs are a sufficiently big problem, maybe another
> > big hammer we have is to use smaller slab pages, or something along
> > those lines? Though of course a straightforward implementation of that
> > would probably have negative effects on the performance of SLUB
> > fastpaths, and depending on object size it might waste more memory on
> > padding.
>
> So it'll be something like preferring lower orders in calculate_order()
> while keeping the fractional waste reasonable.
>
> One problem could be that n->list_lock contention gets much worse
> on larger machines, since you need to grab more slabs from the list?

Maybe. I imagine using batched operations could help, such that the
amount of managed memory that is transferred per locking operation
stays the same...

> > (An adventurous idea would be to try to align kmem_cache::size such
> > that objects start at some subpage boundaries of SLUB folios, and then
> > figure out a way to shatter SLUB folios into smaller folios at runtime
> > while they contain objects...
> > but getting the SLUB locking right for
> > that without slowing down the fastpath for freeing an object would
> > probably be a large pain.)
>
> You can't make virt_to_slab() work if you shatter a slab folio
> into smaller ones?

Yeah, I think that would be hard. We could maybe avoid the
virt_to_slab() on the active-slab fastpath, and maybe there is some
kind of RCU-transition scheme that could be used on the path for
non-active slabs (a bit similar to how percpu refcounts transition
to atomic mode, with a transition period during which objects are
allowed to still go on the freelist of the former head page)...

> A more general question: will either shattering or allocating
> smaller slabs help free more memory anyway? It likely depends on
> the spatial pattern of how the objects are reclaimed and remain
> populated within a slab?

Probably, yeah. As a crude thought experiment, if you (somewhat
pessimistically?) assume that the spatial pattern is "we first
allocate a lot of objects, then for each object we roll a random
number and free it with 90% probability", and you have something like
a kmalloc-512 slab (normal order 2, which fits 32 objects), then the
probability that an entire order-2 page will be empty would be
pow(0.9, 32) ~= 3.4%, while the probability that an individual
order-0 page is empty would be pow(0.9, 8) ~= 43%.

There could be worse patterns, like "we preserve exactly every
fourth object"; though SLUB's freelist randomization (if
CONFIG_SLAB_FREELIST_RANDOM is enabled) would probably transform that
into a different pattern, so that it's not actually a sequential
pattern where every fourth object is allocated.

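The back-of-envelope numbers in that thought experiment check out; here is a quick sketch (plain Python, using the hypothetical 90% per-object free probability from above) that compares the closed-form values against a small Monte Carlo simulation:

```python
import random

OBJS_PER_ORDER2 = 32   # kmalloc-512 objects in an order-2 slab
OBJS_PER_ORDER0 = 8    # kmalloc-512 objects in an order-0 page
P_FREE = 0.9           # assumed probability that any given object is freed

# Closed form: a page is fully empty only if every object on it was freed.
p_empty_order2 = P_FREE ** OBJS_PER_ORDER2   # ~3.4%
p_empty_order0 = P_FREE ** OBJS_PER_ORDER0   # ~43%

def simulate(objs_per_page, trials=100_000):
    """Monte Carlo estimate of the fraction of fully empty pages."""
    empty = sum(
        all(random.random() < P_FREE for _ in range(objs_per_page))
        for _ in range(trials)
    )
    return empty / trials

print(f"order-2: closed-form {p_empty_order2:.1%}, "
      f"simulated {simulate(OBJS_PER_ORDER2):.1%}")
print(f"order-0: closed-form {p_empty_order0:.1%}, "
      f"simulated {simulate(OBJS_PER_ORDER0):.1%}")
```

The order-of-magnitude gap between roughly 3% and 43% fully empty pages is what would make smaller slab orders attractive under this particular (assumed) reclaim pattern.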
In case you want to do more detailed experiments with this: FYI, I
have a branch "slub-binary-snapshot" at https://github.com/thejh/linux
with a draft patch that provides a debugfs API for getting a binary
dump of SLUB allocations (I wrote that patch for another project):
https://github.com/thejh/linux/commit/685944dc69fd21e92bf110713b491d5c050328af
- maybe with some changes that would be useful for analyzing SLUB
fragmentation from userspace. But I don't know if that's a good way to
experiment with this, or if it'd be easier to directly analyze
fragmentation in debugfs code inside SLUB.
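As an aside: per-cache utilization figures like the 12-33% range quoted above can be estimated from /proc/slabinfo (version 2.1 format) without any kernel changes. A rough sketch; the SAMPLE text is made-up illustration data, and on a real system you would read /proc/slabinfo itself (usually root-only on recent kernels):

```python
# Parse /proc/slabinfo (version 2.1 format) and report per-cache utilization
# as active_objs / num_objs. SAMPLE below is fabricated for illustration.

def cache_utilization(slabinfo_text):
    """Return {cache_name: active_objs / num_objs} for non-empty caches."""
    util = {}
    for line in slabinfo_text.splitlines():
        if not line or line.startswith(("slabinfo", "#")):
            continue  # skip version header and column-name comment
        fields = line.split()
        name, active_objs, num_objs = fields[0], int(fields[1]), int(fields[2])
        if num_objs:
            util[name] = active_objs / num_objs
    return util

SAMPLE = """\
slabinfo - version: 2.1
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables ...
ext4_inode_cache   5400  21600   1176   27    8 : tunables 0 0 0 : slabdata 800 800 0
dentry            30000 100000    192   21    1 : tunables 0 0 0 : slabdata 4762 4762 0
"""

for name, u in cache_utilization(SAMPLE).items():
    print(f"{name:20s} {u:.0%}")
```

This only shows object-level utilization per cache, not the spatial distribution of live objects within slabs, so it understates what a per-folio analysis (like the debugfs dump above) could reveal.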