From: Mateusz Guzik <mjguzik@gmail.com>
Date: Mon, 24 Feb 2025 23:21:12 +0100
Subject: Re: [LSF/MM/BPF TOPIC] SLUB allocator, mainly the sheaves caching layer
To: Shakeel Butt
Cc: Vlastimil Babka, lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
 bpf, Christoph Lameter, David Rientjes, Hyeonggon Yoo <42.hyeyoo@gmail.com>,
 "Uladzislau Rezki (Sony)", Alexei Starovoitov
In-Reply-To: <7wjnfy7cvmxzcmh4rs5xqi7qmurj365wa4kf252u7bnjgo4bqb@x42ceby4d27p>
References: <14422cf1-4a63-4115-87cb-92685e7dd91b@suse.cz>
 <7wjnfy7cvmxzcmh4rs5xqi7qmurj365wa4kf252u7bnjgo4bqb@x42ceby4d27p>

On Mon, Feb 24, 2025 at 10:12 PM Shakeel Butt wrote:
>
> On Mon, Feb 24, 2025 at 07:46:52PM +0100, Mateusz Guzik wrote:
> > On Mon, Feb 24, 2025 at 10:02:09AM -0800, Shakeel Butt wrote:
> > > What about pre-memcg-charged sheaves? We had to disable memcg charging
> > > of some kernel allocations and I think sheaves can help in reenabling
> > > it.
> >
> > It has been several months since I last looked at memcg, so the details
> > are fuzzy and I don't have time to refresh everything.
> >
> > However, if memory serves right, the primary problem was the irq on/off
> > trip associated with them (sometimes happening twice, the second time
> > with refill_obj_stock()).
> >
> > I think the real fix(tm) would recognize that only some allocations need
> > interrupt safety -- as in, some slabs should not be allowed to be used
> > outside of process context. This is somewhat what sheaves is doing, but
> > it can be applied without fronting the current kmem caching mechanism.
> > This may be a tough sell, and even then it plays whack-a-mole with
> > patching up all consumers.
> >
> > Suppose it is not an option.
> >
> > Then there are two ways that I considered.
> >
> > The easiest splits memcg accounting between irq and process level --
> > similar to what the localtry thing is doing. This would only cost a
> > preemption off/on trip in the common case and a branch on the current
> > state. But suppose this is a no-go as well.
>
> Have you seen 559271146efc ("mm/memcg: optimize user context object
> stock access")? It got reverted for RT (or something). Maybe we can look
> at it again.
>

Huh, I had not seen it; it does look like the same core idea.

Even if RT itself is the problem, perhaps this could be made build-time
conditional on it?

> >
> > My primary idea was using hand-rolled sequence counters and a local
> > 8-byte cmpxchg (*without* the lock prefix, and not to be confused with
> > the 16-byte one used by the current slub fast path). Should this work,
> > it would be significantly faster than irq trips.
> >
> > The irq thing is there only to facilitate several fields being updated,
> > or memcg itself getting replaced, in an atomic manner for process vs
> > interrupt context.
> >
> > The observation is that all values which are getting updated are 4
> > bytes. Then perhaps an additional counter can be added next to each one
> > so that an 8-byte cmpxchg is going to fail should an irq swoop in and
> > change stuff from under us.
> >
> > The percpu state would have a sequence counter associated with the
> > assigned memcg_stock_pcp. The memcg_stock_pcp object would have the same
> > value replicated inside for every var which can be updated in the fast
> > path.
> >
> > Then the fast path would only succeed if the sequence value read off
> > per-cpu did not change vs what's in the stock object.
> >
> > Any change to memcg_stock_pcp (e.g., rolling up bytes after passing the
> > page size threshold) would disable interrupts and modify all these
> > counters.
> >
> > There is some more work needed to make sure the stock object can be
> > safely swapped out for a new one and not accidentally end up with a
> > value which lines up with the previous one; I don't remember what I had
> > for that (and yes, I recognize that a 4-byte value will invariably roll
> > over and *in principle* a conflict will be possible).
> >
> > This is a rough outline, since Vlasta keeps prodding me about it.
>
> By chance do you have this code lying around somewhere? Not saying this
> is the way to go, but I wanted to take a look.

Sorry mate, there was a lot of handwaving produced around this and kmem
fast paths, but no code. :)

Conceptually though I think this is pretty straightforward.
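
To make the shape concrete, a minimal userspace sketch of the scheme
(untested; stock_word, stock_try_charge and friends are made-up names,
and C11 atomics stand in for what would be a local, non-LOCK 8-byte
cmpxchg on per-cpu data in the kernel):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* A 4-byte fast-path value paired with a 4-byte sequence copy. */
struct stock_word {
	uint32_t val;	/* e.g. pre-charged bytes left in the stock */
	uint32_t seq;	/* copy of the owner's sequence counter */
};

struct stock {
	uint32_t seq;		/* bumped by every slow-path update */
	_Atomic uint64_t word;	/* struct stock_word packed into 8 bytes */
};

static uint64_t pack(struct stock_word w)
{
	return (uint64_t)w.seq << 32 | w.val;
}

static struct stock_word unpack(uint64_t raw)
{
	return (struct stock_word){ .val = (uint32_t)raw,
				    .seq = (uint32_t)(raw >> 32) };
}

/* Fast path: consume @bytes from the stock if nothing changed under us. */
static bool stock_try_charge(struct stock *s, uint32_t bytes)
{
	uint32_t seq = s->seq;			/* sample the sequence first */
	uint64_t old = atomic_load(&s->word);
	struct stock_word w = unpack(old);

	if (w.seq != seq || w.val < bytes)
		return false;			/* caller takes the irq-safe slow path */

	w.val -= bytes;
	/* In the kernel this would be a local (non-LOCK) 8-byte cmpxchg. */
	return atomic_compare_exchange_strong(&s->word, &old, pack(w));
}

/* Slow path / irq context: refill and invalidate in-flight fast paths. */
static void stock_refill(struct stock *s, uint32_t bytes)
{
	struct stock_word w = { .val = bytes, .seq = ++s->seq };

	atomic_store(&s->word, pack(w));
}

int main(void)
{
	struct stock s = { 0 };

	stock_refill(&s, 4096);
	printf("charge 512  -> %d\n", stock_try_charge(&s, 512));	/* 1 */
	printf("charge 8192 -> %d\n", stock_try_charge(&s, 8192));	/* 0 */
	return 0;
}

The point of the seq copy inside the packed word is that the slow path
(stock_refill above, running with irqs off in the real thing) bumps it,
so a concurrent fast-path cmpxchg fails even if the 4-byte value happens
to come back to the same number.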

Anyhow, I forgot to mention another angle: perhaps a kernel equivalent
of rseq could somehow be employed here? As in, you prep the op; should
an interrupt come in, it can detect that you were about to execute it
and redirect your IP to a fallback, or just restart.

I have no idea how feasible this is here, food for thought.

-- 
Mateusz Guzik