Date: Mon, 24 Feb 2025 13:12:06 -0800
From: Shakeel Butt
To: Mateusz Guzik
Cc: Vlastimil Babka, lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org, bpf, Christoph Lameter, David Rientjes, Hyeonggon Yoo <42.hyeyoo@gmail.com>, "Uladzislau Rezki (Sony)", Alexei Starovoitov
Subject: Re: [LSF/MM/BPF TOPIC] SLUB allocator, mainly the sheaves caching layer
Message-ID: <7wjnfy7cvmxzcmh4rs5xqi7qmurj365wa4kf252u7bnjgo4bqb@x42ceby4d27p>
References: <14422cf1-4a63-4115-87cb-92685e7dd91b@suse.cz>

On Mon, Feb 24, 2025 at 07:46:52PM +0100, Mateusz Guzik wrote:
> On Mon, Feb 24, 2025 at 10:02:09AM -0800, Shakeel Butt wrote:
> > What about pre-memcg-charged sheaves? We had to disable memcg charging
> > of some kernel allocations and I think sheaves can help in reenabling
> > it.
>
> It has been several months since last I looked at memcg, so details are
> fuzzy and I don't have time to refresh everything.
>
> However, if memory serves right the primary problem was the irq on/off
> trip associated with them (sometimes happening twice, second time with
> refill_obj_stock()).
>
> I think the real fix(tm) would recognize only some allocations need
> interrupt safety -- as in some slabs should not be allowed to be used
> outside of the process context. This is somewhat what sheaves is doing,
> but can be applied without fronting the current kmem caching mechanism.
> This may be a tough sell and even then it plays whackamole with patching
> up all consumers.
>
> Suppose it is not an option.
>
> Then there are 2 ways that I considered.
>
> The easiest splits memcg accounting for irq and process level -- similar
> to what localtry thing is doing. This would only cost preemption off/on
> trip in the common case and a branch on the current state. But suppose
> this is a no-go as well.

Have you seen 559271146efc ("mm/memcg: optimize user context object stock
access")? It got reverted for RT (or something). Maybe we can look at it
again.

>
> My primary idea was using hand-rolled sequence counters and local 8-byte
> cmpxchg (*without* the lock prefix, also not to be confused with 16-byte
> used by the current slub fast path). Should this work, it would be
> significantly faster than irq trips.
>
> The irq thing is there only to facilitate several fields being updated
> or memcg itself getting replaced in an atomic manner for process vs
> interrupt context.
>
> The observation is that all values which are getting updated are 4
> bytes. Then perhaps an additional counter can be added next to each one
> so that an 8-byte cmpxchg is going to fail should an irq swoop in and
> change stuff from under us.
>
> The percpu state would have a sequence counter associated with the
> assigned memcg_stock_pcp. The memcg_stock_pcp object would have the same
> value replicated inside for every var which can be updated in the fast
> path.
>
> Then the fast path would only succeed if the value read off from per-cpu
> did not change vs what's in the stock thing.
>
> Any change to memcg_stock_pcp (e.g., rolling up bytes after passing the
> page size threshold) would disable interrupts and modify all these
> counters.
>
> There is some more work needed to make sure the stock obj can be safely
> swapped out for a new one and not accidentally have a value which lines
> up with the previous one, I don't remember what I had for that (and yes,
> I recognize a 4 byte value will invariably roll over and *in principle*
> a conflict will be possible).
>
> This is a rough outline since Vlasta keeps prodding me about it.

By chance do you have this code lying around somewhere? Not saying this
is the way to go but wanted to take a look.

>
> That said, maybe someone will have a better idea. The above is up for
> grabs if someone wants to do it, I can't commit to looking at it.
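
For readers following the thread, the paired-counter scheme quoted above can
be modelled in user space roughly as below. This is only a sketch under
stated assumptions, not the code being asked about: `struct stock_model`,
`stock_slow_update()` and `stock_fast_consume()` are hypothetical names, only
one 4-byte field is modelled, and C11 atomics stand in for the unlocked
per-CPU cmpxchg (in the kernel something like this_cpu_cmpxchg() would
presumably be used instead, with the slow path running with interrupts
disabled).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* A 4-byte value and a 4-byte copy of the counter share one 8-byte word. */
static_assert(sizeof(uint64_t) == 2 * sizeof(uint32_t),
	      "value and counter must fit one 8-byte word");

/* Hypothetical stand-in for the per-CPU stock; not the real memcg_stock_pcp. */
struct stock_model {
	uint32_t seq;              /* sequence counter tied to the assigned stock */
	_Atomic uint64_t nr_bytes; /* high 32 bits: seq copy, low 32 bits: value */
};

static inline uint64_t pack(uint32_t seq, uint32_t val)
{
	return ((uint64_t)seq << 32) | val;
}

static inline uint32_t unpack_val(uint64_t w) { return (uint32_t)w; }
static inline uint32_t unpack_seq(uint64_t w) { return (uint32_t)(w >> 32); }

/*
 * Slow path: models the section that would run with interrupts disabled.
 * It bumps the counter and rewrites the field together with the new copy.
 */
static inline void stock_slow_update(struct stock_model *s, uint32_t new_val)
{
	s->seq++;
	atomic_store(&s->nr_bytes, pack(s->seq, new_val));
}

/*
 * Fast path: take `bytes` from the stock only if the counter stored next to
 * the value still matches `seq_seen`, the counter sampled earlier. The single
 * 8-byte compare-and-exchange fails if the word changed in between.
 */
static inline bool stock_fast_consume(struct stock_model *s, uint32_t seq_seen,
				      uint32_t bytes)
{
	uint64_t old = atomic_load(&s->nr_bytes);

	if (unpack_seq(old) != seq_seen || unpack_val(old) < bytes)
		return false;

	uint64_t next = pack(seq_seen, unpack_val(old) - bytes);
	return atomic_compare_exchange_strong(&s->nr_bytes, &old, next);
}
```

A caller would sample `seq`, attempt `stock_fast_consume()`, and fall back to
the slow path on failure. As the quoted text already notes, a 4-byte counter
can wrap, so this model does not rule out a stale match in principle.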