From: Shakeel Butt <shakeelb@google.com>
Date: Thu, 18 Jan 2024 23:47:51 -0800
Subject: Re: [PATCH RFC 1/4] fs/locks: Fix file lock cache accounting, again
To: Roman Gushchin
Cc: Linus Torvalds, Josh Poimboeuf, Vlastimil Babka, Jeff Layton,
    Chuck Lever, Johannes Weiner, Michal Hocko, linux-kernel@vger.kernel.org,
    Jens Axboe, Tejun Heo, Vasily Averin, Michal Koutny, Waiman Long,
    Muchun Song, Jiri Kosina, cgroups@vger.kernel.org, linux-mm@kvack.org

On Wed, Jan 17, 2024 at 2:20 PM Roman Gushchin wrote:
>
> On Wed, Jan 17, 2024 at 01:02:19PM -0800, Shakeel Butt wrote:
> > On Wed, Jan 17, 2024 at 12:21 PM Linus Torvalds wrote:
> > >
> > > On Wed, 17 Jan 2024 at 11:39, Josh Poimboeuf wrote:
> > > >
> > > > That's a good point. If the microbenchmark isn't likely to be even
> > > > remotely realistic, maybe we should just revert the revert until
> > > > if/when somebody shows a real world impact.
> > > >
> > > > Linus, any objections to that?
> > >
> > > We use SLAB_ACCOUNT for much more common allocations like queued
> > > signals, so I would tend to agree with Jeff that it's probably just
> > > some not very interesting microbenchmark that shows any file locking
> > > effects from SLAB_ALLOC, not any real use.
> > >
> > > That said, those benchmarks do matter. It's very easy to say "not
> > > relevant in the big picture" and then the end result is that
> > > everything is a bit of a pig.
> > >
> > > And the regression was absolutely *ENORMOUS*. We're not talking "a few
> > > percent". We're talking a 33% regression that caused the revert:
> > >
> > >   https://lore.kernel.org/lkml/20210907150757.GE17617@xsang-OptiPlex-9020/
> > >
> > > I wish our SLAB_ACCOUNT wasn't such a pig. Rather than account every
> > > single allocation, it would be much nicer to account at a bigger
> > > granularity, possibly by having per-thread counters first before
> > > falling back to the obj_cgroup_charge. Whatever.
> > >
> > > It's kind of stupid to have a benchmark that just allocates and
> > > deallocates a file lock in quick succession spend lots of time
> > > incrementing and decrementing cgroup charges for that repeated
> > > alloc/free.
> > >
> > > However, that problem with SLAB_ACCOUNT is not the fault of file
> > > locking, but more of a slab issue.
> > >
> > > End result: I think we should bring in Vlastimil and whoever else is
> > > doing SLAB_ACCOUNT things, and have them look at that side.
> > >
> > > And then just enable SLAB_ACCOUNT for file locks. But very much look
> > > at silly costs in SLAB_ACCOUNT first, at least for trivial
> > > "alloc/free" patterns..
> > >
> > > Vlastimil? Who would be the best person to look at that SLAB_ACCOUNT
> > > thing? See commit 3754707bcc3e (Revert "memcg: enable accounting for
> > > file lock caches") for the history here.
> > >
> >
> > Roman last looked into optimizing this code path. I suspect
> > mod_objcg_state() to be more costly than obj_cgroup_charge(). I will
> > try to measure this path and see if I can improve it.
>
> It's roughly an equal split between mod_objcg_state() and obj_cgroup_charge().
> And each is comparable (by order of magnitude) to the slab allocation cost
> itself. On the free() path a significant cost comes simply from reading
> the objcg pointer (it's usually a cache miss).
>
> So I don't see how we can make it really cheap (say, less than 5% overhead)
> without caching pre-accounted objects.
>
> I thought about merging of charge and stats handling paths, which _maybe_ can
> shave off another 20-30%, but there still will be a double-digit% accounting
> overhead.
>
> I'm curious to hear other ideas and suggestions.
>
> Thanks!

I profiled (perf record -a) the same benchmark, i.e. lock1_processes, on an
Ice Lake machine with 72 cores and got the following results:

 12.72%  lock1_processes  [kernel.kallsyms]  [k] mod_objcg_state
 10.89%  lock1_processes  [kernel.kallsyms]  [k] kmem_cache_free
  8.40%  lock1_processes  [kernel.kallsyms]  [k] slab_post_alloc_hook
  8.36%  lock1_processes  [kernel.kallsyms]  [k] kmem_cache_alloc
  5.18%  lock1_processes  [kernel.kallsyms]  [k] refill_obj_stock
  5.18%  lock1_processes  [kernel.kallsyms]  [k] _copy_from_user

On annotating mod_objcg_state(), the following irq-disabling instructions
take about 30% of its time:

  6.64 │ pushfq
 10.26 │ popq   -0x38(%rbp)
  6.05 │ mov    -0x38(%rbp),%rcx
  7.60 │ cli
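
That pushfq/popq/cli sequence is essentially what local_irq_save() compiles
to on x86, so a large part of the cost here is saving RFLAGS plus the
serializing cli around the per-CPU update rather than the counter math
itself. As a purely illustrative sketch (hypothetical function, not the
actual mod_objcg_state() source), the pattern being paid for is:

#include <linux/irqflags.h>     /* local_irq_save()/local_irq_restore() */
#include <linux/memcontrol.h>   /* struct obj_cgroup */

/* Hypothetical stand-in for the per-CPU objcg stats update path. */
static void objcg_stat_update_sketch(struct obj_cgroup *objcg, int idx, int nr)
{
        unsigned long flags;

        /* pushfq; popq; cli in the annotation above */
        local_irq_save(flags);

        /* ... fold @nr into the per-CPU delta for @idx on behalf of @objcg ... */

        /* popf on the way back out */
        local_irq_restore(flags);
}
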
For kmem_cache_free() and kmem_cache_alloc(), the expensive instruction is
the one below, which corresponds to __update_cpu_freelist_fast():

 16.33 │ cmpxchg16b %gs:(%rsi)

For slab_post_alloc_hook() the cost is spread all over the place, and
refill_obj_stock() looks very similar to mod_objcg_state(). I will dig
more in the next couple of days.
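
To make the "caching pre-accounted objects" / coarser-granularity idea above
a bit more concrete, here is a rough, purely illustrative sketch (all names
invented, flushing and error handling omitted; IIUC the existing per-CPU
objcg stock already works roughly along these lines for the charge itself):
keep a per-CPU batch of pre-charged bytes so the common alloc/free pair only
touches a local counter, and obj_cgroup_charge() is paid once per batch.

#include <linux/percpu.h>
#include <linux/gfp.h>
#include <linux/errno.h>
#include <linux/memcontrol.h>   /* struct obj_cgroup, obj_cgroup_charge() */

/* Everything below is made up for illustration, not a proposed API. */
struct objcg_batch {
        struct obj_cgroup *objcg;   /* objcg the cached bytes were charged to */
        unsigned int bytes;         /* bytes pre-charged but not yet handed out */
};
static DEFINE_PER_CPU(struct objcg_batch, objcg_batch);

#define OBJCG_BATCH_BYTES (32 * 1024)   /* arbitrary batch size for the sketch */

/* Callers are assumed to have irqs (or at least preemption) disabled. */
static bool objcg_batch_consume(struct obj_cgroup *objcg, unsigned int size)
{
        struct objcg_batch *b = this_cpu_ptr(&objcg_batch);

        if (likely(b->objcg == objcg && b->bytes >= size)) {
                b->bytes -= size;   /* fast path: a single per-CPU subtraction */
                return true;
        }
        return false;               /* fall back to the real charge path */
}

static int objcg_batch_refill(struct obj_cgroup *objcg, gfp_t gfp,
                              unsigned int size)
{
        struct objcg_batch *b = this_cpu_ptr(&objcg_batch);

        /* Slow path: charge a whole batch up front, then serve from it. */
        if (obj_cgroup_charge(objcg, gfp, size + OBJCG_BATCH_BYTES))
                return -ENOMEM;

        /* A real version would first flush bytes cached for a previous objcg. */
        b->objcg = objcg;
        b->bytes = OBJCG_BATCH_BYTES;   /* @size goes to the current allocation */
        return 0;
}

The open question from the discussion above is whether the stats side
(mod_objcg_state()) can be folded into the same per-CPU batch, so that the
flags save/restore is also paid once per batch rather than once per object.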