From: Shakeel Butt <shakeelb@google.com>
Date: Thu, 18 Jan 2024 23:47:51 -0800
Subject: Re: [PATCH RFC 1/4] fs/locks: Fix file lock cache accounting, again
To: Roman Gushchin
Cc: Linus Torvalds, Josh Poimboeuf, Vlastimil Babka, Jeff Layton,
    Chuck Lever, Johannes Weiner, Michal Hocko, linux-kernel@vger.kernel.org,
    Jens Axboe, Tejun Heo, Vasily Averin, Michal Koutny, Waiman Long,
    Muchun Song, Jiri Kosina, cgroups@vger.kernel.org, linux-mm@kvack.org

On Wed, Jan 17, 2024 at 2:20 PM Roman Gushchin wrote:
>
> On Wed, Jan 17, 2024 at 01:02:19PM -0800, Shakeel Butt wrote:
> > On Wed, Jan 17, 2024 at 12:21 PM Linus Torvalds wrote:
> > >
> > > On Wed, 17 Jan 2024 at 11:39, Josh Poimboeuf wrote:
> > > >
> > > > That's a good point. If the microbenchmark isn't likely to be even
> > > > remotely realistic, maybe we should just revert the revert until
> > > > if/when somebody shows a real world impact.
> > > >
> > > > Linus, any objections to that?
> > >
> > > We use SLAB_ACCOUNT for much more common allocations like queued
> > > signals, so I would tend to agree with Jeff that it's probably just
> > > some not very interesting microbenchmark that shows any file locking
> > > effects from SLAB_ALLOC, not any real use.
> > >
> > > That said, those benchmarks do matter. It's very easy to say "not
> > > relevant in the big picture" and then the end result is that
> > > everything is a bit of a pig.
> > >
> > > And the regression was absolutely *ENORMOUS*. We're not talking "a few
> > > percent". We're talking a 33% regression that caused the revert:
> > >
> > >   https://lore.kernel.org/lkml/20210907150757.GE17617@xsang-OptiPlex-9020/
> > >
> > > I wish our SLAB_ACCOUNT wasn't such a pig. Rather than account every
> > > single allocation, it would be much nicer to account at a bigger
> > > granularity, possibly by having per-thread counters first before
> > > falling back to the obj_cgroup_charge. Whatever.
> > >
> > > It's kind of stupid to have a benchmark that just allocates and
> > > deallocates a file lock in quick succession spend lots of time
> > > incrementing and decrementing cgroup charges for that repeated
> > > alloc/free.
> > >
> > > However, that problem with SLAB_ACCOUNT is not the fault of file
> > > locking, but more of a slab issue.
> > >
> > > End result: I think we should bring in Vlastimil and whoever else is
> > > doing SLAB_ACCOUNT things, and have them look at that side.
> > >
> > > And then just enable SLAB_ACCOUNT for file locks. But very much look
> > > at silly costs in SLAB_ACCOUNT first, at least for trivial
> > > "alloc/free" patterns..
> > >
> > > Vlastimil? Who would be the best person to look at that SLAB_ACCOUNT
> > > thing? See commit 3754707bcc3e (Revert "memcg: enable accounting for
> > > file lock caches") for the history here.
> > >
> >
> > Roman last looked into optimizing this code path. I suspect
> > mod_objcg_state() to be more costly than obj_cgroup_charge(). I will
> > try to measure this path and see if I can improve it.
>
> It's roughly an equal split between mod_objcg_state() and obj_cgroup_charge().
> And each is comparable (by order of magnitude) to the slab allocation cost
> itself. On the free() path a significant cost comes simply from reading
> the objcg pointer (it's usually a cache miss).
>
> So I don't see how we can make it really cheap (say, less than 5% overhead)
> without caching pre-accounted objects.
>
> I thought about merging of charge and stats handling paths, which _maybe_ can
> shave off another 20-30%, but there still will be a double-digit% accounting
> overhead.
>
> I'm curious to hear other ideas and suggestions.
>
> Thanks!

I profiled (perf record -a) the same benchmark, i.e. lock1_processes, on an
Ice Lake machine with 72 cores and got the following results:

 12.72%  lock1_processes  [kernel.kallsyms]  [k] mod_objcg_state
 10.89%  lock1_processes  [kernel.kallsyms]  [k] kmem_cache_free
  8.40%  lock1_processes  [kernel.kallsyms]  [k] slab_post_alloc_hook
  8.36%  lock1_processes  [kernel.kallsyms]  [k] kmem_cache_alloc
  5.18%  lock1_processes  [kernel.kallsyms]  [k] refill_obj_stock
  5.18%  lock1_processes  [kernel.kallsyms]  [k] _copy_from_user

On annotating mod_objcg_state(), the following irq-disabling instructions
take about 30% of its time:

  6.64 │ pushfq
 10.26 │ popq   -0x38(%rbp)
  6.05 │ mov    -0x38(%rbp),%rcx
  7.60 │ cli
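
That pushfq/popq/cli sequence is essentially what local_irq_save() compiles
to on x86, so a large part of the cost here is saving RFLAGS plus the
serializing cli around the per-CPU update rather than the counter math
itself. As a purely illustrative sketch (hypothetical function, not the
actual mod_objcg_state() source), the pattern being paid for is:

#include <linux/irqflags.h>     /* local_irq_save()/local_irq_restore() */
#include <linux/memcontrol.h>   /* struct obj_cgroup */

/* Hypothetical stand-in for the per-CPU objcg stats update path. */
static void objcg_stat_update_sketch(struct obj_cgroup *objcg, int idx, int nr)
{
        unsigned long flags;

        /* pushfq; popq; cli in the annotation above */
        local_irq_save(flags);

        /* ... fold @nr into the per-CPU delta for @idx on behalf of @objcg ... */

        /* popf on the way back out */
        local_irq_restore(flags);
}
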
For kmem_cache_free() and kmem_cache_alloc(), the expensive instruction is
the one below, which corresponds to __update_cpu_freelist_fast():

 16.33 │ cmpxchg16b %gs:(%rsi)

For slab_post_alloc_hook() the cost is spread all over the place, and
refill_obj_stock() looks very similar to mod_objcg_state(). I will dig
more in the next couple of days.
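
To make the "caching pre-accounted objects" / coarser-granularity idea above
a bit more concrete, here is a rough, purely illustrative sketch (all names
invented, flushing and error handling omitted; IIUC the existing per-CPU
objcg stock already works roughly along these lines for the charge itself):
keep a per-CPU batch of pre-charged bytes so the common alloc/free pair only
touches a local counter, and obj_cgroup_charge() is paid once per batch.

#include <linux/percpu.h>
#include <linux/gfp.h>
#include <linux/errno.h>
#include <linux/memcontrol.h>   /* struct obj_cgroup, obj_cgroup_charge() */

/* Everything below is made up for illustration, not a proposed API. */
struct objcg_batch {
        struct obj_cgroup *objcg;   /* objcg the cached bytes were charged to */
        unsigned int bytes;         /* bytes pre-charged but not yet handed out */
};
static DEFINE_PER_CPU(struct objcg_batch, objcg_batch);

#define OBJCG_BATCH_BYTES (32 * 1024)   /* arbitrary batch size for the sketch */

/* Callers are assumed to have irqs (or at least preemption) disabled. */
static bool objcg_batch_consume(struct obj_cgroup *objcg, unsigned int size)
{
        struct objcg_batch *b = this_cpu_ptr(&objcg_batch);

        if (likely(b->objcg == objcg && b->bytes >= size)) {
                b->bytes -= size;   /* fast path: a single per-CPU subtraction */
                return true;
        }
        return false;               /* fall back to the real charge path */
}

static int objcg_batch_refill(struct obj_cgroup *objcg, gfp_t gfp,
                              unsigned int size)
{
        struct objcg_batch *b = this_cpu_ptr(&objcg_batch);

        /* Slow path: charge a whole batch up front, then serve from it. */
        if (obj_cgroup_charge(objcg, gfp, size + OBJCG_BATCH_BYTES))
                return -ENOMEM;

        /* A real version would first flush bytes cached for a previous objcg. */
        b->objcg = objcg;
        b->bytes = OBJCG_BATCH_BYTES;   /* @size goes to the current allocation */
        return 0;
}

The open question from the discussion above is whether the stats side
(mod_objcg_state()) can be folded into the same per-CPU batch, so that the
flags save/restore is also paid once per batch rather than once per object.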