Date: Fri, 9 Apr 2021 12:05:41 -0400
From: Masayoshi Mizuma
To: Roman Gushchin
Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, cgroups@vger.kernel.org, linux-mm@kvack.org
Subject: Re: memcg: performance degradation since v5.9
Message-ID: <20210409160541.4tfkeex7mcfrwras@gabell>
References: <20210408193948.vfktg3azh2wrt56t@gabell>

On Thu, Apr 08, 2021 at 01:53:47PM -0700, Roman Gushchin wrote:
> On Thu, Apr 08, 2021 at 03:39:48PM -0400, Masayoshi Mizuma wrote:
> > Hello,
> >
> > I detected a performance degradation issue in a benchmark of PostgreSQL [1],
> > and the issue seems to be related to the object-level memory cgroup [2].
> > I would appreciate it if you could give me some ideas to solve it.
> >
> > The benchmark measures transactions per second (tps), and the tps for v5.9
> > and later kernels is about 10%-20% lower than for v5.8.
> >
> > The benchmark does sendto() and recvfrom() system calls repeatedly,
> > and the duration of those system calls gets longer than on v5.8.
> > The result of perf trace for the benchmark is as follows:
> >
> >   - v5.8
> >
> >    syscall        calls  errors     total      min      avg      max  stddev
> >                                    (msec)   (msec)   (msec)   (msec)     (%)
> >    --------- ---------- ------- --------- -------- -------- -------- -------
> >    sendto        699574       0  2595.220    0.001    0.004    0.462   0.03%
> >    recvfrom     1391089  694427  2163.458    0.001    0.002    0.442   0.04%
> >
> >   - v5.9
> >
> >    syscall        calls  errors     total      min      avg      max  stddev
> >                                    (msec)   (msec)   (msec)   (msec)     (%)
> >    --------- ---------- ------- --------- -------- -------- -------- -------
> >    sendto        699187       0  3316.948    0.002    0.005    0.044   0.02%
> >    recvfrom     1397042  698828  2464.995    0.001    0.002    0.025   0.04%
> >
> >   - v5.12-rc6
> >
> >    syscall        calls  errors     total      min      avg      max  stddev
> >                                    (msec)   (msec)   (msec)   (msec)     (%)
> >    --------- ---------- ------- --------- -------- -------- -------- -------
> >    sendto        699445       0  3015.642    0.002    0.004    0.027   0.02%
> >    recvfrom     1395929  697909  2338.783    0.001    0.002    0.024   0.03%
> >
> > I bisected the kernel patches, and found that the patch series which adds
> > object-level memory cgroup support causes the degradation.
> >
> > I confirmed the slowdown with a kernel module that just runs
> > kmem_cache_alloc()/kmem_cache_free() as follows. The loop takes about
> > 2-3 times longer than on v5.8.
> >
> >     dummy_cache = KMEM_CACHE(dummy, SLAB_ACCOUNT);
> >     for (i = 0; i < 100000000; i++) {
> >             p = kmem_cache_alloc(dummy_cache, GFP_KERNEL);
> >             kmem_cache_free(dummy_cache, p);
> >     }
> >
> > It seems that the object accounting work in slab_pre_alloc_hook() and
> > slab_post_alloc_hook() is the overhead.
> >
> > The cgroup.memory=nokmem kernel parameter doesn't work for my case because
> > it disables all of kmem accounting.
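To make the regression concrete, the per-call averages can be derived from the totals and call counts in the perf-trace tables above (my own arithmetic as a quick sketch, not additional measurements):

```python
# Average per-call syscall cost derived from the perf-trace totals above:
# (total msec, number of calls) per syscall and kernel version.
data = {
    "sendto":   {"v5.8": (2595.220, 699574),  "v5.9": (3316.948, 699187)},
    "recvfrom": {"v5.8": (2163.458, 1391089), "v5.9": (2464.995, 1397042)},
}

for name, runs in data.items():
    old_ms = runs["v5.8"][0] / runs["v5.8"][1]  # msec per call on v5.8
    new_ms = runs["v5.9"][0] / runs["v5.9"][1]  # msec per call on v5.9
    print(f"{name}: {old_ms * 1000:.2f} -> {new_ms * 1000:.2f} usec/call "
          f"(+{(new_ms / old_ms - 1) * 100:.0f}%)")
```

By this estimate sendto() is roughly 28% slower per call and recvfrom() roughly 13% slower on v5.9, which lines up with the reported 10%-20% tps drop.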
> >
> > The degradation is gone when I apply a patch (at the bottom of this email)
> > that adds a kernel parameter to fall back to the page-level accounting;
> > however, I'm not sure it's a good approach though...
>
> Hello Masayoshi!
>
> Thank you for the report!

Hi!

> It's not a secret that per-object accounting is more expensive than per-page
> accounting. I had micro-benchmark results similar to yours: accounted
> allocations are about 2x slower. But in general it tends not to affect real
> workloads, because the cost of allocations is still low and tends to be only
> a small fraction of the whole cpu load. And because it brings significant
> benefits: 40%+ slab memory savings, less fragmentation, a more stable
> workingset, etc., real workloads tend to perform on par or better.
>
> So my first question is whether you see the regression in any real workload
> or it's only about the benchmark?

It's only about the benchmark so far. I'll let you know if I hit the issue
with a real workload.

> Second, I'll try to take a look into the benchmark to figure out why it's
> affected so badly, but I'm not sure we can easily fix it. If you have any
> ideas what kind of objects the benchmark is allocating in big numbers,
> please let me know.

The benchmark does sendto() and recvfrom() on a unix domain socket
repeatedly, and kmem_cache_alloc_node()/kmem_cache_free() are called to
allocate/free the socket buffers. The call graph for the allocation is as
follows:

    do_syscall_64
      __x64_sys_sendto
        __sys_sendto
          sock_sendmsg
            unix_stream_sendmsg
              sock_alloc_send_pskb
                alloc_skb_with_frags
                  __alloc_skb
                    kmem_cache_alloc_node

kmem_cache_alloc_node()/kmem_cache_free() are called about 1,400,000 times
during the benchmark. The object size is 216 bytes and the GFP flags are
0x400cc0:

    ___GFP_ACCOUNT | ___GFP_KSWAPD_RECLAIM | ___GFP_DIRECT_RECLAIM |
    ___GFP_FS | ___GFP_IO

I got the data with the following bpftrace script.
# cat kmem.bt
#!/usr/bin/env bpftrace

tracepoint:kmem:kmem_cache_alloc_node
/comm == "pgbench"/
{
        @alloc[comm, args->bytes_req, args->bytes_alloc, args->gfp_flags] = count();
}

tracepoint:kmem:kmem_cache_free
/comm == "pgbench"/
{
        @free[comm] = count();
}

# ./kmem.bt
Attaching 2 probes...
^C

@alloc[pgbench, 11784, 11840, 3264]: 1
@alloc[pgbench, 216, 256, 3264]: 23
@alloc[pgbench, 216, 256, 4197568]: 1400046

@free[pgbench]: 1400560
#

I hope this helps...

Thanks!
Masa
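For reference, the gfp_flags values in the bpftrace output (3264 and 4197568 decimal) can be decoded into the ___GFP_* bits discussed above. The bit values below are taken from v5.9-era include/linux/gfp.h and are an assumption to verify against your own tree:

```python
# ___GFP_* bit values as found in v5.9-era include/linux/gfp.h
# (an assumption; check the gfp.h in the tree you are running).
GFP_BITS = {
    0x40:     "___GFP_IO",
    0x80:     "___GFP_FS",
    0x400:    "___GFP_DIRECT_RECLAIM",
    0x800:    "___GFP_KSWAPD_RECLAIM",
    0x400000: "___GFP_ACCOUNT",
}

def decode(flags):
    """Return the names of the known GFP bits set in 'flags'."""
    return [name for bit, name in sorted(GFP_BITS.items()) if flags & bit]

print(hex(4197568), decode(4197568))  # the 1,400,046 accounted skb allocations
print(hex(3264), decode(3264))        # the handful of unaccounted allocations
```

4197568 (0x400cc0) is exactly 3264 (0xcc0) plus ___GFP_ACCOUNT (0x400000), i.e. the hot allocation path in the bpftrace output is precisely the accounted one.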