Date: Fri, 9 Apr 2021 12:05:41 -0400
From: Masayoshi Mizuma
To: Roman Gushchin
Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, cgroups@vger.kernel.org, linux-mm@kvack.org
Subject: Re: memcg: performance degradation since v5.9
Message-ID: <20210409160541.4tfkeex7mcfrwras@gabell>
References: <20210408193948.vfktg3azh2wrt56t@gabell>

On Thu, Apr 08, 2021 at 01:53:47PM -0700, Roman Gushchin wrote:
> On Thu, Apr 08, 2021 at 03:39:48PM -0400, Masayoshi Mizuma wrote:
> > Hello,
> >
> > I detected a performance degradation issue in a benchmark of PostgreSQL [1],
> > and the issue seems to be related to the object-level memory cgroup [2].
> > I would appreciate it if you could give me some ideas to solve it.
> >
> > The benchmark measures transactions per second (tps), and the tps for v5.9
> > and later kernels is about 10%-20% lower than for v5.8.
> >
> > The benchmark does sendto() and recvfrom() system calls repeatedly,
> > and the duration of those system calls gets longer than on v5.8.
> > The result of perf trace for the benchmark is as follows:
> >
> >   - v5.8
> >
> >    syscall        calls  errors     total      min      avg      max  stddev
> >                                    (msec)   (msec)   (msec)   (msec)     (%)
> >    --------- ---------- ------- --------- -------- -------- -------- -------
> >    sendto        699574       0  2595.220    0.001    0.004    0.462   0.03%
> >    recvfrom     1391089  694427  2163.458    0.001    0.002    0.442   0.04%
> >
> >   - v5.9
> >
> >    syscall        calls  errors     total      min      avg      max  stddev
> >                                    (msec)   (msec)   (msec)   (msec)     (%)
> >    --------- ---------- ------- --------- -------- -------- -------- -------
> >    sendto        699187       0  3316.948    0.002    0.005    0.044   0.02%
> >    recvfrom     1397042  698828  2464.995    0.001    0.002    0.025   0.04%
> >
> >   - v5.12-rc6
> >
> >    syscall        calls  errors     total      min      avg      max  stddev
> >                                    (msec)   (msec)   (msec)   (msec)     (%)
> >    --------- ---------- ------- --------- -------- -------- -------- -------
> >    sendto        699445       0  3015.642    0.002    0.004    0.027   0.02%
> >    recvfrom     1395929  697909  2338.783    0.001    0.002    0.024   0.03%
> >
> > I bisected the kernel patches, and found that the patch series which adds
> > object-level memory cgroup support causes the degradation.
> >
> > I confirmed the slowdown with a kernel module that just runs
> > kmem_cache_alloc()/kmem_cache_free() as follows. The loop takes about
> > 2-3 times longer than on v5.8.
> >
> >     dummy_cache = KMEM_CACHE(dummy, SLAB_ACCOUNT);
> >     for (i = 0; i < 100000000; i++) {
> >             p = kmem_cache_alloc(dummy_cache, GFP_KERNEL);
> >             kmem_cache_free(dummy_cache, p);
> >     }
> >
> > It seems that the object accounting work in slab_pre_alloc_hook() and
> > slab_post_alloc_hook() is the overhead.
> >
> > The cgroup.memory=nokmem kernel parameter doesn't work for my case because
> > it disables all of kmem accounting.
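To make the regression concrete, the per-call averages can be derived from the totals and call counts in the perf-trace tables above (my own arithmetic as a quick sketch, not additional measurements):

```python
# Average per-call syscall cost derived from the perf-trace totals above:
# (total msec, number of calls) per syscall and kernel version.
data = {
    "sendto":   {"v5.8": (2595.220, 699574),  "v5.9": (3316.948, 699187)},
    "recvfrom": {"v5.8": (2163.458, 1391089), "v5.9": (2464.995, 1397042)},
}

for name, runs in data.items():
    old_ms = runs["v5.8"][0] / runs["v5.8"][1]  # msec per call on v5.8
    new_ms = runs["v5.9"][0] / runs["v5.9"][1]  # msec per call on v5.9
    print(f"{name}: {old_ms * 1000:.2f} -> {new_ms * 1000:.2f} usec/call "
          f"(+{(new_ms / old_ms - 1) * 100:.0f}%)")
```

By this estimate sendto() is roughly 28% slower per call and recvfrom() roughly 13% slower on v5.9, which lines up with the reported 10%-20% tps drop.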
> >
> > The degradation is gone when I apply a patch (at the bottom of this email)
> > that adds a kernel parameter to fall back to the page-level accounting;
> > however, I'm not sure it's a good approach though...
>
> Hello Masayoshi!
>
> Thank you for the report!

Hi!

> It's not a secret that per-object accounting is more expensive than per-page
> accounting. I had micro-benchmark results similar to yours: accounted
> allocations are about 2x slower. But in general it tends not to affect real
> workloads, because the cost of allocations is still low and tends to be only
> a small fraction of the whole cpu load. And because it brings significant
> benefits: 40%+ slab memory savings, less fragmentation, a more stable
> workingset, etc., real workloads tend to perform on par or better.
>
> So my first question is whether you see the regression in any real workload
> or it's only about the benchmark?

It's only about the benchmark so far. I'll let you know if I hit the issue
with a real workload.

> Second, I'll try to take a look into the benchmark to figure out why it's
> affected so badly, but I'm not sure we can easily fix it. If you have any
> ideas what kind of objects the benchmark is allocating in big numbers,
> please let me know.

The benchmark does sendto() and recvfrom() on a unix domain socket
repeatedly, and kmem_cache_alloc_node()/kmem_cache_free() are called to
allocate/free the socket buffers. The call graph for the allocation is as
follows:

    do_syscall_64
      __x64_sys_sendto
        __sys_sendto
          sock_sendmsg
            unix_stream_sendmsg
              sock_alloc_send_pskb
                alloc_skb_with_frags
                  __alloc_skb
                    kmem_cache_alloc_node

kmem_cache_alloc_node()/kmem_cache_free() are called about 1,400,000 times
during the benchmark. The object size is 216 bytes and the GFP flags are
0x400cc0:

    ___GFP_ACCOUNT | ___GFP_KSWAPD_RECLAIM | ___GFP_DIRECT_RECLAIM |
    ___GFP_FS | ___GFP_IO

I got the data with the following bpftrace script.
# cat kmem.bt
#!/usr/bin/env bpftrace

tracepoint:kmem:kmem_cache_alloc_node
/comm == "pgbench"/
{
        @alloc[comm, args->bytes_req, args->bytes_alloc, args->gfp_flags] = count();
}

tracepoint:kmem:kmem_cache_free
/comm == "pgbench"/
{
        @free[comm] = count();
}

# ./kmem.bt
Attaching 2 probes...
^C

@alloc[pgbench, 11784, 11840, 3264]: 1
@alloc[pgbench, 216, 256, 3264]: 23
@alloc[pgbench, 216, 256, 4197568]: 1400046

@free[pgbench]: 1400560
#

I hope this helps...

Thanks!
Masa
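For reference, the gfp_flags values in the bpftrace output (3264 and 4197568 decimal) can be decoded into the ___GFP_* bits discussed above. The bit values below are taken from v5.9-era include/linux/gfp.h and are an assumption to verify against your own tree:

```python
# ___GFP_* bit values as found in v5.9-era include/linux/gfp.h
# (an assumption; check the gfp.h in the tree you are running).
GFP_BITS = {
    0x40:     "___GFP_IO",
    0x80:     "___GFP_FS",
    0x400:    "___GFP_DIRECT_RECLAIM",
    0x800:    "___GFP_KSWAPD_RECLAIM",
    0x400000: "___GFP_ACCOUNT",
}

def decode(flags):
    """Return the names of the known GFP bits set in 'flags'."""
    return [name for bit, name in sorted(GFP_BITS.items()) if flags & bit]

print(hex(4197568), decode(4197568))  # the 1,400,046 accounted skb allocations
print(hex(3264), decode(3264))        # the handful of unaccounted allocations
```

4197568 (0x400cc0) is exactly 3264 (0xcc0) plus ___GFP_ACCOUNT (0x400000), i.e. the hot allocation path in the bpftrace output is precisely the accounted one.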