From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id D211AC64EC4 for ; Thu, 9 Mar 2023 06:47:22 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 5A2E66B0075; Thu, 9 Mar 2023 01:47:22 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 553096B0078; Thu, 9 Mar 2023 01:47:22 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3F3C06B007B; Thu, 9 Mar 2023 01:47:22 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 2F3B06B0075 for ; Thu, 9 Mar 2023 01:47:22 -0500 (EST) Received: from smtpin29.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id F1EE71604AB for ; Thu, 9 Mar 2023 06:47:21 +0000 (UTC) X-FDA: 80548428282.29.6EC18BB Received: from mail-pl1-f178.google.com (mail-pl1-f178.google.com [209.85.214.178]) by imf23.hostedemail.com (Postfix) with ESMTP id 37B59140016 for ; Thu, 9 Mar 2023 06:47:18 +0000 (UTC) Authentication-Results: imf23.hostedemail.com; dkim=pass header.d=bytedance.com header.s=google header.b=Ux0bR5Pe; spf=pass (imf23.hostedemail.com: domain of zhengqi.arch@bytedance.com designates 209.85.214.178 as permitted sender) smtp.mailfrom=zhengqi.arch@bytedance.com; dmarc=pass (policy=none) header.from=bytedance.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1678344440; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=cmdMZM+6SpqGSq28wVt90EM4/JDjN+FBIYtz7ueAN90=; b=NGlPNHZYWay0sYi+DVqBHorD1i33hSU/iS9s3tsQDWo4TF7EWPMwdC0AYfmG9sS/sdqpP4 KPU7GttUGu3KkFYKaoHMqaBZG6qDYjcQt3a5Viz1XJ/QenWMTBOFz96escVNwwNW+OK65C a4JI3V0FoFOviYbCKyiAvpC6a6JjgC0= ARC-Authentication-Results: i=1; imf23.hostedemail.com; dkim=pass header.d=bytedance.com header.s=google header.b=Ux0bR5Pe; spf=pass (imf23.hostedemail.com: domain of zhengqi.arch@bytedance.com designates 209.85.214.178 as permitted sender) smtp.mailfrom=zhengqi.arch@bytedance.com; dmarc=pass (policy=none) header.from=bytedance.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1678344440; a=rsa-sha256; cv=none; b=QM0ykFH/Q1aOvgOaJ1CwHYVV8xF+nuV1NZlr7GcBMMDJ2i+E92DhRsWIFa726ZdEKUBclv TRrv8ji5FaV7mzS2ezZDPYDL2vdKyOdJO8EKO+FrUPlfOM8a8bZIc+zOXHi2WlUT7E+qS7 SauoFqx12OuNR1+2JSmM3+Jt58xMm9U= Received: by mail-pl1-f178.google.com with SMTP id a2so1018176plm.4 for ; Wed, 08 Mar 2023 22:47:18 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1678344438; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=cmdMZM+6SpqGSq28wVt90EM4/JDjN+FBIYtz7ueAN90=; b=Ux0bR5Pe93jDhgRSWqN3DzlmU6kah/8HhGTHUeqQaNqXmZKqcX7t08Zp1NbUyFivTN I+OZwx/pe7AgWvXojjRdW28iLsSoynjrtHQ5pJ50Y62MJlEI6YDON7JZF7ccwQV/p5Zb y9+mlFb/pz29UuK0ydceIEo19k0MhukWGpncZuK83T3DYsbz7UMiEouW5ejqqil8xoWq 7m3EmXX+dGwGhHLBzLmtcr5xESb5Vmw3gryOeNdfLF3bY1ubIfd9u5ePbFms9k0bsBmV CT4hcxN7Ky8wOM0raFd3XQQR/foRQErCmy5udRTV0mAIRVg8qdeRPLlSUhxi/dmanelz qjdQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1678344438; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=cmdMZM+6SpqGSq28wVt90EM4/JDjN+FBIYtz7ueAN90=; b=dAkEJipUJakol+lOWys2Osfbt9gCRIG6hJsw8vzhBoYIJEypjPaDsoMeH//Dnmn+7z Vdvy2wz7WZEnffbz4kR+UlMYt3ejwySb6FN/nQQavL/VHVqBrg0XA0ntDbRsVZhd7kk2 RAvyrJbI1ZoWRyGOBVp9AZuN7DX7UbCXyaHoBbPDBT817qDSnmHA7mWjUo5l6GkelSvt qfx+QiR9+eV5bzkicX/8fD2NQ4kUC5v0xM3M5jOFp67hxN8dMP6E0GXtDC7z4chZ6XdE KxeLRdUnTnXsYDMLH1WdAVoDitFQNuAfL3IwOZ+GFgDHy8LRjwQCMaMvbtx1EJNOA5ar KdFg== X-Gm-Message-State: AO0yUKXfDufVzqZi2Tm1WfAhh9KZ0QS213c36x7iA/IL/lT0WU57TX9C gGcFiIIa2E4Wo0XOAhXa4Bo6uA== X-Google-Smtp-Source: AK7set9iCKSLwl8+oQ7D3xz6a45L17ufWK9lNFjeR35FOr0lPzWz/g8xeMbZuQnpiqyR+ek/UesOpA== X-Received: by 2002:a17:902:cecc:b0:19a:7217:32af with SMTP id d12-20020a170902cecc00b0019a721732afmr23667793plg.5.1678344437819; Wed, 08 Mar 2023 22:47:17 -0800 (PST) Received: from [10.70.252.135] ([139.177.225.245]) by smtp.gmail.com with ESMTPSA id jx3-20020a170903138300b0019c919bccf8sm10915819plb.86.2023.03.08.22.47.09 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 08 Mar 2023 22:47:17 -0800 (PST) Message-ID: Date: Thu, 9 Mar 2023 14:47:06 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.8.0 Subject: Re: [PATCH v4 3/8] mm: vmscan: make memcg slab shrink lockless Content-Language: en-US To: Vlastimil Babka , akpm@linux-foundation.org, tkhai@ya.ru, hannes@cmpxchg.org, shakeelb@google.com, mhocko@kernel.org, roman.gushchin@linux.dev, muchun.song@linux.dev, david@redhat.com, shy828301@gmail.com, rppt@kernel.org Cc: sultan@kerneltoast.com, dave@stgolabs.net, penguin-kernel@I-love.SAKURA.ne.jp, paulmck@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org References: <20230307065605.58209-1-zhengqi.arch@bytedance.com> <20230307065605.58209-4-zhengqi.arch@bytedance.com> From: Qi Zheng In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 37B59140016 X-Rspam-User: X-Stat-Signature: nbkna5udujsbkskhcd9ew8az1w5zcoqj X-HE-Tag: 1678344438-930214 X-HE-Meta: U2FsdGVkX18nutxQLrv8QG3VD1/G0Kt4jpT07FBOpkQH06tRxd91o0NOVTkUiu5wi8UsUFldQ4Io0aede297Tq5eiJ3Yp9Jejmi22CCnMR8P3mmZvBh8OpDOR324yHj5uFnPeNogx1jN0aFq45nwBfmv0FK1fq7QNFafK8FfS/gCKGdZOCaGeejHje6Tw19IHXQOg8xOCJlTYj+TmJaQezpiTEs+Xr4AYULTXruiWktS6BUfXxELDb81TZk6zndyMt1wRCfPSJ6GT0xow6uWORMX5sG18M0u6UKA5LlaLtfqqqy3qTcZKAwcwlqE49KXnZnduYiabvKXUfJQhfWSVVduiNH4sdFdoVk6V21lRFedLk6B4wU5oJHT0kT8PZrXvyfX6h1pN7GAxccMRvGtWcQvVvdegiRr8gVVCX5osdpyXYAGt+C/oXrsHQGcRQunyiGNWjtwax9NUzUyJW8kpFMbisLaNvnLuKXyldyiJ08c+LI/gOJ/FN3hI3Mvyc/vpoY5YzOLlxDfN+yC9NyiqXQ1EqwnElJZqUWRq1qlr30iWsxfiwuPuQlpOFE5XSRa6as332q6z4Z0mAFRoHwMEgV3PztQTFwIH/hvwesVbLZdl9AT9wL6rNFcoEmAcs6qrRjJFNO2C5Tf8921FauZvcPERnT229eIwNZkFmd4pqPsnz3p8TnLTIWphHzeP42mJqsDyhlT5+s7iDYn2C8zD+UIjd4Tdr/sMG599mX+2B1cQad5PfzQN8VhpQErx3P5zAqT0v/dKCNHil0/zQW+tVI5/GBh2RZHcHeaEcT6lkRZcgW2+ZW+S7fS6vJAljuTdvBaRGZ5ewLJ8hiKO/LlrTK0XYK3j+lvh13YsXklP0/UreP8rPy0+VMzmOinVeSwEyoLrHRnL1fnvQRX6S+BkNltpvNIxIBSA2tK01A64+jEP6FBOpzKL8Rk4BB6jLUjJZic8rrc6DickP4yb3c sNd7UmlB C1rytwfZjiJvv1trlbPHRP9RhYaEvqb/yotSPi5NAftpu0CjyrExWLuDu0NSnttmBYoKHv6HAKCSBD2nFu4b7UE4hy49AuM7PLR2s5Q9n5PvVqlgkmQf66OyWKTx2/zlyuVZ7vex6sc0H7jYkka0AeMieay8rpQn4ceRzpgEe/Pvtb/nz4njwrtZmQOagFNS0slSVGoANvNeR9r1kkeZ0xpycD6M8CuGzWSzDIJfa9kQplclp32sN5swDcUWiDdLTFkko7neoDaswM6MwKXQblFsnYe10OPb4OWuYXDY8F+ZqdTwj9jGsnnAcn0xFjy02rwlVXtVqUDS6H9mQSBVXU6rkRr8c3BvkwKTW53bMp5QgfyGkQig/4lKQcKoiqrarngpC/REBUN67WQtyMtZmu5fEHa8kQPlLRECsPgF/9I3iF1rbwG0vEWsACw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Hi Vlastimil, On 2023/3/9 06:46, Vlastimil Babka wrote: > On 3/7/23 07:56, Qi Zheng wrote: >> Like global slab shrink, this commit also uses SRCU to make >> memcg slab shrink lockless. >> >> We can reproduce the down_read_trylock() hotspot through the >> following script: >> >> ``` >> >> DIR="/root/shrinker/memcg/mnt" >> >> do_create() >> { >> mkdir -p /sys/fs/cgroup/memory/test >> mkdir -p /sys/fs/cgroup/perf_event/test >> echo 4G > /sys/fs/cgroup/memory/test/memory.limit_in_bytes >> for i in `seq 0 $1`; >> do >> mkdir -p /sys/fs/cgroup/memory/test/$i; >> echo $$ > /sys/fs/cgroup/memory/test/$i/cgroup.procs; >> echo $$ > /sys/fs/cgroup/perf_event/test/cgroup.procs; >> mkdir -p $DIR/$i; >> done >> } >> >> do_mount() >> { >> for i in `seq $1 $2`; >> do >> mount -t tmpfs $i $DIR/$i; >> done >> } >> >> do_touch() >> { >> for i in `seq $1 $2`; >> do >> echo $$ > /sys/fs/cgroup/memory/test/$i/cgroup.procs; >> echo $$ > /sys/fs/cgroup/perf_event/test/cgroup.procs; >> dd if=/dev/zero of=$DIR/$i/file$i bs=1M count=1 & >> done >> } >> >> case "$1" in >> touch) >> do_touch $2 $3 >> ;; >> test) >> do_create 4000 >> do_mount 0 4000 >> do_touch 0 3000 >> ;; >> *) >> exit 1 >> ;; >> esac >> ``` >> >> Save the above script, then run test and touch commands. >> Then we can use the following perf command to view hotspots: >> >> perf top -U -F 999 >> >> 1) Before applying this patchset: >> >> 32.31% [kernel] [k] down_read_trylock >> 19.40% [kernel] [k] pv_native_safe_halt >> 16.24% [kernel] [k] up_read >> 15.70% [kernel] [k] shrink_slab >> 4.69% [kernel] [k] _find_next_bit >> 2.62% [kernel] [k] shrink_node >> 1.78% [kernel] [k] shrink_lruvec >> 0.76% [kernel] [k] do_shrink_slab >> >> 2) After applying this patchset: >> >> 27.83% [kernel] [k] _find_next_bit >> 16.97% [kernel] [k] shrink_slab >> 15.82% [kernel] [k] pv_native_safe_halt >> 9.58% [kernel] [k] shrink_node >> 8.31% [kernel] [k] shrink_lruvec >> 5.64% [kernel] [k] do_shrink_slab >> 3.88% [kernel] [k] mem_cgroup_iter >> >> At the same time, we use the following perf command to capture >> IPC information: >> >> perf stat -e cycles,instructions -G test -a --repeat 5 -- sleep 10 >> >> 1) Before applying this patchset: >> >> Performance counter stats for 'system wide' (5 runs): >> >> 454187219766 cycles test ( +- 1.84% ) >> 78896433101 instructions test # 0.17 insn per cycle ( +- 0.44% ) >> >> 10.0020430 +- 0.0000366 seconds time elapsed ( +- 0.00% ) >> >> 2) After applying this patchset: >> >> Performance counter stats for 'system wide' (5 runs): >> >> 841954709443 cycles test ( +- 15.80% ) (98.69%) >> 527258677936 instructions test # 0.63 insn per cycle ( +- 15.11% ) (98.68%) >> >> 10.01064 +- 0.00831 seconds time elapsed ( +- 0.08% ) >> >> We can see that IPC drops very seriously when calling >> down_read_trylock() at high frequency. After using SRCU, >> the IPC is at a normal level. > > The interpretation looks somewhat weird to me. I'd say the workload is > stalled a lot as it fails the trylock (there might be some optimistic > spinning perhaps) and then goes to sleep. See how "pv_native_safe_halt" is > also more prominent in before. And because of that sleeping, there's less > instructions executed in the same amount of cycles (as it's a system wide > collection, otherwise it wouldn't be collecting the sleeping processes). But in my tests, the trylock basically did not fail, so I think it is caused by high-frequency atomic operation. bpftrace -e 'kr:down_read_trylock {@[kstack, retval]=count();} interval:s:1 {exit();}' Attaching 2 probes... <...> @[ shrink_slab+288 shrink_node+640 do_try_to_free_pages+203 try_to_free_mem_cgroup_pages+266 try_charge_memcg+412 charge_memcg+51 __mem_cgroup_charge+44 __handle_mm_fault+2119 handle_mm_fault+272 do_user_addr_fault+712 exc_page_fault+124 asm_exc_page_fault+38 clear_user_erms+14 read_zero+86 vfs_read+173 ksys_read+93 do_syscall_64+56 entry_SYSCALL_64_after_hwframe+99 , 1]: 617019 @[ shrink_slab+288 shrink_node+640 do_try_to_free_pages+203 try_to_free_mem_cgroup_pages+266 try_charge_memcg+412 charge_memcg+51 __mem_cgroup_charge+44 shmem_add_to_page_cache+545 shmem_get_folio_gfp+621 shmem_write_begin+95 generic_perform_write+257 __generic_file_write_iter+202 generic_file_write_iter+97 vfs_write+704 ksys_write+93 do_syscall_64+56 entry_SYSCALL_64_after_hwframe+99 , 1]: 617065 > >> >> Signed-off-by: Qi Zheng > > Other than that: > > Acked-by: Vlastimil Babka Thanks. > > A small thing below: > >> --- >> mm/vmscan.c | 46 +++++++++++++++++++++++++++------------------- >> 1 file changed, 27 insertions(+), 19 deletions(-) >> >> diff --git a/mm/vmscan.c b/mm/vmscan.c >> index 8515ac40bcaf..1de9bc3e5aa2 100644 >> --- a/mm/vmscan.c >> +++ b/mm/vmscan.c >> @@ -57,6 +57,7 @@ >> #include >> #include >> #include >> +#include > > I guess this should have been in patch 2/8 already? It may work accidentaly > because some other header pulls it transitively... Yeah, in fact, patch 3/8 also can compile successfully without srcu.h, but maybe it is better to explicitly include this header file, I will add it in patch 2/8. Thanks, Qi > >