Date: Wed, 8 Mar 2023 23:46:46 +0100
Subject: Re: [PATCH v4 3/8] mm: vmscan: make memcg slab shrink lockless
To: Qi Zheng, akpm@linux-foundation.org, tkhai@ya.ru, hannes@cmpxchg.org,
 shakeelb@google.com, mhocko@kernel.org, roman.gushchin@linux.dev,
 muchun.song@linux.dev, david@redhat.com, shy828301@gmail.com,
 rppt@kernel.org
Cc: sultan@kerneltoast.com, dave@stgolabs.net,
 penguin-kernel@I-love.SAKURA.ne.jp, paulmck@kernel.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20230307065605.58209-1-zhengqi.arch@bytedance.com>
 <20230307065605.58209-4-zhengqi.arch@bytedance.com>
From: Vlastimil Babka
In-Reply-To: <20230307065605.58209-4-zhengqi.arch@bytedance.com>

On 3/7/23 07:56, Qi Zheng wrote:
> Like global slab shrink, this commit also uses SRCU to make
> memcg slab shrink lockless.
>
> We can reproduce the down_read_trylock() hotspot through the
> following script:
>
> ```
>
> DIR="/root/shrinker/memcg/mnt"
>
> do_create()
> {
>     mkdir -p /sys/fs/cgroup/memory/test
>     mkdir -p /sys/fs/cgroup/perf_event/test
>     echo 4G > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
>     for i in `seq 0 $1`;
>     do
>         mkdir -p /sys/fs/cgroup/memory/test/$i;
>         echo $$ > /sys/fs/cgroup/memory/test/$i/cgroup.procs;
>         echo $$ > /sys/fs/cgroup/perf_event/test/cgroup.procs;
>         mkdir -p $DIR/$i;
>     done
> }
>
> do_mount()
> {
>     for i in `seq $1 $2`;
>     do
>         mount -t tmpfs $i $DIR/$i;
>     done
> }
>
> do_touch()
> {
>     for i in `seq $1 $2`;
>     do
>         echo $$ > /sys/fs/cgroup/memory/test/$i/cgroup.procs;
>         echo $$ > /sys/fs/cgroup/perf_event/test/cgroup.procs;
>         dd if=/dev/zero of=$DIR/$i/file$i bs=1M count=1 &
>     done
> }
>
> case "$1" in
>   touch)
>     do_touch $2 $3
>     ;;
>   test)
>     do_create 4000
>     do_mount 0 4000
>     do_touch 0 3000
>     ;;
>   *)
>     exit 1
>     ;;
> esac
> ```
>
> Save the above script, then run test and touch commands.
> Then we can use the following perf command to view hotspots:
>
> perf top -U -F 999
>
> 1) Before applying this patchset:
>
>   32.31%  [kernel]  [k] down_read_trylock
>   19.40%  [kernel]  [k] pv_native_safe_halt
>   16.24%  [kernel]  [k] up_read
>   15.70%  [kernel]  [k] shrink_slab
>    4.69%  [kernel]  [k] _find_next_bit
>    2.62%  [kernel]  [k] shrink_node
>    1.78%  [kernel]  [k] shrink_lruvec
>    0.76%  [kernel]  [k] do_shrink_slab
>
> 2) After applying this patchset:
>
>   27.83%  [kernel]  [k] _find_next_bit
>   16.97%  [kernel]  [k] shrink_slab
>   15.82%  [kernel]  [k] pv_native_safe_halt
>    9.58%  [kernel]  [k] shrink_node
>    8.31%  [kernel]  [k] shrink_lruvec
>    5.64%  [kernel]  [k] do_shrink_slab
>    3.88%  [kernel]  [k] mem_cgroup_iter
>
> At the same time, we use the following perf command to capture
> IPC information:
>
> perf stat -e cycles,instructions -G test -a --repeat 5 -- sleep 10
>
> 1) Before applying this patchset:
>
>  Performance counter stats for 'system wide' (5 runs):
>
>       454187219766      cycles          test                              ( +-  1.84% )
>        78896433101      instructions    test  #  0.17  insn per cycle     ( +-  0.44% )
>
>         10.0020430 +- 0.0000366 seconds time elapsed  ( +-  0.00% )
>
> 2) After applying this patchset:
>
>  Performance counter stats for 'system wide' (5 runs):
>
>       841954709443      cycles          test                              ( +- 15.80% )  (98.69%)
>       527258677936      instructions    test  #  0.63  insn per cycle     ( +- 15.11% )  (98.68%)
>
>           10.01064 +- 0.00831 seconds time elapsed  ( +-  0.08% )
>
> We can see that IPC drops very seriously when calling
> down_read_trylock() at high frequency. After using SRCU,
> the IPC is at a normal level.

The interpretation looks somewhat weird to me. I'd say the workload is
stalled a lot as it fails the trylock (there might be some optimistic
spinning perhaps) and then goes to sleep. See how "pv_native_safe_halt" is
also more prominent in the "before" profile. And because of that sleeping,
fewer instructions are executed in the same number of cycles (as it's a
system-wide collection; otherwise it wouldn't be counting the sleeping
processes).
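
As a quick sanity check on the quoted numbers (a sketch only, not part of
the patch): the "insn per cycle" values perf prints are simply instructions
divided by cycles, so they can be recomputed from the raw counters. The
helper name "ipc" is made up for illustration, and this assumes a POSIX awk
is available.

```shell
#!/bin/sh
# Recompute insn-per-cycle from the raw perf stat counters quoted above.
# "ipc" is a hypothetical helper for illustration, not a perf feature.
ipc() {
    # $1 = instructions, $2 = cycles
    awk -v insn="$1" -v cyc="$2" 'BEGIN { printf "%.2f\n", insn / cyc }'
}

echo "before: $(ipc 78896433101 454187219766)"    # perf reported 0.17
echo "after:  $(ipc 527258677936 841954709443)"   # perf reported 0.63
```

The ratio alone does not say why fewer instructions were retired, which is
why the perf top profile matters for the interpretation.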
>
> Signed-off-by: Qi Zheng

Other than that:

Acked-by: Vlastimil Babka

A small thing below:

> ---
>  mm/vmscan.c | 46 +++++++++++++++++++++++++++-------------------
>  1 file changed, 27 insertions(+), 19 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 8515ac40bcaf..1de9bc3e5aa2 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -57,6 +57,7 @@
>  #include
>  #include
>  #include
> +#include

I guess this should have been in patch 2/8 already? It may work
accidentally because some other header pulls it in transitively...