Date: Tue, 1 Aug 2023 16:30:38 +0200
From: Michal Hocko
To: Yosry Ahmed
Cc: Johannes Weiner, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v3] mm: memcg: use rstat for non-hierarchical stats
References: <20230726153223.821757-1-yosryahmed@google.com> <20230726153223.821757-2-yosryahmed@google.com>
In-Reply-To: <20230726153223.821757-2-yosryahmed@google.com>

On Wed 26-07-23 15:32:23, Yosry Ahmed wrote:
> Currently, memcg uses rstat to maintain aggregated hierarchical stats.
> Counters are maintained for hierarchical stats at each memcg. Rstat
> tracks which cgroups have updates on which cpus to keep those counters
> fresh on the read-side.
>
> Non-hierarchical stats are currently not covered by rstat. Their
> per-cpu counters are summed up on every read, which is expensive.
> The original implementation did the same. At some point before rstat,
> non-hierarchical aggregated counters were introduced by
> commit a983b5ebee57 ("mm: memcontrol: fix excessive complexity in
> memory.stat reporting"). However, those counters were updated on the
> performance critical write-side, which caused regressions, so they were
> later removed by commit 815744d75152 ("mm: memcontrol: don't batch
> updates of local VM stats and events"). See [1] for more detailed
> history.
>
> Kernel versions in between a983b5ebee57 & 815744d75152 (a year and a
> half) enjoyed cheap reads of non-hierarchical stats, specifically on
> cgroup v1. When moving to more recent kernels, a performance regression
> for reading non-hierarchical stats is observed.
>
> Now that we have rstat, we know exactly which percpu counters have
> updates for each stat. We can maintain non-hierarchical counters again,
> making reads much more efficient, without affecting the performance
> critical write-side. Hence, add non-hierarchical (i.e local) counters
> for the stats, and extend rstat flushing to keep those up-to-date.
>
> A caveat is that we now need a stats flush before reading
> local/non-hierarchical stats through {memcg/lruvec}_page_state_local()
> or memcg_events_local(), where we previously only needed a flush to
> read hierarchical stats.
> Most contexts reading non-hierarchical stats
> are already doing a flush, add a flush to the only missing context in
> count_shadow_nodes().
>
> With this patch, reading memory.stat from 1000 memcgs is 3x faster on a
> machine with 256 cpus on cgroup v1:
> # for i in $(seq 1000); do mkdir /sys/fs/cgroup/memory/cg$i; done
> # time cat /dev/cgroup/memory/cg*/memory.stat > /dev/null
> real 0m0.125s
> user 0m0.005s
> sys 0m0.120s
>
> After:
> real 0m0.032s
> user 0m0.005s
> sys 0m0.027s

Have you measured any potential regression for cgroup v2 which collects
all this data without ever using it (AFAICS)?
-- 
Michal Hocko
SUSE Labs