Date: Mon, 16 May 2022 16:34:59 +0200
From: Michal Koutný
To: Johannes Weiner
Cc: Andrew Morton, Michal Hocko, Roman Gushchin, Shakeel Butt, Seth Jennings, Dan Streetman, Minchan Kim, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH v2 6/6] zswap: memcg accounting
Message-ID: <20220516143459.GA17557@blackbody.suse.cz>
References: <20220510152847.230957-1-hannes@cmpxchg.org> <20220510152847.230957-7-hannes@cmpxchg.org> <20220511173218.GB31592@blackbody.suse.cz> <20220513151426.GC16096@blackbody.suse.cz>

On Fri, May 13, 2022 at 01:08:13PM -0400, Johannes Weiner wrote:
> Right, it's accounted as a subset rather than fully disjointed. But it
> is a limitable counter of its own, so I exported it as such, with a
> current and a max knob. This is comparable to the kmem counter in v1.

That counter and limit didn't turn out well. I liked the analogy to
writeback (and dirty limit) better.

> From an API POV it would be quite strange to have max for a counter
> that has no current. Likewise it would be strange for a major memory
> consumer to be missing from memory.stat.

My understanding would be to have all memory.stat entries as you
propose, no extra .current counter and the .max knob for zswap
configuration.

> It needs to be configured to the workload's access frequency curve,
> which can be done with trial-and-error (reasonable balance between
> zswpins and pswpins) or in a more targeted manner using tools such as
> page_idle, damon etc.
> [...]
> Because for load tuning, bytes make much more sense. That's how you
> measure the workingset, so a percentage is an awkward indirection. At
> the cgroup level, it makes even less sense: all memcg tunables are in
> bytes, it would be quite weird to introduce a "max" that is 0-100. Add
> the confusion of how percentages would propagate down the hierarchy...

Thanks for the explanation. I guess there's no simple transformation of
the information available in the kernel that would allow a more semantic
configuration of this value. The rather crude absolute value requires
(but also simply allows) some calibration or responsive tuning.

> I don't traverse all ancestors, I bail on disabled groups and skip
> unlimited ones.

I admit I missed that.

> Flushing unnecessary groups with a ratelimit doesn't sound like an
> improvement to me.

Then I'm only concerned about a situation when there's a single deep
memcg that undergoes both workingset_refault() and zswap querying.
The latter (a bare call to cgroup_rstat_flush()) won't reset
stats_flush_threshold, so the former (or, more likely, the async flush)
would attempt a flush too. The flush work (on the leaf memcg) would be
done twice, even though the second time it may be within the tolerance
of accumulated error.

This might require attention in the future (depending on data about how
it actually performs). I see how the current approach is justified.

Regards,
Michal