Date: Fri, 25 Mar 2022 11:31:18 +0100
From: Michal Koutný
To: Roman Gushchin
Cc: cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Richard Palethorpe, Andrew Morton, Shakeel Butt, Michal Hocko, Vlastimil Babka, "Matthew Wilcox (Oracle)", Muchun Song, Johannes Weiner, Yang Shi, Suren Baghdasaryan, Tejun Heo, Chris Down
Subject: Re: [RFC PATCH] mm: memcg: Do not count memory.low reclaim if it does not happen
Message-ID: <20220325103118.GC2828@blackbody.suse.cz>
References: <20220324095157.GA16685@blackbody.suse.cz> <5049EBC3-5BAE-4509-BA63-1F4A7D913517@linux.dev>
In-Reply-To: <5049EBC3-5BAE-4509-BA63-1F4A7D913517@linux.dev>

On Thu, Mar 24, 2022 at 11:17:14AM -0700, Roman Gushchin wrote:
> Ok, so it’s not really about the implementation details of the reclaim
> mechanism (I mean rounding up to the batch size etc),

Actually, that was what I deemed more serious first. It's the point 2 of
RFCness:

| 2) The observed behavior slightly impacts distribution of parent's memory.low.
|    Constructed example is a passive protected workload in s1 and active in s2
|    (active ~ counteracts the reclaim with allocations). It could strip
|    protection from s1 one by one (one:=SWAP_CLUSTER_MAX/2^sc.priority).
|    That may be considered both wrong (s1 should have been more protected) or
|    correct s2 deserves protection due to its activity.
|    I don't have (didn't collect) data for this, so I think just masking the
|    false events is sufficient (or independent).

> Idk, I don’t have a strong argument against this change (except that
> it changes the existing behavior), but I also don’t see why such
> events are harmful. Do you mind elaborating a bit more?

So I've collected some demo data now.

    systemd-run \
        -u precious.service --slice=test-protected.slice \
        -p MemoryLow=50M \
        /root/memeater 50 # allocates 50M anon, doesn't use it

    systemd-run \
        -u victim.service --slice=test-protected.slice \
        -p MemoryLow=0M \
        /root/memeater -m 50 50 # allocates 50M anon, uses it

    echo "Started workloads"

    systemctl set-property --runtime test.slice MemoryMax=200M
    systemctl set-property --runtime test-protected.slice MemoryLow=50M

    sleep 5

    # MemorySwapMax=0M: to push test-protected.slice to swap
    systemd-run \
        -u pressure.service --slice=test.slice \
        -p MemorySwapMax=0M \
        /root/memeater -m 170 170

    sleep 5

    systemd-cgtop -b -1 -m test.slice

Result with memory_recursiveprot

> Control Group                                     Tasks   %CPU   Memory  Input/s Output/s
> test.slice                                            3      -   199.9M        -        -
> test.slice/pressure.service                           1      -   170.5M        -        -
> test.slice/test-protected.slice                       2      -    29.4M        -        -
> test.slice/test-protected.slice/victim.service        1      -    29.1M        -        -
> test.slice/test-protected.slice/precious.service      1      -   292.0K        -        -

Result without memory_recursiveprot

> Control Group                                     Tasks   %CPU   Memory  Input/s Output/s
> test.slice                                            3      -   199.8M        -        -
> test.slice/pressure.service                           1      -   170.5M        -        -
> test.slice/test-protected.slice                       2      -    29.3M        -        -
> test.slice/test-protected.slice/precious.service      1      -    28.7M        -        -
> test.slice/test-protected.slice/victim.service        1      -   560.0K        -        -

(kernel 5.17.0, systemd 249.10)

So with this result, I'd say the event reporting is an independent change
(admittedly, thanks to the current implementation (not the proposal of mine)
I noticed this issue).

/me scratches head, let me review my other approaches...

Michal
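
To see whether memory.low events were counted for the two services in the reproducer above, the `low` field of each cgroup's memory.events file can be read after the run. This is a minimal sketch and not part of the original measurement: it assumes cgroup v2 is mounted at /sys/fs/cgroup, and both the `CGROOT` variable and the `low_events` helper are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Print the value of the "low" counter from a cgroup v2 memory.events file.
# $1: path to a memory.events file
low_events() {
    awk '$1 == "low" { print $2 }' "$1"
}

# Assumption: cgroup v2 unified hierarchy mounted here (override via CGROOT).
CGROOT=${CGROOT:-/sys/fs/cgroup}

# Unit layout taken from the systemd-cgtop output in the reproducer.
for unit in precious.service victim.service; do
    f="$CGROOT/test.slice/test-protected.slice/$unit/memory.events"
    if [ -r "$f" ]; then
        printf '%s low=%s\n' "$unit" "$(low_events "$f")"
    fi
done
```

Units whose memory.events file is not readable (for example when the services are not running) are skipped silently.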