Date: Wed, 22 Nov 2023 11:02:35 +0100
From: Michal Hocko <mhocko@suse.com>
To: Chengming Zhou
Cc: LKML, linux-mm, jack@suse.cz, Tejun Heo, Johannes Weiner, Christoph Hellwig, shr@devkernel.io, neilb@suse.de
Subject: Re: Question: memcg dirty throttle caused by low per-memcg dirty thresh
In-Reply-To: <109029e0-1772-4102-a2a8-ab9076462454@linux.dev>
On Wed 22-11-23 17:38:25, Chengming Zhou wrote:
> Hello all,
>
> Sorry to bother you, we encountered a problem related to the memcg dirty
> throttle after migrating from cgroup v1 to v2, so we want to ask for some
> comments or suggestions.
>
> 1. Problem
>
> We have the "containerd" service running under system.slice, with
> its memory.max set to 5GB. It is constantly throttled in
> balance_dirty_pages() since the memcg has more dirty memory than
> the memcg dirty thresh.
>
> We didn't have this problem on cgroup v1, because cgroup v1 doesn't have
> per-memcg writeback or a per-memcg dirty thresh. Only the global
> dirty thresh is checked in balance_dirty_pages().

Yes, v1 didn't have any sensible IO throttling, so we had to rely on an
ugly hack: waiting for writeback to finish from the memcg memory reclaim
path. This is really suboptimal because it makes memcg reclaim stalls
hard to predict, so it is essentially only a poor man's OOM prevention.
V2, on the other hand, has memcg-aware dirty memory throttling, which is
a much better solution as it throttles at the moment the memory is being
dirtied.

Why do you consider that to be a problem? Constant throttling, as you
describe it, might be a result of the limit being too small.

> 2. Thinking
>
> So we wonder if we can support a per-memcg dirty thresh interface?
> Now the memcg dirty thresh is just calculated from memcg max * ratio,
> which can be set from /proc/sys/vm/dirty_ratio.

In general I would recommend using dirty_bytes instead, as the ratio
doesn't scale all that well on larger systems.

> We have to set it to 60 instead of the default 20 as a workaround now,
> but we worry about the potential side effects.
> If we can support a per-memcg dirty thresh interface, we can set
> some containers to a much higher dirty_ratio, especially for hungry
> dirtier workloads like "containerd".

But why would you want that? If you allow heavy writers to dirty a lot
of memory, then flushing it to the backing store will take more time.
That could starve small writers as well, because they could end up
queued behind a huge amount of data to be flushed.

I am no expert on writeback, so others could give you better arguments,
but from my POV dirty data flushing and throttling is mostly a global
mechanism to optimize the IO pattern, and it is a function of the
storage much more than of the workload. If your heavy writer hits
throttling too much, then either the limit is too low or you should
start background flushing earlier.
-- 
Michal Hocko
SUSE Labs
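[Editor's note: the thread's arithmetic (memcg dirty thresh derived from memory.max and the global dirty_ratio) can be sketched as a back-of-the-envelope estimate. This is an illustrative approximation only, not the kernel's actual domain_dirty_limits() logic, which also accounts for available memory and background ratios; the function name below is made up for the example.]

```python
GiB = 1024 ** 3

def memcg_dirty_thresh(memory_max_bytes: int, dirty_ratio_percent: int) -> int:
    """Rough per-memcg dirty threshold on cgroup v2: the memcg limit
    scaled by the global vm.dirty_ratio (an approximation, not the
    kernel's exact calculation)."""
    return memory_max_bytes * dirty_ratio_percent // 100

# The 5GB containerd memcg from the report, with the default ratio of 20:
default_thresh = memcg_dirty_thresh(5 * GiB, 20)  # 1 GiB of dirty pages
# The reporters' workaround of raising dirty_ratio to 60:
raised_thresh = memcg_dirty_thresh(5 * GiB, 60)   # 3 GiB of dirty pages

print(default_thresh // GiB, raised_thresh // GiB)
```

This makes the trade-off in the reply concrete: raising the global ratio triples the dirty headroom for this memcg before balance_dirty_pages() throttles, but it raises it for every cgroup, which is why Michal points at dirty_bytes and earlier background flushing instead.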