From: "T.J. Mercier"
Date: Mon, 5 Feb 2024 20:01:40 -0800
Subject: Re: [PATCH v3] mm: memcg: Use larger batches for proactive reclaim
To: Michal Hocko
Cc: Johannes Weiner, Roman Gushchin, Shakeel Butt, Muchun Song,
 Andrew Morton, Efly Young, android-mm@google.com, yuzhao@google.com,
 mkoutny@suse.com, Yosry Ahmed, cgroups@vger.kernel.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20240202233855.1236422-1-tjmercier@google.com>

On Mon, Feb 5, 2024 at 1:16 PM Michal Hocko wrote:
>
> On Mon 05-02-24 12:47:47, T.J. Mercier wrote:
> > On Mon, Feb 5, 2024 at 12:36 PM Michal Hocko wrote:
> [...]
> > > Think of something like
> > > timeout $TIMEOUT echo $TARGET > $MEMCG_PATH/memory.reclaim
> > > where timeout acts as a stop gap if the reclaim cannot finish in
> > > TIMEOUT.
> >
> > Yeah I get the desired behavior, but using sc->nr_reclaimed to achieve
> > it is what's bothering me.
>
> I am not really happy about this subtlety. If we have a better way then
> let's do it. Better in its own patch, though.
>
> > It's already wired up that way though, so if you want to make this
> > change now then I can try to test for the difference using really
> > large reclaim targets.
>
> Yes, please. If you want it a separate patch then no objection from me
> of course. If you do not like the nr_to_reclaim bailout then maybe we can
> go with a simple break out flag in scan_control.
>
> Thanks!

It's a bit difficult to test under the too_many_isolated check, so I
moved the fatal_signal_pending check outside of it and tested that way.
Performing full reclaim on the /uid_0 cgroup with a 250ms delay before
SIGKILL, I got an average of 16ms better latency with sc->nr_to_reclaim
than with SWAP_CLUSTER_MAX across 20 runs, ignoring one 1s outlier with
SWAP_CLUSTER_MAX.

The return values from memory_reclaim are different, since with
sc->nr_to_reclaim we "succeed" and don't reach the signal_pending check
to return -EINTR. I don't think it matters though, since the exit status
is 137 (128 + SIGKILL) in both cases.

With SWAP_CLUSTER_MAX there was an outlier at nearly 1s, and in general
the latency numbers were noisier: 13% RSD with SWAP_CLUSTER_MAX vs 2% RSD
with sc->nr_to_reclaim. I'm guessing that's a function of nr_to_scan
occasionally being much less than SWAP_CLUSTER_MAX, causing nr[lru] to
drain slowly. But the task could also simply have been scheduled out more
often at the cond_resched in shrink_lruvec, which would help explain the
1s outlier. I don't have enough debug info on the outlier to say much
more.

With sc->nr_to_reclaim, the largest sc->nr_reclaimed value I saw was
about 2^53 for a sc->nr_to_reclaim of 2^51, but for large memcg
hierarchies I think it's possible to get more than that. There were only
15 cgroups under /uid_0. This is the only thing that gives me pause,
since we could touch more than 2k cgroups in shrink_node_memcgs, each one
adding 4 * 2^51, potentially overflowing sc->nr_reclaimed. Looks testable
but I didn't get to it.