From: "T.J. Mercier"
Date: Fri, 2 Feb 2024 10:22:34 -0800
Subject: Re: Re: [PATCH] mm: memcg: Use larger chunks for proactive reclaim
To: Michal Koutný
Cc: Efly Young, hannes@cmpxchg.org, akpm@linux-foundation.org, android-mm@google.com, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, mhocko@kernel.org, muchun.song@linux.dev, roman.gushchin@linux.dev, shakeelb@google.com, yuzhao@google.com
References: <20240201153428.GA307226@cmpxchg.org> <20240202050247.45167-1-yangyifei03@kuaishou.com>

On Fri, Feb 2, 2024 at 2:15 AM Michal Koutný wrote:
>
> On Fri, Feb 02, 2024 at 01:02:47PM +0800, Efly Young wrote:
> > > Looking at the code, I'm not quite sure if this can be read this
> > > literally. Efly might be able to elaborate, but we do a full loop of
> > > all nodes and cgroups in the tree before checking nr_to_reclaimed, and
> > > rely on priority level for granularity.
> > > So request size and complexity
> > > of the cgroup tree play a role. I don't know where the exact factor
> > > two would come from.
> >
> > I'm sorry that this conclusion may be arbitrary. It might just only suit
> > for my case. In my case, I traced it loop twice every time before checking
> > nr_reclaimed, and it reclaimed less than my request size(1G) every time.
> > So I think the upper bound is 2 * request. But now it seems that this is
> > related to cgroup tree I constructed and my system status and my request
> > size(a relatively large chunk). So there are many influencing factors,
> > a specific upper bound is not accurate.
>
> Alright, thanks for the background.
>
> > > IMO it's more accurate to phrase it like this:
> > >
> > > Reclaim tries to balance nr_to_reclaim fidelity with fairness across
> > > nodes and cgroups over which the pages are spread. As such, the bigger
> > > the request, the bigger the absolute overreclaim error. Historic
> > > in-kernel users of reclaim have used fixed, small request batches to
> > > approach an appropriate reclaim rate over time. When we reclaim a user
> > > request of arbitrary size, use decaying batches to manage error while
> > > maintaining reasonable throughput.
>
> Hm, decay...
> So shouldn't the formula be
>   nr_pages = delta <= SWAP_CLUSTER_MAX ? delta : (delta + 3*SWAP_CLUSTER_MAX) / 4
> where
>   delta = nr_to_reclaim - nr_reclaimed
> ?
> (So that convergence for smaller deltas is same like original- and other
> reclaims while conservative factor is applied for effectivity of higher
> user requests.)

Tapering out at 32 instead of 4 doesn't make much difference in
practice because of how far off the actually reclaimed amount can be
from the request size. We're talking thousands of pages of error for a
request size of a few megs, and hundreds of pages of error for
requests less than 100 pages.

So all of these should be more or less equivalent:

delta <= SWAP_CLUSTER_MAX ? delta : (delta + 3*SWAP_CLUSTER_MAX) / 4
max((nr_to_reclaim - nr_reclaimed) / 4, (nr_to_reclaim - nr_reclaimed) % 4)
(nr_to_reclaim - nr_reclaimed) / 4 + 4
(nr_to_reclaim - nr_reclaimed) / 4

I was just trying to avoid putting in a 0 for the request size with the mod.
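
For comparison, here is a rough Python sketch (not kernel code) of the four batch-size candidates listed above. The function names and the `steps_to_finish` helper are made up for illustration, `SWAP_CLUSTER_MAX = 32` is assumed from the "tapering out at 32" remark, and the helper assumes every request reclaims exactly what it asks for, which the thread says is not what happens in practice. It only shows how the formulas behave for small deltas, in particular why the plain `/ 4` variant can end up issuing a request of 0.

```python
SWAP_CLUSTER_MAX = 32  # assumed value, taken from "tapering out at 32" above


def batch_koutny(delta):
    # delta <= SWAP_CLUSTER_MAX ? delta : (delta + 3*SWAP_CLUSTER_MAX) / 4
    if delta <= SWAP_CLUSTER_MAX:
        return delta
    return (delta + 3 * SWAP_CLUSTER_MAX) // 4


def batch_max_mod(delta):
    # max(delta / 4, delta % 4): nonzero whenever delta > 0
    return max(delta // 4, delta % 4)


def batch_plus_four(delta):
    # delta / 4 + 4: always at least 4, so it can ask for more than delta
    return delta // 4 + 4


def batch_plain(delta):
    # delta / 4: becomes 0 once delta drops below 4
    return delta // 4


def steps_to_finish(batch_fn, nr_to_reclaim, limit=10_000):
    """Idealized loop: each request reclaims exactly the batch size.

    Returns the number of requests issued, or None if the formula
    produces a zero-sized request (or the step limit is hit) first.
    """
    nr_reclaimed = 0
    steps = 0
    while nr_reclaimed < nr_to_reclaim:
        batch = batch_fn(nr_to_reclaim - nr_reclaimed)
        if batch == 0 or steps >= limit:
            return None
        nr_reclaimed += batch
        steps += 1
    return steps


if __name__ == "__main__":
    candidates = (batch_koutny, batch_max_mod, batch_plus_four, batch_plain)
    for delta in (1000, 100, 33, 32, 3):
        print(delta, [fn(delta) for fn in candidates])
    for fn in candidates:
        print(fn.__name__, steps_to_finish(fn, 1000))
```

With exact reclaim the first three variants all terminate, while the plain `/ 4` variant stalls once fewer than 4 pages remain. Given the overreclaim error described above, the differences between the first three are likely to be lost in the noise on a real system.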