From mboxrd@z Thu Jan 1 00:00:00 1970
References: <20241212095646.16164-1-hailong.liu@oppo.com> <20241213022619.ph22z2mxxyh3u3tw@oppo.com>
In-Reply-To: <20241213022619.ph22z2mxxyh3u3tw@oppo.com>
From: "T.J. Mercier"
Date: Fri, 13 Dec 2024 09:06:17 -0800
Subject: Re: [RFC PATCH] mm/mglru: keep the root_memcg reclaim behavior the same as memcg reclaim
To: hailong
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, yuzhao@google.com, 21cnbao@gmail.com
Content-Type: text/plain; charset="UTF-8"

On Thu, Dec 12, 2024 at 6:26 PM hailong wrote:
>
> On Thu, 12. Dec 10:22, T.J. Mercier wrote:
> > On Thu, Dec 12, 2024 at 1:57 AM hailong wrote:
> > >
> > > From: Hailong Liu
> > >
> > > commit a579086c99ed ("mm: multi-gen LRU: remove eviction fairness safeguard") said
> > > Note that memcg LRU only applies to global reclaim. For memcg reclaim,
> > > the eviction will continue, even if it is overshooting. This becomes
> > > unconditional due to code simplification.
> > >
> > > However, if we reclaim a root memcg by sysfs (memory.reclaim), the behavior acts
> > > as a kswapd or direct reclaim.
> >
> > Hi Hailong,
> >
> > Why do you think this is a problem?
> >
> > > Fix this by remove the condition of mem_cgroup_is_root in
> > > root_reclaim().
> > > Signed-off-by: Hailong Liu
> > > ---
> > >  mm/vmscan.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > index 76378bc257e3..1f74f3ba0999 100644
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -216,7 +216,7 @@ static bool cgroup_reclaim(struct scan_control *sc)
> > >   */
> > >  static bool root_reclaim(struct scan_control *sc)
> > >  {
> > > -	return !sc->target_mem_cgroup || mem_cgroup_is_root(sc->target_mem_cgroup);
> > > +	return !sc->target_mem_cgroup;
> > >  }
> > >
> > >  /**
> > > --
> > > Actually we switch to mglru on kernel-6.1 and see different behavior on
> > > root_mem_cgroup reclaim. so is there any background for this?
> >
> > Reclaim behavior differs with MGLRU.
> > https://lore.kernel.org/lkml/20221201223923.873696-1-yuzhao@google.com/
> >
> > On even more recent kernels, regular LRU reclaim has also changed.
> > https://lore.kernel.org/lkml/20240514202641.2821494-1-hannes@cmpxchg.org/
>
> Thanks for the details.
>
> Take this as a example.
>                root
>              /  |  \
>             /   |   \
>            a    b    c
>                 | \
>                 |  \
>                 d   e
> IIUC, the mglru can resolve the direct reclaim latency due to the
> sharding. However, for the proactive reclaim, if we want to reclaim
> b, b->d->e, however, if reclaiming the root, the reclaim path is
> uncertain. The call stack is as follows:
> lru_gen_shrink_node()->shrink_many()->hlist_nulls_for_each_entry_rcu()->shrink_one()
>
> So, for the proactive reclaim of root_memcg, whether it is mglru or
> regular lru, calling shrink_node_memcgs() makes the behavior certain
> and reasonable for me.

The ordering is uncertain, but ordering has never been specified as
part of that interface AFAIK, and you'll still get what you ask for
(X bytes from the root or under). Assuming partial reclaim of a cgroup
(which I hope is true if you're reclaiming from the root?), if I have
the choice I'd rather have the memcg LRU ordering, which tries to
reclaim from colder memcgs first, than a static pre-order traversal
that always hits the same children first.

The reason it's a choice only for the root is that the memcg LRU is
maintained at the pgdat level, not at each individual cgroup. So there
is no mechanism to get memcg LRU ordering from a subset of cgroups.
That would be pretty cool, but it sounds expensive.

- T.J.

> Help you, Help me,
> Hailong.
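
For reference, the effect of the quoted one-line change can be shown in
isolation. What follows is only a userspace sketch, not kernel code:
struct mem_cgroup, struct scan_control and mem_cgroup_is_root() are
reduced to hypothetical stand-ins with just the fields the predicate
touches, and the names root_reclaim_current(), root_reclaim_patched()
and root_reclaim_demo() are made up for illustration. It also assumes,
as I understand it, that kswapd and direct reclaim run with a NULL
target_mem_cgroup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only the fields used here. */
struct mem_cgroup {
	bool is_root;	/* stand-in for what mem_cgroup_is_root() reports */
};

struct scan_control {
	struct mem_cgroup *target_mem_cgroup;	/* assumed NULL for kswapd/direct reclaim */
};

/* Stand-in for the kernel helper of the same name. */
static bool mem_cgroup_is_root(const struct mem_cgroup *memcg)
{
	return memcg->is_root;
}

/* Predicate as it reads before the quoted patch. */
static bool root_reclaim_current(const struct scan_control *sc)
{
	return !sc->target_mem_cgroup ||
	       mem_cgroup_is_root(sc->target_mem_cgroup);
}

/* Predicate as it would read with the quoted patch applied. */
static bool root_reclaim_patched(const struct scan_control *sc)
{
	return !sc->target_mem_cgroup;
}

/* Walks the three cases discussed in the thread (illustrative name). */
void root_reclaim_demo(void)
{
	struct mem_cgroup root = { .is_root = true };
	struct mem_cgroup b = { .is_root = false };

	struct scan_control global = { .target_mem_cgroup = NULL };
	struct scan_control proactive_root = { .target_mem_cgroup = &root };
	struct scan_control proactive_b = { .target_mem_cgroup = &b };

	/* kswapd/direct reclaim: both versions treat it as root reclaim. */
	assert(root_reclaim_current(&global));
	assert(root_reclaim_patched(&global));

	/* memory.reclaim on a non-root cgroup: neither version does. */
	assert(!root_reclaim_current(&proactive_b));
	assert(!root_reclaim_patched(&proactive_b));

	/* memory.reclaim on the root: the only case the patch changes. */
	assert(root_reclaim_current(&proactive_root));
	assert(!root_reclaim_patched(&proactive_root));
}
```

Calling root_reclaim_demo() from a test program exercises all three
cases. Only the last one differs between the two predicates, and that
is the case where the choice between memcg LRU ordering and the
pre-order walk comes up.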