From: Kairui Song <ryncsn@gmail.com>
Date: Thu, 16 Jan 2025 01:56:35 +0800
Subject: Re: [PATCH mm-unstable v4 3/7] mm/mglru: rework aging feedback
To: Yu Zhao
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, David Stevens, Kalesh Singh
References: <20241231043538.4075764-1-yuzhao@google.com> <20241231043538.4075764-4-yuzhao@google.com>
Content-Type: text/plain; charset="UTF-8"
On Mon, Jan 13, 2025 at 2:51 PM Yu Zhao wrote:
>
> On Wed, Jan 08, 2025 at 01:14:58AM +0800, Kairui Song wrote:
> > On Tue, Dec 31, 2024 at 12:36 PM Yu Zhao wrote:
> >
> > Hi Yu,
> >
> > > The aging feedback is based on both the number of generations and the
> > > distribution of folios in each generation. The number of generations
> > > is currently the distance between max_seq and anon min_seq. This is
> > > because anon min_seq is not allowed to move past file min_seq. The
> > > rationale for that is that file is always evictable whereas anon is
> > > not. However, for use cases where anon is a lot cheaper than file:
> > > 1. Anon in the second oldest generation can be a better choice than
> > >    file in the oldest generation.
> > > 2. A large amount of file in the oldest generation can skew the
> > >    distribution, making should_run_aging() return a false negative.
> > >
> > > Allow anon and file min_seq to move independently, and use solely the
> > > number of generations as the feedback for aging. Specifically, when
> > > both anon and file are evictable, anon min_seq can now be greater than
> > > file min_seq, and therefore the number of generations becomes the
> > > distance between max_seq and min(min_seq[0], min_seq[1]). And
> > > should_run_aging() returns true if and only if the number of
> > > generations is less than MAX_NR_GENS.
> > >
> > > As the first step to the final optimization, this change by itself
> > > should not have userspace-visible effects beyond performance. The
> > > next two patches will take advantage of this change; the last patch in
> > > this series will better distribute folios across MAX_NR_GENS.
> > >
> > > Reported-by: David Stevens
> > > Signed-off-by: Yu Zhao
> > > Tested-by: Kalesh Singh
> > > ---
> > >  include/linux/mmzone.h |  17 ++--
> > >  mm/vmscan.c            | 200 ++++++++++++++++++----------------------
> > >  2 files changed, 96 insertions(+), 121 deletions(-)
> > >
> > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > > index b36124145a16..8245ecb0400b 100644
> > > --- a/include/linux/mmzone.h
> > > +++ b/include/linux/mmzone.h
> > > @@ -421,12 +421,11 @@ enum {
> > >  /*
> > >   * The youngest generation number is stored in max_seq for both anon and file
> > >   * types as they are aged on an equal footing. The oldest generation numbers are
> > > - * stored in min_seq[] separately for anon and file types as clean file pages
> > > - * can be evicted regardless of swap constraints.
> > > - *
> > > - * Normally anon and file min_seq are in sync. But if swapping is constrained,
> > > - * e.g., out of swap space, file min_seq is allowed to advance and leave anon
> > > - * min_seq behind.
> > > + * stored in min_seq[] separately for anon and file types so that they can be
> > > + * incremented independently. Ideally min_seq[] are kept in sync when both anon
> > > + * and file types are evictable. However, to adapt to situations like extreme
> > > + * swappiness, they are allowed to be out of sync by at most
> > > + * MAX_NR_GENS-MIN_NR_GENS-1.
> > >   *
> > >   * The number of pages in each generation is eventually consistent and therefore
> > >   * can be transiently negative when reset_batch_size() is pending.
> > > @@ -446,8 +445,8 @@ struct lru_gen_folio {
> > >  	unsigned long avg_refaulted[ANON_AND_FILE][MAX_NR_TIERS];
> > >  	/* the exponential moving average of evicted+protected */
> > >  	unsigned long avg_total[ANON_AND_FILE][MAX_NR_TIERS];
> > > -	/* the first tier doesn't need protection, hence the minus one */
> > > -	unsigned long protected[NR_HIST_GENS][ANON_AND_FILE][MAX_NR_TIERS - 1];
> > > +	/* can only be modified under the LRU lock */
> > > +	unsigned long protected[NR_HIST_GENS][ANON_AND_FILE][MAX_NR_TIERS];
> > >  	/* can be modified without holding the LRU lock */
> > >  	atomic_long_t evicted[NR_HIST_GENS][ANON_AND_FILE][MAX_NR_TIERS];
> > >  	atomic_long_t refaulted[NR_HIST_GENS][ANON_AND_FILE][MAX_NR_TIERS];
> > > @@ -498,7 +497,7 @@ struct lru_gen_mm_walk {
> > >  	int mm_stats[NR_MM_STATS];
> > >  	/* total batched items */
> > >  	int batched;
> > > -	bool can_swap;
> > > +	int swappiness;
> > >  	bool force_scan;
> > >  };
> > >
> > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > index f236db86de8a..f767e3d34e73 100644
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -2627,11 +2627,17 @@ static bool should_clear_pmd_young(void)
> > >  		READ_ONCE((lruvec)->lrugen.min_seq[LRU_GEN_FILE]),	\
> > >  	}
> > >
> > > +#define evictable_min_seq(min_seq, swappiness)				\
> > > +	min((min_seq)[!(swappiness)], (min_seq)[(swappiness) != MAX_SWAPPINESS])
> > > +
> > >  #define for_each_gen_type_zone(gen, type, zone)				\
> > >  	for ((gen) = 0; (gen) < MAX_NR_GENS; (gen)++)			\
> > >  		for ((type) = 0; (type) < ANON_AND_FILE; (type)++)	\
> > >  			for ((zone) = 0; (zone) < MAX_NR_ZONES; (zone)++)
> > >
> > > +#define for_each_evictable_type(type, swappiness)			\
> > > +	for ((type) = !(swappiness); (type) <= ((swappiness) != MAX_SWAPPINESS); (type)++)
> > > +
> > >  #define get_memcg_gen(seq)	((seq) % MEMCG_NR_GENS)
> > >  #define get_memcg_bin(bin)	((bin) % MEMCG_NR_BINS)
> > >
> > > @@ -2677,10 +2683,16 @@ static int get_nr_gens(struct lruvec *lruvec, int type)
> > >
> > >  static bool __maybe_unused seq_is_valid(struct lruvec *lruvec)
> > >  {
> > > -	/* see the comment on lru_gen_folio */
> > > -	return get_nr_gens(lruvec, LRU_GEN_FILE) >= MIN_NR_GENS &&
> > > -	       get_nr_gens(lruvec, LRU_GEN_FILE) <= get_nr_gens(lruvec, LRU_GEN_ANON) &&
> > > -	       get_nr_gens(lruvec, LRU_GEN_ANON) <= MAX_NR_GENS;
> > > +	int type;
> > > +
> > > +	for (type = 0; type < ANON_AND_FILE; type++) {
> > > +		int n = get_nr_gens(lruvec, type);
> > > +
> > > +		if (n < MIN_NR_GENS || n > MAX_NR_GENS)
> > > +			return false;
> > > +	}
> > > +
> > > +	return true;
> > >  }
> > >
> > >  /******************************************************************************
> > > @@ -3087,9 +3099,8 @@ static void read_ctrl_pos(struct lruvec *lruvec, int type, int tier, int gain,
> > >  	pos->refaulted = lrugen->avg_refaulted[type][tier] +
> > >  			 atomic_long_read(&lrugen->refaulted[hist][type][tier]);
> > >  	pos->total = lrugen->avg_total[type][tier] +
> > > +		     lrugen->protected[hist][type][tier] +
> > >  		     atomic_long_read(&lrugen->evicted[hist][type][tier]);
> > > -	if (tier)
> > > -		pos->total += lrugen->protected[hist][type][tier - 1];
> > >  	pos->gain = gain;
> > >  }
> > >
> > > @@ -3116,17 +3127,15 @@ static void reset_ctrl_pos(struct lruvec *lruvec, int type, bool carryover)
> > >  			WRITE_ONCE(lrugen->avg_refaulted[type][tier], sum / 2);
> > >
> > >  			sum = lrugen->avg_total[type][tier] +
> > > +			      lrugen->protected[hist][type][tier] +
> > >  			      atomic_long_read(&lrugen->evicted[hist][type][tier]);
> > > -			if (tier)
> > > -				sum += lrugen->protected[hist][type][tier - 1];
> > >  			WRITE_ONCE(lrugen->avg_total[type][tier], sum / 2);
> > >  		}
> > >
> > >  		if (clear) {
> > >  			atomic_long_set(&lrugen->refaulted[hist][type][tier], 0);
> > >  			atomic_long_set(&lrugen->evicted[hist][type][tier], 0);
> > > -			if (tier)
> > > -				WRITE_ONCE(lrugen->protected[hist][type][tier - 1], 0);
> > > +			WRITE_ONCE(lrugen->protected[hist][type][tier], 0);
> > >  		}
> > >  	}
> > >  }
> > > @@ -3261,7 +3270,7 @@ static int should_skip_vma(unsigned long start, unsigned long end, struct mm_wal
> > >  		return true;
> > >
> > >  	if (vma_is_anonymous(vma))
> > > -		return !walk->can_swap;
> > > +		return !walk->swappiness;
> > >
> > >  	if (WARN_ON_ONCE(!vma->vm_file || !vma->vm_file->f_mapping))
> > >  		return true;
> > > @@ -3271,7 +3280,10 @@ static int should_skip_vma(unsigned long start, unsigned long end, struct mm_wal
> > >  		return true;
> > >
> > >  	if (shmem_mapping(mapping))
> > > -		return !walk->can_swap;
> > > +		return !walk->swappiness;
> > > +
> > > +	if (walk->swappiness == MAX_SWAPPINESS)
> > > +		return true;
> > >
> > >  	/* to exclude special mappings like dax, etc. */
> > >  	return !mapping->a_ops->read_folio;
> > > @@ -3359,7 +3371,7 @@ static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned
> > >  }
> > >
> > >  static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg,
> > > -				   struct pglist_data *pgdat, bool can_swap)
> > > +				   struct pglist_data *pgdat)
> > >  {
> > >  	struct folio *folio;
> > >
> > > @@ -3370,10 +3382,6 @@ static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg,
> > >  	if (folio_memcg(folio) != memcg)
> > >  		return NULL;
> > >
> > > -	/* file VMAs can contain anon pages from COW */
> > > -	if (!folio_is_file_lru(folio) && !can_swap)
> > > -		return NULL;
> > > -
> > >  	return folio;
> > >  }
> > >
> > > @@ -3429,7 +3437,7 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
> > >  		if (pfn == -1)
> > >  			continue;
> > >
> > > -		folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap);
> > > +		folio = get_pfn_folio(pfn, memcg, pgdat);
> > >  		if (!folio)
> > >  			continue;
> > >
> > > @@ -3514,7 +3522,7 @@ static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area
> > >  		if (pfn == -1)
> > >  			goto next;
> > >
> > > -		folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap);
> > > +		folio = get_pfn_folio(pfn, memcg, pgdat);
> > >  		if (!folio)
> > >  			goto next;
> > >
> > > @@ -3726,22 +3734,26 @@ static void clear_mm_walk(void)
> > >  	kfree(walk);
> > >  }
> > >
> > > -static bool inc_min_seq(struct lruvec *lruvec, int type, bool can_swap)
> > > +static bool inc_min_seq(struct lruvec *lruvec, int type, int swappiness)
> > >  {
> > >  	int zone;
> > >  	int remaining = MAX_LRU_BATCH;
> > >  	struct lru_gen_folio *lrugen = &lruvec->lrugen;
> > > +	int hist = lru_hist_from_seq(lrugen->min_seq[type]);
> > >  	int new_gen, old_gen = lru_gen_from_seq(lrugen->min_seq[type]);
> > >
> > > -	if (type == LRU_GEN_ANON && !can_swap)
> > > +	if (type ? swappiness == MAX_SWAPPINESS : !swappiness)
> > >  		goto done;
> > >
> > > -	/* prevent cold/hot inversion if force_scan is true */
> > > +	/* prevent cold/hot inversion if the type is evictable */
> > >  	for (zone = 0; zone < MAX_NR_ZONES; zone++) {
> > >  		struct list_head *head = &lrugen->folios[old_gen][type][zone];
> > >
> > >  		while (!list_empty(head)) {
> > >  			struct folio *folio = lru_to_folio(head);
> > > +			int refs = folio_lru_refs(folio);
> > > +			int tier = lru_tier_from_refs(refs);
> > > +			int delta = folio_nr_pages(folio);
> > >
> > >  			VM_WARN_ON_ONCE_FOLIO(folio_test_unevictable(folio), folio);
> > >  			VM_WARN_ON_ONCE_FOLIO(folio_test_active(folio), folio);
> > > @@ -3751,6 +3763,9 @@ static bool inc_min_seq(struct lruvec *lruvec, int type, bool can_swap)
> > >  			new_gen = folio_inc_gen(lruvec, folio, false);
> > >  			list_move_tail(&folio->lru, &lrugen->folios[new_gen][type][zone]);
> > >
> > > +			WRITE_ONCE(lrugen->protected[hist][type][tier],
> > > +				   lrugen->protected[hist][type][tier] + delta);
> > > +
> > >  			if (!--remaining)
> > >  				return false;
> > >  		}
> > > @@ -3762,7 +3777,7 @@ static bool inc_min_seq(struct lruvec *lruvec, int type, bool can_swap)
> > >  	return true;
> > >  }
> > >
> > > -static bool try_to_inc_min_seq(struct lruvec *lruvec, bool can_swap)
> > > +static bool try_to_inc_min_seq(struct lruvec *lruvec, int swappiness)
> > >  {
> > >  	int gen, type, zone;
> > >  	bool success = false;
> > > @@ -3772,7 +3787,7 @@ static bool try_to_inc_min_seq(struct lruvec *lruvec, bool can_swap)
> > >  	VM_WARN_ON_ONCE(!seq_is_valid(lruvec));
> > >
> > >  	/* find the oldest populated generation */
> > > -	for (type = !can_swap; type < ANON_AND_FILE; type++) {
> > > +	for_each_evictable_type(type, swappiness) {
> > >  		while (min_seq[type] + MIN_NR_GENS <= lrugen->max_seq) {
> > >  			gen = lru_gen_from_seq(min_seq[type]);
> > >
> > > @@ -3788,13 +3803,17 @@ static bool try_to_inc_min_seq(struct lruvec *lruvec, bool can_swap)
> > >  	}
> > >
> > >  	/* see the comment on lru_gen_folio */
> > > -	if (can_swap) {
> > > -		min_seq[LRU_GEN_ANON] = min(min_seq[LRU_GEN_ANON], min_seq[LRU_GEN_FILE]);
> > > -		min_seq[LRU_GEN_FILE] = max(min_seq[LRU_GEN_ANON], lrugen->min_seq[LRU_GEN_FILE]);
> > > +	if (swappiness && swappiness != MAX_SWAPPINESS) {
> > > +		unsigned long seq = lrugen->max_seq - MIN_NR_GENS;
> > > +
> > > +		if (min_seq[LRU_GEN_ANON] > seq && min_seq[LRU_GEN_FILE] < seq)
> > > +			min_seq[LRU_GEN_ANON] = seq;
> > > +		else if (min_seq[LRU_GEN_FILE] > seq && min_seq[LRU_GEN_ANON] < seq)
> > > +			min_seq[LRU_GEN_FILE] = seq;
> > >  	}
> > >
> > > -	for (type = !can_swap; type < ANON_AND_FILE; type++) {
> > > -		if (min_seq[type] == lrugen->min_seq[type])
> > > +	for_each_evictable_type(type, swappiness) {
> > > +		if (min_seq[type] <= lrugen->min_seq[type])
> > >  			continue;
> > >
> > >  		reset_ctrl_pos(lruvec, type, true);
> > > @@ -3805,8 +3824,7 @@ static bool try_to_inc_min_seq(struct lruvec *lruvec, bool can_swap)
> > >  	return success;
> > >  }
> > >
> > > -static bool inc_max_seq(struct lruvec *lruvec, unsigned long seq,
> > > -			bool can_swap, bool force_scan)
> > > +static bool inc_max_seq(struct lruvec *lruvec, unsigned long seq, int swappiness)
> > >  {
> > >  	bool success;
> > >  	int prev, next;
> > > @@ -3824,13 +3842,11 @@ static bool inc_max_seq(struct lruvec *lruvec, unsigned long seq,
> > >  	if (!success)
> > >  		goto unlock;
> > >
> > > -	for (type = ANON_AND_FILE - 1; type >= 0; type--) {
> > > +	for (type = 0; type < ANON_AND_FILE; type++) {
> > >  		if (get_nr_gens(lruvec, type) != MAX_NR_GENS)
> > >  			continue;
> > >
> > > -		VM_WARN_ON_ONCE(!force_scan && (type == LRU_GEN_FILE || can_swap));
> > > -
> > > -		if (inc_min_seq(lruvec, type, can_swap))
> > > +		if (inc_min_seq(lruvec, type, swappiness))
> > >  			continue;
> > >
> > >  		spin_unlock_irq(&lruvec->lru_lock);
> > > @@ -3874,7 +3890,7 @@ static bool inc_max_seq(struct lruvec *lruvec, unsigned long seq,
> > >  }
> > >
> > >  static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long seq,
> > > -			       bool can_swap, bool force_scan)
> > > +			       int swappiness, bool force_scan)
> > >  {
> > >  	bool success;
> > >  	struct lru_gen_mm_walk *walk;
> > > @@ -3885,7 +3901,7 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long seq,
> > >  	VM_WARN_ON_ONCE(seq > READ_ONCE(lrugen->max_seq));
> > >
> > >  	if (!mm_state)
> > > -		return inc_max_seq(lruvec, seq, can_swap, force_scan);
> > > +		return inc_max_seq(lruvec, seq, swappiness);
> > >
> > >  	/* see the comment in iterate_mm_list() */
> > >  	if (seq <= READ_ONCE(mm_state->seq))
> > > @@ -3910,7 +3926,7 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long seq,
> > >
> > >  	walk->lruvec = lruvec;
> > >  	walk->seq = seq;
> > > -	walk->can_swap = can_swap;
> > > +	walk->swappiness = swappiness;
> > >  	walk->force_scan = force_scan;
> > >
> > >  	do {
> > > @@ -3920,7 +3936,7 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long seq,
> > >  	} while (mm);
> > > done:
> > >  	if (success) {
> > > -		success = inc_max_seq(lruvec, seq, can_swap, force_scan);
> > > +		success = inc_max_seq(lruvec, seq, swappiness);
> > >  		WARN_ON_ONCE(!success);
> > >  	}
> > >
> > > @@ -3961,13 +3977,13 @@ static bool lruvec_is_sizable(struct lruvec *lruvec, struct scan_control *sc)
> > >  {
> > >  	int gen, type, zone;
> > >  	unsigned long total = 0;
> > > -	bool can_swap = get_swappiness(lruvec, sc);
> > > +	int swappiness = get_swappiness(lruvec, sc);
> > >  	struct lru_gen_folio *lrugen = &lruvec->lrugen;
> > >  	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
> > >  	DEFINE_MAX_SEQ(lruvec);
> > >  	DEFINE_MIN_SEQ(lruvec);
> > >
> > > -	for (type = !can_swap; type < ANON_AND_FILE; type++) {
> > > +	for_each_evictable_type(type, swappiness) {
> > >  		unsigned long seq;
> > >
> > >  		for (seq = min_seq[type]; seq <= max_seq; seq++) {
> > > @@ -3987,6 +4003,7 @@ static bool lruvec_is_reclaimable(struct lruvec *lruvec, struct scan_control *sc
> > >  {
> > >  	int gen;
> > >  	unsigned long birth;
> > > +	int swappiness = get_swappiness(lruvec, sc);
> > >  	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
> > >  	DEFINE_MIN_SEQ(lruvec);
> > >
> > > @@ -3996,8 +4013,7 @@ static bool lruvec_is_reclaimable(struct lruvec *lruvec, struct scan_control *sc
> > >  	if (!lruvec_is_sizable(lruvec, sc))
> > >  		return false;
> > >
> > > -	/* see the comment on lru_gen_folio */
> > > -	gen = lru_gen_from_seq(min_seq[LRU_GEN_FILE]);
> > > +	gen = lru_gen_from_seq(evictable_min_seq(min_seq, swappiness));
> > >  	birth = READ_ONCE(lruvec->lrugen.timestamps[gen]);
> > >
> > >  	return time_is_before_jiffies(birth + min_ttl);
> > > @@ -4064,7 +4080,6 @@ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
> > >  	unsigned long addr = pvmw->address;
> > >  	struct vm_area_struct *vma = pvmw->vma;
> > >  	struct folio *folio = pfn_folio(pvmw->pfn);
> > > -	bool can_swap = !folio_is_file_lru(folio);
> > >  	struct mem_cgroup *memcg = folio_memcg(folio);
> > >  	struct pglist_data *pgdat = folio_pgdat(folio);
> > >  	struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
> > > @@ -4117,7 +4132,7 @@ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
> > >  		if (pfn == -1)
> > >  			continue;
> > >
> > > -		folio = get_pfn_folio(pfn, memcg, pgdat, can_swap);
> > > +		folio = get_pfn_folio(pfn, memcg, pgdat);
> > >  		if (!folio)
> > >  			continue;
> > >
> > > @@ -4333,8 +4348,8 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
> > >  		gen = folio_inc_gen(lruvec, folio, false);
> > >  		list_move_tail(&folio->lru, &lrugen->folios[gen][type][zone]);
> > >
> > > -		WRITE_ONCE(lrugen->protected[hist][type][tier - 1],
> > > -			   lrugen->protected[hist][type][tier - 1] + delta);
> > > +		WRITE_ONCE(lrugen->protected[hist][type][tier],
> > > +			   lrugen->protected[hist][type][tier] + delta);
> > >  		return true;
> > >  	}
> > >
> > > @@ -4533,7 +4548,6 @@ static int isolate_folios(struct lruvec *lruvec, struct scan_control *sc, int sw
> > >  {
> > >  	int i;
> > >  	int type;
> > > -	int scanned;
> > >  	int tier = -1;
> > >  	DEFINE_MIN_SEQ(lruvec);
> > >
> > > @@ -4558,21 +4572,23 @@ static int isolate_folios(struct lruvec *lruvec, struct scan_control *sc, int sw
> > >  	else
> > >  		type = get_type_to_scan(lruvec, swappiness, &tier);
> > >
> > > -	for (i = !swappiness; i < ANON_AND_FILE; i++) {
> > > +	for_each_evictable_type(i, swappiness) {
> >
> > Thanks for working on solving the reported issues, but I have one concern
> > about this for_each_evictable_type macro and its usage here.
> >
> > It basically forbids eviction of file pages with swappiness == 200,
> > even for global pressure; this is quite a change.
> >
> > For both active/inactive and MGLRU, max swappiness used to make the
> > kernel try to reclaim anon as much as possible, but still fall back to
> > file eviction. Forbidding file eviction may cause unsolvable OOM:
> > unlike anon pages, killing a process won't necessarily release file
> > pages, so the system could hang easily.
> >
> > Existing systems with swappiness == 200 that were running fine before
> > may also hit OOM very quickly.
>
> Do you know anyone actually uses 200? I only use 200 for testing but I
> can use 201 instead, since the debugfs interface isn't limited to [0,
> 200].

Thanks for the updated patch.

Yes, I've seen some users using 200, especially with ZRAM/ZSWAP, so
that the kernel prefers to keep page cache in memory when under
pressure. We also use 200 for some workloads.

We have an internal patch similar to your update, which allows using
201 for proactive reclaim, so proactive reclaim is able to only
compress pages in RAM, to avoid increased IO due to page cache misses,
and it worked very well.