From: Yuanchu Xie
Date: Thu, 21 Dec 2023 15:15:54 -0800
Subject: Re: [RFC v2] mm: Multi-Gen LRU: fix use mm/page_idle/bitmap
To: Henry Huang
Cc: yuzhao@google.com, akpm@linux-foundation.org, 谈鉴锋, linux-kernel@vger.kernel.org, linux-mm@kvack.org, 朱辉(茶水)
In-Reply-To: <20231215105324.41241-1-henry.hj@antgroup.com>

Hi Henry,

I have a question on memcg charging for the shared pages. Replied inline.

On Fri, Dec 15, 2023 at 2:53 AM Henry Huang wrote:
>
> On Fri, Dec 15, 2023 at 14:46 PM Yu Zhao wrote:
> >
> > >
> > > Thanks for replying to this RFC.
> > >
> > > > 1. page_idle/bitmap isn't a capable interface at all -- yes, Google
> > > > proposed the idea [1], but we don't really use it anymore because of
> > > > its poor scalability.
> > >
> > > In our environment, we use /sys/kernel/mm/page_idle/bitmap to check
> > > whether pages were accessed during a period of time.
> >
> > Is it a production environment? If so, what's your
> > 1. scan interval
> > 2. memory size
>
> > I'm trying to understand why scalability isn't a problem for you. On
> > an average server, there are hundreds of millions of PFNs, so it'd be
> > very expensive to use that ABI even for a time interval of minutes.
>
> Thanks for replying.
>
> Our scan interval is 10 minutes and total memory size is 512GB.
> We prefer to reclaim pages whose idle age is at least 1 hour.
>
> > > We manage the idle time of all pages in userspace, then use a
> > > prediction algorithm to select pages to reclaim. These pages are
> > > more likely to stay idle for a long time.
>
> > "There is a system in place now that is based on a user-space process
> > that reads a bitmap stored in sysfs, but it has a high CPU and memory
> > overhead, so a new approach is being tried."
> > https://lwn.net/Articles/787611/
> >
> > Could you elaborate on how you solved this problem?
>
> In our environment, we found that scanning, basic analysis and reclaiming
> idle pages take on average 0.4 cores and 300MB of memory.
>
> To reduce CPU and memory usage, we do the following:
> 1. We implement a rate limiter to control the rate of scanning and reclaim.
> 2. All page info and idle ages are stored in a local DB file. Our prediction
> algorithm doesn't need all page info in memory at the same time.
>
> In our environment, we attempt to allocate about 1/3 of memory as THP,
> which may save some CPU usage during scanning.
>
> > > We only need the kernel to tell us whether a page was accessed; a
> > > boolean value in the kernel is enough for our case.
> >
> > How do you define "accessed"? I.e., through page tables or file
> > descriptors or both?
>
> both
>
> > > > 2. PG_idle/young, being a boolean value, has poor granularity. If
> > > > anyone must use page_idle/bitmap for some specific reason, I'd
> > > > recommend exporting generation numbers instead.
> > >
> > > Yes, at first we tried using multi-gen LRU proactive scan and
> > > exporting generation & refs numbers to do the same thing.
> > >
> > > But there are several problems:
> > >
> > > 1. multi-gen LRU only cares about a memcg's own pages. In our
> > > environment, processes in different memcgs are likely to share pages.
> >
> > This is related to my question above: are those pages mapped into
> > different memcgs or not?
>
> There is a case:
> There are two cgroups, A and B (B is a child cgroup of A).
> A process in A creates a file and uses mmap to read/write this file.
> A process in B mmaps this file and usually reads it.

How does the shared memory get charged to the cgroups? Does it all go
to cgroup A or B exclusively, or do some pages get charged to each
one?

> > > We still have no idea how to solve this problem.
> > >
> > > 2. We set swappiness to 0 and use proactive scan to select cold pages
> > > and proactive reclaim to swap out anon pages. But we can't control
> > > passive scan (can_swap = false), which causes cold/hot inversion of
> > > anon pages in inc_min_seq.
> >
> > There is an option to prevent the inversion; IIUC, the force_scan
> > option is what you are looking for.
>
> It seems that doesn't work now.
>
> static void inc_max_seq(struct lruvec *lruvec, bool can_swap, bool force_scan)
> {
> ......
> 	for (type = ANON_AND_FILE - 1; type >= 0; type--) {
> 		if (get_nr_gens(lruvec, type) != MAX_NR_GENS)
> 			continue;
>
> 		VM_WARN_ON_ONCE(!force_scan && (type == LRU_GEN_FILE || can_swap));
>
> 		if (inc_min_seq(lruvec, type, can_swap))
> 			continue;
>
> 		spin_unlock_irq(&lruvec->lru_lock);
> 		cond_resched();
> 		goto restart;
> 	}
> .....
> }
>
> force_scan is not a parameter of inc_min_seq.
> In our environment, swappiness is 0, so can_swap would be false.
>
> static bool inc_min_seq(struct lruvec *lruvec, int type, bool can_swap)
> {
> 	int zone;
> 	int remaining = MAX_LRU_BATCH;
> 	struct lru_gen_folio *lrugen = &lruvec->lrugen;
> 	int new_gen, old_gen = lru_gen_from_seq(lrugen->min_seq[type]);
>
> 	if (type == LRU_GEN_ANON && !can_swap)
> 		goto done;
> ......
> }
>
> If can_swap is false, the anon LRU list is skipped.
>
> What's more, in passive scan, force_scan is also false.
>
> static long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc, bool can_swap)
> {
> ......
> 	/* skip this lruvec as it's low on cold folios */
> 	return try_to_inc_max_seq(lruvec, max_seq, sc, can_swap, false) ? -1 : 0;
> }
>
> Would it be a good idea to add a global parameter, no_inversion, and
> modify inc_min_seq like this:
>
> static bool inc_min_seq(struct lruvec *lruvec, int type, bool can_swap)
> {
> 	int zone;
> 	int remaining = MAX_LRU_BATCH;
> 	struct lru_gen_folio *lrugen = &lruvec->lrugen;
> 	int new_gen, old_gen = lru_gen_from_seq(lrugen->min_seq[type]);
>
> -	if (type == LRU_GEN_ANON && !can_swap)
> +	if (type == LRU_GEN_ANON && !can_swap && !no_inversion)
> 		goto done;
> ......
> }
>
> --
> 2.43.0