From: Barry Song <21cnbao@gmail.com>
Date: Fri, 29 Nov 2024 16:07:29 +1300
Subject: Re: [RFC PATCH v2 1/1] mm/vmscan: move the written-back folios to the tail of LRU after shrinking
To: chenridong
Cc: Yu Zhao, Matthew Wilcox, Chris Li, Chen Ridong, akpm@linux-foundation.org,
	mhocko@suse.com, hannes@cmpxchg.org, yosryahmed@google.com,
	david@redhat.com, ryan.roberts@arm.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, wangweiyang2@huawei.com, xieym_ict@hotmail.com
References: <20241116091658.1983491-1-chenridong@huaweicloud.com>
	<20241116091658.1983491-2-chenridong@huaweicloud.com>
	<7e617fe7-388f-43a1-b0fa-e2998194b90c@huawei.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"

On Fri, Nov 29, 2024 at 3:25 PM chenridong wrote:
>
>
>
> On 2024/11/29 7:08, Barry Song wrote:
> > On Mon, Nov 25, 2024 at 2:19 PM chenridong wrote:
> >>
> >>
> >>
> >> On 2024/11/18 12:21, Matthew Wilcox wrote:
> >>> On Mon, Nov 18, 2024 at 05:14:14PM +1300, Barry Song wrote:
> >>>> On Mon, Nov 18, 2024 at 5:03 PM Matthew Wilcox wrote:
> >>>>>
> >>>>> On Sat, Nov 16, 2024 at 09:16:58AM +0000, Chen Ridong wrote:
> >>>>>> 2. In the shrink_page_list function, if folioN is a THP (2M), it may be split
> >>>>>>    and added to the swap cache folio by folio. After adding to the swap cache,
> >>>>>>    it will submit IO to write the folios back to swap, which is asynchronous.
> >>>>>>    When shrink_page_list is finished, the isolated folio list will be
> >>>>>>    moved back to the head of the inactive LRU. The inactive LRU may just look
> >>>>>>    like this, with 512 folios having been moved to the head of the inactive LRU.
> >>>>>
> >>>>> I was hoping that we'd be able to stop splitting the folio when adding
> >>>>> to the swap cache. Ideally, we'd add the whole 2MB and write it back
> >>>>> as a single unit.
> >>>>
> >>>> This is already the case: adding to the swapcache doesn't require splitting
> >>>> THPs, but failing to allocate 2MB of contiguous swap slots will.
> >>>
> >>> Agreed we need to understand why this is happening. As I've said a few
> >>> times now, we need to stop requiring contiguity. Real filesystems don't
> >>> need the contiguity (they become less efficient, but they can scatter a
> >>> single 2MB folio to multiple places).
> >>>
> >>> Maybe Chris has a solution to this in the works?
> >>>
> >>
> >> Hi, Chris, do you have a better idea to solve this issue?
> >
> > Not Chris. As I read the code again, we already have the below code in
> > evict_folios() to fix up the "missed folio_rotate_reclaimable()" issue:
> >
> >         /* retry folios that may have missed folio_rotate_reclaimable() */
> >         list_move(&folio->lru, &clean);
> >
> > Doesn't it work for you?
> >
> > commit 359a5e1416caaf9ce28396a65ed3e386cc5de663
> > Author: Yu Zhao
> > Date:   Tue Nov 15 18:38:07 2022 -0700
> >
> >     mm: multi-gen LRU: retry folios written back while isolated
> >
> >     The page reclaim isolates a batch of folios from the tail of one of the
> >     LRU lists and works on those folios one by one. For a suitable
> >     swap-backed folio, if the swap device is async, it queues that folio for
> >     writeback. After the page reclaim finishes an entire batch, it puts back
> >     the folios it queued for writeback to the head of the original LRU list.
> >
> >     In the meantime, the page writeback flushes the queued folios also by
> >     batches. Its batching logic is independent from that of the page reclaim.
> >     For each of the folios it writes back, the page writeback calls
> >     folio_rotate_reclaimable() which tries to rotate a folio to the tail.
> >
> >     folio_rotate_reclaimable() only works for a folio after the page reclaim
> >     has put it back. If an async swap device is fast enough, the page
> >     writeback can finish with that folio while the page reclaim is still
> >     working on the rest of the batch containing it. In this case, that folio
> >     will remain at the head and the page reclaim will not retry it before
> >     reaching there.
> >
> >     This patch adds a retry to evict_folios(). After evict_folios() has
> >     finished an entire batch and before it puts back folios it cannot free
> >     immediately, it retries those that may have missed the rotation.
> >     Before this patch, ~60% of folios swapped to an Intel Optane missed
> >     folio_rotate_reclaimable(). After this patch, ~99% of missed folios were
> >     reclaimed upon retry.
> >
> >     This problem affects relatively slow async swap devices like Samsung 980
> >     Pro much less and does not affect sync swap devices like zram or zswap at
> >     all.
> >
> >>
> >> Best regards,
> >> Ridong
> >
> > Thanks
> > Barry
>
> Thank you for your reply, Barry.
> I found this issue on the 5.10 version. I reproduced it on the next
> version, but with CONFIG_LRU_GEN_ENABLED disabled. I tested again with
> CONFIG_LRU_GEN_ENABLED enabled, and the issue is fixed there.
>
> IIUC, commit 359a5e1416caaf9ce28396a65ed3e386cc5de663 only takes effect
> when CONFIG_LRU_GEN_ENABLED is enabled, but the issue still exists when
> CONFIG_LRU_GEN_ENABLED is disabled, and that case should be fixed as well.
>
> I read the code of commit 359a5e1416caaf9ce28396a65ed3e386cc5de663; it
> finds the folios that missed the rotation in a more involved way, but it
> makes it much clearer what is being done. Should I implement it in Yu
> Zhao's way?

Yes, this is essentially the same thing. Since Yu only fixed it in MGLRU and
you are still using the active/inactive LRU, the same fix should apply to the
active/inactive LRU.

>
> Best regards,
> Ridong

Thanks
Barry
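
P.S. For illustration only, a rough, untested sketch of how that retry could
look on the active/inactive path. The helper name collect_missed_rotation()
and the exact hook point (after shrink_folio_list() returns, before the
leftover folios are put back by move_folios_to_lru()) are assumptions, not
the actual patch:

/*
 * Hypothetical sketch only -- not the actual patch. It mirrors the idea of
 * commit 359a5e1416ca ("mm: multi-gen LRU: retry folios written back while
 * isolated") for the classic active/inactive LRU: after shrink_folio_list()
 * has finished its batch, pick out the folios whose writeback completed
 * while they were isolated (the ones that "missed"
 * folio_rotate_reclaimable()) so the caller can run shrink_folio_list() on
 * them once more instead of putting them back at the head of the inactive
 * LRU.
 */
#include <linux/list.h>
#include <linux/mm.h>
#include <linux/page-flags.h>

static void collect_missed_rotation(struct list_head *folio_list,
				    struct list_head *clean)
{
	struct folio *folio, *next;

	list_for_each_entry_safe(folio, next, folio_list, lru) {
		/* still dirty or still under writeback: nothing was missed */
		if (folio_test_dirty(folio) || folio_test_writeback(folio))
			continue;

		/*
		 * Writeback already finished while the folio was isolated,
		 * so the rotation to the LRU tail was skipped. Queue it for
		 * another reclaim pass instead of returning it to the head
		 * of the inactive LRU. A real version would also skip
		 * folios rejected for other reasons (mapped, referenced,
		 * locked, re-activated), as the MGLRU retry does.
		 */
		list_move(&folio->lru, clean);
	}
}

shrink_inactive_list() would then feed the collected folios back into
shrink_folio_list() once more, mirroring how evict_folios() retries its
clean list.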