From: Barry Song <21cnbao@gmail.com>
Date: Mon, 18 Nov 2024 22:55:46 +1300
Subject: Re: [RFC PATCH v2 1/1] mm/vmscan: move the written-back folios to the tail of LRU after shrinking
To: chenridong, Chris Li
Cc: Matthew Wilcox, Chen Ridong, akpm@linux-foundation.org, mhocko@suse.com, hannes@cmpxchg.org, yosryahmed@google.com, yuzhao@google.com, david@redhat.com, ryan.roberts@arm.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, wangweiyang2@huawei.com, xieym_ict@hotmail.com
In-Reply-To: <03c18a7b-24fa-4ee6-8682-63f1a81363e5@huawei.com>
References: <20241116091658.1983491-1-chenridong@huaweicloud.com> <20241116091658.1983491-2-chenridong@huaweicloud.com> <03c18a7b-24fa-4ee6-8682-63f1a81363e5@huawei.com>

On Mon, Nov 18, 2024 at 10:41 PM chenridong wrote:
>
> On 2024/11/18 12:14, Barry Song wrote:
> > On Mon, Nov 18, 2024 at 5:03 PM Matthew Wilcox wrote:
> >>
> >> On Sat, Nov 16, 2024 at 09:16:58AM +0000, Chen Ridong wrote:
> >>> 2. In shrink_page_list function, if folioN is THP(2M), it may be split
> >>>    and added to swap cache folio by folio.
> >>>    After adding to swap cache, it will submit I/O to write back the
> >>>    folio to swap, which is asynchronous.
> >>>    When shrink_page_list is finished, the isolated folios list will be
> >>>    moved back to the head of inactive lru. The inactive lru may just look
> >>>    like this, with 512 folios having been moved to the head of inactive lru.
> >>
> >> I was hoping that we'd be able to stop splitting the folio when adding
> >> to the swap cache.  Ideally, we'd add the whole 2MB and write it back
> >> as a single unit.
> >
> > This is already the case: adding to the swapcache doesn’t require splitting
> > THPs, but failing to allocate 2MB of contiguous swap slots will.
> >
> >>
> >> This is going to become much more important with memdescs.  We'd have to
> >> allocate 512 struct folios to do this, which would be about 10 4kB pages,
> >> and if we're trying to swap out memory, we're probably low on memory.
> >>
> >> So I don't like this solution you have at all because it doesn't help us
> >> get to the solution we're going to need in about a year's time.
> >>
> >
> > Ridong might need to clarify why this splitting is occurring. If it’s due to the
> > failure to allocate swap slots, we still need a solution to address it.
> >
> > Thanks
> > Barry
>
> shrink_folio_list
>   add_to_swap
>     folio_alloc_swap
>       get_swap_pages
>         scan_swap_map_slots
>         /*
>          * Swapfile is not block device or not using clusters so unable
>          * to allocate large entries.
>          */
>         if (!(si->flags & SWP_BLKDEV) || !si->cluster_info)
>                 return 0;
>
> In my test, I use a file as swap, which is not 'SWP_BLKDEV'. So it
> failed in get_swap_pages.

Alright, a proper non-rotating swap block device would be much better. In your
case, though, cluster allocation isn’t supported.

>
> I think this is a race issue between 'shrink_folio_list' executing and
> writing back asynchronously.
> In my test, 512 folios (THP split) were
> added to swap, only about 60 folios had not been written back when
> 'move_folios_to_lru' was invoked after 'shrink_folio_list'. What if
> writeback is faster? Maybe this will happen even when only 32 folios
> (without THP) are in the 'folio_list' of shrink_folio_list's inputs.

On a real non-rotating swap device, the race condition would occur only when
contiguous 2MB swap slots are unavailable.

Hi Chris,
I recall you mentioned unifying the code for swap devices and swap files, or
for non-rotating and rotating devices. I assume a swap file (not a block device)
would also be a practical use case?

>
> Best regards,
> Ridong

Thanks
Barry
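
For reference, the check quoted above can be modeled in user space roughly as
follows. This is only a sketch, not kernel code: `model_swap_info`,
`MODEL_SWP_BLKDEV`, `model_can_alloc_large` and `model_folios_after_swap_alloc`
are hypothetical names made up for illustration, and the flag value is a
placeholder. It also ignores the other failure mode mentioned in this thread,
where a cluster-capable block device simply has no contiguous 2MB of free slots.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder flag bit for illustration; the real SWP_BLKDEV is defined in the kernel. */
#define MODEL_SWP_BLKDEV (1u << 6)

/* Hypothetical, heavily reduced stand-in for struct swap_info_struct. */
struct model_swap_info {
	unsigned int flags;
	const void *cluster_info;	/* NULL when cluster allocation is not in use */
};

/* Mirrors the quoted condition: large entries need a block device using clusters. */
static bool model_can_alloc_large(const struct model_swap_info *si)
{
	if (!(si->flags & MODEL_SWP_BLKDEV) || !si->cluster_info)
		return false;
	return true;
}

/*
 * Number of folios submitted for writeback for one THP of nr_pages base pages:
 * 1 if a large swap entry can be allocated, nr_pages if the THP must be split.
 */
static unsigned int model_folios_after_swap_alloc(const struct model_swap_info *si,
						  unsigned int nr_pages)
{
	assert(nr_pages > 0);
	return model_can_alloc_large(si) ? 1 : nr_pages;
}
```

With a swap file (flag bit clear), a 2MB THP of 512 base pages yields 512
separate folios in this model, which matches the count seen in Ridong's test;
with a cluster-capable block device it stays a single folio.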