From: Chris Li <chrisl@kernel.org>
Date: Tue, 26 Nov 2024 16:17:03 -0800
Subject: Re: [RFC PATCH v2 1/1] mm/vmscan: move the written-back folios to the tail of LRU after shrinking
To: Barry Song <21cnbao@gmail.com>
Cc: chenridong, Matthew Wilcox, Chen Ridong, akpm@linux-foundation.org, mhocko@suse.com, hannes@cmpxchg.org, yosryahmed@google.com, yuzhao@google.com, david@redhat.com, ryan.roberts@arm.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, wangweiyang2@huawei.com, xieym_ict@hotmail.com, Kairui Song

On Mon, Nov 18, 2024 at 1:56 AM Barry Song <21cnbao@gmail.com> wrote:
>
> On Mon, Nov 18, 2024 at 10:41 PM chenridong wrote:
> >
> >
> > On 2024/11/18 12:14, Barry Song wrote:
> > > On Mon, Nov 18, 2024 at 5:03 PM Matthew Wilcox wrote:
> > >>
> > >> On Sat, Nov 16, 2024 at 09:16:58AM +0000, Chen Ridong wrote:
> > >>> 2. In the shrink_page_list function, if folioN is a THP (2M), it may be
> > >>>    split and added to the swap cache folio by folio. After being added
> > >>>    to the swap cache, IO is submitted to write the folios back to swap,
> > >>>    which is asynchronous. When shrink_page_list finishes, the isolated
> > >>>    folio list is moved back to the head of the inactive LRU. The
> > >>>    inactive LRU may then look like this, with 512 folios having been
> > >>>    moved to the head of the inactive LRU.
> > >>
> > >> I was hoping that we'd be able to stop splitting the folio when adding
> > >> to the swap cache. Ideally, we'd add the whole 2MB and write it back
> > >> as a single unit.
> > >
> > > This is already the case: adding to the swapcache doesn't require
> > > splitting THPs, but failing to allocate 2MB of contiguous swap slots
> > > will.
> > >
> > >>
> > >> This is going to become much more important with memdescs. We'd have to
> > >> allocate 512 struct folios to do this, which would be about 10 4kB
> > >> pages, and if we're trying to swap out memory, we're probably low on
> > >> memory.
> > >>
> > >> So I don't like this solution you have at all, because it doesn't help
> > >> us get to the solution we're going to need in about a year's time.
> > >>
> > >
> > > Ridong might need to clarify why this splitting is occurring. If it's
> > > due to the failure to allocate swap slots, we still need a solution to
> > > address it.
> > >
> > > Thanks
> > > Barry
> >
> > shrink_folio_list
> >   add_to_swap
> >     folio_alloc_swap
> >       get_swap_pages
> >         scan_swap_map_slots
> >           /*
> >            * Swapfile is not block device or not using clusters so unable
> >            * to allocate large entries.
> >            */
> >           if (!(si->flags & SWP_BLKDEV) || !si->cluster_info)
> >                   return 0;
> >
> > In my test, I used a file as swap, which is not 'SWP_BLKDEV', so
> > get_swap_pages failed.

Ah yes. The later part of the swap allocator series removes the
non-cluster allocation code path. It is not merged into mm-unstable yet.
Once it lands, even a swapfile that is not a block device will get the
cluster allocator.

> Alright, a proper non-rotating swap block device would be much
> better. In your case, though, cluster allocation isn't supported.
>
> >
> > I think this is a race between 'shrink_folio_list' executing and the
> > folios being written back asynchronously. In my test, 512 folios (from a
> > split THP) were added to swap, and only about 60 of them had not yet been
> > written back when 'move_folios_to_lru' was invoked after
> > 'shrink_folio_list'. What if writeback were faster? This could happen
> > even with just 32 folios (without THP) in the 'folio_list' input to
> > shrink_folio_list.
>
> On a real non-rotating swap device, the race condition would occur only
> when contiguous 2MB swap slots are unavailable.
>
> Hi Chris,
> I recall you mentioned unifying the code for swap devices and swap files,
> or for non-rotating and rotating devices. I assume a swap file (not a
> block device) would also be a practical use case?

I assume you mean non-SSD vs SSD devices. In this follow-up series to the
swap allocator from Kairui, the old non-cluster allocator is removed and
the cluster allocator is used all the time:

https://lore.kernel.org/linux-mm/20241022192451.38138-4-ryncsn@gmail.com/

Chris
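
P.S. For anyone following along, here is a simplified sketch of the
fallback path under discussion. It is illustrative only, modeled loosely
on the add_to_swap() fallback in shrink_folio_list() (mm/vmscan.c):
locking, statistics, and most error handling are omitted, and the helper
name try_swap_out() is made up for this sketch.

/*
 * Illustrative sketch, not the actual kernel code.
 */
static bool try_swap_out(struct folio *folio, struct list_head *folio_list)
{
	if (add_to_swap(folio))
		return true;	/* got swap slot(s); folio is in swap cache */

	if (!folio_test_large(folio))
		return false;	/* 4kB folio and no free slot: give up */

	/*
	 * Large folio (e.g. a 2MB THP needing 512 contiguous slots).
	 * On a swapfile without SWP_BLKDEV/cluster_info, or when no
	 * contiguous run of slots is free, the large allocation fails,
	 * so split the THP and retry: the tail folios go back on
	 * folio_list and are swapped out 4kB at a time, each with
	 * asynchronous writeback. That per-folio writeback is the
	 * window for the LRU race Ridong describes above.
	 */
	if (split_folio_to_list(folio, folio_list))
		return false;	/* split failed; leave the folio intact */

	return add_to_swap(folio);	/* retry as a single 4kB folio */
}

The race then follows: move_folios_to_lru() runs after
shrink_folio_list() returns, so any of those split folios whose
writeback has already completed are placed back at the head of the
inactive LRU rather than the tail.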