From: Barry Song <21cnbao@gmail.com>
Date: Tue, 7 May 2024 18:48:03 +0800
Subject: Re: [PATCH v3 6/6] mm: swap: entirely map large folios found in swapcache
To: David Hildenbrand
Cc: Ryan Roberts, akpm@linux-foundation.org, linux-mm@kvack.org,
 baolin.wang@linux.alibaba.com, chrisl@kernel.org, hanchuanhua@oppo.com,
 hannes@cmpxchg.org, hughd@google.com, kasong@tencent.com,
 linux-kernel@vger.kernel.org, surenb@google.com, v-songbaohua@oppo.com,
 willy@infradead.org, xiang@kernel.org, ying.huang@intel.com,
 yosryahmed@google.com, yuzhao@google.com, ziy@nvidia.com
In-Reply-To: <41c1bd1f-b1d7-4faf-a422-1eff7425b35c@redhat.com>
References: <20240503005023.174597-1-21cnbao@gmail.com>
 <20240503005023.174597-7-21cnbao@gmail.com>
 <0b4d4d4b-91d8-4fd5-af4e-aebe9ee08b89@arm.com>
 <5b770715-7516-42a8-9ea0-3f61572d92af@redhat.com>
 <7dc2084e-d8b1-42f7-b854-38981839f82e@redhat.com>
 <099a2c9e-f85e-4fe7-99dd-81b61115935c@redhat.com>
 <41c1bd1f-b1d7-4faf-a422-1eff7425b35c@redhat.com>

On Tue, May 7, 2024 at 6:39 PM David Hildenbrand wrote:
>
> On 07.05.24 11:24, Barry Song wrote:
> > On Tue, May 7, 2024 at 8:59 PM David Hildenbrand wrote:
> >>
> >>>> Let's assume a single subpage of a large
> >>>> folio is no longer mapped.
> >>>> Then, we'd have:
> >>>>
> >>>> nr_pages == folio_nr_pages(folio) - 1.
> >>>>
> >>>> You could simply map+reuse most of the folio without COWing.
> >>>
> >>> yes. This is good but the pte which is no longer mapped could be
> >>> anyone within the nr_pages PTEs. so it could be quite tricky for
> >>> set_ptes.
> >>
> >> The swap batching logic should take care of that, otherwise it would be
> >> buggy.
> >
> > When you mention "it would be buggy," are you also referring to the current
> > fallback approach? or only refer to the future patch which might be able
> > to map/reuse "nr_pages - 1" pages?
>
> swap_pte_batch() should not skip any holes. So consequently, set_ptes()
> should do the right thing. (regarding your comment "could be quite tricky
> for set_ptes")
>
> So I think that should be working as expected.

Maybe not. Take a look at my current code: I goto check_folio with
nr_pages = 1 if
swap_pte_batch(folio_ptep, nr, folio_pte) != folio_nr_pages(folio).

+	nr_pages = 1;
+	...
+	if (folio_test_large(folio) && folio_test_swapcache(folio)) {
+		int nr = folio_nr_pages(folio);
+		...
+		if (!pte_same(folio_pte, pte_move_swp_offset(vmf->orig_pte, -idx)) ||
+		    swap_pte_batch(folio_ptep, nr, folio_pte) != nr)
+			goto check_folio; /* here, I am falling back to nr_pages = 1 */
+
+
+		...
+		nr_pages = nr;

The fallback (nr_pages = 1) works. But it seems you are proposing to set
nr_pages = swap_pte_batch(folio_ptep, nr, folio_pte)
if (swap_pte_batch(folio_ptep, nr, folio_pte) > 1 &&
    swap_pte_batch(folio_ptep, nr, folio_pte) < nr_pages)?

>
> >
> > The current patch falls back to setting nr_pages = 1 without mapping or
> > reusing nr_pages - 1. I feel your concern doesn't refer to this fallback?
> >
> >>
> >>>
> >>>>
> >>>> Once we support COW reuse of PTE-mapped THP we'd do the same. Here, it's
> >>>> just easy to detect that the folio is exclusive (folio_ref_count(folio)
> >>>> == 1 before mapping anything).
> >>>>
> >>>> If you really want to mimic what do_wp_page() currently does, you should
> >>>> have:
> >>>>
> >>>> exclusive || (folio_ref_count(folio) == 1 && !folio_test_large(folio))
> >>>
> >>> I actually dislike the part that do_wp_page() handles the reuse of a large
> >>> folio which is entirely mapped. For example, A forks B, B exit, we write
> >>> A's large folio, we get nr_pages CoW of small folios. Ideally, we can
> >>> reuse the whole folios for writing.
> >>
> >> Yes, see the link I shared to what I am working on. There isn't really a
> >> question if what we do right now needs to be improved and all these
> >> scenarios are pretty obvious clear.
> >
> > Great! I plan to dedicate more time to reviewing your work.
>
> Nice! And there will be a lot of follow-up optimization work I won't
> tackle immediately regarding COW (COW-reuse around, maybe sometimes we
> want to COW bigger chunks).
>
> I still have making PageAnonExclusive a per-folio flag on my TODO list,
> that will help the COW-reuse around case a lot.
>
> >
> >>
> >>>
> >>>>
> >>>> Personally, I think we should keep it simple here and use:
> >>>>
> >>>> exclusive || folio_ref_count(folio) == 1
> >>>
> >>> I feel this is still better than
> >>> exclusive || (folio_ref_count(folio) == 1 && !folio_test_large(folio))
> >>> as we reuse the whole large folio. the do_wp_page() behaviour
> >>> doesn't have this.
> >>
> >> Yes, but there is the comment "Same logic as in do_wp_page();". We
> >> already ran into issues having different COW reuse logic all over the
> >> place. For this case here, I don't care if we leave it as
> >>
> >> "exclusive || folio_ref_count(folio) == 1"
> >
> > I'm perfectly fine with using the code for this patchset and maybe
> > looking for other opportunities for potential optimization as an
> > incremental patchset, for example, reusing the remaining PTEs as
> > suggested by you - "simply map+reuse most of the folio without COWing."
> >
> >>
> >> But let's not try inventing new stuff here.
> >
> > It seems you ignored and snipped my "16 + 14" pages and "15" pages
> > example though. but once we support "simply map+reuse most of the
> > folio without COWing", the "16+14" problem can be resolved, instead,
> > we consume 16 pages.
>
>
> Oh, sorry for skipping that, for me it was rather clear: the partially
> mapped folios will be on the deferred split list and the excess memory
> can (and will be) reclaimed when there is need. So this temporary memory
> consumption is usually not a problem in practice. But yes, something to
> optimize (just like COW reuse in general).
>
> --
> Cheers,
>
> David / dhildenb
>
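
For illustration only, here is a rough userspace sketch of the two decisions
discussed in this thread: the nr_pages fallback and the reuse check. It is
not the kernel code. A PTE is reduced to a bare swap offset, and
MODEL_PTE_HOLE, model_swap_batch(), model_nr_pages_fallback(),
model_can_reuse() and model_can_reuse_small_only() are hypothetical names
made up for this sketch; the real swap_pte_batch() and do_wp_page() do more
than is modelled here.

```c
/*
 * Userspace model only, NOT kernel code. Every name below is a
 * hypothetical stand-in for the helpers discussed in the thread.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A "PTE" is modelled as a swap offset; this value marks a hole. */
#define MODEL_PTE_HOLE ((unsigned long)-1)

/*
 * Count how many slots, starting at ptes[0], hold consecutive swap
 * offsets. The count stops at the first hole and never skips one.
 * Assumption of this model: a hole in slot 0 yields 0.
 */
static size_t model_swap_batch(const unsigned long *ptes, size_t max_nr)
{
	size_t i;

	if (max_nr == 0 || ptes[0] == MODEL_PTE_HOLE)
		return 0;
	for (i = 1; i < max_nr; i++) {
		if (ptes[i] == MODEL_PTE_HOLE || ptes[i] != ptes[0] + i)
			break;
	}
	return i;
}

/*
 * Fallback as described for the current patch: map the whole folio
 * only when the batch covers all of it, otherwise map a single page.
 */
static size_t model_nr_pages_fallback(const unsigned long *ptes,
				      size_t folio_nr)
{
	return model_swap_batch(ptes, folio_nr) == folio_nr ? folio_nr : 1;
}

/* The simple rule: exclusive || folio_ref_count(folio) == 1. */
static bool model_can_reuse(bool exclusive, int folio_refs)
{
	return exclusive || folio_refs == 1;
}

/* The stricter rule that additionally excludes large folios. */
static bool model_can_reuse_small_only(bool exclusive, int folio_refs,
				       bool large)
{
	return exclusive || (folio_refs == 1 && !large);
}
```

With a 4-page folio whose third slot is a hole, model_swap_batch() returns 2
and model_nr_pages_fallback() returns 1. The alternative raised in the
question above (nr_pages = batch size) would correspond to using the 2
directly. The two predicates differ only for a large, non-exclusive folio
with a single reference: the simple rule reuses it, the stricter one does
not.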