From: Barry Song <21cnbao@gmail.com>
Date: Mon, 6 May 2024 20:20:23 +1200
Subject: Re: [PATCH v3 3/6] mm: introduce pte_move_swp_offset() helper which can move offset bidirectionally
To: David Hildenbrand
Cc: Ryan Roberts, akpm@linux-foundation.org, linux-mm@kvack.org,
    baolin.wang@linux.alibaba.com, chrisl@kernel.org, hanchuanhua@oppo.com,
    hannes@cmpxchg.org, hughd@google.com, kasong@tencent.com,
    linux-kernel@vger.kernel.org, surenb@google.com, v-songbaohua@oppo.com,
    willy@infradead.org, xiang@kernel.org, ying.huang@intel.com,
    yosryahmed@google.com, yuzhao@google.com, ziy@nvidia.com
In-Reply-To: <0d20d8af-e480-4eb8-8606-1e486b13fd7e@redhat.com>
References: <20240503005023.174597-1-21cnbao@gmail.com>
    <20240503005023.174597-4-21cnbao@gmail.com>
    <7548e30c-d56a-4a57-ab87-86c9c8e523b1@arm.com>
    <0d20d8af-e480-4eb8-8606-1e486b13fd7e@redhat.com>

On Mon, May 6, 2024 at 8:06 PM David Hildenbrand wrote:
>
> On 04.05.24 01:40, Barry Song wrote:
> > On Fri, May 3, 2024 at 5:41 PM Ryan
Roberts wrote:
> >>
> >> On 03/05/2024 01:50, Barry Song wrote:
> >>> From: Barry Song
> >>>
> >>> There could arise a necessity to obtain the first pte_t from a swap
> >>> pte_t located in the middle. For instance, this may occur within the
> >>> context of do_swap_page(), where a page fault can potentially occur in
> >>> any PTE of a large folio. To address this, the following patch introduces
> >>> pte_move_swp_offset(), a function capable of bidirectional movement by
> >>> a specified delta argument. Consequently, pte_increment_swp_offset()
> >>
> >> You mean pte_next_swp_offset()?
> >
> > yes.
> >
> >>
> >>> will directly invoke it with delta = 1.
> >>>
> >>> Suggested-by: "Huang, Ying"
> >>> Signed-off-by: Barry Song
> >>> ---
> >>>  mm/internal.h | 25 +++++++++++++++++++++----
> >>>  1 file changed, 21 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/mm/internal.h b/mm/internal.h
> >>> index c5552d35d995..cfe4aed66a5c 100644
> >>> --- a/mm/internal.h
> >>> +++ b/mm/internal.h
> >>> @@ -211,18 +211,21 @@ static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
> >>>  }
> >>>
> >>>  /**
> >>> - * pte_next_swp_offset - Increment the swap entry offset field of a swap pte.
> >>> + * pte_move_swp_offset - Move the swap entry offset field of a swap pte
> >>> + *                       forward or backward by delta
> >>>   * @pte: The initial pte state; is_swap_pte(pte) must be true and
> >>>   *       non_swap_entry() must be false.
> >>> + * @delta: The direction and the offset we are moving; forward if delta
> >>> + *         is positive; backward if delta is negative
> >>>   *
> >>> - * Increments the swap offset, while maintaining all other fields, including
> >>> + * Moves the swap offset, while maintaining all other fields, including
> >>>   * swap type, and any swp pte bits. The resulting pte is returned.
> >>>   */
> >>> -static inline pte_t pte_next_swp_offset(pte_t pte)
> >>> +static inline pte_t pte_move_swp_offset(pte_t pte, long delta)
> >>
> >> We have equivalent functions for pfn:
> >>
> >>   pte_next_pfn()
> >>   pte_advance_pfn()
> >>
> >> Although the latter takes an unsigned long and only moves forward currently. I
> >> wonder if it makes sense to have their naming and semantics match? i.e. change
> >> pte_advance_pfn() to pte_move_pfn() and let it move backwards too.
> >>
> >> I guess we don't have a need for that and it adds more churn.
> >
> > we might have a need in the below case.
> > A forks B, then A and B share large folios. B unmap/exit, then large
> > folios of process A become single-mapped.
> > Right now, while writing A's folios, we are CoWing A's large folios
> > into many small folios. I believe we can reuse the entire large folios
> > instead of doing nr_pages CoW and page faults.
> > In this case, we might want to get the first PTE from vmf->pte.
>
> Once we have COW reuse for large folios in place (I think you know that
> I am working on that), it might make sense to "COW-reuse around",

TBH, I didn't know you were working on that. Please Cc me next time :-)

> meaning we look if some neighboring PTEs map the same large folio and
> map them writable as well. But if it's really worth it, increasing page
> fault latency, is to be decided separately.

On the other hand, we eliminate latency for the remaining nr_pages - 1 PTEs.
Perhaps we can find a cheaper way to tell that a large folio is probably
singly mapped, and only attempt the "multi-PTE" reuse in the page fault path
when that holds, rather than always increasing latency?

> >
> >
> > Another case, might be
> > A forks B, and we write either A or B, we might CoW an entire large
> > folios instead CoWing nr_pages small folios.
> >
> > case 1 seems more useful, I might have a go after some days. then we might
> > see pte_move_pfn().
>
> pte_move_pfn() does sound odd to me. It might not be required to
> implement the optimization described above. (it's easier to simply read
> another PTE, check if it maps the same large folio, and to batch from there)
>

If I understand your proposal correctly, the reuse could be partial: with a
large folio mapped by 16 PTEs, examining the PTEs "around" the fault might
end up reusing only 4 of them rather than all 16. Please correct me if my
understanding is wrong.

Initially, my idea was to obtain the first PTE using pte_move_pfn() and then
call folio_pte_batch() with the first PTE as the argument to ensure consistency
in nr_pages, thus enabling complete reuse of the whole folio.

> --
> Cheers,
>
> David / dhildenb

Thanks
Barry
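
For illustration only, here is a minimal userspace sketch of the helper
discussed in this thread. It is not the kernel implementation: the toy_*
names, the 64-bit packing, and the 6-bit type field are all hypothetical,
chosen just to show how a signed delta can move a swap offset in either
direction while preserving the type, and how the first entry of a batch
could be derived from an entry in the middle.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model (assumption): a swap entry packed into 64 bits as
 * [ type : 6 bits | offset : 58 bits ]. Real kernel layouts are
 * architecture-specific; none of these names exist in the kernel.
 */
#define TOY_TYPE_BITS   6
#define TOY_OFFSET_BITS (64 - TOY_TYPE_BITS)
#define TOY_OFFSET_MASK ((UINT64_C(1) << TOY_OFFSET_BITS) - 1)

typedef struct { uint64_t val; } toy_swp_entry_t;

static inline toy_swp_entry_t toy_swp_entry(unsigned type, uint64_t offset)
{
	toy_swp_entry_t e = {
		((uint64_t)type << TOY_OFFSET_BITS) | (offset & TOY_OFFSET_MASK)
	};
	return e;
}

static inline unsigned toy_swp_type(toy_swp_entry_t e)
{
	return (unsigned)(e.val >> TOY_OFFSET_BITS);
}

static inline uint64_t toy_swp_offset(toy_swp_entry_t e)
{
	return e.val & TOY_OFFSET_MASK;
}

/*
 * Move the offset forward (delta > 0) or backward (delta < 0) while
 * keeping the type. The caller must ensure offset + delta stays in range.
 */
static inline toy_swp_entry_t toy_move_swp_offset(toy_swp_entry_t e, long delta)
{
	return toy_swp_entry(toy_swp_type(e),
			     toy_swp_offset(e) + (uint64_t)delta);
}

/* The "next" helper becomes a thin wrapper with delta = 1. */
static inline toy_swp_entry_t toy_next_swp_offset(toy_swp_entry_t e)
{
	return toy_move_swp_offset(e, 1);
}

/*
 * Given the entry at the faulting position and its index inside a batch
 * of consecutive swap entries, step back to the first entry of the batch.
 */
static inline toy_swp_entry_t toy_first_swp_entry(toy_swp_entry_t e,
						  unsigned long idx)
{
	return toy_move_swp_offset(e, -(long)idx);
}
```

For example, with a fault at index 5 of a batch whose first offset is
0x1000, applying toy_first_swp_entry() to the entry at offset 0x1005 gives
back offset 0x1000 with the type unchanged.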