From: Barry Song <21cnbao@gmail.com>
Date: Tue, 7 May 2024 10:58:25 +1200
Subject: Re: [PATCH v3 6/6] mm: swap: entirely map large folios found in swapcache
To: David Hildenbrand
Cc: Ryan Roberts, akpm@linux-foundation.org, linux-mm@kvack.org, baolin.wang@linux.alibaba.com, chrisl@kernel.org, hanchuanhua@oppo.com, hannes@cmpxchg.org, hughd@google.com, kasong@tencent.com, linux-kernel@vger.kernel.org, surenb@google.com, v-songbaohua@oppo.com, willy@infradead.org, xiang@kernel.org, ying.huang@intel.com, yosryahmed@google.com, yuzhao@google.com, ziy@nvidia.com
In-Reply-To: <5b770715-7516-42a8-9ea0-3f61572d92af@redhat.com>
References: <20240503005023.174597-1-21cnbao@gmail.com> <20240503005023.174597-7-21cnbao@gmail.com> <0b4d4d4b-91d8-4fd5-af4e-aebe9ee08b89@arm.com> <5b770715-7516-42a8-9ea0-3f61572d92af@redhat.com>
On Tue, May 7, 2024 at 1:16 AM David Hildenbrand wrote:
>
> On 06.05.24 14:58, Barry Song wrote:
> > On Tue, May 7, 2024 at 12:38 AM Barry Song <21cnbao@gmail.com> wrote:
> >>
> >> On Tue, May 7, 2024 at
12:07 AM David Hildenbrand wrote:
> >>>
> >>> On 04.05.24 01:23, Barry Song wrote:
> >>>> On Fri, May 3, 2024 at 6:50 PM Ryan Roberts wrote:
> >>>>>
> >>>>> On 03/05/2024 01:50, Barry Song wrote:
> >>>>>> From: Chuanhua Han
> >>>>>>
> >>>>>> When a large folio is found in the swapcache, the current implementation
> >>>>>> requires calling do_swap_page() nr_pages times, resulting in nr_pages
> >>>>>> page faults. This patch opts to map the entire large folio at once to
> >>>>>> minimize page faults. Additionally, redundant checks and early exits
> >>>>>> for ARM64 MTE restoring are removed.
> >>>>>>
> >>>>>> Signed-off-by: Chuanhua Han
> >>>>>> Co-developed-by: Barry Song
> >>>>>> Signed-off-by: Barry Song
> >>>>>
> >>>>> With the suggested changes below:
> >>>>>
> >>>>> Reviewed-by: Ryan Roberts
> >>>>>
> >>>>>> ---
> >>>>>>   mm/memory.c | 60 ++++++++++++++++++++++++++++++++++++++++++-----------
> >>>>>>   1 file changed, 48 insertions(+), 12 deletions(-)
> >>>>>>
> >>>>>> diff --git a/mm/memory.c b/mm/memory.c
> >>>>>> index 22e7c33cc747..940fdbe69fa1 100644
> >>>>>> --- a/mm/memory.c
> >>>>>> +++ b/mm/memory.c
> >>>>>> @@ -3968,6 +3968,10 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> >>>>>>       pte_t pte;
> >>>>>>       vm_fault_t ret = 0;
> >>>>>>       void *shadow = NULL;
> >>>>>> +     int nr_pages = 1;
> >>>>>> +     unsigned long page_idx = 0;
> >>>>>> +     unsigned long address = vmf->address;
> >>>>>> +     pte_t *ptep;
> >>>>>
> >>>>> nit: Personally I'd prefer all these to get initialised just before the "if
> >>>>> (folio_test_large()..." block below. That way it is clear they are fresh (incase
> >>>>> any logic between here and there makes an adjustment) and its clear that they
> >>>>> are only to be used after that block (the compiler will warn if using an
> >>>>> uninitialized value).
> >>>>
> >>>> right. I agree this will make the code more readable.
> >>>>
> >>>>>
> >>>>>>
> >>>>>>       if (!pte_unmap_same(vmf))
> >>>>>>               goto out;
> >>>>>> @@ -4166,6 +4170,36 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> >>>>>>                       goto out_nomap;
> >>>>>>       }
> >>>>>>
> >>>>>> +     ptep = vmf->pte;
> >>>>>> +     if (folio_test_large(folio) && folio_test_swapcache(folio)) {
> >>>>>> +             int nr = folio_nr_pages(folio);
> >>>>>> +             unsigned long idx = folio_page_idx(folio, page);
> >>>>>> +             unsigned long folio_start = vmf->address - idx * PAGE_SIZE;
> >>>>>> +             unsigned long folio_end = folio_start + nr * PAGE_SIZE;
> >>>>>> +             pte_t *folio_ptep;
> >>>>>> +             pte_t folio_pte;
> >>>>>> +
> >>>>>> +             if (unlikely(folio_start < max(vmf->address & PMD_MASK, vma->vm_start)))
> >>>>>> +                     goto check_folio;
> >>>>>> +             if (unlikely(folio_end > pmd_addr_end(vmf->address, vma->vm_end)))
> >>>>>> +                     goto check_folio;
> >>>>>> +
> >>>>>> +             folio_ptep = vmf->pte - idx;
> >>>>>> +             folio_pte = ptep_get(folio_ptep);
> >>>>>> +             if (!pte_same(folio_pte, pte_move_swp_offset(vmf->orig_pte, -idx)) ||
> >>>>>> +                 swap_pte_batch(folio_ptep, nr, folio_pte) != nr)
> >>>>>> +                     goto check_folio;
> >>>>>> +
> >>>>>> +             page_idx = idx;
> >>>>>> +             address = folio_start;
> >>>>>> +             ptep = folio_ptep;
> >>>>>> +             nr_pages = nr;
> >>>>>> +             entry = folio->swap;
> >>>>>> +             page = &folio->page;
> >>>>>> +     }
> >>>>>> +
> >>>>>> +check_folio:
> >>>>>
> >>>>> Is this still the correct label name, given the checks are now above the new
> >>>>> block? Perhaps "one_page" or something like that?
> >>>>
> >>>> not quite sure about this, as the code after one_page can be multiple_pages.
> >>>> On the other hand, it seems we are really checking folio after "check_folio"
> >>>> :-)
> >>>>
> >>>>
> >>>> BUG_ON(!folio_test_anon(folio) && folio_test_mappedtodisk(folio));
> >>>> BUG_ON(folio_test_anon(folio) && PageAnonExclusive(page));
> >>>>
> >>>> /*
> >>>>  * Check under PT lock (to protect against concurrent fork() sharing
> >>>>  * the swap entry concurrently) for certainly exclusive pages.
> >>>>  */
> >>>> if (!folio_test_ksm(folio)) {
> >>>>
> >>>>
> >>>>>
> >>>>>> +
> >>>>>>       /*
> >>>>>>        * PG_anon_exclusive reuses PG_mappedtodisk for anon pages. A swap pte
> >>>>>>        * must never point at an anonymous page in the swapcache that is
> >>>>>> @@ -4225,12 +4259,13 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> >>>>>>        * We're already holding a reference on the page but haven't mapped it
> >>>>>>        * yet.
> >>>>>>        */
> >>>>>> -     swap_free_nr(entry, 1);
> >>>>>> +     swap_free_nr(entry, nr_pages);
> >>>>>>       if (should_try_to_free_swap(folio, vma, vmf->flags))
> >>>>>>               folio_free_swap(folio);
> >>>>>>
> >>>>>> -     inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
> >>>>>> -     dec_mm_counter(vma->vm_mm, MM_SWAPENTS);
> >>>>>> +     folio_ref_add(folio, nr_pages - 1);
> >>>>>> +     add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
> >>>>>> +     add_mm_counter(vma->vm_mm, MM_SWAPENTS, -nr_pages);
> >>>>>>       pte = mk_pte(page, vma->vm_page_prot);
> >>>>>>
> >>>>>>       /*
> >>>>>> @@ -4240,34 +4275,35 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> >>>>>>        * exclusivity.
> >>>>>>        */
> >>>>>>       if (!folio_test_ksm(folio) &&
> >>>>>> -         (exclusive || folio_ref_count(folio) == 1)) {
> >>>>>> +         (exclusive || (folio_ref_count(folio) == nr_pages &&
> >>>>>> +                        folio_nr_pages(folio) == nr_pages))) {
> >>>>>
> >>>>> I think in practice there is no change here? If nr_pages > 1 then the folio is
> >>>>> in the swapcache, so there is an extra ref on it? I agree with the change for
> >>>>> robustness sake. Just checking my understanding.
> >>>>
> >>>> This is the code showing we are reusing/(mkwrite) a folio either
> >>>> 1. we meet a small folio and we are the only one hitting the small folio
> >>>> 2. we meet a large folio and we are the only one hitting the large folio
> >>>>
> >>>> any corner cases besides the above two seems difficult. for example,
> >>>>
> >>>> while we hit a large folio in swapcache but if we can't entirely map it
> >>>> (nr_pages==1) due to partial unmap, we will have folio_ref_count(folio)
> >>>> == nr_pages == 1
> >>>
> >>> No, there would be other references from the swapcache and
> >>> folio_ref_count(folio) > 1. See my other reply.
> >>
> >> right. can be clearer by:
> >
> > Wait, do we still need folio_nr_pages(folio) == nr_pages even if we use
> > folio_ref_count(folio) == 1 and moving folio_ref_add(folio, nr_pages - 1)?
>
> I don't think that we will "need" it.
>
> >
> > one case is that we have a large folio with 16 PTEs, and we unmap
> > 15 swap PTE entries, thus we have only one swap entry left. Then
> > we hit the large folio in swapcache. but we have only one PTE thus we will
> > map only one PTE. lacking folio_nr_pages(folio) == nr_pages, we reuse the
> > large folio for one PTE. with it, do_wp_page() will migrate the large
> > folio to a small one?
>
> We will set PAE bit and do_wp_page() will unconditionally reuse that page.
>
> Note that this is the same as if we had pte_swp_exclusive() set and
> would have run into "exclusive=true" here.
>
> If we'd want a similar "optimization" as we have in
> wp_can_reuse_anon_folio(), you'd want something like
>
> exclusive || (folio_ref_count(folio) == 1 &&
>               (!folio_test_large(folio) || nr_pages > 1)

I feel like

A: !folio_test_large(folio) || nr_pages > 1

equals

B: folio_nr_pages(folio) == nr_pages

if folio is small, folio_test_large(folio) is false, both A and B will be true;

if folio is large, and we map the whole large folio, A will be true
because of nr_pages > 1; B is also true;

if folio is large, and we map single one PTE, A will be false; B is
also false, because nr_pages == 1 but folio_nr_pages(folio) > 1;

right? However, I agree that delving into this complexity might not
be necessary at the moment.

> ... but I am not sure if that is really worth the complexity here.
>
> >
> > 1AM, tired and sleepy. not quite sure I am correct.
> > I look forward to seeing your reply tomorrow morning :-)
>
> Heh, no need to dream about this ;)
>
> --
> Cheers,
>
> David / dhildenb

Thanks
Barry