From: Barry Song <21cnbao@gmail.com>
Date: Mon, 3 Jun 2024 17:28:47 +1200
Subject: Re: [PATCH v3 1/6] mm: memory: extend finish_fault() to support large folio
To: Baolin Wang
Cc: akpm@linux-foundation.org, hughd@google.com, willy@infradead.org, david@redhat.com, wangkefeng.wang@huawei.com, ying.huang@intel.com, ryan.roberts@arm.com, shy828301@gmail.com, ziy@nvidia.com, ioworker0@gmail.com, da.gomez@samsung.com, p.raghav@samsung.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
On Thu, May 30, 2024 at 2:04 PM Baolin Wang wrote:
>
> Add large folio mapping establishment support for finish_fault() as a
> preparation, to support multi-size THP allocation of anonymous shmem
> pages in the following patches.
>
> Signed-off-by: Baolin Wang
> ---
>  mm/memory.c | 58 ++++++++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 48 insertions(+), 10 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index eef4e482c0c2..435187ff7ea4 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4831,9 +4831,12 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>  {
>         struct vm_area_struct *vma = vmf->vma;
>         struct page *page;
> +       struct folio *folio;
>         vm_fault_t ret;
>         bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
>                       !(vma->vm_flags & VM_SHARED);
> +       int type, nr_pages, i;
> +       unsigned long addr = vmf->address;
>
>         /* Did we COW the page? */
>         if (is_cow)
> @@ -4864,24 +4867,59 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>                         return VM_FAULT_OOM;
>         }
>
> +       folio = page_folio(page);
> +       nr_pages = folio_nr_pages(folio);
> +
> +       /*
> +        * Using per-page fault to maintain the uffd semantics, and same
> +        * approach also applies to non-anonymous-shmem faults to avoid
> +        * inflating the RSS of the process.

I don't feel the comment explains the root cause. For non-shmem, the
memory has been allocated anyway, so avoiding RSS inflation doesn't buy
much once the memory is already occupied; the memory footprint is what
we really care about. Is the idea that we want to rely on the readahead
hints of each subpage to determine the readahead size, and that is why
we don't map nr_pages for non-shmem files, even though we could
potentially save nr_pages - 1 page faults?

> +        */
> +       if (!vma_is_anon_shmem(vma) || unlikely(userfaultfd_armed(vma))) {
> +               nr_pages = 1;
> +       } else if (nr_pages > 1) {
> +               pgoff_t idx = folio_page_idx(folio, page);
> +               /* The page offset of vmf->address within the VMA. */
> +               pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff;
> +
> +               /*
> +                * Fallback to per-page fault in case the folio size in page
> +                * cache beyond the VMA limits.
> +                */
> +               if (unlikely(vma_off < idx ||
> +                            vma_off + (nr_pages - idx) > vma_pages(vma))) {
> +                       nr_pages = 1;
> +               } else {
> +                       /* Now we can set mappings for the whole large folio. */
> +                       addr = vmf->address - idx * PAGE_SIZE;
> +                       page = &folio->page;
> +               }
> +       }
> +
>         vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
> -                                      vmf->address, &vmf->ptl);
> +                                      addr, &vmf->ptl);
>         if (!vmf->pte)
>                 return VM_FAULT_NOPAGE;
>
>         /* Re-check under ptl */
> -       if (likely(!vmf_pte_changed(vmf))) {
> -               struct folio *folio = page_folio(page);
> -               int type = is_cow ? MM_ANONPAGES : mm_counter_file(folio);
> -
> -               set_pte_range(vmf, folio, page, 1, vmf->address);
> -               add_mm_counter(vma->vm_mm, type, 1);
> -               ret = 0;
> -       } else {
> -               update_mmu_tlb(vma, vmf->address, vmf->pte);
> +       if (nr_pages == 1 && unlikely(vmf_pte_changed(vmf))) {
> +               update_mmu_tlb(vma, addr, vmf->pte);
>                 ret = VM_FAULT_NOPAGE;
> +               goto unlock;
> +       } else if (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages)) {

In what case can't we use !pte_range_none(vmf->pte, 1) for nr_pages == 1,
and then unify the code for the nr_pages == 1 and nr_pages > 1 cases? It
seems this has been discussed before, but I forget the reason.

> +               for (i = 0; i < nr_pages; i++)
> +                       update_mmu_tlb(vma, addr + PAGE_SIZE * i, vmf->pte + i);
> +               ret = VM_FAULT_NOPAGE;
> +               goto unlock;
>         }
>
> +       folio_ref_add(folio, nr_pages - 1);
> +       set_pte_range(vmf, folio, page, nr_pages, addr);
> +       type = is_cow ? MM_ANONPAGES : mm_counter_file(folio);
> +       add_mm_counter(vma->vm_mm, type, nr_pages);
> +       ret = 0;
> +
> +unlock:
>         pte_unmap_unlock(vmf->pte, vmf->ptl);
>         return ret;
> }
> --
> 2.39.3
>

Thanks
Barry