From: Kefeng Wang <wangkefeng.wang@huawei.com>
To: Baolin Wang
Subject: Re: [PATCH v5 1/6] mm: memory: extend finish_fault() to support large folio
Date: Wed, 12 Jun 2024 21:40:19 +0800
In-Reply-To: <3a190892355989d42f59cf9f2f98b94694b0d24d.1718090413.git.baolin.wang@linux.alibaba.com>

On 2024/6/11 18:11, Baolin Wang wrote:
> Add large folio mapping establishment support for finish_fault() as a
> preparation, to support multi-size THP allocation of anonymous shmem pages
> in the following patches.
>
> Keep the same behavior (per-page fault) for non-anon shmem to avoid inflating
> the RSS unintentionally, and we can discuss what size of mapping to build
> when extending mTHP to control non-anon shmem in the future.
>
> Signed-off-by: Baolin Wang
> ---
>  mm/memory.c | 57 +++++++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 47 insertions(+), 10 deletions(-)
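To convince myself about the new VMA-bounds check in the second hunk
below, I first worked through the arithmetic in a small userspace
sketch (not kernel code; the names just mirror the patch and the
numbers are made up):

	#include <stdbool.h>
	#include <stdio.h>

	int main(void)
	{
		/* A 16-page folio whose page 3 is the faulting page. */
		unsigned long nr_pages = 16;  /* folio_nr_pages(folio) */
		unsigned long idx = 3;        /* folio_page_idx(folio, page) */
		unsigned long vma_off = 3;    /* vmf->pgoff - vma->vm_pgoff */
		unsigned long vma_npages = 8; /* vma_pages(vma) */

		/*
		 * The folio starts idx pages before the fault address, so it
		 * must not begin before the VMA start, and the nr_pages - idx
		 * pages from the fault onward must not run past the VMA end.
		 */
		bool fallback = vma_off < idx ||
				vma_off + (nr_pages - idx) > vma_npages;

		/* Here 3 + 13 = 16 > 8, so this falls back to per-page fault. */
		printf("fallback: %d\n", fallback);
		return 0;
	}

The check looks correct to me for both ends of the VMA.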
> diff --git a/mm/memory.c b/mm/memory.c
> index eef4e482c0c2..72775ee99ff3 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4831,9 +4831,12 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>  {
>  	struct vm_area_struct *vma = vmf->vma;
>  	struct page *page;
> +	struct folio *folio;
>  	vm_fault_t ret;
>  	bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
>  		      !(vma->vm_flags & VM_SHARED);
> +	int type, nr_pages;
> +	unsigned long addr = vmf->address;
>
>  	/* Did we COW the page? */
>  	if (is_cow)
> @@ -4864,24 +4867,58 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>  		return VM_FAULT_OOM;
>  	}
>
> +	folio = page_folio(page);
> +	nr_pages = folio_nr_pages(folio);
> +
> +	/*
> +	 * Using per-page fault to maintain the uffd semantics, and same
> +	 * approach also applies to non-anonymous-shmem faults to avoid
> +	 * inflating the RSS of the process.
> +	 */
> +	if (!vma_is_anon_shmem(vma) || unlikely(userfaultfd_armed(vma))) {
> +		nr_pages = 1;
> +	} else if (nr_pages > 1) {
> +		pgoff_t idx = folio_page_idx(folio, page);
> +		/* The page offset of vmf->address within the VMA. */
> +		pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff;

vma->vm_pgoff, we already have the local 'vma' here.

> +
> +		/*
> +		 * Fallback to per-page fault in case the folio size in page
> +		 * cache beyond the VMA limits.
> +		 */
> +		if (unlikely(vma_off < idx ||
> +			     vma_off + (nr_pages - idx) > vma_pages(vma))) {
> +			nr_pages = 1;
> +		} else {
> +			/* Now we can set mappings for the whole large folio. */
> +			addr = vmf->address - idx * PAGE_SIZE;

addr -= idx * PAGE_SIZE;

> +			page = &folio->page;
> +		}
> +	}
> +
>  	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
> -				       vmf->address, &vmf->ptl);
> +				       addr, &vmf->ptl);

No newline needed now, this fits on one line.

>  	if (!vmf->pte)
>  		return VM_FAULT_NOPAGE;
>
>  	/* Re-check under ptl */
> -	if (likely(!vmf_pte_changed(vmf))) {
> -		struct folio *folio = page_folio(page);
> -		int type = is_cow ? MM_ANONPAGES : mm_counter_file(folio);
> -
> -		set_pte_range(vmf, folio, page, 1, vmf->address);
> -		add_mm_counter(vma->vm_mm, type, 1);
> -		ret = 0;
> -	} else {
> -		update_mmu_tlb(vma, vmf->address, vmf->pte);
> +	if (nr_pages == 1 && unlikely(vmf_pte_changed(vmf))) {
> +		update_mmu_tlb(vma, addr, vmf->pte);
> +		ret = VM_FAULT_NOPAGE;
> +		goto unlock;
> +	} else if (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages)) {
> +		update_mmu_tlb_range(vma, addr, vmf->pte, nr_pages);
>  		ret = VM_FAULT_NOPAGE;
> +		goto unlock;
>  	}

We may add a vmf_pte_range_changed() helper to combine these two checks,
but that can be a separate patch; a sketch is at the end of this mail.

Some very small nits, up to you,

Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

>
> +	folio_ref_add(folio, nr_pages - 1);
> +	set_pte_range(vmf, folio, page, nr_pages, addr);
> +	type = is_cow ? MM_ANONPAGES : mm_counter_file(folio);
> +	add_mm_counter(vma->vm_mm, type, nr_pages);
> +	ret = 0;
> +
> +unlock:
>  	pte_unmap_unlock(vmf->pte, vmf->ptl);
>  	return ret;
>  }
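For the vmf_pte_range_changed() idea, something like the below is what
I have in mind -- only an untested sketch on top of this patch, assuming
pte_range_none() keeps the signature used here, not a request for this
series:

	/*
	 * Fold the two "did the PTEs change under us" re-checks into one
	 * predicate: the single-page path keeps vmf_pte_changed(), the
	 * large-folio path checks that the whole range is still pte_none.
	 */
	static bool vmf_pte_range_changed(struct vm_fault *vmf, int nr_pages)
	{
		if (nr_pages == 1)
			return vmf_pte_changed(vmf);

		return !pte_range_none(vmf->pte, nr_pages);
	}

The caller could then do the TLB update in one place with
update_mmu_tlb_range(vma, addr, vmf->pte, nr_pages); I think the
single-page case collapses into that too, but I have not checked all
architectures, so take it as a sketch only.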