From: Baolin Wang
To: Brian Geffon, Andrew Morton
Cc: Zi Yan, Kefeng Wang, Suren Baghdasaryan, linux-mm@kvack.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org, Hugh Dickins, Marek Maslanka
Subject: Re: [PATCH] mm: fix finish_fault() handling for large folios
Date: Wed, 26 Feb 2025 23:17:20 +0800
Message-ID: <1a1bd8ed-1204-4ca4-82ed-cdba689c06c5@linux.alibaba.com>
In-Reply-To: <20250226114815.758217-1-bgeffon@google.com>
References: <20250226114815.758217-1-bgeffon@google.com>

On 2025/2/26 19:48, Brian Geffon wrote:
> When handling faults for anon shmem finish_fault() will attempt to install
> ptes for the entire folio.
> Unfortunately if it encounters a single
> non-pte_none entry in that range it will bail, even if the pte that
> triggered the fault is still pte_none. When this situation happens the
> fault will be retried endlessly never making forward progress.
>
> This patch fixes this behavior and if it detects that a pte in the range
> is not pte_none it will fall back to setting just the pte for the
> address that triggered the fault.

Could you describe in detail how this situation occurs? How does a
non-none pte get inserted within the range of the large folio? We have
checks in shmem to determine whether a large folio is suitable.

Anyway, if we find that pte_range_none() is false, we can fall back to a
per-page fault, as the following (untested) code shows, which seems
simpler?

diff --git a/mm/memory.c b/mm/memory.c
index a8196ae72e9a..8a2a9fda5410 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5219,7 +5219,12 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
 	bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
 		      !(vma->vm_flags & VM_SHARED);
 	int type, nr_pages;
-	unsigned long addr = vmf->address;
+	unsigned long addr;
+	bool fallback_per_page = false;
+
+
+fallback:
+	addr = vmf->address;

 	/* Did we COW the page? */
 	if (is_cow)
@@ -5258,7 +5263,8 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
 	 * approach also applies to non-anonymous-shmem faults to avoid
 	 * inflating the RSS of the process.
 	 */
-	if (!vma_is_anon_shmem(vma) || unlikely(userfaultfd_armed(vma))) {
+	if (!vma_is_anon_shmem(vma) || unlikely(userfaultfd_armed(vma))
+	    || unlikely(fallback_per_page)) {
 		nr_pages = 1;
 	} else if (nr_pages > 1) {
 		pgoff_t idx = folio_page_idx(folio, page);
@@ -5294,9 +5300,9 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
 		ret = VM_FAULT_NOPAGE;
 		goto unlock;
 	} else if (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages)) {
-		update_mmu_tlb_range(vma, addr, vmf->pte, nr_pages);
-		ret = VM_FAULT_NOPAGE;
-		goto unlock;
+		fallback_per_page = true;
+		pte_unmap_unlock(vmf->pte, vmf->ptl);
+		goto fallback;
 	}

 	folio_ref_add(folio, nr_pages - 1);

>
> Cc: stable@vger.kernel.org
> Cc: Baolin Wang
> Cc: Hugh Dickins
> Fixes: 43e027e41423 ("mm: memory: extend finish_fault() to support large folio")
> Reported-by: Marek Maslanka
> Signed-off-by: Brian Geffon
> ---
>  mm/memory.c | 19 ++++++++++++++++---
>  1 file changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index b4d3d4893267..32de626ec1da 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5258,9 +5258,22 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>  		ret = VM_FAULT_NOPAGE;
>  		goto unlock;
>  	} else if (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages)) {
> -		update_mmu_tlb_range(vma, addr, vmf->pte, nr_pages);
> -		ret = VM_FAULT_NOPAGE;
> -		goto unlock;
> +		/*
> +		 * We encountered a set pte, let's just try to install the
> +		 * pte for the original fault if that pte is still pte none.
> +		 */
> +		pgoff_t idx = (vmf->address - addr) / PAGE_SIZE;
> +
> +		if (!pte_none(ptep_get_lockless(vmf->pte + idx))) {
> +			update_mmu_tlb_range(vma, addr, vmf->pte, nr_pages);
> +			ret = VM_FAULT_NOPAGE;
> +			goto unlock;
> +		}
> +
> +		vmf->pte = vmf->pte + idx;
> +		page = folio_page(folio, idx);
> +		addr = vmf->address;
> +		nr_pages = 1;
>  	}
>
>  	folio_ref_add(folio, nr_pages - 1);
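
To make the failure mode concrete, here is a minimal userspace sketch of
the decision both patches are changing. It is not kernel code: the type
model_pte_t and the functions model_pte_range_none() and
model_finish_fault() are hypothetical names invented for illustration, a
"pte" is just a slot that is 0 when empty, and locking, TLB updates and
refcounting are left out. It only assumes what the thread describes: the
old path bails when any slot in the folio's range is set, and the
proposed path maps just the faulting slot if that slot is still empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a PTE slot is empty when 0, set otherwise. */
typedef unsigned long model_pte_t;

/* Model of pte_range_none(): true if all nr slots starting at pte are empty. */
static bool model_pte_range_none(const model_pte_t *pte, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		if (pte[i] != 0)
			return false;
	return true;
}

/*
 * Model of the mapping step. The folio covers slots [start, start + nr)
 * and fault_idx is the slot that triggered the fault. Returns the number
 * of slots installed; 0 stands in for VM_FAULT_NOPAGE (nothing mapped).
 */
static size_t model_finish_fault(model_pte_t *table, size_t start, size_t nr,
				 size_t fault_idx, bool allow_fallback)
{
	assert(fault_idx >= start && fault_idx < start + nr);

	/* Whole range empty: map the entire folio. */
	if (model_pte_range_none(table + start, nr)) {
		for (size_t i = 0; i < nr; i++)
			table[start + i] = 1;
		return nr;
	}

	/* Old behaviour: bail even if the faulting slot is still empty. */
	if (!allow_fallback)
		return 0;

	/* Proposed behaviour: map only the faulting slot if it is still empty. */
	if (table[fault_idx] != 0)
		return 0;
	table[fault_idx] = 1;
	return 1;
}
```

With a table of { 0, 1, 0, 0 } and a fault on slot 2, calling
model_finish_fault() with allow_fallback == false installs nothing and
leaves slot 2 empty, so the same fault would recur on every retry. With
allow_fallback == true it installs exactly that one slot.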