Date: Thu, 13 Jun 2024 08:27:27 -0700
From: Luis Chamberlain
To: David Hildenbrand
Cc: Matthew Wilcox, Hugh Dickins, yang@os.amperecomputing.com,
 linmiaohe@huawei.com, muchun.song@linux.dev, osalvador@suse.de,
 "Pankaj Raghav (Samsung)", david@fromorbit.com, djwong@kernel.org,
 chandan.babu@oracle.com, brauner@kernel.org, akpm@linux-foundation.org,
 linux-mm@kvack.org, hare@suse.de, linux-kernel@vger.kernel.org, Zi Yan,
 linux-xfs@vger.kernel.org, p.raghav@samsung.com,
 linux-fsdevel@vger.kernel.org, hch@lst.de, gost.dev@samsung.com,
 cl@os.amperecomputing.com, john.g.garry@oracle.com
Subject: Re: [PATCH v7 06/11] filemap: cap PTE range to be created to allowed zero fill in folio_map_range()
References: <20240607145902.1137853-1-kernel@pankajraghav.com>
 <20240607145902.1137853-7-kernel@pankajraghav.com>
 <818f69fa-9dc7-4ca0-b3ab-a667cd1fb16d@redhat.com>

On Thu, Jun 13, 2024 at 10:16:10AM +0200, David Hildenbrand wrote:
> On 13.06.24 10:13, Luis Chamberlain wrote:
> > On Thu, Jun 13, 2024 at 10:07:15AM +0200, David Hildenbrand wrote:
> > > On 13.06.24 09:57, Luis Chamberlain wrote:
> > > > On Wed, Jun 12, 2024 at 08:08:15PM +0100, Matthew Wilcox wrote:
> > > > > On Fri, Jun 07, 2024 at 02:58:57PM +0000, Pankaj Raghav (Samsung) wrote:
> > > > > > From: Pankaj Raghav
> > > > > >
> > > > > > Usually the page cache does not extend beyond the size of the inode,
> > > > > > therefore, no PTEs are created for folios that extend beyond the size.
> > > > > >
> > > > > > But with LBS support, we might extend page cache beyond the size of the
> > > > > > inode as we need to guarantee folios of minimum order. Cap the PTE range
> > > > > > to be created for the page cache up to the max allowed zero-fill file
> > > > > > end, which is aligned to the PAGE_SIZE.
> > > > >
> > > > > I think this is slightly misleading because we might well zero-fill
> > > > > to the end of the folio. The issue is that we're supposed to SIGBUS
> > > > > if userspace accesses pages which lie entirely beyond the end of this
> > > > > file. Can you rephrase this?
> > > > >
> > > > > (from mmap(2))
> > > > > SIGBUS Attempted access to a page of the buffer that lies beyond the end
> > > > > of the mapped file. For an explanation of the treatment of the
> > > > > bytes in the page that corresponds to the end of a mapped file
> > > > > that is not a multiple of the page size, see NOTES.
> > > > >
> > > > >
> > > > > The code is good though.
> > > > >
> > > > > Reviewed-by: Matthew Wilcox (Oracle)
> > > >
> > > > Since I've been curating the respective fstests test to test for this
> > > > POSIX corner case [0] I wanted to enable the test for tmpfs instead of
> > > > skipping it as I originally had it, and that meant also realizing mmap(2)
> > > > specifically says this now:
> > > >
> > > > Huge page (Huge TLB) mappings
> > >
> > > Confusion alert: this likely talks about hugetlb (MAP_HUGETLB), not THP and
> > > friends.
> > >
> > > So it might not be required for below changes.
> >
> > Thanks, I had to ask as we're dusting off this little obscure corner of
> > the universe. Reason I ask, is the test fails for tmpfs with huge pages,
> > and this patch fixes it, but it got me wondering the above applies also
> > to tmpfs with huge pages.
>
> Is it tmpfs with THP/large folios or shmem with hugetlb? I assume the tmpfs
> with THP. There are not really mmap/munmap restrictions to THP and friends
> (because it's supposed to be "transparent" :) ).

The case I tested that failed the test was tmpfs with huge pages (not
large folios).
So should we then have this:

diff --git a/mm/filemap.c b/mm/filemap.c
index ea78963f0956..649beb9bbc6b 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3617,6 +3617,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 	vm_fault_t ret = 0;
 	unsigned long rss = 0;
 	unsigned int nr_pages = 0, mmap_miss = 0, mmap_miss_saved, folio_type;
+	unsigned int align = PAGE_SIZE;
 
 	rcu_read_lock();
 	folio = next_uptodate_folio(&xas, mapping, end_pgoff);
@@ -3636,7 +3637,16 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 		goto out;
 	}
 
-	file_end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE) - 1;
+	/*
+	 * As per mmap(2), the offset must be a multiple of the
+	 * underlying huge page size. The system automatically aligns length to
+	 * be a multiple of the underlying huge page size.
+	 */
+	if (folio_test_pmd_mappable(folio) &&
+	    (shmem_mapping(mapping) || folio_test_hugetlb(folio)))
+		align = 1 << folio_order(folio);
+
+	file_end = DIV_ROUND_UP(i_size_read(mapping->host), align) - 1;
 	if (end_pgoff > file_end)
 		end_pgoff = file_end;
 
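
To make the units in the file_end calculation easier to reason about, here is
a small userspace sketch of the rounding. It is only an illustration under
stated assumptions, not a replacement for the hunk above: the SKETCH_* macros
and sketch_* helpers are hypothetical names, 4 KiB base pages are assumed, and
the alignment is kept in bytes with the result converted back to base-page
indexes (the unit end_pgoff is compared in). The last assertion shows what the
same rounding gives when the divisor is a page count such as 1 << 9.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages; the kernel's PAGE_SIZE is per-architecture. */
#define SKETCH_PAGE_SIZE 4096ull

/* Userspace stand-in with the same rounding as the kernel's DIV_ROUND_UP(). */
#define SKETCH_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical helper: size in bytes of a folio of the given order. */
static inline uint64_t sketch_folio_bytes(unsigned int order)
{
	return SKETCH_PAGE_SIZE << order;
}

/*
 * Hypothetical helper: index, in base pages, of the last page that may be
 * mapped when the end of a file of i_size bytes is rounded up to align_bytes.
 * Assumes i_size > 0 and align_bytes is a non-zero multiple of
 * SKETCH_PAGE_SIZE.
 */
static inline uint64_t sketch_file_end(uint64_t i_size, uint64_t align_bytes)
{
	uint64_t pages_per_unit = align_bytes / SKETCH_PAGE_SIZE;

	return SKETCH_DIV_ROUND_UP(i_size, align_bytes) * pages_per_unit - 1;
}

/* Worked values for a 4097-byte file. */
static inline void sketch_selfcheck(void)
{
	/* Base-page alignment: bytes 0..4096 span pages 0 and 1. */
	assert(sketch_file_end(4097, SKETCH_PAGE_SIZE) == 1);
	/* Order-9 (2 MiB) alignment: the cap moves to the last page, 511. */
	assert(sketch_file_end(4097, sketch_folio_bytes(9)) == 511);
	/* Dividing by a page count (1 << 9) instead of bytes yields 8. */
	assert(SKETCH_DIV_ROUND_UP(4097ull, 1ull << 9) - 1 == 8);
}
```

Calling sketch_selfcheck() from a test program exercises the three cases.
Whether the cap for tmpfs huge pages should sit at the base-page boundary or
at the huge-page boundary is the question being asked in this thread; the
sketch only shows how the result changes with the divisor.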