From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
To: Andrew Morton
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>, linux-mm@kvack.org
Subject: [PATCH v2 2/3] mm: Allow ->huge_fault() to be called without the mmap_lock held
Date: Fri, 18 Aug 2023 21:23:34 +0100
Message-Id: <20230818202335.2739663-3-willy@infradead.org>
X-Mailer: git-send-email 2.37.1
In-Reply-To: <20230818202335.2739663-1-willy@infradead.org>
References: <20230818202335.2739663-1-willy@infradead.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Remove the checks for the VMA lock being held, allowing the page fault
path to call into the filesystem instead of retrying with the mmap_lock
held. This will improve scalability for DAX page faults. Also update
the documentation to match (and fix some other changes that have
happened recently).
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 Documentation/filesystems/locking.rst | 36 +++++++++++++++++----------
 Documentation/filesystems/porting.rst | 11 ++++++++
 mm/memory.c                           | 22 ++--------------
 3 files changed, 36 insertions(+), 33 deletions(-)

diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index ab64356eff1a..7be2900806c8 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -635,26 +635,29 @@ vm_operations_struct
 
 prototypes::
 
-	void (*open)(struct vm_area_struct*);
-	void (*close)(struct vm_area_struct*);
-	vm_fault_t (*fault)(struct vm_area_struct*, struct vm_fault *);
+	void (*open)(struct vm_area_struct *);
+	void (*close)(struct vm_area_struct *);
+	vm_fault_t (*fault)(struct vm_fault *);
+	vm_fault_t (*huge_fault)(struct vm_fault *, unsigned int order);
+	vm_fault_t (*map_pages)(struct vm_fault *, pgoff_t start, pgoff_t end);
 	vm_fault_t (*page_mkwrite)(struct vm_area_struct *, struct vm_fault *);
 	vm_fault_t (*pfn_mkwrite)(struct vm_area_struct *, struct vm_fault *);
 	int (*access)(struct vm_area_struct *, unsigned long, void*, int, int);
 
 locking rules:
 
-=============	=========	===========================
+=============	==========	===========================
 ops		mmap_lock	PageLocked(page)
-=============	=========	===========================
-open:		yes
-close:		yes
-fault:		yes		can return with page locked
-map_pages:	read
-page_mkwrite:	yes		can return with page locked
-pfn_mkwrite:	yes
-access:		yes
-=============	=========	===========================
+=============	==========	===========================
+open:		write
+close:		read/write
+fault:		read		can return with page locked
+huge_fault:	maybe-read
+map_pages:	maybe-read
+page_mkwrite:	read		can return with page locked
+pfn_mkwrite:	read
+access:		read
+=============	==========	===========================
 
 ->fault() is called when a previously not present pte is about to be faulted
 in. The filesystem must find and return the page associated with the passed in
@@ -664,6 +667,13 @@ then ensure the page is not already truncated (invalidate_lock will block
 subsequent truncate), and then return with VM_FAULT_LOCKED, and the page
 locked. The VM will unlock the page.
 
+->huge_fault() is called when there is no PUD or PMD entry present. This
+gives the filesystem the opportunity to install a PUD or PMD sized page.
+Filesystems can also use the ->fault method to return a PMD sized page,
+so implementing this function may not be necessary. In particular,
+filesystems should not call filemap_fault() from ->huge_fault().
+The mmap_lock may not be held when this method is called.
+
 ->map_pages() is called when VM asks to map easy accessible pages.
 Filesystem should find and map pages associated with offsets from "start_pgoff"
 till "end_pgoff". ->map_pages() is called with the RCU lock held and must
diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index 0f5da78ef4f9..98969d713e2e 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -938,3 +938,14 @@ file pointer instead of struct dentry pointer.
 d_tmpfile() is similarly changed to simplify callers. The passed file is
 in a non-open state and on success must be opened before returning (e.g. by
 calling finish_open_simple()).
+
+---
+
+**mandatory**
+
+Calling convention for ->huge_fault has changed. It now takes a page
+order instead of an enum page_entry_size, and it may be called without the
+mmap_lock held. All in-tree users have been audited and do not seem to
+depend on the mmap_lock being held, but out of tree users should verify
+for themselves. If they do need it, they can return VM_FAULT_RETRY to
+be called with the mmap_lock held.
diff --git a/mm/memory.c b/mm/memory.c
index 3b4aaa0d2fff..254ee9c0e8c4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4873,13 +4873,8 @@ static inline vm_fault_t create_huge_pmd(struct vm_fault *vmf)
 	struct vm_area_struct *vma = vmf->vma;
 	if (vma_is_anonymous(vma))
 		return do_huge_pmd_anonymous_page(vmf);
-	if (vma->vm_ops->huge_fault) {
-		if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
-			vma_end_read(vma);
-			return VM_FAULT_RETRY;
-		}
+	if (vma->vm_ops->huge_fault)
 		return vma->vm_ops->huge_fault(vmf, PE_SIZE_PMD);
-	}
 	return VM_FAULT_FALLBACK;
 }
 
@@ -4899,10 +4894,6 @@ static inline vm_fault_t wp_huge_pmd(struct vm_fault *vmf)
 
 	if (vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) {
 		if (vma->vm_ops->huge_fault) {
-			if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
-				vma_end_read(vma);
-				return VM_FAULT_RETRY;
-			}
 			ret = vma->vm_ops->huge_fault(vmf, PE_SIZE_PMD);
 			if (!(ret & VM_FAULT_FALLBACK))
 				return ret;
@@ -4923,13 +4914,8 @@ static vm_fault_t create_huge_pud(struct vm_fault *vmf)
 	/* No support for anonymous transparent PUD pages yet */
 	if (vma_is_anonymous(vma))
 		return VM_FAULT_FALLBACK;
-	if (vma->vm_ops->huge_fault) {
-		if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
-			vma_end_read(vma);
-			return VM_FAULT_RETRY;
-		}
+	if (vma->vm_ops->huge_fault)
 		return vma->vm_ops->huge_fault(vmf, PE_SIZE_PUD);
-	}
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 	return VM_FAULT_FALLBACK;
 }
@@ -4946,10 +4932,6 @@ static vm_fault_t wp_huge_pud(struct vm_fault *vmf, pud_t orig_pud)
 		goto split;
 	if (vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) {
 		if (vma->vm_ops->huge_fault) {
-			if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
-				vma_end_read(vma);
-				return VM_FAULT_RETRY;
-			}
 			ret = vma->vm_ops->huge_fault(vmf, PE_SIZE_PUD);
 			if (!(ret & VM_FAULT_FALLBACK))
 				return ret;
-- 
2.40.1
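
For illustration only, not part of the patch: a minimal sketch of what the
porting note suggests for an out-of-tree filesystem whose ->huge_fault()
still depends on the mmap_lock. "examplefs" and the handler name are
hypothetical; the signature follows the order-based prototype documented in
locking.rst above, and the body simply relocates the check this patch removes
from mm/memory.c into the filesystem's own handler.

#include <linux/mm.h>

static vm_fault_t examplefs_huge_fault(struct vm_fault *vmf,
				       unsigned int order)
{
	/*
	 * Called under the per-VMA lock rather than the mmap_lock:
	 * drop the VMA lock and ask the core to retry the fault with
	 * the mmap_lock held, as the porting note describes.
	 */
	if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
		vma_end_read(vmf->vma);
		return VM_FAULT_RETRY;
	}

	/*
	 * mmap_lock is held here; install a page of the requested
	 * order, or return VM_FAULT_FALLBACK to let the core fall
	 * back to PTE-sized faults.
	 */
	return VM_FAULT_FALLBACK;
}

A handler that never needs the mmap_lock can ignore FAULT_FLAG_VMA_LOCK
entirely.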