From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>, linux-mm@kvack.org
Subject: [PATCH v2 2/3] mm: Allow ->huge_fault() to be called without the mmap_lock held
Date: Fri, 18 Aug 2023 21:23:34 +0100 [thread overview]
Message-ID: <20230818202335.2739663-3-willy@infradead.org> (raw)
In-Reply-To: <20230818202335.2739663-1-willy@infradead.org>
Remove the checks for the VMA lock being held, allowing the page
fault path to call into the filesystem instead of retrying with the
mmap_lock held. This will improve scalability for DAX page faults.
Also update the documentation to match (and fix some other changes that
have happened recently).
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
Documentation/filesystems/locking.rst | 36 +++++++++++++++++----------
Documentation/filesystems/porting.rst | 11 ++++++++
mm/memory.c | 22 ++--------------
3 files changed, 36 insertions(+), 33 deletions(-)
diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index ab64356eff1a..7be2900806c8 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -635,26 +635,29 @@ vm_operations_struct
prototypes::
- void (*open)(struct vm_area_struct*);
- void (*close)(struct vm_area_struct*);
- vm_fault_t (*fault)(struct vm_area_struct*, struct vm_fault *);
+ void (*open)(struct vm_area_struct *);
+ void (*close)(struct vm_area_struct *);
+ vm_fault_t (*fault)(struct vm_fault *);
+ vm_fault_t (*huge_fault)(struct vm_fault *, unsigned int order);
+ vm_fault_t (*map_pages)(struct vm_fault *, pgoff_t start, pgoff_t end);
vm_fault_t (*page_mkwrite)(struct vm_area_struct *, struct vm_fault *);
vm_fault_t (*pfn_mkwrite)(struct vm_area_struct *, struct vm_fault *);
int (*access)(struct vm_area_struct *, unsigned long, void*, int, int);
locking rules:
-============= ========= ===========================
+============= ========== ===========================
ops mmap_lock PageLocked(page)
-============= ========= ===========================
-open: yes
-close: yes
-fault: yes can return with page locked
-map_pages: read
-page_mkwrite: yes can return with page locked
-pfn_mkwrite: yes
-access: yes
-============= ========= ===========================
+============= ========== ===========================
+open: write
+close: read/write
+fault: read can return with page locked
+huge_fault: maybe-read
+map_pages: maybe-read
+page_mkwrite: read can return with page locked
+pfn_mkwrite: read
+access: read
+============= ========== ===========================
->fault() is called when a previously not present pte is about to be faulted
in. The filesystem must find and return the page associated with the passed in
@@ -664,6 +667,13 @@ then ensure the page is not already truncated (invalidate_lock will block
subsequent truncate), and then return with VM_FAULT_LOCKED, and the page
locked. The VM will unlock the page.
+->huge_fault() is called when there is no PUD or PMD entry present. This
+gives the filesystem the opportunity to install a PUD or PMD sized page.
+Filesystems can also use the ->fault method to return a PMD sized page,
+so implementing this function may not be necessary. In particular,
+filesystems should not call filemap_fault() from ->huge_fault().
+The mmap_lock may not be held when this method is called.
+
->map_pages() is called when VM asks to map easy accessible pages.
Filesystem should find and map pages associated with offsets from "start_pgoff"
till "end_pgoff". ->map_pages() is called with the RCU lock held and must
diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index 0f5da78ef4f9..98969d713e2e 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -938,3 +938,14 @@ file pointer instead of struct dentry pointer. d_tmpfile() is similarly
changed to simplify callers. The passed file is in a non-open state and on
success must be opened before returning (e.g. by calling
finish_open_simple()).
+
+---
+
+**mandatory**
+
+Calling convention for ->huge_fault has changed. It now takes a page
+order instead of an enum page_entry_size, and it may be called without the
+mmap_lock held. All in-tree users have been audited and do not seem to
+depend on the mmap_lock being held, but out of tree users should verify
+for themselves. If they do need it, they can return VM_FAULT_RETRY to
+be called with the mmap_lock held.
diff --git a/mm/memory.c b/mm/memory.c
index 3b4aaa0d2fff..254ee9c0e8c4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4873,13 +4873,8 @@ static inline vm_fault_t create_huge_pmd(struct vm_fault *vmf)
struct vm_area_struct *vma = vmf->vma;
if (vma_is_anonymous(vma))
return do_huge_pmd_anonymous_page(vmf);
- if (vma->vm_ops->huge_fault) {
- if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
- vma_end_read(vma);
- return VM_FAULT_RETRY;
- }
+ if (vma->vm_ops->huge_fault)
return vma->vm_ops->huge_fault(vmf, PE_SIZE_PMD);
- }
return VM_FAULT_FALLBACK;
}
@@ -4899,10 +4894,6 @@ static inline vm_fault_t wp_huge_pmd(struct vm_fault *vmf)
if (vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) {
if (vma->vm_ops->huge_fault) {
- if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
- vma_end_read(vma);
- return VM_FAULT_RETRY;
- }
ret = vma->vm_ops->huge_fault(vmf, PE_SIZE_PMD);
if (!(ret & VM_FAULT_FALLBACK))
return ret;
@@ -4923,13 +4914,8 @@ static vm_fault_t create_huge_pud(struct vm_fault *vmf)
/* No support for anonymous transparent PUD pages yet */
if (vma_is_anonymous(vma))
return VM_FAULT_FALLBACK;
- if (vma->vm_ops->huge_fault) {
- if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
- vma_end_read(vma);
- return VM_FAULT_RETRY;
- }
+ if (vma->vm_ops->huge_fault)
return vma->vm_ops->huge_fault(vmf, PE_SIZE_PUD);
- }
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
return VM_FAULT_FALLBACK;
}
@@ -4946,10 +4932,6 @@ static vm_fault_t wp_huge_pud(struct vm_fault *vmf, pud_t orig_pud)
goto split;
if (vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) {
if (vma->vm_ops->huge_fault) {
- if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
- vma_end_read(vma);
- return VM_FAULT_RETRY;
- }
ret = vma->vm_ops->huge_fault(vmf, PE_SIZE_PUD);
if (!(ret & VM_FAULT_FALLBACK))
return ret;
--
2.40.1
next prev parent reply other threads:[~2023-08-18 20:23 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-18 20:23 [PATCH v2 0/3] Change calling convention for ->huge_fault Matthew Wilcox (Oracle)
2023-08-18 20:23 ` [PATCH v2 1/3] mm: Move PMD_ORDER to pgtable.h Matthew Wilcox (Oracle)
2023-08-18 20:23 ` Matthew Wilcox (Oracle) [this message]
2023-08-18 20:23 ` [PATCH v2 3/3] mm: Remove enum page_entry_size Matthew Wilcox (Oracle)
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230818202335.2739663-3-willy@infradead.org \
--to=willy@infradead.org \
--cc=akpm@linux-foundation.org \
--cc=linux-mm@kvack.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox