From: Suren Baghdasaryan <surenb@google.com>
To: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
Cc: akpm@linux-foundation.org, willy@infradead.org, david@kernel.org,
ziy@nvidia.com, matthew.brost@intel.com,
joshua.hahnjy@gmail.com, rakie.kim@sk.com, byungchul@sk.com,
gourry@gourry.net, ying.huang@linux.alibaba.com,
apopple@nvidia.com, lorenzo.stoakes@oracle.com,
baolin.wang@linux.alibaba.com, Liam.Howlett@oracle.com,
npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com,
baohua@kernel.org, lance.yang@linux.dev, vbabka@suse.cz,
jannh@google.com, rppt@kernel.org, mhocko@suse.com,
pfalcato@suse.de, kees@kernel.org, maddy@linux.ibm.com,
npiggin@gmail.com, mpe@ellerman.id.au, chleroy@kernel.org,
borntraeger@linux.ibm.com, frankja@linux.ibm.com,
imbrenda@linux.ibm.com, hca@linux.ibm.com, gor@linux.ibm.com,
agordeev@linux.ibm.com, svens@linux.ibm.com,
gerald.schaefer@linux.ibm.com, linux-mm@kvack.org,
linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org
Subject: Re: [PATCH v4 4/4] mm: use vma_start_write_killable() in process_vma_walk_lock()
Date: Mon, 23 Mar 2026 12:29:14 -0700 [thread overview]
Message-ID: <CAJuCfpGk3sTw2_owiRZM=V2e0UGDad5MrDbCsPbY3GUvT4u63A@mail.gmail.com> (raw)
In-Reply-To: <1c44dd0b-4f5a-4fc1-983f-f728b31c9e4d@lucifer.local>
On Mon, Mar 23, 2026 at 11:04 AM Lorenzo Stoakes (Oracle)
<ljs@kernel.org> wrote:
>
> On Sat, Mar 21, 2026 at 10:43:08PM -0700, Suren Baghdasaryan wrote:
> > Replace vma_start_write() with vma_start_write_killable() when
> > process_vma_walk_lock() is used with PGWALK_WRLOCK option.
> > Adjust its direct and indirect users to check for a possible error
> > and handle it. Ensure users handle EINTR correctly and do not ignore
> > it.
> >
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > ---
> > fs/proc/task_mmu.c | 5 ++++-
> > mm/mempolicy.c | 1 +
> > mm/pagewalk.c | 20 ++++++++++++++------
> > 3 files changed, 19 insertions(+), 7 deletions(-)
> >
> > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> > index e091931d7ca1..2fe3d11aad03 100644
> > --- a/fs/proc/task_mmu.c
> > +++ b/fs/proc/task_mmu.c
> > @@ -1797,6 +1797,7 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
> > struct clear_refs_private cp = {
> > .type = type,
> > };
> > + int err;
>
> Maybe better to make it a ssize_t given return type of function?
Ack.
>
> >
> > if (mmap_write_lock_killable(mm)) {
> > count = -EINTR;
> > @@ -1824,7 +1825,9 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
> > 0, mm, 0, -1UL);
> > mmu_notifier_invalidate_range_start(&range);
> > }
> > - walk_page_range(mm, 0, -1, &clear_refs_walk_ops, &cp);
> > + err = walk_page_range(mm, 0, -1, &clear_refs_walk_ops, &cp);
> > + if (err)
> > + count = err;
>
> Hmm this is gross, but it's an established pattern here, ugh.
>
> Now we have an err though, could we update:
>
> if (mmap_write_lock_killable(mm)) {
> - count = -EINTR;
> + err = -EINTR;
> goto out_mm;
> }
>
> Then we can just do:
>
> + err = walk_page_range(mm, 0, -1, &clear_refs_walk_ops, &cp);
>
> And at the end do:
>
> return err ?: count;
>
> Which possibly _necessitates_ err being a ssize_t.
Sounds doable. Let me try that.
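For my own sanity, a toy model of the shape you're describing (everything here is a made-up stand-in for the real fs/proc code, with -EINTR hard-coded as -4 so the sketch is standalone):

```c
#include <assert.h>
#include <sys/types.h> /* ssize_t */

/* Toy model of the suggested clear_refs_write() error flow: a single
 * ssize_t err carries both the lock failure and the walk result, and
 * the final return is err if nonzero, else the byte count. */
static ssize_t toy_clear_refs_write(int lock_fails, int walk_fails,
				    ssize_t count)
{
	ssize_t err = 0;

	if (lock_fails) {
		err = -4; /* -EINTR from mmap_write_lock_killable() */
		goto out;
	}

	/* stand-in for walk_page_range(mm, 0, -1, ...) */
	err = walk_fails ? -4 : 0;
out:
	/* GNU C "elvis" operator: a ?: b means a ? a : b */
	return err ?: count;
}
```

With err as a ssize_t the -EINTR survives the implicit conversion on the return path, which is the point of your "possibly _necessitates_" remark.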
>
> > if (type == CLEAR_REFS_SOFT_DIRTY) {
> > mmu_notifier_invalidate_range_end(&range);
> > flush_tlb_mm(mm);
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 929e843543cf..bb5b0e83ce0f 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -969,6 +969,7 @@ static const struct mm_walk_ops queue_pages_lock_vma_walk_ops = {
> > * (a hugetlbfs page or a transparent huge page being counted as 1).
> > * -EIO - a misplaced page found, when MPOL_MF_STRICT specified without MOVEs.
> > * -EFAULT - a hole in the memory range, when MPOL_MF_DISCONTIG_OK unspecified.
> > + * -EINTR - walk got terminated due to pending fatal signal.
>
> Thanks!
>
> > */
> > static long
> > queue_pages_range(struct mm_struct *mm, unsigned long start, unsigned long end,
> > diff --git a/mm/pagewalk.c b/mm/pagewalk.c
> > index eda74273c8ec..a42cd6a6d812 100644
> > --- a/mm/pagewalk.c
> > +++ b/mm/pagewalk.c
> > @@ -438,14 +438,13 @@ static inline void process_mm_walk_lock(struct mm_struct *mm,
> > mmap_assert_write_locked(mm);
> > }
> >
> > -static inline void process_vma_walk_lock(struct vm_area_struct *vma,
> > +static inline int process_vma_walk_lock(struct vm_area_struct *vma,
>
> NIT: Don't need this to be an inline any longer. May as well fix up while we're
> here.
Ack.
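Agreed it reads better as a plain function once it can fail. A standalone sketch of the resulting shape, with vma_start_write_killable() stubbed out (the stub and its -4/-EINTR value are illustrative only, not the real locking primitive):

```c
#include <assert.h>

enum page_walk_lock {
	PGWALK_RDLOCK,
	PGWALK_WRLOCK,
	PGWALK_WRLOCK_VERIFY,
	PGWALK_NOLOCK,
};

/* Stub for vma_start_write_killable(): fails with -EINTR (-4 here)
 * when a fatal signal is pending, succeeds otherwise. */
static int stub_vma_start_write_killable(int fatal_signal_pending)
{
	return fatal_signal_pending ? -4 : 0;
}

/* Only PGWALK_WRLOCK can fail; every other mode still returns 0, so
 * callers checking the result see no behavioral change for them. */
static int process_vma_walk_lock_sketch(enum page_walk_lock walk_lock,
					int fatal_signal_pending)
{
	switch (walk_lock) {
	case PGWALK_WRLOCK:
		return stub_vma_start_write_killable(fatal_signal_pending);
	default:
		/* WRLOCK_VERIFY / RDLOCK / NOLOCK: nothing can fail */
		break;
	}
	return 0;
}
```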
>
> > enum page_walk_lock walk_lock)
> > {
> > #ifdef CONFIG_PER_VMA_LOCK
> > switch (walk_lock) {
> > case PGWALK_WRLOCK:
> > - vma_start_write(vma);
> > - break;
> > + return vma_start_write_killable(vma);
>
> LGTM
>
> > case PGWALK_WRLOCK_VERIFY:
> > vma_assert_write_locked(vma);
> > break;
> > @@ -457,6 +456,7 @@ static inline void process_vma_walk_lock(struct vm_area_struct *vma,
> > break;
> > }
> > #endif
> > + return 0;
> > }
> >
> > /*
> > @@ -500,7 +500,9 @@ int walk_page_range_mm_unsafe(struct mm_struct *mm, unsigned long start,
> > if (ops->pte_hole)
> > err = ops->pte_hole(start, next, -1, &walk);
> > } else { /* inside vma */
> > - process_vma_walk_lock(vma, ops->walk_lock);
> > + err = process_vma_walk_lock(vma, ops->walk_lock);
> > + if (err)
> > + break;
>
> In every other case we set walk.vma = vma or NULL. Is it a problem not setting
> it at all in this code path?
IIUC the other cases set walk.vma because they later call
ops->pte_hole(..., walk). In our case we immediately break out of the
loop and exit the function, which takes the "walk" variable out of
scope. So walk.vma won't be used and setting it would achieve nothing.
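A toy version of the loop makes the point: once we break on the locking error, walk goes out of scope without walk.vma ever being read. Everything below is a made-up stand-in for the real walk_page_range_mm_unsafe() loop, with -EINTR hard-coded as -4:

```c
#include <assert.h>
#include <stddef.h>

struct toy_walk {
	const void *vma;
};

/* Stand-in for process_vma_walk_lock(): fail on a chosen iteration. */
static int toy_lock_vma(int i, int fail_at)
{
	return i == fail_at ? -4 /* -EINTR */ : 0;
}

/* Toy version of the walk loop: on a lock failure we break immediately
 * and return err; walk.vma is never consulted after the break, so
 * there is no point assigning it on the error path. */
static int toy_walk_range(int nvmas, int fail_at, const void **seen_vma)
{
	static const int vmas[16]; /* fake vma objects, nvmas <= 16 */
	struct toy_walk walk = { .vma = NULL };
	int err = 0;
	int i;

	for (i = 0; i < nvmas; i++) {
		err = toy_lock_vma(i, fail_at);
		if (err)
			break;		/* walk.vma left untouched */
		walk.vma = &vmas[i];
	}
	*seen_vma = walk.vma;
	return err;
}
```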
>
> > walk.vma = vma;
> > next = min(end, vma->vm_end);
> > vma = find_vma(mm, vma->vm_end);
> > @@ -717,6 +719,7 @@ int walk_page_range_vma_unsafe(struct vm_area_struct *vma, unsigned long start,
> > .vma = vma,
> > .private = private,
> > };
> > + int err;
> >
> > if (start >= end || !walk.mm)
> > return -EINVAL;
> > @@ -724,7 +727,9 @@ int walk_page_range_vma_unsafe(struct vm_area_struct *vma, unsigned long start,
> > return -EINVAL;
> >
> > process_mm_walk_lock(walk.mm, ops->walk_lock);
> > - process_vma_walk_lock(vma, ops->walk_lock);
> > + err = process_vma_walk_lock(vma, ops->walk_lock);
> > + if (err)
> > + return err;
>
> LGTM
>
> > return __walk_page_range(start, end, &walk);
> > }
> >
> > @@ -747,6 +752,7 @@ int walk_page_vma(struct vm_area_struct *vma, const struct mm_walk_ops *ops,
> > .vma = vma,
> > .private = private,
> > };
> > + int err;
> >
> > if (!walk.mm)
> > return -EINVAL;
> > @@ -754,7 +760,9 @@ int walk_page_vma(struct vm_area_struct *vma, const struct mm_walk_ops *ops,
> > return -EINVAL;
> >
> > process_mm_walk_lock(walk.mm, ops->walk_lock);
> > - process_vma_walk_lock(vma, ops->walk_lock);
> > + err = process_vma_walk_lock(vma, ops->walk_lock);
> > + if (err)
> > + return err;
>
> LGTM
>
> > return __walk_page_range(vma->vm_start, vma->vm_end, &walk);
> > }
> >
> > --
> > 2.53.0.1018.g2bb0e51243-goog
> >
>
> Thanks, Lorenzo