From: Michal Hocko <mhocko@suse.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: alexjlzheng@gmail.com,
"Eric W. Biederman" <ebiederm@xmission.com>,
akpm@linux-foundation.org, brauner@kernel.org, axboe@kernel.dk,
tandersen@netflix.com, willy@infradead.org, mjguzik@gmail.com,
alexjlzheng@tencent.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [PATCH v2] mm: optimize the redundant loop of mm_update_next_owner()
Date: Fri, 21 Jun 2024 10:50:10 +0200
Message-ID: <ZnU-wlFE5usvo9ah@tiehlicka>
In-Reply-To: <20240620172958.GA2058@redhat.com>
On Thu 20-06-24 19:30:19, Oleg Nesterov wrote:
> Can't review, I forgot everything about mm_update_next_owner().
> So I am sorry for the noise I am going to add, feel free to ignore.
> Just in case, I see nothing wrong in this patch.
>
> On 06/20, alexjlzheng@gmail.com wrote:
> >
> > When mm_update_next_owner() is racing with swapoff (try_to_unuse()) or /proc or
> > ptrace or page migration (get_task_mm()), it is impossible to find an
> > appropriate task_struct in the loop whose mm_struct is the same as the target
> > mm_struct.
> >
> > If the above race condition is combined with the stress-ng-zombie and
> > stress-ng-dup tests, such a long loop can easily cause a Hard Lockup in
> > write_lock_irq() for tasklist_lock.
> >
> > Recognize this situation in advance and exit early.
>
> But this patch won't help if (say) ptrace_access_vm() sleeps while
> for_each_process() tries to find another owner, right?
>
> > @@ -484,6 +484,8 @@ void mm_update_next_owner(struct mm_struct *mm)
> > * Search through everything else, we should not get here often.
> > */
> > for_each_process(g) {
> > + if (atomic_read(&mm->mm_users) <= 1)
> > + break;
>
> I think this deserves a comment to explain that this is an optimization
> for the case where we race with the pending mmput(). mm_update_next_owner()
> checks mm_users at the start.
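A user-space model of the early exit, carrying the kind of comment suggested above, might look like the following. This is only a sketch under stated assumptions: sim_mm, sim_task and sim_find_next_owner() are hypothetical stand-ins (not kernel code), a plain array scan replaces for_each_process(), C11 atomic_load() replaces atomic_read(), and, as in the quoted patch, mm_users <= 1 is taken to mean that only the exiting owner's own reference is left.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; NOT the kernel's mm_struct/task_struct. */
struct sim_mm {
	atomic_int mm_users;	/* models mm->mm_users */
};

struct sim_task {
	struct sim_mm *mm;	/* models task->mm */
};

/*
 * Models the "search through everything else" loop: look for another
 * task that still uses @mm. The array scan stands in for
 * for_each_process(); *scanned reports how many tasks were visited.
 */
static struct sim_task *sim_find_next_owner(struct sim_mm *mm,
					    struct sim_task *tasks,
					    size_t ntasks, size_t *scanned)
{
	assert(mm != NULL && scanned != NULL);
	*scanned = 0;
	for (size_t i = 0; i < ntasks; i++) {
		/*
		 * Optimization for racing with a pending mmput(): once the
		 * temporary reference (e.g. one taken via get_task_mm()) is
		 * dropped and only the exiting owner's reference remains,
		 * no other task can become the owner, so stop scanning
		 * instead of walking the whole task list.
		 */
		if (atomic_load(&mm->mm_users) <= 1)
			break;
		(*scanned)++;
		if (tasks[i].mm == mm)
			return &tasks[i];
	}
	return NULL;
}
```

For example, while mm_users is still 2 the scan visits tasks until it finds one whose mm matches; once the extra reference is dropped and mm_users falls to 1, the next iteration stops immediately and returns NULL.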
>
> And. Can we drop tasklist and use rcu_read_lock() before for_each_process?
> Yes, this will probably need more changes even if possible...
>
>
> Or even better. Can't we finally kill mm_update_next_owner() and turn the
> ugly mm->owner into mm->mem_cgroup ?
Yes, dropping mm->owner should be the way to go. Replacing it with
mem_cgroup sounds like an improvement. I have a vague recollection that
there are some traps along the way, e.g. tasks sharing the mm but living in
different cgroups. Things have changed since I last checked, and, for
example, memcg charge migration on task move will be deprecated soon, so
chances are that there are fewer roadblocks on the way.
--
Michal Hocko
SUSE Labs
Thread overview: 7+ messages
2024-06-20 15:27 alexjlzheng
2024-06-20 17:30 ` Oleg Nesterov
2024-06-21 8:50 ` Michal Hocko [this message]
2024-06-25 22:21 ` Andrew Morton
2024-06-26 6:43 ` Jinliang Zheng
2024-06-26 15:23 ` Oleg Nesterov
2024-06-27 7:44 ` Michal Hocko