From: Nico Pache <npache@redhat.com>
To: Waiman Long <longman@redhat.com>, Michal Hocko <mhocko@suse.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
akpm@linux-foundation.org, jsavitz@redhat.com,
peterz@infradead.org, tglx@linutronix.de, mingo@redhat.com,
dvhart@infradead.org, dave@stgolabs.net,
andrealmeid@collabora.com
Subject: Re: [PATCH v3] mm/oom: do not oom reap task with an unresolved robust futex
Date: Mon, 17 Jan 2022 17:56:28 -0500
Message-ID: <43a6c470-9fc2-6195-9a25-5321d17540e5@redhat.com>
In-Reply-To: <ad639326-bea8-9bfb-23e3-4e2b216d9645@redhat.com>
On 1/17/22 11:05, Waiman Long wrote:
> On 1/17/22 03:52, Michal Hocko wrote:
>> On Fri 14-01-22 13:01:35, Nico Pache wrote:
>>> In the case that two or more processes share a futex located within
>>> a shared mmaped region, such as a process that shares a lock between
>>> itself and child processes, we have observed that when a process holding
>>> the lock is oom killed, at least one waiter is never alerted to this new
>>> development and simply continues to wait.
>>>
>>> This is visible via pthreads by checking the __owner field of the
>>> pthread_mutex_t structure within a waiting process, perhaps with gdb.
>>>
>>> We identify reproduction of this issue by checking a waiting process of
>>> a test program, viewing the contents of its pthread_mutex_t, taking note
>>> of the value in the __owner field, and then checking dmesg to see if that
>>> owner has already been killed.
>> I believe we really need to find out why the original holder of the
>> futex is not woken up to release the lock when exiting.
>
> For a robust futex lock holder or waiter that is to be killed, it is not the
> responsibility of the task itself to wake up and release the lock. It is the
> kernel that recognizes that the task is holding or waiting for the robust futex
> and cleans things up.
>
>
>>> As mentioned by Michal in his patchset introducing the oom reaper,
>>> commit aac4536355496 ("mm, oom: introduce oom reaper"), the purpose of the
>>> oom reaper is to try and free memory more quickly; however, in the case
>>> that a robust futex is being used, we want to avoid utilizing the
>>> concurrent oom reaper. This is due to a race that can occur between the
>>> SIGKILL handling the robust futex, and the oom reaper freeing the memory
>>> needed to maintain the robust list.
>> OOM reaper is only unmapping private memory. It doesn't touch shared
>> mappings. So how could it interfere?
>>
> The futex itself may be in shared memory, however the robust list entry can be
> in private memory. So when the robust list is being scanned in this case, we can
> be in a use-after-free situation.
I believe this is true. The userspace allocation for the pthread (its stack and
thread descriptor) occurs as a private mapping:
https://elixir.bootlin.com/glibc/latest/source/nptl/allocatestack.c#L368
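To make the scenario concrete, here is a minimal sketch (an illustration of the
same pattern, not the linked reproducer): the futex word sits in a MAP_SHARED
mapping, while the robust list head the kernel walks on exit is registered from
the owner's thread descriptor in private memory. With robust-futex cleanup
working, the survivor's lock attempt returns EOWNERDEAD once the owner is
killed; in the race described above it can simply keep waiting.

/* Minimal sketch, not the linked reproducer: a process-shared robust
 * mutex whose owner is killed while holding it. Build with gcc -pthread. */
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	/* The futex word itself lives in shared memory... */
	pthread_mutex_t *m = mmap(NULL, sizeof(*m), PROT_READ | PROT_WRITE,
				  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	pthread_mutexattr_t a;

	pthread_mutexattr_init(&a);
	pthread_mutexattr_setpshared(&a, PTHREAD_PROCESS_SHARED);
	pthread_mutexattr_setrobust(&a, PTHREAD_MUTEX_ROBUST);
	pthread_mutex_init(m, &a);

	pid_t pid = fork();
	if (pid == 0) {
		/*
		 * ...but the robust list head the kernel walks on exit is
		 * registered (set_robust_list()) from the child's thread
		 * descriptor, which lives in the child's private memory.
		 */
		pthread_mutex_lock(m);
		pause();			/* killed while holding the lock */
		_exit(0);
	}

	sleep(1);				/* crude: let the child take the lock */
	kill(pid, SIGKILL);			/* stands in for the oom kill */
	waitpid(pid, NULL, 0);

	int ret = pthread_mutex_lock(m);	/* expect EOWNERDEAD, not a hang */
	if (ret == EOWNERDEAD) {
		pthread_mutex_consistent(m);
		printf("owner died, lock recovered\n");
	} else {
		printf("pthread_mutex_lock() = %d\n", ret);
	}
	return 0;
}

The linked reproducer presumably drives a real oom kill under memory pressure;
the plain SIGKILL above only stands in for that and does not by itself exercise
the reaper race.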
>>> In the case that the oom victim is utilizing a robust futex, and the
>>> SIGKILL has not yet handled the futex death, the tsk->robust_list should
>>> be non-NULL. This issue can be tricky to reproduce, but with the
>>> modifications of this patch, we have found it to be impossible to
>>> reproduce.
>> We really need a deeper analysis of the underlying problem because right
>> now I do not really see why the oom reaper should interfere with shared
>> futex.
> As I said above, the robust list processing can involve private memory.
>>
>>> Add a check that tsk->robust_list is non-NULL in wake_oom_reaper() to return
>>> early and prevent waking the oom reaper.
>>>
>>> Reproducer: https://gitlab.com/jsavitz/oom_futex_reproducer
>>>
>>> Co-developed-by: Joel Savitz <jsavitz@redhat.com>
>>> Signed-off-by: Joel Savitz <jsavitz@redhat.com>
>>> Signed-off-by: Nico Pache <npache@redhat.com>
>>> ---
>>> mm/oom_kill.c | 15 +++++++++++++++
>>> 1 file changed, 15 insertions(+)
>>>
>>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>>> index 1ddabefcfb5a..3cdaac9c7de5 100644
>>> --- a/mm/oom_kill.c
>>> +++ b/mm/oom_kill.c
>>> @@ -667,6 +667,21 @@ static void wake_oom_reaper(struct task_struct *tsk)
>>>  	if (test_and_set_bit(MMF_OOM_REAP_QUEUED, &tsk->signal->oom_mm->flags))
>>>  		return;
>>> +#ifdef CONFIG_FUTEX
>>> +	/*
>>> +	 * If the ooming task's SIGKILL has not finished handling the
>>> +	 * robust futex it is not correct to reap the mm concurrently.
>>> +	 * Do not wake the oom reaper when the task still contains a
>>> +	 * robust list.
>>> +	 */
>>> +	if (tsk->robust_list)
>>> +		return;
>>> +#ifdef CONFIG_COMPAT
>>> +	if (tsk->compat_robust_list)
>>> +		return;
>>> +#endif
>>> +#endif
>> If this turns out to be really needed, which I do not really see at the
>> moment, then this is not the right way to handle this situation. The oom
>> victim could get stuck and the oom killer wouldn't be able to move
>> forward. If anything the victim would need to get MMF_OOM_SKIP set.
I will try this, but I don't immediately see any difference between this early
return and setting the bit, adding the task to the oom_reaper_list, and then
skipping it there based on the flag. Do you mind explaining how this could lead
to the oom killer getting stuck?
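For reference, here is how I currently read the two shapes (a rough, untested
sketch against wake_oom_reaper(); the MMF_OOM_SKIP variant is only my guess at
what you are suggesting, not something spelled out above):

	/* v3 as posted (#ifdef/compat details omitted): never reap at all. */
	if (tsk->robust_list)
		return;

	/*
	 * My (possibly wrong) reading of the alternative: give up on reaping
	 * this mm but still mark it, so the oom killer is not stuck waiting
	 * on a victim that never manages to exit on its own.
	 */
	if (tsk->robust_list) {
		set_bit(MMF_OOM_SKIP, &tsk->signal->oom_mm->flags);
		return;
	}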
Cheers,
-- Nico
>
> There can be other ways to do that, but letting the normal kill signal processing
> finish its job and properly invoke futex_cleanup() is certainly one possible
> solution.
>
> Cheers,
> Longman
>
Thread overview: 10+ messages
2022-01-14 18:01 Nico Pache
2022-01-17 8:52 ` Michal Hocko
2022-01-17 16:05 ` Waiman Long
2022-01-17 22:56 ` Nico Pache [this message]
2022-01-18 8:51 ` Michal Hocko
2022-02-14 20:39 ` Nico Pache
2022-03-02 14:24 ` Michal Hocko
2022-03-02 17:26 ` Nico Pache
2022-03-03 7:48 ` Michal Hocko
2022-03-09 0:24 ` Nico Pache