From: gaoxu <gaoxu2@hihonor.com>
To: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
yipengxiang <yipengxiang@hihonor.com>
Subject: Re: [PATCH] mm,oom_reaper: avoid run queue_oom_reaper if task is not oom
Date: Fri, 24 Nov 2023 03:15:46 +0000
Message-ID: <242025e9a8c84f6b96ba3f180ea01be9@hihonor.com>
In-Reply-To: <ZV8SenfRYnkKwqu6@tiehlicka>
On Thu, 23 Nov 2023 08:51 Michal Hocko <mhocko@suse.com> wrote:
> On Wed 22-11-23 12:46:44, gaoxu wrote:
>> The function queue_oom_reaper tests and sets tsk->signal->oom_mm->flags.
>> However, it is necessary to check if 'tsk' is an OOM victim before
>> executing 'queue_oom_reaper' because the variable may be NULL.
>>
>> We encountered such an issue, and the log is as follows:
>> [3701:11_see]Out of memory: Killed process 3154 (system_server)
>> total-vm:23662044kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB,
>> UID:1000 pgtables:4056kB oom_score_adj:-900
>
>> [3701:11_see][RB/E]rb_sreason_str_set: sreason_str set null_pointer
>> [3701:11_see][RB/E]rb_sreason_str_set: sreason_str set unknown_addr
>
> What are these?
These are log messages that we added ourselves; they are not from the upstream kernel.
>> [3701:11_see]Unable to handle kernel NULL pointer dereference at virtual address 0000000000000328
>> [3701:11_see]user pgtable: 4k pages, 39-bit VAs, pgdp=00000000821de000
>> [3701:11_see][0000000000000328] pgd=0000000000000000, p4d=0000000000000000,pud=0000000000000000
>> [3701:11_see]tracing off
>> [3701:11_see]Internal error: Oops: 96000005 [#1] PREEMPT SMP
>> [3701:11_see]Call trace:
>> [3701:11_see] queue_oom_reaper+0x30/0x170
>
> Could you resolve this offset into the code line please?
Because we added extra code for logging, the offset may not map to the same line as in the upstream Linux source. In our tree the fault is at the test_and_set_bit() call marked below:
static void queue_oom_reaper(struct task_struct *tsk)
{
	/* mm is already queued? */
	if (test_and_set_bit(MMF_OOM_REAP_QUEUED, &tsk->signal->oom_mm->flags)) /* the NULL pointer dereference occurred here */
		return;
	...
}
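
To illustrate why the reported fault address is a small non-zero value (0x328) rather than 0: when one of the pointers in the chain tsk->signal->oom_mm is NULL, the address that faults is the offset of the member accessed through that pointer. Below is a minimal userspace sketch of that arithmetic. The *_model structures and their padding are hypothetical; I picked 0x328 only so that one case matches the log. Real offsets depend on the kernel version and config, so this does not show which pointer was NULL in our crash.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the real kernel layouts. */
struct mm_model {
	char pad[0x328];		/* made-up padding so that flags lands at 0x328 */
	unsigned long flags;
};

struct signal_model {
	char pad[0x40];			/* made-up padding */
	struct mm_model *oom_mm;
};

/*
 * Address touched when accessing base->member with base == NULL:
 * it is simply the member's offset within the structure.
 */
static uintptr_t fault_addr_if_null(size_t member_offset)
{
	return (uintptr_t)0 + member_offset;
}

/* Case 1: tsk->signal->oom_mm is NULL and ->flags is accessed. */
static uintptr_t oom_mm_null_fault_addr(void)
{
	return fault_addr_if_null(offsetof(struct mm_model, flags));
}

/* Case 2: tsk->signal is NULL and ->oom_mm is loaded. */
static uintptr_t signal_null_fault_addr(void)
{
	return fault_addr_if_null(offsetof(struct signal_model, oom_mm));
}
```

On a real build, comparing 0x328 against the actual offsets of signal_struct.oom_mm and mm_struct.flags (for example with pahole on the matching vmlinux) would tell the two cases apart.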
>> [3701:11_see] __oom_kill_process+0x590/0x860
>> [3701:11_see] oom_kill_process+0x140/0x274
>> [3701:11_see] out_of_memory+0x2f4/0x54c
>> [3701:11_see] __alloc_pages_slowpath+0x5d8/0xaac
>> [3701:11_see] __alloc_pages+0x774/0x800
>> [3701:11_see] wp_page_copy+0xc4/0x116c
>> [3701:11_see] do_wp_page+0x4bc/0x6fc
>> [3701:11_see] handle_pte_fault+0x98/0x2a8
>> [3701:11_see] __handle_mm_fault+0x368/0x700
>> [3701:11_see] do_handle_mm_fault+0x160/0x2cc
>> [3701:11_see] do_page_fault+0x3e0/0x818
>> [3701:11_see] do_mem_abort+0x68/0x17c
>> [3701:11_see] el0_da+0x3c/0xa0
>> [3701:11_see] el0t_64_sync_handler+0xc4/0xec
>> [3701:11_see] el0t_64_sync+0x1b4/0x1b8
>> [3701:11_see]tracing off
>>
>> Signed-off-by: Gao Xu <gaoxu2@hihonor.com>
>> ---
>> mm/oom_kill.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>> index 9e6071fde..3754ab4b6 100644
>> --- a/mm/oom_kill.c
>> +++ b/mm/oom_kill.c
>> @@ -984,7 +984,7 @@ static void __oom_kill_process(struct task_struct *victim, const char *message)
>>  	}
>>  	rcu_read_unlock();
>>
>> -	if (can_oom_reap)
>> +	if (can_oom_reap && tsk_is_oom_victim(victim))
>>  		queue_oom_reaper(victim);
>
> I do not understand. We always send SIGKILL and call mark_oom_victim(victim) on the victim task before reaching this point. How can tsk_is_oom_victim() ever be false?
This is a low-probability issue: it has occurred only once, during monkey testing.
I have not been able to find the root cause either.
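
To make the intent of the one-line change concrete, here is a minimal userspace sketch of the guard. It is a model, not kernel code: all *_model names are hypothetical, the bit number is made up, and the non-atomic flag update only stands in for test_and_set_bit(). As far as I can tell, tsk_is_oom_victim() upstream just reports whether tsk->signal->oom_mm is set, which is what the model assumes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct mm_model { unsigned long flags; };
struct signal_model { struct mm_model *oom_mm; };
struct task_model { struct signal_model *signal; };

#define MMF_OOM_REAP_QUEUED_MODEL 0	/* made-up bit number */

/* Models tsk_is_oom_victim(): a victim is a task whose oom_mm is set. */
static bool tsk_is_oom_victim_model(const struct task_model *tsk)
{
	return tsk->signal->oom_mm != NULL;
}

/* Models mark_oom_victim(): records the mm only the first time. */
static void mark_oom_victim_model(struct task_model *tsk, struct mm_model *mm)
{
	if (!tsk->signal->oom_mm)
		tsk->signal->oom_mm = mm;
}

/*
 * Models queue_oom_reaper(): returns true if the mm was newly queued.
 * Dereferences oom_mm unconditionally, like the statement that faulted.
 */
static bool queue_oom_reaper_model(struct task_model *tsk)
{
	unsigned long bit = 1UL << MMF_OOM_REAP_QUEUED_MODEL;
	struct mm_model *mm = tsk->signal->oom_mm;

	assert(mm != NULL);		/* the kernel would oops here instead */
	if (mm->flags & bit)		/* mm is already queued? */
		return false;
	mm->flags |= bit;
	return true;
}

/* The patched call site: queue only if the task is an OOM victim. */
static bool queue_if_victim_model(struct task_model *tsk, bool can_oom_reap)
{
	if (can_oom_reap && tsk_is_oom_victim_model(tsk))
		return queue_oom_reaper_model(tsk);
	return false;
}
```

Note that the guard only covers a NULL oom_mm. It still dereferences tsk->signal, so it would not help if tsk->signal itself were NULL.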
>>
>> mmdrop(mm);
>> --
>> 2.17.1
>>
>>
>
>--
> Michal Hocko
> SUSE Labs
Thread overview: 8+ messages
2023-11-22 12:46 gaoxu
2023-11-22 21:47 ` Andrew Morton
2023-11-24  2:52   ` Re: " gaoxu
2023-11-24  9:33     ` Michal Hocko
2023-11-23  8:51 ` Michal Hocko
2023-11-24  3:15   ` gaoxu [this message]
2023-11-24  9:30     ` Re: " Michal Hocko
2023-11-25  6:46       ` Re: " gaoxu