From: Vasily Averin
Date: Mon, 20 Sep 2021 13:59:35 +0300
Subject: Re: [PATCH mm] vmalloc: back off when the current task is OOM-killed
To: Tetsuo Handa, Andrew Morton
Cc: Michal Hocko, Johannes Weiner, Vladimir Davydov, cgroups@vger.kernel.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel@openvz.org,
 "Uladzislau Rezki (Sony)"

On 9/20/21 4:22 AM, Tetsuo Handa wrote:
> On 2021/09/20 8:31, Andrew Morton wrote:
>> On Fri, 17 Sep 2021 11:06:49 +0300 Vasily Averin wrote:
>>
>>> A huge vmalloc allocation on a heavily loaded node can lead to a
>>> global memory shortage.
>>> A task that called vmalloc can have the worst badness and be chosen
>>> by the OOM killer; however, neither the received fatal signal nor
>>> the oom victim mark interrupts the allocation cycle. Vmalloc will
>>> continue allocating pages over and over again, exacerbating the
>>> crisis and consuming the memory freed up by other killed tasks.
>>>
>>> This patch allows the OOM killer to break the vmalloc cycle, which
>>> makes OOM handling more effective and avoids a host panic.
>>>
>>> Unfortunately it is not 100% safe. A previous attempt to break the
>>> vmalloc cycle was reverted by commit b8c8a338f75e ("Revert "vmalloc:
>>> back off when the current task is killed"") because some vmalloc
>>> callers did not handle failures properly. The issues found then were
>>> resolved; however, there may be other similar places.
>>
>> Well that was lame of us.
>>
>> I believe that at least one of the kernel testbots can utilize fault
>> injection. If we were to wire up vmalloc (as we have done with slab
>> and pagealloc) then this will help to locate such buggy vmalloc
>> callers.

Andrew, could you please clarify how we can do this? Do you mean we can
use the existing allocation fault injection infrastructure to trigger
this kind of issue? Unfortunately I found no way to reach this goal
with it: it can emulate single faults with a small probability, but
that is not enough here; we need to completely disable all vmalloc
allocations. I tried to extend the fault injection infrastructure, but
found that it is not trivial. That is why I added a direct
fatal_signal_pending() check in my patch.

> __alloc_pages_bulk() has three callers.
>
> alloc_pages_bulk_list() => No in-tree users.
>
> alloc_pages_bulk_array() => Used by xfs_buf_alloc_pages(),
> __page_pool_alloc_pages_slow(), svc_alloc_arg().
>
> xfs_buf_alloc_pages() => Might retry forever until all pages are
> allocated (i.e. effectively __GFP_NOFAIL). This patch can cause an
> infinite loop problem.

You are right, I missed that.
However, __alloc_pages_bulk() can return no new pages even without my
patch:
- due to fault injection inside prepare_alloc_pages();
- if __rmqueue_pcplist() returns NULL and the array already had some
  assigned pages;
- if both __rmqueue_pcplist() and the following __alloc_pages(0)
  cannot get any page.

On the other hand, I cannot say that this is a 100% xfs-related issue:
it looks strange, but the callers have some chance to get a page after
a few attempts. So I think I can change the 'break' to
'goto failed_irq', call __alloc_pages(0) and return 1 page. That case
seems to be handled correctly in all callers too.

> __page_pool_alloc_pages_slow() => Will not retry if allocation
> failed. This patch might help.
>
> svc_alloc_arg() => Will not retry if signal pending. This patch might
> help only if allocating a lot of pages.
>
> alloc_pages_bulk_array_node() => Used by vm_area_alloc_pages().
>
> vm_area_alloc_pages() => Used by __vmalloc_area_node() from
> __vmalloc_node_range() from vmalloc functions. Needs !__GFP_NOFAIL
> check?

The comments describing __vmalloc_node() and kvmalloc() claim that
__GFP_NOFAIL is not supported, and I did not find any other callers
that use this flag.