From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 2 Apr 2021 17:44:47 -0700
From: Andrew Morton
To: Stillinux
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, liuzhengyuan@kylinos.cn, liuyun01@kylinos.cn, Johannes Weiner, Hugh Dickins
Subject: Re: [RFC PATCH] mm/swap: fix system stuck due to infinite loop
Message-Id: <20210402174447.2abccc77cdca5cad67756d55@linux-foundation.org>

On Fri, 2 Apr 2021 15:03:37 +0800 Stillinux wrote:

> Under high memory and load pressure, we ran the LTP test suite and found
> that the system got stuck: every task doing direct memory reclaim was
> blocked in io_schedule(), the request they were all waiting on was parked
> in the blk_plug of one process, and that process had fallen into an
> infinite loop, never flushing the plugged request out.
>
> The call flow of that process is swap_cluster_readahead.
> swap_cluster_readahead() brackets its readahead loop with
> blk_start_plug()/blk_finish_plug(), and the flow is
> swap_cluster_readahead->__read_swap_cache_async->swapcache_prepare.
> When swapcache_prepare() returns -EEXIST, __read_swap_cache_async()
> falls into an infinite retry loop. Even though cond_resched() is called,
> the scheduler only flushes plugged requests via sched_submit_work()
> depending on tsk->state, and a still-runnable task does not flush them.
> So the plugged read is never submitted, the I/O hangs, and the whole
> system hangs with it.
>
> This is our first time working in the swap code and we have not found a
> way to fix the problem at its root. As an engineering workaround, we
> chose to make swap_cluster_readahead() feel memory pressure as early as
> possible and io_schedule() to flush out the blk_plug request, by
> changing the allocation flag in swap_readpage() to GFP_NOIO so the
> allocation no longer enters reclaim that must flush I/O. The system
> then runs normally, but this is not the most fundamental fix.
> Thanks.

I'm not understanding why swapcache_prepare() repeatedly returns -EEXIST
in this situation?

And how does the switch to GFP_NOIO fix this?  Simply by avoiding direct
reclaim altogether?

> ---
>  mm/page_io.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/page_io.c b/mm/page_io.c
> index c493ce9ebcf5..87392ffabb12 100644
> --- a/mm/page_io.c
> +++ b/mm/page_io.c
> @@ -403,7 +403,7 @@ int swap_readpage(struct page *page, bool synchronous)
>  	}
>
>  	ret = 0;
> -	bio = bio_alloc(GFP_KERNEL, 1);
> +	bio = bio_alloc(GFP_NOIO, 1);
>  	bio_set_dev(bio, sis->bdev);
>  	bio->bi_opf = REQ_OP_READ;
>  	bio->bi_iter.bi_sector = swap_page_sector(page);