Subject: Re: [RFC] mm/vmscan.c: avoid possible long latency caused by too_many_isolated()
To: Hillf Danton
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, ying.huang@intel.com, tim.c.chen@linux.intel.com, Shakeel Butt, Michal Hocko, yuzhao@google.com, wfg@mail.ustc.edu.cn
References: <20210416023536.168632-1-zhengjun.xing@linux.intel.com> <20210422102325.1332-1-hdanton@sina.com>
In-Reply-To: <20210422102325.1332-1-hdanton@sina.com>
From: Xing Zhengjun
Message-ID: <9795a050-12a4-55c6-13e1-969cd4bbf795@linux.intel.com>
Date: Fri, 30 Apr 2021 13:33:57 +0800

Hi Hillf,

On 4/22/2021 6:23 PM, Hillf Danton wrote:
> Hi Zhengjun
>
> On Thu, 22 Apr 2021 16:36:19 +0800 Zhengjun Xing wrote:
>> In the system with very few file pages (nr_active_file +
>> nr_inactive_file < 100), it is easy to reproduce "nr_isolated_file >
>> nr_inactive_file", then too_many_isolated return true,
>> shrink_inactive_list enter "msleep(100)", the long latency will happen.
>
> We should skip reclaiming page cache in this case.
>>
>> The test case to reproduce it is very simple: allocate many huge
>> pages(near the DRAM size), then do free, repeat the same operation many
>> times.
>> In the test case, the system with very few file pages (nr_active_file +
>> nr_inactive_file < 100), I have dumpped the numbers of
>> active/inactive/isolated file pages during the whole test(see in the
>> attachments) , in shrink_inactive_list "too_many_isolated" is very easy
>> to return true, then enter "msleep(100)",in "too_many_isolated"
>> sc->gfp_mask is 0x342cca ("_GFP_IO" and "__GFP_FS" is masked) , it is
>> also very easy to enter “inactive >>=3”, then “isolated > inactive” will
>> be true.
>>
>> So I have a proposal to set a threshold number for the total file pages
>> to ignore the system with very few file pages, and then bypass the 100ms
>> sleep.
>> It is hard to set a perfect number for the threshold, so I just give an
>> example of "256" for it.
>
> Another option seems like we take a nap at the second time of lru tmi
> with some allocators in your case served without the 100ms delay.
>
> +++ x/mm/vmscan.c
> @@ -118,6 +118,9 @@ struct scan_control {
>  	/* The file pages on the current node are dangerously low */
>  	unsigned int file_is_tiny:1;
>
> +	unsigned int file_tmi:1; /* too many isolated */
> +	unsigned int anon_tmi:1;
> +
>  	/* Allocation order */
>  	s8 order;
>
> @@ -1905,6 +1908,21 @@ static int current_may_throttle(void)
>  		bdi_write_congested(current->backing_dev_info);
>  }
>
> +static void update_sc_tmi(struct scan_control *sc, bool file, int set)
> +{
> +	if (file)
> +		sc->file_tmi = set;
> +	else
> +		sc->anon_tmi = set;
> +}
> +static bool is_sc_tmi(struct scan_control *sc, bool file)
> +{
> +	if (file)
> +		return sc->file_tmi != 0;
> +	else
> +		return sc->anon_tmi != 0;
> +}
> +
>  /*
>   * shrink_inactive_list() is a helper for shrink_node().  It returns the number
>   * of reclaimed pages
> @@ -1927,6 +1945,11 @@ shrink_inactive_list(unsigned long nr_to
>  		if (stalled)
>  			return 0;
>
> +		if (!is_sc_tmi(sc, file)) {
> +			update_sc_tmi(sc, file, 1);
> +			return 0;
> +		}
> +
>  		/* wait a bit for the reclaimer. */
>  		msleep(100);
>  		stalled = true;
> @@ -1936,6 +1959,9 @@ shrink_inactive_list(unsigned long nr_to
>  		return SWAP_CLUSTER_MAX;
>  	}
>
> +	if (is_sc_tmi(sc, file))
> +		update_sc_tmi(sc, file, 0);
> +
>  	lru_add_drain();
>
>  	spin_lock_irq(&lruvec->lru_lock);
>

I tested this patch with my compaction test case. The 100ms sleep can
still be reproduced, at a ratio of about 1/10:

 60) @ 103942.6 us |  shrink_node();
 60) @ 103795.8 us |  shrink_node();

-- 
Zhengjun Xing
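
The throttling condition described in the quoted report can be illustrated with a small user-space sketch. It is not the kernel's too_many_isolated(), which reads per-node counters and has further exemptions. The function name and the two SKETCH_GFP_* constants are hypothetical; the bit values are an assumption chosen to be consistent with the report's statement that both flags are set in gfp_mask 0x342cca.

```c
#include <stdbool.h>

/* Assumed flag bits, for illustration only (hypothetical names). */
#define SKETCH_GFP_IO 0x40u
#define SKETCH_GFP_FS 0x80u

/*
 * Sketch of the check discussed in the thread: when the caller's mask
 * has both the IO and FS bits set, the allowed number of isolated pages
 * shrinks to 1/8 of the inactive count ("inactive >>= 3"). The caller
 * is throttled when the isolated count exceeds that limit.
 */
static bool too_many_isolated_sketch(unsigned long inactive,
				     unsigned long isolated,
				     unsigned int gfp_mask)
{
	const unsigned int both = SKETCH_GFP_IO | SKETCH_GFP_FS;

	if ((gfp_mask & both) == both)
		inactive >>= 3;

	return isolated > inactive;
}
```

For example, with 40 inactive file pages and 6 isolated ones, a mask of 0x342cca lowers the limit to 40 >> 3 = 5, so the check returns true. A mask without both bits leaves the limit at 40 and the check returns false. With fewer than 8 inactive pages the limit becomes 0, so a single isolated page is enough to trigger the sleep, which matches the behaviour reported on systems with very few file pages.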