From: "Huang, Ying"
To: Michal Hocko
Cc: Yosry Ahmed, Minchan Kim, Chris Li, Liu Shixin, Yu Zhao, Andrew Morton, Sachin Sant, Johannes Weiner, Kefeng Wang, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v10] mm: vmscan: try to reclaim swapcache pages if no swap space
In-Reply-To: Michal Hocko's message of "Wed, 29 Nov 2023 11:22:23 +0100"
Date: Thu, 30 Nov 2023 16:07:48 +0800

Michal Hocko writes:

> On Tue 28-11-23 11:19:20, Huang, Ying wrote:
>> Yosry Ahmed writes:
>>
>> > On Mon, Nov 27, 2023 at 1:32 PM Minchan Kim wrote:
>> >>
>> >> On Mon, Nov 27, 2023 at 12:22:59AM -0800, Chris Li wrote:
>> >> > On Mon, Nov 27, 2023 at 12:14 AM Huang, Ying wrote:
>> >> > > > I agree with Ying that anonymous pages typically have different page
>> >> > > > access patterns than file pages, so we might want to treat them
>> >> > > > differently to reclaim them effectively.
>> >> > > > One random idea:
>> >> > > > How about we put the anonymous page in a swap cache in a different LRU
>> >> > > > than the rest of the anonymous pages. Then shrinking against those
>> >> > > > pages in the swap cache would be more effective. Instead of having
>> >> > > > [anon, file] LRU, now we have [anon not in swap cache, anon in swap
>> >> > > > cache, file] LRU
>> >> > >
>> >> > > I don't think that it is necessary. The patch is only for a special use
>> >> > > case. Where the swap device is used up while some pages are in swap
>> >> > > cache. The patch will kill performance, but it is used to avoid OOM
>> >> > > only, not to improve performance. Per my understanding, we will not use
>> >> > > up swap device space in most cases. This may be true for ZRAM, but will
>> >> > > we keep pages in swap cache for long when we use ZRAM?
>> >> >
>> >> > I ask the question regarding how many pages can be freed by this patch
>> >> > in this email thread as well, but haven't got the answer from the
>> >> > author yet. That is one important aspect to evaluate how valuable is
>> >> > that patch.
>> >>
>> >> Exactly. Since swap cache has different life time with page cache, they
>> >> would be usually dropped when pages are unmapped (unless they are shared
>> >> with others but anon is usually exclusive private) so I wonder how much
>> >> memory we can save.
>> >
>> > I think the point of this patch is not saving memory, but rather
>> > avoiding an OOM condition that will happen if we have no swap space
>> > left, but some pages left in the swap cache. Of course, the OOM
>> > avoidance will come at the cost of extra work in reclaim to swap those
>> > pages out.
>> >
>> > The only case where I think this might be harmful is if there's plenty
>> > of pages to reclaim on the file LRU, and instead we opt to chase down
>> > the few swap cache pages.
>> > So perhaps we can add a check to only set
>> > sc->swapcache_only if the number of pages in the swap cache is more
>> > than the number of pages on the file LRU or similar? Just make sure we
>> > don't chase the swapcache pages down if there's plenty to scan on the
>> > file LRU?
>>
>> The swap cache pages can be divided to 3 groups.
>>
>> - group 1: pages have been written out, at the tail of inactive LRU, but
>>   not reclaimed yet.
>>
>> - group 2: pages have been written out, but were failed to be reclaimed
>>   (e.g., were accessed before reclaiming)
>>
>> - group 3: pages have been swapped in, but were kept in swap cache. The
>>   pages may be in active LRU.
>>
>> The main target of the original patch should be group 1. And the pages
>> may be cheaper to reclaim than file pages.
>
> Thanks this is really useful summary. And it begs question. How are we
> telling those different types from each other? vmstat counter is
> certainly not sufficient and that means we might be scanning a lot
> without actually making any progress. And doing that repeatedly.

We don't have counters for the pages in the individual groups.

Pages in group 1 and some pages in group 2 are always at the tail of the
inactive LRU, so we can identify them relatively easily. A simple method
could be: if there are swap cache pages, try to scan the tail of the
inactive LRU to free the pages in group 1 and move the pages in group 2.
If we find that some pages aren't in the swap cache, there may be no
pages left in group 1. Then, we may give up scanning if the memory
pressure isn't too large.

One possible issue is that some pages (map count == 1) may be swapped
out, swapped in, and then deleted from the swap cache. So some pages that
are not in the swap cache may be at the end of the inactive LRU too.

--
Best Regards,
Huang, Ying
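A minimal user-space sketch of the tail-scan policy described above, in C. It assumes a toy model in which the inactive anon LRU is an array whose last element is the tail, and that every unreferenced swap cache page found at the tail has already been written out (group 1). The names struct sim_page, struct scan_result, scan_inactive_tail() and the high_pressure flag are hypothetical and for illustration only; they are not kernel APIs, and "moving" a group 2 page is reduced to setting a flag instead of rotating or activating a real folio.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a page on the inactive anon LRU.
 * None of these names are kernel APIs. */
struct sim_page {
	bool in_swapcache; /* page still has a swap cache entry */
	bool referenced;   /* accessed after being written out (group 2) */
	bool reclaimed;    /* set by the scan when the page is freed */
	bool moved;        /* set by the scan when the page is moved off the tail */
};

struct scan_result {
	size_t freed;   /* group 1 pages freed */
	size_t moved;   /* group 2 pages moved off the tail */
	size_t scanned; /* pages examined, including the one that stopped the scan */
};

/*
 * Scan the toy inactive LRU from its tail (the highest index) when no swap
 * space is left, so only swap cache pages can be reclaimed.
 * high_pressure is a hypothetical stand-in for "memory pressure is large".
 */
static struct scan_result scan_inactive_tail(struct sim_page *lru, size_t n,
					     size_t nr_to_scan, bool high_pressure)
{
	struct scan_result res = {0, 0, 0};

	assert(lru != NULL || n == 0);

	for (size_t i = n; i > 0 && res.scanned < nr_to_scan; i--) {
		struct sim_page *page = &lru[i - 1];

		res.scanned++;
		if (!page->in_swapcache) {
			/* Hint that group 1 may be exhausted. Only a hint: a page
			 * may have been swapped out, swapped in and then dropped
			 * from the swap cache. Keep going only under high pressure. */
			if (!high_pressure)
				break;
			continue;
		}
		if (page->referenced) {
			/* Group 2: written out but accessed again; move it away. */
			page->moved = true;
			res.moved++;
		} else {
			/* Group 1 (by this model's assumption): the swap copy is
			 * valid, so drop the page from the swap cache and free it. */
			page->in_swapcache = false;
			page->reclaimed = true;
			res.freed++;
		}
	}
	return res;
}
```

A caller would pass the LRU, its length, a scan budget and a pressure hint, then compare freed against scanned to judge whether further scanning is making progress, which is the concern raised above about scanning a lot repeatedly without progress.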