From: "Huang, Ying" <ying.huang@intel.com>
To: Yosry Ahmed
Cc: Minchan Kim, Chris Li, Michal Hocko, Liu Shixin, Yu Zhao, Andrew Morton, Sachin Sant, Johannes Weiner, Kefeng Wang, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v10] mm: vmscan: try to reclaim swapcache pages if no swap space
In-Reply-To: (Yosry Ahmed's message of "Mon, 27 Nov 2023 20:13:38 -0800")
References: <87msv58068.fsf@yhuang6-desk2.ccr.corp.intel.com> <87h6l77wl5.fsf@yhuang6-desk2.ccr.corp.intel.com> <87bkbf7gz6.fsf@yhuang6-desk2.ccr.corp.intel.com> <87msuy5zuv.fsf@yhuang6-desk2.ccr.corp.intel.com> <87fs0q5xsq.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Tue, 28 Nov 2023 13:37:05 +0800
Message-ID: <87bkbe5tha.fsf@yhuang6-desk2.ccr.corp.intel.com>
Yosry Ahmed writes:

> On Mon, Nov 27, 2023 at 8:05 PM Huang, Ying wrote:
>>
>> Yosry Ahmed writes:
>>
>> > On Mon, Nov 27, 2023 at 7:21 PM Huang, Ying wrote:
>> >>
>> >> Yosry Ahmed writes:
>> >>
>> >> > On Mon, Nov 27, 2023 at 1:32 PM Minchan Kim wrote:
>> >> >>
>> >> >> On Mon, Nov 27, 2023 at 12:22:59AM -0800, Chris Li wrote:
>> >> >> > On Mon, Nov 27, 2023 at 12:14 AM Huang, Ying wrote:
>> >> >> > > > I agree with Ying that anonymous pages typically have different page
>> >> >> > > > access patterns than file
>> >> >> > > > pages, so we might want to treat them
>> >> >> > > > differently to reclaim them effectively.
>> >> >> > > > One random idea:
>> >> >> > > > How about we put the anonymous pages in the swap cache on a different LRU
>> >> >> > > > than the rest of the anonymous pages.  Then shrinking against those
>> >> >> > > > pages in the swap cache would be more effective.  Instead of having
>> >> >> > > > [anon, file] LRUs, now we have [anon not in swap cache, anon in swap
>> >> >> > > > cache, file] LRUs.
>> >> >> > >
>> >> >> > > I don't think that it is necessary.  The patch is only for a special use
>> >> >> > > case, where the swap device is used up while some pages are in the swap
>> >> >> > > cache.  The patch will kill performance, but it is used to avoid OOM
>> >> >> > > only, not to improve performance.  Per my understanding, we will not use
>> >> >> > > up swap device space in most cases.  This may be true for ZRAM, but will
>> >> >> > > we keep pages in the swap cache for long when we use ZRAM?
>> >> >> >
>> >> >> > I asked the question of how many pages can be freed by this patch
>> >> >> > in this email thread as well, but haven't got an answer from the
>> >> >> > author yet.  That is one important aspect in evaluating how valuable
>> >> >> > that patch is.
>> >> >>
>> >> >> Exactly.  Since the swap cache has a different lifetime than the page cache,
>> >> >> it is usually dropped when pages are unmapped (unless they are shared
>> >> >> with others, but anon is usually exclusively private), so I wonder how much
>> >> >> memory we can save.
>> >> >
>> >> > I think the point of this patch is not saving memory, but rather
>> >> > avoiding an OOM condition that will happen if we have no swap space
>> >> > left, but some pages left in the swap cache.  Of course, the OOM
>> >> > avoidance will come at the cost of extra work in reclaim to swap those
>> >> > pages out.
>> >> > The only case where I think this might be harmful is if there's plenty
>> >> > of pages to reclaim on the file LRU, and instead we opt to chase down
>> >> > the few swap cache pages.  So perhaps we can add a check to only set
>> >> > sc->swapcache_only if the number of pages in the swap cache is more
>> >> > than the number of pages on the file LRU or similar?  Just make sure we
>> >> > don't chase the swapcache pages down if there's plenty to scan on the
>> >> > file LRU?
>> >>
>> >> The swap cache pages can be divided into 3 groups.
>> >>
>> >> - group 1: pages that have been written out, are at the tail of the
>> >>   inactive LRU, but have not been reclaimed yet.
>> >>
>> >> - group 2: pages that have been written out, but failed to be reclaimed
>> >>   (e.g., were accessed before reclaiming).
>> >>
>> >> - group 3: pages that have been swapped in, but were kept in the swap
>> >>   cache.  These pages may be on the active LRU.
>> >>
>> >> The main target of the original patch should be group 1.  And these pages
>> >> may be cheaper to reclaim than file pages.
>> >>
>> >> Group 2 pages are hard to reclaim if their swap_count() isn't 0.
>> >>
>> >> Group 3 pages should be reclaimed in theory, but the overhead may be high.
>> >> And we may need to reclaim the swap entries instead of the pages if the
>> >> pages are hot.  But we can start to reclaim the swap entries before the
>> >> swap space runs out.
>> >>
>> >> So, if we can count group 1, we may use that as an indicator to scan anon
>> >> pages.  And we may add code to reclaim group 3 earlier.
>> >>
>> >
>> > My point was not that reclaiming the pages in the swap cache is more
>> > expensive than reclaiming the pages in the file LRU.  In a lot of
>> > cases, as you point out, the pages in the swap cache can just be
>> > dropped, so they may be as cheap or cheaper to reclaim than the pages
>> > in the file LRU.
>> >
>> > My point was that scanning the anon LRU when swap space is exhausted
>> > to get to the pages in the swap cache may be much more expensive,
>> > because there may be a lot of pages on the anon LRU that are not in
>> > the swap cache, and hence are not reclaimable, unlike pages in the
>> > file LRU, which should mostly be reclaimable.
>> >
>> > So what I am saying is that maybe we should not expend the effort of
>> > scanning the anon LRU in the swapcache_only case unless there aren't a
>> > lot of pages to reclaim on the file LRU (relatively).  For example, if
>> > we have 100 pages in the swap cache out of 10000 pages in the anon
>> > LRU, and there are 10000 pages in the file LRU, it's probably not
>> > worth scanning the anon LRU.
>>
>> For group 1 pages, they are at the tail of the anon inactive LRU, so the
>> scan overhead is low too.  For example, if the number of group 1 pages is
>> 100, we just need to scan 100 pages to reclaim them.  We can choose to
>> stop scanning when the number of non-group-1 pages scanned reaches some
>> threshold.
>>
>
> We should still try to reclaim pages in groups 2 & 3 before OOMing,
> though.  Maybe the motivation for this patch is group 1, but I don't
> see why we should special-case them.  Pages in groups 2 & 3 should be
> roughly equally cheap to reclaim.  They may have a higher refault cost,
> but IIUC we should still try to reclaim them before OOMing.

The scan cost of group 3 may be high; you may need to scan all anonymous
pages to identify them.  The reclaim cost of group 2 may be high; it may
just cause thrashing (for shared pages that are actually accessed by just
one process).

So I think that we can allow reclaiming group 1 in all cases, try to
reclaim swap entries for group 3 during normal LRU scanning after more
than half of the swap space limit is used, and, as a last resort before
OOM, try to reclaim group 2 and group 3.  Or, limit the scan count for
group 2 and group 3.

BTW, in some situations, OOM is not the worst outcome.
For example, thrashing may kill interactive latency, while killing the
memory hog (which may be caused by a memory leak) preserves system
response time.

--
Best Regards,
Huang, Ying