From mboxrd@z Thu Jan 1 00:00:00 1970
References: <87msv58068.fsf@yhuang6-desk2.ccr.corp.intel.com> <87h6l77wl5.fsf@yhuang6-desk2.ccr.corp.intel.com> <87bkbf7gz6.fsf@yhuang6-desk2.ccr.corp.intel.com> <87msuy5zuv.fsf@yhuang6-desk2.ccr.corp.intel.com> <87fs0q5xsq.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <87fs0q5xsq.fsf@yhuang6-desk2.ccr.corp.intel.com>
From: Yosry Ahmed
Date: Mon, 27 Nov 2023 20:13:38 -0800
Subject: Re: [PATCH v10] mm: vmscan: try to reclaim swapcache pages if no swap space
To: "Huang, Ying"
Cc: Minchan Kim, Chris Li, Michal Hocko, Liu Shixin, Yu Zhao, Andrew Morton, Sachin Sant, Johannes Weiner, Kefeng Wang, linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Mon, Nov 27, 2023 at 8:05 PM Huang, Ying wrote:
>
> Yosry Ahmed writes:
>
> > On Mon, Nov 27, 2023 at 7:21 PM Huang, Ying wrote:
> >>
> >> Yosry Ahmed writes:
> >>
> >> > On Mon, Nov 27, 2023 at 1:32 PM Minchan Kim wrote:
> >> >>
> >> >> On Mon, Nov 27, 2023 at 12:22:59AM -0800, Chris Li wrote:
> >> >> > On Mon, Nov 27, 2023 at 12:14 AM Huang, Ying wrote:
> >> >> > > > I agree with Ying that anonymous pages typically have different page
> >> >> > > > access patterns than file pages, so we might want to treat them
> >> >> > > > differently to reclaim them effectively.
> >> >> > > > One random idea:
> >> >> > > > How about we put the anonymous page in a swap cache in a different LRU
> >> >> > > > than the rest of the anonymous pages. Then shrinking against those
> >> >> > > > pages in the swap cache would be more effective.Instead of having
> >> >> > > > [anon, file] LRU, now we have [anon not in swap cache, anon in swap
> >> >> > > > cache, file] LRU
> >> >> > >
> >> >> > > I don't think that it is necessary.  The patch is only for a special use
> >> >> > > case.  Where the swap device is used up while some pages are in swap
> >> >> > > cache.  The patch will kill performance, but it is used to avoid OOM
> >> >> > > only, not to improve performance.  Per my understanding, we will not use
> >> >> > > up swap device space in most cases.  This may be true for ZRAM, but will
> >> >> > > we keep pages in swap cache for long when we use ZRAM?
> >> >> >
> >> >> > I ask the question regarding how many pages can be freed by this patch
> >> >> > in this email thread as well, but haven't got the answer from the
> >> >> > author yet. That is one important aspect to evaluate how valuable is
> >> >> > that patch.
> >> >>
> >> >> Exactly. Since swap cache has different life time with page cache, they
> >> >> would be usually dropped when pages are unmapped(unless they are shared
> >> >> with others but anon is usually exclusive private) so I wonder how much
> >> >> memory we can save.
> >> >
> >> > I think the point of this patch is not saving memory, but rather
> >> > avoiding an OOM condition that will happen if we have no swap space
> >> > left, but some pages left in the swap cache. Of course, the OOM
> >> > avoidance will come at the cost of extra work in reclaim to swap those
> >> > pages out.
> >> >
> >> > The only case where I think this might be harmful is if there's plenty
> >> > of pages to reclaim on the file LRU, and instead we opt to chase down
> >> > the few swap cache pages. So perhaps we can add a check to only set
> >> > sc->swapcache_only if the number of pages in the swap cache is more
> >> > than the number of pages on the file LRU or similar? Just make sure we
> >> > don't chase the swapcache pages down if there's plenty to scan on the
> >> > file LRU?
> >>
> >> The swap cache pages can be divided to 3 groups.
> >>
> >> - group 1: pages have been written out, at the tail of inactive LRU, but
> >>   not reclaimed yet.
> >>
> >> - group 2: pages have been written out, but were failed to be reclaimed
> >>   (e.g., were accessed before reclaiming)
> >>
> >> - group 3: pages have been swapped in, but were kept in swap cache.  The
> >>   pages may be in active LRU.
> >>
> >> The main target of the original patch should be group 1.  And the pages
> >> may be cheaper to reclaim than file pages.
> >>
> >> Group 2 are hard to be reclaimed if swap_count() isn't 0.
> >>
> >> Group 3 should be reclaimed in theory, but the overhead may be high.
> >> And we may need to reclaim the swap entries instead of pages if the pages
> >> are hot.  But we can start to reclaim the swap entries before the swap
> >> space is run out.
> >>
> >> So, if we can count group 1, we may use that as indicator to scan anon
> >> pages.  And we may add code to reclaim group 3 earlier.
> >>
> >
> > My point was not that reclaiming the pages in the swap cache is more
> > expensive that reclaiming the pages in the file LRU. In a lot of
> > cases, as you point out, the pages in the swap cache can just be
> > dropped, so they may be as cheap or cheaper to reclaim than the pages
> > in the file LRU.
> >
> > My point was that scanning the anon LRU when swap space is exhausted
> > to get to the pages in the swap cache may be much more expensive,
> > because there may be a lot of pages on the anon LRU that are not in
> > the swap cache, and hence are not reclaimable, unlike pages in the
> > file LRU, which should mostly be reclaimable.
> >
> > So what I am saying is that maybe we should not do the effort of
> > scanning the anon LRU in the swapcache_only case unless there aren't a
> > lot of pages to reclaim on the file LRU (relatively). For example, if
> > we have a 100 pages in the swap cache out of 10000 pages in the anon
> > LRU, and there are 10000 pages in the file LRU, it's probably not
> > worth scanning the anon LRU.
>
> For group 1 pages, they are at the tail of the anon inactive LRU, so the
> scan overhead is low too.  For example, if number of group 1 pages is
> 100, we just need to scan 100 pages to reclaim them.  We can choose to
> stop scanning when the number of the non-group-1 pages reached some
> threshold.
>

We should still try to reclaim pages in groups 2 & 3 before OOMing
though. Maybe the motivation for this patch is group 1, but I don't
see why we should special case them. Pages in groups 2 & 3 should be
roughly equally cheap to reclaim. They may have higher refault cost,
but IIUC we should still try to reclaim them before OOMing.
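
As a rough illustration of the heuristic being discussed, here is a
userspace sketch, not kernel code. The struct lru_counts type, the
want_swapcache_only_scan() helper and the last_resort flag are all
hypothetical names made up for this example; only the idea of gating
sc->swapcache_only on "swap cache pages vs. file LRU pages" and of
still trying before OOM comes from the thread. The exact comparison is
an assumption, and a real check could well use a ratio or a threshold
instead.

```c
#include <stdbool.h>

/* Hypothetical snapshot of LRU sizes; this is not a kernel structure. */
struct lru_counts {
	unsigned long anon;       /* pages on the anon LRU */
	unsigned long swapcache;  /* anon pages that are also in the swap cache */
	unsigned long file;       /* pages on the file LRU */
};

/*
 * Decide whether scanning the anon LRU only for swap cache pages looks
 * worthwhile once swap space is exhausted.
 *
 * Assumed policy (illustrative only):
 *  - nothing in the swap cache: never scan, there is nothing to find;
 *  - last_resort (we would otherwise OOM): always scan, so that all
 *    swap cache pages get a chance to be reclaimed;
 *  - otherwise: scan only if the swap cache holds more pages than the
 *    file LRU, to avoid chasing a few pages through a long anon LRU.
 */
static bool want_swapcache_only_scan(const struct lru_counts *c,
				     bool last_resort)
{
	if (c->swapcache == 0)
		return false;
	if (last_resort)
		return true;
	return c->swapcache > c->file;
}
```

With the numbers from the example above (100 swap cache pages, 10000
anon pages, 10000 file pages) this sketch skips the anon scan during
normal reclaim, but still allows it when called with last_resort set.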