From: Yosry Ahmed <yosryahmed@google.com>
Date: Mon, 27 Nov 2023 21:41:54 -0800
Subject: Re: [PATCH v10] mm: vmscan: try to reclaim swapcache pages if no swap space
To: "Huang, Ying"
Cc: Minchan Kim, Chris Li, Michal Hocko, Liu Shixin, Yu Zhao, Andrew Morton,
 Sachin Sant, Johannes Weiner, Kefeng Wang, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
In-Reply-To: <87bkbe5tha.fsf@yhuang6-desk2.ccr.corp.intel.com>
On Mon, Nov 27, 2023 at 9:39 PM Huang, Ying wrote:
>
> Yosry Ahmed writes:
>
> > On Mon, Nov 27, 2023 at 8:05 PM Huang, Ying wrote:
> >>
> >> Yosry Ahmed writes:
> >>
> >> > On Mon, Nov 27, 2023 at 7:21 PM Huang, Ying wrote:
> >> >>
> >> >> Yosry Ahmed writes:
> >> >>
> >> >> > On Mon, Nov 27, 2023 at 1:32 PM Minchan Kim wrote:
> >> >> >>
> >> >> >> On Mon, Nov 27, 2023 at 12:22:59AM -0800, Chris Li wrote:
> >> >> >> > On Mon, Nov 27,
> >> >> >> > 2023 at 12:14 AM Huang, Ying wrote:
> >> >> >> > > > I agree with Ying that anonymous pages typically have different page
> >> >> >> > > > access patterns than file pages, so we might want to treat them
> >> >> >> > > > differently to reclaim them effectively.
> >> >> >> > > > One random idea:
> >> >> >> > > > How about we put the anonymous pages in the swap cache on a different LRU
> >> >> >> > > > than the rest of the anonymous pages. Then shrinking against those
> >> >> >> > > > pages in the swap cache would be more effective. Instead of having
> >> >> >> > > > [anon, file] LRUs, we would have [anon not in swap cache, anon in swap
> >> >> >> > > > cache, file] LRUs.
> >> >> >> > >
> >> >> >> > > I don't think that is necessary. The patch is only for a special use
> >> >> >> > > case, where the swap device is used up while some pages are in the swap
> >> >> >> > > cache. The patch will kill performance, but it is used to avoid OOM
> >> >> >> > > only, not to improve performance. Per my understanding, we will not use
> >> >> >> > > up the swap device space in most cases. This may be true for ZRAM, but
> >> >> >> > > will we keep pages in the swap cache for long when we use ZRAM?
> >> >> >> >
> >> >> >> > I asked the question regarding how many pages can be freed by this patch
> >> >> >> > in this email thread as well, but haven't got an answer from the
> >> >> >> > author yet. That is one important aspect in evaluating how valuable that
> >> >> >> > patch is.
> >> >> >>
> >> >> >> Exactly. Since the swap cache has a different lifetime than the page
> >> >> >> cache, its pages would usually be dropped when they are unmapped (unless
> >> >> >> they are shared with others, but anon is usually exclusive private), so
> >> >> >> I wonder how much memory we can save.
> >> >> >
> >> >> > I think the point of this patch is not saving memory, but rather
> >> >> > avoiding an OOM condition that will happen if we have no swap space
> >> >> > left, but some pages left in the swap cache. Of course, the OOM
> >> >> > avoidance will come at the cost of extra work in reclaim to swap those
> >> >> > pages out.
> >> >> >
> >> >> > The only case where I think this might be harmful is if there's plenty
> >> >> > of pages to reclaim on the file LRU, and instead we opt to chase down
> >> >> > the few swap cache pages. So perhaps we can add a check to only set
> >> >> > sc->swapcache_only if the number of pages in the swap cache is more
> >> >> > than the number of pages on the file LRU or similar? Just make sure we
> >> >> > don't chase the swapcache pages down if there's plenty to scan on the
> >> >> > file LRU?
> >> >>
> >> >> The swap cache pages can be divided into 3 groups.
> >> >>
> >> >> - group 1: pages that have been written out and are at the tail of the
> >> >>   inactive LRU, but have not been reclaimed yet.
> >> >>
> >> >> - group 2: pages that have been written out, but failed to be reclaimed
> >> >>   (e.g., were accessed before reclaiming).
> >> >>
> >> >> - group 3: pages that have been swapped in, but were kept in the swap
> >> >>   cache. The pages may be on the active LRU.
> >> >>
> >> >> The main target of the original patch should be group 1. And those
> >> >> pages may be cheaper to reclaim than file pages.
> >> >>
> >> >> Group 2 is hard to reclaim if swap_count() isn't 0.
> >> >>
> >> >> Group 3 should be reclaimed in theory, but the overhead may be high.
> >> >> And we may need to reclaim the swap entries instead of the pages if the
> >> >> pages are hot. But we can start to reclaim the swap entries before the
> >> >> swap space runs out.
> >> >>
> >> >> So, if we can count group 1, we may use that as an indicator to scan
> >> >> anon pages. And we may add code to reclaim group 3 earlier.
> >> >>
> >> > My point was not that reclaiming the pages in the swap cache is more
> >> > expensive than reclaiming the pages in the file LRU. In a lot of
> >> > cases, as you point out, the pages in the swap cache can just be
> >> > dropped, so they may be as cheap or cheaper to reclaim than the pages
> >> > in the file LRU.
> >> >
> >> > My point was that scanning the anon LRU when swap space is exhausted
> >> > to get to the pages in the swap cache may be much more expensive,
> >> > because there may be a lot of pages on the anon LRU that are not in
> >> > the swap cache, and hence are not reclaimable, unlike pages in the
> >> > file LRU, which should mostly be reclaimable.
> >> >
> >> > So what I am saying is that maybe we should not make the effort of
> >> > scanning the anon LRU in the swapcache_only case unless there aren't a
> >> > lot of pages to reclaim on the file LRU (relatively). For example, if
> >> > we have 100 pages in the swap cache out of 10000 pages in the anon
> >> > LRU, and there are 10000 pages in the file LRU, it's probably not
> >> > worth scanning the anon LRU.
> >>
> >> Group 1 pages are at the tail of the anon inactive LRU, so the
> >> scan overhead is low too. For example, if the number of group 1 pages is
> >> 100, we just need to scan 100 pages to reclaim them. We can choose to
> >> stop scanning when the number of non-group-1 pages reaches some
> >> threshold.
> >>
> >
> > We should still try to reclaim pages in groups 2 & 3 before OOMing
> > though. Maybe the motivation for this patch is group 1, but I don't
> > see why we should special-case them. Pages in groups 2 & 3 should be
> > roughly equally cheap to reclaim. They may have a higher refault cost,
> > but IIUC we should still try to reclaim them before OOMing.
>
> The scan cost of group 3 may be high; you may need to scan all anonymous
> pages to identify them.
> The reclaim cost of group 2 may be high; it may
> just cause thrashing (shared pages that are accessed by just one
> process). So I think that we can allow reclaiming group 1 in all cases.
> Try to reclaim swap entries for group 3 during normal LRU scanning after
> more than half of the swap space limit is used. As a last resort before
> OOM, try to reclaim group 2 and group 3. Or, limit the scan count for
> group 2 and group 3.

It would be nice if this could be done auto-magically without having to
keep track of the groups separately.

> BTW, in some situations, OOM is not the worst outcome. For example,
> thrashing may kill interaction latency, while killing the memory hog
> (which may be caused by a memory leak) saves system response time.

I agree that in some situations OOMs are better than thrashing; it's not
an easy problem.