From: Yosry Ahmed
Date: Fri, 14 Jun 2024 17:19:24 -0700
Subject: Re: [PATCH v1 0/3] mm: zswap: global shrinker fix and proactive shrink
To: Takero Funaki
Cc: Nhat Pham, Johannes Weiner, Chengming Zhou, Jonathan Corbet, Andrew Morton, Domenico Cerasuolo, linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
References: <20240608155316.451600-1-flintglass@gmail.com>

On Thu, Jun 13, 2024 at 9:09 PM Takero Funaki wrote:
>
> On Fri, Jun 14, 2024 at 0:22, Nhat Pham wrote:
> >
> > Taking a step back from the correctness conversation, could you
> > include in the changelog of the patches and cover letter a realistic
> > scenario, along with user space-visible metrics that show (ideally all
> > 4, but at least some of the following):
> >
> > 1. A user problem (that affects performance, or usability, etc.) is happening.
> >
> > 2.
> > The root cause is what we are trying to fix (e.g., in patch 1, we
> > are skipping over memcgs unnecessarily in the global shrinker loop).
> >
> > 3. The fix alleviates the root cause in 2.
> >
> > 4. The userspace-visible problem goes away or is less serious.
> >
>
> Thank you for your suggestions.
> For quick response before submitting v2,

Thanks for all the info, this should be in the cover letter or commit
messages in some shape or form.

>
> 1.
> The visible issue is that pageout/in operations from active processes
> are slow when zswap is near its max pool size. This is particularly
> significant on small memory systems, where total swap usage exceeds
> what zswap can store. This means that old pages occupy most of the
> zswap pool space, and recent pages use swap disk directly.

This should be a transient state though, right? Once the shrinker
kicks in, it should write back the old pages and make space for the hot
ones. Which takes us to our next point.

>
> 2.
> This issue is caused by zswap keeping the pool size near 100%. Since
> the shrinker fails to shrink the pool to accept_thr_percent and zswap
> rejects incoming pages, rejection occurs more frequently than it
> should. The rejected pages are directly written to disk while zswap
> protects old pages from eviction, leading to slow pageout/in
> performance for recent pages on the swap disk.

Why is the shrinker failing? IIUC the first two patches fix two cases
where the shrinker stumbles upon offline memcgs, or memcgs with no
zswapped pages. Are these cases common enough in your use case that
every single time the shrinker runs it hits MAX_RECLAIM_RETRIES before
putting the zswap usage below accept_thr_percent?

This would be surprising given that we should be restarting the
shrinker with every swapout attempt until we can accept pages again.
I guess one could construct a malicious case where there are some
sticky offline memcgs, and all the memcgs that actually have zswap
pages come after them in the iteration order.

Could you shed more light on this? What does the setup look like?
How many memcgs are there, how many of them use zswap, and how many
offline memcgs are you observing?

I am not saying we shouldn't fix these problems anyway, I am just
trying to understand how we got into this situation to begin with.

>
> 3.
> If the pool size were shrunk proactively, rejection by pool limit hits
> would be less likely. New incoming pages could be accepted as the pool
> gains some space in advance, while older pages are written back in the
> background. zswap would then be filled with recent pages, as expected
> in the LRU logic.

I suspect if patches 1 and 2 fix your problem, the shrinker invoked
from reclaim should be doing this sort of "proactive shrinking".

I agree that the current hysteresis around accept_thr_percent is not
good enough, but I am surprised you are hitting the pool limit if the
shrinker is being run during reclaim.

>
> Patch 1 and 2 make the shrinker reduce the pool to accept_thr_percent.
> Patch 3 makes zswap_store trigger the shrinker before reaching the max
> pool size. With this series, zswap will prepare some space to reduce
> the probability of problematic pool_limit_hit situation, thus reducing
> slow reclaim and the page priority inversion against LRU.
>
> 4.
> Once proactive shrinking reduces the pool size, pageouts complete
> instantly as long as the space prepared by shrinking can store the
> direct reclaim. If an admin sees a large pool_limit_hit, lowering
> accept_threshold_percent will improve active process performance.

I agree that proactive shrinking is preferable to waiting until we hit
the pool limit and then refusing pages until usage drops below the
acceptance threshold.
I am just trying to understand whether such a proactive shrinking
mechanism will be needed if the reclaim shrinker for zswap is being
used, and how the two would work together.
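
For readers following along, the hysteresis discussed in this thread
(hit the pool limit, then reject stores until usage falls back to the
acceptance threshold) can be illustrated with a toy model. This is only
a sketch in Python, not the kernel code: the class ToyPool and all of
its fields are hypothetical names made up for illustration, and the
model counts whole pages instead of percentages of RAM.

```python
from dataclasses import dataclass


@dataclass
class ToyPool:
    """Toy model (hypothetical, not kernel code) of a pool with a hard
    limit and an accept threshold below it."""
    max_pages: int              # hard limit, stands in for the max pool size
    accept_thr_percent: int     # stands in for the accept threshold
    pages: int = 0
    reached_full: bool = False  # set once the hard limit has been hit
    limit_hits: int = 0         # loosely analogous to a pool_limit_hit counter
    rejected: int = 0           # stores that would go to the swap device

    def accept_pages(self) -> int:
        # Usage must be at or below this before stores are accepted again.
        return self.max_pages * self.accept_thr_percent // 100

    def store(self) -> bool:
        """Try to store one page; False means the page was rejected."""
        if self.pages >= self.max_pages:
            self.reached_full = True
            self.limit_hits += 1
        if self.reached_full:
            if self.pages > self.accept_pages():
                self.rejected += 1
                return False
            self.reached_full = False
        self.pages += 1
        return True

    def shrink(self, nr: int) -> int:
        """Write back up to nr old pages; return how many were written."""
        done = min(nr, self.pages)
        self.pages -= done
        return done
```

In this model, a shrinker that gives up before reaching the accept
threshold leaves the pool rejecting every new store even though it is
no longer at the hard limit, which is the situation described in point
2 of the quoted mail.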