From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4FD02E82CD8 for ; Wed, 27 Sep 2023 20:39:35 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id BC8CE6B0200; Wed, 27 Sep 2023 16:39:34 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id B79256B0201; Wed, 27 Sep 2023 16:39:34 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A40EB6B0202; Wed, 27 Sep 2023 16:39:34 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 91E3C6B0200 for ; Wed, 27 Sep 2023 16:39:34 -0400 (EDT) Received: from smtpin22.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 58CFE140E0A for ; Wed, 27 Sep 2023 20:39:34 +0000 (UTC) X-FDA: 81283543068.22.006413D Received: from mail-ua1-f42.google.com (mail-ua1-f42.google.com [209.85.222.42]) by imf17.hostedemail.com (Postfix) with ESMTP id 42DA740014 for ; Wed, 27 Sep 2023 20:39:32 +0000 (UTC) Authentication-Results: imf17.hostedemail.com; dkim=pass header.d=cmpxchg-org.20230601.gappssmtp.com header.s=20230601 header.b=PiqzmrWG; dmarc=pass (policy=none) header.from=cmpxchg.org; spf=pass (imf17.hostedemail.com: domain of hannes@cmpxchg.org designates 209.85.222.42 as permitted sender) smtp.mailfrom=hannes@cmpxchg.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1695847172; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=Z2gl2b6anOv/WlO5fpZiL7fnlm+S3JfLVNwXnhVppVA=; b=vnZQpAOh903cCUaDAqhMqxOrjxVYLwRncJI451bpJ2LEr5e+cs9vUoYXtx7z5+na1mrReu 2tFg6FviVkW2MRLCSYQZbVAvJ7JwB+baL8Wg+UEXRuobL1zfl3hWWoiAuzC0KvcJ9EA3bF p3GsbMWRN6IwEKlvcmBhA+CXZE5pVrA= ARC-Authentication-Results: i=1; imf17.hostedemail.com; dkim=pass header.d=cmpxchg-org.20230601.gappssmtp.com header.s=20230601 header.b=PiqzmrWG; dmarc=pass (policy=none) header.from=cmpxchg.org; spf=pass (imf17.hostedemail.com: domain of hannes@cmpxchg.org designates 209.85.222.42 as permitted sender) smtp.mailfrom=hannes@cmpxchg.org ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1695847172; a=rsa-sha256; cv=none; b=O18wHm4q7hCwrqDEWYHKqPLOLt1LdIGYLxLWQ5jEP4Be+nc231R3UJSzOfmPyOqJczebfi MlCnCE59jS+aiH9qTAYEaTJHeOCxofFs/s6DS6/ye7bFuXoCAEAZUlSs4yC3rK5a9xeYZB 7+0hAuVINNP8dzY6whn4RoKTb52WMQY= Received: by mail-ua1-f42.google.com with SMTP id a1e0cc1a2514c-7ab9488f2f0so4261235241.3 for ; Wed, 27 Sep 2023 13:39:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20230601.gappssmtp.com; s=20230601; t=1695847171; x=1696451971; darn=kvack.org; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date:from:to :cc:subject:date:message-id:reply-to; bh=Z2gl2b6anOv/WlO5fpZiL7fnlm+S3JfLVNwXnhVppVA=; b=PiqzmrWG2yYX8T/Soaw/C99WQB2GfB1XGeWjx7ETGImUs5SuS6V33HLLm9vLxTfauj DEve4vWZssVZtOvBVGjlR69xFEIxmhNnY3xHMX4XTw2+UGDZDyRCvQluHUmnokulh03w E6JgEcGNWSdl5SzmQS3lWiV+fBCVmDmmbN93z799WgM1Np8oLoPwrpE+7m8UVqasNdmL Zgy5fPAM4EUMSHHfeRUjSpyjqO0OMbtxXRW2uKS2z52VwEEAyEuS9NaDEEywFIdvws5/ gNhoX1WKuQwFHmi607j6O8tEeGdDkIr92/LsFSGFvAzxJRwDU1FwdT4kUkG7Oy80yhr8 tN4w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1695847171; x=1696451971; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=Z2gl2b6anOv/WlO5fpZiL7fnlm+S3JfLVNwXnhVppVA=; b=QKr0E/MnUvg+KdPMNzpcE5XnRvCVPJ7OgWXErwHvI6L7etSrVLt2oo2ORfsKf/fCek 0jsToGt7NtI9EccSYv3or3bHSlmhSh1j9UyY+nyS8GuEK5oiwGr+AG73BN9WDww40KvH HqKUDmykXEwIQmZZIjup6khDcSb6HAaqkwJhh4TpeNI7TWpart5JCCx0argWYFRJcNeC aZU7YAxlgQTHcdUyR1+aoRVLpmwsyrYWdl5nXbRDjS1DTOup3ggxWALlztyqYaaOs9q8 2NePEpHoE2l/37+xcTAqZppgJioFk4Dhz7q3U3zOIu9Vt794t710+H1ejHZuibKO9aZ1 I2SA== X-Gm-Message-State: AOJu0YyLS5oi2HxrQ0OmUMqnoLB9yj26yFTZzqZAaFvrn7COIpBaZ2ey Zf6IYuLpgA/mJt8esCxOGxqkPw== X-Google-Smtp-Source: AGHT+IHYckBTqMPbLNAfsNcVh7/kOSbCbqWFqji9OTMU0IgQOfgIaUmZJgp26NbpR8o1s0L4hy1XIw== X-Received: by 2002:a67:bd05:0:b0:452:b96c:313b with SMTP id y5-20020a67bd05000000b00452b96c313bmr3069566vsq.13.1695847171214; Wed, 27 Sep 2023 13:39:31 -0700 (PDT) Received: from localhost ([2620:10d:c091:400::5:ba06]) by smtp.gmail.com with ESMTPSA id n11-20020a0cdc8b000000b0065b2be40d58sm1426496qvk.25.2023.09.27.13.39.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 27 Sep 2023 13:39:30 -0700 (PDT) Date: Wed, 27 Sep 2023 16:39:29 -0400 From: Johannes Weiner To: Yosry Ahmed Cc: Nhat Pham , akpm@linux-foundation.org, cerasuolodomenico@gmail.com, sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com, mhocko@kernel.org, roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev, linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Chris Li Subject: Re: [PATCH v2 1/2] zswap: make shrinking memcg-aware Message-ID: <20230927203929.GA399644@cmpxchg.org> References: <20230919171447.2712746-1-nphamcs@gmail.com> <20230919171447.2712746-2-nphamcs@gmail.com> <20230926182436.GB348484@cmpxchg.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Rspamd-Queue-Id: 42DA740014 X-Rspam-User: X-Rspamd-Server: rspam04 X-Stat-Signature: nhirbokhhignq69kwx8q6q43nhx9kgs8 X-HE-Tag: 1695847172-990329 X-HE-Meta: U2FsdGVkX1/gUVyLVfrE9O0SBBTAiyaD3OTz5mRmyeWNVPsR5+jwj8WVIgpIVh3PJSPQfkySSNqdPrvExI80AVau5Op+JkbwWHY09OnOWmF71HkiegoiJlvs+9lCIKpaWQI4dJZamMjQ8vkJvrrCtmdlRx7SP822Uc9gs4TJDWgxQVDsBqI4jhcSxWSFHKE/i+L7Z3qFpaZMC+oGO8BNTjR1LIGhS6cQLRqCghirKGqMv/gmrZcAnt9kom1fDQAtm717SChAs17g+MYSJKWfEm2rk3J4dYb+UIpBgIM2qG3pFKbdBKODBtjNwoKtiQEkQg3IgrKmu1+V3hFnTSXPjGbRZdOc2t80awTaNOxJyLmz65OtiigUhNvFKe8y/Rlvo6vz7+7tsqCXpS9Ma/tvu8YXdMb+5FrHrfXKdmLAGJMe4GPSX84HSzHQ3e2F8zXkuGdAT/swxLYSFw7UrU2qel6Up2K+VDo+F8mH8vdFCmZRF1FeHjjWoMql01SR96IqYdks8ZklZ2z+5DZ45U7lB95MqUnnxuBmCBzJOaHqjajw8AwwNodkRu3Z0Qy+daszASe91cC7zJn+G1rsnNMRGqPNleY4Zxd27d7hyf+NXyPjG9vBeXs6hg2vAjzyjIWa+58vfv6QaGKT97XtKFOIUVe3p6SgqzkCou2SAyBr5U+iBuJZlSC1FWtw19BoNkWW5aHIhSEbrwKM6X57B730jCCX+OlS1qd9FhdEoVPU49B5VIXuyb2cuhR/Nq2TI84UNeJh+fdviYtXSWs/AcoCQB3thV0eazIBEwgSEjniD9W8JV5whlvT9+gwJpdoC3PTPZWK+QS/Tgp7ZZ7cLQXxF+thz/ywgbuPbnkCho8iVswO2vIooEbRtJzwcrn/SCdkJjNUmehbpDvkOMZfc2VKbK7EkaklFTbBw/DRgGDoEYzDruSi3KxNjy89fKcEE/+23PVP/PbuRI5+xcSkuN3 HRJW3T7A IpoJConcLsEdqgGc86YVYfSmefVEr9g/1ICKE2UWVvptQoJYlvRJtRqR6Pt656Dby+j/XWAiHagtQtUvrA+4o386zfAJ0qBhwc0EQ4gpH+HMpWwJLPIbSKMxHUqyCQpQgXcKvG4n2GVV6CORE6oD7CV17eZFWKHWQrHOGuuHukZePlzL7PyIqFZ/iigs4RjPZyhVmC76qy7BTdKruYchFKeIoENxVkiQYlo8ZbW796wXJdGVs025/ytSYr0idVCZBLkBn5jldFYcj9YtBUK0lFkOjYbaO0gFvDrZ+W5Fgp3aXqp8P3v4bLauUYLHgpZdW9RrVNGQMBAT75Lfq0LkPSuhC89BUFQry6Chd2PlvRCe6x736z9tk7gQPemGGhKEpQ4pn3na7OS+HRTxsbzefHTl3G9o9p5LouQZB2Zfmwt9l7fTd2d4Wv7I7K/DKSs0Dq80mdlwtn+indm7rZgXwjDFp/kH2mZVfj9ZsxYdm7G27qYleuaTABeP0rjkYj93QHTtj//084eMnBXVPeTqqPPIx3LQsWzXWX5hWz/vFFRc/FyGOIC4TZO5Wvg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, Sep 26, 2023 at 11:37:17AM -0700, Yosry Ahmed wrote: > On Tue, Sep 26, 2023 at 11:24 AM Johannes Weiner wrote: > > > > On Mon, Sep 25, 2023 at 01:17:04PM -0700, Yosry Ahmed wrote: > > > +Chris Li > > > > > > On Tue, Sep 19, 2023 at 10:14 AM Nhat Pham wrote: > > > > > > > > From: Domenico Cerasuolo > > > > > > > > Currently, we only have a single global LRU for zswap. This makes it > > > > impossible to perform worload-specific shrinking - an memcg cannot > > > > determine which pages in the pool it owns, and often ends up writing > > > > pages from other memcgs. This issue has been previously observed in > > > > practice and mitigated by simply disabling memcg-initiated shrinking: > > > > > > > > https://lore.kernel.org/all/20230530232435.3097106-1-nphamcs@gmail.com/T/#u > > > > > > > > This patch fully resolves the issue by replacing the global zswap LRU > > > > with memcg- and NUMA-specific LRUs, and modify the reclaim logic: > > > > > > > > a) When a store attempt hits an memcg limit, it now triggers a > > > > synchronous reclaim attempt that, if successful, allows the new > > > > hotter page to be accepted by zswap. > > > > b) If the store attempt instead hits the global zswap limit, it will > > > > trigger an asynchronous reclaim attempt, in which an memcg is > > > > selected for reclaim in a round-robin-like fashion. > > > > > > Hey Nhat, > > > > > > I didn't take a very close look as I am currently swamped, but going > > > through the patch I have some comments/questions below. > > > > > > I am not very familiar with list_lru, but it seems like the existing > > > API derives the node and memcg from the list item itself. Seems like > > > we can avoid a lot of changes if we allocate struct zswap_entry from > > > the same node as the page, and account it to the same memcg. Would > > > this be too much of a change or too strong of a restriction? It's a > > > slab allocation and we will free memory on that node/memcg right > > > after. > > > > My 2c, but I kind of hate that assumption made by list_lru. > > > > We ran into problems with it with the THP shrinker as well. That one > > strings up 'struct page', and virt_to_page(page) results in really fun > > to debug issues. > > > > IMO it would be less error prone to have memcg and nid as part of the > > regular list_lru_add() function signature. And then have an explicit > > list_lru_add_obj() that does a documented memcg lookup. > > I also didn't like/understand that assumption, but again I don't have > enough familiarity with the code to judge, and I don't know why it was > done that way. Adding memcg and nid as arguments to the standard > list_lru API makes the pill easier to swallow. In any case, this > should be done in a separate patch to make the diff here more focused > on zswap changes. Just like the shrinker, it was initially written for slab objects like dentries and inodes. Since all the users then passed in charged slab objects, it ended up hardcoding that assumption in the interface. Once that assumption no longer holds, IMO we should update the interface. But agreed, a separate patch makes sense. > > Because of the overhead, we've been selective about the memory we > > charge. I'd hesitate to do it just to work around list_lru. > > On the other hand I am worried about the continuous growth of struct > zswap_entry. It's now at ~10 words on 64-bit? That's ~2% of the size > of the page getting compressed if I am not mistaken. So I am skeptical > about storing the nid there. > > A middle ground would be allocating struct zswap_entry on the correct > node without charging it. We don't need to store the nid and we don't > need to charge struct zswap_entry. It doesn't get rid of > virt_to_page() though. I like that idea. The current list_lru_add() would do the virt_to_page() too, so from that POV it's a lateral move. But we'd save the charging and the extra nid member.