Date: Fri, 6 Dec 2024 21:44:15 -0700
From: Yu Zhao <yuzhao@google.com>
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Kairui Song, Kalesh Singh
Subject: Re: [PATCH mm-unstable v2 6/6] mm/mglru: rework workingset protection
References: <20241206003126.1338283-1-yuzhao@google.com> <20241206003126.1338283-7-yuzhao@google.com>
In-Reply-To: <20241206003126.1338283-7-yuzhao@google.com>

On Thu, Dec 05, 2024 at 05:31:26PM -0700, Yu Zhao wrote:
> With the aging feedback no longer considering the distribution of
> folios in each generation, rework workingset protection to better
> distribute folios across MAX_NR_GENS. This is achieved by reusing
> PG_workingset and PG_referenced/LRU_REFS_FLAGS in a slightly different
> way.
>
> For folios accessed multiple times through file descriptors, make
> lru_gen_inc_refs() set additional bits of LRU_REFS_WIDTH in
> folio->flags after PG_referenced, then PG_workingset after
> LRU_REFS_WIDTH. After all its bits are set, i.e.,
> LRU_REFS_FLAGS|BIT(PG_workingset), a folio is lazily promoted into the
> second oldest generation in the eviction path. And when
> folio_inc_gen() does that, it clears LRU_REFS_FLAGS so that
> lru_gen_inc_refs() can start over. For this case, LRU_REFS_MASK is
> only valid when PG_referenced is set.
>
> For folios accessed multiple times through page tables,
> folio_update_gen() from a page table walk or lru_gen_set_refs() from a
> rmap walk sets PG_referenced after the accessed bit is cleared for the
> first time. Thereafter, those two paths set PG_workingset and promote
> folios to the youngest generation. Like folio_inc_gen(), when
> folio_update_gen() does that, it also clears PG_referenced. For this
> case, LRU_REFS_MASK is not used.
>
> For both of the cases, after PG_workingset is set on a folio, it
> remains until this folio is either reclaimed, or "deactivated" by
> lru_gen_clear_refs(). It can be set again if lru_gen_test_recent()
> returns true upon a refault.
>
> When adding folios to the LRU lists, lru_gen_distance() distributes
> them as follows:
> +---------------------------------+---------------------------------+
> | Accessed thru page tables       | Accessed thru file descriptors  |
> +---------------------------------+---------------------------------+
> | PG_active (set while isolated)  |                                 |
> +----------------+----------------+----------------+----------------+
> | PG_workingset  | PG_referenced  | PG_workingset  | LRU_REFS_FLAGS |
> +---------------------------------+---------------------------------+
> |<--------- MIN_NR_GENS --------->|                                 |
> |<-------------------------- MAX_NR_GENS -------------------------->|
>
> After this patch, some typical client and server workloads showed
> improvements under heavy memory pressure.
> For example, Python TPC-C,
> which was used to benchmark a different approach [1] to better detect
> refault distances, showed a significant decrease in total refaults:
>
>                             Before      After      Change
>   Time (seconds)            10801       10801      0%
>   Executed (transactions)   41472       43663      +5%
>   workingset_nodes          109070      120244     +10%
>   workingset_refault_anon   5019627     7281831    +45%
>   workingset_refault_file   1294678786  554855564  -57%
>   workingset_refault_total  1299698413  562137395  -57%
>
> [1] https://lore.kernel.org/20230920190244.16839-1-ryncsn@gmail.com/
>
> Reported-by: Kairui Song
> Closes: https://lore.kernel.org/CAOUHufahuWcKf5f1Sg3emnqX+cODuR=2TQo7T4Gr-QYLujn4RA@mail.gmail.com/
> Signed-off-by: Yu Zhao
> Tested-by: Kalesh Singh
> ---
>  include/linux/mm_inline.h |  94 +++++++++++++------------
>  include/linux/mmzone.h    |  82 +++++++++++++---------
>  mm/swap.c                 |  23 +++---
>  mm/vmscan.c               | 142 +++++++++++++++++++++++---------------
>  mm/workingset.c           |  29 ++++----
>  5 files changed, 209 insertions(+), 161 deletions(-)

Some outlier results from LULESH (Livermore Unstructured Lagrangian
Explicit Shock Hydrodynamics) [1] caught my eye. The following fix
made the benchmark a lot happier (128GB DRAM + Optane swap):

                          Before  After  Change
  Average (z/s)           6894    7574   +10%
  Deviation (10 samples)  12.96%  1.76%  -86%

[1] https://asc.llnl.gov/codes/proxy-apps/lulesh

Andrew, can you please fold it in? Thanks!

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 90bbc2b3be8b..5e03a61c894f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -916,8 +916,7 @@ static enum folio_references folio_check_references(struct folio *folio,
 		if (!referenced_ptes)
 			return FOLIOREF_RECLAIM;
 
-		lru_gen_set_refs(folio);
-		return FOLIOREF_ACTIVATE;
+		return lru_gen_set_refs(folio) ? FOLIOREF_ACTIVATE : FOLIOREF_KEEP;
 	}
 
 	referenced_folio = folio_test_clear_referenced(folio);
@@ -4173,11 +4172,7 @@ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 			old_gen = folio_update_gen(folio, new_gen);
 			if (old_gen >= 0 && old_gen != new_gen)
 				update_batch_size(walk, folio, old_gen, new_gen);
-
-			continue;
-		}
-
-		if (lru_gen_set_refs(folio)) {
+		} else if (lru_gen_set_refs(folio)) {
 			old_gen = folio_lru_gen(folio);
 			if (old_gen >= 0 && old_gen != new_gen)
 				folio_activate(folio);
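
As a side note, below is a minimal userspace sketch of the file-descriptor
path described in the quoted commit message: each additional access fills
one more reference bit, and only after those bits and the workingset flag
are all set does the eviction path promote the folio and clear the counter
so it can start over. The names (demo_folio, demo_inc_refs(),
DEMO_REFS_WIDTH) and the 2-bit width are hypothetical stand-ins for
illustration only, not the kernel's actual folio->flags layout or API.

/*
 * Illustrative model of the reference-counting scheme described above.
 * NOT the kernel implementation; the real code operates on folio->flags
 * in include/linux/mm_inline.h and mm/vmscan.c.
 */
#include <stdbool.h>
#include <stdio.h>

#define DEMO_REFS_WIDTH	2				/* stand-in for LRU_REFS_WIDTH */
#define DEMO_REFS_MASK	((1u << DEMO_REFS_WIDTH) - 1)

struct demo_folio {
	bool referenced;				/* models PG_referenced */
	bool workingset;				/* models PG_workingset */
	unsigned int refs;				/* models the LRU_REFS_MASK bits */
};

/* Accessed again through a file descriptor: set one more bit per access. */
static void demo_inc_refs(struct demo_folio *f)
{
	if (!f->referenced) {
		f->referenced = true;			/* first extra access */
		return;
	}
	if (f->refs < DEMO_REFS_MASK) {
		f->refs++;				/* fill the reference bits */
		return;
	}
	f->workingset = true;				/* all bits set: mark workingset */
}

/*
 * Eviction-time check: once the referenced flag, the reference bits and the
 * workingset flag are all set, promote the folio and clear the counter so
 * counting can start over; the workingset flag itself is kept.
 */
static bool demo_should_promote(struct demo_folio *f)
{
	if (!(f->referenced && f->refs == DEMO_REFS_MASK && f->workingset))
		return false;
	f->referenced = false;
	f->refs = 0;
	return true;
}

int main(void)
{
	struct demo_folio f = { 0 };

	for (int i = 1; i <= 6; i++) {
		bool promote;

		demo_inc_refs(&f);
		promote = demo_should_promote(&f);
		printf("access %d: referenced=%d refs=%u workingset=%d promote=%d\n",
		       i, f.referenced, f.refs, f.workingset, promote);
	}
	return 0;
}

Compiled with a plain cc, it prints the flag state after each access; the
sixth access shows the counting starting over while the workingset flag
stays set, mirroring the "start over" behavior described above.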