From: Yu Zhao
Date: Fri, 31 May 2024 14:31:17 -0600
Subject: Re: [PATCH v4 2/7] mm: multi-gen LRU: Have secondary MMUs participate in aging
To: Oliver Upton
Cc: James Houghton, Andrew Morton, Paolo Bonzini, Albert Ou, Ankit Agrawal, Anup Patel, Atish Patra, Axel Rasmussen, Bibo Mao, Catalin Marinas, David Matlack, David Rientjes, Huacai Chen, James Morse, Jonathan Corbet, Marc Zyngier, Michael Ellerman, Nicholas Piggin, Palmer Dabbelt, Paul Walmsley, Raghavendra Rao Ananta, Ryan Roberts, Sean Christopherson, Shaoqin Huang, Shuah Khan, Suzuki K Poulose, Tianrui Zhao, Will Deacon, Zenghui Yu, kvm-riscv@lists.infradead.org, kvm@vger.kernel.org, kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mips@vger.kernel.org, linux-mm@kvack.org, linux-riscv@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, loongarch@lists.linux.dev
References: <20240529180510.2295118-1-jthoughton@google.com> <20240529180510.2295118-3-jthoughton@google.com>

On Fri, May 31, 2024 at 1:24 AM Oliver Upton wrote:
>
> On Wed, May 29, 2024 at 03:03:21PM -0600, Yu Zhao wrote:
> > On Wed, May 29, 2024 at 12:05 PM James Houghton wrote:
> > >
> > > Secondary MMUs are currently consulted for access/age information at
> > > eviction time, but before then, we don't get accurate age information.
> > > That is, pages that are mostly accessed through a secondary MMU (like
> > > guest memory, used by KVM) will always just proceed down to the oldest
> > > generation, and then at eviction time, if KVM reports the page to be
> > > young, the page will be activated/promoted back to the youngest
> > > generation.
> >
> > Correct, and as I explained offline, this is the only reasonable
> > behavior if we can't locklessly walk secondary MMUs.
> >
> > Just for the record, the (crude) analogy I used was:
> > Imagine a large room with many bills ($1, $5, $10, ...) on the floor,
> > but you are only allowed to pick up 10 of them (and put them in your
> > pocket). A smart move would be to survey the room *first and then*
> > pick up the largest ones. But if you are carrying a 500 lbs backpack,
> > you would just want to pick up whichever that's in front of you rather
> > than walk the entire room.
> >
> > MGLRU should only scan (or lookaround) secondary MMUs if it can be
> > done lockless. Otherwise, it should just fall back to the existing
> > approach, which existed in previous versions but is removed in this
> > version.
>
> Grabbing the MMU lock for write to scan sucks, no argument there. But
> can you please be specific about the impact of read lock v. RCU in the
> case of arm64? I had asked about this before and you never replied.
>
> My concern remains that adding support for software table walkers
> outside of the MMU lock entirely requires more work than just deferring
> the deallocation to an RCU callback. Walkers that previously assumed
> 'exclusive' access while holding the MMU lock for write must now cope
> with volatile PTEs.
>
> Yes, this problem already exists when hardware sets the AF, but the
> lock-free walker implementation needs to be generic so it can be applied
> for other PTE bits.
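To make the "volatile PTEs" concern concrete: a walker that does not hold the MMU lock can no longer do a plain read-modify-write of a PTE, because hardware (setting the AF) or another walker may change the entry between the read and the write. The following is a minimal C11 sketch, not arm64 or KVM code. It assumes a hypothetical 64-bit PTE type, `sketch_pte_t`, with a valid bit at bit 0 and the AF at bit 10 (the AF position used by arm64 descriptors); the helper names `lockless_clear_bits` and `lockless_test_and_clear_af` are hypothetical. It clears an arbitrary mask, in line with the point that a lock-free walker must be generic over PTE bits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical PTE layout, for illustration only. */
#define SKETCH_PTE_VALID (UINT64_C(1) << 0)
#define SKETCH_PTE_AF    (UINT64_C(1) << 10)

/* Hypothetical PTE type: hardware or other walkers may modify it at any time. */
typedef _Atomic uint64_t sketch_pte_t;

/*
 * Atomically clear the bits in `mask` without holding any lock.
 * Returns true if any bit in `mask` was set beforehand.
 * If the PTE changes between the load and the compare-exchange (hardware set
 * the AF, another walker cleared a bit, ...), the exchange fails, `old` is
 * refreshed with the current value, and the loop retries, so concurrent
 * updates to other bits are never overwritten.
 */
static bool lockless_clear_bits(sketch_pte_t *ptep, uint64_t mask)
{
    assert(mask != 0);

    uint64_t old = atomic_load(ptep);

    do {
        /* Leave entries that are not valid mappings untouched. */
        if (!(old & SKETCH_PTE_VALID) || !(old & mask))
            return false;
    } while (!atomic_compare_exchange_weak(ptep, &old, old & ~mask));

    return true;
}

/* The aging case: test and clear the access flag. */
static bool lockless_test_and_clear_af(sketch_pte_t *ptep)
{
    return lockless_clear_bits(ptep, SKETCH_PTE_AF);
}
```

The compare-and-swap loop, rather than a single atomic AND, lets the walker inspect the whole entry (here, only the valid bit) and skip it if it is not a mapping. With a non-atomic PTE type, `pte &= ~mask` would be a separate load and store, and any bit set by hardware in between would be lost.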

Direct reclaim is multi-threaded, and on arm64 each reclaimer can take the MMU lock either for read (to test the A-bit) or for write (to unmap pages before paging them out). The fundamental problem with using a readers-writer lock here is priority inversion: the readers have lower priority than the writers, so ideally we don't want the readers to block the writers at all.

Using my previous (crude) analogy: picking up the bill right in front of you (the writers) pays off immediately, whereas searching for the largest bill (the readers) can be futile.

As I said earlier, I would prefer that we drop the arm64 support for now, but I will not object to taking the MMU lock for read when clearing the A-bit, as long as we fully understand the problem here and document it clearly.