From: Phil Elwell
Date: Fri, 13 Sep 2024 09:50:50 +0100
Subject: Re: Questions about TLB flushing and lru_gen_look_around
To: Yu Zhao
Cc: Andrew Morton, linux-mm@kvack.org, linux-rpi-kernel@lists.infradead.org, Linux ARM, Will Deacon

Hi Yu,

Thanks for getting back to me so quickly.

On Fri, 13 Sept 2024 at 04:59, Yu Zhao wrote:
>
> Hi Phil,
>
> On Thu, Sep 12, 2024 at 7:03 AM Phil Elwell wrote:
> >
> > Hi,
> >
> > I've spent many hours recently trying to diagnose a problem that
> > manifests as a CPU spin, under load and memory pressure, that can last
> > for many seconds.
> > The problem can be seen on our downstream kernels from 6.5 onwards,
> > when built for ARCH=arm, running on a Pi 3B (BCM2837 - quad A53).
> > I've not tested a pure Linux 6.5, but this is not a bug report.
> >
> > Pi 3B has limited RAM (1GB), and it was discovered that restricting
> > this further to 512MB made the spins more frequent, as did adding
> > other processes. Running an ARM64 kernel in the same configuration
> > leads to normal OOM behaviour.
> >
> > I traced the spin to a loop in __copy_to_user_memcpy where
> > pin_page_for_write fails repeatedly, sometimes for hundreds of
> > thousands of times. The pin is failing because the user page in
> > question is marked as being old (L_PTE_YOUNG is unset). When this
> > happens, the code tries to freshen the page using __put_user, but in
> > this case it is not triggering the required page fault. Digging
> > deeper, it can be seen that the PTE in the ARM's shadow hardware PTE
> > is 0 as expected, but clearly the MMU is not seeing this otherwise it
> > would be faulting; a TLB flush for that PTE fixes it.
> >
> > The TLB non-coherency for that PTE can be attributed to a call to
> > ptep_test_and_clear_young from lru_gen_look_around, which clears the
> > L_PTE_YOUNG bit in the Linux PTE
>
> Yes, it does that.
>
> > and zeroes the hardware PTE
>
> I don't see how it can happen, or why it's needed. Could you explain?

The ARMv7 MMU, when using the short-descriptor (non-LPAE) page-table format, lacks several features that Linux relies on, including a hardware young (accessed) flag. To work around this, Linux maintains two PTEs per page: an idealised Linux PTE and a hardware PTE for use by the MMU. Where the MMU needs additional software support, such as emulating the young flag, a zero is written to the hardware PTE so that the next access takes a page fault. The shadow hardware PTEs are maintained by the ARM-specific set_pte_ext method.

> > but doesn't call flush_tlb_cache.
>
> Correct, and this is because that arch-specific API currently doesn't
> require TLB flushes, from the MM's POV.
> None of the current callers does, I doubt they were used on arm
> (32 bit) at all, except MGLRU.
>
> > Two possible "fixes" are:
> >
> > a. Replace ptep_test_and_clear_young with ptep_clear_flush_young,
> > which includes the TLB flush.
> > b. After the loop over the page range from "start" to "end", include a
> > call to flush_tlb_range from "start" to "end" if the "young" count is
> > non-zero.
> >
> > My questions are:
> >
> > 1. Which bit of code is meant to take care of TLB coherency where
> > lru_gen_look_around has made changes?
>
> None, since the API doesn't explicitly require it (or at least the MM
> assumes), as I mentioned above.

I'm new to this area, but I think this statement is wrong, as I'll explain next.

> > 2. Between the two patches a) and b), which is preferable? b) would
> > seem better if IPIs are needed to broadcast the TLB flushes, but it
> > seems that BCM2837 has new enough CPU cores not to require such
> > broadcasts.
>
> Could this be fixed within arm? If not, we would have to update the
> requirement of that arch-specific API. This would affect other archs
> that don't require TLB flushes, assuming they exist. And we would need
> to fix all callers of ptep_test_and_clear_young() in MM.

I think it has already been "fixed". Both ptep_test_and_clear_young and ptep_clear_flush_young have optional architecture-specific implementations. The fact that both functions exist, one with a TLB flush and one without, suggests to me that the non-flushing version is intended for cases where TLB coherency is not (yet) required. There is evidence to support this in the x86 implementation of ptep_clear_flush_young [1], where the comment says:

        /*
         * On x86 CPUs, clearing the accessed bit without a TLB flush
         * doesn't cause data corruption. [ It could cause incorrect
         * page aging and the (mistaken) reclaim of hot pages, but the
         * chance of that should be relatively low. ]
         *
         * So as a performance optimization don't flush the TLB when
         * clearing the accessed bit, it will eventually be flushed by
         * a context switch or a VM operation anyway. [ In the rare
         * event of it not getting flushed for a long time the delay
         * shouldn't really matter because there's no real memory
         * pressure for swapout to react to. ]
         */

I think functions that care about TLB coherency should use ptep_clear_flush_young and trust each architecture not to perform unnecessary flushes where coherency is already guaranteed, as the x86 version does.

> > 3. walk_pte_range has a similar loop, but it seems it doesn't need to
> > be patched to fix my spin, possibly because it isn't called.
>
> Correct.
>
> > If a
> > patch to lru_gen_look_around is needed, might one be needed here as
> > well?
>
> No, because that code is disabled, unless hardware can set A-bit,
> e.g., arm64 v8.2.

Thanks, that makes sense.

Phil

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/mm/pgtable.c?h=v6.11-rc7#n597
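For illustration, here is a rough user-space model in C of the failure mode described in this thread and of fix b). It is a sketch under simplifying assumptions, not kernel code: every model_* name and NPAGES are hypothetical and only mimic the behaviour described here (set_pte_ext zeroing the hardware PTE of an old page, ptep_test_and_clear_young leaving the TLB alone, __put_user being absorbed by a stale TLB entry, and the pin_page_for_write retry loop). The "TLB" is just one valid flag per page that only an explicit flush clears, and the retry limit of 1000 is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy user-space model, not kernel code. All model_* names are
 * hypothetical; they only mimic the behaviour described in this thread.
 */
#define NPAGES 8

static bool pte_young[NPAGES]; /* models L_PTE_YOUNG in the Linux PTE */
static bool hw_valid[NPAGES];  /* models the hardware PTE: false means 0 */
static bool tlb_valid[NPAGES]; /* one cached translation per page */

/* Like set_pte_ext as described in the thread: an old page gets a zero hw PTE. */
static void model_set_pte(int i, bool young)
{
	pte_young[i] = young;
	hw_valid[i] = young;
}

/* Stand-in for flush_tlb_range(): drop cached translations in [start, end). */
static void model_flush_tlb_range(int start, int end)
{
	assert(start >= 0 && end <= NPAGES);
	for (int i = start; i < end; i++)
		tlb_valid[i] = false;
}

/* Like ptep_test_and_clear_young(): clears young, does not touch the TLB. */
static bool model_test_and_clear_young(int i)
{
	bool was_young = pte_young[i];

	if (was_young)
		model_set_pte(i, false);
	return was_young;
}

/* A user write as the MMU sees it; returns true if it faulted. */
static bool model_user_write(int i)
{
	bool faulted = false;

	if (tlb_valid[i])
		return false;            /* TLB hit (possibly stale): no fault */
	if (!hw_valid[i]) {
		model_set_pte(i, true);  /* fault handler makes the page young */
		faulted = true;
	}
	tlb_valid[i] = true;             /* cache the now-valid translation */
	return faulted;
}

/* Like the pin_page_for_write() retry loop; returns attempts, or -1. */
static int model_pin_for_write(int i, int max_tries)
{
	for (int tries = 0; tries < max_tries; tries++) {
		if (pte_young[i])
			return tries;
		model_user_write(i);     /* like __put_user(): hope to fault */
	}
	return -1;
}

/* Clear young over [start, end); with flush set, apply fix b). */
static int model_look_around(int start, int end, bool flush)
{
	int young = 0;

	for (int i = start; i < end; i++)
		young += model_test_and_clear_young(i);
	if (flush && young > 0)
		model_flush_tlb_range(start, end);
	return young;
}

/* Warm every TLB entry, age all pages, then try to pin page 2. */
int model_run(bool flush)
{
	for (int i = 0; i < NPAGES; i++) {
		model_set_pte(i, true);
		tlb_valid[i] = false;
		model_user_write(i);
	}
	model_look_around(0, NPAGES, flush);
	return model_pin_for_write(2, 1000);
}
```

In this model, model_run(false), which ages the pages without any flush, returns -1: the stale TLB entry absorbs every write, so the young flag is never set again and the retry loop gives up. model_run(true) returns 1, because the single range flush makes the next write fault and mark the page young. Fix a) would instead flush each entry inside the loop; b) batches this into one range flush per look-around.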