Message-ID: <70dcdad7-5952-48ce-a9b9-042cfea59a5d@arm.com>
Date: Tue, 5 Dec 2023 14:43:01 +0000
Subject: Re: [PATCH 0/2] arm64: hugetlb: Fix page fault loop for sw-dirty/hw-clean contiguous PTEs
From: Ryan Roberts
To: James Houghton, Steve Capper, Will Deacon, Andrew Morton
Cc: Mike Kravetz, Muchun Song, Anshuman Khandual, Catalin Marinas, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20231204172646.2541916-1-jthoughton@google.com>
In-Reply-To: <20231204172646.2541916-1-jthoughton@google.com>

On 04/12/2023 17:26, James Houghton wrote:
> It is currently possible for a userspace application to enter a page
> fault loop when using HugeTLB pages implemented with contiguous PTEs
> when HAFDBS is not available. This happens because:
> 1. The kernel may sometimes write PTEs that are sw-dirty but hw-clean
>    (PTE_DIRTY | PTE_RDONLY | PTE_WRITE).

Hi James,

Do you know how this happens? AFAIK, this is the set of valid bit combinations,
and PTE_RDONLY|PTE_WRITE|PTE_DIRTY is not one of them. Perhaps the real solution
is to understand how this is happening and prevent it?

/*
 * PTE bits configuration in the presence of hardware Dirty Bit Management
 * (PTE_WRITE == PTE_DBM):
 *
 * Dirty  Writable | PTE_RDONLY  PTE_WRITE  PTE_DIRTY (sw)
 *   0      0      |   1           0          0
 *   0      1      |   1           1          0
 *   1      0      |   1           0          1
 *   1      1      |   0           1          x
 *
 * When hardware DBM is not present, the software PTE_DIRTY bit is updated via
 * the page fault mechanism. Checking the dirty status of a pte becomes:
 *
 *   PTE_DIRTY || (PTE_WRITE && !PTE_RDONLY)
 */

Thanks,
Ryan

> 2. If, during a write, the CPU uses a sw-dirty, hw-clean PTE in handling
>    the memory access on a system without HAFDBS, we will get a page
>    fault.
> 3. HugeTLB will check if it needs to update the dirty bits on the PTE.
>    For contiguous PTEs, it will check to see if the pgprot bits need
>    updating. In this case, HugeTLB wants to write a sequence of
>    sw-dirty, hw-dirty PTEs, but it finds that all the PTEs it is about
>    to overwrite are all pte_dirty() (pte_sw_dirty() => pte_dirty()),
>    so it thinks no update is necessary.
>
> Please see this[1] reproducer.
>
> I think (though I may be wrong) that both step (1) and step (3) are
> buggy.
>
> The first patch in this series fixes step (3); instead of checking if
> pte_dirty is matching in __cont_access_flags_changed, check pte_hw_dirty
> and pte_sw_dirty separately.
>
> The second patch in this series makes step (1) less likely to occur.
> Without this patch, we can get the kernel to write a sw-dirty, hw-clean
> PTE with the following steps (showing the relevant VMA flags and pgprot
> bits):
>   i.   Create a valid, writable contiguous PTE.
>        VMA vmflags:     VM_SHARED | VM_READ | VM_WRITE
>        VMA pgprot bits: PTE_RDONLY | PTE_WRITE
>        PTE pgprot bits: PTE_DIRTY | PTE_WRITE
>   ii.  mprotect the VMA to PROT_NONE.
>        VMA vmflags:     VM_SHARED
>        VMA pgprot bits: PTE_RDONLY
>        PTE pgprot bits: PTE_DIRTY | PTE_RDONLY
>   iii. mprotect the VMA back to PROT_READ | PROT_WRITE.
>        VMA vmflags:     VM_SHARED | VM_READ | VM_WRITE
>        VMA pgprot bits: PTE_RDONLY | PTE_WRITE
>        PTE pgprot bits: PTE_DIRTY | PTE_WRITE | PTE_RDONLY
>
> Applying either one of the two patches in this patchset will fix the
> particular issue with HugeTLB pages implemented with contiguous PTEs.
> It's possible that only one of these patches should be taken, or that
> the right fix is something else entirely.
>
> [1]: https://gist.github.com/48ca/11d1e466deee032cb35aa8c2280f93b0
>
> James Houghton (2):
>   arm64: hugetlb: Distinguish between hw and sw dirtiness in
>     __cont_access_flags_changed
>   arm64: mm: Always make sw-dirty PTEs hw-dirty in pte_modify
>
>  arch/arm64/include/asm/pgtable.h | 6 ++++++
>  arch/arm64/mm/hugetlbpage.c      | 5 ++++-
>  2 files changed, 10 insertions(+), 1 deletion(-)
>
>
> base-commit: 645a9a454fdb7e698a63a275edca6a17ef97afc4
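
For illustration only, steps i-iii can be replayed in a small user-space toy
model. Everything named toy_* or TOY_* below is hypothetical: the bit positions
are placeholders rather than the real arm64 encodings, and toy_modify() merely
assumes that changing protection folds hw-dirty state into the sw-dirty bit and
then swaps in the VMA's protection bits. The real pte_modify() may well differ.
The sketch only shows that, under those assumptions, the sequence ends in a
combination that is dirty per the check in the comment, not hw-dirty, and
absent from the table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions for this sketch only; NOT the real arm64 values. */
#define TOY_PTE_RDONLY (UINT64_C(1) << 0)
#define TOY_PTE_WRITE  (UINT64_C(1) << 1)
#define TOY_PTE_DIRTY  (UINT64_C(1) << 2)
#define TOY_PROT_MASK  (TOY_PTE_RDONLY | TOY_PTE_WRITE)

/* hw-dirty: writable and not read-only (last row of the table). */
static inline bool toy_hw_dirty(uint64_t pte)
{
	return (pte & TOY_PTE_WRITE) && !(pte & TOY_PTE_RDONLY);
}

/* sw-dirty: the software PTE_DIRTY bit is set. */
static inline bool toy_sw_dirty(uint64_t pte)
{
	return (pte & TOY_PTE_DIRTY) != 0;
}

/* PTE_DIRTY || (PTE_WRITE && !PTE_RDONLY), as in the quoted comment. */
static inline bool toy_dirty(uint64_t pte)
{
	return toy_sw_dirty(pte) || toy_hw_dirty(pte);
}

/* True if the bit combination appears as a row of the quoted table. */
static inline bool toy_in_table(uint64_t pte)
{
	bool rdonly = (pte & TOY_PTE_RDONLY) != 0;
	bool write = (pte & TOY_PTE_WRITE) != 0;
	bool dirty = (pte & TOY_PTE_DIRTY) != 0;

	if (rdonly)
		return !(write && dirty); /* rows 1-3 */
	return write;                     /* row 4, PTE_DIRTY is don't-care */
}

/*
 * ASSUMED behaviour, not the kernel's code: preserve hw-dirty state in the
 * sw-dirty bit, then replace the protection bits with the new ones.
 */
static inline uint64_t toy_modify(uint64_t pte, uint64_t newprot)
{
	if (toy_hw_dirty(pte))
		pte |= TOY_PTE_DIRTY;
	return (pte & ~TOY_PROT_MASK) | (newprot & TOY_PROT_MASK);
}

/* Replays steps i-iii from the cover letter and returns the final PTE bits. */
static inline uint64_t toy_replay(void)
{
	/* Step i: valid, writable PTE. */
	uint64_t pte = TOY_PTE_DIRTY | TOY_PTE_WRITE;

	/* Step ii: mprotect to PROT_NONE, VMA pgprot is PTE_RDONLY. */
	pte = toy_modify(pte, TOY_PTE_RDONLY);
	assert(pte == (TOY_PTE_DIRTY | TOY_PTE_RDONLY));

	/* Step iii: back to PROT_READ | PROT_WRITE. */
	return toy_modify(pte, TOY_PTE_RDONLY | TOY_PTE_WRITE);
}
```

Calling toy_replay() yields TOY_PTE_DIRTY | TOY_PTE_WRITE | TOY_PTE_RDONLY, for
which toy_dirty() is true while toy_hw_dirty() and toy_in_table() are both
false. That matches the state listed under step iii.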