From: James Houghton
Date: Tue, 5 Dec 2023 09:54:44 -0800
Subject: Re: [PATCH 0/2] arm64: hugetlb: Fix page fault loop for sw-dirty/hw-clean contiguous PTEs
To: Ryan Roberts
Cc: Steve Capper, Will Deacon, Andrew Morton, Mike Kravetz, Muchun Song,
    Anshuman Khandual, Catalin Marinas, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org
References: <20231204172646.2541916-1-jthoughton@google.com>
    <70dcdad7-5952-48ce-a9b9-042cfea59a5d@arm.com>
In-Reply-To: <70dcdad7-5952-48ce-a9b9-042cfea59a5d@arm.com>

On Tue, Dec 5, 2023 at 6:43 AM Ryan Roberts wrote:
>
> On 04/12/2023 17:26, James Houghton wrote:
> > It is currently possible for a userspace application to enter a page
> > fault loop when using HugeTLB pages implemented with contiguous PTEs
> > when HAFDBS is not available. This happens because:
> > 1. The kernel may sometimes write PTEs that are sw-dirty but hw-clean
> >    (PTE_DIRTY | PTE_RDONLY | PTE_WRITE).
>
> Hi James,
>
> Do you know how this happens?

Hi Ryan,

Thanks for taking a look! I do understand why this is happening. There
is an explanation in the reproducer[1] and also in this cover letter
(though I realize I could have been a little clearer). See below.

> AFAIK, this is the set of valid bit combinations, and
> PTE_RDONLY|PTE_WRITE|PTE_DIRTY is not one of them. Perhaps the real solution is
> to understand how this is happening and prevent it?
>
> /*
>  * PTE bits configuration in the presence of hardware Dirty Bit Management
>  * (PTE_WRITE == PTE_DBM):
>  *
>  * Dirty  Writable | PTE_RDONLY  PTE_WRITE  PTE_DIRTY (sw)
>  *   0      0      |   1           0          0
>  *   0      1      |   1           1          0
>  *   1      0      |   1           0          1
>  *   1      1      |   0           1          x
>  *
>  * When hardware DBM is not present, the sofware PTE_DIRTY bit is updated via
>  * the page fault mechanism. Checking the dirty status of a pte becomes:
>  *
>  *   PTE_DIRTY || (PTE_WRITE && !PTE_RDONLY)
>  */

Thanks for pointing this out. So (1) is definitely a bug. The second
patch in this series makes it impossible to create such a PTE via
pte_modify (by forcing sw-dirty PTEs to be hw-dirty as well).

> > The second patch in this series makes step (1) less likely to occur.

It makes it impossible to create this invalid set of bits via
pte_modify(). Assuming all PTE pgprot updates are done via the proper
interfaces, patch #2 might actually make this invalid bit combination
impossible to produce (that's certainly the goal). So perhaps language
stronger than "less likely" is appropriate.

Here's the sequence of events to trigger this bug, via mprotect():

> > Without this patch, we can get the kernel to write a sw-dirty, hw-clean
> > PTE with the following steps (showing the relevant VMA flags and pgprot
> > bits):
> > i. Create a valid, writable contiguous PTE.
> >    VMA vmflags: VM_SHARED | VM_READ | VM_WRITE
> >    VMA pgprot bits: PTE_RDONLY | PTE_WRITE
> >    PTE pgprot bits: PTE_DIRTY | PTE_WRITE
> > ii. mprotect the VMA to PROT_NONE.
> >    VMA vmflags: VM_SHARED
> >    VMA pgprot bits: PTE_RDONLY
> >    PTE pgprot bits: PTE_DIRTY | PTE_RDONLY
> > iii. mprotect the VMA back to PROT_READ | PROT_WRITE.
> >    VMA vmflags: VM_SHARED | VM_READ | VM_WRITE
> >    VMA pgprot bits: PTE_RDONLY | PTE_WRITE
> >    PTE pgprot bits: PTE_DIRTY | PTE_WRITE | PTE_RDONLY

With patch #2, the PTE pgprot bits in step iii become PTE_DIRTY |
PTE_WRITE (hw-dirtiness is set, as the PTE is sw-dirty).

Thanks!

> > [1]: https://gist.github.com/48ca/11d1e466deee032cb35aa8c2280f93b0
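
For readers who want to trace steps i-iii by hand, here is a minimal
userspace sketch of the bit transitions described above. It is not kernel
code. The bit positions, the model_* helpers, and the with_fix switch are
all hypothetical names invented for illustration, and the "fix" branch is
only an assumption about what patch #2 does, based on the description
"forcing sw-dirty PTEs to be hw-dirty as well".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions only; these are NOT the arm64 encodings. */
#define PTE_RDONLY (1u << 0)
#define PTE_WRITE  (1u << 1)
#define PTE_DIRTY  (1u << 2)

typedef uint32_t pte_t; /* toy stand-in for the kernel type */

/* Hardware-dirty in the quoted table: writable and not read-only. */
static bool model_pte_hw_dirty(pte_t pte)
{
    return (pte & PTE_WRITE) && !(pte & PTE_RDONLY);
}

/* Dirty check from the quoted comment:
 * PTE_DIRTY || (PTE_WRITE && !PTE_RDONLY) */
static bool model_pte_dirty(pte_t pte)
{
    return (pte & PTE_DIRTY) || model_pte_hw_dirty(pte);
}

/* The combination that is absent from the quoted table. */
static bool model_pte_invalid(pte_t pte)
{
    const pte_t bad = PTE_DIRTY | PTE_WRITE | PTE_RDONLY;
    return (pte & bad) == bad;
}

/* Simplified model of a pte_modify()-style update: the permission bits
 * come from newprot and dirtiness is carried over in PTE_DIRTY.
 * with_fix models the assumed behaviour of patch #2. */
static pte_t model_pte_modify(pte_t pte, pte_t newprot, bool with_fix)
{
    const pte_t mask = PTE_RDONLY | PTE_WRITE;

    if (model_pte_hw_dirty(pte))
        pte |= PTE_DIRTY;              /* keep hw-dirty state as sw-dirty */
    pte = (pte & ~mask) | (newprot & mask);
    if (with_fix && (pte & PTE_DIRTY) && (pte & PTE_WRITE))
        pte &= ~PTE_RDONLY;            /* sw-dirty + writable => hw-dirty */
    return pte;
}

/* Steps i-iii from the cover letter; returns the final PTE bits. */
pte_t model_mprotect_sequence(bool with_fix)
{
    pte_t pte = PTE_DIRTY | PTE_WRITE;                  /* i. writable PTE */
    assert(model_pte_dirty(pte) && !model_pte_invalid(pte));

    pte = model_pte_modify(pte, PTE_RDONLY, with_fix);  /* ii. PROT_NONE */
    pte = model_pte_modify(pte, PTE_RDONLY | PTE_WRITE,
                           with_fix);                   /* iii. back to RW */
    return pte;
}
```

In this model, model_mprotect_sequence(false) ends with PTE_DIRTY |
PTE_WRITE | PTE_RDONLY, the sw-dirty/hw-clean combination, while
model_mprotect_sequence(true) ends with PTE_DIRTY | PTE_WRITE, matching
the step iii result stated above for patch #2.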