Date: Sun, 19 Jun 2022 21:20:55 +0900
From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
To: "Edgecombe, Rick P"
Cc: "peterz@infradead.org", "rppt@kernel.org", "tglx@linutronix.de", "linux-mm@kvack.org", "dave.hansen@linux.intel.com", "Williams, Dan J", "x86@kernel.org", "hpa@zytor.com", "aarcange@redhat.com", "mingo@redhat.com", "Christopherson,, Sean", "Lutomirski, Andy", "pbonzini@redhat.com", "bp@alien8.de", "Tianyu.Lan@microsoft.com", "aneesh.kumar@linux.ibm.com", "chu, jane"
Subject: Re: [RFC 2/2] x86/mm/cpa: drop pgprot_clear_protnone_bits()

On Wed, Jun 15, 2022
at 06:18:15PM +0000, Edgecombe, Rick P wrote:
> On Wed, 2022-06-15 at 12:47 +0900, Hyeonggon Yoo wrote:
> > On Tue, Jun 14, 2022 at 06:23:43PM +0000, Edgecombe, Rick P wrote:
> > > On Tue, 2022-06-14 at 15:53 +0900, Hyeonggon Yoo wrote:
> > > > On Tue, Jun 14, 2022 at 03:39:33PM +0900, Hyeonggon Yoo wrote:
> > > > > commit a8aed3e0752b4 ("x86/mm/pageattr: Prevent PSE and GLOABL
> > > > > leftovers to confuse pmd/pte_present and pmd_huge") made CPA clear
> > > > > _PAGE_GLOBAL when _PAGE_PRESENT is not set. This prevents the
> > > > > kernel from crashing when it reads a page with !_PAGE_PRESENT and
> > > > > _PAGE_PROTNONE (_PAGE_GLOBAL). It also made CPA set _PAGE_GLOBAL
> > > > > back when setting _PAGE_PRESENT again.
> > > > >
> > > > > Since commit d1440b23c922d ("x86/mm: Factor out pageattr
> > > > > _PAGE_GLOBAL setting") stopped the kernel from unconditionally
> > > > > setting _PAGE_GLOBAL, pages lose the global flag after
> > > > > _set_pages_np() and _set_pages_p() are called.
> > > > >
> > > > > But since commit 3166851142411 ("x86: skip check for spurious
> > > > > faults for non-present faults"), spurious_kernel_fault() no longer
> > > > > mistakes pte/pmd entries with _PAGE_PROTNONE set for present ones.
> > > > > So simply drop pgprot_clear_protnone_bits().
> > > >
> > > > Looks like I forgot to Cc: Andrea Arcangeli
> > > >
> > > > I also checked that the kernel does not crash when reading from or
> > > > writing to non-present pages with this patch applied.
> > >
> > > Thanks for the history.
> > >
> > > I think we should still fix pte_present() to not check prot_none if
> > > the user bit is clear.
> >
> > I tried, but realized it wouldn't work :(
> >
> > For example, when a pte entry is used as a swap entry, _PAGE_PRESENT is
> > cleared and _PAGE_PROTNONE is set.
> >
> > The other bits are used for the type and offset of the swap entry.
> >
> > In that case, _PAGE_BIT_USER does not represent _PAGE_USER.
> > It is just one of the bits that encode the type of the swap entry.
> >
> > So checking _PAGE_PROTNONE only when _PAGE_USER is set would
> > wrongly treat some swap entries as non-present.
>
> Oooh, right. So the user bit records "when a pagetable is
> writeprotected by userfaultfd WP support". I'm not sure whether PCD is
> available to move that to, leaving the user bit in place, but it
> sounds like an errata-sensitive area to be tweaking.
>
> > >
> > > The spurious fault handler infinite loop may no
> > > longer be a problem, but pte_present() would still return true for
> > > kernel NP pages, so it would be fragile. Today I see at least the
> > > oops message and memory hotunplug (see remove_pagetable()) that
> > > would get confused.
> >
> > As explained above, I don't think it's possible to make
> > pte_present() accurate for both kernel and user ptes.
> >
> > Maybe we can implement pte_present_kernel()/pte_present_user()
> > for cases where the kernel knows whether it is a user or kernel pte.
>
> This seems like a decent option to me. It seems there are only a few
> places that are isolated to arch/x86.

But there are some places where the kernel does not know whether a pte
is a kernel pte or a user pte. For example, show_fault_oops() can be
called for both kernel and user addresses.

Is something like this acceptable?

static inline bool pte_present_address(pte_t pte, unsigned long address)
{
	/* Kernel address: roughly the check fault_in_kernel_space() does */
	if (address >= TASK_SIZE_MAX)
		return pte_present_kernel(pte);
	return pte_present_user(pte);
}

> >
> > or pte_present_with_address(pte, address) if we don't
> > know whether it is a user pte or a kernel pte.

-- 
Thanks,
Hyeonggon