Date: Fri, 24 Jul 2020 12:43:56 -0700
From: Ira Weiny
To: Andy Lutomirski
Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Borislav Petkov, Andy Lutomirski, Dave Hansen, x86@kernel.org, Dan Williams, Vishal Verma, Andrew Morton, Fenghua Yu, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org
Subject: Re: [PATCH RFC V2 17/17] x86/entry: Preserve PKRS MSR across exceptions
Message-ID: <20200724194355.GA844234@iweiny-DESK2.sc.intel.com>
References: <20200724172344.GO844235@iweiny-DESK2.sc.intel.com>
On Fri, Jul 24, 2020 at 10:29:23AM -0700, Andy Lutomirski wrote:
>
> > On Jul 24, 2020, at 10:23 AM, Ira Weiny wrote:
> >
> > On Thu, Jul 23, 2020 at 10:15:17PM +0200, Thomas Gleixner wrote:
> >> Thomas Gleixner writes:
> >>
> >>> Ira Weiny writes:
> >>>> On Fri, Jul 17, 2020 at 12:06:10PM +0200, Peter Zijlstra wrote:
> >>>>>> On Fri, Jul 17, 2020 at 12:20:56AM -0700, ira.weiny@intel.com wrote:
> >>>>> I've been really digging into this today and I'm very concerned that I'm
> >>>>> completely missing something WRT idtentry_enter() and idtentry_exit().
> >>>>>
> >>>>> I've instrumented idt_{save,restore}_pkrs(), and __dev_access_{en,dis}able()
> >>>>> with trace_printk()'s.
> >>>>>
> >>>>> With this debug code, I have found an instance where it seems like
> >>>>> idtentry_enter() is called without a corresponding idtentry_exit(). This has
> >>>>> left the thread ref counter at 0 which results in very bad things happening
> >>>>> when __dev_access_disable() is called and the ref count goes negative.
> >>>>>
> >>>>> Effectively this seems to be happening:
> >>>>>
> >>>>> ...
> >>>>> // ref == 0
> >>>>> dev_access_enable() // ref += 1 ==> disable protection
> >>>>> // exception (which one I don't know)
> >>>>> idtentry_enter()
> >>>>> // ref = 0
> >>>>> _handler() // or whatever code...
> >>>>> // *_exit() not called [at least there is no trace_printk() output]...
> >>>>> // Regardless of trace output, the ref is left at 0
> >>>>> dev_access_disable() // ref -= 1 ==> -1 ==> does not enable protection
> >>>>> (Bad stuff is bound to happen now...)
> >>>
> >>> Well, if any exception which calls idtentry_enter() would return without
> >>> going through idtentry_exit() then lots of bad stuff would happen even
> >>> without your patches.
> >>>
> >>>> Also is there any chance that the process could be getting scheduled and that
> >>>> is causing an issue?
> >>>
> >>> Only from #PF, but after the fault has been resolved and the task is
> >>> scheduled in again then the task returns through idtentry_exit() to the
> >>> place where it took the fault. That's not guaranteed to be on the same
> >>> CPU. If schedule is not aware of the fact that the exception turned off
> >>> stuff then you surely get into trouble. So you really want to store it
> >>> in the task itself then the context switch code can actually see the
> >>> state and act accordingly.
> >>
> >> Actually that's nasty as well as you need a stack of PKRS values to
> >> handle nested exceptions. But it might be still the most reasonable
> >> thing to do. 7 PKRS values plus an index should be really sufficient,
> >> that's 32 bytes total, not that bad.
> >
> > I've thought about this a bit more and unless I'm wrong I think the
> > idtentry_state provides for that because each nested exception has its own
> > idtentry_state, doesn't it?
>
> Only the ones that use idtentry_enter() instead of, say, nmi_enter().

Oh, agreed...

But with this patch we are still better off than just preserving PKRS during
context switch.

I need to update the commit message here to make this clear, though.

Thanks,
Ira