Date: Fri, 18 Nov 2022 00:23:03 +0100
From: Marco Elver
To: Dave Hansen
Cc: Naresh Kamboju, Peter Zijlstra, kasan-dev, X86 ML, open list,
    linux-mm, regressions@lists.linux.dev, lkft-triage@lists.linaro.org,
    Andrew Morton, Alexander Potapenko
Subject: Re: WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/kfence.h:46 kfence_protect
In-Reply-To: <4208866d-338f-4781-7ff9-023f016c5b07@intel.com>

On Thu, Nov 17, 2022 at 06:34AM -0800, Dave Hansen wrote:
> On 11/17/22 05:58, Marco Elver wrote:
> > [ 0.663761] WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/kfence.h:46 kfence_protect+0x7b/0x120
> > [ 0.664033] WARNING: CPU: 0 PID: 0 at mm/kfence/core.c:234 kfence_protect+0x7d/0x120
> > [ 0.664465] kfence: kfence_init failed
>
> Any chance you could add some debugging and figure out what actually
> made kfence keel over? Was it the pte or the level?
>
> 	if (WARN_ON(!pte || level != PG_LEVEL_4K))
> 		return false;
>
> I can see how the thing you bisected to might lead to a page table not
> being split, which could mess with the 'level' check.

Yes - it's the 'level != PG_LEVEL_4K'.

We do actually try to split the pages in arch_kfence_init_pool() (above
this function) - so with "x86/mm: Inhibit _PAGE_NX changes from
cpa_process_alias()" this somehow fails...

> Also, is there a reason this code is mucking with the page tables
> directly? It seems, uh, rather wonky. This, for instance:
>
> > 	if (protect)
> > 		set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_PRESENT));
> > 	else
> > 		set_pte(pte, __pte(pte_val(*pte) | _PAGE_PRESENT));
> >
> > 	/*
> > 	 * Flush this CPU's TLB, assuming whoever did the allocation/free is
> > 	 * likely to continue running on this CPU.
> > 	 */
> > 	preempt_disable();
> > 	flush_tlb_one_kernel(addr);
> > 	preempt_enable();
>
> Seems rather broken. I assume the preempt_disable() is there to get rid
> of some warnings. But, there is nothing I can see to *keep* the CPU
> that did the free from being different from the one where the TLB flush
> is performed until the preempt_disable(). That makes the
> flush_tlb_one_kernel() mostly useless.
>
> Is there a reason this code isn't using the existing page table
> manipulation functions and tries to code its own? What prevents it from
> using something like the attached patch?

Yes, see the comment below - it's to avoid the IPIs and TLB shoot-downs,
because KFENCE _can_ tolerate the inaccuracy even if we hit the wrong
TLB or other CPUs' TLBs aren't immediately flushed - we trade a few
false negatives for minimizing performance impact.
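The trade-off described above (flush only the local TLB, accept that other CPUs may briefly keep a stale translation) can be illustrated with a small userspace toy model. This is a sketch, not kernel code: every toy_* name, the 4-CPU count and the single-PTE "page table" are hypothetical. The only facts borrowed from x86 are that the present bit is bit 0 of a PTE and that not-present entries are not cached in the TLB. An access served from a stale cached entry does not fault, which is the kind of false negative mentioned above; the broadcast variant catches every access but needs one remote flush (an IPI, in a real kernel) per other CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model only: every toy_* name and NR_TOY_CPUS are hypothetical and
 * are not kernel APIs. The one borrowed fact is that x86 keeps the
 * present bit in bit 0 of a PTE.
 */
#define TOY_PAGE_PRESENT 0x1ULL
#define NR_TOY_CPUS 4

/* A single "page table entry" mapping one page (frame address 0x1000). */
static uint64_t toy_pte = 0x1000 | TOY_PAGE_PRESENT;

/* Per-CPU flag: does this CPU's TLB cache a translation for the page? */
static bool toy_tlb_valid[NR_TOY_CPUS];

/* Make the page present again and drop every cached translation. */
static void toy_reset(void)
{
	toy_pte |= TOY_PAGE_PRESENT;
	for (int i = 0; i < NR_TOY_CPUS; i++)
		toy_tlb_valid[i] = false;
}

/* Access the page from @cpu; returns true if the access faults. */
static bool toy_access_faults(int cpu)
{
	assert(cpu >= 0 && cpu < NR_TOY_CPUS);
	if (toy_tlb_valid[cpu])
		return false;	/* served from a (possibly stale) cached entry */
	if (!(toy_pte & TOY_PAGE_PRESENT))
		return true;	/* page walk finds it not present: fault, cache nothing */
	toy_tlb_valid[cpu] = true;
	return false;
}

/* Touch the page from every CPU so that each one caches it. */
static void toy_warm_all(void)
{
	for (int i = 0; i < NR_TOY_CPUS; i++)
		toy_access_faults(i);
}

/* Protect the page, flushing only @cpu's TLB (no IPIs, best effort). */
static void toy_protect_local(int cpu)
{
	assert(cpu >= 0 && cpu < NR_TOY_CPUS);
	toy_pte &= ~TOY_PAGE_PRESENT;
	toy_tlb_valid[cpu] = false;
}

/*
 * Protect the page and flush every CPU's TLB. Returns how many remote
 * CPUs had to be flushed, i.e. the IPIs a real kernel would send.
 */
static int toy_protect_broadcast(int cpu)
{
	int remote = 0;

	assert(cpu >= 0 && cpu < NR_TOY_CPUS);
	toy_pte &= ~TOY_PAGE_PRESENT;
	for (int i = 0; i < NR_TOY_CPUS; i++) {
		toy_tlb_valid[i] = false;
		if (i != cpu)
			remote++;
	}
	return remote;
}
```

In this model, after toy_warm_all() and toy_protect_local(0), an access from CPU 0 faults while an access from CPU 1 is still served from its stale entry. After toy_warm_all() and toy_protect_broadcast(0), both accesses fault, at the cost of NR_TOY_CPUS - 1 remote flushes; that cost grows with the CPU count, which is the overhead being weighed here.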
> diff --git a/arch/x86/include/asm/kfence.h b/arch/x86/include/asm/kfence.h
> index ff5c7134a37a..5cdb3a1f3995 100644
> --- a/arch/x86/include/asm/kfence.h
> +++ b/arch/x86/include/asm/kfence.h
> @@ -37,34 +37,13 @@ static inline bool arch_kfence_init_pool(void)
>  	return true;
>  }
>
> -/* Protect the given page and flush TLB. */
>  static inline bool kfence_protect_page(unsigned long addr, bool protect)
>  {
> -	unsigned int level;
> -	pte_t *pte = lookup_address(addr, &level);
> -
> -	if (WARN_ON(!pte || level != PG_LEVEL_4K))
> -		return false;
> -
> -	/*
> -	 * We need to avoid IPIs, as we may get KFENCE allocations or faults
> -	 * with interrupts disabled. Therefore, the below is best-effort, and
> -	 * does not flush TLBs on all CPUs. We can tolerate some inaccuracy;
> -	 * lazy fault handling takes care of faults after the page is PRESENT.
> -	 */
> -

^^ See this comment. Additionally there's a real performance concern,
and the inaccuracy is something that we deliberately accept.

>  	if (protect)
> -		set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_PRESENT));
> +		set_memory_np(addr, addr + PAGE_SIZE);
>  	else
> -		set_pte(pte, __pte(pte_val(*pte) | _PAGE_PRESENT));
> +		set_memory_p(addr, addr + PAGE_SIZE);

Isn't this going to do tons of IPIs and shoot down other CPUs' TLBs?
KFENCE shouldn't incur this overhead on large machines with >100 CPUs if
we can avoid it.

What does "x86/mm: Inhibit _PAGE_NX changes from cpa_process_alias()" do
that suddenly makes all this fail?

What solution do you prefer that both fixes the issue and avoids the
IPIs?

Thanks,
-- Marco