Date: Wed, 17 Jan 2024 20:18:21 +0100
From: Marco Elver <elver@google.com>
To: Alexander Potapenko
Cc: quic_charante@quicinc.com, akpm@linux-foundation.org,
	aneesh.kumar@linux.ibm.com, dan.j.williams@intel.com,
	david@redhat.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	mgorman@techsingularity.net, osalvador@suse.de, vbabka@suse.cz,
	"Paul E. McKenney", Dmitry Vyukov, kasan-dev@googlegroups.com,
	Ilya Leoshkevich, Nicholas Miehlbradt, rcu@vger.kernel.org
Subject: Re: [PATCH] mm/sparsemem: fix race in accessing memory_section->usage
References: <1697202267-23600-1-git-send-email-quic_charante@quicinc.com>
 <20240115184430.2710652-1-glider@google.com>
User-Agent: Mutt/2.2.12 (2023-09-09)
On Mon, Jan 15, 2024 at 09:34PM +0100, Marco Elver wrote:
> On Mon, 15 Jan 2024 at 19:44, Alexander Potapenko wrote:
> >
> > Cc: "Paul E. McKenney"
> > Cc: Marco Elver
> > Cc: Dmitry Vyukov
> > Cc: kasan-dev@googlegroups.com
> > Cc: Ilya Leoshkevich
> > Cc: Nicholas Miehlbradt
> >
> > Hi folks,
> >
> > (adding KMSAN reviewers and IBM people who are currently porting KMSAN
> > to other architectures, plus Paul for his opinion on refactoring RCU)
> >
> > this patch broke x86 KMSAN in a subtle way.
> >
> > For every memory access in the code instrumented by KMSAN we call
> > kmsan_get_metadata() to obtain the metadata for the memory being
> > accessed. For virtual memory the metadata pointers are stored in the
> > corresponding `struct page`, therefore we need to call virt_to_page()
> > to get them.
> >
> > According to the comment in arch/x86/include/asm/page.h,
> > virt_to_page(kaddr) returns a valid pointer iff virt_addr_valid(kaddr)
> > is true, so KMSAN needs to call virt_addr_valid() as well.
> >
> > To avoid recursion, kmsan_get_metadata() must not call instrumented
> > code, therefore ./arch/x86/include/asm/kmsan.h forks parts of
> > arch/x86/mm/physaddr.c to check whether a virtual address is valid or
> > not.
> >
> > But the introduction of rcu_read_lock() to pfn_valid() added
> > instrumented RCU API calls to virt_to_page_or_null(), which is called
> > by kmsan_get_metadata(), so there is an infinite recursion now. I do
> > not think it is correct to stop that recursion by doing
> > kmsan_enter_runtime()/kmsan_exit_runtime() in kmsan_get_metadata():
> > that would prevent instrumented functions called from within the
> > runtime from tracking the shadow values, which might introduce false
> > positives.
> >
> > I am currently looking into inlining
> > __rcu_read_lock()/__rcu_read_unlock() into KMSAN code to prevent it
> > from being instrumented, but that might require factoring out parts
> > of kernel/rcu/tree_plugin.h into a non-private header. Do you think
> > this is feasible?
>
> __rcu_read_lock/unlock() is only outlined in PREEMPT_RCU. Not sure that
> helps.
>
> Otherwise, there is rcu_read_lock_sched_notrace() which does the bare
> minimum and is static inline.
>
> Does that help?

Hrm, rcu_read_unlock_sched_notrace() can still call
__preempt_schedule_notrace(), which is again instrumented by KMSAN.
This patch gets me a working kernel:

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 4ed33b127821..2d62df462d88 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -2000,6 +2000,7 @@ static inline int pfn_valid(unsigned long pfn)
 {
 	struct mem_section *ms;
 	int ret;
+	unsigned long flags;
 
 	/*
 	 * Ensure the upper PAGE_SHIFT bits are clear in the
@@ -2013,9 +2014,9 @@ static inline int pfn_valid(unsigned long pfn)
 	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
 		return 0;
 	ms = __pfn_to_section(pfn);
-	rcu_read_lock();
+	local_irq_save(flags);
 	if (!valid_section(ms)) {
-		rcu_read_unlock();
+		local_irq_restore(flags);
 		return 0;
 	}
 	/*
@@ -2023,7 +2024,7 @@
 	 * the entire section-sized span.
 	 */
 	ret = early_section(ms) || pfn_section_valid(ms, pfn);
-	rcu_read_unlock();
+	local_irq_restore(flags);
 
 	return ret;
 }

Disabling interrupts is a little heavy handed - it also assumes the
current RCU implementation. There is preempt_enable_no_resched_notrace(),
but that might be worse because it breaks scheduling guarantees.

That being said, whatever we do here should be wrapped in some
rcu_read_lock/unlock_() helper. Is there an existing helper we can use?
If not, we need a variant that can be used from extremely constrained
contexts that can't even call into the scheduler. And if we want
pfn_valid() to switch to it, it also should be fast.

Thanks,
-- Marco