Date: Thu, 18 Jan 2024 10:43:06 +0100
From: Marco Elver
To: Alexander Potapenko
Cc: quic_charante@quicinc.com, akpm@linux-foundation.org,
	aneesh.kumar@linux.ibm.com, dan.j.williams@intel.com, david@redhat.com,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	mgorman@techsingularity.net, osalvador@suse.de, vbabka@suse.cz,
	"Paul E. McKenney", Dmitry Vyukov, kasan-dev@googlegroups.com,
	Ilya Leoshkevich, Nicholas Miehlbradt, rcu@vger.kernel.org
Subject: Re: [PATCH] mm/sparsemem: fix race in accessing memory_section->usage
References: <1697202267-23600-1-git-send-email-quic_charante@quicinc.com>
	<20240115184430.2710652-1-glider@google.com>

On Thu, Jan 18, 2024 at 10:01AM +0100, Alexander Potapenko wrote:
> >
> > Hrm, rcu_read_unlock_sched_notrace() can still call
> > __preempt_schedule_notrace(), which is again instrumented by KMSAN.
> >
> > This patch gets me a working kernel:
> > [...]
> > Disabling interrupts is a little heavy handed - it also assumes the
> > current RCU implementation. There is
> > preempt_enable_no_resched_notrace(), but that might be worse because it
> > breaks scheduling guarantees.
> >
> > That being said, whatever we do here should be wrapped in some
> > rcu_read_lock/unlock_() helper.
>
> We could as well redefine rcu_read_lock/unlock in mm/kmsan/shadow.c
> (or the x86-specific KMSAN header, depending on whether people are
> seeing the problem on s390 and Power) with some header magic.
> But that's probably more fragile than adding a helper.
>
> >
> > Is there an existing helper we can use? If not, we need a variant that
> > can be used from extremely constrained contexts that can't even call
> > into the scheduler. And if we want pfn_valid() to switch to it, it also
> > should be fast.

The below patch also gets me a working kernel. For pfn_valid(), using
rcu_read_lock_sched() should be reasonable given that its critical
section is very small, and it also allows pfn_valid() to be called from
more constrained contexts again (like KMSAN).
Within KMSAN we also have to suppress reschedules. This is again not
ideal, but since it's limited to KMSAN, it should be tolerable.

WDYT?

------ >8 ------

diff --git a/arch/x86/include/asm/kmsan.h b/arch/x86/include/asm/kmsan.h
index 8fa6ac0e2d76..bbb1ba102129 100644
--- a/arch/x86/include/asm/kmsan.h
+++ b/arch/x86/include/asm/kmsan.h
@@ -64,6 +64,7 @@ static inline bool kmsan_virt_addr_valid(void *addr)
 {
 	unsigned long x = (unsigned long)addr;
 	unsigned long y = x - __START_KERNEL_map;
+	bool ret;
 
 	/* use the carry flag to determine if x was < __START_KERNEL_map */
 	if (unlikely(x > y)) {
@@ -79,7 +80,21 @@ static inline bool kmsan_virt_addr_valid(void *addr)
 		return false;
 	}
 
-	return pfn_valid(x >> PAGE_SHIFT);
+	/*
+	 * pfn_valid() relies on RCU, and may call into the scheduler on exiting
+	 * the critical section. However, this would result in recursion with
+	 * KMSAN. Therefore, disable preemption here, and re-enable preemption
+	 * below while suppressing reschedules to avoid recursion.
+	 *
+	 * Note, this may occasionally break scheduling guarantees. However, a
+	 * kernel compiled with KMSAN has already given up on any performance
+	 * guarantees due to being heavily instrumented.
+	 */
+	preempt_disable();
+	ret = pfn_valid(x >> PAGE_SHIFT);
+	preempt_enable_no_resched();
+
+	return ret;
 }
 
 #endif /* !MODULE */
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 4ed33b127821..a497f189d988 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -2013,9 +2013,9 @@ static inline int pfn_valid(unsigned long pfn)
 	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
 		return 0;
 	ms = __pfn_to_section(pfn);
-	rcu_read_lock();
+	rcu_read_lock_sched();
 	if (!valid_section(ms)) {
-		rcu_read_unlock();
+		rcu_read_unlock_sched();
 		return 0;
 	}
 	/*
@@ -2023,7 +2023,7 @@ static inline int pfn_valid(unsigned long pfn)
 	 * the entire section-sized span.
 	 */
 	ret = early_section(ms) || pfn_section_valid(ms, pfn);
-	rcu_read_unlock();
+	rcu_read_unlock_sched();
 
 	return ret;
 }
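
For anyone following along, the recursion argument can be illustrated with a
small user-space model. This is only a sketch and not kernel code: every
model_* name, the depth cap, and the PFN limit are made up for illustration.
It assumes rcu_read_lock_sched()/rcu_read_unlock_sched() behave like
preempt_disable()/preempt_enable(), and that the scheduler entry point is
instrumented so that it calls back into the address check. The real
preemption and RCU paths are considerably more involved.

```c
/*
 * User-space model only. Every model_* name is a hypothetical stand-in
 * invented for this sketch; none of this is the kernel's implementation.
 */
#include <assert.h>
#include <stdbool.h>

#define MODEL_DEPTH_CAP 8       /* stop the otherwise unbounded recursion */
#define MODEL_MAX_PFN   1024UL  /* arbitrary "valid" PFN range */

static int  model_preempt_count;  /* stand-in for the preempt counter */
static bool model_need_resched;   /* stand-in for a pending reschedule */
static bool model_suppress;       /* true: use the variant from the patch */
static int  model_depth;          /* current nesting of the KMSAN-like hook */
static int  model_max_depth;      /* deepest nesting seen in this run */

static bool model_check_addr(unsigned long pfn);

/* Instrumented scheduler entry: the instrumentation calls the hook again. */
static void model_schedule(void)
{
	if (model_depth < MODEL_DEPTH_CAP)
		model_check_addr(0);
	model_need_resched = false;
}

static void model_preempt_disable(void)
{
	model_preempt_count++;
}

/* Like preempt_enable(): reaching zero with a pending resched reschedules. */
static void model_preempt_enable(void)
{
	assert(model_preempt_count > 0);
	if (--model_preempt_count == 0 && model_need_resched)
		model_schedule();
}

/* Like preempt_enable_no_resched(): only drops the count. */
static void model_preempt_enable_no_resched(void)
{
	assert(model_preempt_count > 0);
	model_preempt_count--;
}

/* pfn_valid() with the critical section modelled as preempt off/on. */
static bool model_pfn_valid(unsigned long pfn)
{
	bool ret;

	model_preempt_disable();        /* models rcu_read_lock_sched() */
	ret = pfn < MODEL_MAX_PFN;
	model_preempt_enable();         /* models rcu_read_unlock_sched() */
	return ret;
}

/* The KMSAN-like hook, with or without the outer preempt section. */
static bool model_check_addr(unsigned long pfn)
{
	bool ret;

	if (++model_depth > model_max_depth)
		model_max_depth = model_depth;

	if (model_suppress) {
		model_preempt_disable();
		ret = model_pfn_valid(pfn);
		model_preempt_enable_no_resched();
	} else {
		ret = model_pfn_valid(pfn);
	}

	model_depth--;
	return ret;
}

/* Run one check with a reschedule pending; return the deepest hook nesting. */
int model_run(bool suppress)
{
	model_preempt_count = 0;
	model_need_resched = true;
	model_suppress = suppress;
	model_depth = 0;
	model_max_depth = 0;

	model_check_addr(1);
	return model_max_depth;
}
```

Called from a small main(), model_run(false) returns 8 (the cap, i.e. the
hook keeps re-entering itself) and model_run(true) returns 1. In the second
case the pending reschedule stays set, so in this model it is deferred to a
later preemption point rather than lost, which is the scheduling-guarantee
trade-off mentioned in the patch comment.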