From: Charan Teja Kalla
Date: Thu, 18 Jan 2024 16:57:43 +0530
Subject: Re: [PATCH] mm, kmsan: fix infinite recursion due to RCU critical section
To: Marco Elver, Andrew Morton
CC: Alexander Potapenko, Dmitry Vyukov, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin"
References: <20240118110022.2538350-1-elver@google.com>

May I ask whether KMSAN also instruments accesses to memory managed as ZONE_DEVICE?
As you know, this is not RAM, and these pages will never be onlined, so
they will never be available in the buddy allocator. The reason I ask is
that this patch was introduced because of a race in which a pfn walker
ends up on a pfn in ZONE_DEVICE memory.

If KMSAN never instruments such memory, would it look good to you to have
a KMSAN version of pfn_valid(), as Alexander suggested in the other mail?

Thanks,

On 1/18/2024 4:37 PM, Marco Elver wrote:
> On Thu, 18 Jan 2024 at 12:00, Marco Elver wrote:
>>
>> Alexander Potapenko writes in [1]: "For every memory access in the code
>> instrumented by KMSAN we call kmsan_get_metadata() to obtain the
>> metadata for the memory being accessed. For virtual memory the metadata
>> pointers are stored in the corresponding `struct page`, therefore we
>> need to call virt_to_page() to get them.
>>
>> According to the comment in arch/x86/include/asm/page.h,
>> virt_to_page(kaddr) returns a valid pointer iff virt_addr_valid(kaddr)
>> is true, so KMSAN needs to call virt_addr_valid() as well.
>>
>> To avoid recursion, kmsan_get_metadata() must not call instrumented
>> code, therefore ./arch/x86/include/asm/kmsan.h forks parts of
>> arch/x86/mm/physaddr.c to check whether a virtual address is valid or
>> not.
>>
>> But the introduction of rcu_read_lock() to pfn_valid() added
>> instrumented RCU API calls to virt_to_page_or_null(), which is called by
>> kmsan_get_metadata(), so there is an infinite recursion now. I do not
>> think it is correct to stop that recursion by doing
>> kmsan_enter_runtime()/kmsan_exit_runtime() in kmsan_get_metadata(): that
>> would prevent instrumented functions called from within the runtime from
>> tracking the shadow values, which might introduce false positives."
>>
>> Fix the issue by switching pfn_valid() to the _sched() variant of
>> rcu_read_lock/unlock(), which does not require calling into RCU. Given
>> the critical section in pfn_valid() is very small, this is a reasonable
>> trade-off (with preemptible RCU).
>>
>> KMSAN further needs to be careful to suppress calls into the scheduler,
>> which would be another source of recursion. This can be done by wrapping
>> the call to pfn_valid() into preempt_disable/enable_no_resched(). The
>> downside is that this sacrifices breaking scheduling guarantees;
>> however, a kernel compiled with KMSAN has already given up any
>> performance guarantees due to being heavily instrumented.
>>
>> Note, KMSAN code already disables tracing via Makefile, and since
>> mmzone.h is included, it is not necessary to use the notrace variant,
>> which is generally preferred in all other cases.
>>
>> Link: https://lkml.kernel.org/r/20240115184430.2710652-1-glider@google.com [1]
>> Reported-by: Alexander Potapenko
>> Reported-by: syzbot+93a9e8a3dea8d6085e12@syzkaller.appspotmail.com
>> Signed-off-by: Marco Elver
>> Cc: Charan Teja Kalla
>
> This might want a:
>
> Fixes: 5ec8e8ea8b77 ("mm/sparsemem: fix race in accessing
> memory_section->usage")
>
> For reference which patch introduced the problem.
>
>> ---
>>  arch/x86/include/asm/kmsan.h | 17 ++++++++++++++++-
>>  include/linux/mmzone.h       |  6 +++---
>>  2 files changed, 19 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kmsan.h b/arch/x86/include/asm/kmsan.h
>> index 8fa6ac0e2d76..d91b37f5b4bb 100644
>> --- a/arch/x86/include/asm/kmsan.h
>> +++ b/arch/x86/include/asm/kmsan.h
>> @@ -64,6 +64,7 @@ static inline bool kmsan_virt_addr_valid(void *addr)
>>  {
>>  	unsigned long x = (unsigned long)addr;
>>  	unsigned long y = x - __START_KERNEL_map;
>> +	bool ret;
>>
>>  	/* use the carry flag to determine if x was < __START_KERNEL_map */
>>  	if (unlikely(x > y)) {
>> @@ -79,7 +80,21 @@ static inline bool kmsan_virt_addr_valid(void *addr)
>>  			return false;
>>  	}
>>
>> -	return pfn_valid(x >> PAGE_SHIFT);
>> +	/*
>> +	 * pfn_valid() relies on RCU, and may call into the scheduler on exiting
>> +	 * the critical section. However, this would result in recursion with
>> +	 * KMSAN. Therefore, disable preemption here, and re-enable preemption
>> +	 * below while suppressing reschedules to avoid recursion.
>> +	 *
>> +	 * Note, this sacrifices occasionally breaking scheduling guarantees.
>> +	 * Although, a kernel compiled with KMSAN has already given up on any
>> +	 * performance guarantees due to being heavily instrumented.
>> +	 */
>> +	preempt_disable();
>> +	ret = pfn_valid(x >> PAGE_SHIFT);
>> +	preempt_enable_no_resched();
>> +
>> +	return ret;
>>  }
>>
>>  #endif /* !MODULE */
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index 4ed33b127821..a497f189d988 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -2013,9 +2013,9 @@ static inline int pfn_valid(unsigned long pfn)
>>  	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
>>  		return 0;
>>  	ms = __pfn_to_section(pfn);
>> -	rcu_read_lock();
>> +	rcu_read_lock_sched();
>>  	if (!valid_section(ms)) {
>> -		rcu_read_unlock();
>> +		rcu_read_unlock_sched();
>>  		return 0;
>>  	}
>>  	/*
>> @@ -2023,7 +2023,7 @@ static inline int pfn_valid(unsigned long pfn)
>>  	 * the entire section-sized span.
>>  	 */
>>  	ret = early_section(ms) || pfn_section_valid(ms, pfn);
>> -	rcu_read_unlock();
>> +	rcu_read_unlock_sched();
>>
>>  	return ret;
>>  }
>> --
>> 2.43.0.381.gb435a96ce8-goog
>>
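
For reference, the preempt_disable()/preempt_enable_no_resched() wrapping
in the quoted patch can be illustrated with a small userspace toy model.
This is only a sketch under stated assumptions, not kernel code: every
model_* name below is hypothetical, the "scheduler" is just a counter, the
pfn check is arbitrary, and rcu_read_lock_sched()/rcu_read_unlock_sched()
are modeled as a plain preemption disable/enable pair.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model; none of this is the kernel's real implementation. */
static int model_preempt_count;   /* stand-in for the preemption nesting count */
static bool model_need_resched;   /* stand-in for a pending reschedule request */
static int model_schedule_calls;  /* how often the modeled scheduler was entered */

/* Entering the scheduler is what would recurse into instrumented code. */
static void model_schedule(void)
{
	model_schedule_calls++;
	model_need_resched = false;
}

static void model_preempt_disable(void)
{
	model_preempt_count++;
}

/* Re-enable preemption; reschedule if this was the outermost level. */
static void model_preempt_enable(void)
{
	assert(model_preempt_count > 0);
	if (--model_preempt_count == 0 && model_need_resched)
		model_schedule();
}

/* Re-enable preemption but never enter the scheduler from here. */
static void model_preempt_enable_no_resched(void)
{
	assert(model_preempt_count > 0);
	model_preempt_count--;
}

/* Model of pfn_valid() using the _sched() flavour of the read-side lock. */
static bool model_pfn_valid(unsigned long pfn)
{
	bool ret;

	model_preempt_disable();      /* models rcu_read_lock_sched() */
	ret = pfn < 1024;             /* arbitrary validity rule for the demo */
	model_need_resched = true;    /* pretend a reschedule became pending */
	model_preempt_enable();       /* models rcu_read_unlock_sched() */
	return ret;
}

/* Model of the wrapped call in kmsan_virt_addr_valid(). */
static bool model_kmsan_virt_addr_valid(unsigned long pfn)
{
	bool ret;

	model_preempt_disable();
	ret = model_pfn_valid(pfn);   /* inner enable sees count 2 -> 1, no schedule */
	model_preempt_enable_no_resched();
	return ret;
}
```

In this model, calling model_pfn_valid() directly enters the modeled
scheduler when the critical section ends, while going through
model_kmsan_virt_addr_valid() leaves the reschedule pending and never
enters it. That mirrors the trade-off described in the patch comment: the
reschedule is delayed rather than performed from inside the KMSAN runtime.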