Date: Tue, 18 Jun 2024 20:08:52 +0800
From: Leo Liang
To: "Edgecombe, Rick P"
CC: "lstoakes@gmail.com", "akpm@linux-foundation.org", "linux-mm@kvack.org", "hch@infradead.org", "linux-kernel@vger.kernel.org", "bpf@vger.kernel.org", "urezki@gmail.com", "patrick@andestech.com"
Subject: Re: [RFC PATCH 1/1] mm/vmalloc: Modify permission reset procedure to avoid invalid access
References: <20240611131301.2988047-1-ycliang@andestech.com> <5e603eedf9e8fbd6efe1d118706dd82666e54251.camel@intel.com>
In-Reply-To: <5e603eedf9e8fbd6efe1d118706dd82666e54251.camel@intel.com>

On Tue, Jun 11, 2024 at 02:21:42PM +0000, Edgecombe, Rick P wrote:
> On Tue, 2024-06-11 at 21:13 +0800, Leo Yu-Chi Liang wrote:
> > The previous reset procedure is
> > 1. Set direct map attribute to invalid
> > 2. Flush TLB
> > 3. Reset direct map attribute to default
> >
> > It is possible that kernel forks another process
> > on another core that access the invalid mappings after
> > sync_kernel_mappings.
> >
> > We could reproduce this scenario by running LTP/bpf_prog
> > multiple times on RV32 kernel on QEMU.
> >
> > Therefore, the following procedure is proposed
> > to avoid mappings being invalid.
> > 1. Reset direct map attribute to default
> > 2. Flush TLB
>
> Can you explain more about what is happening in this scenario? Looking briefly,
> riscv is doing something unique around sync_kernel_mappings(). If a RO mapping
> is copied instead of a NP/invalid mapping, how is the problem avoided?

Hi Edgecombe,

Sorry for the late reply, and thank you for taking a look.

What we saw at first was that the LTP bpf_prog03 test fails randomly
on RV32 SMP QEMU with kernel 6.1, and the cause of the failure is
a load page fault.

After some inspection, we found that the faulting page is part of the
kernel's page table, and that the valid bit of that page's PTE had been
cleared by this reset procedure.

We suspect the fault happens as follows:

1. Running bpf_prog03:
   Kernel pages are created with elevated 'X' permission so that
   the bpf program can be executed.
2. Finishing bpf_prog03:
   The vfree code path resets the permission to default:
   a. Set the pages to invalid first
   b. Unmap the pages and flush TLB
   c. Reset them to default permission
3. Another core forks new processes:
   sync_kernel_mappings copies the kernel page table.

If step 3 happens during step 2a, we get a kernel mapping with an
invalid PTE permission. If the invalid page is then accessed,
we get a page fault exception and the kernel panics.

All of the above is still conjecture, but either way we are wondering
whether setting the mappings to invalid first is necessary.
IMHO, "set to invalid --> unmap & flush TLB --> set to default"
is equivalent to "set to default --> unmap & flush TLB".

Could we not just reset them to default first and then
flush TLB & free memory?

Best regards,
Leo
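
P.S. For illustration only, the suspected interleaving can be modelled
in plain C. This is a toy sketch and not kernel code: struct toy_pte,
enum reset_order and snapshot_during_reset() are hypothetical names, the
TLB flush is a no-op, and a single struct copy stands in for the page
table copy in step 3. It assumes, as conjectured above, that the copy is
taken right after the first step of the reset and is not refreshed later.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical one-entry model of a kernel mapping; not a real kernel type. */
struct toy_pte {
    bool valid;       /* models the PTE valid bit */
    bool executable;  /* models the elevated 'X' permission */
};

enum reset_order {
    INVALID_FIRST,  /* set invalid --> flush TLB --> set default */
    DEFAULT_FIRST   /* set default --> flush TLB */
};

/*
 * Replays the permission reset on a master entry and lets "another core"
 * take a snapshot right after the first step. The snapshot stands in for
 * the page table copy made on fork. Returns what the new process would see.
 */
struct toy_pte snapshot_during_reset(enum reset_order order)
{
    struct toy_pte master = { .valid = true, .executable = true };
    struct toy_pte copy;

    assert(order == INVALID_FIRST || order == DEFAULT_FIRST);

    if (order == INVALID_FIRST) {
        master.valid = false;       /* 2a: set the page to invalid */
        copy = master;              /* 3: other core copies the entry here */
        /* 2b: TLB flush would happen here (no-op in this model) */
        master.valid = true;        /* 2c: reset to default permission */
        master.executable = false;
    } else {
        master.valid = true;        /* 1: reset to default permission */
        master.executable = false;
        copy = master;              /* other core copies the entry here */
        /* 2: TLB flush would happen here (no-op in this model) */
    }

    return copy;
}
```

With INVALID_FIRST the snapshot has its valid bit cleared, which is the
state we believe leads to the load page fault. With DEFAULT_FIRST the
snapshot is valid and no longer executable.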