From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <5f9ba14a-909b-4b49-b1de-3dc98b31aee0@redhat.com>
Date: Fri, 18 Oct 2024 20:52:43 +0200
Subject: Re: [RFC PATCH 0/7] support for mm-local memory allocations and use it
From: David Hildenbrand
Organization: Red Hat
To: Fares Mehanna
Cc: akpm@linux-foundation.org, ardb@kernel.org, arnd@arndb.de,
 bhelgaas@google.com, broonie@kernel.org, catalin.marinas@arm.com,
 james.morse@arm.com, javierm@redhat.com, jean-philippe@linaro.org,
 joey.gouly@arm.com, kristina.martsenko@arm.com, kvmarm@lists.linux.dev,
 linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, mark.rutland@arm.com, maz@kernel.org, mediou@amazon.de,
 nh-open-source@amazon.com, oliver.upton@linux.dev, ptosi@google.com,
 rdunlap@infradead.org, rkagan@amazon.de, rppt@kernel.org,
 shikemeng@huaweicloud.com, suzuki.poulose@arm.com, tabba@google.com,
 will@kernel.org, yuzenghui@huawei.com
References: <63d112d8-62d0-4e95-81f0-3031f990abc4@redhat.com>
 <20241011142547.24447-1-faresx@amazon.de>
In-Reply-To: <20241011142547.24447-1-faresx@amazon.de>
Content-Type: text/plain; charset=UTF-8; format=flowed

On 11.10.24 16:25, Fares Mehanna wrote:
>>>
>>>
>>>> On 11. Oct 2024, at 14:36, Mediouni, Mohamed wrote:
>>>>
>>>>
>>>>
>>>>> On 11. Oct 2024, at 14:04, David Hildenbrand wrote:
>>>>>
>>>>> On 10.10.24 17:52, Fares Mehanna wrote:
>>>>>>>> In a series posted a few years ago [1], a proposal was put forward to allow the
>>>>>>>> kernel to allocate memory local to a mm and thus push it out of reach for
>>>>>>>> current and future speculation-based cross-process attacks. We still believe
>>>>>>>> this is a nice thing to have.
>>>>>>>>
>>>>>>>> However, in the time passed since that post Linux mm has grown quite a few new
>>>>>>>> goodies, so we'd like to explore possibilities to implement this functionality
>>>>>>>> with less effort and churn leveraging the now available facilities.
>>>>>>>>
>>>>>>>> An RFC was posted few months back [2] to show the proof of concept and a simple
>>>>>>>> test driver.
>>>>>>>>
>>>>>>>> In this RFC, we're using the same approach of implementing mm-local allocations
>>>>>>>> piggy-backing on memfd_secret(), using regular user addresses but pinning the
>>>>>>>> pages and flipping the user/supervisor flag on the respective PTEs to make them
>>>>>>>> directly accessible from kernel.
>>>>>>>> In addition to that we are submitting 5 patches to use the secret memory to hide
>>>>>>>> the vCPU gp-regs and fp-regs on arm64 VHE systems.
>>>>>>>
>>>>>>> I'm a bit lost on what exactly we want to achieve. The point where we
>>>>>>> start flipping user/supervisor flags confuses me :)
>>>>>>>
>>>>>>> With secretmem, you'd get memory allocated that
>>>>>>> (a) Is accessible by user space -- mapped into user space.
>>>>>>> (b) Is inaccessible by kernel space -- not mapped into the direct map
>>>>>>> (c) GUP will fail, but copy_from / copy_to user will work.
>>>>>>>
>>>>>>>
>>>>>>> Another way, without secretmem, would be to consider these "secrets"
>>>>>>> kernel allocations that can be mapped into user space using mmap() of a
>>>>>>> special fd. That is, they wouldn't have their origin in secretmem, but
>>>>>>> in KVM as a kernel allocation. It could be achieved by using VM_MIXEDMAP
>>>>>>> with vm_insert_pages(), manually removing them from the directmap.
>>>>>>>
>>>>>>> But, I am not sure who is supposed to access what. Let's explore the
>>>>>>> requirements. I assume we want:
>>>>>>>
>>>>>>> (a) Pages accessible by user space -- mapped into user space.
>>>>>>> (b) Pages inaccessible by kernel space -- not mapped into the direct map
>>>>>>> (c) GUP to fail (no direct map).
>>>>>>> (d) copy_from / copy_to user to fail?
>>>>>>>
>>>>>>> And on top of that, some way to access these pages on demand from kernel
>>>>>>> space? (temporary CPU-local mapping?)
>>>>>>>
>>>>>>> Or how would the kernel make use of these allocations?
>>>>>>>
>>>>>>> --
>>>>>>> Cheers,
>>>>>>>
>>>>>>> David / dhildenb
>>>>>> Hi David,
>>>>>
>>>>> Hi Fares!
>>>>>
>>>>>> Thanks for taking a look at the patches!
>>>>>> We're trying to allocate a kernel memory that is accessible to the kernel but
>>>>>> only when the context of the process is loaded.
>>>>>> So this is a kernel memory that is not needed to operate the kernel itself, it
>>>>>> is to store & process data on behalf of a process. The requirement for this
>>>>>> memory is that it would never be touched unless the process is scheduled on this
>>>>>> core. otherwise any other access will crash the kernel.
>>>>>> So this memory should only be directly readable and writable by the kernel, but
>>>>>> only when the process context is loaded. The memory shouldn't be readable or
>>>>>> writable by the owner process at all.
>>>>>> This is basically done by removing those pages from kernel linear address and
>>>>>> attaching them only in the process mm_struct. So during context switching the
>>>>>> kernel loses access to the secret memory scheduled out and gain access to the
>>>>>> new process secret memory.
>>>>>> This generally protects against speculation attacks, and if other process managed
>>>>>> to trick the kernel to leak data from memory. In this case the kernel will crash
>>>>>> if it tries to access other processes secret memory.
>>>>>> Since this memory is special in the sense that it is kernel memory but only make
>>>>>> sense in the term of the owner process, I tried in this patch series to explore
>>>>>> the possibility of reusing memfd_secret() to allocate this memory in user virtual
>>>>>> address space, manage it in a VMA, flipping the permissions while keeping the
>>>>>> control of the mapping exclusively with the kernel.
>>>>>> Right now it is:
>>>>>> (a) Pages not accessible by user space -- even though they are mapped into user
>>>>>> space, the PTEs are marked for kernel usage.
>>>>>
>>>>> Ah, that is the detail I was missing, now I see what you are trying to
>>>>> achieve, thanks!
>>>>>
>>>>> It is a bit architecture specific, because ... imagine architectures
>>>>> that have separate kernel+user space page table hierarchies, and not a
>>>>> simple PTE flag to change access permissions between kernel/user space.
>>>>>
>>>>> IIRC s390 is one such architecture that uses separate page tables for
>>>>> the user-space + kernel-space portions.
>>>>>
>>>>>> (b) Pages accessible by kernel space -- even though they are not mapped into the
>>>>>> direct map, the PTEs in uvaddr are marked for kernel usage.
>>>>>> (c) copy_from / copy_to user won't fail -- because it is in the user range, but
>>>>>> this can be fixed by allocating specific range in user vaddr to this feature
>>>>>> and check against this range there.
>>>>>> (d) The secret memory vaddr is guessable by the owner process -- that can also
>>>>>> be fixed by allocating bigger chunk of user vaddr for this feature and
>>>>>> randomly placing the secret memory there.
>>>>>> (e) Mapping is off-limits to the owner process by marking the VMA as locked,
>>>>>> sealed and special.
>>>>>
>>>>> Okay, so in this RFC you are jumping through quite some hoops to have a
>>>>> kernel allocation unmapped from the direct map but mapped into a
>>>>> per-process page table only accessible by kernel space. :)
>>>>>
>>>>> So you really don't want this mapped into user space at all
>>>>> (consequently, no GUP, no access, no copy_from_user ...). In this RFC
>>>>> it's mapped but turned inaccessible by flipping the "kernel vs. user"
>>>>> switch.
>>>>>
>>>>>> Other alternative (that was implemented in the first submission) is to track those
>>>>>> allocations in a non-shared kernel PGD per process, then handle creating, forking
>>>>>> and context-switching this PGD.
>>>>>
>>>>> That sounds like a better approach. So we would remove the pages from
>>>>> the shared kernel direct map and map them into a separate kernel-portion
>>>>> in the per-MM page tables?
>>>>>
>>>>> Can you envision that would also work with architectures like s390x? I
>>>>> assume we would not only need the per-MM user space page table
>>>>> hierarchy, but also a per-MM kernel space page table hierarchy, into
>>>>> which we also map the common/shared-among-all-processes kernel space
>>>>> page tables (e.g., directmap).
>>>> Yes, that’s also applicable to arm64. There’s currently no separate per-mm user space page hierarchy there.
>>> typo, read kernel
>>
>>
>> Okay, thanks. So going into that direction makes more sense.
>>
>> I do wonder if we really have to deal with fork() ... if the primary
>> users don't really have meaning in the forked child (e.g., just like
>> fork() with KVM IIRC) we might just get away by "losing" these
>> allocations in the child process.
>>
>> Happy to learn why fork() must be supported.
>
> It really depends on the use cases of the kernel secret allocation, but in my
> mind a troubling scenario:
> 1. Process A had a resource X.
> 2. Kernel decided to keep some data related to resource X in process A secret
> memory.
> 3. Process A decided to fork, now process B share the resource X.
> 4. Process B started using resource X. <-- This will crash the kernel as the
> used kernel page table on process B has no mapping for the secret memory used
> in resource X.
>
> I haven't tried to trigger this crash myself though.
>

Right, and if we can rule out any users that are supposed to work after
fork(), we can just disregard that in the first version.

I never played with this, but let's assume you make use of these
mm-local allocations in KVM context.

What would happen if you fork() with a KVM fd and try accessing that fd
from the other process using ioctls? I recall that KVM will not be
"duplicated".

What would happen if you send that fd over to a completely different
process and try accessing that fd from the other process using ioctls?

Of course, the question being: if you have MM-local allocations in both
cases and there is suddenly a different MM ... assuming that both cases
are even possible (if they are not possible, great! :) ).

I think I am supposed to know if these things are possible or not and
what would happen, but it's late Friday and my brain is begging for some
weekend :D

> I didn't think in depth about this issue yet, but I need to because duplicating
> the secret memory mappings in the new forked process is easy (To give kernel
> access on the secret memory), but tearing them down across all forked processes
> is a bit complicated (To clean stale mappings on parent/child processes). Right
> now tearing down the mapping will only happen on mm_struct which allocated the
> secret memory.

If an allocation is MM-local, I would assume that fork() would
*duplicate* that allocation (leaving CoW out of the picture :D ), but
that's where the fun begins (see above regarding my confusion about KVM
and fork() behavior ... ).

-- 
Cheers,

David / dhildenb
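
To make the fork() question above a bit more concrete, here is a minimal
userspace sketch in C. It is not KVM code and not part of the series: an
eventfd stands in for the KVM fd, and a plain static variable
(mm_private_state, a made-up name) stands in for an mm-local allocation.
It only illustrates the generic mechanism being discussed: after fork()
the child runs on a different mm but still holds the same open file
description, so any per-mm state hanging off that fd would be reached
from a different mm. How KVM itself reacts in that situation is not
shown here and would need to be checked separately.

```c
#include <assert.h>	/* only for callers that assert on the result */
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for per-mm state; lives in this process's memory. */
static volatile int mm_private_state = 1;

/*
 * Returns 0 when, after fork(), the child could drive the inherited fd
 * while its memory writes stayed invisible to the parent; -1 otherwise.
 */
int fd_shared_but_mm_private(void)
{
	uint64_t token = 42, seen = 0;
	int status;
	pid_t pid;
	int efd = eventfd(0, 0);	/* stand-in for a KVM fd (assumption) */

	if (efd < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(efd);
		return -1;
	}
	if (pid == 0) {
		/* Child: different mm, same open file description. */
		mm_private_state = 2;	/* modifies the child's private copy only */
		_exit(write(efd, &token, sizeof(token)) ==
		      (ssize_t)sizeof(token) ? 0 : 1);
	}

	if (waitpid(pid, &status, 0) != pid ||
	    !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
		close(efd);
		return -1;
	}

	/* Parent reads back what the child wrote through the shared fd. */
	if (read(efd, &seen, sizeof(seen)) != (ssize_t)sizeof(seen))
		seen = 0;
	close(efd);

	return (seen == token && mm_private_state == 1) ? 0 : -1;
}
```

Calling fd_shared_but_mm_private() from a small main() and checking for
0 is enough to try it. The fd is the part that crosses fork() (and could
equally be passed over a UNIX socket), while the memory does not, which
is exactly the combination that makes MM-local state behind an fd
awkward.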