From: David Hildenbrand
Organization: Red Hat
To: Dan Williams
Cc: Linux Kernel Mailing List, Arnd Bergmann, Greg Kroah-Hartman, "Michael S. Tsirkin", Jason Wang, "Rafael J. Wysocki", Andrew Morton, Hanjun Guo, Andy Shevchenko, virtualization@lists.linux-foundation.org, Linux MM
References: <20210816142505.28359-1-david@redhat.com> <20210816142505.28359-2-david@redhat.com>
Subject: Re: [PATCH v2 1/3] /dev/mem: disallow access to explicitly excluded system RAM regions
Date: Wed, 25 Aug 2021 19:27:20 +0200

On 25.08.21 19:07, Dan Williams wrote:
> On Wed, Aug 25, 2021 at 12:23 AM David Hildenbrand wrote:
>>
>> On 25.08.21 02:58, Dan Williams wrote:
>>> On Mon, Aug 16, 2021 at 7:25 AM David Hildenbrand wrote:
>>>>
>>>> virtio-mem dynamically exposes memory inside a device memory region as
>>>> system RAM to Linux, coordinating with the hypervisor which parts are
>>>> actually "plugged" and consequently usable/accessible. On the one hand, the
>>>> virtio-mem driver adds/removes whole memory blocks, creating/removing busy
>>>> IORESOURCE_SYSTEM_RAM resources; on the other hand, it logically (un)plugs
>>>> memory inside added memory blocks, dynamically either exposing them to
>>>> the buddy or hiding them from the buddy and marking them PG_offline.
>>>>
>>>> virtio-mem wants to make sure that in a sane environment, nobody
>>>> "accidentally" accesses unplugged memory inside the device-managed
>>>> region. After /proc/kcore has been sanitized and /dev/kmem has been
>>>> removed, /dev/mem is the remaining interface that still allows uncontrolled
>>>> access to the device-managed region of virtio-mem devices from user
>>>> space.
>>>>
>>>> There is no known sane use case for mapping virtio-mem device memory
>>>> via /dev/mem while the virtio-mem driver concurrently (un)plugs memory inside
>>>> that region. So once the driver was loaded and detected the device
>>>> along with the device-managed region, we just want to disallow any access via
>>>> /dev/mem to it.
>>>>
>>>> Let's add the basic infrastructure to exclude some physical memory
>>>> regions completely from /dev/mem access, on any architecture and under
>>>> any system configuration (independent of CONFIG_STRICT_DEVMEM and
>>>> independent of "iomem=").
>>>
>>> I'm certainly on team "/dev/mem considered harmful", but this approach
>>> feels awkward.
>>> It feels wrong for being non-committal about whether
>>> CONFIG_STRICT_DEVMEM is in wide enough use that the safety can be
>>> turned on all the time, and the configuration option dropped, or there
>>> are users clinging onto /dev/mem where they expect to be able to build
>>> a debug kernel to turn all of these restrictions off, even the
>>> virtio-mem ones. This splits the difference and says some /dev/mem
>>> accesses are always disallowed for "reasons", but I could say the same
>>> thing about pmem, there's no sane reason to allow /dev/mem which has
>>> no idea about the responsibilities of properly touching pmem to get
>>> access to it.
>>
>> For virtio-mem, there is no use case *and* access could be harmful; I
>> don't even want to allow it for debugging purposes. If you want to
>> inspect virtio-mem device memory content, use /proc/kcore, which
>> performs proper synchronized access checks. Modifying random virtio-mem
>> memory via /dev/mem in a debug kernel will not be possible: if you
>> really have to play with fire, use kdb or, better, don't load the
>> virtio-mem driver during boot, such that the kernel won't even be making
>> use of device memory.
>>
>> I don't want people disabling CONFIG_STRICT_DEVMEM, or booting with
>> "iomem=relaxed", and "accidentally" accessing any of virtio-mem memory
>> via /dev/mem, while it gets concurrently plugged/unplugged by the
>> virtio-mem driver. Not even for debugging purposes.
>
> That sounds more an argument that all of the existing "kernel is using
> this region" cases should become mandatory exclusions. If unloading
> the driver removes the exclusion then that's precisely
> CONFIG_IO_STRICT_DEVMEM. Why is the virtio-mem driver more special
> than any other driver that expects this integrity guarantee?

Unloading the driver will only remove the exclusion if the driver can be
unloaded cleanly, that is, if no memory is added to Linux.
Similar to force-unbinding dax/kmem without offlining memory, the whole
device range will remain excluded.

(Unloading the driver is only implemented at all because there is no way
not to implement it; there is no sane use case for virtio-mem to do that.)

Two cases are relevant for virtio-mem memory with regard to this series:

1. The kernel is currently using it (added virtio-mem memory). Don't allow
   access. Pretty much like most other things we want to exclude, I agree.

2. The kernel is currently not using it (virtio-mem memory not yet added),
   or is no longer using it (removed virtio-mem memory). In contrast to
   other devices (DIMM, PMEM, ...) there is no sane use case for this
   memory, because the VM must not use it (as defined in the virtio spec).

I care about 2) a lot because I don't want people looking at /proc/iomem
and figuring out that there is something to map. By the time they map it
via /dev/mem, virtio-mem may already have removed that memory, yet the
/dev/mem mapping exists and we get an invalid memory access.

Mapping /dev/mem and accidentally being able to read/write virtio-mem
memory has to be forbidden in sane environments. Force-unloading a
driver or preventing it from loading just to touch virtio-mem memory via
/dev/mem is not a sane environment; someone is explicitly asking for
trouble, which is fine.

>
>> We disallow mapping to some other regions independent of
>> CONFIG_STRICT_DEVMEM already, so the idea to ignore CONFIG_STRICT_DEVMEM
>> is not completely new:
>>
>> "Note that with PAT support enabled, even in this case there are
>> restrictions on /dev/mem use due to the cache aliasing requirements."
>>
>> Maybe you even want to do something similar with PMEM now that there is
>> infrastructure for it and just avoid having to deal with revoking
>> /dev/mem mappings later.
>
> That would be like blocking writes to /dev/sda just because a
> filesystem might later be mounted on it.
> If the /dev/mem access is not actively colliding with other kernel
> operations what business does the kernel have saying no?

The spec defines that this memory must not be read or written: there
might no longer be any memory backing the virtio-mem device, or there is,
and the hypervisor will flag you as "malicious" and eventually zap the VM.
That's different from most physical devices I am aware of.

>
> I'm pushing on this topic because I am also considering an exclusion
> on PCI configuration access to the "DOE mailbox" since it can disrupt
> the kernel's operation, at the same time, root can go change PCI BARs
> to nonsensical values whenever it wants which is also in the category
> of "has no use case && could be harmful".

Right.

>
>> I think there are weird debugging/educational setups [1] that still
>> require CONFIG_STRICT_DEVMEM=n even with iomem=relaxed. Take a look at
>> lib/devmem_is_allowed.c:devmem_is_allowed(), it disallows any access to
>> (what's currently added as) System RAM. It might just do what people
>> want when dealing with system RAM that doesn't suddenly vanish, so I
>> don't ultimately see why we should remove CONFIG_STRICT_DEVMEM=n.
>
> Yes, I wanted to tease out more of your rationale on where the line
> should be drawn, I think a mostly unfettered /dev/mem mode is here to
> stay.

I could most certainly be convinced to

a) leave CONFIG_STRICT_DEVMEM=n untouched, and
b) restrict what I propose to CONFIG_STRICT_DEVMEM=y.

I could even go ahead and require CONFIG_STRICT_DEVMEM for virtio-mem.

-- 
Thanks,

David / dhildenb
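For illustration only, a rough userspace model of the compromise above, assuming options a) and b) together: with strict_devmem false (standing in for CONFIG_STRICT_DEVMEM=n) every range stays accessible as today (PAT cache-aliasing restrictions are ignored here), while with strict_devmem true both driver-excluded ranges (e.g. a virtio-mem device-managed region) and added System RAM are refused. The names struct phys_range, add_range(), devmem_range_allowed() and try_remove_exclusion(), and the fixed-size tables, are hypothetical; this is not the interface of the patch series, and the System RAM check only loosely mirrors lib/devmem_is_allowed.c:devmem_is_allowed().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model, not kernel code: an inclusive physical range. */
struct phys_range {
	uint64_t start;
	uint64_t end;
};

#define MAX_RANGES 8

/* Ranges a driver (e.g. virtio-mem) asked to hide from /dev/mem. */
static struct phys_range excluded[MAX_RANGES];
static size_t nr_excluded;

/* Memory currently added to Linux as System RAM. */
static struct phys_range system_ram[MAX_RANGES];
static size_t nr_system_ram;

/* Append [start, end] to a table; fails when the table is full. */
static bool add_range(struct phys_range *tbl, size_t *nr,
		      uint64_t start, uint64_t end)
{
	assert(start <= end);
	if (*nr == MAX_RANGES)
		return false;
	tbl[*nr] = (struct phys_range){ .start = start, .end = end };
	(*nr)++;
	return true;
}

/* True if [start, end] intersects any range in the table. */
static bool any_overlap(const struct phys_range *tbl, size_t nr,
			uint64_t start, uint64_t end)
{
	for (size_t i = 0; i < nr; i++)
		if (start <= tbl[i].end && tbl[i].start <= end)
			return true;
	return false;
}

/*
 * Hypothetical /dev/mem check. strict_devmem stands in for
 * CONFIG_STRICT_DEVMEM=y; when it is false, nothing changes (option a).
 * Driver exclusions only apply in strict mode (option b).
 */
static bool devmem_range_allowed(bool strict_devmem, uint64_t start,
				 uint64_t size)
{
	/* Reject empty requests and ranges that wrap around. */
	if (size == 0 || start + size - 1 < start)
		return false;
	if (!strict_devmem)
		return true;

	uint64_t end = start + size - 1;

	if (any_overlap(excluded, nr_excluded, start, end))
		return false;
	/* Loosely mirrors devmem_is_allowed() refusing added System RAM. */
	return !any_overlap(system_ram, nr_system_ram, start, end);
}

/*
 * Drop an exclusion (e.g. on driver unload), but only if none of its
 * memory is still added to Linux; otherwise the range stays excluded.
 */
static bool try_remove_exclusion(uint64_t start, uint64_t end)
{
	if (any_overlap(system_ram, nr_system_ram, start, end))
		return false;
	for (size_t i = 0; i < nr_excluded; i++) {
		if (excluded[i].start == start && excluded[i].end == end) {
			excluded[i] = excluded[--nr_excluded];
			return true;
		}
	}
	return false;
}
```

In this model try_remove_exclusion() captures the point made earlier in the reply: unloading the driver only lifts the exclusion when no virtio-mem memory is still added to Linux, much like force-unbinding dax/kmem leaves the device range excluded.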