From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.6 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 33E5AC47087 for ; Fri, 28 May 2021 08:58:09 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id D0CAB6124B for ; Fri, 28 May 2021 08:58:08 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D0CAB6124B Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 2E27F6B0070; Fri, 28 May 2021 04:58:08 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 291DF6B0071; Fri, 28 May 2021 04:58:08 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0E5716B0072; Fri, 28 May 2021 04:58:08 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0154.hostedemail.com [216.40.44.154]) by kanga.kvack.org (Postfix) with ESMTP id CFB116B0070 for ; Fri, 28 May 2021 04:58:07 -0400 (EDT) Received: from smtpin33.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 6C8928249980 for ; Fri, 28 May 2021 08:58:07 +0000 (UTC) X-FDA: 78190037814.33.EBA4B82 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf28.hostedemail.com (Postfix) with ESMTP id DFAAD2000BEA for ; Fri, 28 May 2021 08:57:58 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1622192286; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=+WiQc46VNYZyiaojk5h9JxqplWT0OeXUCmh+32Qpg80=; b=PkiezJ65gKThqItW0miX2G3cHe1bi307Sl5gvkUN0OR+fpJyv29KTreEhqEeqimPlS28yO l2X3mMZUKos3Snl5P/RttUf3Mh9i+GckCJ4F+wl4BWZFo3dpkM5O/UliEq+l7kpRAWKloK WhTNEEwqtSWBU+/0nIgYotyTARGVEaI= Received: from mail-wr1-f70.google.com (mail-wr1-f70.google.com [209.85.221.70]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-204-BQKX-majNVGpcR1DyeZd8A-1; Fri, 28 May 2021 04:58:04 -0400 X-MC-Unique: BQKX-majNVGpcR1DyeZd8A-1 Received: by mail-wr1-f70.google.com with SMTP id 7-20020adf95070000b02901104ad3ef04so757317wrs.16 for ; Fri, 28 May 2021 01:58:04 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:organization :message-id:date:user-agent:mime-version:in-reply-to :content-language:content-transfer-encoding; bh=+WiQc46VNYZyiaojk5h9JxqplWT0OeXUCmh+32Qpg80=; b=idjwsU59hwJXOPy7Sdc/7Kh/NwWGuH+/0W4tVD6Z5aiHgzQtxwS6cZVPZZ7EIv5bSH uAJCJP+8Zoj7gItbZorFWDJRUStyRJlYppkLmpenigahEDWyGT1fSMsWj4UsBfwZxA/w G3vorjjdDkMD7lNXO2HuAEx/7kUDJgiMP2qTDk7Bq6b3y0e9Gv1FencGngUtfgLfHmUj hDey6x2jW5/HL1vlHjbEGgE9AsMYByWodYhcm4Dl8gspXG5K6qBnlengLpvmLlCBUvpy x8inA9iNOiAm6/sUY4Tiq3iVaQ+LAjBKQr5aKojbWjQlPskkFVBep/me2rZ0nTQLVTbj lR5w== X-Gm-Message-State: AOAM5306eQJn/LSOfY5cL5/N2A9UvQYl4gxKomQTbpp/1yWxAmZuf00z ZMSHoDWi6R1cg55DUtGRgrLDk5tgFHW5+KsUt4bqRNb554T+S+CtnSROQCQ1trTRHm37hjgQqSV Y7JZWpf5eRZU= X-Received: by 2002:a1c:9a89:: with SMTP id c131mr7583540wme.49.1622192283377; Fri, 28 May 2021 01:58:03 -0700 (PDT) X-Google-Smtp-Source: ABdhPJy9HSY/9ONSw0TXXli1OG4zWrRmNhz/PPNY2OWx7WiHUrnvujFLUoqSJrrWaY8vs4kjp++xtA== X-Received: by 2002:a1c:9a89:: with SMTP id c131mr7583507wme.49.1622192283052; Fri, 28 May 2021 01:58:03 -0700 (PDT) Received: from [192.168.3.132] (p5b0c6870.dip0.t-ipconnect.de. [91.12.104.112]) by smtp.gmail.com with ESMTPSA id u14sm13405368wmc.41.2021.05.28.01.58.02 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 28 May 2021 01:58:02 -0700 (PDT) Subject: Re: [PATCH v4] /dev/mem: Revoke mappings when a driver claims the region To: Dan Williams , Bjorn Helgaas Cc: Greg KH , Arnd Bergmann , Ingo Molnar , Kees Cook , Matthew Wilcox , Russell King , Andrew Morton , Linux Kernel Mailing List , Linux MM , Linux PCI , Daniel Vetter , =?UTF-8?Q?Krzysztof_Wilczy=c5=84ski?= , Jason Gunthorpe , Christoph Hellwig References: <159009507306.847224.8502634072429766747.stgit@dwillia2-desk3.amr.corp.intel.com> <20210527205845.GA1421476@bjorn-Precision-5520> From: David Hildenbrand Organization: Red Hat Message-ID: <7eaf4c10-292e-18d7-e8ce-3a6b72122381@redhat.com> Date: Fri, 28 May 2021 10:58:01 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.10.1 MIME-Version: 1.0 In-Reply-To: X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Authentication-Results: imf28.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=PkiezJ65; spf=none (imf28.hostedemail.com: domain of david@redhat.com has no SPF policy when checking 170.10.133.124) smtp.mailfrom=david@redhat.com; dmarc=pass (policy=none) header.from=redhat.com X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: DFAAD2000BEA X-Stat-Signature: 7t93xn4bssebun36djbii71bc9xs49qo X-HE-Tag: 1622192278-242127 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 27.05.21 23:30, Dan Williams wrote: > On Thu, May 27, 2021 at 1:58 PM Bjorn Helgaas wrote: >> >> [+cc Daniel, Krzysztof, Jason, Christoph, linux-pci] >> >> On Thu, May 21, 2020 at 02:06:17PM -0700, Dan Williams wrote: >>> Close the hole of holding a mapping over kernel driver takeover event of >>> a given address range. >>> >>> Commit 90a545e98126 ("restrict /dev/mem to idle io memory ranges") >>> introduced CONFIG_IO_STRICT_DEVMEM with the goal of protecting the >>> kernel against scenarios where a /dev/mem user tramples memory that a >>> kernel driver owns. However, this protection only prevents *new* read(), >>> write() and mmap() requests. Established mappings prior to the driver >>> calling request_mem_region() are left alone. >>> >>> Especially with persistent memory, and the core kernel metadata that is >>> stored there, there are plentiful scenarios for a /dev/mem user to >>> violate the expectations of the driver and cause amplified damage. >>> >>> Teach request_mem_region() to find and shoot down active /dev/mem >>> mappings that it believes it has successfully claimed for the exclusive >>> use of the driver. Effectively a driver call to request_mem_region() >>> becomes a hole-punch on the /dev/mem device. >> >> This idea of hole-punching /dev/mem has since been extended to PCI >> BARs via [1]. >> >> Correct me if I'm wrong: I think this means that if a user process has >> mmapped a PCI BAR via sysfs, and a kernel driver subsequently requests >> that region via pci_request_region() or similar, we punch holes in the >> the user process mmap. The driver might be happy, but my guess is the >> user starts seeing segmentation violations for no obvious reason and >> is not happy. >> >> Apart from the user process issue, the implementation of [1] is >> problematic for PCI because the mmappable sysfs attributes now depend >> on iomem_init_inode(), an fs_initcall, which means they can't be >> static attributes, which ultimately leads to races in creating them. > > See the comments in iomem_get_mapping(), and revoke_iomem(): > > /* > * Check that the initialization has completed. Losing the race > * is ok because it means drivers are claiming resources before > * the fs_initcall level of init and prevent iomem_get_mapping users > * from establishing mappings. > */ > > ...the observation being that it is ok for the revocation inode to > come on later in the boot process because userspace won't be able to > use the fs yet. So any missed calls to revoke_iomem() would fall back > to userspace just seeing the resource busy in the first instance. I.e. > through the normal devmem_is_allowed() exclusion. > >> >> So I'm raising the question of whether this hole-punch is the right >> strategy. >> >> - Prior to revoke_iomem(), __request_region() was very >> self-contained and really only depended on the resource tree. Now >> it depends on a lot of higher-level MM machinery to shoot down >> mappings of other tasks. This adds quite a bit of complexity and >> some new ordering constraints. >> >> - Punching holes in the address space of an existing process seems >> unfriendly. Maybe the driver's __request_region() should fail >> instead, since the driver should be prepared to handle failure >> there anyway. > > It's prepared to handle failure, but in this case it is dealing with a > root user of 2 minds. > >> >> - [2] suggests that the hole punch protects drivers from /dev/mem >> writers, especially with persistent memory. I'm not really >> convinced. The hole punch does nothing to prevent a user process >> from mmapping and corrupting something before the driver loads. > > The motivation for this was a case that was swapping between /dev/mem > access and /dev/pmem0 access and they forgot to stop using /dev/mem > when they switched to /dev/pmem0. If root wants to use /dev/mem it can > use it, if root wants to stop the driver from loading it can set > mopdrobe policy or manually unbind, and if root asks the kernel to > load the driver while it is actively using /dev/mem something has to > give. Given root has other options to stop a driver the decision to > revoke userspace access when root messes up and causes a collision > seems prudent to me. > Is there a real use case for mapping pmem via /dev/mem or could we just prohibit the access to these areas completely? What's the use case for "swapping between /dev/mem access and /dev/pmem0 access" ? -- Thanks, David / dhildenb