From: Dan Williams <dan.j.williams@intel.com>
To: Bjorn Helgaas <helgaas@kernel.org>, Huang Ying <ying.huang@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>, <linux-mm@kvack.org>,
<linux-kernel@vger.kernel.org>, <linux-cxl@vger.kernel.org>,
Dan Williams <dan.j.williams@intel.com>,
David Hildenbrand <david@redhat.com>,
"Davidlohr Bueso" <dave@stgolabs.net>,
Jonathan Cameron <jonathan.cameron@huawei.com>,
Dave Jiang <dave.jiang@intel.com>,
Alison Schofield <alison.schofield@intel.com>,
Vishal Verma <vishal.l.verma@intel.com>,
"Ira Weiny" <ira.weiny@intel.com>,
Alistair Popple <apopple@nvidia.com>,
"Andy Shevchenko" <andriy.shevchenko@linux.intel.com>,
Bjorn Helgaas <bhelgaas@google.com>, Baoquan He <bhe@redhat.com>
Subject: Re: [PATCH -v2] Resource: fix region_intersects() for CXL memory
Date: Wed, 21 Aug 2024 18:43:43 -0700
Message-ID: <66c697cf7b95a_760529414@dwillia2-mobl3.amr.corp.intel.com.notmuch>
In-Reply-To: <20240821184615.GA262749@bhelgaas>

Hi Bjorn,

Ying is out for the next week or so, so I will address your comments and
resubmit as I think this is a potentially urgent fix.

Bjorn Helgaas wrote:
> On Mon, Aug 19, 2024 at 10:34:13AM +0800, Huang Ying wrote:
> > On a system with CXL memory installed, the resource tree (/proc/iomem)
> > related to CXL memory looks like something as follows.
> >
> >   490000000-50fffffff : CXL Window 0
> >     490000000-50fffffff : region0
> >       490000000-50fffffff : dax0.0
> >         490000000-50fffffff : System RAM (kmem)
>
> I think the subject is too specific (the problem is something to do
> with the tree topology, not the fact that it's "CXL memory") and at
> the same time not specific enough ("fix" doesn't say anything about
> what was wrong or how it is fixed).

Agree, I will update this to be:

    kernel/resource: Fix region_intersects() vs add_memory_driver_managed()

> IMO it could be improved by saying something about what is different
> about CXL, e.g., maybe it could mention checking children in addition
> to top-level resources.

CXL is but one source of a resource tree topology where "System RAM" is
a descendant of some other resource. I will fix up this changelog to
make it clear that dax/kmem and add_memory_driver_managed() potentially
confuse region_intersects() in all cases, since "System RAM" is never
one of the resources passed in to add_memory_driver_managed().

> > When the following command line is run to try writing some memory in
> > CXL memory range,
> >
> > $ dd if=data of=/dev/mem bs=1k seek=19136512 count=1
> > dd: error writing '/dev/mem': Bad address
> > 1+0 records in
> > 0+0 records out
> > 0 bytes copied, 0.0283507 s, 0.0 kB/s
>
> Took me a minute, but I guess the connection is that
> 19136512 * 1k = 0x490000000, which is the beginning of the CXL Window.

Yeah, so might as well write this in a way that makes that association
clearer:

    $ dd if=data of=/dev/mem bs=$((1 << 10)) seek=$((0x490000000 >> 10)) count=1

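For anyone checking the numbers, the association between the seek count
and the window base is plain arithmetic. A minimal Python sketch follows;
the names BLOCK_SIZE, WINDOW_BASE, seek_blocks() and byte_offset() are
illustrative only and do not come from the patch:

```python
# Sanity check of the dd arithmetic discussed above.
# All names here are illustrative; nothing is taken from the patch itself.
BLOCK_SIZE = 1 << 10         # dd bs=1k
WINDOW_BASE = 0x490000000    # start of "CXL Window 0" in /proc/iomem


def seek_blocks(phys_addr, block_size=BLOCK_SIZE):
    """Return the dd seek= count that lands exactly on phys_addr."""
    if phys_addr % block_size:
        raise ValueError("address is not aligned to the block size")
    return phys_addr // block_size


def byte_offset(seek, block_size=BLOCK_SIZE):
    """Return the byte offset in /dev/mem that a given seek= count reaches."""
    return seek * block_size
```

Here seek_blocks(WINDOW_BASE) gives the seek value from the original
report, and byte_offset() goes the other way.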
> > the command fails as expected. However, the error code is wrong. It
> > should be "Operation not permitted" instead of "Bad address". And,
> > the following warning is reported in kernel log.
>
> This intro makes it sound like the problem being solved is the error
> code being wrong. But it seems like a more serious problem than that.
The concern was that this bug allowed System RAM protection bypass. That
does not seem to be the case on x86, but the worry is that other archs
are not saved in the same way and /dev/mem protections are impacted.
> > ioremap on RAM at 0x0000000490000000 - 0x0000000490000fff
>
> Incidental: it seems a little weird that this warning only exists on
> x86 and mips (and powerpc32 has a similar warning with different
> wording), but I assume we don't want to ioremap RAM on *any*
> architecture?
Put another way, we want "System RAM" presence to always fail
devmem_is_allowed() anywhere that "System RAM" appears in the ancestry.
> > WARNING: CPU: 2 PID: 416 at arch/x86/mm/ioremap.c:216 __ioremap_caller.constprop.0+0x131/0x35d
> > Modules linked in: cxl_pmem libnvdimm cbc encrypted_keys cxl_pmu
> > CPU: 2 UID: 0 PID: 416 Comm: dd Not tainted 6.11.0-rc3-kvm #40
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> > RIP: 0010:__ioremap_caller.constprop.0+0x131/0x35d
> > ...
>
> > In the above resource tree, "System RAM" is a descendant of "CXL
> > Window 0" instead of a top level resource. So, region_intersects()
> > will report no System RAM resources in the CXL memory region
> > incorrectly, because it only checks the top level resources.
> > Consequently, devmem_is_allowed() will return 1 (allow access via
> > /dev/mem) for CXL memory region incorrectly. Fortunately, ioremap()
> > doesn't allow to map System RAM and reject the access.
> >
> > However, region_intersects() needs to be fixed to work correctly with
> > the resources tree with CXL Window as above. To fix it, if we found a
> > unmatched resource in the top level, we will continue to search
> > matched resources in its descendant resources. So, we will not miss
> > any matched resources in resource tree anymore. In the new
> > implementation,
> >
> > |------------- "CXL Window 0" ------------|
> > |-- "System RAM" --|
> >
> > will look as if
> >
> > |-- "System RAM" --||-- "CXL Window 0a" --|
>
> Where did "0a" come from? The /proc/iomem above mentioned
> "CXL Window 0"; is the "a" spurious? Same question applies to the
> code comment below.

Not sure where that came from, will clean up and provide a test that can
be upstreamed as well.
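
To make the topology problem concrete, here is a toy model in Python of
the difference between checking only top-level resources and descending
into children. It is a sketch under assumptions, not the kernel
implementation: the Resource class and both helper functions are
hypothetical, matching is done on the name string rather than on resource
flags and descriptors, and the result is a plain boolean rather than the
kernel's region_intersects() return values.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    """Toy stand-in for a resource tree node (hypothetical, not struct resource)."""
    name: str
    start: int
    end: int  # inclusive, as printed in /proc/iomem
    children: List["Resource"] = field(default_factory=list)


def overlaps(res, start, end):
    """True if [start, end] overlaps the resource's inclusive range."""
    return res.start <= end and start <= res.end


def ram_in_top_level(roots, start, end):
    """Behavior described as buggy: only top-level resources are examined."""
    return any(r.name.startswith("System RAM") and overlaps(r, start, end)
               for r in roots)


def ram_in_tree(roots, start, end):
    """Behavior described as the fix: descend into overlapping, unmatched resources."""
    for r in roots:
        if not overlaps(r, start, end):
            continue
        if r.name.startswith("System RAM"):
            return True
        if ram_in_tree(r.children, start, end):
            return True
    return False


# The tree from the quoted /proc/iomem output.
LO, HI = 0x490000000, 0x50fffffff
kmem = Resource("System RAM (kmem)", LO, HI)
dax = Resource("dax0.0", LO, HI, [kmem])
region = Resource("region0", LO, HI, [dax])
window = Resource("CXL Window 0", LO, HI, [region])
iomem = [window]
```

With this tree, ram_in_top_level(iomem, 0x490000000, 0x490000fff) sees
only "CXL Window 0" and reports no System RAM, while ram_in_tree() finds
the nested "System RAM (kmem)" entry for the same range.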