From: "Huang, Ying" <ying.huang@intel.com>
To: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>, <linux-mm@kvack.org>,
<linux-kernel@vger.kernel.org>, <linux-cxl@vger.kernel.org>,
Dan Williams <dan.j.williams@intel.com>,
David Hildenbrand <david@redhat.com>,
"Davidlohr Bueso" <dave@stgolabs.net>,
Jonathan Cameron <jonathan.cameron@huawei.com>,
Dave Jiang <dave.jiang@intel.com>,
Alison Schofield <alison.schofield@intel.com>,
Vishal Verma <vishal.l.verma@intel.com>,
"Ira Weiny" <ira.weiny@intel.com>,
Alistair Popple <apopple@nvidia.com>,
"Bjorn Helgaas" <bhelgaas@google.com>,
Baoquan He <bhe@redhat.com>
Subject: Re: [PATCH -v2] Resource: fix region_intersects() for CXL memory
Date: Wed, 04 Sep 2024 15:48:44 +0800 [thread overview]
Message-ID: <874j6vc10j.fsf@yhuang6-desk2.ccr.corp.intel.com> (raw)
In-Reply-To: <ZsL-wfDYsUmWKBep@smile.fi.intel.com> (Andy Shevchenko's message of "Mon, 19 Aug 2024 11:13:53 +0300")
Hi, Andy,
Thanks for your comments!
Andy Shevchenko <andriy.shevchenko@linux.intel.com> writes:
> On Mon, Aug 19, 2024 at 10:34:13AM +0800, Huang Ying wrote:
>> On a system with CXL memory installed, the resource tree (/proc/iomem)
>> related to CXL memory looks like something as follows.
>>
>> 490000000-50fffffff : CXL Window 0
>> 490000000-50fffffff : region0
>> 490000000-50fffffff : dax0.0
>> 490000000-50fffffff : System RAM (kmem)
>>
>> When the following command line is run to try writing some memory in
>> CXL memory range,
>>
>> $ dd if=data of=/dev/mem bs=1k seek=19136512 count=1
>> dd: error writing '/dev/mem': Bad address
>> 1+0 records in
>> 0+0 records out
>> 0 bytes copied, 0.0283507 s, 0.0 kB/s
>>
>> the command fails as expected. However, the error code is wrong. It
>> should be "Operation not permitted" instead of "Bad address". And,
>> the following warning is reported in kernel log.
>
>> ioremap on RAM at 0x0000000490000000 - 0x0000000490000fff
>> WARNING: CPU: 2 PID: 416 at arch/x86/mm/ioremap.c:216 __ioremap_caller.constprop.0+0x131/0x35d
>> Modules linked in: cxl_pmem libnvdimm cbc encrypted_keys cxl_pmu
>> CPU: 2 UID: 0 PID: 416 Comm: dd Not tainted 6.11.0-rc3-kvm #40
>> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
>> RIP: 0010:__ioremap_caller.constprop.0+0x131/0x35d
>> Code: 2d 80 3d 24 6a a1 02 00 75 c1 48 8d 54 24 70 48 8d b4 24 90 00 00 00 48 c7 c7 40 3a 05 8a c6 05 07 6a a1 02 01 e8 a3 a0 01 00 <0f> 0b eb 9d 48 8b 84 24 90 00 00 00 48 8d 4c 24 60 89 ea 48 bf 00
>> RSP: 0018:ffff888105387bf0 EFLAGS: 00010282
>> RAX: 0000000000000000 RBX: 0000000490000fff RCX: 0000000000000000
>> RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffffed1020a70f73
>> RBP: 0000000000000000 R08: ffffed100d9fce92 R09: 0000000000000001
>> R10: ffffffff892348e7 R11: ffffed100d9fce91 R12: 0000000490000000
>> R13: 0000000000000001 R14: 0000000000000001 R15: ffff888105387ca0
>> FS: 00007f86c438c740(0000) GS:ffff88806ce00000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 000055ba75b1b818 CR3: 0000000005231000 CR4: 0000000000350eb0
>> Call Trace:
>> <TASK>
>> ? __warn+0xd7/0x1b8
>> ? __ioremap_caller.constprop.0+0x131/0x35d
>> ? report_bug+0x136/0x19e
>> ? __ioremap_caller.constprop.0+0x131/0x35d
>> ? handle_bug+0x3c/0x64
>> ? exc_invalid_op+0x13/0x38
>> ? asm_exc_invalid_op+0x16/0x20
>> ? irq_work_claim+0x16/0x38
>> ? __ioremap_caller.constprop.0+0x131/0x35d
>> ? tracer_hardirqs_off+0x18/0x16d
>> ? kmem_cache_debug_flags+0x16/0x23
>> ? memremap+0xcb/0x184
>> ? iounmap+0xfe/0xfe
>> ? lock_sync+0xc7/0xc7
>> ? lock_sync+0xc7/0xc7
>> ? rcu_is_watching+0x1c/0x38
>> ? do_raw_read_unlock+0x37/0x42
>> ? _raw_read_unlock+0x1f/0x2f
>> memremap+0xcb/0x184
>> ? pfn_valid+0x159/0x159
>> ? resource_is_exclusive+0xba/0xc5
>> xlate_dev_mem_ptr+0x25/0x2f
>> write_mem+0x94/0xfb
>> vfs_write+0x128/0x26d
>> ? kernel_write+0x89/0x89
>> ? rcu_is_watching+0x1c/0x38
>> ? __might_fault+0x72/0xba
>> ? __might_fault+0x72/0xba
>> ? rcu_is_watching+0x1c/0x38
>> ? lock_release+0xba/0x13e
>> ? files_lookup_fd_raw+0x40/0x4b
>> ? __fget_light+0x64/0x89
>> ksys_write+0xac/0xfe
>> ? __ia32_sys_read+0x40/0x40
>> ? tracer_hardirqs_off+0x18/0x16d
>> ? tracer_hardirqs_on+0x11/0x146
>> do_syscall_64+0x9a/0xfd
>> entry_SYSCALL_64_after_hwframe+0x4b/0x53
>> RIP: 0033:0x7f86c4487140
>> Code: 40 00 48 8b 15 c1 9c 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 80 3d a1 24 0e 00 00 74 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 48 89
>> RSP: 002b:00007ffca9f62af8 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
>> RAX: ffffffffffffffda RBX: 0000000000000400 RCX: 00007f86c4487140
>> RDX: 0000000000000400 RSI: 000055ba75b1a000 RDI: 0000000000000001
>> RBP: 000055ba75b1a000 R08: 0000000000000000 R09: 00007f86c457c080
>> R10: 00007f86c43a84d0 R11: 0000000000000202 R12: 0000000000000000
>> R13: 0000000000000000 R14: 000055ba75b1a000 R15: 0000000000000400
>> </TASK>
>> irq event stamp: 0
>> hardirqs last enabled at (0): [<0000000000000000>] 0x0
>> hardirqs last disabled at (0): [<ffffffff89091e85>] copy_process+0xb60/0x255f
>> softirqs last enabled at (0): [<ffffffff89091e85>] copy_process+0xb60/0x255f
>> softirqs last disabled at (0): [<0000000000000000>] 0x0
>
> Submitting Patches documentation suggests how to shrink the above to make it
> somewhat readable and important.
Thanks for the pointer. Will do that in the next version.
>> After investigation, we found the following bug.
>>
>> In the above resource tree, "System RAM" is a descendant of "CXL
>> Window 0" instead of a top level resource. So, region_intersects()
>> will report no System RAM resources in the CXL memory region
>> incorrectly, because it only checks the top level resources.
>> Consequently, devmem_is_allowed() will return 1 (allow access via
>> /dev/mem) for CXL memory region incorrectly. Fortunately, ioremap()
>> doesn't allow to map System RAM and reject the access.
>>
>> However, region_intersects() needs to be fixed to work correctly with
>> the resources tree with CXL Window as above. To fix it, if we found a
>> unmatched resource in the top level, we will continue to search
>> matched resources in its descendant resources. So, we will not miss
>> any matched resources in resource tree anymore. In the new
>> implementation,
>>
>> |------------- "CXL Window 0" ------------|
>> |-- "System RAM" --|
>>
>> will look as if
>>
>> |-- "System RAM" --||-- "CXL Window 0a" --|
>>
>> in effect.
>
>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>> Cc: Dan Williams <dan.j.williams@intel.com>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Davidlohr Bueso <dave@stgolabs.net>
>> Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
>> Cc: Dave Jiang <dave.jiang@intel.com>
>> Cc: Alison Schofield <alison.schofield@intel.com>
>> Cc: Vishal Verma <vishal.l.verma@intel.com>
>> Cc: Ira Weiny <ira.weiny@intel.com>
>> Cc: Alistair Popple <apopple@nvidia.com>
>> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
>> Cc: Bjorn Helgaas <bhelgaas@google.com>
>> Cc: Baoquan He <bhe@redhat.com>
>
> You may move Cc list after '---', so it won't unnecessarily pollute the commit
> message.
Emm... It appears that it's a common practice to include "Cc" in the
commit log.
>> ---
>> Changelogs:
>>
>> v2:
>>
>> - Fix a bug which occurs when descendant of a matched resource matches.
>>
>> - Revise the patch description and comments to make the algorithm clearer.
>> Thanks Dan and David's comments!
>>
>> - Link to v1: https://lore.kernel.org/linux-mm/20240816020723.771196-1-ying.huang@intel.com/
>
> ...
>
>> {
>> struct resource res;
>> int type = 0; int other = 0;
>> - struct resource *p;
>> + struct resource *p, *dp;
>> + resource_size_t ostart, oend;
>> + bool is_type, covered;
>
> Maybe keep more reversed xmas tree ordering?
OK.
>> res.start = start;
>> res.end = start + size - 1;
>>
>> for (p = parent->child; p ; p = p->sibling) {
>> - bool is_type = (((p->flags & flags) == flags) &&
>> - ((desc == IORES_DESC_NONE) ||
>> - (desc == p->desc)));
>> -
>> - if (resource_overlaps(p, &res))
>> - is_type ? type++ : other++;
>> + if (!resource_overlaps(p, &res))
>> + continue;
>> + is_type = (((p->flags & flags) == flags) &&
>> + ((desc == IORES_DESC_NONE) || (desc == p->desc)));
>
> In the original code and here the most outer parentheses are redundant.
OK. Will remove them.
>> + if (is_type) {
>> + type++;
>> + continue;
>> + }
>> + /*
>> + * Continue to search in descendant resources as if the
>> + * matched descendant resources cover some ranges of 'p'.
>> + *
>> + * |------------- "CXL Window 0" ------------|
>> + * |-- "System RAM" --|
>> + *
>> + * looks as if
>> + *
>> + * |-- "System RAM" --||-- "CXL Window 0a" --|
>> + *
>> + * in effect.
>> + */
>> + covered = false;
>
>> + ostart = max(res.start, p->start);
>> + oend = min(res.end, p->end);
>
> Isn't a reinvention of resource_intersection()? With that in place you may also
> drop the above resource_overlaps().
sizeof(struct resource) == 8 * sizeof(unsigned long)
Just want to avoid to define another struct resource on stack.
>> + for_each_resource(p, dp, false) {
>> + if (!resource_overlaps(dp, &res))
>> + continue;
>> + is_type = (((dp->flags & flags) == flags) &&
>> + ((desc == IORES_DESC_NONE) ||
>> + (desc == dp->desc)));
>
> Most outer parentheses are redundant. With them being dropped you also may
> unite the last two lines into a single one.
Sure. Will do that.
>> + if (is_type) {
>> + type++;
>> + if (dp->start > ostart)
>> + break;
>> + if (dp->end >= oend) {
>> + covered = true;
>> + break;
>> + }
>> + ostart = max(ostart, dp->end + 1);
>> + }
>> + }
>> + if (!covered)
>> + other++;
>> }
--
Best Regards,
Huang, Ying
next prev parent reply other threads:[~2024-09-04 7:52 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-19 2:34 Huang Ying
2024-08-19 8:13 ` Andy Shevchenko
2024-09-04 7:48 ` Huang, Ying [this message]
2024-09-04 12:43 ` Andy Shevchenko
2024-09-05 3:00 ` Huang, Ying
2024-09-05 10:57 ` Andy Shevchenko
2024-09-04 23:58 ` Dan Williams
2024-09-05 10:56 ` Andy Shevchenko
2024-09-05 11:08 ` David Hildenbrand
2024-09-05 12:36 ` Andy Shevchenko
2024-09-05 12:42 ` David Hildenbrand
2024-09-05 12:50 ` Andy Shevchenko
2024-09-05 12:57 ` David Hildenbrand
2024-10-07 14:24 ` Andy Shevchenko
2024-09-05 21:37 ` Dan Williams
2024-10-07 14:16 ` Andy Shevchenko
2024-09-06 1:07 ` Huang, Ying
2024-10-07 14:12 ` Andy Shevchenko
2024-10-08 2:52 ` Huang, Ying
2024-10-08 17:01 ` Andy Shevchenko
2024-10-08 19:02 ` Dan Williams
2024-10-08 19:18 ` Andy Shevchenko
2024-08-21 18:46 ` Bjorn Helgaas
2024-08-22 1:43 ` Dan Williams
2024-08-22 21:29 ` Bjorn Helgaas
2024-09-04 23:58 ` Dan Williams
2024-08-30 6:43 ` Huang, Ying
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=874j6vc10j.fsf@yhuang6-desk2.ccr.corp.intel.com \
--to=ying.huang@intel.com \
--cc=akpm@linux-foundation.org \
--cc=alison.schofield@intel.com \
--cc=andriy.shevchenko@linux.intel.com \
--cc=apopple@nvidia.com \
--cc=bhe@redhat.com \
--cc=bhelgaas@google.com \
--cc=dan.j.williams@intel.com \
--cc=dave.jiang@intel.com \
--cc=dave@stgolabs.net \
--cc=david@redhat.com \
--cc=ira.weiny@intel.com \
--cc=jonathan.cameron@huawei.com \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=vishal.l.verma@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox