linux-mm.kvack.org archive mirror
From: Fan Ni <fan.ni@samsung.com>
To: Gregory Price <gregory.price@memverge.com>
Cc: Dan Williams <dan.j.williams@intel.com>,
	"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
	Ira Weiny <ira.weiny@intel.com>,
	"David Hildenbrand" <david@redhat.com>,
	Dave Jiang <dave.jiang@intel.com>,
	"Davidlohr Bueso" <dave@stgolabs.net>,
	Kees Cook <keescook@chromium.org>,
	"Jonathan Cameron" <Jonathan.Cameron@huawei.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"Michal Hocko" <mhocko@suse.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
	Adam Manzanares <a.manzanares@samsung.com>
Subject: Re: [PATCH v2 00/20] CXL RAM and the 'Soft Reserved' => 'System RAM' default
Date: Wed, 22 Feb 2023 21:41:49 +0000
Message-ID: <20230222214140.GA1276133@bgt-140510-bm03>
In-Reply-To: <Y+qB9T/PCZ6TpYlK@memverge.com>

On Mon, Feb 13, 2023 at 01:31:17PM -0500, Gregory Price wrote:

> On Mon, Feb 13, 2023 at 01:22:17PM -0500, Gregory Price wrote:
> > On Fri, Feb 10, 2023 at 01:05:21AM -0800, Dan Williams wrote:
> > > Changes since v1: [1]
> > > [... snip ...]
> > [... snip ...]
> > Really i see these decoders and device mappings setup:
> > port1 -> mem2
> > port2 -> mem1
> > port3 -> mem0
> 
> small correction:
> port1 -> mem1
> port3 -> mem0
> port2 -> mem2
> 
> > 
> > Therefore I should expect
> > decoder0.0 -> mem2
> > decoder0.1 -> mem1
> > decoder0.2 -> mem0
> > 
> 
> this end up mapping this way, which is still further jumbled.
> 
> Something feels like there's an off-by-one
> 

Currently, memdev naming can be out of order for the following two
reasons:
1. On the kernel side, the cxl port driver probes devices asynchronously,
which can change the memdev naming within a single OS boot across
repeated device enumerations. The pattern can be observed by repeating
the following steps in the guest:
	loop(){
	a) modprobe cxl_xxx
	b) cxl list  --> you will see the memdev name change (e.g. mem0->mem1).
	c) rmmod cxl_xxx
	}
This behaviour can be avoided by switching to synchronous device probe
with the following change:
--------------------------------------------
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 258004f34281..f3f90fad62b5 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -663,7 +663,7 @@ static struct pci_driver cxl_pci_driver = {
 	.probe			= cxl_pci_probe,
 	.err_handler		= &cxl_error_handlers,
 	.driver	= {
-		.probe_type	= PROBE_PREFER_ASYNCHRONOUS,
+		.probe_type	= PROBE_FORCE_SYNCHRONOUS,
 	},
 };
-------------------------------------------

With the above patch, memdev naming is consistent within one OS boot;
however, the order can still differ from what we expect given the QEMU
config options we use. We also need to make a change on the QEMU side,
as described in point 2.
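
Point 1 can be illustrated with a small model. This is only a sketch, not
kernel code: it assumes that each memdev takes the next free "memN" index
at the moment its probe completes, so the completion order decides the
names. The function assign_memdev_names() and its parameters are
hypothetical names made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model, not kernel code: each device gets the next free
 * "memN" index at the moment its probe completes.
 *
 * completion_order[i] is the index of the device whose probe finished i-th.
 * On return, name_of[d] holds the N in "memN" assigned to device d.
 */
static void assign_memdev_names(const int *completion_order, size_t n,
                                int *name_of)
{
    int next_id = 0; /* stands in for a simple increasing id allocator */

    for (size_t i = 0; i < n; i++)
        name_of[completion_order[i]] = next_id++;
}
```

With a completion order of {0, 1, 2} (what a synchronous probe would give
in this model) device 0 is always mem0. With a completion order such as
{2, 0, 1}, which an asynchronous probe may produce, device 2 becomes mem0
and device 0 becomes mem1, so the names can differ between enumerations.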

2. Currently in QEMU, multiple components at the same topology level are
stored in a QLIST, a data structure defined in include/qemu/queue.h. When
enqueuing a component, the current QEMU code uses QLIST_INSERT_HEAD to
insert the item at the head, but iteration with
QLIST_FOREACH/QLIST_FOREACH_SAFE also starts from the head of the list.
That is to say, if we enqueue items P1, P2, P3 in that order, iteration
yields P3, P2, P1. In a simple test with the code change below (always
insert at the list tail), the ordering issue is fixed.

----------------------------------------------------------------------------
diff --git a/include/qemu/queue.h b/include/qemu/queue.h
index e029e7bf66..15491960e1 100644
--- a/include/qemu/queue.h
+++ b/include/qemu/queue.h
@@ -130,7 +130,7 @@ struct {                                                                \
         (listelm)->field.le_prev = &(elm)->field.le_next;               \
 } while (/*CONSTCOND*/0)
 
-#define QLIST_INSERT_HEAD(head, elm, field) do {                        \
+#define QLIST_INSERT_HEAD_OLD(head, elm, field) do {                    \
         if (((elm)->field.le_next = (head)->lh_first) != NULL)          \
                 (head)->lh_first->field.le_prev = &(elm)->field.le_next;\
         (head)->lh_first = (elm);                                       \
@@ -146,6 +146,20 @@ struct {                                                                \
         (elm)->field.le_prev = NULL;                                    \
 } while (/*CONSTCOND*/0)
 
+#define QLIST_INSERT_TAIL(head, elm, field) do {                        \
+        typeof(elm) last_p = (head)->lh_first;                          \
+        while (last_p && last_p->field.le_next)                         \
+            last_p = last_p->field.le_next;                             \
+        if (last_p)                                                     \
+            QLIST_INSERT_AFTER(last_p, elm, field);                     \
+        else                                                            \
+            QLIST_INSERT_HEAD_OLD(head, elm, field);                    \
+} while (/*CONSTCOND*/0)
+
+#define QLIST_INSERT_HEAD(head, elm, field) do {                        \
+        QLIST_INSERT_TAIL(head, elm, field);                            \
+} while (/*CONSTCOND*/0)
+
 /*
  * Like QLIST_REMOVE() but safe to call when elm is not in a list
  */
-----------------------------------------------------------------------------
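
The ordering effect described in point 2 (head insertion followed by
head-first iteration) can be reproduced in isolation. The sketch below
does not use QEMU's queue.h; struct node, struct list, insert_head(),
insert_tail() and collect_ids() are hypothetical stand-ins written only
for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for a QLIST-style list; not QEMU's queue.h. */
struct node {
    int id;
    struct node *next;
};

struct list {
    struct node *first;
};

/* Insert at the head: the newest element becomes the first one visited. */
static void insert_head(struct list *l, struct node *n)
{
    n->next = l->first;
    l->first = n;
}

/* Insert at the tail: walk to the last link, then append (O(n)). */
static void insert_tail(struct list *l, struct node *n)
{
    struct node **pp = &l->first;

    while (*pp)
        pp = &(*pp)->next;
    n->next = NULL;
    *pp = n;
}

/* Iterate from the head, copying ids into out[]; returns the count. */
static size_t collect_ids(const struct list *l, int *out, size_t max)
{
    size_t i = 0;

    for (const struct node *n = l->first; n && i < max; n = n->next)
        out[i++] = n->id;
    return i;
}
```

Inserting ids 1, 2, 3 with insert_head() and then calling collect_ids()
gives 3, 2, 1, while the same sequence with insert_tail() gives 1, 2, 3.
Like the QLIST_INSERT_TAIL macro in the diff above, insert_tail() walks
the whole list on every insert, which is acceptable for a short list of
topology components but would be costly for long lists.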

The memdev naming order can also cause confusion when creating regions
for multiple memdevs under different HBs, because the kernel code enforces
an HB check to ensure the target position matches the CFMW configuration.
To avoid the confusion, we can use "cxl list -TD" to find the target
position for a memdev, but it is rather annoying to have to do that before
creating a region.


