linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Gregory Price <gregory.price@memverge.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: Gregory Price <gourry.memverge@gmail.com>,
	linux-mm@kvack.org, linux-doc@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-api@vger.kernel.org, x86@kernel.org,
	akpm@linux-foundation.org, arnd@arndb.de, tglx@linutronix.de,
	luto@kernel.org, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, hpa@zytor.com, mhocko@kernel.org,
	tj@kernel.org, corbet@lwn.net, rakie.kim@sk.com,
	hyeongtak.ji@sk.com, honggyu.kim@sk.com, vtavarespetr@micron.com,
	peterz@infradead.org, jgroves@micron.com,
	ravis.opensrc@micron.com, sthanneeru@micron.com,
	emirakhur@micron.com, Hasan.Maruf@amd.com,
	seungjun.ha@samsung.com,
	Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com>
Subject: Re: [PATCH v5 02/11] mm/mempolicy: introduce MPOL_WEIGHTED_INTERLEAVE for weighted interleaving
Date: Tue, 26 Dec 2023 02:01:57 -0500	[thread overview]
Message-ID: <ZYp6ZRLZQVtTHest@memverge.com> (raw)
In-Reply-To: <8734vof3kq.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Wed, Dec 27, 2023 at 04:32:37PM +0800, Huang, Ying wrote:
> Gregory Price <gourry.memverge@gmail.com> writes:
> 
> > +static unsigned int weighted_interleave_nid(struct mempolicy *pol, pgoff_t ilx)
> > +{
> > +	nodemask_t nodemask = pol->nodes;
> > +	unsigned int target, weight_total = 0;
> > +	int nid;
> > +	unsigned char weights[MAX_NUMNODES];
> 
> MAX_NUMNODSE could be as large as 1024.  1KB stack space may be too
> large?
> 

I've been struggling with a good solution to this.  We need a local copy
of weights to prevent weights from changing out from under us during
allocation (which may take quite some time), but it seemed unwise to
to allocate 1KB heap in this particular path.

Is my concern unfounded?  If so, I can go ahead and add the allocation
code.

> > +	unsigned char weight;
> > +
> > +	barrier();
> 
> Memory barrier needs comments.
> 

Barrier is to stabilize nodemask on the stack, but yes i'll carry the
comment from interleave_nid into this barrier as well.

> > +
> > +	/* first ensure we have a valid nodemask */
> > +	nid = first_node(nodemask);
> > +	if (nid == MAX_NUMNODES)
> > +		return nid;
> 
> It appears that this isn't necessary, because we can check whether
> weight_total == 0 after the next loop.
> 

fair, will snip.

> > +
> > +	/* Then collect weights on stack and calculate totals */
> > +	for_each_node_mask(nid, nodemask) {
> > +		weight = iw_table[nid];
> > +		weight_total += weight;
> > +		weights[nid] = weight;
> > +	}
> > +
> > +	/* Finally, calculate the node offset based on totals */
> > +	target = (unsigned int)ilx % weight_total;
> 
> Why use type casting?
> 

Artifact of old prototypes, snipped.

> > +
> > +	/* Stabilize the nodemask on the stack */
> > +	barrier();
> 
> I don't think barrier() is needed to wait for memory operations for
> stack.  It's usually used for cross-processor memory order.
>

This is present in the old interleave code.  To the best of my
understanding, the concern is for mempolicy->nodemask rebinding that can
occur when cgroups.cpusets.mems_allowed changes.

so we can't iterate over (mempolicy->nodemask), we have to take a local
copy.

My *best* understanding of the barrier here is to prevent the compiler
from reordering operations such that it attempts to optimize out the
local copy (or do lazy-fetch).

It is present in the original interleave code, so I pulled it forward to
this, but I have not tested whether this is a bit paranoid or not.

from `interleave_nid`:

 /*
  * The barrier will stabilize the nodemask in a register or on
  * the stack so that it will stop changing under the code.
  *
  * Between first_node() and next_node(), pol->nodes could be changed
  * by other threads. So we put pol->nodes in a local stack.
  */
 barrier();

> > +		/* Otherwise we adjust nr_pages down, and continue from there */
> > +		rem_pages -= pol->wil.cur_weight;
> > +		pol->wil.cur_weight = 0;
> > +		prev_node = node;
> 
> If pol->wil.cur_weight == 0, prev_node will be used without being
> initialized below.
> 

pol->wil.cur_weight is not used below.

> > +	}
> > +
> > +	/* Now we can continue allocating as if from 0 instead of an offset */
> > +	rounds = rem_pages / weight_total;
> > +	delta = rem_pages % weight_total;
> > +	for (i = 0; i < nnodes; i++) {
> > +		node = next_node_in(prev_node, nodes);
> > +		weight = weights[node];
> > +		node_pages = weight * rounds;
> > +		if (delta) {
> > +			if (delta > weight) {
> > +				node_pages += weight;
> > +				delta -= weight;
> > +			} else {
> > +				node_pages += delta;
> > +				delta = 0;
> > +			}
> > +		}
> > +		/* We may not make it all the way around */
> > +		if (!node_pages)
> > +			break;
> > +		/* If an over-allocation would occur, floor it */
> > +		if (node_pages + total_allocated > nr_pages) {
> 
> Why is this possible?
> 

this may have been a paranoid artifact from an early prototype, will
snip and validate.

~Gregory


  reply	other threads:[~2023-12-27 15:42 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-23 18:10 [PATCH v5 00/11] mempolicy2, mbind2, and weighted interleave Gregory Price
2023-12-23 18:10 ` [PATCH v5 07/11] mm/mempolicy: add userland mempolicy arg structure Gregory Price
2023-12-23 18:11 ` [PATCH v5 10/11] mm/mempolicy: add the mbind2 syscall Gregory Price
2024-01-02 14:47   ` Geert Uytterhoeven
2023-12-25  7:54 ` [PATCH v5 00/11] mempolicy2, mbind2, and weighted interleave Huang, Ying
2023-12-26  7:45   ` Gregory Price
2024-01-02  4:27     ` Huang, Ying
2024-01-02 19:06       ` Gregory Price
2024-01-03  3:15         ` Huang, Ying
     [not found] ` <20231223181101.1954-2-gregory.price@memverge.com>
2023-12-27  6:42   ` [PATCH v5 01/11] mm/mempolicy: implement the sysfs-based weighted_interleave interface Huang, Ying
2023-12-26  6:48     ` Gregory Price
2024-01-02  7:41       ` Huang, Ying
2024-01-02 19:45         ` Gregory Price
2024-01-03  2:45           ` Huang, Ying
2024-01-03  2:59             ` Gregory Price
2024-01-03  6:03               ` Huang, Ying
2024-01-03  2:46         ` Gregory Price
     [not found] ` <20231223181101.1954-3-gregory.price@memverge.com>
2023-12-27  8:32   ` [PATCH v5 02/11] mm/mempolicy: introduce MPOL_WEIGHTED_INTERLEAVE for weighted interleaving Huang, Ying
2023-12-26  7:01     ` Gregory Price [this message]
2023-12-26  8:06       ` Gregory Price
2023-12-26 11:32       ` Gregory Price
2024-01-02  8:42       ` Huang, Ying
2024-01-02 20:30         ` Gregory Price
2024-01-03  5:46           ` Huang, Ying
2024-01-03 22:09             ` Gregory Price
2024-01-04  5:39               ` Huang, Ying
2024-01-04 18:59                 ` Gregory Price
2024-01-05  6:51                   ` Huang, Ying
2024-01-05  7:25                     ` Gregory Price
2024-01-08  7:08                       ` Huang, Ying
     [not found] ` <20231223181101.1954-4-gregory.price@memverge.com>
2023-12-27  8:39   ` [PATCH v5 03/11] mm/mempolicy: refactor sanitize_mpol_flags for reuse Huang, Ying
2023-12-26  7:05     ` Gregory Price
2023-12-26 11:48       ` Gregory Price
2024-01-02  9:09         ` Huang, Ying
     [not found] ` <20231223181101.1954-9-gregory.price@memverge.com>
2024-01-02 14:38   ` [PATCH v5 08/11] mm/mempolicy: add set_mempolicy2 syscall Geert Uytterhoeven
     [not found] ` <20231223181101.1954-10-gregory.price@memverge.com>
2024-01-02 14:46   ` [PATCH v5 09/11] mm/mempolicy: add get_mempolicy2 syscall Geert Uytterhoeven

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZYp6ZRLZQVtTHest@memverge.com \
    --to=gregory.price@memverge.com \
    --cc=Hasan.Maruf@amd.com \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=bp@alien8.de \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=emirakhur@micron.com \
    --cc=gourry.memverge@gmail.com \
    --cc=honggyu.kim@sk.com \
    --cc=hpa@zytor.com \
    --cc=hyeongtak.ji@sk.com \
    --cc=jgroves@micron.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mhocko@kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rakie.kim@sk.com \
    --cc=ravis.opensrc@micron.com \
    --cc=seungjun.ha@samsung.com \
    --cc=sthanneeru.opensrc@micron.com \
    --cc=sthanneeru@micron.com \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=vtavarespetr@micron.com \
    --cc=x86@kernel.org \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox