Re: [LSFMM] automating measuring memory fragmentation

From: Karim Manaouil <kmanaouil.dev@gmail.com>
To: Yu Zhao <yuzhao@google.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>,
	David Bueso <dave@stgolabs.net>, Michal Hocko <mhocko@suse.com>,
	Dan Williams <dan.j.williams@intel.com>,
	John Hubbard <jhubbard@nvidia.com>,
	Daniel Gomez <da.gomez@samsung.com>,
	linux-mm <linux-mm@kvack.org>,
	lsf-pc@lists.linux-foundation.org,
	Karim Manaouil <kmanaouil.dev@gmail.com>
Subject: Re: [LSFMM] automating measuring memory fragmentation
Date: Thu, 16 May 2024 22:32:47 +0100
Message-ID: <ZkZ7fwkBQ_pBEImO@localhost.localdomain>
In-Reply-To: <CAOUHufYGEVYGFMmA6sY_dp4DEBbRQ_MPgn=m16fzfmnKs3vysA@mail.gmail.com>

On Thu, May 16, 2024 at 02:05:24PM -0600, Yu Zhao wrote: 
> For example, if we have two systems, one has lower fragmentation for
> some orders but higher fragmentation for the rest, and the other is
> the opposite. How would we be able to use a single measure to describe
> this? IOW, I don't think a single measurement can describe all orders
> in a comparable way, which would be the weakest requirement we would
> have to impose.

> As I (badly) explained earlier, a single value can't do that because
> different orders are not on the same footing (or so to speak), unless
> we are only interested in one non-zero order. So we would need
> fragmentation_index[NR_non_zero_orders].

> No, for example, A can allocate 4 order-1 but 0 order-2, and B can
> allocate 2 order-1 *or* 1 order-2, which one would you say is better
> or worse? This, IMO, depends on which order you are trying to
> allocate. Does it make sense?

But higher-order pages can always be broken down into lower-order pages.
However, the inverse is not always guaranteed (the free pages may not be
buddies, or compaction/reclaim may not help).

Obviously, I would rather have one order-4 page than two order-3 pages.
You can always satisfy an order-n allocation if a free page of an order
higher than n is available.
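
The splitting argument can be written down as a small check. This is only a
userspace sketch, not kernel code: SKETCH_NR_ORDERS and sketch_can_allocate()
are hypothetical names, and nr_free[k] is assumed to hold the number of free
blocks of order k, as in the per-order free lists.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's NR_PAGE_ORDERS (MAX_PAGE_ORDER + 1). */
#define SKETCH_NR_ORDERS 11

/*
 * Return true if a request of the given order can be served from the free
 * lists without compaction: either a block of exactly that order is free,
 * or a larger block is free and can be split down to it.
 */
static bool sketch_can_allocate(const unsigned long nr_free[SKETCH_NR_ORDERS],
				unsigned int order)
{
	for (unsigned int k = order; k < SKETCH_NR_ORDERS; k++)
		if (nr_free[k] > 0)
			return true;
	return false;
}
```

With the quoted example, system A (four order-1 blocks that are not buddies)
passes the check for order 1 but not for order 2, while system B (one order-2
block) passes for both.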

One way to measure fragmentation is to measure how far we are from some
perfect value. The perfect value represents the case when all the free
memory is available as blocks of pageblock_order or MAX_PAGE_ORDER.
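
As a worked form of that idea, the score can be written as follows. The
notation is introduced here for illustration only: n_k is the number of free
blocks of order k and M stands for MAX_PAGE_ORDER, matching the one-shot
calculation in this message.

```latex
\[
F = \sum_{k=0}^{M} n_k \, 2^{k}, \qquad
S = \sum_{k=0}^{M} n_k \, 2^{2k}, \qquad
S_{\text{best}} = F \cdot 2^{M}
\]
\[
\text{score} = 100 \cdot \frac{S_{\text{best}} - S}{S_{\text{best}}}
\]
```

If all free memory sits in order-M blocks, S equals S_best and the score is 0.
If it is all in order-0 pages, S equals F and the score is
100 * (1 - 2^-M), which is about 99.9 for M = 10.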

I can do this as a one-shot calculation, for example with:

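/*
 * One-shot, per-node fragmentation estimate: 0 means all free memory sits in
 * MAX_PAGE_ORDER blocks; higher values mean it is split into smaller blocks.
 */
static void estimate_numa_fragmentation(void)
{
	pg_data_t *pgdat;
	struct zone *z;
	unsigned long fragscore;
	unsigned long bestscore;
	unsigned long nr_free;
	int order;

	for_each_online_pgdat(pgdat) {
		nr_free = fragscore = 0;
		z = pgdat->node_zones;
		while (z < (pgdat->node_zones + pgdat->nr_zones)) {
			if (!populated_zone(z)) {
				z++;
				continue;
			}
			spin_lock_irq(&z->lock);
			for (order = 0; order < NR_PAGE_ORDERS; order++) {
				/* free pages contributed by this order */
				nr_free += z->free_area[order].nr_free << order;
				/* each free block is weighted by the square of its size */
				fragscore += z->free_area[order].nr_free << (order * 2);
			}
			spin_unlock_irq(&z->lock);
			z++;
			cond_resched();
		}
		/* a node with no free pages would otherwise divide by zero */
		if (!nr_free)
			continue;
		/* score if every free page were part of a MAX_PAGE_ORDER block */
		bestscore = nr_free << MAX_PAGE_ORDER;
		fragscore = ((bestscore - fragscore) * 100) / bestscore;
		pr_info("fragscore on node %d: %lu\n", pgdat->node_id, fragscore);
	}
}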
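
To sanity-check the arithmetic outside the kernel, the same formula can be
applied to a plain array of per-order free block counts. This is a minimal
userspace sketch: SKETCH_MAX_ORDER (assumed to be 10), SKETCH_NR_ORDERS and
sketch_fragscore() are hypothetical names, and returning 0 when there is no
free memory is an arbitrary choice made for the sketch.

```c
#define SKETCH_MAX_ORDER 10                      /* assumed MAX_PAGE_ORDER */
#define SKETCH_NR_ORDERS (SKETCH_MAX_ORDER + 1)  /* orders 0..SKETCH_MAX_ORDER */

/*
 * Same arithmetic as the loop above, applied to nr_free[k] = number of free
 * blocks of order k. Returns an integer percentage (rounded down), or 0 when
 * there are no free pages at all.
 */
static unsigned long sketch_fragscore(const unsigned long nr_free[SKETCH_NR_ORDERS])
{
	unsigned long long free_pages = 0;  /* total free pages */
	unsigned long long weighted = 0;    /* sum of blocks * (block size)^2 */
	unsigned long long best;

	for (int order = 0; order < SKETCH_NR_ORDERS; order++) {
		free_pages += (unsigned long long)nr_free[order] << order;
		weighted += (unsigned long long)nr_free[order] << (order * 2);
	}
	if (!free_pages)
		return 0;

	/* value of "weighted" if every free page were in a max-order block */
	best = free_pages << SKETCH_MAX_ORDER;
	return (unsigned long)(((best - weighted) * 100) / best);
}
```

For 1024 free pages, this gives 0 when they form one order-10 block, 50 when
they form two order-9 blocks, and 99 when they are all order-0 pages.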

But there must be a way to streamline the calculation and update the value
with low overhead over time.
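
One possible direction, sketched in userspace C under stated assumptions:
both sums in the formula are linear in the per-order counts, so they could be
kept as running totals and adjusted whenever a block enters or leaves a free
list, with the score computed only when it is read. The struct and function
names below are hypothetical, and where such hooks would live in the
allocator is left open.

```c
#define SKETCH_MAX_ORDER 10  /* assumed MAX_PAGE_ORDER */

/* Running totals that would be kept per zone or per node (hypothetical). */
struct sketch_frag_state {
	unsigned long long free_pages;  /* sum of nr_free[k] << k */
	unsigned long long weighted;    /* sum of nr_free[k] << (2 * k) */
};

/* Call when a block of the given order is added to a free list. */
static void sketch_block_added(struct sketch_frag_state *s, unsigned int order)
{
	s->free_pages += 1ULL << order;
	s->weighted += 1ULL << (order * 2);
}

/* Call when a block of the given order is removed from a free list. */
static void sketch_block_removed(struct sketch_frag_state *s, unsigned int order)
{
	s->free_pages -= 1ULL << order;
	s->weighted -= 1ULL << (order * 2);
}

/* Compute the score on demand; returns 0 when there is no free memory. */
static unsigned long sketch_score(const struct sketch_frag_state *s)
{
	unsigned long long best = s->free_pages << SKETCH_MAX_ORDER;

	if (!best)
		return 0;
	return (unsigned long)(((best - s->weighted) * 100) / best);
}
```

Each update is two additions or subtractions, so the cost per free-list
operation is constant. Whether that is low enough for the allocator hot path,
and whether the existing zone lock is sufficient to protect such counters,
are open questions that this sketch does not answer.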

Cheers
Karim
PhD Student
Edinburgh University
