linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Ackerley Tng <ackerleytng@google.com>
To: Ackerley Tng <ackerleytng@google.com>
Cc: mike.kravetz@oracle.com, willy@infradead.org,
	sidhartha.kumar@oracle.com,  akpm@linux-foundation.org,
	linux-fsdevel@vger.kernel.org,  linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, muchun.song@linux.dev,  jhubbard@nvidia.com,
	vannapurve@google.com, erdemaktas@google.com,
	 rientjes@google.com
Subject: Re: [PATCH 2/2] fs: hugetlbfs: Fix logic to skip allocation on hit in page cache
Date: Mon, 08 May 2023 20:40:07 +0000	[thread overview]
Message-ID: <diqz3546ftg8.fsf@ackerleytng-ctop.c.googlers.com> (raw)
In-Reply-To: <diqz5y92g51y.fsf@ackerleytng-ctop.c.googlers.com> (message from Ackerley Tng on Mon, 08 May 2023 16:29:29 +0000)


I figured it out, the bug is still in the use of page_cache_next_miss(),
but my earlier suggestion that xas_next always advances the pointer is
wrong.

Mike is right in that xas_next() is effectively xas_load() for the first
call after an XA_STATE() initialiation.

However, when max_scan is set to 1, xas.xa_index has no chance to be
advanced since the loop condition while(max_scan--) terminates the loop
after 1 iteration.

Hence, after loading happens with xas_next() regardless of the checks
within the loop (!entry, or xa_is_value(entry), etc), xa.xas_index is
not advanced, and the index returned always == the index passed in to
page_cache_next_miss().

Hence, in hugetlb_fallocate(), it always appears to be a page cache
miss.

Here's code from a selftest that can be added to lib/test_xarray.c:

/* Modified from page_cache_next_miss() */
static unsigned long xa_next_empty(struct xarray *xa, unsigned long start,  
unsigned long max_scan)
{
	XA_STATE(xas, xa, start);

	while (max_scan--) {
		void *entry = xas_next(&xas);
		if (!entry) {
			printk("entry not present");
			break;
		}
		if (xa_is_value(entry)) {
			printk("xa_is_value instead of pointer");
		}
		if (xas.xa_index == 0) {
			printk("wraparound");
			break;
		}
	}

	if (max_scan == -1)
		printk("exceeded max_scan");

	return xas.xa_index;
}

/* Replace this function in lib/test_xarray.c to run */
static noinline void check_find(struct xarray *xa)
{
	unsigned long max_scan;

	xa_init(&xa);
	xa_store_range(&xa, 3, 5, malloc(10), GFP_KERNEL);

	max_scan = 1;
	for (int i = 1; i < 8; i++)
		printk(" => xa_next_empty(xa, %d, %ld): %ld\n", i, max_scan,  
xa_next_empty(&xa, i, max_scan));

	printk("\n");

	max_scan = 2;
	for (int i = 1; i < 8; i++)
		printk(" => xa_next_empty(xa, %d, %ld): %ld\n", i, max_scan,  
xa_next_empty(&xa, i, max_scan));
}

Result:

entry not present => xa_next_empty(xa, 1, 1): 1
entry not present => xa_next_empty(xa, 2, 1): 2
exceeded max_scan => xa_next_empty(xa, 3, 1): 3
exceeded max_scan => xa_next_empty(xa, 4, 1): 4
exceeded max_scan => xa_next_empty(xa, 5, 1): 5
entry not present => xa_next_empty(xa, 6, 1): 6
entry not present => xa_next_empty(xa, 7, 1): 7

entry not present => xa_next_empty(xa, 1, 2): 1
entry not present => xa_next_empty(xa, 2, 2): 2
exceeded max_scan => xa_next_empty(xa, 3, 2): 4
exceeded max_scan => xa_next_empty(xa, 4, 2): 5
exceeded max_scan => xa_next_empty(xa, 5, 2): 6
entry not present => xa_next_empty(xa, 6, 2): 6
entry not present => xa_next_empty(xa, 7, 2): 7

Since the xarray was set up with pointers in indices 3, 4 and 5, we
expect xa_next_empty() or page_cache_next_miss() to return the next
index (4, 5 and 6 respectively), but when used with a max_scan of 1, we
just get the index passed in.

While max_scan could be increased to fix this bug, I feel that having a
separate function like filemap_has_folio() makes the intent more
explicit and is less reliant on internal values of struct xa_state.

xas.xa_index could take other values to indicate wraparound or sibling
nodes and I think it is better to use a higher level abstraction like
xa_load() (used in filemap_has_folio()). In addition xa_load() also
takes the locks it needs, which helps :).

I could refactor the other usage of page_cache_next_miss() in
hugetlbfs_pagecache_present() if you'd like!

On this note, the docstring of page_cache_next_miss() is also
inaccurate. The return value is not always outside the range specified
if max_scan is too small.


      reply	other threads:[~2023-05-08 20:40 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-02 23:56 [PATCH 0/2] Fix fallocate error in hugetlbfs when fallocating again Ackerley Tng
2023-05-02 23:56 ` [PATCH 1/2] mm: filemap: Add filemap_has_folio function Ackerley Tng
2023-05-02 23:56 ` [PATCH 2/2] fs: hugetlbfs: Fix logic to skip allocation on hit in page cache Ackerley Tng
2023-05-03  3:05   ` Mike Kravetz
2023-05-04  0:14     ` Mike Kravetz
2023-05-08 16:29       ` Ackerley Tng
2023-05-08 20:40         ` Ackerley Tng [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=diqz3546ftg8.fsf@ackerleytng-ctop.c.googlers.com \
    --to=ackerleytng@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=erdemaktas@google.com \
    --cc=jhubbard@nvidia.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=muchun.song@linux.dev \
    --cc=rientjes@google.com \
    --cc=sidhartha.kumar@oracle.com \
    --cc=vannapurve@google.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox