Re: [PATCH] selftests/mm: Simplify byte pattern checking in mremap_test

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Dev Jain <dev.jain@arm.com>
To: Ryan Roberts <ryan.roberts@arm.com>,
	"David Hildenbrand (Arm)" <david@kernel.org>,
	akpm@linux-foundation.org, shuah@kernel.org
Cc: ljs@kernel.org, Liam.Howlett@oracle.com, vbabka@kernel.org,
	rppt@kernel.org, surenb@google.com, mhocko@suse.com,
	linux-mm@kvack.org, linux-kselftest@vger.kernel.org,
	linux-kernel@vger.kernel.org, anshuman.khandual@arm.com,
	Sarthak Sharma <sarthak.sharma@arm.com>
Subject: Re: [PATCH] selftests/mm: Simplify byte pattern checking in mremap_test
Date: Tue, 14 Apr 2026 17:27:37 +0530	[thread overview]
Message-ID: <62baa8cd-eecc-41f4-88ab-f9605c7f2fcf@arm.com> (raw)
In-Reply-To: <b5fde792-6b9d-4bd4-8a64-f3a270ceb621@arm.com>



On 14/04/26 1:01 pm, Ryan Roberts wrote:
> On 14/04/2026 06:09, Dev Jain wrote:
>>
>>
>> On 14/04/26 12:57 am, David Hildenbrand (Arm) wrote:
>>> On 4/10/26 16:30, Dev Jain wrote:
>>>> The original version of mremap_test (7df666253f26: "kselftests: vm: add
>>>> mremap tests") validated remapped contents byte-by-byte and printed a
>>>> mismatch index in case the bytes streams are not equal. That made
>>>> validation expensive in both cases: for "no mismatch" (the common case when
>>>> mremap is not buggy), it still walked all bytes in C; for "mismatch", it
>>>> broke out of the loop after printing the mismatch index.
>>>>
>>>> Later, my commit 7033c6cc9620 ("selftests/mm: mremap_test: optimize
>>>> execution time from minutes to seconds using chunkwise memcmp") tried to
>>>> optimize both cases by using chunk-wise memcmp() and only scanning bytes
>>>> within a range which has been determined by memcmp as mismatching.
>>>>
>>>> But get_sqrt() in that commit is buggy: `high = mid - 1` is applied
>>>> unconditionally. This makes the speed of checking the mismatch index
>>>> suboptimal.
>>>
>>> So is that the only problem with 7033c6cc9620: the speed?
>>
>> Yes.
>>
>> I'll explain the algorithm in 7033c6cc9620.
>>
>> The problem statement is: given two buffers of equal length n, find the
>> first mismatch index.
>>
>> Algorithm: Divide the buffers into sqrt(n) chunks. Do a memcmp() over
>> each chunk. If all of them succeed, the buffers are equal, giving the
>> result in O(sqrt(n)) * t, where t = time taken by memcmp().
>>
>> Otherwise, worst case is that we find the mismatch in the last chunk.
>> Now brute-force iterate this chunk to find the mismatch. Since chunk
>> size is sqrt(n), complexity is again
>> sqrt(n) * t + sqrt(n) = O(sqrt(n)) * t.
>>
>> So if get_sqrt() computes a wrong square root, we lose this time
>> complexity.
>>
>> Maybe there is an optimal value of x = #number of chunks of the buffer,
>> which may not be sqrt(n).
>>
>> But given the information we have, a CS course on algorithms will
>> say this is one of the optimal ways to do it.
>>
>>>
>>>>
>>>> The mismatch index does not provide useful debugging value here: if
>>>> validation fails, we know mremap behavior is wrong, and the specific byte
>>>> offset does not make root-causing easier.
>>>
>>> Fully agreed.
>>>
>>>>
>>>> So instead of fixing get_sqrt(), bite the bullet, drop mismatch index
>>>> scanning and just compare the two byte streams with memcmp().
>>>
>>> How does this affect the execution time of the test?
>>
>> I just checked with ./mremap_test -t 0, the variance is very high on my
>> system.
>>
>> In the common case of the test passing:
>>
>> before patch, there are multiple sub-length calls to memcmp.
>> after patch, there is a single full-length call to memcmp.
>>
>> So the time should reduce but may not be very distinguishable.
> 
> My intuition would be the opposite; if you hafve a 4096 byte buffer, I would
> have thought that a single memcmp would be significantly faster than sqrt(4096)
> = 64 calls, each over 64 bytes.
> 
> If you want to keep the common case fast, but also find the first differing
> offset on failure, I expect you can exploit the fact that the buffers are all
> page aligned. With some prompting, Codex gave me this:
> 
> 
> ---8<---
> static size_t first_mismatch_offset(const void *buf1, const void *buf2,
> 				    size_t len)
> {
> 	const uint64_t *ptr1 = buf1;
> 	const uint64_t *ptr2 = buf2;
> 	size_t word;
> 	size_t words = len / sizeof(*ptr1);
> 
> 	assert(!((uintptr_t)buf1 & (sizeof(*ptr1) - 1)));
> 	assert(!((uintptr_t)buf2 & (sizeof(*ptr2) - 1)));
> 	assert(!(len & (sizeof(*ptr1) - 1)));
> 
> 	if (!memcmp(buf1, buf2, len))
> 		return len;
> 
> 	for (word = 0; word < words; word++) {
> 		if (ptr1[word] != ptr2[word]) {
> 			const unsigned char *bytes1 =
> 				(const unsigned char *)&ptr1[word];
> 			const unsigned char *bytes2 =
> 				(const unsigned char *)&ptr2[word];
> 			size_t i;
> 
> 			for (i = 0; i < sizeof(*ptr1); i++) {
> 				if (bytes1[i] != bytes2[i])
> 					return word * sizeof(*ptr1) + i;
> 			}
> 		}
> 	}
> 
> 	return len;
> }
> ---8<---
> 
> I've not benchmarked it though...

Interesting, thanks Ryan.

It may be faster from a constant-factor PoV - cache, CPU etc not from a
time complexity PoV.

But the point is that this is not a problem worthy of solving : )

> 
> Thanks,
> Ryan
> 
> 
> 
>>
>>>
>>>>
>>>> Reported-by: Sarthak Sharma <sarthak.sharma@arm.com>
>>>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>>>
>>> Fixes: 7033c6cc9620 ("selftests/mm: mremap_test: optimize execution time
>>> from minutes to seconds using chunkwise memcmp")
>>>
>>> ?
>>
>> Not needed. 7033c6cc9620 does not create any incorrectness in the checking
>> of mismatch index.
>>
>>>
>>>> ---
>>>> Sorry for sending two patchsets the same day - the problem was made known
>>>> to me today, and I couldn't help myself but fix it immediately, imagine
>>>> my embarrassment when I found out that I made a typo in the binary search
>>>> code which I had been writing consistently throughout college :)
>>>
>>> :)
>>>
>>>>
>>>> Applies on mm-unstable.
>>>>
>>>>  tools/testing/selftests/mm/mremap_test.c | 109 +++--------------------
>>>>  1 file changed, 10 insertions(+), 99 deletions(-)
>>>
>>> I mean, it certainly looks like a nice cleanup.
>>>
>>
>

next prev parent reply	other threads:[~2026-04-14 11:57 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-10 14:30 Dev Jain
2026-04-13 19:27 ` David Hildenbrand (Arm)
2026-04-14  5:09   ` Dev Jain
2026-04-14  7:31     ` Ryan Roberts
2026-04-14 11:57       ` Dev Jain [this message]
2026-04-14  8:01     ` David Hildenbrand (Arm)
2026-04-14  9:47       ` David Laight
2026-04-14  9:53         ` David Hildenbrand (Arm)
2026-04-14 12:00         ` Dev Jain
2026-04-14 11:53       ` Dev Jain
2026-04-14 11:38 ` Sarthak Sharma

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=62baa8cd-eecc-41f4-88ab-f9605c7f2fcf@arm.com \
    --to=dev.jain@arm.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=anshuman.khandual@arm.com \
    --cc=david@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ljs@kernel.org \
    --cc=mhocko@suse.com \
    --cc=rppt@kernel.org \
    --cc=ryan.roberts@arm.com \
    --cc=sarthak.sharma@arm.com \
    --cc=shuah@kernel.org \
    --cc=surenb@google.com \
    --cc=vbabka@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox