* [linux-next:master] [mm/migrate] b28dd7507f: ltp.move_pages04.fail
From: kernel test robot @ 2024-08-21 4:44 UTC
To: David Hildenbrand
Cc: oe-lkp, lkp, Linux Memory Management List, Andrew Morton,
Alexander Gordeev, Christian Borntraeger, Claudio Imbrenda,
Gerald Schaefer, Heiko Carstens, Janosch Frank, Jonathan Corbet,
Matthew Wilcox, Sven Schnelle, Vasily Gorbik, Ryan Roberts,
Zi Yan, ltp, oliver.sang
Hello,
kernel test robot noticed "ltp.move_pages04.fail" on:
commit: b28dd7507f2dd7923325eab6ea1f291416dcc396 ("mm/migrate: convert add_page_for_migration() from follow_page() to folio_walk")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
[test failed on linux-next/master bb1b0acdcd66e0d8eedee3570d249e076b89ab32]
in testcase: ltp
version: ltp-x86_64-14c1f76-1_20240817
with following parameters:
test: numa/move_pages04
compiler: gcc-12
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) with 256G memory
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202408211026.636ade1a-oliver.sang@intel.com
Running tests.......
<<<test_start>>>
tag=move_pages04 stime=1724192393
cmdline="move_pages04"
contacts=""
analysis=exit
<<<test_output>>>
move_pages04 1 TFAIL : move_pages04.c:142: status[1] is ENOENT, expected EFAULT
incrementing stop
<<<execution_status>>>
initiation_status="ok"
duration=0 termination_type=exited termination_id=1 corefile=no
cutime=0 cstime=2
<<<test_end>>>
INFO: ltp-pan reported some tests FAIL
LTP Version: 20240524-180-g642c02725
###############################################################
Done executing testcases.
LTP Version: 20240524-180-g642c02725
###############################################################
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240821/202408211026.636ade1a-oliver.sang@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [linux-next:master] [mm/migrate] b28dd7507f: ltp.move_pages04.fail
From: David Hildenbrand @ 2024-08-21 8:44 UTC
To: kernel test robot
Cc: oe-lkp, lkp, Linux Memory Management List, Andrew Morton,
Alexander Gordeev, Christian Borntraeger, Claudio Imbrenda,
Gerald Schaefer, Heiko Carstens, Janosch Frank, Jonathan Corbet,
Matthew Wilcox, Sven Schnelle, Vasily Gorbik, Ryan Roberts,
Zi Yan, ltp, Kirill A. Shutemov
On 21.08.24 06:44, kernel test robot wrote:
>
>
> Hello,
>
> kernel test robot noticed "ltp.move_pages04.fail" on:
>
> commit: b28dd7507f2dd7923325eab6ea1f291416dcc396 ("mm/migrate: convert add_page_for_migration() from follow_page() to folio_walk")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>
> [test failed on linux-next/master bb1b0acdcd66e0d8eedee3570d249e076b89ab32]
>
> in testcase: ltp
> version: ltp-x86_64-14c1f76-1_20240817
> with following parameters:
>
> test: numa/move_pages04
>
>
>
> compiler: gcc-12
> test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) with 256G memory
>
> (please refer to attached dmesg/kmsg for entire log/backtrace)
>
>
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202408211026.636ade1a-oliver.sang@intel.com
>
>
>
> Running tests.......
> <<<test_start>>>
> tag=move_pages04 stime=1724192393
> cmdline="move_pages04"
> contacts=""
> analysis=exit
> <<<test_output>>>
> move_pages04 1 TFAIL : move_pages04.c:142: status[1] is ENOENT, expected EFAULT
This change is to be expected, and I touched on that in the patch
description. I am rather surprised that we have a test for that
handling, especially because it changed already (see below).
The man page says:
-EFAULT: This is a zero page or the memory area is not mapped
by the process.
"memory area not mapped" to me translates to "there is no mmap()", not
"there is no page mapped". And it says:
-ENOENT: The page is not present.
It doesn't really specify what happens when "The memory area is mapped
by the process, but no page is faulted in.".
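The reading of the man page given above can be written down as a small lookup. This is only an illustrative sketch: the helper name describe_move_pages_status() is made up here and is not part of LTP, libnuma, or the kernel, and the strings merely paraphrase the man page text quoted above.

```c
#include <errno.h>

/*
 * Hypothetical helper (not an LTP or kernel function): map a per-page
 * status value from move_pages() to the man page wording discussed above.
 */
const char *describe_move_pages_status(int status)
{
	if (status >= 0)
		return "node the page resides on";

	switch (status) {
	case -EFAULT:
		/* Documented: zero page, or no mmap() covers the address. */
		return "zero page, or memory area not mapped by the process";
	case -ENOENT:
		/* Documented only as "the page is not present". */
		return "page is not present";
	default:
		return "other error";
	}
}
```

Note that neither documented case clearly names "mapped, but nothing faulted in", which is exactly the gap described above.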
And the old handling was even inconsistent: to achieve the old behavior,
we abused FOLL_DUMP, which triggers in mm/gup.c:no_page_table():
(1) is_vm_hugetlb_page() *and* a hugetlb page is in the pagecache?
Return -EFAULT. Otherwise return NULL -> -ENOENT.
(2) vma_is_anonymous() || !vma->vm_ops->fault ? Return -EFAULT.
Otherwise return NULL -> -ENOENT.
So, if nothing is mapped, for things like shmem we would always return
"-ENOENT", for anonymous memory always "-EFAULT", and for hugetlb
"-ENOENT" or "-EFAULT", depending on the page cache state. Inconsistent,
and that handling is only in place because we abused FOLL_DUMP.
(there are other issues in the old implementation: on PMD migration
entries we would likely have returned -EFAULT in some cases where we
should have returned -ENOENT ...)
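To make the inconsistency easier to see, here is a small user-space model of the decision as summarized in (1) and (2) above. It is a sketch under assumptions: the enum and function names are invented for illustration, and the hugetlb condition is taken from the summary in this mail, not re-checked against mm/gup.c.

```c
#include <errno.h>
#include <stdbool.h>

/* Illustrative mapping kinds; these are not kernel definitions. */
enum mapping_kind {
	MAPPING_ANON,		/* vma_is_anonymous() */
	MAPPING_NO_FAULT,	/* mapping without a vm_ops->fault handler */
	MAPPING_SHMEM,		/* has a fault handler, e.g. shmem */
	MAPPING_HUGETLB,	/* is_vm_hugetlb_page() */
};

/*
 * Model of the status the old FOLL_DUMP-based path reported when nothing
 * was mapped at the address, following the summary above. A NULL return
 * from the lookup is modeled as -ENOENT.
 */
int old_status_when_nothing_mapped(enum mapping_kind kind,
				   bool hugetlb_in_pagecache)
{
	switch (kind) {
	case MAPPING_HUGETLB:
		/* Case (1): depends on the page cache state. */
		return hugetlb_in_pagecache ? -EFAULT : -ENOENT;
	case MAPPING_ANON:
	case MAPPING_NO_FAULT:
		/* Case (2): always -EFAULT. */
		return -EFAULT;
	default:
		/* Everything else, e.g. shmem: NULL -> -ENOENT. */
		return -ENOENT;
	}
}
```

Three mapping kinds, three different answers for the same "nothing mapped" situation.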
While writing folio_walk, I temporarily had a version that would return
error codes instead of NULL to indicate "there is something, but we
cannot return it" and "there is nothing", but it didn't feel right. And
I'm not really interested in revisiting that :)
----
Staring at the test, I realized that the behavior *changed* already,
because we wanted to fix the "zero page" and started abusing FOLL_DUMP,
but ended up changing the behavior for unpopulated (nothing mapped)
memory as well:
* NAME
* move_pages04.c
*
* DESCRIPTION
* Failure when page does not exit.
*
* ALGORITHM
*
* 1. Pass zero page (allocated, but not written to) as one of the
* page addresses to move_pages().
* 2. Check if the corresponding status is set to:
* -ENOENT for kernels < 4.3
* -EFAULT for kernels >= 4.3 [1]
*
* [1]
* d899844e9c98 "mm: fix status code which move_pages() returns for
zero page"
*
Likely the test is *wrong*, because it claims to test the "zero page" but it
just passes "unpopulated" memory.
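The difference between the two states can be sketched in plain C. This is a minimal illustration, not the LTP test code: it uses mmap() directly instead of numa_alloc_onnode(), and the helper names map_one_page() and read_fault() are made up here.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map one private anonymous page without touching it; NULL on failure. */
void *map_one_page(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/*
 * Read one byte through a volatile pointer so the compiler cannot drop
 * the access. On a never-written private anonymous page, the read fault
 * is expected to map the shared zero page instead of allocating memory.
 */
char read_fault(void *page)
{
	return *(volatile char *)page;
}
```

Calling only map_one_page() leaves the address unpopulated, which corresponds to what the test passes today. A read_fault() on the result would be needed to actually exercise the zero page case.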
Let me dig deeper into the test.
--
Cheers,
David / dhildenb
* Re: [linux-next:master] [mm/migrate] b28dd7507f: ltp.move_pages04.fail
From: David Hildenbrand @ 2024-08-21 9:15 UTC
To: kernel test robot
Cc: oe-lkp, lkp, Linux Memory Management List, Andrew Morton,
Alexander Gordeev, Christian Borntraeger, Claudio Imbrenda,
Gerald Schaefer, Heiko Carstens, Janosch Frank, Jonathan Corbet,
Matthew Wilcox, Sven Schnelle, Vasily Gorbik, Ryan Roberts,
Zi Yan, ltp, Kirill A. Shutemov
On 21.08.24 10:44, David Hildenbrand wrote:
> On 21.08.24 06:44, kernel test robot wrote:
>>
>>
>> Hello,
>>
>> kernel test robot noticed "ltp.move_pages04.fail" on:
>>
>> commit: b28dd7507f2dd7923325eab6ea1f291416dcc396 ("mm/migrate: convert add_page_for_migration() from follow_page() to folio_walk")
>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>
>> [test failed on linux-next/master bb1b0acdcd66e0d8eedee3570d249e076b89ab32]
>>
>> in testcase: ltp
>> version: ltp-x86_64-14c1f76-1_20240817
>> with following parameters:
>>
>> test: numa/move_pages04
>>
>>
>>
>> compiler: gcc-12
>> test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) with 256G memory
>>
>> (please refer to attached dmesg/kmsg for entire log/backtrace)
>>
>>
>>
>>
>> If you fix the issue in a separate patch/commit (i.e. not just a new version of
>> the same patch/commit), kindly add following tags
>> | Reported-by: kernel test robot <oliver.sang@intel.com>
>> | Closes: https://lore.kernel.org/oe-lkp/202408211026.636ade1a-oliver.sang@intel.com
>>
>>
>>
>> Running tests.......
>> <<<test_start>>>
>> tag=move_pages04 stime=1724192393
>> cmdline="move_pages04"
>> contacts=""
>> analysis=exit
>> <<<test_output>>>
>> move_pages04 1 TFAIL : move_pages04.c:142: status[1] is ENOENT, expected EFAULT
>
> This change is to be expected, and I touched on that in the patch
> description. I am rather surprised that we have a test for that
> handling, especially because it changed already (see below).
>
> The man page says:
>
> -EFAULT: This is a zero page or the memory area is not mapped
> by the process.
>
> "memory area not mapped" to me translates to "there is no mmap()", not
> "there is no page mapped". And it says:
>
> -ENOENT: The page is not present.
>
>
> It doesn't really specify what happens when "The memory area is mapped
> by the process, but no page is faulted in.".
>
> And the old handling was even inconsistent: to achieve the old behavior,
> we abused FOLL_DUMP, which triggers in mm/gup.c:no_page_table():
>
> (1) is_vm_hugetlb_page() *and* a hugetlb page is in the pagecache?
> Return -EFAULT. Otherwise return NULL -> -ENOENT.
> (2) vma_is_anonymous() || !vma->vm_ops->fault ? Return -EFAULT.
> Otherwise return NULL -> -ENOENT.
>
> So, if nothing is mapped, for things like shmem we would always return
> "-ENOENT", for anonymous memory always "-EFAULT", and for hugetlb
> "-ENOENT" or "-EFAULT", depending on the page cache state. Inconsistent,
> and that handling is only in place because we abused FOLL_DUMP.
>
> (there are other issues in the old implementation: on PMD migration
> entries we would likely have returned -EFAULT in some cases where we
> should have returned -ENOENT ...)
>
> While writing folio_walk, I temporarily had a version that would return
> error codes instead of NULL to indicate "there is something, but we
> cannot return it" and "there is nothing", but it didn't feel right. And
> I'm not really interested in revisiting that :)
>
> ----
>
> Staring at the test, I realized that the behavior *changed* already,
> because we wanted to fix the "zero page" and started abusing FOLL_DUMP,
> but ended up changing the behavior for unpopulated (nothing mapped)
> memory as well:
>
> * NAME
> * move_pages04.c
> *
> * DESCRIPTION
> * Failure when page does not exit.
> *
> * ALGORITHM
> *
> * 1. Pass zero page (allocated, but not written to) as one of the
> * page addresses to move_pages().
> * 2. Check if the corresponding status is set to:
> * -ENOENT for kernels < 4.3
> * -EFAULT for kernels >= 4.3 [1]
> *
> * [1]
> * d899844e9c98 "mm: fix status code which move_pages() returns for
> zero page"
> *
>
> Likely the test is *wrong*, because it claims to test the "zero page" but it
> just passes "unpopulated" memory.
>
> Let me dig deeper into the test.
Okay, and it even looks like the test caught the unintended change for
"unpopulated memory", but instead we decided to change the test to
expect the other return code ... because there was some confusion about
"zero page".
Long story short: the test needs to be fixed.
--
Cheers,
David / dhildenb
* Re: [LTP] [linux-next:master] [mm/migrate] b28dd7507f: ltp.move_pages04.fail
From: Cyril Hrubis @ 2024-08-28 10:37 UTC
To: David Hildenbrand
Cc: kernel test robot, Claudio Imbrenda, Ryan Roberts, lkp,
Vasily Gorbik, Jonathan Corbet, Kirill A. Shutemov,
Matthew Wilcox, Linux Memory Management List, Sven Schnelle,
Zi Yan, Gerald Schaefer, oe-lkp, Andrew Morton,
Christian Borntraeger, Alexander Gordeev, ltp, Janosch Frank
Hi!
> Okay, and it even looks like the test caught the unintended change for
> "unpopulated memory", but instead we decided to change the test to
> expect the other return code ... because there was some confusion about
> "zero page".
>
> Long story short: the test needs to be fixed.
Will do, but we need the patch to land in some kernel version first so
that we can add the range of kernels where the kernel wrongly returns
EFAULT.
Or alternatively, if you are going to backport this to stable trees, we
can revert the test change that expects -EFAULT so the test expects only
-ENOENT.
--
Cyril Hrubis
chrubis@suse.cz
* Re: [LTP] [linux-next:master] [mm/migrate] b28dd7507f: ltp.move_pages04.fail
From: David Hildenbrand @ 2024-08-28 10:51 UTC
To: Cyril Hrubis
Cc: kernel test robot, Claudio Imbrenda, Ryan Roberts, lkp,
Vasily Gorbik, Jonathan Corbet, Kirill A. Shutemov,
Matthew Wilcox, Linux Memory Management List, Sven Schnelle,
Zi Yan, Gerald Schaefer, oe-lkp, Andrew Morton,
Christian Borntraeger, Alexander Gordeev, ltp, Janosch Frank
On 28.08.24 12:37, Cyril Hrubis wrote:
> Hi!
>> Okay, and it even looks like the test caught the unintended change for
>> "unpopulated memory", but instead we decided to change the test to
>> expect the other return code ... because there was some confusion about
>> "zero page".
>>
>> Long story short: the test needs to be fixed.
>
> Will do, but we need the patch to land in some kernel version first so
> that we can add the range of kernels where the kernel wrongly returns
> EFAULT.
>
> Or alternatively, if you are going to backport this to stable trees, we
> can revert the test change that expects -EFAULT so the test expects only
> -ENOENT.
>
I am not yet sure if we should simply allow either -EFAULT or -ENOENT for
the "nothing mapped" case in the check (below).
Alternatively, I agree, we need to have this in the kernel so we can
check for versions.
What would be your preference?
Currently I have (WIP):
From c152535cdfae194819b6df450cdc29a60f8cdb8d Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david@redhat.com>
Date: Wed, 21 Aug 2024 10:58:34 +0200
Subject: [PATCH] move_pages04: properly check for "no page mapped" and "shared
zero page mapped"
While the kernel commit d899844e9c98 ("mm: fix status code which
move_pages() returns for zero page") fixed the return value when the
shared zero page was encountered to match what was stated in the man page,
it unfortunately also changed the behavior when no page is mapped yet --
when no page was faulted in/populated on demand.
Then, this test started failing, and we thought we were testing for
the "zero page" case, but actually we were testing for the "no page
mapped yet" case, and didn't realize that the kernel commit had
unintended side effects.
As we are changing the behavior back to return "-ENOENT" for the "no
page mapped" case, while still keeping "-EFAULT" for the shared zero
page, the test starts failing again ...
The man page clearly spells out that the expectation for the zero page is
"-EFAULT", and that "-EFAULT" can also be returned if "the memory area is
not mapped by the process" -- which means that there is no VMA/mmap()
covering that address.
The man page isn't completely clear what the expected return value for the
"no page mapped" case is. It documents "-ENOENT" for "The page is not
present", which can be interpreted to include "there is nothing mapped"
and not just "there is something, it's just simply not suitable".
We'll clarify the man page soon, to be clearer that the expectation is
to get "-ENOENT" in that case, like the kernel originally did. But we'll
also add a note that some kernel versions will return either -ENOENT
or -EFAULT.
So let's test for both cases, and make sure we allow both -ENOENT and
-EFAULT for the "no page mapped" case.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
.../kernel/syscalls/move_pages/move_pages04.c | 82 ++++++++++++++-----
1 file changed, 61 insertions(+), 21 deletions(-)
diff --git a/testcases/kernel/syscalls/move_pages/move_pages04.c b/testcases/kernel/syscalls/move_pages/move_pages04.c
index f53453ab4..f51a73b6c 100644
--- a/testcases/kernel/syscalls/move_pages/move_pages04.c
+++ b/testcases/kernel/syscalls/move_pages/move_pages04.c
@@ -26,13 +26,16 @@
* move_pages04.c
*
* DESCRIPTION
- * Failure when page does not exit.
+ * Failure when no page is mapped or the shared zero page is mapped.
*
* ALGORITHM
*
- * 1. Pass zero page (allocated, but not written to) as one of the
- * page addresses to move_pages().
- * 2. Check if the corresponding status is set to:
+ * 1. Pass the address of a valid memory area where no page is mapped yet
+ * (not read/written) and the address of the shared zero page
+ * (read, but not written to) as page addresses to move_pages().
+ * 2. Check if the corresponding status for "no page mapped" is set to
+ * either -ENOENT or -EFAULT.
+ * 3. Check if the corresponding status for "shared zero page" is set to:
* -ENOENT for kernels < 4.3
* -EFAULT for kernels >= 4.3 [1]
*
@@ -64,10 +67,11 @@
#include "test.h"
#include "move_pages_support.h"
-#define TEST_PAGES 2
+#define TEST_PAGES 3
#define TEST_NODES 2
#define TOUCHED_PAGES 1
-#define UNTOUCHED_PAGE (TEST_PAGES - 1)
+#define NO_PAGE (TEST_PAGES - 1)
+#define ZERO_PAGE (NO_PAGE - 1)
void setup(void);
void cleanup(void);
@@ -89,12 +93,12 @@ int main(int argc, char **argv)
int lc;
unsigned int from_node;
unsigned int to_node;
- int ret, exp_status;
+ int ret, exp_zero_page_status;
if ((tst_kvercmp(4, 3, 0)) >= 0)
- exp_status = -EFAULT;
+ exp_zero_page_status = -EFAULT;
else
- exp_status = -ENOENT;
+ exp_zero_page_status = -ENOENT;
ret = get_allowed_nodes(NH_MEMS, 2, &from_node, &to_node);
if (ret < 0)
@@ -106,6 +110,7 @@ int main(int argc, char **argv)
int nodes[TEST_PAGES];
int status[TEST_PAGES];
unsigned long onepage = get_page_size();
+ char tmp;
/* reset tst_count in case we are looping */
tst_count = 0;
@@ -114,13 +119,30 @@ int main(int argc, char **argv)
if (ret == -1)
continue;
- /* Allocate page and do not touch it. */
- pages[UNTOUCHED_PAGE] = numa_alloc_onnode(onepage, from_node);
- if (pages[UNTOUCHED_PAGE] == NULL) {
- tst_resm(TBROK, "failed allocating page on node %d",
+ /*
+ * Allocate memory and do not touch it. Consequently, no
+ * page will be faulted in / mapped into the page tables.
+ */
+ pages[NO_PAGE] = numa_alloc_onnode(onepage, from_node);
+ if (pages[NO_PAGE] == NULL) {
+ tst_resm(TBROK, "failed allocating memory on node %d",
+ from_node);
+ goto err_free_pages;
+ }
+
+ /*
+ * Allocate memory, read from it, but do not write to it. This
+ * will populate the shared zeropage.
+ */
+ pages[ZERO_PAGE] = numa_alloc_onnode(onepage, from_node);
+ if (pages[ZERO_PAGE] == NULL) {
+ tst_resm(TBROK, "failed allocating memory on node %d",
from_node);
goto err_free_pages;
}
+ /* Make the compiler not optimize-out the read. */
+ tmp = *((char *)pages[ZERO_PAGE]);
+ asm volatile("" : "+r" (tmp));
for (i = 0; i < TEST_PAGES; i++)
nodes[i] = to_node;
@@ -135,20 +157,38 @@ int main(int argc, char **argv)
tst_resm(TINFO, "move_pages() returned %d", ret);
}
- if (status[UNTOUCHED_PAGE] == exp_status) {
+ switch (status[NO_PAGE]) {
+ case -ENOENT:
+ case -EFAULT:
+ /*
+ * Before 4.3, the kernel returned -ENOENT. With 4.3
+ * that behavior was changed by accident to return
+ * -EFAULT for some mapping types (including anonymous
+ * memory we use here). Newer kernels are expected to
+ * change that behavior again back to -ENOENT.
+ */
tst_resm(TPASS, "status[%d] has expected value",
- UNTOUCHED_PAGE);
+ NO_PAGE);
+ break;
+ default:
+ tst_resm(TFAIL, "status[%d] is %s, expected %s or %s",
+ NO_PAGE,
+ tst_strerrno(-status[NO_PAGE]),
+ tst_strerrno(ENOENT), tst_strerrno(EFAULT));
+ }
+
+ if (status[ZERO_PAGE] == exp_zero_page_status) {
+ tst_resm(TPASS, "status[%d] has expected value",
+ ZERO_PAGE);
} else {
tst_resm(TFAIL, "status[%d] is %s, expected %s",
- UNTOUCHED_PAGE,
- tst_strerrno(-status[UNTOUCHED_PAGE]),
- tst_strerrno(-exp_status));
+ ZERO_PAGE,
+ tst_strerrno(-status[ZERO_PAGE]),
+ tst_strerrno(-exp_zero_page_status));
}
err_free_pages:
- /* This is capable of freeing both the touched and
- * untouched pages.
- */
+ /* This is capable of freeing all memory we allocated. */
free_pages(pages, TEST_PAGES);
}
#else
--
2.46.0
--
Cheers,
David / dhildenb
* Re: [LTP] [linux-next:master] [mm/migrate] b28dd7507f: ltp.move_pages04.fail
From: Cyril Hrubis @ 2024-08-28 12:23 UTC
To: David Hildenbrand
Cc: kernel test robot, Claudio Imbrenda, Ryan Roberts, lkp,
Vasily Gorbik, Jonathan Corbet, Kirill A. Shutemov,
Matthew Wilcox, Linux Memory Management List, Sven Schnelle,
Zi Yan, Gerald Schaefer, oe-lkp, Andrew Morton,
Christian Borntraeger, Alexander Gordeev, ltp, Janosch Frank
Hi!
> I am not yet sure if we should simply allow either -EFAULT or -ENOENT for
> the "nothing mapped" case in the check (below).
>
> Alternatively, I agree, we need to have this in the kernel so we can
> check for versions.
>
> What would be your preference?
If we are going to stick to ENOENT for page that wasn't faulted in the
kernel from now on we should stick to it in the test as well.
Also I think there is a third case that we do not cover either, what
happens when we pass an address that is not mapped at all, e.g. NULL? Do
we get EFAULT as well?
--
Cyril Hrubis
chrubis@suse.cz
* Re: [LTP] [linux-next:master] [mm/migrate] b28dd7507f: ltp.move_pages04.fail
From: David Hildenbrand @ 2024-08-28 12:28 UTC
To: Cyril Hrubis
Cc: kernel test robot, Claudio Imbrenda, Ryan Roberts, lkp,
Vasily Gorbik, Jonathan Corbet, Kirill A. Shutemov,
Matthew Wilcox, Linux Memory Management List, Sven Schnelle,
Zi Yan, Gerald Schaefer, oe-lkp, Andrew Morton,
Christian Borntraeger, Alexander Gordeev, ltp, Janosch Frank
On 28.08.24 14:23, Cyril Hrubis wrote:
> Hi!
>> I am not yet sure if we should simply allow either -EFAULT or -ENOENT for
>> the "nothing mapped" case in the check (below).
>>
>> Alternatively, I agree, we need to have this in the kernel so we can
>> check for versions.
>>
>> What would be your preference?
>
> If we are going to stick to ENOENT for page that wasn't faulted in the
> kernel from now on we should stick to it in the test as well.
Right, it will make kernels >= 4.3 fail, though, until this series is
upstream. I mean, it highlights a BUG, but we had a similar condition
with the zeropage and worked around it in the test to keep it passing.
>
> Also I think there is a third case that we do not cover either, what
> happens when we pass an address that is not mapped at all, e.g. NULL? Do
> we get EFAULT as well?
Yes, that's documented as EFAULT and should behave that way. I can
extend the test to handle that as well.
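A rough sketch of how that third case could be probed from user space follows. It is not the planned LTP change: the helper names are made up, the raw syscall is used to avoid a libnuma dependency, and the "map, then unmap" trick only makes it very likely (not guaranteed) that nothing covers the address afterwards.

```c
#define _GNU_SOURCE
#include <linux/mempolicy.h>	/* MPOL_MF_MOVE */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Return a page-aligned address that is very likely not covered by any
 * mapping: map one page, then unmap it again. Returns NULL on failure.
 */
void *unmapped_address(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	void *p = mmap(NULL, len, PROT_NONE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;
	munmap(p, len);
	return p;
}

/*
 * Ask the kernel to move the page at addr in the calling process to the
 * given node and store the per-page result in *status. Returns the raw
 * move_pages() result (-1 with errno set if the call itself failed).
 */
long move_one_page(void *addr, int node, int *status)
{
	return syscall(SYS_move_pages, 0, 1UL, &addr, &node, status,
		       MPOL_MF_MOVE);
}
```

For an address obtained from unmapped_address(), the documented expectation is a status of -EFAULT. The sketch deliberately does not assert that, since the call can also fail as a whole (for example on kernels without NUMA migration support).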
--
Cheers,
David / dhildenb
* Re: [LTP] [linux-next:master] [mm/migrate] b28dd7507f: ltp.move_pages04.fail
From: Cyril Hrubis @ 2024-08-28 12:30 UTC
To: David Hildenbrand
Cc: kernel test robot, Claudio Imbrenda, Ryan Roberts, lkp,
Vasily Gorbik, Jonathan Corbet, Kirill A. Shutemov,
Matthew Wilcox, Linux Memory Management List, Sven Schnelle,
Zi Yan, Gerald Schaefer, oe-lkp, Andrew Morton,
Christian Borntraeger, Alexander Gordeev, ltp, Janosch Frank
Hi!
> > If we are going to stick to ENOENT for page that wasn't faulted in the
> > kernel from now on we should stick to it in the test as well.
>
> Right, it will make kernels >= 4.3 fail, though, until this series is
> upstream. I mean, it highlights a BUG, but we had a similar condition
> with the zeropage and worked around it in the test to keep it passing.
What we do in this case is that you are free to send a patch, we will do
review but the final merge will happen once the code has been released
in the upstream kernel.
> > Also I think there is a third case that we do not cover either, what
> > happens when we pass an address that is not mapped at all, e.g. NULL? Do
> > we get EFAULT as well?
>
> Yes, that's documented as EFAULT and should behave that way. I can
> extend the test to handle that as well.
Ideally the test should be ported to the new test API as well, but I can
do that later on the top of your work.
--
Cyril Hrubis
chrubis@suse.cz
* Re: [LTP] [linux-next:master] [mm/migrate] b28dd7507f: ltp.move_pages04.fail
From: David Hildenbrand @ 2024-08-29 13:49 UTC
To: Cyril Hrubis
Cc: kernel test robot, Claudio Imbrenda, Ryan Roberts, lkp,
Vasily Gorbik, Jonathan Corbet, Kirill A. Shutemov,
Matthew Wilcox, Linux Memory Management List, Sven Schnelle,
Zi Yan, Gerald Schaefer, oe-lkp, Andrew Morton,
Christian Borntraeger, Alexander Gordeev, ltp, Janosch Frank
On 28.08.24 14:30, Cyril Hrubis wrote:
> Hi!
>>> If we are going to stick to ENOENT for page that wasn't faulted in the
>>> kernel from now on we should stick to it in the test as well.
>>
>> Right, it will make kernels >= 4.3 fail, though, until this series is
>> upstream. I mean, it highlights a BUG, but we had a similar condition
>> with the zeropage and worked around it in the test to keep it passing.
>
> What we do in this case is that you are free to send a patch, we will do
> review but the final merge will happen once the code has been released
> in the upstream kernel.
Yes, let me send something out for discussion.
>
>>> Also I think there is a third case that we do not cover either, what
>>> happens when we pass an address that is not mapped at all, e.g. NULL? Do
>>> we get EFAULT as well?
>>
>> Yes, that's documented as EFAULT and should behave that way. I can
>> extend the test to handle that as well.
>
> Ideally the test should be ported to the new test API as well, but I can
> do that later on the top of your work.
I tried, and it all looked easy, until I realized that these tests use a
shared code base:
testcases/kernel/syscalls/move_pages/move_pages_support.c
That is also written using the old API. I assume mixing APIs might not
work as expected ...
--
Cheers,
David / dhildenb
* Re: [LTP] [linux-next:master] [mm/migrate] b28dd7507f: ltp.move_pages04.fail
From: Cyril Hrubis @ 2024-08-29 14:31 UTC
To: David Hildenbrand
Cc: kernel test robot, Claudio Imbrenda, Ryan Roberts, lkp,
Vasily Gorbik, Jonathan Corbet, Kirill A. Shutemov,
Matthew Wilcox, Linux Memory Management List, Sven Schnelle,
Zi Yan, Gerald Schaefer, oe-lkp, Andrew Morton,
Christian Borntraeger, Alexander Gordeev, ltp, Janosch Frank
Hi!
> >>> Also I think there is a third case that we do not cover either, what
> >>> happens when we pass an address that is not mapped at all, e.g. NULL? Do
> >>> we get EFAULT as well?
> >>
> >> Yes, that's documented as EFAULT and should behave that way. I can
> >> extend the test to handle that as well.
> >
> > Ideally the test should be ported to the new test API as well, but I can
> > do that later on the top of your work.
>
> I tried, and it all looked easy, until I realized that these tests use a
> shared code base:
> testcases/kernel/syscalls/move_pages/move_pages_support.c
>
> That is also written using the old API. I assume mixing APIs might not
> work as expected ...
The tst_resm() and tst_brkm() calls are redirected properly in the case
that the test runs with the new API, so generally it should work fine as
long as the cleanup callback is set to NULL for tst_brkm(). That
was one of the design decisions we made years ago, because we knew that
we were not going to reimplement thousands of tests instantly and that
the old and new APIs would have to live alongside each other for a decade.
I glanced over the code and I do not see anything in
move_pages_support.c or in kernel/lib/numa_helper.c that would break
when executed under the new test library.
--
Cyril Hrubis
chrubis@suse.cz
* Re: [LTP] [linux-next:master] [mm/migrate] b28dd7507f: ltp.move_pages04.fail
From: David Hildenbrand @ 2024-08-29 14:38 UTC
To: Cyril Hrubis
Cc: kernel test robot, Claudio Imbrenda, Ryan Roberts, lkp,
Vasily Gorbik, Jonathan Corbet, Kirill A. Shutemov,
Matthew Wilcox, Linux Memory Management List, Sven Schnelle,
Zi Yan, Gerald Schaefer, oe-lkp, Andrew Morton,
Christian Borntraeger, Alexander Gordeev, ltp, Janosch Frank
On 29.08.24 16:31, Cyril Hrubis wrote:
> Hi!
>>>>> Also I think there is a third case that we do not cover either, what
>>>>> happens when we pass an address that is not mapped at all, e.g. NULL? Do
>>>>> we get EFAULT as well?
>>>>
>>>> Yes, that's documented as EFAULT and should behave that way. I can
>>>> extend the test to handle that as well.
>>>
>>> Ideally the test should be ported to the new test API as well, but I can
>>> do that later on the top of your work.
>>
>> I tried, and it all looked easy, until I realized that these tests use a
>> shared code base:
>> testcases/kernel/syscalls/move_pages/move_pages_support.c
>>
>> That is also written using the old API. I assume mixing APIs might not
>> work as expected ...
>
> The tst_resm() and tst_brkm() calls are redirected properly in the case
> that the test runs with the new API, so generally it should work fine as
> long as the cleanup callback is set to NULL for tst_brkm(). That
> was one of the design decisions we made years ago, because we knew that
> we were not going to reimplement thousands of tests instantly and that
> the old and new APIs would have to live alongside each other for a decade.
Ah, the "NULL" is the magic bit.
>
> I glanced over the code and I do not see anything in
> move_pages_support.c or in kernel/lib/numa_helper.c that would break
> when executed under the new test library.
Okay, good. I can send a follow up to convert that file once the
fix/extension was reviewed.
--
Cheers,
David / dhildenb