* Re: [PATCH] tools/mm: Add madvise tool
2025-09-04 17:57 [PATCH] tools/mm: Add madvise tool kirill
@ 2025-09-04 19:07 ` Liam R. Howlett
2025-09-05 3:26 ` wang lian
` (6 subsequent siblings)
7 siblings, 0 replies; 24+ messages in thread
From: Liam R. Howlett @ 2025-09-04 19:07 UTC (permalink / raw)
To: kirill
Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, Usama Arif, Kiryl Shutsemau
* kirill@shutemov.name <kirill@shutemov.name> [250904 13:57]:
> From: Kiryl Shutsemau <kas@kernel.org>
>
> Add a simple tool that allows to issue an advice on a process or a file.
>
> It can be useful to experiment with effects of an advice on a workload
> without modifying the workload itself.
>
> Only supports advices available for process_madvise().
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Nice, thanks!
Should this go in MAINTAINERS?
Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> ---
> tools/mm/.gitignore | 4 +-
> tools/mm/Makefile | 2 +-
> tools/mm/madvise.c | 170 ++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 174 insertions(+), 2 deletions(-)
> create mode 100644 tools/mm/madvise.c
>
> diff --git a/tools/mm/.gitignore b/tools/mm/.gitignore
> index 922879f93fc8..b713fcf4a2e0 100644
> --- a/tools/mm/.gitignore
> +++ b/tools/mm/.gitignore
> @@ -1,4 +1,6 @@
> # SPDX-License-Identifier: GPL-2.0-only
> -slabinfo
> +madvise
> page-types
> page_owner_sort
> +slabinfo
> +thp_swap_allocator_test
> diff --git a/tools/mm/Makefile b/tools/mm/Makefile
> index f5725b5c23aa..db315a48adcd 100644
> --- a/tools/mm/Makefile
> +++ b/tools/mm/Makefile
> @@ -3,7 +3,7 @@
> #
> include ../scripts/Makefile.include
>
> -BUILD_TARGETS=page-types slabinfo page_owner_sort thp_swap_allocator_test
> +BUILD_TARGETS= madvise page-types page_owner_sort slabinfo thp_swap_allocator_test
> INSTALL_TARGETS = $(BUILD_TARGETS) thpmaps
>
> LIB_DIR = ../lib/api
> diff --git a/tools/mm/madvise.c b/tools/mm/madvise.c
> new file mode 100644
> index 000000000000..038b3e1076ea
> --- /dev/null
> +++ b/tools/mm/madvise.c
> @@ -0,0 +1,170 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#define _GNU_SOURCE
> +#include <fcntl.h>
> +#include <limits.h>
> +#include <stdarg.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <unistd.h>
> +#include <sys/mman.h>
> +#include <sys/syscall.h>
> +
> +static void usage(void)
> +{
> + printf("madvise TARGET ADVICE START END\n\n");
> + printf("Arguments:\n");
> + printf("\t<TARGET>\n");
> + printf("\t\tA process ID or a file to give the advice to.\n\n");
> + printf("\t\tUse \"./\" prefix if the file name is all digits.\n\n");
> + printf("\t<ADVICE>\n");
> + printf("\t\tcold\t\t- Deactivate a given range of pages\n");
> + printf("\t\tcollapse\t- Collapse pages in a given range into THPs\n");
> + printf("\t\tpageout\t\t- Reclaim a given range of pages\n");
> + printf("\t\twillneed\t- The specified data will be accessed in the near future\n");
> + printf("\n\t\tSee madvise(2) for more details.\n\n");
> + printf("\t<START>/<END>\n");
> + printf("\t\tStart and end addressed for the advice. Must be page-aligned.\n\n");
> + printf("\t\tFor PID case, it is addresses in the target process address space.\n\n");
> + printf("\t\tFor file case, it is offsets in the file.\n\n");
> +}
> +
> +static void error(const char *fmt, ...)
> +{
> + if (fmt) {
> + va_list argp;
> +
> + va_start(argp, fmt);
> + vfprintf(stderr, fmt, argp);
> + va_end(argp);
> + printf("\n");
> + }
> +
> + usage();
> + exit(-1);
> +}
> +
> +#define PMD_SIZE_FILE_PATH "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size"
> +static unsigned long read_pmd_pagesize(void)
> +{
> + int fd;
> + char buf[20];
> + ssize_t num_read;
> +
> + fd = open(PMD_SIZE_FILE_PATH, O_RDONLY);
> + if (fd == -1)
> + return 0;
> +
> + num_read = read(fd, buf, 19);
> + if (num_read < 1) {
> + close(fd);
> + return 0;
> + }
> + buf[num_read] = '\0';
> + close(fd);
> +
> + return strtoul(buf, NULL, 10);
> +}
> +
> +static int pidfd_open(pid_t pid, unsigned int flags)
> +{
> + return syscall(SYS_pidfd_open, pid, flags);
> +}
> +
> +int main(int argc, const char *argv[])
> +{
> + unsigned long pid, start, end, page_size;
> + int advice;
> + char *err;
> + int fd;
> +
> + if (argc != 5)
> + error(NULL);
> +
> + pid = strtoul(argv[1], &err, 10);
> + if (*err || err == argv[1] ||
> + pid > INT_MAX || (pid_t)pid <= 0) {
> + // Not a PID, assume argv[1] is a file name
> + pid = 0;
> + }
> +
> + if (pid) {
> + fd = pidfd_open(pid, 0);
> + if (fd < 0)
> + perror("pidfd_open()"), exit(-1);
> + } else {
> + fd = open(argv[1], O_RDWR);
> + if (fd < 0)
> + perror("open"), exit(-1);
> + }
> +
> + if (!strcmp(argv[2], "cold"))
> + advice = MADV_COLD;
> + else if (!strcmp(argv[2], "collapse"))
> + advice = MADV_COLLAPSE;
> + else if (!strcmp(argv[2], "pageout"))
> + advice = MADV_PAGEOUT;
> + else if (!strcmp(argv[2], "willneed"))
> + advice = MADV_WILLNEED;
> + else
> + error("Unknown advice: %s\n", argv[2]);
> +
> + page_size = sysconf(_SC_PAGE_SIZE);
> +
> + start = strtoul(argv[3], &err, 0);
> + if (*err || err == argv[3])
> + error("Cannot parse start address\n");
> + if (start % page_size)
> + error("Start address is not aligned to page size\n");
> + end = strtoul(argv[4], &err, 0);
> + if (*err || err == argv[4])
> + error("Cannot parse end address\n");
> + if (end % page_size)
> + error("End address is not aligned to page size\n");
> +
> + if (pid) {
> + struct iovec vec = {
> + .iov_base = (void *)start,
> + .iov_len = end - start,
> + };
> + ssize_t ret;
> +
> + ret = process_madvise(fd, &vec, 1, advice, 0);
> + if (ret < 0)
> + perror("process_madvise"), exit(-1);
> +
> + if ((unsigned long)ret != end - start)
> + printf("Partial advice occurred. Stopped at %#lx\n", start + ret);
> + } else {
> + unsigned long addr, hpage_pmd_size;
> + void *p;
> + int ret;
> +
> + hpage_pmd_size = read_pmd_pagesize();
> + if (!hpage_pmd_size) {
> + printf("Reading PMD pagesize failed");
> + exit(-1);
> + }
> +
> + // Allocate virtual address space to align the target mmap to PMD size
> + // Some advices require this.
> + p = mmap(NULL, end - start + hpage_pmd_size, PROT_NONE,
> + MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> + if (p == MAP_FAILED)
> + perror("mmap0"), exit(-1);
> + addr = (unsigned long)p;
> + addr += hpage_pmd_size - 1;
> + addr &= ~(hpage_pmd_size - 1);
> +
> + p = mmap((void *)addr, end - start,
> + PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED | MAP_POPULATE, fd, start);
> + if (p == MAP_FAILED)
> + perror("mmap"), exit(-1);
> +
> + ret = madvise(p, end - start, advice);
> + if (ret)
> + perror("madvise"), exit(-1);
> + }
> +
> + return 0;
> +}
> --
> 2.50.1
>
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [PATCH] tools/mm: Add madvise tool
2025-09-04 17:57 [PATCH] tools/mm: Add madvise tool kirill
2025-09-04 19:07 ` Liam R. Howlett
@ 2025-09-05 3:26 ` wang lian
2025-09-05 10:28 ` Kiryl Shutsemau
2025-09-05 9:17 ` Michal Hocko
` (5 subsequent siblings)
7 siblings, 1 reply; 24+ messages in thread
From: wang lian @ 2025-09-05 3:26 UTC (permalink / raw)
To: kirill
Cc: Liam.Howlett, akpm, david, kas, linux-mm, lorenzo.stoakes,
mhocko, rppt, surenb, usamaarif642, vbabka, wang lian
> Add a simple tool that allows to issue an advice on a process or a file.
> It can be useful to experiment with effects of an advice on a workload
> without modifying the workload itself.
> Only supports advices available for process_madvise().
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> ---
...
> +#define PMD_SIZE_FILE_PATH "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size"
Hi Kiryl,
Thanks for this patch, this is a very nice and useful tool.
I just have some super minor nitpick on coding style:
(a) Maybe move the #define PMD_SIZE_FILE_PATH to the top of the file,
right after the #include statements?
(b) use consistent comment style like this /* */
(c) when i test, when process_madvise test fail it give some message,
i think it should add into in the prompt message ,like this
> printf("madvise TARGET ADVICE START END\n\n");
printf("should run as root\n\n");
printf("\t\tStart and end addressed for the advice. Must be page-aligned and valid target pid address.\n\n");
Other than that, the patch looks great to me.
LGTM!
Reviewed-by: wang lian <lianux.mm@gmail.com>
Tested-by: wang lian <lianux.mm@gmail.com>
Best regards,
wang lian
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [PATCH] tools/mm: Add madvise tool
2025-09-05 3:26 ` wang lian
@ 2025-09-05 10:28 ` Kiryl Shutsemau
2025-09-08 1:09 ` wang lian
0 siblings, 1 reply; 24+ messages in thread
From: Kiryl Shutsemau @ 2025-09-05 10:28 UTC (permalink / raw)
To: wang lian
Cc: Liam.Howlett, akpm, david, linux-mm, lorenzo.stoakes, mhocko,
rppt, surenb, usamaarif642, vbabka
On Fri, Sep 05, 2025 at 11:26:13AM +0800, wang lian wrote:
>
> > Add a simple tool that allows to issue an advice on a process or a file.
>
> > It can be useful to experiment with effects of an advice on a workload
> > without modifying the workload itself.
>
> > Only supports advices available for process_madvise().
>
> > Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> > ---
>
> ...
> > +#define PMD_SIZE_FILE_PATH "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size"
>
> Hi Kiryl,
>
> Thanks for this patch, this is a very nice and useful tool.
>
> I just have some super minor nitpick on coding style:
> (a) Maybe move the #define PMD_SIZE_FILE_PATH to the top of the file,
> right after the #include statements?
Maybe. Will do if v2 is required.
> (b) use consistent comment style like this /* */
See answer to David.
> (c) when i test, when process_madvise test fail it give some message,
> i think it should add into in the prompt message ,like this
> > printf("madvise TARGET ADVICE START END\n\n");
> printf("should run as root\n\n");
> printf("\t\tStart and end addressed for the advice. Must be page-aligned and valid target pid address.\n\n");
It doesn't need root. If your target is PID, it requires the target
process to be ptraceable for you. Whether it is or not depending on your
setup.
> Other than that, the patch looks great to me.
> LGTM!
> Reviewed-by: wang lian <lianux.mm@gmail.com>
> Tested-by: wang lian <lianux.mm@gmail.com>
>
> Best regards,
> wang lian
--
Kiryl Shutsemau / Kirill A. Shutemov
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [PATCH] tools/mm: Add madvise tool
2025-09-05 10:28 ` Kiryl Shutsemau
@ 2025-09-08 1:09 ` wang lian
0 siblings, 0 replies; 24+ messages in thread
From: wang lian @ 2025-09-08 1:09 UTC (permalink / raw)
To: Kiryl Shutsemau
Cc: Liam.Howlett, akpm, david, linux-mm, lorenzo.stoakes, mhocko,
rppt, surenb, usamaarif642, vbabka
[-- Attachment #1: Type: text/plain, Size: 1823 bytes --]
> On Sep 5, 2025, at 18:28, Kiryl Shutsemau <kirill@shutemov.name> wrote:
>
> On Fri, Sep 05, 2025 at 11:26:13AM +0800, wang lian wrote:
>>
>>> Add a simple tool that allows to issue an advice on a process or a file.
>>
>>> It can be useful to experiment with effects of an advice on a workload
>>> without modifying the workload itself.
>>
>>> Only supports advices available for process_madvise().
>>
>>> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
>>> ---
>>
>> ...
>>> +#define PMD_SIZE_FILE_PATH "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size"
>>
>> Hi Kiryl,
>>
>> Thanks for this patch, this is a very nice and useful tool.
>>
>> I just have some super minor nitpick on coding style:
>> (a) Maybe move the #define PMD_SIZE_FILE_PATH to the top of the file,
>> right after the #include statements?
>
> Maybe. Will do if v2 is required.
>
>> (b) use consistent comment style like this /* */
>
> See answer to David.
>
>> (c) when i test, when process_madvise test fail it give some message,
>> i think it should add into in the prompt message ,like this
>>> printf("madvise TARGET ADVICE START END\n\n");
>> printf("should run as root\n\n");
>> printf("\t\tStart and end addressed for the advice. Must be page-aligned and valid target pid address.\n\n");
>
> It doesn't need root. If your target is PID, it requires the target
> process to be ptraceable for you. Whether it is or not depending on your
> setup.
Thanks for your clarify.
>
>> Other than that, the patch looks great to me.
>> LGTM!
>> Reviewed-by: wang lian <lianux.mm@gmail.com <mailto:lianux.mm@gmail.com>>
>> Tested-by: wang lian <lianux.mm@gmail.com <mailto:lianux.mm@gmail.com>>
>>
>> Best regards,
>> wang lian
>
> --
> Kiryl Shutsemau / Kirill A. Shutemov
[-- Attachment #2: Type: text/html, Size: 11542 bytes --]
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] tools/mm: Add madvise tool
2025-09-04 17:57 [PATCH] tools/mm: Add madvise tool kirill
2025-09-04 19:07 ` Liam R. Howlett
2025-09-05 3:26 ` wang lian
@ 2025-09-05 9:17 ` Michal Hocko
2025-09-05 10:16 ` Lorenzo Stoakes
2025-09-05 10:21 ` Kiryl Shutsemau
2025-09-05 10:19 ` David Hildenbrand
` (4 subsequent siblings)
7 siblings, 2 replies; 24+ messages in thread
From: Michal Hocko @ 2025-09-05 9:17 UTC (permalink / raw)
To: kirill
Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, linux-mm, Usama Arif, Kiryl Shutsemau
On Thu 04-09-25 18:57:29, Kirill A. Shutemov wrote:
> From: Kiryl Shutsemau <kas@kernel.org>
>
> Add a simple tool that allows to issue an advice on a process or a file.
>
> It can be useful to experiment with effects of an advice on a workload
> without modifying the workload itself.
Is there any reason to have this in the tree? This seems like a very
trivial tools that doesn't really need to be in a lockstep with the
kernel source.
Also would it make more sense to send the pidfd fd to the tool directly
so that it can benefit from a racefree pid->fd translation?
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [PATCH] tools/mm: Add madvise tool
2025-09-05 9:17 ` Michal Hocko
@ 2025-09-05 10:16 ` Lorenzo Stoakes
2025-09-05 10:56 ` Michal Hocko
2025-09-05 10:21 ` Kiryl Shutsemau
1 sibling, 1 reply; 24+ messages in thread
From: Lorenzo Stoakes @ 2025-09-05 10:16 UTC (permalink / raw)
To: Michal Hocko
Cc: kirill, Andrew Morton, David Hildenbrand, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, linux-mm,
Usama Arif, Kiryl Shutsemau
On Fri, Sep 05, 2025 at 11:17:10AM +0200, Michal Hocko wrote:
>
> Is there any reason to have this in the tree? This seems like a very
> trivial tools that doesn't really need to be in a lockstep with the
> kernel source.
To give my own 2 pence on this :>)
I think this is actually quite useful because it invokes process_madvise() and
so can be used for experiments against other processes in a way that'd you'd
have to write code for otherwise.
I think it's useful to keep in sync with kernel as if we add more remote process
madvise functionality this can be updated with it.
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] tools/mm: Add madvise tool
2025-09-05 10:16 ` Lorenzo Stoakes
@ 2025-09-05 10:56 ` Michal Hocko
2025-09-05 11:02 ` Lorenzo Stoakes
0 siblings, 1 reply; 24+ messages in thread
From: Michal Hocko @ 2025-09-05 10:56 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: kirill, Andrew Morton, David Hildenbrand, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, linux-mm,
Usama Arif, Kiryl Shutsemau
On Fri 05-09-25 11:16:38, Lorenzo Stoakes wrote:
> On Fri, Sep 05, 2025 at 11:17:10AM +0200, Michal Hocko wrote:
> >
> > Is there any reason to have this in the tree? This seems like a very
> > trivial tools that doesn't really need to be in a lockstep with the
> > kernel source.
>
> To give my own 2 pence on this :>)
>
> I think this is actually quite useful because it invokes process_madvise() and
> so can be used for experiments against other processes in a way that'd you'd
> have to write code for otherwise.
I am not questioning usefulness of the tool itself.
> I think it's useful to keep in sync with kernel as if we add more remote process
> madvise functionality this can be updated with it.
I just do not see any value in having it in the tree. It doesn't use (or
rely on) any kernel infrastructure and it can happily live outside of
the tree. New madvise functionality requires a runtime support rather
than a support in the current source tree.
We are certainly not pulling in other potentially useful tools
into the kernel tree in for some it was even really hard/impossible to
be merged yet they would benefit from a close lockstep with kernel
sources.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] tools/mm: Add madvise tool
2025-09-05 10:56 ` Michal Hocko
@ 2025-09-05 11:02 ` Lorenzo Stoakes
0 siblings, 0 replies; 24+ messages in thread
From: Lorenzo Stoakes @ 2025-09-05 11:02 UTC (permalink / raw)
To: Michal Hocko
Cc: kirill, Andrew Morton, David Hildenbrand, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, linux-mm,
Usama Arif, Kiryl Shutsemau
On Fri, Sep 05, 2025 at 12:56:24PM +0200, Michal Hocko wrote:
> I am not questioning usefulness of the tool itself.
>
> > I think it's useful to keep in sync with kernel as if we add more remote process
> > madvise functionality this can be updated with it.
>
> I just do not see any value in having it in the tree. It doesn't use (or
> rely on) any kernel infrastructure and it can happily live outside of
> the tree. New madvise functionality requires a runtime support rather
> than a support in the current source tree.
True, and actually given the fact it's using standard headers it's actually
reliant on the host system knowing about new MADV_xxx anyway.
So actually on reflection no lockstep is required at all here.
>
> We are certainly not pulling in other potentially useful tools
> into the kernel tree in for some it was even really hard/impossible to
> be merged yet they would benefit from a close lockstep with kernel
> sources.
Yeah indeed.
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] tools/mm: Add madvise tool
2025-09-05 9:17 ` Michal Hocko
2025-09-05 10:16 ` Lorenzo Stoakes
@ 2025-09-05 10:21 ` Kiryl Shutsemau
2025-09-05 11:02 ` Michal Hocko
1 sibling, 1 reply; 24+ messages in thread
From: Kiryl Shutsemau @ 2025-09-05 10:21 UTC (permalink / raw)
To: Michal Hocko
Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, linux-mm, Usama Arif
On Fri, Sep 05, 2025 at 11:17:10AM +0200, Michal Hocko wrote:
> On Thu 04-09-25 18:57:29, Kirill A. Shutemov wrote:
> > From: Kiryl Shutsemau <kas@kernel.org>
> >
> > Add a simple tool that allows to issue an advice on a process or a file.
> >
> > It can be useful to experiment with effects of an advice on a workload
> > without modifying the workload itself.
>
> Is there any reason to have this in the tree? This seems like a very
> trivial tools that doesn't really need to be in a lockstep with the
> kernel source.
There's no strong reason to move it in lockstep with kernel code. At
least for now.
The tool is useful to experiment with both kernel and workload behaviour
on an advice. Like, I see some oddities with MADV_COLLAPSE that makes it
give up easily where it shouldn't (will investigate later). I wouldn't
see them otherwise.
But you don't see it fitting kernel tree, I would not push back much.
I found the tool useful and decided to share.
> Also would it make more sense to send the pidfd fd to the tool directly
> so that it can benefit from a racefree pid->fd translation?
Hm. Do we have an easy way to open pidfd from shell?
I am not sure we care that much about the race for this tool.
--
Kiryl Shutsemau / Kirill A. Shutemov
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] tools/mm: Add madvise tool
2025-09-05 10:21 ` Kiryl Shutsemau
@ 2025-09-05 11:02 ` Michal Hocko
0 siblings, 0 replies; 24+ messages in thread
From: Michal Hocko @ 2025-09-05 11:02 UTC (permalink / raw)
To: Kiryl Shutsemau
Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, linux-mm, Usama Arif
On Fri 05-09-25 11:21:30, Kirill A. Shutemov wrote:
> On Fri, Sep 05, 2025 at 11:17:10AM +0200, Michal Hocko wrote:
[...]
> > Also would it make more sense to send the pidfd fd to the tool directly
> > so that it can benefit from a racefree pid->fd translation?
>
> Hm. Do we have an easy way to open pidfd from shell?
Not that I am aware of.
> I am not sure we care that much about the race for this tool.
Fair enough.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] tools/mm: Add madvise tool
2025-09-04 17:57 [PATCH] tools/mm: Add madvise tool kirill
` (2 preceding siblings ...)
2025-09-05 9:17 ` Michal Hocko
@ 2025-09-05 10:19 ` David Hildenbrand
2025-09-05 10:24 ` Kiryl Shutsemau
2025-09-05 10:37 ` Lorenzo Stoakes
` (3 subsequent siblings)
7 siblings, 1 reply; 24+ messages in thread
From: David Hildenbrand @ 2025-09-05 10:19 UTC (permalink / raw)
To: kirill, Andrew Morton, Lorenzo Stoakes, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm
Cc: Usama Arif, Kiryl Shutsemau
On 04.09.25 19:57, kirill@shutemov.name wrote:
> From: Kiryl Shutsemau <kas@kernel.org>
>
> Add a simple tool that allows to issue an advice on a process or a file.
>
> It can be useful to experiment with effects of an advice on a workload
> without modifying the workload itself.
>
> Only supports advices available for process_madvise().
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> ---
> + if (pid) {
> + struct iovec vec = {
> + .iov_base = (void *)start,
> + .iov_len = end - start,
> + };
> + ssize_t ret;
> +
> + ret = process_madvise(fd, &vec, 1, advice, 0);
> + if (ret < 0)
> + perror("process_madvise"), exit(-1);
> +
> + if ((unsigned long)ret != end - start)
> + printf("Partial advice occurred. Stopped at %#lx\n", start + ret);
> + } else {
> + unsigned long addr, hpage_pmd_size;
> + void *p;
> + int ret;
> +
> + hpage_pmd_size = read_pmd_pagesize();
> + if (!hpage_pmd_size) {
> + printf("Reading PMD pagesize failed");
> + exit(-1);
> + }
> +
> + // Allocate virtual address space to align the target mmap to PMD size
> + // Some advices require this.
I assume the kernel coding-style applies to tools/ as well, so
/*
* ...
*/
--
Cheers
David / dhildenb
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [PATCH] tools/mm: Add madvise tool
2025-09-05 10:19 ` David Hildenbrand
@ 2025-09-05 10:24 ` Kiryl Shutsemau
2025-09-05 10:33 ` David Hildenbrand
0 siblings, 1 reply; 24+ messages in thread
From: Kiryl Shutsemau @ 2025-09-05 10:24 UTC (permalink / raw)
To: David Hildenbrand
Cc: Andrew Morton, Lorenzo Stoakes, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, Usama Arif
On Fri, Sep 05, 2025 at 12:19:11PM +0200, David Hildenbrand wrote:
> On 04.09.25 19:57, kirill@shutemov.name wrote:
> > From: Kiryl Shutsemau <kas@kernel.org>
> >
> > Add a simple tool that allows to issue an advice on a process or a file.
> >
> > It can be useful to experiment with effects of an advice on a workload
> > without modifying the workload itself.
> >
> > Only supports advices available for process_madvise().
> >
> > Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> > ---
>
> > + if (pid) {
> > + struct iovec vec = {
> > + .iov_base = (void *)start,
> > + .iov_len = end - start,
> > + };
> > + ssize_t ret;
> > +
> > + ret = process_madvise(fd, &vec, 1, advice, 0);
> > + if (ret < 0)
> > + perror("process_madvise"), exit(-1);
> > +
> > + if ((unsigned long)ret != end - start)
> > + printf("Partial advice occurred. Stopped at %#lx\n", start + ret);
> > + } else {
> > + unsigned long addr, hpage_pmd_size;
> > + void *p;
> > + int ret;
> > +
> > + hpage_pmd_size = read_pmd_pagesize();
> > + if (!hpage_pmd_size) {
> > + printf("Reading PMD pagesize failed");
> > + exit(-1);
> > + }
> > +
> > + // Allocate virtual address space to align the target mmap to PMD size
> > + // Some advices require this.
>
> I assume the kernel coding-style applies to tools/ as well, so
>
> /*
> * ...
> */
I thought C99 comments are fine now, no? Like slab_common.c uses them.
--
Kiryl Shutsemau / Kirill A. Shutemov
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [PATCH] tools/mm: Add madvise tool
2025-09-05 10:24 ` Kiryl Shutsemau
@ 2025-09-05 10:33 ` David Hildenbrand
2025-09-05 10:35 ` Kiryl Shutsemau
0 siblings, 1 reply; 24+ messages in thread
From: David Hildenbrand @ 2025-09-05 10:33 UTC (permalink / raw)
To: Kiryl Shutsemau
Cc: Andrew Morton, Lorenzo Stoakes, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, Usama Arif
On 05.09.25 12:24, Kiryl Shutsemau wrote:
> On Fri, Sep 05, 2025 at 12:19:11PM +0200, David Hildenbrand wrote:
>> On 04.09.25 19:57, kirill@shutemov.name wrote:
>>> From: Kiryl Shutsemau <kas@kernel.org>
>>>
>>> Add a simple tool that allows to issue an advice on a process or a file.
>>>
>>> It can be useful to experiment with effects of an advice on a workload
>>> without modifying the workload itself.
>>>
>>> Only supports advices available for process_madvise().
>>>
>>> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
>>> ---
>>
>>> + if (pid) {
>>> + struct iovec vec = {
>>> + .iov_base = (void *)start,
>>> + .iov_len = end - start,
>>> + };
>>> + ssize_t ret;
>>> +
>>> + ret = process_madvise(fd, &vec, 1, advice, 0);
>>> + if (ret < 0)
>>> + perror("process_madvise"), exit(-1);
>>> +
>>> + if ((unsigned long)ret != end - start)
>>> + printf("Partial advice occurred. Stopped at %#lx\n", start + ret);
>>> + } else {
>>> + unsigned long addr, hpage_pmd_size;
>>> + void *p;
>>> + int ret;
>>> +
>>> + hpage_pmd_size = read_pmd_pagesize();
>>> + if (!hpage_pmd_size) {
>>> + printf("Reading PMD pagesize failed");
>>> + exit(-1);
>>> + }
>>> +
>>> + // Allocate virtual address space to align the target mmap to PMD size
>>> + // Some advices require this.
>>
>> I assume the kernel coding-style applies to tools/ as well, so
>>
>> /*
>> * ...
>> */
>
> I thought C99 comments are fine now, no? Like slab_common.c uses them.
Was there a recent discussion around that?
Coding style says "The preferred style for long (multi-line) comments is:"
slab_common.c seems to be the mostly the only thing in MM, for some
weird reason (and it's even being inconsistent).
So please, just default to /* for MM stuff unless there is a pretty good
reason not to.
--
Cheers
David / dhildenb
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [PATCH] tools/mm: Add madvise tool
2025-09-05 10:33 ` David Hildenbrand
@ 2025-09-05 10:35 ` Kiryl Shutsemau
0 siblings, 0 replies; 24+ messages in thread
From: Kiryl Shutsemau @ 2025-09-05 10:35 UTC (permalink / raw)
To: David Hildenbrand
Cc: Andrew Morton, Lorenzo Stoakes, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, Usama Arif
On Fri, Sep 05, 2025 at 12:33:31PM +0200, David Hildenbrand wrote:
> On 05.09.25 12:24, Kiryl Shutsemau wrote:
> > On Fri, Sep 05, 2025 at 12:19:11PM +0200, David Hildenbrand wrote:
> > > On 04.09.25 19:57, kirill@shutemov.name wrote:
> > > > From: Kiryl Shutsemau <kas@kernel.org>
> > > >
> > > > Add a simple tool that allows to issue an advice on a process or a file.
> > > >
> > > > It can be useful to experiment with effects of an advice on a workload
> > > > without modifying the workload itself.
> > > >
> > > > Only supports advices available for process_madvise().
> > > >
> > > > Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> > > > ---
> > >
> > > > + if (pid) {
> > > > + struct iovec vec = {
> > > > + .iov_base = (void *)start,
> > > > + .iov_len = end - start,
> > > > + };
> > > > + ssize_t ret;
> > > > +
> > > > + ret = process_madvise(fd, &vec, 1, advice, 0);
> > > > + if (ret < 0)
> > > > + perror("process_madvise"), exit(-1);
> > > > +
> > > > + if ((unsigned long)ret != end - start)
> > > > + printf("Partial advice occurred. Stopped at %#lx\n", start + ret);
> > > > + } else {
> > > > + unsigned long addr, hpage_pmd_size;
> > > > + void *p;
> > > > + int ret;
> > > > +
> > > > + hpage_pmd_size = read_pmd_pagesize();
> > > > + if (!hpage_pmd_size) {
> > > > + printf("Reading PMD pagesize failed");
> > > > + exit(-1);
> > > > + }
> > > > +
> > > > + // Allocate virtual address space to align the target mmap to PMD size
> > > > + // Some advices require this.
> > >
> > > I assume the kernel coding-style applies to tools/ as well, so
> > >
> > > /*
> > > * ...
> > > */
> >
> > I thought C99 comments are fine now, no? Like slab_common.c uses them.
>
> Was there a recent discussion around that?
>
> Coding style says "The preferred style for long (multi-line) comments is:"
>
> slab_common.c seems to be the mostly the only thing in MM, for some weird
> reason (and it's even being inconsistent).
>
> So please, just default to /* for MM stuff unless there is a pretty good
> reason not to.
Ack.
--
Kiryl Shutsemau / Kirill A. Shutemov
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] tools/mm: Add madvise tool
2025-09-04 17:57 [PATCH] tools/mm: Add madvise tool kirill
` (3 preceding siblings ...)
2025-09-05 10:19 ` David Hildenbrand
@ 2025-09-05 10:37 ` Lorenzo Stoakes
2025-09-05 13:35 ` Kiryl Shutsemau
2025-09-05 16:30 ` Kiryl Shutsemau
2025-09-05 10:45 ` Usama Arif
` (2 subsequent siblings)
7 siblings, 2 replies; 24+ messages in thread
From: Lorenzo Stoakes @ 2025-09-05 10:37 UTC (permalink / raw)
To: kirill
Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, Usama Arif, Kiryl Shutsemau
On Thu, Sep 04, 2025 at 06:57:29PM +0100, kirill@shutemov.name wrote:
> From: Kiryl Shutsemau <kas@kernel.org>
>
> Add a simple tool that allows to issue an advice on a process or a file.
>
> It can be useful to experiment with effects of an advice on a workload
> without modifying the workload itself.
>
> Only supports advices available for process_madvise().
It's a mega nit but 'advice' is plural already (also used in the singular
because English is a troll language but there we go!)
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
This is nice. I really only have nits, so with those addressed LGTM and:
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Thanks for doing this, this is useful! :) may use it myself in fact.
> ---
> tools/mm/.gitignore | 4 +-
> tools/mm/Makefile | 2 +-
> tools/mm/madvise.c | 170 ++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 174 insertions(+), 2 deletions(-)
> create mode 100644 tools/mm/madvise.c
>
> diff --git a/tools/mm/.gitignore b/tools/mm/.gitignore
> index 922879f93fc8..b713fcf4a2e0 100644
> --- a/tools/mm/.gitignore
> +++ b/tools/mm/.gitignore
> @@ -1,4 +1,6 @@
> # SPDX-License-Identifier: GPL-2.0-only
> -slabinfo
> +madvise
> page-types
> page_owner_sort
> +slabinfo
> +thp_swap_allocator_test
Nice to add this at the same time :)
> diff --git a/tools/mm/Makefile b/tools/mm/Makefile
> index f5725b5c23aa..db315a48adcd 100644
> --- a/tools/mm/Makefile
> +++ b/tools/mm/Makefile
> @@ -3,7 +3,7 @@
> #
> include ../scripts/Makefile.include
>
> -BUILD_TARGETS=page-types slabinfo page_owner_sort thp_swap_allocator_test
> +BUILD_TARGETS= madvise page-types page_owner_sort slabinfo thp_swap_allocator_test
> INSTALL_TARGETS = $(BUILD_TARGETS) thpmaps
>
> LIB_DIR = ../lib/api
> diff --git a/tools/mm/madvise.c b/tools/mm/madvise.c
> new file mode 100644
> index 000000000000..038b3e1076ea
> --- /dev/null
> +++ b/tools/mm/madvise.c
> @@ -0,0 +1,170 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#define _GNU_SOURCE
> +#include <fcntl.h>
> +#include <limits.h>
> +#include <stdarg.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <unistd.h>
> +#include <sys/mman.h>
> +#include <sys/syscall.h>
> +
> +static void usage(void)
> +{
> + printf("madvise TARGET ADVICE START END\n\n");
> + printf("Arguments:\n");
> + printf("\t<TARGET>\n");
> + printf("\t\tA process ID or a file to give the advice to.\n\n");
> + printf("\t\tUse \"./\" prefix if the file name is all digits.\n\n");
It kinda sucks to do it this way, why not just default to a file but have a -p
or --pid param for pid?
> + printf("\t<ADVICE>\n");
> + printf("\t\tcold\t\t- Deactivate a given range of pages\n");
> + printf("\t\tcollapse\t- Collapse pages in a given range into THPs\n");
Is MADV_COLLAPSE useful for a file you map and unmap? Maybe make
process_madvise() only?
> + printf("\t\tpageout\t\t- Reclaim a given range of pages\n");
> + printf("\t\twillneed\t- The specified data will be accessed in the near future\n");
> + printf("\n\t\tSee madvise(2) for more details.\n\n");
> + printf("\t<START>/<END>\n");
> + printf("\t\tStart and end addressed for the advice. Must be page-aligned.\n\n");
> + printf("\t\tFor PID case, it is addresses in the target process address space.\n\n");
> + printf("\t\tFor file case, it is offsets in the file.\n\n");
> +}
> +
> +static void error(const char *fmt, ...)
> +{
> + if (fmt) {
> + va_list argp;
> +
> + va_start(argp, fmt);
> + vfprintf(stderr, fmt, argp);
> + va_end(argp);
> + printf("\n");
> + }
> +
> + usage();
> + exit(-1);
> +}
> +
> +#define PMD_SIZE_FILE_PATH "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size"
> +static unsigned long read_pmd_pagesize(void)
> +{
> + int fd;
> + char buf[20];
> + ssize_t num_read;
> +
> + fd = open(PMD_SIZE_FILE_PATH, O_RDONLY);
> + if (fd == -1)
> + return 0;
> +
> + num_read = read(fd, buf, 19);
> + if (num_read < 1) {
> + close(fd);
> + return 0;
> + }
> + buf[num_read] = '\0';
> + close(fd);
> +
> + return strtoul(buf, NULL, 10);
> +}
> +
> +static int pidfd_open(pid_t pid, unsigned int flags)
> +{
> + return syscall(SYS_pidfd_open, pid, flags);
> +}
> +
> +int main(int argc, const char *argv[])
> +{
> + unsigned long pid, start, end, page_size;
> + int advice;
> + char *err;
> + int fd;
> +
> + if (argc != 5)
> + error(NULL);
> +
> + pid = strtoul(argv[1], &err, 10);
> + if (*err || err == argv[1] ||
> + pid > INT_MAX || (pid_t)pid <= 0) {
> + // Not a PID, assume argv[1] is a file name
> + pid = 0;
> + }
> +
> + if (pid) {
> + fd = pidfd_open(pid, 0);
> + if (fd < 0)
> + perror("pidfd_open()"), exit(-1);
> + } else {
> + fd = open(argv[1], O_RDWR);
> + if (fd < 0)
> + perror("open"), exit(-1);
> + }
> +
> + if (!strcmp(argv[2], "cold"))
> + advice = MADV_COLD;
> + else if (!strcmp(argv[2], "collapse"))
> + advice = MADV_COLLAPSE;
> + else if (!strcmp(argv[2], "pageout"))
> + advice = MADV_PAGEOUT;
> + else if (!strcmp(argv[2], "willneed"))
> + advice = MADV_WILLNEED;
> + else
> + error("Unknown advice: %s\n", argv[2]);
> +
> + page_size = sysconf(_SC_PAGE_SIZE);
> +
> + start = strtoul(argv[3], &err, 0);
> + if (*err || err == argv[3])
> + error("Cannot parse start address\n");
> + if (start % page_size)
> + error("Start address is not aligned to page size\n");
> + end = strtoul(argv[4], &err, 0);
> + if (*err || err == argv[4])
> + error("Cannot parse end address\n");
> + if (end % page_size)
> + error("End address is not aligned to page size\n");
> +
> + if (pid) {
> + struct iovec vec = {
> + .iov_base = (void *)start,
> + .iov_len = end - start,
> + };
> + ssize_t ret;
> +
> + ret = process_madvise(fd, &vec, 1, advice, 0);
> + if (ret < 0)
> + perror("process_madvise"), exit(-1);
> +
> + if ((unsigned long)ret != end - start)
> + printf("Partial advice occurred. Stopped at %#lx\n", start + ret);
With a single iovec this should never happen. But I guess no harm in having it.
> + } else {
> + unsigned long addr, hpage_pmd_size;
> + void *p;
> + int ret;
> +
> + hpage_pmd_size = read_pmd_pagesize();
> + if (!hpage_pmd_size) {
> + printf("Reading PMD pagesize failed");
> + exit(-1);
> + }
> +
> + // Allocate virtual address space to align the target mmap to PMD size
> + // Some advices require this.
> + p = mmap(NULL, end - start + hpage_pmd_size, PROT_NONE,
> + MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> + if (p == MAP_FAILED)
> + perror("mmap0"), exit(-1);
> + addr = (unsigned long)p;
> + addr += hpage_pmd_size - 1;
> + addr &= ~(hpage_pmd_size - 1);
> +
> + p = mmap((void *)addr, end - start,
> + PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED | MAP_POPULATE, fd, start);
> + if (p == MAP_FAILED)
> + perror("mmap"), exit(-1);
> +
> + ret = madvise(p, end - start, advice);
> + if (ret)
> + perror("madvise"), exit(-1);
I mean we exit immediately so it's probably not all that important, but not
munmap()'ing or closing the fd here.
> + }
> +
> + return 0;
> +}
> --
> 2.50.1
>
>
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [PATCH] tools/mm: Add madvise tool
2025-09-05 10:37 ` Lorenzo Stoakes
@ 2025-09-05 13:35 ` Kiryl Shutsemau
2025-09-05 16:30 ` Kiryl Shutsemau
1 sibling, 0 replies; 24+ messages in thread
From: Kiryl Shutsemau @ 2025-09-05 13:35 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, Usama Arif
On Fri, Sep 05, 2025 at 11:37:23AM +0100, Lorenzo Stoakes wrote:
> On Thu, Sep 04, 2025 at 06:57:29PM +0100, kirill@shutemov.name wrote:
> > From: Kiryl Shutsemau <kas@kernel.org>
> >
> > Add a simple tool that allows to issue an advice on a process or a file.
> >
> > It can be useful to experiment with effects of an advice on a workload
> > without modifying the workload itself.
> >
> > Only supports advices available for process_madvise().
>
> It's a mega nit but 'advice' is plural already (also used in the singular
> because English is a troll language but there we go!)
Oof.
> >
> > Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
>
> This is nice. I really only have nits, so with those addressed LGTM and:
>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
> Thanks for doing this, this is useful! :) may use it myself in fact.
>
> > ---
> > tools/mm/.gitignore | 4 +-
> > tools/mm/Makefile | 2 +-
> > tools/mm/madvise.c | 170 ++++++++++++++++++++++++++++++++++++++++++++
> > 3 files changed, 174 insertions(+), 2 deletions(-)
> > create mode 100644 tools/mm/madvise.c
> >
> > diff --git a/tools/mm/.gitignore b/tools/mm/.gitignore
> > index 922879f93fc8..b713fcf4a2e0 100644
> > --- a/tools/mm/.gitignore
> > +++ b/tools/mm/.gitignore
> > @@ -1,4 +1,6 @@
> > # SPDX-License-Identifier: GPL-2.0-only
> > -slabinfo
> > +madvise
> > page-types
> > page_owner_sort
> > +slabinfo
> > +thp_swap_allocator_test
>
> Nice to add this at the same time :)
>
> > diff --git a/tools/mm/Makefile b/tools/mm/Makefile
> > index f5725b5c23aa..db315a48adcd 100644
> > --- a/tools/mm/Makefile
> > +++ b/tools/mm/Makefile
> > @@ -3,7 +3,7 @@
> > #
> > include ../scripts/Makefile.include
> >
> > -BUILD_TARGETS=page-types slabinfo page_owner_sort thp_swap_allocator_test
> > +BUILD_TARGETS= madvise page-types page_owner_sort slabinfo thp_swap_allocator_test
> > INSTALL_TARGETS = $(BUILD_TARGETS) thpmaps
> >
> > LIB_DIR = ../lib/api
> > diff --git a/tools/mm/madvise.c b/tools/mm/madvise.c
> > new file mode 100644
> > index 000000000000..038b3e1076ea
> > --- /dev/null
> > +++ b/tools/mm/madvise.c
> > @@ -0,0 +1,170 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +#define _GNU_SOURCE
> > +#include <fcntl.h>
> > +#include <limits.h>
> > +#include <stdarg.h>
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +#include <unistd.h>
> > +#include <sys/mman.h>
> > +#include <sys/syscall.h>
> > +
> > +static void usage(void)
> > +{
> > + printf("madvise TARGET ADVICE START END\n\n");
> > + printf("Arguments:\n");
> > + printf("\t<TARGET>\n");
> > + printf("\t\tA process ID or a file to give the advice to.\n\n");
> > + printf("\t\tUse \"./\" prefix if the file name is all digits.\n\n");
>
> It kinda sucks to do it this way, why not just default to a file but have a -p
> or --pid param for pid?
I wanted to keep argv parsing trivial, but I can change it.
> > + printf("\t<ADVICE>\n");
> > + printf("\t\tcold\t\t- Deactivate a given range of pages\n");
> > + printf("\t\tcollapse\t- Collapse pages in a given range into THPs\n");
>
> Is MADV_COLLAPSE useful for a file you map and unmap? Maybe make
> process_madvise() only?
It is actually the primary case I am interested in at the moment:
collapse tmpfs file under workload. It works, but it is very fragile and
fails easily.
MAP_POPULATE is a workaround. We shouldn't require PMD to be present for
file collapse. It also put an question if MADV_COLLAPSE suppouse to
imply populate?
Alignment requirement is also silly for file mapping as we can still
benefit from file collapse due to PMD mapping in other process or from
less entries on LRU.
I also saw -ENOMEMs I cannot explain yet. I don't see why it would give
up easily on allocation.
> > + printf("\t\tpageout\t\t- Reclaim a given range of pages\n");
> > + printf("\t\twillneed\t- The specified data will be accessed in the near future\n");
> > + printf("\n\t\tSee madvise(2) for more details.\n\n");
> > + printf("\t<START>/<END>\n");
> > + printf("\t\tStart and end addressed for the advice. Must be page-aligned.\n\n");
> > + printf("\t\tFor PID case, it is addresses in the target process address space.\n\n");
> > + printf("\t\tFor file case, it is offsets in the file.\n\n");
> > +}
> > +
> > +static void error(const char *fmt, ...)
> > +{
> > + if (fmt) {
> > + va_list argp;
> > +
> > + va_start(argp, fmt);
> > + vfprintf(stderr, fmt, argp);
> > + va_end(argp);
> > + printf("\n");
> > + }
> > +
> > + usage();
> > + exit(-1);
> > +}
> > +
> > +#define PMD_SIZE_FILE_PATH "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size"
> > +static unsigned long read_pmd_pagesize(void)
> > +{
> > + int fd;
> > + char buf[20];
> > + ssize_t num_read;
> > +
> > + fd = open(PMD_SIZE_FILE_PATH, O_RDONLY);
> > + if (fd == -1)
> > + return 0;
> > +
> > + num_read = read(fd, buf, 19);
> > + if (num_read < 1) {
> > + close(fd);
> > + return 0;
> > + }
> > + buf[num_read] = '\0';
> > + close(fd);
> > +
> > + return strtoul(buf, NULL, 10);
> > +}
> > +
> > +static int pidfd_open(pid_t pid, unsigned int flags)
> > +{
> > + return syscall(SYS_pidfd_open, pid, flags);
> > +}
> > +
> > +int main(int argc, const char *argv[])
> > +{
> > + unsigned long pid, start, end, page_size;
> > + int advice;
> > + char *err;
> > + int fd;
> > +
> > + if (argc != 5)
> > + error(NULL);
> > +
> > + pid = strtoul(argv[1], &err, 10);
> > + if (*err || err == argv[1] ||
> > + pid > INT_MAX || (pid_t)pid <= 0) {
> > + // Not a PID, assume argv[1] is a file name
> > + pid = 0;
> > + }
> > +
> > + if (pid) {
> > + fd = pidfd_open(pid, 0);
> > + if (fd < 0)
> > + perror("pidfd_open()"), exit(-1);
> > + } else {
> > + fd = open(argv[1], O_RDWR);
> > + if (fd < 0)
> > + perror("open"), exit(-1);
> > + }
> > +
> > + if (!strcmp(argv[2], "cold"))
> > + advice = MADV_COLD;
> > + else if (!strcmp(argv[2], "collapse"))
> > + advice = MADV_COLLAPSE;
> > + else if (!strcmp(argv[2], "pageout"))
> > + advice = MADV_PAGEOUT;
> > + else if (!strcmp(argv[2], "willneed"))
> > + advice = MADV_WILLNEED;
> > + else
> > + error("Unknown advice: %s\n", argv[2]);
> > +
> > + page_size = sysconf(_SC_PAGE_SIZE);
> > +
> > + start = strtoul(argv[3], &err, 0);
> > + if (*err || err == argv[3])
> > + error("Cannot parse start address\n");
> > + if (start % page_size)
> > + error("Start address is not aligned to page size\n");
> > + end = strtoul(argv[4], &err, 0);
> > + if (*err || err == argv[4])
> > + error("Cannot parse end address\n");
> > + if (end % page_size)
> > + error("End address is not aligned to page size\n");
> > +
> > + if (pid) {
> > + struct iovec vec = {
> > + .iov_base = (void *)start,
> > + .iov_len = end - start,
> > + };
> > + ssize_t ret;
> > +
> > + ret = process_madvise(fd, &vec, 1, advice, 0);
> > + if (ret < 0)
> > + perror("process_madvise"), exit(-1);
> > +
> > + if ((unsigned long)ret != end - start)
> > + printf("Partial advice occurred. Stopped at %#lx\n", start + ret);
>
> With a single iovec this should never happen. But I guess no harm in having it.
process_madvise(2):
The advice might be applied to only a part of iovec...
As caller controls the range, it seems possible to me.
> > + } else {
> > + unsigned long addr, hpage_pmd_size;
> > + void *p;
> > + int ret;
> > +
> > + hpage_pmd_size = read_pmd_pagesize();
> > + if (!hpage_pmd_size) {
> > + printf("Reading PMD pagesize failed");
> > + exit(-1);
> > + }
> > +
> > + // Allocate virtual address space to align the target mmap to PMD size
> > + // Some advices require this.
> > + p = mmap(NULL, end - start + hpage_pmd_size, PROT_NONE,
> > + MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> > + if (p == MAP_FAILED)
> > + perror("mmap0"), exit(-1);
> > + addr = (unsigned long)p;
> > + addr += hpage_pmd_size - 1;
> > + addr &= ~(hpage_pmd_size - 1);
> > +
> > + p = mmap((void *)addr, end - start,
> > + PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED | MAP_POPULATE, fd, start);
> > + if (p == MAP_FAILED)
> > + perror("mmap"), exit(-1);
> > +
> > + ret = madvise(p, end - start, advice);
> > + if (ret)
> > + perror("madvise"), exit(-1);
>
> I mean we exit immediately so it's probably not all that important, but not
> munmap()'ing or closing the fd here.
We have kernel for this :P
>
> > + }
> > +
> > + return 0;
> > +}
> > --
> > 2.50.1
> >
> >
--
Kiryl Shutsemau / Kirill A. Shutemov
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [PATCH] tools/mm: Add madvise tool
2025-09-05 10:37 ` Lorenzo Stoakes
2025-09-05 13:35 ` Kiryl Shutsemau
@ 2025-09-05 16:30 ` Kiryl Shutsemau
1 sibling, 0 replies; 24+ messages in thread
From: Kiryl Shutsemau @ 2025-09-05 16:30 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, Usama Arif
On Fri, Sep 05, 2025 at 11:37:23AM +0100, Lorenzo Stoakes wrote:
> On Thu, Sep 04, 2025 at 06:57:29PM +0100, kirill@shutemov.name wrote:
> > From: Kiryl Shutsemau <kas@kernel.org>
> >
> > Add a simple tool that allows to issue an advice on a process or a file.
> >
> > It can be useful to experiment with effects of an advice on a workload
> > without modifying the workload itself.
> >
> > Only supports advices available for process_madvise().
>
> It's a mega nit but 'advice' is plural already (also used in the singular
> because English is a troll language but there we go!)
Oof.
> >
> > Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
>
> This is nice. I really only have nits, so with those addressed LGTM and:
>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
> Thanks for doing this, this is useful! :) may use it myself in fact.
>
> > ---
> > tools/mm/.gitignore | 4 +-
> > tools/mm/Makefile | 2 +-
> > tools/mm/madvise.c | 170 ++++++++++++++++++++++++++++++++++++++++++++
> > 3 files changed, 174 insertions(+), 2 deletions(-)
> > create mode 100644 tools/mm/madvise.c
> >
> > diff --git a/tools/mm/.gitignore b/tools/mm/.gitignore
> > index 922879f93fc8..b713fcf4a2e0 100644
> > --- a/tools/mm/.gitignore
> > +++ b/tools/mm/.gitignore
> > @@ -1,4 +1,6 @@
> > # SPDX-License-Identifier: GPL-2.0-only
> > -slabinfo
> > +madvise
> > page-types
> > page_owner_sort
> > +slabinfo
> > +thp_swap_allocator_test
>
> Nice to add this at the same time :)
>
> > diff --git a/tools/mm/Makefile b/tools/mm/Makefile
> > index f5725b5c23aa..db315a48adcd 100644
> > --- a/tools/mm/Makefile
> > +++ b/tools/mm/Makefile
> > @@ -3,7 +3,7 @@
> > #
> > include ../scripts/Makefile.include
> >
> > -BUILD_TARGETS=page-types slabinfo page_owner_sort thp_swap_allocator_test
> > +BUILD_TARGETS= madvise page-types page_owner_sort slabinfo thp_swap_allocator_test
> > INSTALL_TARGETS = $(BUILD_TARGETS) thpmaps
> >
> > LIB_DIR = ../lib/api
> > diff --git a/tools/mm/madvise.c b/tools/mm/madvise.c
> > new file mode 100644
> > index 000000000000..038b3e1076ea
> > --- /dev/null
> > +++ b/tools/mm/madvise.c
> > @@ -0,0 +1,170 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +#define _GNU_SOURCE
> > +#include <fcntl.h>
> > +#include <limits.h>
> > +#include <stdarg.h>
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +#include <unistd.h>
> > +#include <sys/mman.h>
> > +#include <sys/syscall.h>
> > +
> > +static void usage(void)
> > +{
> > + printf("madvise TARGET ADVICE START END\n\n");
> > + printf("Arguments:\n");
> > + printf("\t<TARGET>\n");
> > + printf("\t\tA process ID or a file to give the advice to.\n\n");
> > + printf("\t\tUse \"./\" prefix if the file name is all digits.\n\n");
>
> It kinda sucks to do it this way, why not just default to a file but have a -p
> or --pid param for pid?
I wanted to keep argv parsing trivial, but I can change it.
> > + printf("\t<ADVICE>\n");
> > + printf("\t\tcold\t\t- Deactivate a given range of pages\n");
> > + printf("\t\tcollapse\t- Collapse pages in a given range into THPs\n");
>
> Is MADV_COLLAPSE useful for a file you map and unmap? Maybe make
> process_madvise() only?
It is actually the primary case I am interested in at the moment:
collapse tmpfs file under workload. It works, but it is very fragile and
fails easily.
MAP_POPULATE is a workaround. We shouldn't require PMD to be present for
file collapse. It also put an question if MADV_COLLAPSE suppouse to
imply populate?
Alignment requirement is also silly for file mapping as we can still
benefit from file collapse due to PMD mapping in other process or from
less entries on LRU.
I also saw -ENOMEMs I cannot explain yet. I don't see why it would give
up easily on allocation.
> > + printf("\t\tpageout\t\t- Reclaim a given range of pages\n");
> > + printf("\t\twillneed\t- The specified data will be accessed in the near future\n");
> > + printf("\n\t\tSee madvise(2) for more details.\n\n");
> > + printf("\t<START>/<END>\n");
> > + printf("\t\tStart and end addressed for the advice. Must be page-aligned.\n\n");
> > + printf("\t\tFor PID case, it is addresses in the target process address space.\n\n");
> > + printf("\t\tFor file case, it is offsets in the file.\n\n");
> > +}
> > +
> > +static void error(const char *fmt, ...)
> > +{
> > + if (fmt) {
> > + va_list argp;
> > +
> > + va_start(argp, fmt);
> > + vfprintf(stderr, fmt, argp);
> > + va_end(argp);
> > + printf("\n");
> > + }
> > +
> > + usage();
> > + exit(-1);
> > +}
> > +
> > +#define PMD_SIZE_FILE_PATH "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size"
> > +static unsigned long read_pmd_pagesize(void)
> > +{
> > + int fd;
> > + char buf[20];
> > + ssize_t num_read;
> > +
> > + fd = open(PMD_SIZE_FILE_PATH, O_RDONLY);
> > + if (fd == -1)
> > + return 0;
> > +
> > + num_read = read(fd, buf, 19);
> > + if (num_read < 1) {
> > + close(fd);
> > + return 0;
> > + }
> > + buf[num_read] = '\0';
> > + close(fd);
> > +
> > + return strtoul(buf, NULL, 10);
> > +}
> > +
> > +static int pidfd_open(pid_t pid, unsigned int flags)
> > +{
> > + return syscall(SYS_pidfd_open, pid, flags);
> > +}
> > +
> > +int main(int argc, const char *argv[])
> > +{
> > + unsigned long pid, start, end, page_size;
> > + int advice;
> > + char *err;
> > + int fd;
> > +
> > + if (argc != 5)
> > + error(NULL);
> > +
> > + pid = strtoul(argv[1], &err, 10);
> > + if (*err || err == argv[1] ||
> > + pid > INT_MAX || (pid_t)pid <= 0) {
> > + // Not a PID, assume argv[1] is a file name
> > + pid = 0;
> > + }
> > +
> > + if (pid) {
> > + fd = pidfd_open(pid, 0);
> > + if (fd < 0)
> > + perror("pidfd_open()"), exit(-1);
> > + } else {
> > + fd = open(argv[1], O_RDWR);
> > + if (fd < 0)
> > + perror("open"), exit(-1);
> > + }
> > +
> > + if (!strcmp(argv[2], "cold"))
> > + advice = MADV_COLD;
> > + else if (!strcmp(argv[2], "collapse"))
> > + advice = MADV_COLLAPSE;
> > + else if (!strcmp(argv[2], "pageout"))
> > + advice = MADV_PAGEOUT;
> > + else if (!strcmp(argv[2], "willneed"))
> > + advice = MADV_WILLNEED;
> > + else
> > + error("Unknown advice: %s\n", argv[2]);
> > +
> > + page_size = sysconf(_SC_PAGE_SIZE);
> > +
> > + start = strtoul(argv[3], &err, 0);
> > + if (*err || err == argv[3])
> > + error("Cannot parse start address\n");
> > + if (start % page_size)
> > + error("Start address is not aligned to page size\n");
> > + end = strtoul(argv[4], &err, 0);
> > + if (*err || err == argv[4])
> > + error("Cannot parse end address\n");
> > + if (end % page_size)
> > + error("End address is not aligned to page size\n");
> > +
> > + if (pid) {
> > + struct iovec vec = {
> > + .iov_base = (void *)start,
> > + .iov_len = end - start,
> > + };
> > + ssize_t ret;
> > +
> > + ret = process_madvise(fd, &vec, 1, advice, 0);
> > + if (ret < 0)
> > + perror("process_madvise"), exit(-1);
> > +
> > + if ((unsigned long)ret != end - start)
> > + printf("Partial advice occurred. Stopped at %#lx\n", start + ret);
>
> With a single iovec this should never happen. But I guess no harm in having it.
process_madvise(2):
The advice might be applied to only a part of iovec...
As caller controls the range, it seems possible to me.
> > + } else {
> > + unsigned long addr, hpage_pmd_size;
> > + void *p;
> > + int ret;
> > +
> > + hpage_pmd_size = read_pmd_pagesize();
> > + if (!hpage_pmd_size) {
> > + printf("Reading PMD pagesize failed");
> > + exit(-1);
> > + }
> > +
> > + // Allocate virtual address space to align the target mmap to PMD size
> > + // Some advices require this.
> > + p = mmap(NULL, end - start + hpage_pmd_size, PROT_NONE,
> > + MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> > + if (p == MAP_FAILED)
> > + perror("mmap0"), exit(-1);
> > + addr = (unsigned long)p;
> > + addr += hpage_pmd_size - 1;
> > + addr &= ~(hpage_pmd_size - 1);
> > +
> > + p = mmap((void *)addr, end - start,
> > + PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED | MAP_POPULATE, fd, start);
> > + if (p == MAP_FAILED)
> > + perror("mmap"), exit(-1);
> > +
> > + ret = madvise(p, end - start, advice);
> > + if (ret)
> > + perror("madvise"), exit(-1);
>
> I mean we exit immediately so it's probably not all that important, but not
> munmap()'ing or closing the fd here.
We have kernel for this :P
>
> > + }
> > +
> > + return 0;
> > +}
> > --
> > 2.50.1
> >
> >
--
Kiryl Shutsemau / Kirill A. Shutemov
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] tools/mm: Add madvise tool
2025-09-04 17:57 [PATCH] tools/mm: Add madvise tool kirill
` (4 preceding siblings ...)
2025-09-05 10:37 ` Lorenzo Stoakes
@ 2025-09-05 10:45 ` Usama Arif
2025-09-05 10:59 ` Mike Rapoport
2025-09-10 11:34 ` [ANNOUNCE] mm-tools: Random tools MM-related tools Kiryl Shutsemau
7 siblings, 0 replies; 24+ messages in thread
From: Usama Arif @ 2025-09-05 10:45 UTC (permalink / raw)
To: kirill, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, linux-mm
Cc: Kiryl Shutsemau
On 04/09/2025 18:57, kirill@shutemov.name wrote:
> From: Kiryl Shutsemau <kas@kernel.org>
>
> Add a simple tool that allows to issue an advice on a process or a file.
>
> It can be useful to experiment with effects of an advice on a workload
> without modifying the workload itself.
>
> Only supports advices available for process_madvise().
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> ---
> tools/mm/.gitignore | 4 +-
> tools/mm/Makefile | 2 +-
> tools/mm/madvise.c | 170 ++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 174 insertions(+), 2 deletions(-)
> create mode 100644 tools/mm/madvise.c
>
> diff --git a/tools/mm/.gitignore b/tools/mm/.gitignore
> index 922879f93fc8..b713fcf4a2e0 100644
> --- a/tools/mm/.gitignore
> +++ b/tools/mm/.gitignore
> @@ -1,4 +1,6 @@
> # SPDX-License-Identifier: GPL-2.0-only
> -slabinfo
> +madvise
> page-types
> page_owner_sort
> +slabinfo
> +thp_swap_allocator_test
> diff --git a/tools/mm/Makefile b/tools/mm/Makefile
> index f5725b5c23aa..db315a48adcd 100644
> --- a/tools/mm/Makefile
> +++ b/tools/mm/Makefile
> @@ -3,7 +3,7 @@
> #
> include ../scripts/Makefile.include
>
> -BUILD_TARGETS=page-types slabinfo page_owner_sort thp_swap_allocator_test
> +BUILD_TARGETS= madvise page-types page_owner_sort slabinfo thp_swap_allocator_test
> INSTALL_TARGETS = $(BUILD_TARGETS) thpmaps
>
> LIB_DIR = ../lib/api
> diff --git a/tools/mm/madvise.c b/tools/mm/madvise.c
> new file mode 100644
> index 000000000000..038b3e1076ea
> --- /dev/null
> +++ b/tools/mm/madvise.c
> @@ -0,0 +1,170 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#define _GNU_SOURCE
> +#include <fcntl.h>
> +#include <limits.h>
> +#include <stdarg.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <unistd.h>
> +#include <sys/mman.h>
> +#include <sys/syscall.h>
> +
> +static void usage(void)
> +{
> + printf("madvise TARGET ADVICE START END\n\n");
> + printf("Arguments:\n");
> + printf("\t<TARGET>\n");
> + printf("\t\tA process ID or a file to give the advice to.\n\n");
> + printf("\t\tUse \"./\" prefix if the file name is all digits.\n\n");
> + printf("\t<ADVICE>\n");
> + printf("\t\tcold\t\t- Deactivate a given range of pages\n");
> + printf("\t\tcollapse\t- Collapse pages in a given range into THPs\n");
> + printf("\t\tpageout\t\t- Reclaim a given range of pages\n");
> + printf("\t\twillneed\t- The specified data will be accessed in the near future\n");
> + printf("\n\t\tSee madvise(2) for more details.\n\n");
> + printf("\t<START>/<END>\n");
> + printf("\t\tStart and end addressed for the advice. Must be page-aligned.\n\n");
> + printf("\t\tFor PID case, it is addresses in the target process address space.\n\n");
> + printf("\t\tFor file case, it is offsets in the file.\n\n");
> +}
> +
> +static void error(const char *fmt, ...)
> +{
> + if (fmt) {
> + va_list argp;
> +
> + va_start(argp, fmt);
> + vfprintf(stderr, fmt, argp);
> + va_end(argp);
> + printf("\n");
> + }
> +
> + usage();
> + exit(-1);
> +}
> +
> +#define PMD_SIZE_FILE_PATH "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size"
> +static unsigned long read_pmd_pagesize(void)
> +{
> + int fd;
> + char buf[20];
> + ssize_t num_read;
> +
> + fd = open(PMD_SIZE_FILE_PATH, O_RDONLY);
> + if (fd == -1)
> + return 0;
> +
> + num_read = read(fd, buf, 19);
> + if (num_read < 1) {
> + close(fd);
> + return 0;
> + }
> + buf[num_read] = '\0';
> + close(fd);
> +
> + return strtoul(buf, NULL, 10);
> +}
> +
> +static int pidfd_open(pid_t pid, unsigned int flags)
> +{
> + return syscall(SYS_pidfd_open, pid, flags);
> +}
> +
> +int main(int argc, const char *argv[])
> +{
> + unsigned long pid, start, end, page_size;
> + int advice;
> + char *err;
> + int fd;
> +
> + if (argc != 5)
> + error(NULL);
> +
> + pid = strtoul(argv[1], &err, 10);
> + if (*err || err == argv[1] ||
> + pid > INT_MAX || (pid_t)pid <= 0) {
> + // Not a PID, assume argv[1] is a file name
> + pid = 0;
> + }
> +
> + if (pid) {
> + fd = pidfd_open(pid, 0);
> + if (fd < 0)
> + perror("pidfd_open()"), exit(-1);
> + } else {
> + fd = open(argv[1], O_RDWR);
> + if (fd < 0)
> + perror("open"), exit(-1);
> + }
> +
> + if (!strcmp(argv[2], "cold"))
> + advice = MADV_COLD;
> + else if (!strcmp(argv[2], "collapse"))
> + advice = MADV_COLLAPSE;
> + else if (!strcmp(argv[2], "pageout"))
> + advice = MADV_PAGEOUT;
> + else if (!strcmp(argv[2], "willneed"))
> + advice = MADV_WILLNEED;
> + else
> + error("Unknown advice: %s\n", argv[2]);
> +
> + page_size = sysconf(_SC_PAGE_SIZE);
> +
> + start = strtoul(argv[3], &err, 0);
> + if (*err || err == argv[3])
> + error("Cannot parse start address\n");
> + if (start % page_size)
> + error("Start address is not aligned to page size\n");
> + end = strtoul(argv[4], &err, 0);
> + if (*err || err == argv[4])
> + error("Cannot parse end address\n");
> + if (end % page_size)
> + error("End address is not aligned to page size\n");
> +
> + if (pid) {
> + struct iovec vec = {
> + .iov_base = (void *)start,
> + .iov_len = end - start,
> + };
> + ssize_t ret;
> +
> + ret = process_madvise(fd, &vec, 1, advice, 0);
> + if (ret < 0)
> + perror("process_madvise"), exit(-1);
> +
> + if ((unsigned long)ret != end - start)
> + printf("Partial advice occurred. Stopped at %#lx\n", start + ret);
> + } else {
> + unsigned long addr, hpage_pmd_size;
> + void *p;
> + int ret;
> +
> + hpage_pmd_size = read_pmd_pagesize();
> + if (!hpage_pmd_size) {
> + printf("Reading PMD pagesize failed");
> + exit(-1);
> + }
> +
> + // Allocate virtual address space to align the target mmap to PMD size
> + // Some advices require this.
> + p = mmap(NULL, end - start + hpage_pmd_size, PROT_NONE,
> + MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> + if (p == MAP_FAILED)
> + perror("mmap0"), exit(-1);
> + addr = (unsigned long)p;
> + addr += hpage_pmd_size - 1;
> + addr &= ~(hpage_pmd_size - 1);
> +
> + p = mmap((void *)addr, end - start,
> + PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED | MAP_POPULATE, fd, start);
> + if (p == MAP_FAILED)
> + perror("mmap"), exit(-1);
> +
> + ret = madvise(p, end - start, advice);
> + if (ret)
> + perror("madvise"), exit(-1);
Would be good to munmap here and check output?
> + }
> +
> + return 0;
> +}
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [PATCH] tools/mm: Add madvise tool
2025-09-04 17:57 [PATCH] tools/mm: Add madvise tool kirill
` (5 preceding siblings ...)
2025-09-05 10:45 ` Usama Arif
@ 2025-09-05 10:59 ` Mike Rapoport
2025-09-05 13:36 ` Kiryl Shutsemau
2025-09-10 11:34 ` [ANNOUNCE] mm-tools: Random tools MM-related tools Kiryl Shutsemau
7 siblings, 1 reply; 24+ messages in thread
From: Mike Rapoport @ 2025-09-05 10:59 UTC (permalink / raw)
To: kirill
Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R . Howlett, Vlastimil Babka, Suren Baghdasaryan,
Michal Hocko, linux-mm, Usama Arif, Kiryl Shutsemau
On Thu, Sep 04, 2025 at 06:57:29PM +0100, kirill@shutemov.name wrote:
> From: Kiryl Shutsemau <kas@kernel.org>
>
> Add a simple tool that allows to issue an advice on a process or a file.
>
> It can be useful to experiment with effects of an advice on a workload
> without modifying the workload itself.
>
> Only supports advices available for process_madvise().
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> ---
> tools/mm/.gitignore | 4 +-
> tools/mm/Makefile | 2 +-
> tools/mm/madvise.c | 170 ++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 174 insertions(+), 2 deletions(-)
> create mode 100644 tools/mm/madvise.c
>
...
> + unsigned long addr, hpage_pmd_size;
> + void *p;
> + int ret;
> +
> + hpage_pmd_size = read_pmd_pagesize();
> + if (!hpage_pmd_size) {
> + printf("Reading PMD pagesize failed");
> + exit(-1);
Shouldn't it fail only for collapse? Other advices don't depend on THP.
> + }
> +
> + // Allocate virtual address space to align the target mmap to PMD size
> + // Some advices require this.
> + p = mmap(NULL, end - start + hpage_pmd_size, PROT_NONE,
> + MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> + if (p == MAP_FAILED)
> + perror("mmap0"), exit(-1);
> + addr = (unsigned long)p;
> + addr += hpage_pmd_size - 1;
> + addr &= ~(hpage_pmd_size - 1);
> +
> + p = mmap((void *)addr, end - start,
> + PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED | MAP_POPULATE, fd, start);
> + if (p == MAP_FAILED)
> + perror("mmap"), exit(-1);
> +
> + ret = madvise(p, end - start, advice);
> + if (ret)
> + perror("madvise"), exit(-1);
> + }
> +
> + return 0;
> +}
> --
> 2.50.1
>
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [PATCH] tools/mm: Add madvise tool
2025-09-05 10:59 ` Mike Rapoport
@ 2025-09-05 13:36 ` Kiryl Shutsemau
2025-09-06 8:00 ` Mike Rapoport
0 siblings, 1 reply; 24+ messages in thread
From: Kiryl Shutsemau @ 2025-09-05 13:36 UTC (permalink / raw)
To: Mike Rapoport
Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R . Howlett, Vlastimil Babka, Suren Baghdasaryan,
Michal Hocko, linux-mm, Usama Arif
On Fri, Sep 05, 2025 at 01:59:10PM +0300, Mike Rapoport wrote:
> On Thu, Sep 04, 2025 at 06:57:29PM +0100, kirill@shutemov.name wrote:
> > From: Kiryl Shutsemau <kas@kernel.org>
> >
> > Add a simple tool that allows to issue an advice on a process or a file.
> >
> > It can be useful to experiment with effects of an advice on a workload
> > without modifying the workload itself.
> >
> > Only supports advices available for process_madvise().
> >
> > Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> > ---
> > tools/mm/.gitignore | 4 +-
> > tools/mm/Makefile | 2 +-
> > tools/mm/madvise.c | 170 ++++++++++++++++++++++++++++++++++++++++++++
> > 3 files changed, 174 insertions(+), 2 deletions(-)
> > create mode 100644 tools/mm/madvise.c
> >
>
> ...
>
> > + unsigned long addr, hpage_pmd_size;
> > + void *p;
> > + int ret;
> > +
> > + hpage_pmd_size = read_pmd_pagesize();
> > + if (!hpage_pmd_size) {
> > + printf("Reading PMD pagesize failed");
> > + exit(-1);
>
> Shouldn't it fail only for collapse? Other advices don't depend on THP.
True. But aligning everybody wouldn't hurt.
I would argue MADV_COLLAPSE shouldn't require this either. You don't
need PMD to collapse page cache folio.
--
Kiryl Shutsemau / Kirill A. Shutemov
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [PATCH] tools/mm: Add madvise tool
2025-09-05 13:36 ` Kiryl Shutsemau
@ 2025-09-06 8:00 ` Mike Rapoport
2025-09-08 10:04 ` Kiryl Shutsemau
0 siblings, 1 reply; 24+ messages in thread
From: Mike Rapoport @ 2025-09-06 8:00 UTC (permalink / raw)
To: Kiryl Shutsemau
Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R . Howlett, Vlastimil Babka, Suren Baghdasaryan,
Michal Hocko, linux-mm, Usama Arif
On Fri, Sep 05, 2025 at 02:36:36PM +0100, Kiryl Shutsemau wrote:
> On Fri, Sep 05, 2025 at 01:59:10PM +0300, Mike Rapoport wrote:
> > On Thu, Sep 04, 2025 at 06:57:29PM +0100, kirill@shutemov.name wrote:
> > > From: Kiryl Shutsemau <kas@kernel.org>
> > >
> > > Add a simple tool that allows to issue an advice on a process or a file.
> > >
> > > It can be useful to experiment with effects of an advice on a workload
> > > without modifying the workload itself.
> > >
> > > Only supports advices available for process_madvise().
> > >
> > > Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> > > ---
> > > tools/mm/.gitignore | 4 +-
> > > tools/mm/Makefile | 2 +-
> > > tools/mm/madvise.c | 170 ++++++++++++++++++++++++++++++++++++++++++++
> > > 3 files changed, 174 insertions(+), 2 deletions(-)
> > > create mode 100644 tools/mm/madvise.c
> > >
> >
> > ...
> >
> > > + unsigned long addr, hpage_pmd_size;
> > > + void *p;
> > > + int ret;
> > > +
> > > + hpage_pmd_size = read_pmd_pagesize();
> > > + if (!hpage_pmd_size) {
> > > + printf("Reading PMD pagesize failed");
> > > + exit(-1);
> >
> > Shouldn't it fail only for collapse? Other advices don't depend on THP.
>
> True. But aligning everybody wouldn't hurt.
Right, but this make the tool unusable for kernels built with !THP.
> I would argue MADV_COLLAPSE shouldn't require this either. You don't
> need PMD to collapse page cache folio.
>
> --
> Kiryl Shutsemau / Kirill A. Shutemov
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [PATCH] tools/mm: Add madvise tool
2025-09-06 8:00 ` Mike Rapoport
@ 2025-09-08 10:04 ` Kiryl Shutsemau
0 siblings, 0 replies; 24+ messages in thread
From: Kiryl Shutsemau @ 2025-09-08 10:04 UTC (permalink / raw)
To: Mike Rapoport
Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R . Howlett, Vlastimil Babka, Suren Baghdasaryan,
Michal Hocko, linux-mm, Usama Arif
On Sat, Sep 06, 2025 at 11:00:36AM +0300, Mike Rapoport wrote:
> On Fri, Sep 05, 2025 at 02:36:36PM +0100, Kiryl Shutsemau wrote:
> > On Fri, Sep 05, 2025 at 01:59:10PM +0300, Mike Rapoport wrote:
> > > On Thu, Sep 04, 2025 at 06:57:29PM +0100, kirill@shutemov.name wrote:
> > > > From: Kiryl Shutsemau <kas@kernel.org>
> > > >
> > > > Add a simple tool that allows to issue an advice on a process or a file.
> > > >
> > > > It can be useful to experiment with effects of an advice on a workload
> > > > without modifying the workload itself.
> > > >
> > > > Only supports advices available for process_madvise().
> > > >
> > > > Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> > > > ---
> > > > tools/mm/.gitignore | 4 +-
> > > > tools/mm/Makefile | 2 +-
> > > > tools/mm/madvise.c | 170 ++++++++++++++++++++++++++++++++++++++++++++
> > > > 3 files changed, 174 insertions(+), 2 deletions(-)
> > > > create mode 100644 tools/mm/madvise.c
> > > >
> > >
> > > ...
> > >
> > > > + unsigned long addr, hpage_pmd_size;
> > > > + void *p;
> > > > + int ret;
> > > > +
> > > > + hpage_pmd_size = read_pmd_pagesize();
> > > > + if (!hpage_pmd_size) {
> > > > + printf("Reading PMD pagesize failed");
> > > > + exit(-1);
> > >
> > > Shouldn't it fail only for collapse? Other advices don't depend on THP.
> >
> > True. But aligning everybody wouldn't hurt.
>
> Right, but this make the tool unusable for kernels built with !THP.
Yeah, I missed that.
--
Kiryl Shutsemau / Kirill A. Shutemov
^ permalink raw reply [flat|nested] 24+ messages in thread
* [ANNOUNCE] mm-tools: Random tools MM-related tools
2025-09-04 17:57 [PATCH] tools/mm: Add madvise tool kirill
` (6 preceding siblings ...)
2025-09-05 10:59 ` Mike Rapoport
@ 2025-09-10 11:34 ` Kiryl Shutsemau
7 siblings, 0 replies; 24+ messages in thread
From: Kiryl Shutsemau @ 2025-09-10 11:34 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, linux-mm
Cc: Usama Arif
On Thu, Sep 04, 2025 at 06:57:29PM +0100, kirill@shutemov.name wrote:
> From: Kiryl Shutsemau <kas@kernel.org>
>
> Add a simple tool that allows to issue an advice on a process or a file.
JFYI, I split it into two tools: one for PID and one for FILE.
Pushed here:
https://github.com/kiryl/mm-tools
If you want to put anything else there, let me know.
--
Kiryl Shutsemau / Kirill A. Shutemov
^ permalink raw reply [flat|nested] 24+ messages in thread