From: Mike Kravetz <mike.kravetz@oracle.com>
To: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
akpm@linux-foundation.org, songmuchun@bytedance.com,
willy@infradead.org
Subject: Re: [PATCH v6] mm/filemap: remove hugetlb special casing in filemap.c
Date: Wed, 6 Sep 2023 17:18:32 -0700
Message-ID: <20230907001832.GA63356@monkey>
In-Reply-To: <c2519e75-2354-9dc0-d771-c7ad2bbcf80d@oracle.com>
[-- Attachment #1: Type: text/plain, Size: 2654 bytes --]
On 09/04/23 21:05, Sidhartha Kumar wrote:
> On 8/21/23 11:33 AM, Mike Kravetz wrote:
> > On 08/17/23 11:18, Sidhartha Kumar wrote:
> > > Remove special cased hugetlb handling code within the page cache by
> > > changing the granularity of each index to the base page size rather than
> > > the huge page size. Adds new wrappers for hugetlb code to interact with the
> > > page cache which convert to a linear index.
> > <snip>
> > > @@ -237,7 +234,7 @@ void filemap_free_folio(struct address_space *mapping, struct folio *folio)
> > > if (free_folio)
> > > free_folio(folio);
> > > - if (folio_test_large(folio) && !folio_test_hugetlb(folio))
> > > + if (folio_test_large(folio))
> > > refs = folio_nr_pages(folio);
> > > folio_put_refs(folio, refs);
> > > }
> > > @@ -858,14 +855,15 @@ noinline int __filemap_add_folio(struct address_space *mapping,
> > > if (!huge) {
> > > int error = mem_cgroup_charge(folio, NULL, gfp);
> > > - VM_BUG_ON_FOLIO(index & (folio_nr_pages(folio) - 1), folio);
> > > if (error)
> > > return error;
> > > charged = true;
> > > - xas_set_order(&xas, index, folio_order(folio));
> > > - nr = folio_nr_pages(folio);
> > > }
> >
> > When a hugetlb page is added to the page cache, the ref count will now
> > be increased by folio_nr_pages. So, the ref count for a 2MB hugetlb page
> > on x86 will be increased by 512.
> >
> > We will need a corresponding change to migrate_huge_page_move_mapping().
> > For migration, the ref count is checked as follows:
> >
> > xas_lock_irq(&xas);
> > expected_count = 2 + folio_has_private(src);
> Hi Mike,
>
> Thanks for catching this. Changing this line to:
> + expected_count = folio_expected_refs(mapping, src);
> seems to fix migration from my testing. My test was inserting a sleep() in
> the hugepage-mmap.c selftest and running the migratepages command.
>
> With this version of the patch:
> migrate_pages(44906, 65, [0x0000000000000001], [0x0000000000000002]) = 75
> which means 75 pages did not migrate and after the change to
> folio_expected_refs():
> migrate_pages(7344, 65, [0x0000000000000001], [0x0000000000000002]) = 0
>
> Does that change look correct to you?
I just ran the simple attached test program (don't laugh) with the suggested
change applied. The command line was
'./move-pages 2 /var/opt/oracle/hugepool/foo'.
Unfortunately, migration is not working as expected: the source pages of
the migration are not freed.

I have not yet looked closely enough at the code to identify the root cause,
but it certainly has to do with the ref counts. I can take a closer look in
a day or two if you have not resolved the issue by then.
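
To make the ref count concern quoted above concrete, here is a small
userspace sketch of the arithmetic. It is only an illustration under stated
assumptions, not kernel code: the function names are hypothetical, and the
"one reference per base page" formula is an assumption about what the
migration check would have to expect once the page cache takes
folio_nr_pages references on a hugetlb folio.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these are kernel functions. */

/* Number of base pages covered by one huge page. */
static long nr_base_pages(long huge_size, long base_size)
{
	assert(base_size > 0 && huge_size % base_size == 0);
	return huge_size / base_size;
}

/* The check quoted above: expected_count = 2 + folio_has_private(src). */
static long expected_refs_quoted(bool has_private)
{
	return 2 + (has_private ? 1 : 0);
}

/*
 * Assumed count once the page cache holds one reference per base page:
 * one reference for the caller, nr_pages for the page cache, and one
 * more if the folio has private data.
 */
static long expected_refs_per_base_page(long nr_pages, bool has_private)
{
	return 1 + nr_pages + (has_private ? 1 : 0);
}
```

For a 2MB huge page with 4KB base pages, nr_base_pages() gives 512. The
quoted check then expects 2 references, while the per-base-page model
expects 513, so a check that still expects 2 cannot match.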
--
Mike Kravetz
[-- Attachment #2: move-pages.c --]
[-- Type: text/plain, Size: 3777 bytes --]
/*
 * move-pages (based on hugepage-mmap):
 *
 * Maps num_hpages huge pages from a file on hugetlbfs using the mmap
 * system call, faults them in, and then moves them back and forth between
 * NUMA nodes 0 and 1 with numa_move_pages() each time a key is pressed.
 * Before running this application, make sure that the administrator has
 * mounted the hugetlbfs filesystem (on some directory like /mnt) using the
 * command mount -t hugetlbfs nodev /mnt. The huge page size is hard coded
 * to 2MB.
 *
 * For the ia64 architecture, the Linux kernel reserves Region number 4 for
 * huge pages. That means that if one requires a fixed address, a huge page
 * aligned address starting with 0x800000... will be required. If a fixed
 * address is not required, the kernel will select an address in the proper
 * range.
 * Other architectures, such as ppc64, i386 or x86_64 are not so constrained.
 */
#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/types.h>
#include <numa.h>
#include <numaif.h>

#define USAGE "USAGE: %s num_hpages hugepagefile_name\n"
#define H_PAGESIZE (2 * 1024 * 1024)
#define B_PAGESIZE (4096)
#define PROTECTION (PROT_READ | PROT_WRITE)
#define ADDR (void *)(0x0UL)
#define FLAGS (MAP_SHARED)

/*
 * Ask the kernel to move every huge page of the mapping to 'node' and
 * print the per-page status reported by numa_move_pages().
 */
static void move_to_node(char *addr, long hpages, void **pages, int *nodes,
			 int *status, int node)
{
	long i, m_ret;

	for (i = 0; i < hpages; i++) {
		pages[i] = addr + (i * H_PAGESIZE);
		// pages[i] = addr + (i * H_PAGESIZE) + B_PAGESIZE;
		nodes[i] = node;
		status[i] = -1;
	}

	m_ret = numa_move_pages(0, hpages, pages, nodes, status,
				MPOL_MF_MOVE_ALL);
	if (m_ret < 0)
		perror("move_pages");	/* errno is only set on failure */
	else if (m_ret > 0)
		printf("%ld pages not migrated\n", m_ret);
	else
		printf("Success!\n");

	for (i = 0; i < hpages; i++)
		printf("\tstatus[%ld] = %d\n", i, status[i]);
}

int main(int argc, char **argv)
{
	char *f_name;
	char *sep;
	char ch;
	int fd;
	long i;
	long hpages, bpages;
	char *addr;
	void **pages;
	int *nodes;
	int *status;
	/*
	 * HARD CODED FOR TWO NODES: 0 and 1
	 */
	int node = 1;

	if (argc != 3) {
		printf(USAGE, argv[0]);
		exit(1);
	}

	/* strtol() only reports errors through errno, so clear it first. */
	errno = 0;
	hpages = strtol(argv[1], &sep, 0);
	if (errno || sep == argv[1] || hpages <= 0) {
		printf("Invalid number hpages (%s)\n", argv[1]);
		printf(USAGE, argv[0]);
		exit(1);
	}
	bpages = hpages * (H_PAGESIZE / B_PAGESIZE);

	f_name = argv[2];
	fd = open(f_name, O_CREAT | O_RDWR, 0755);
	if (fd < 0) {
		perror("open");
		printf("Open of %s failed\n", f_name);
		exit(1);
	}

	addr = mmap(ADDR, hpages * H_PAGESIZE, PROTECTION, FLAGS, fd, 0);
	if (addr == MAP_FAILED) {
		perror("mmap");
		exit(1);
	}
	printf("%ld huge pages mapped at %p\n", hpages, (void *)addr);

	/* Volatile read so the compiler cannot drop the faulting access. */
	printf("Faulting in all pages\n");
	for (i = 0; i < hpages; i++)
		(void)*(volatile char *)(addr + (i * H_PAGESIZE));

	pages = malloc(bpages * sizeof(void *));
	nodes = malloc(bpages * sizeof(int));
	status = malloc(bpages * sizeof(int));
	if (!pages || !nodes || !status) {
		printf("error allocating memory for arrays\n");
		exit(1);
	}

	/* Alternate between node 1 and node 0 until stdin is closed. */
	while (1) {
		printf("Hit any key to move hugetlb pages to node %d\n", node);
		if (read(STDIN_FILENO, &ch, 1) <= 0)
			break;
		move_to_node(addr, hpages, pages, nodes, status, node);
		node = !node;
	}

	free(pages);
	free(nodes);
	free(status);
	munmap(addr, hpages * H_PAGESIZE);
	close(fd);
	return 0;
}