* [PATCH v2] mm: migrate: Fix THP's mapcount on isolation
@ 2022-11-24 9:55 Gavin Shan
2022-11-24 10:09 ` David Hildenbrand
0 siblings, 1 reply; 10+ messages in thread
From: Gavin Shan @ 2022-11-24 9:55 UTC (permalink / raw)
To: linux-mm
Cc: linux-kernel, akpm, william.kucharski, ziy, kirill.shutemov,
david, zhenyzha, apopple, hughd, willy, shan.gavin
The issue is reported when removing memory through virtio_mem device.
The transparent huge page, experienced copy-on-write fault, is wrongly
regarded as pinned. The transparent huge page is escaped from being
isolated in isolate_migratepages_block(). The transparent huge page
can't be migrated and the corresponding memory block can't be put
into offline state.
Fix it by replacing page_mapcount() with total_mapcount(). With this,
the transparent huge page can be isolated and migrated, and the memory
block can be put into offline state. Besides, The page's refcount is
increased a bit earlier to avoid the page is released when the check
is executed.
Fixes: 1da2f328fa64 ("mm,thp,compaction,cma: allow THP migration for CMA allocations")
Cc: stable@vger.kernel.org # v5.7+
Reported-by: Zhenyu Zhang <zhenyzha@redhat.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
---
v2: Corrected fix tag and increase page's refcount before the check
---
mm/compaction.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index c51f7f545afe..1f6da31dd9a5 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -984,29 +984,29 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
goto isolate_fail;
}
+ /*
+ * Be careful not to clear PageLRU until after we're
+ * sure the page is not being freed elsewhere -- the
+ * page release code relies on it.
+ */
+ if (unlikely(!get_page_unless_zero(page)))
+ goto isolate_fail;
+
/*
* Migration will fail if an anonymous page is pinned in memory,
* so avoid taking lru_lock and isolating it unnecessarily in an
* admittedly racy check.
*/
mapping = page_mapping(page);
- if (!mapping && page_count(page) > page_mapcount(page))
- goto isolate_fail;
+ if (!mapping && (page_count(page) - 1) > total_mapcount(page))
+ goto isolate_fail_put;
/*
* Only allow to migrate anonymous pages in GFP_NOFS context
* because those do not depend on fs locks.
*/
if (!(cc->gfp_mask & __GFP_FS) && mapping)
- goto isolate_fail;
-
- /*
- * Be careful not to clear PageLRU until after we're
- * sure the page is not being freed elsewhere -- the
- * page release code relies on it.
- */
- if (unlikely(!get_page_unless_zero(page)))
- goto isolate_fail;
+ goto isolate_fail_put;
/* Only take pages on LRU: a check now makes later tests safe */
if (!PageLRU(page))
--
2.23.0
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH v2] mm: migrate: Fix THP's mapcount on isolation 2022-11-24 9:55 [PATCH v2] mm: migrate: Fix THP's mapcount on isolation Gavin Shan @ 2022-11-24 10:09 ` David Hildenbrand 2022-11-24 10:21 ` Gavin Shan 0 siblings, 1 reply; 10+ messages in thread From: David Hildenbrand @ 2022-11-24 10:09 UTC (permalink / raw) To: Gavin Shan, linux-mm Cc: linux-kernel, akpm, william.kucharski, ziy, kirill.shutemov, zhenyzha, apopple, hughd, willy, shan.gavin On 24.11.22 10:55, Gavin Shan wrote: > The issue is reported when removing memory through virtio_mem device. > The transparent huge page, experienced copy-on-write fault, is wrongly > regarded as pinned. The transparent huge page is escaped from being > isolated in isolate_migratepages_block(). The transparent huge page > can't be migrated and the corresponding memory block can't be put > into offline state. > > Fix it by replacing page_mapcount() with total_mapcount(). With this, > the transparent huge page can be isolated and migrated, and the memory > block can be put into offline state. Besides, The page's refcount is > increased a bit earlier to avoid the page is released when the check > is executed. Did you look into handling pages that are in the swapcache case as well? See is_refcount_suitable() in mm/khugepaged.c. Should be easy to reproduce, let me know if you need inspiration. > > Fixes: 1da2f328fa64 ("mm,thp,compaction,cma: allow THP migration for CMA allocations") > Cc: stable@vger.kernel.org # v5.7+ > Reported-by: Zhenyu Zhang <zhenyzha@redhat.com> > Suggested-by: David Hildenbrand <david@redhat.com> > Signed-off-by: Gavin Shan <gshan@redhat.com> > --- > v2: Corrected fix tag and increase page's refcount before the check > --- > mm/compaction.c | 22 +++++++++++----------- > 1 file changed, 11 insertions(+), 11 deletions(-) > > diff --git a/mm/compaction.c b/mm/compaction.c > index c51f7f545afe..1f6da31dd9a5 100644 > --- a/mm/compaction.c > +++ b/mm/compaction.c > @@ -984,29 +984,29 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, > goto isolate_fail; > } > > + /* > + * Be careful not to clear PageLRU until after we're > + * sure the page is not being freed elsewhere -- the > + * page release code relies on it. > + */ > + if (unlikely(!get_page_unless_zero(page))) > + goto isolate_fail; > + > /* > * Migration will fail if an anonymous page is pinned in memory, > * so avoid taking lru_lock and isolating it unnecessarily in an > * admittedly racy check. > */ > mapping = page_mapping(page); > - if (!mapping && page_count(page) > page_mapcount(page)) > - goto isolate_fail; > + if (!mapping && (page_count(page) - 1) > total_mapcount(page)) > + goto isolate_fail_put; > > /* > * Only allow to migrate anonymous pages in GFP_NOFS context > * because those do not depend on fs locks. > */ > if (!(cc->gfp_mask & __GFP_FS) && mapping) > - goto isolate_fail; > - > - /* > - * Be careful not to clear PageLRU until after we're > - * sure the page is not being freed elsewhere -- the > - * page release code relies on it. > - */ > - if (unlikely(!get_page_unless_zero(page))) > - goto isolate_fail; > + goto isolate_fail_put; > > /* Only take pages on LRU: a check now makes later tests safe */ > if (!PageLRU(page)) -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] mm: migrate: Fix THP's mapcount on isolation 2022-11-24 10:09 ` David Hildenbrand @ 2022-11-24 10:21 ` Gavin Shan 2022-11-24 10:43 ` David Hildenbrand 0 siblings, 1 reply; 10+ messages in thread From: Gavin Shan @ 2022-11-24 10:21 UTC (permalink / raw) To: David Hildenbrand, linux-mm Cc: linux-kernel, akpm, william.kucharski, ziy, kirill.shutemov, zhenyzha, apopple, hughd, willy, shan.gavin On 11/24/22 6:09 PM, David Hildenbrand wrote: > On 24.11.22 10:55, Gavin Shan wrote: >> The issue is reported when removing memory through virtio_mem device. >> The transparent huge page, experienced copy-on-write fault, is wrongly >> regarded as pinned. The transparent huge page is escaped from being >> isolated in isolate_migratepages_block(). The transparent huge page >> can't be migrated and the corresponding memory block can't be put >> into offline state. >> >> Fix it by replacing page_mapcount() with total_mapcount(). With this, >> the transparent huge page can be isolated and migrated, and the memory >> block can be put into offline state. Besides, The page's refcount is >> increased a bit earlier to avoid the page is released when the check >> is executed. > > Did you look into handling pages that are in the swapcache case as well? > > See is_refcount_suitable() in mm/khugepaged.c. > > Should be easy to reproduce, let me know if you need inspiration. > Nope, I didn't look into the case. Please elaborate the details so that I can reproduce it firstly. >> >> Fixes: 1da2f328fa64 ("mm,thp,compaction,cma: allow THP migration for CMA allocations") >> Cc: stable@vger.kernel.org # v5.7+ >> Reported-by: Zhenyu Zhang <zhenyzha@redhat.com> >> Suggested-by: David Hildenbrand <david@redhat.com> >> Signed-off-by: Gavin Shan <gshan@redhat.com> >> --- >> v2: Corrected fix tag and increase page's refcount before the check >> --- >> mm/compaction.c | 22 +++++++++++----------- >> 1 file changed, 11 insertions(+), 11 deletions(-) >> >> diff --git a/mm/compaction.c b/mm/compaction.c >> index c51f7f545afe..1f6da31dd9a5 100644 >> --- a/mm/compaction.c >> +++ b/mm/compaction.c >> @@ -984,29 +984,29 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, >> goto isolate_fail; >> } >> + /* >> + * Be careful not to clear PageLRU until after we're >> + * sure the page is not being freed elsewhere -- the >> + * page release code relies on it. >> + */ >> + if (unlikely(!get_page_unless_zero(page))) >> + goto isolate_fail; >> + >> /* >> * Migration will fail if an anonymous page is pinned in memory, >> * so avoid taking lru_lock and isolating it unnecessarily in an >> * admittedly racy check. >> */ >> mapping = page_mapping(page); >> - if (!mapping && page_count(page) > page_mapcount(page)) >> - goto isolate_fail; >> + if (!mapping && (page_count(page) - 1) > total_mapcount(page)) >> + goto isolate_fail_put; >> /* >> * Only allow to migrate anonymous pages in GFP_NOFS context >> * because those do not depend on fs locks. >> */ >> if (!(cc->gfp_mask & __GFP_FS) && mapping) >> - goto isolate_fail; >> - >> - /* >> - * Be careful not to clear PageLRU until after we're >> - * sure the page is not being freed elsewhere -- the >> - * page release code relies on it. >> - */ >> - if (unlikely(!get_page_unless_zero(page))) >> - goto isolate_fail; >> + goto isolate_fail_put; >> /* Only take pages on LRU: a check now makes later tests safe */ >> if (!PageLRU(page)) > Thanks, Gavin ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] mm: migrate: Fix THP's mapcount on isolation 2022-11-24 10:21 ` Gavin Shan @ 2022-11-24 10:43 ` David Hildenbrand 2022-11-24 12:38 ` Zi Yan 2022-11-24 12:55 ` Gavin Shan 0 siblings, 2 replies; 10+ messages in thread From: David Hildenbrand @ 2022-11-24 10:43 UTC (permalink / raw) To: Gavin Shan, linux-mm Cc: linux-kernel, akpm, william.kucharski, ziy, kirill.shutemov, zhenyzha, apopple, hughd, willy, shan.gavin On 24.11.22 11:21, Gavin Shan wrote: > On 11/24/22 6:09 PM, David Hildenbrand wrote: >> On 24.11.22 10:55, Gavin Shan wrote: >>> The issue is reported when removing memory through virtio_mem device. >>> The transparent huge page, experienced copy-on-write fault, is wrongly >>> regarded as pinned. The transparent huge page is escaped from being >>> isolated in isolate_migratepages_block(). The transparent huge page >>> can't be migrated and the corresponding memory block can't be put >>> into offline state. >>> >>> Fix it by replacing page_mapcount() with total_mapcount(). With this, >>> the transparent huge page can be isolated and migrated, and the memory >>> block can be put into offline state. Besides, The page's refcount is >>> increased a bit earlier to avoid the page is released when the check >>> is executed. >> >> Did you look into handling pages that are in the swapcache case as well? >> >> See is_refcount_suitable() in mm/khugepaged.c. >> >> Should be easy to reproduce, let me know if you need inspiration. >> > > Nope, I didn't look into the case. Please elaborate the details so that > I can reproduce it firstly. A simple reproducer would be (on a system with ordinary swap (not zram)) 1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP 2) Enable THP for that region (MADV_HUGEPAGE) 3) Populate a THP (e.g., write access) 4) PTE-map the THP, for example, using MADV_FREE on the last subpage 5) Trigger swapout of the THP, for example, using MADV_PAGEOUT 6) Read-access to some subpages to fault them in from the swapcache Now you'd have a THP, which 1) Is partially PTE-mapped into the page table 2) Is in the swapcache (each subpage should have one reference from the swapache) Now we could test, if alloc_contig_range() will still succeed (e.g., using virtio-mem). -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] mm: migrate: Fix THP's mapcount on isolation 2022-11-24 10:43 ` David Hildenbrand @ 2022-11-24 12:38 ` Zi Yan 2022-11-24 13:20 ` David Hildenbrand 2022-11-24 12:55 ` Gavin Shan 1 sibling, 1 reply; 10+ messages in thread From: Zi Yan @ 2022-11-24 12:38 UTC (permalink / raw) To: David Hildenbrand Cc: Gavin Shan, linux-mm, linux-kernel, akpm, william.kucharski, kirill.shutemov, zhenyzha, apopple, hughd, willy, shan.gavin, Huang Ying [-- Attachment #1: Type: text/plain, Size: 2358 bytes --] On 24 Nov 2022, at 5:43, David Hildenbrand wrote: > On 24.11.22 11:21, Gavin Shan wrote: >> On 11/24/22 6:09 PM, David Hildenbrand wrote: >>> On 24.11.22 10:55, Gavin Shan wrote: >>>> The issue is reported when removing memory through virtio_mem device. >>>> The transparent huge page, experienced copy-on-write fault, is wrongly >>>> regarded as pinned. The transparent huge page is escaped from being >>>> isolated in isolate_migratepages_block(). The transparent huge page >>>> can't be migrated and the corresponding memory block can't be put >>>> into offline state. >>>> >>>> Fix it by replacing page_mapcount() with total_mapcount(). With this, >>>> the transparent huge page can be isolated and migrated, and the memory >>>> block can be put into offline state. Besides, The page's refcount is >>>> increased a bit earlier to avoid the page is released when the check >>>> is executed. >>> >>> Did you look into handling pages that are in the swapcache case as well? >>> >>> See is_refcount_suitable() in mm/khugepaged.c. >>> >>> Should be easy to reproduce, let me know if you need inspiration. >>> >> >> Nope, I didn't look into the case. Please elaborate the details so that >> I can reproduce it firstly. > > > A simple reproducer would be (on a system with ordinary swap (not zram)) > > 1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP > > 2) Enable THP for that region (MADV_HUGEPAGE) > > 3) Populate a THP (e.g., write access) > > 4) PTE-map the THP, for example, using MADV_FREE on the last subpage > > 5) Trigger swapout of the THP, for example, using MADV_PAGEOUT Added the original THP swapout code author, Ying. At this step, the THP will be split, right? https://elixir.bootlin.com/linux/latest/source/mm/vmscan.c#L1786 Even if a THP has PMD mapping, IIRC, it is split in the add_to_swap() then swapped out. But I cannot find that split code now. > > 6) Read-access to some subpages to fault them in from the swapcache > > > Now you'd have a THP, which > > 1) Is partially PTE-mapped into the page table > 2) Is in the swapcache (each subpage should have one reference from the swapache) > > > Now we could test, if alloc_contig_range() will still succeed (e.g., using virtio-mem). > > -- > Thanks, > > David / dhildenb -- Best Regards, Yan Zi [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 854 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] mm: migrate: Fix THP's mapcount on isolation 2022-11-24 12:38 ` Zi Yan @ 2022-11-24 13:20 ` David Hildenbrand 0 siblings, 0 replies; 10+ messages in thread From: David Hildenbrand @ 2022-11-24 13:20 UTC (permalink / raw) To: Zi Yan Cc: Gavin Shan, linux-mm, linux-kernel, akpm, william.kucharski, kirill.shutemov, zhenyzha, apopple, hughd, willy, shan.gavin, Huang Ying On 24.11.22 13:38, Zi Yan wrote: > > On 24 Nov 2022, at 5:43, David Hildenbrand wrote: > >> On 24.11.22 11:21, Gavin Shan wrote: >>> On 11/24/22 6:09 PM, David Hildenbrand wrote: >>>> On 24.11.22 10:55, Gavin Shan wrote: >>>>> The issue is reported when removing memory through virtio_mem device. >>>>> The transparent huge page, experienced copy-on-write fault, is wrongly >>>>> regarded as pinned. The transparent huge page is escaped from being >>>>> isolated in isolate_migratepages_block(). The transparent huge page >>>>> can't be migrated and the corresponding memory block can't be put >>>>> into offline state. >>>>> >>>>> Fix it by replacing page_mapcount() with total_mapcount(). With this, >>>>> the transparent huge page can be isolated and migrated, and the memory >>>>> block can be put into offline state. Besides, The page's refcount is >>>>> increased a bit earlier to avoid the page is released when the check >>>>> is executed. >>>> >>>> Did you look into handling pages that are in the swapcache case as well? >>>> >>>> See is_refcount_suitable() in mm/khugepaged.c. >>>> >>>> Should be easy to reproduce, let me know if you need inspiration. >>>> >>> >>> Nope, I didn't look into the case. Please elaborate the details so that >>> I can reproduce it firstly. >> >> >> A simple reproducer would be (on a system with ordinary swap (not zram)) >> >> 1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP >> >> 2) Enable THP for that region (MADV_HUGEPAGE) >> >> 3) Populate a THP (e.g., write access) >> >> 4) PTE-map the THP, for example, using MADV_FREE on the last subpage >> >> 5) Trigger swapout of the THP, for example, using MADV_PAGEOUT > > Added the original THP swapout code author, Ying. > > At this step, the THP will be split, right? > > https://elixir.bootlin.com/linux/latest/source/mm/vmscan.c#L1786 > > Even if a THP has PMD mapping, IIRC, it is split in the add_to_swap() > then swapped out. But I cannot find that split code now. I recall there was some sequence to achieve it. Maybe it was swapping out the PMD first and not triggering a PTE-mapping first. mm/vmscan.c:shrink_folio_list() if (folio_test_large(folio)) { /* cannot split folio, skip it */ if (!can_split_folio(folio, NULL)) goto activate_locked; /* * Split folios without a PMD map right * away. Chances are some or all of the * tail pages can be freed without IO. */ if (!folio_entire_mapcount(folio) && split_folio_to_list(folio, folio_list)) goto activate_locked; } } So the sequence might have to be 1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP 2) Enable THP for that region (MADV_HUGEPAGE) 3) Populate a THP (e.g., write access) 4) Trigger swapout of the THP, for example, using MADV_PAGEOUT 5) Access some subpage As we don't have PMD swap entries, we will PTE-map the THP during try_to_unmap() IIRC. Independent of that, the check we have here also doesn't consider ordinary order-0 pages that might be in the swapache. -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] mm: migrate: Fix THP's mapcount on isolation 2022-11-24 10:43 ` David Hildenbrand 2022-11-24 12:38 ` Zi Yan @ 2022-11-24 12:55 ` Gavin Shan 2022-11-24 13:22 ` David Hildenbrand 1 sibling, 1 reply; 10+ messages in thread From: Gavin Shan @ 2022-11-24 12:55 UTC (permalink / raw) To: David Hildenbrand, linux-mm Cc: linux-kernel, akpm, william.kucharski, ziy, kirill.shutemov, zhenyzha, apopple, hughd, willy, shan.gavin [-- Attachment #1: Type: text/plain, Size: 2974 bytes --] On 11/24/22 6:43 PM, David Hildenbrand wrote: > On 24.11.22 11:21, Gavin Shan wrote: >> On 11/24/22 6:09 PM, David Hildenbrand wrote: >>> On 24.11.22 10:55, Gavin Shan wrote: >>>> The issue is reported when removing memory through virtio_mem device. >>>> The transparent huge page, experienced copy-on-write fault, is wrongly >>>> regarded as pinned. The transparent huge page is escaped from being >>>> isolated in isolate_migratepages_block(). The transparent huge page >>>> can't be migrated and the corresponding memory block can't be put >>>> into offline state. >>>> >>>> Fix it by replacing page_mapcount() with total_mapcount(). With this, >>>> the transparent huge page can be isolated and migrated, and the memory >>>> block can be put into offline state. Besides, The page's refcount is >>>> increased a bit earlier to avoid the page is released when the check >>>> is executed. >>> >>> Did you look into handling pages that are in the swapcache case as well? >>> >>> See is_refcount_suitable() in mm/khugepaged.c. >>> >>> Should be easy to reproduce, let me know if you need inspiration. >>> >> >> Nope, I didn't look into the case. Please elaborate the details so that >> I can reproduce it firstly. > > > A simple reproducer would be (on a system with ordinary swap (not zram)) > > 1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP > > 2) Enable THP for that region (MADV_HUGEPAGE) > > 3) Populate a THP (e.g., write access) > > 4) PTE-map the THP, for example, using MADV_FREE on the last subpage > > 5) Trigger swapout of the THP, for example, using MADV_PAGEOUT > > 6) Read-access to some subpages to fault them in from the swapcache > > > Now you'd have a THP, which > > 1) Is partially PTE-mapped into the page table > 2) Is in the swapcache (each subpage should have one reference from the swapache) > > > Now we could test, if alloc_contig_range() will still succeed (e.g., using virtio-mem). > Thanks for the details. Step (4) and (5) can be actually combined. To swap part of the THP (e.g. one sub-page) will force the THP to be split. I followed your steps in the attached program, there is no issue to do memory hot-remove through virtio-mem with or without this patch. # numactl -p 1 testsuite mm swap -k Any key to split THP Any key to swap sub-pages Any key to read the swapped sub-pages Page[000]: 0xffffffffffffffff Page[001]: 0xffffffffffffffff : Page[255]: 0xffffffffffffffff Any key to exit // hold here and the program doesn't exit (qemu) qom-set vm1 requested-size 0 [ 356.005396] virtio_mem virtio1: plugged size: 0x40000000 [ 356.005996] virtio_mem virtio1: requested size: 0x0 [ 356.350299] Fallback order for Node 0: 0 1 [ 356.350810] Fallback order for Node 1: 1 0 [ 356.351260] Built 2 zonelists, mobility grouping on. Total pages: 491343 [ 356.351998] Policy zone: DMA Thanks, Gavin [-- Attachment #2: swap.c --] [-- Type: text/x-csrc, Size: 3257 bytes --] #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <string.h> #include <sys/types.h> #include <sys/stat.h> #include <sys/wait.h> #include <fcntl.h> #include <sys/mman.h> #include <errno.h> #include "inc/testsuite.h" struct swap_struct { int page_size; int fd; int flags; void *buf; unsigned long len; int key_break; }; #define SWAP_DEFAULT_SIZE 0x200000 /* 2MB */ #define SWAP_PAGE_TO_SPLIT 511 #define SWAP_PAGE_TO_SWAP 1 #define SWAP_PAGE_TO_SWAP_NUM 256 #ifndef MADV_PAGEOUT #define MADV_PAGEOUT 21 #endif static void usage(void) { fprintf(stdout, "testsuite mm swap -l <size> -k\n"); fprintf(stdout, "\n"); fprintf(stdout, "-l: Length of memory to be mapped\n"); fprintf(stdout, "-w: Length of memory to be copied-on-write\n"); fprintf(stdout, "-k: Stop at various stages\n"); fprintf(stdout, "\n"); } static int swap_init_data(struct swap_struct *m) { m->page_size = getpagesize(); m->fd = -1; m->flags = (MAP_PRIVATE | MAP_ANONYMOUS); m->len = SWAP_DEFAULT_SIZE; m->key_break = 0; return 0; } static int swap_handler(int argc, char **argv) { struct swap_struct m; unsigned long *pval; int i, opt, ret; ret = swap_init_data(&m); if (ret) return ret; while ((opt = getopt(argc, argv, "l:w:kh")) != -1) { switch (opt) { case 'l': m.len = util_memory_parse_size(optarg); if (m.len <= SWAP_DEFAULT_SIZE) { fprintf(stderr, "%s: length 0x%lx less than 0x%x\n", __func__, m.len, SWAP_DEFAULT_SIZE); return -1; } break; case 'k': m.key_break = 1; break; case 'h': usage(); return 0; } } /* * Setup the area. The area should be backed up with huge pages * if it suits. Write to the area to ensure the area is populted * completely. */ m.buf = mmap(NULL, m.len, PROT_READ | PROT_WRITE, m.flags, m.fd, 0); if (m.buf == (void *)-1) { fprintf(stderr, "Unable do mmap()\n"); goto out; } memset(m.buf, 0xff, m.len); /* Force to split the huge page */ util_misc_key_press(m.key_break, " ", "Any key to split THP"); ret = madvise(m.buf + SWAP_PAGE_TO_SPLIT * m.page_size, m.page_size, MADV_FREE); if (ret) { fprintf(stderr, "Error %d to split THP\n", ret); goto out; } /* Swap one sub-page */ util_misc_key_press(m.key_break, " ", "Any key to swap sub-pages"); ret = madvise(m.buf + SWAP_PAGE_TO_SWAP * m.page_size, SWAP_PAGE_TO_SWAP_NUM * m.page_size, MADV_PAGEOUT); if (ret) { fprintf(stderr, "Error %d to swap one sub-page\n", ret); goto out; } /* Read the swapped sub-page */ util_misc_key_press(m.key_break, " ", "Any key to read the swapped sub-pages"); for (i = 0; i < SWAP_PAGE_TO_SWAP_NUM; i++) { pval = (unsigned long *)(m.buf + (SWAP_PAGE_TO_SWAP + i) * m.page_size); fprintf(stdout, " Page[%03d]: 0x%016lx\n", i, *pval); } /* Exit */ util_misc_key_press(m.key_break, " ", "Any key to exit"); out: if (m.buf != (void *)-1) munmap(m.buf, m.len); if (m.fd > 0) close(m.fd); return 0; } static struct command swap_command = { .name = "swap", .handler = swap_handler, .children = LIST_HEAD_INIT(swap_command.children), .link = LIST_HEAD_INIT(swap_command.link), }; int mm_swap_init(void) { return command_add(&mm_command, &swap_command); } ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] mm: migrate: Fix THP's mapcount on isolation 2022-11-24 12:55 ` Gavin Shan @ 2022-11-24 13:22 ` David Hildenbrand 2022-11-24 14:09 ` David Hildenbrand 0 siblings, 1 reply; 10+ messages in thread From: David Hildenbrand @ 2022-11-24 13:22 UTC (permalink / raw) To: Gavin Shan, linux-mm Cc: linux-kernel, akpm, william.kucharski, ziy, kirill.shutemov, zhenyzha, apopple, hughd, willy, shan.gavin On 24.11.22 13:55, Gavin Shan wrote: > On 11/24/22 6:43 PM, David Hildenbrand wrote: >> On 24.11.22 11:21, Gavin Shan wrote: >>> On 11/24/22 6:09 PM, David Hildenbrand wrote: >>>> On 24.11.22 10:55, Gavin Shan wrote: >>>>> The issue is reported when removing memory through virtio_mem device. >>>>> The transparent huge page, experienced copy-on-write fault, is wrongly >>>>> regarded as pinned. The transparent huge page is escaped from being >>>>> isolated in isolate_migratepages_block(). The transparent huge page >>>>> can't be migrated and the corresponding memory block can't be put >>>>> into offline state. >>>>> >>>>> Fix it by replacing page_mapcount() with total_mapcount(). With this, >>>>> the transparent huge page can be isolated and migrated, and the memory >>>>> block can be put into offline state. Besides, The page's refcount is >>>>> increased a bit earlier to avoid the page is released when the check >>>>> is executed. >>>> >>>> Did you look into handling pages that are in the swapcache case as well? >>>> >>>> See is_refcount_suitable() in mm/khugepaged.c. >>>> >>>> Should be easy to reproduce, let me know if you need inspiration. >>>> >>> >>> Nope, I didn't look into the case. Please elaborate the details so that >>> I can reproduce it firstly. >> >> >> A simple reproducer would be (on a system with ordinary swap (not zram)) >> >> 1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP >> >> 2) Enable THP for that region (MADV_HUGEPAGE) >> >> 3) Populate a THP (e.g., write access) >> >> 4) PTE-map the THP, for example, using MADV_FREE on the last subpage >> >> 5) Trigger swapout of the THP, for example, using MADV_PAGEOUT >> >> 6) Read-access to some subpages to fault them in from the swapcache >> >> >> Now you'd have a THP, which >> >> 1) Is partially PTE-mapped into the page table >> 2) Is in the swapcache (each subpage should have one reference from the swapache) >> >> >> Now we could test, if alloc_contig_range() will still succeed (e.g., using virtio-mem). >> > > Thanks for the details. Step (4) and (5) can be actually combined. To swap part of > the THP (e.g. one sub-page) will force the THP to be split. > > I followed your steps in the attached program, there is no issue to do memory hot-remove > through virtio-mem with or without this patch. Interesting. But I don't really see how we could pass this check with a page that's in the swapcache, maybe I'm missing something else. I'll try to see if I can reproduce it. -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] mm: migrate: Fix THP's mapcount on isolation 2022-11-24 13:22 ` David Hildenbrand @ 2022-11-24 14:09 ` David Hildenbrand 2022-11-25 7:40 ` Zhenyu Zhang 0 siblings, 1 reply; 10+ messages in thread From: David Hildenbrand @ 2022-11-24 14:09 UTC (permalink / raw) To: Gavin Shan, linux-mm Cc: linux-kernel, akpm, william.kucharski, ziy, kirill.shutemov, zhenyzha, apopple, hughd, willy, shan.gavin On 24.11.22 14:22, David Hildenbrand wrote: > On 24.11.22 13:55, Gavin Shan wrote: >> On 11/24/22 6:43 PM, David Hildenbrand wrote: >>> On 24.11.22 11:21, Gavin Shan wrote: >>>> On 11/24/22 6:09 PM, David Hildenbrand wrote: >>>>> On 24.11.22 10:55, Gavin Shan wrote: >>>>>> The issue is reported when removing memory through virtio_mem device. >>>>>> The transparent huge page, experienced copy-on-write fault, is wrongly >>>>>> regarded as pinned. The transparent huge page is escaped from being >>>>>> isolated in isolate_migratepages_block(). The transparent huge page >>>>>> can't be migrated and the corresponding memory block can't be put >>>>>> into offline state. >>>>>> >>>>>> Fix it by replacing page_mapcount() with total_mapcount(). With this, >>>>>> the transparent huge page can be isolated and migrated, and the memory >>>>>> block can be put into offline state. Besides, The page's refcount is >>>>>> increased a bit earlier to avoid the page is released when the check >>>>>> is executed. >>>>> >>>>> Did you look into handling pages that are in the swapcache case as well? >>>>> >>>>> See is_refcount_suitable() in mm/khugepaged.c. >>>>> >>>>> Should be easy to reproduce, let me know if you need inspiration. >>>>> >>>> >>>> Nope, I didn't look into the case. Please elaborate the details so that >>>> I can reproduce it firstly. >>> >>> >>> A simple reproducer would be (on a system with ordinary swap (not zram)) >>> >>> 1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP >>> >>> 2) Enable THP for that region (MADV_HUGEPAGE) >>> >>> 3) Populate a THP (e.g., write access) >>> >>> 4) PTE-map the THP, for example, using MADV_FREE on the last subpage >>> >>> 5) Trigger swapout of the THP, for example, using MADV_PAGEOUT >>> >>> 6) Read-access to some subpages to fault them in from the swapcache >>> >>> >>> Now you'd have a THP, which >>> >>> 1) Is partially PTE-mapped into the page table >>> 2) Is in the swapcache (each subpage should have one reference from the swapache) >>> >>> >>> Now we could test, if alloc_contig_range() will still succeed (e.g., using virtio-mem). >>> >> >> Thanks for the details. Step (4) and (5) can be actually combined. To swap part of >> the THP (e.g. one sub-page) will force the THP to be split. >> >> I followed your steps in the attached program, there is no issue to do memory hot-remove >> through virtio-mem with or without this patch. > > Interesting. But I don't really see how we could pass this check with a > page that's in the swapcache, maybe I'm missing something else. > > I'll try to see if I can reproduce it. > After some unsuccessful attempts and many head-scratches, I realized that it's quite simple why we don't have to worry about swapcache pages here: page_mapping() is != NULL for pages in the swapcache: folio_mapping() makes this rather obvious: if (unlikely(folio_test_swapcache(folio)) return swap_address_space(folio_swap_entry(folio)); I think the get_page_unless_zero() might also be a fix for the page_mapping() call, smells like something could blow up on concurrent page freeing. (what about concurrent removal from the swapcache? nobody knows :) ) Thanks Gavin! Acked-by: David Hildenbrand <david@redhat.com> -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] mm: migrate: Fix THP's mapcount on isolation 2022-11-24 14:09 ` David Hildenbrand @ 2022-11-25 7:40 ` Zhenyu Zhang 0 siblings, 0 replies; 10+ messages in thread From: Zhenyu Zhang @ 2022-11-25 7:40 UTC (permalink / raw) To: Guowen Shan, linux-mm Cc: linux-kernel, akpm, william.kucharski, ziy, kirill.shutemov, apopple, hughd, willy, shan.gavin, David Hildenbrand With the patch applied, I'm unable to hit memory hot-remove failure in the environment where the issue was initially found. Tested-by: Zhenyu Zhang <zhenyzha@redhat.com> On Thu, Nov 24, 2022 at 10:09 PM David Hildenbrand <david@redhat.com> wrote: > > On 24.11.22 14:22, David Hildenbrand wrote: > > On 24.11.22 13:55, Gavin Shan wrote: > >> On 11/24/22 6:43 PM, David Hildenbrand wrote: > >>> On 24.11.22 11:21, Gavin Shan wrote: > >>>> On 11/24/22 6:09 PM, David Hildenbrand wrote: > >>>>> On 24.11.22 10:55, Gavin Shan wrote: > >>>>>> The issue is reported when removing memory through virtio_mem device. > >>>>>> The transparent huge page, experienced copy-on-write fault, is wrongly > >>>>>> regarded as pinned. The transparent huge page is escaped from being > >>>>>> isolated in isolate_migratepages_block(). The transparent huge page > >>>>>> can't be migrated and the corresponding memory block can't be put > >>>>>> into offline state. > >>>>>> > >>>>>> Fix it by replacing page_mapcount() with total_mapcount(). With this, > >>>>>> the transparent huge page can be isolated and migrated, and the memory > >>>>>> block can be put into offline state. Besides, The page's refcount is > >>>>>> increased a bit earlier to avoid the page is released when the check > >>>>>> is executed. > >>>>> > >>>>> Did you look into handling pages that are in the swapcache case as well? > >>>>> > >>>>> See is_refcount_suitable() in mm/khugepaged.c. > >>>>> > >>>>> Should be easy to reproduce, let me know if you need inspiration. > >>>>> > >>>> > >>>> Nope, I didn't look into the case. Please elaborate the details so that > >>>> I can reproduce it firstly. > >>> > >>> > >>> A simple reproducer would be (on a system with ordinary swap (not zram)) > >>> > >>> 1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP > >>> > >>> 2) Enable THP for that region (MADV_HUGEPAGE) > >>> > >>> 3) Populate a THP (e.g., write access) > >>> > >>> 4) PTE-map the THP, for example, using MADV_FREE on the last subpage > >>> > >>> 5) Trigger swapout of the THP, for example, using MADV_PAGEOUT > >>> > >>> 6) Read-access to some subpages to fault them in from the swapcache > >>> > >>> > >>> Now you'd have a THP, which > >>> > >>> 1) Is partially PTE-mapped into the page table > >>> 2) Is in the swapcache (each subpage should have one reference from the swapache) > >>> > >>> > >>> Now we could test, if alloc_contig_range() will still succeed (e.g., using virtio-mem). > >>> > >> > >> Thanks for the details. Step (4) and (5) can be actually combined. To swap part of > >> the THP (e.g. one sub-page) will force the THP to be split. > >> > >> I followed your steps in the attached program, there is no issue to do memory hot-remove > >> through virtio-mem with or without this patch. > > > > Interesting. But I don't really see how we could pass this check with a > > page that's in the swapcache, maybe I'm missing something else. > > > > I'll try to see if I can reproduce it. > > > > After some unsuccessful attempts and many head-scratches, I realized > that it's quite simple why we don't have to worry about swapcache pages > here: > > page_mapping() is != NULL for pages in the swapcache: folio_mapping() > makes this rather obvious: > > if (unlikely(folio_test_swapcache(folio)) > return swap_address_space(folio_swap_entry(folio)); > > > I think the get_page_unless_zero() might also be a fix for the > page_mapping() call, smells like something could blow up on concurrent > page freeing. (what about concurrent removal from the swapcache? nobody > knows :) ) > > > Thanks Gavin! > > Acked-by: David Hildenbrand <david@redhat.com> > > > -- > Thanks, > > David / dhildenb > ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2022-11-25 7:40 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-11-24 9:55 [PATCH v2] mm: migrate: Fix THP's mapcount on isolation Gavin Shan 2022-11-24 10:09 ` David Hildenbrand 2022-11-24 10:21 ` Gavin Shan 2022-11-24 10:43 ` David Hildenbrand 2022-11-24 12:38 ` Zi Yan 2022-11-24 13:20 ` David Hildenbrand 2022-11-24 12:55 ` Gavin Shan 2022-11-24 13:22 ` David Hildenbrand 2022-11-24 14:09 ` David Hildenbrand 2022-11-25 7:40 ` Zhenyu Zhang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox