* [PATCH v2] mm: migrate: Fix THP's mapcount on isolation
@ 2022-11-24  9:55 Gavin Shan
  2022-11-24 10:09 ` David Hildenbrand
  0 siblings, 1 reply; 10+ messages in thread
From: Gavin Shan @ 2022-11-24  9:55 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, akpm, william.kucharski, ziy, kirill.shutemov,
	david, zhenyzha, apopple, hughd, willy, shan.gavin

The issue was reported when removing memory through a virtio_mem device.
A transparent huge page that has experienced a copy-on-write fault is
wrongly regarded as pinned, so it escapes isolation in
isolate_migratepages_block(). As a result, the transparent huge page
can't be migrated and the corresponding memory block can't be put into
the offline state.

Fix it by replacing page_mapcount() with total_mapcount(). With this,
the transparent huge page can be isolated and migrated, and the memory
block can be put into the offline state. Besides, the page's refcount
is now taken a bit earlier so that the page can't be released while the
check is executed.
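
As a rough illustration (the exact numbers below are an assumption for a
THP with 512 subpages that is fully PTE-mapped and has no other
references):

    page_mapcount(subpage)  ==   1   /* only this subpage's PTE mapping       */
    total_mapcount(page)    == 512   /* all PTE mappings of the compound page */
    page_count(page)        == 512   /* roughly one reference per mapping     */

    old check (run before taking any extra reference):
        page_count(page) > page_mapcount(page)          => 512 > 1,   looks pinned, skipped
    new check (run after get_page_unless_zero()):
        (page_count(page) - 1) > total_mapcount(page)   => 512 > 512, false, isolated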

Fixes: 1da2f328fa64 ("mm,thp,compaction,cma: allow THP migration for CMA allocations")
Cc: stable@vger.kernel.org   # v5.7+
Reported-by: Zhenyu Zhang <zhenyzha@redhat.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
---
v2: Corrected the Fixes tag and increased the page's refcount before the check
---
 mm/compaction.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index c51f7f545afe..1f6da31dd9a5 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -984,29 +984,29 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 			goto isolate_fail;
 		}
 
+		/*
+		 * Be careful not to clear PageLRU until after we're
+		 * sure the page is not being freed elsewhere -- the
+		 * page release code relies on it.
+		 */
+		if (unlikely(!get_page_unless_zero(page)))
+			goto isolate_fail;
+
 		/*
 		 * Migration will fail if an anonymous page is pinned in memory,
 		 * so avoid taking lru_lock and isolating it unnecessarily in an
 		 * admittedly racy check.
 		 */
 		mapping = page_mapping(page);
-		if (!mapping && page_count(page) > page_mapcount(page))
-			goto isolate_fail;
+		if (!mapping && (page_count(page) - 1) > total_mapcount(page))
+			goto isolate_fail_put;
 
 		/*
 		 * Only allow to migrate anonymous pages in GFP_NOFS context
 		 * because those do not depend on fs locks.
 		 */
 		if (!(cc->gfp_mask & __GFP_FS) && mapping)
-			goto isolate_fail;
-
-		/*
-		 * Be careful not to clear PageLRU until after we're
-		 * sure the page is not being freed elsewhere -- the
-		 * page release code relies on it.
-		 */
-		if (unlikely(!get_page_unless_zero(page)))
-			goto isolate_fail;
+			goto isolate_fail_put;
 
 		/* Only take pages on LRU: a check now makes later tests safe */
 		if (!PageLRU(page))
-- 
2.23.0




* Re: [PATCH v2] mm: migrate: Fix THP's mapcount on isolation
  2022-11-24  9:55 [PATCH v2] mm: migrate: Fix THP's mapcount on isolation Gavin Shan
@ 2022-11-24 10:09 ` David Hildenbrand
  2022-11-24 10:21   ` Gavin Shan
  0 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2022-11-24 10:09 UTC (permalink / raw)
  To: Gavin Shan, linux-mm
  Cc: linux-kernel, akpm, william.kucharski, ziy, kirill.shutemov,
	zhenyzha, apopple, hughd, willy, shan.gavin

On 24.11.22 10:55, Gavin Shan wrote:
> The issue is reported when removing memory through virtio_mem device.
> The transparent huge page, experienced copy-on-write fault, is wrongly
> regarded as pinned. The transparent huge page is escaped from being
> isolated in isolate_migratepages_block(). The transparent huge page
> can't be migrated and the corresponding memory block can't be put
> into offline state.
> 
> Fix it by replacing page_mapcount() with total_mapcount(). With this,
> the transparent huge page can be isolated and migrated, and the memory
> block can be put into offline state. Besides, The page's refcount is
> increased a bit earlier to avoid the page is released when the check
> is executed.

Did you look into handling pages that are in the swapcache as well?

See is_refcount_suitable() in mm/khugepaged.c.
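
For reference, it currently looks roughly like this (paraphrased from
memory, so double-check against the tree you're on):

static bool is_refcount_suitable(struct page *page)
{
	int expected_refcount;

	expected_refcount = total_mapcount(page);
	/* each subpage in the swapcache holds one extra reference */
	if (PageSwapCache(page))
		expected_refcount += compound_nr(page);

	return page_count(page) == expected_refcount;
}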

Should be easy to reproduce, let me know if you need inspiration.

> 
> Fixes: 1da2f328fa64 ("mm,thp,compaction,cma: allow THP migration for CMA allocations")
> Cc: stable@vger.kernel.org   # v5.7+
> Reported-by: Zhenyu Zhang <zhenyzha@redhat.com>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> ---
> v2: Corrected fix tag and increase page's refcount before the check
> ---
>   mm/compaction.c | 22 +++++++++++-----------
>   1 file changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index c51f7f545afe..1f6da31dd9a5 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -984,29 +984,29 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>   			goto isolate_fail;
>   		}
>   
> +		/*
> +		 * Be careful not to clear PageLRU until after we're
> +		 * sure the page is not being freed elsewhere -- the
> +		 * page release code relies on it.
> +		 */
> +		if (unlikely(!get_page_unless_zero(page)))
> +			goto isolate_fail;
> +
>   		/*
>   		 * Migration will fail if an anonymous page is pinned in memory,
>   		 * so avoid taking lru_lock and isolating it unnecessarily in an
>   		 * admittedly racy check.
>   		 */
>   		mapping = page_mapping(page);
> -		if (!mapping && page_count(page) > page_mapcount(page))
> -			goto isolate_fail;
> +		if (!mapping && (page_count(page) - 1) > total_mapcount(page))
> +			goto isolate_fail_put;
>   
>   		/*
>   		 * Only allow to migrate anonymous pages in GFP_NOFS context
>   		 * because those do not depend on fs locks.
>   		 */
>   		if (!(cc->gfp_mask & __GFP_FS) && mapping)
> -			goto isolate_fail;
> -
> -		/*
> -		 * Be careful not to clear PageLRU until after we're
> -		 * sure the page is not being freed elsewhere -- the
> -		 * page release code relies on it.
> -		 */
> -		if (unlikely(!get_page_unless_zero(page)))
> -			goto isolate_fail;
> +			goto isolate_fail_put;
>   
>   		/* Only take pages on LRU: a check now makes later tests safe */
>   		if (!PageLRU(page))

-- 
Thanks,

David / dhildenb




* Re: [PATCH v2] mm: migrate: Fix THP's mapcount on isolation
  2022-11-24 10:09 ` David Hildenbrand
@ 2022-11-24 10:21   ` Gavin Shan
  2022-11-24 10:43     ` David Hildenbrand
  0 siblings, 1 reply; 10+ messages in thread
From: Gavin Shan @ 2022-11-24 10:21 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: linux-kernel, akpm, william.kucharski, ziy, kirill.shutemov,
	zhenyzha, apopple, hughd, willy, shan.gavin

On 11/24/22 6:09 PM, David Hildenbrand wrote:
> On 24.11.22 10:55, Gavin Shan wrote:
>> The issue is reported when removing memory through virtio_mem device.
>> The transparent huge page, experienced copy-on-write fault, is wrongly
>> regarded as pinned. The transparent huge page is escaped from being
>> isolated in isolate_migratepages_block(). The transparent huge page
>> can't be migrated and the corresponding memory block can't be put
>> into offline state.
>>
>> Fix it by replacing page_mapcount() with total_mapcount(). With this,
>> the transparent huge page can be isolated and migrated, and the memory
>> block can be put into offline state. Besides, The page's refcount is
>> increased a bit earlier to avoid the page is released when the check
>> is executed.
> 
> Did you look into handling pages that are in the swapcache case as well?
> 
> See is_refcount_suitable() in mm/khugepaged.c.
> 
> Should be easy to reproduce, let me know if you need inspiration.
> 

Nope, I didn't look into that case. Please elaborate on the details so that
I can reproduce it first.

>>
>> Fixes: 1da2f328fa64 ("mm,thp,compaction,cma: allow THP migration for CMA allocations")
>> Cc: stable@vger.kernel.org   # v5.7+
>> Reported-by: Zhenyu Zhang <zhenyzha@redhat.com>
>> Suggested-by: David Hildenbrand <david@redhat.com>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> ---
>> v2: Corrected fix tag and increase page's refcount before the check
>> ---
>>   mm/compaction.c | 22 +++++++++++-----------
>>   1 file changed, 11 insertions(+), 11 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index c51f7f545afe..1f6da31dd9a5 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -984,29 +984,29 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>>               goto isolate_fail;
>>           }
>> +        /*
>> +         * Be careful not to clear PageLRU until after we're
>> +         * sure the page is not being freed elsewhere -- the
>> +         * page release code relies on it.
>> +         */
>> +        if (unlikely(!get_page_unless_zero(page)))
>> +            goto isolate_fail;
>> +
>>           /*
>>            * Migration will fail if an anonymous page is pinned in memory,
>>            * so avoid taking lru_lock and isolating it unnecessarily in an
>>            * admittedly racy check.
>>            */
>>           mapping = page_mapping(page);
>> -        if (!mapping && page_count(page) > page_mapcount(page))
>> -            goto isolate_fail;
>> +        if (!mapping && (page_count(page) - 1) > total_mapcount(page))
>> +            goto isolate_fail_put;
>>           /*
>>            * Only allow to migrate anonymous pages in GFP_NOFS context
>>            * because those do not depend on fs locks.
>>            */
>>           if (!(cc->gfp_mask & __GFP_FS) && mapping)
>> -            goto isolate_fail;
>> -
>> -        /*
>> -         * Be careful not to clear PageLRU until after we're
>> -         * sure the page is not being freed elsewhere -- the
>> -         * page release code relies on it.
>> -         */
>> -        if (unlikely(!get_page_unless_zero(page)))
>> -            goto isolate_fail;
>> +            goto isolate_fail_put;
>>           /* Only take pages on LRU: a check now makes later tests safe */
>>           if (!PageLRU(page))
> 

Thanks,
Gavin




* Re: [PATCH v2] mm: migrate: Fix THP's mapcount on isolation
  2022-11-24 10:21   ` Gavin Shan
@ 2022-11-24 10:43     ` David Hildenbrand
  2022-11-24 12:38       ` Zi Yan
  2022-11-24 12:55       ` Gavin Shan
  0 siblings, 2 replies; 10+ messages in thread
From: David Hildenbrand @ 2022-11-24 10:43 UTC (permalink / raw)
  To: Gavin Shan, linux-mm
  Cc: linux-kernel, akpm, william.kucharski, ziy, kirill.shutemov,
	zhenyzha, apopple, hughd, willy, shan.gavin

On 24.11.22 11:21, Gavin Shan wrote:
> On 11/24/22 6:09 PM, David Hildenbrand wrote:
>> On 24.11.22 10:55, Gavin Shan wrote:
>>> The issue is reported when removing memory through virtio_mem device.
>>> The transparent huge page, experienced copy-on-write fault, is wrongly
>>> regarded as pinned. The transparent huge page is escaped from being
>>> isolated in isolate_migratepages_block(). The transparent huge page
>>> can't be migrated and the corresponding memory block can't be put
>>> into offline state.
>>>
>>> Fix it by replacing page_mapcount() with total_mapcount(). With this,
>>> the transparent huge page can be isolated and migrated, and the memory
>>> block can be put into offline state. Besides, The page's refcount is
>>> increased a bit earlier to avoid the page is released when the check
>>> is executed.
>>
>> Did you look into handling pages that are in the swapcache case as well?
>>
>> See is_refcount_suitable() in mm/khugepaged.c.
>>
>> Should be easy to reproduce, let me know if you need inspiration.
>>
> 
> Nope, I didn't look into the case. Please elaborate the details so that
> I can reproduce it firstly.


A simple reproducer would be (on a system with ordinary swap, not zram):

1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP

2) Enable THP for that region (MADV_HUGEPAGE)

3) Populate a THP (e.g., write access)

4) PTE-map the THP, for example, using MADV_FREE on the last subpage

5) Trigger swapout of the THP, for example, using MADV_PAGEOUT

6) Read-access to some subpages to fault them in from the swapcache


Now you'd have a THP, which

1) Is partially PTE-mapped into the page table
2) Is in the swapcache (each subpage should have one reference from the
swapcache)


Now we could test whether alloc_contig_range() will still succeed (e.g.,
using virtio-mem).
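
A minimal standalone sketch of that sequence (untested here; it assumes a
2 MiB THP size, 4 KiB base pages and a kernel/libc where MADV_PAGEOUT is
available) could look like:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

#ifndef MADV_FREE
#define MADV_FREE	8
#endif
#ifndef MADV_HUGEPAGE
#define MADV_HUGEPAGE	14
#endif
#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT	21
#endif

#define THP_SIZE	(2UL * 1024 * 1024)

int main(void)
{
	long psize = sysconf(_SC_PAGESIZE);
	unsigned long sum = 0;
	char *map, *buf;
	int i;

	/* 1) anonymous private mapping, large enough to hold an aligned THP */
	map = mmap(NULL, 2 * THP_SIZE, PROT_READ | PROT_WRITE,
		   MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
	if (map == MAP_FAILED)
		return 1;
	buf = (char *)(((unsigned long)map + THP_SIZE - 1) & ~(THP_SIZE - 1));

	/* 2) enable THP for the aligned part of the region */
	madvise(buf, THP_SIZE, MADV_HUGEPAGE);

	/* 3) populate the THP */
	memset(buf, 0xff, THP_SIZE);

	/* 4) PTE-map the THP by dropping the last subpage */
	madvise(buf + THP_SIZE - psize, psize, MADV_FREE);

	/* 5) swap out the remaining subpages */
	madvise(buf, THP_SIZE - psize, MADV_PAGEOUT);

	/* 6) fault a few subpages back in from the swapcache */
	for (i = 0; i < 16; i++)
		sum += buf[i * psize];
	printf("sum: %lu\n", sum);

	/* keep the state alive while testing alloc_contig_range() */
	pause();
	return 0;
}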

-- 
Thanks,

David / dhildenb




* Re: [PATCH v2] mm: migrate: Fix THP's mapcount on isolation
  2022-11-24 10:43     ` David Hildenbrand
@ 2022-11-24 12:38       ` Zi Yan
  2022-11-24 13:20         ` David Hildenbrand
  2022-11-24 12:55       ` Gavin Shan
  1 sibling, 1 reply; 10+ messages in thread
From: Zi Yan @ 2022-11-24 12:38 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Gavin Shan, linux-mm, linux-kernel, akpm, william.kucharski,
	kirill.shutemov, zhenyzha, apopple, hughd, willy, shan.gavin,
	Huang Ying



On 24 Nov 2022, at 5:43, David Hildenbrand wrote:

> On 24.11.22 11:21, Gavin Shan wrote:
>> On 11/24/22 6:09 PM, David Hildenbrand wrote:
>>> On 24.11.22 10:55, Gavin Shan wrote:
>>>> The issue is reported when removing memory through virtio_mem device.
>>>> The transparent huge page, experienced copy-on-write fault, is wrongly
>>>> regarded as pinned. The transparent huge page is escaped from being
>>>> isolated in isolate_migratepages_block(). The transparent huge page
>>>> can't be migrated and the corresponding memory block can't be put
>>>> into offline state.
>>>>
>>>> Fix it by replacing page_mapcount() with total_mapcount(). With this,
>>>> the transparent huge page can be isolated and migrated, and the memory
>>>> block can be put into offline state. Besides, The page's refcount is
>>>> increased a bit earlier to avoid the page is released when the check
>>>> is executed.
>>>
>>> Did you look into handling pages that are in the swapcache case as well?
>>>
>>> See is_refcount_suitable() in mm/khugepaged.c.
>>>
>>> Should be easy to reproduce, let me know if you need inspiration.
>>>
>>
>> Nope, I didn't look into the case. Please elaborate the details so that
>> I can reproduce it firstly.
>
>
> A simple reproducer would be (on a system with ordinary swap (not zram))
>
> 1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP
>
> 2) Enable THP for that region (MADV_HUGEPAGE)
>
> 3) Populate a THP (e.g., write access)
>
> 4) PTE-map the THP, for example, using MADV_FREE on the last subpage
>
> 5) Trigger swapout of the THP, for example, using MADV_PAGEOUT

Added the original THP swapout code author, Ying.

At this step, the THP will be split, right?

https://elixir.bootlin.com/linux/latest/source/mm/vmscan.c#L1786

Even if a THP has a PMD mapping, IIRC, it is split in add_to_swap() and
then swapped out. But I cannot find that split code now.


>
> 6) Read-access to some subpages to fault them in from the swapcache
>
>
> Now you'd have a THP, which
>
> 1) Is partially PTE-mapped into the page table
> 2) Is in the swapcache (each subpage should have one reference from the swapache)
>
>
> Now we could test, if alloc_contig_range() will still succeed (e.g., using virtio-mem).
>
> -- 
> Thanks,
>
> David / dhildenb

--
Best Regards,
Yan Zi



* Re: [PATCH v2] mm: migrate: Fix THP's mapcount on isolation
  2022-11-24 10:43     ` David Hildenbrand
  2022-11-24 12:38       ` Zi Yan
@ 2022-11-24 12:55       ` Gavin Shan
  2022-11-24 13:22         ` David Hildenbrand
  1 sibling, 1 reply; 10+ messages in thread
From: Gavin Shan @ 2022-11-24 12:55 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: linux-kernel, akpm, william.kucharski, ziy, kirill.shutemov,
	zhenyzha, apopple, hughd, willy, shan.gavin


On 11/24/22 6:43 PM, David Hildenbrand wrote:
> On 24.11.22 11:21, Gavin Shan wrote:
>> On 11/24/22 6:09 PM, David Hildenbrand wrote:
>>> On 24.11.22 10:55, Gavin Shan wrote:
>>>> The issue is reported when removing memory through virtio_mem device.
>>>> The transparent huge page, experienced copy-on-write fault, is wrongly
>>>> regarded as pinned. The transparent huge page is escaped from being
>>>> isolated in isolate_migratepages_block(). The transparent huge page
>>>> can't be migrated and the corresponding memory block can't be put
>>>> into offline state.
>>>>
>>>> Fix it by replacing page_mapcount() with total_mapcount(). With this,
>>>> the transparent huge page can be isolated and migrated, and the memory
>>>> block can be put into offline state. Besides, The page's refcount is
>>>> increased a bit earlier to avoid the page is released when the check
>>>> is executed.
>>>
>>> Did you look into handling pages that are in the swapcache case as well?
>>>
>>> See is_refcount_suitable() in mm/khugepaged.c.
>>>
>>> Should be easy to reproduce, let me know if you need inspiration.
>>>
>>
>> Nope, I didn't look into the case. Please elaborate the details so that
>> I can reproduce it firstly.
> 
> 
> A simple reproducer would be (on a system with ordinary swap (not zram))
> 
> 1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP
> 
> 2) Enable THP for that region (MADV_HUGEPAGE)
> 
> 3) Populate a THP (e.g., write access)
> 
> 4) PTE-map the THP, for example, using MADV_FREE on the last subpage
> 
> 5) Trigger swapout of the THP, for example, using MADV_PAGEOUT
> 
> 6) Read-access to some subpages to fault them in from the swapcache
> 
> 
> Now you'd have a THP, which
> 
> 1) Is partially PTE-mapped into the page table
> 2) Is in the swapcache (each subpage should have one reference from the swapache)
> 
> 
> Now we could test, if alloc_contig_range() will still succeed (e.g., using virtio-mem).
> 

Thanks for the details. Steps (4) and (5) can actually be combined:
swapping out part of the THP (e.g. one sub-page) forces the THP to be split.

I followed your steps in the attached program, and there is no issue doing
memory hot-remove through virtio-mem, with or without this patch.

    # numactl -p 1 testsuite mm swap -k
    Any key to split THP
    Any key to swap sub-pages
    Any key to read the swapped sub-pages
        Page[000]: 0xffffffffffffffff
        Page[001]: 0xffffffffffffffff
          :
        Page[255]: 0xffffffffffffffff
    Any key to exit                                // hold here and the program doesn't exit

    (qemu) qom-set vm1 requested-size 0
    [  356.005396] virtio_mem virtio1: plugged size: 0x40000000
    [  356.005996] virtio_mem virtio1: requested size: 0x0
    [  356.350299] Fallback order for Node 0: 0 1
    [  356.350810] Fallback order for Node 1: 1 0
    [  356.351260] Built 2 zonelists, mobility grouping on.  Total pages: 491343
    [  356.351998] Policy zone: DMA

Thanks,
Gavin

[-- Attachment #2: swap.c --]
[-- Type: text/x-csrc, Size: 3257 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <errno.h>

#include "inc/testsuite.h"

struct swap_struct {
	int		page_size;

	int		fd;
	int		flags;
	void		*buf;
	unsigned long 	len;

	int		key_break;
};

#define SWAP_DEFAULT_SIZE	0x200000	/* 2MB */
#define SWAP_PAGE_TO_SPLIT	511
#define SWAP_PAGE_TO_SWAP	1
#define SWAP_PAGE_TO_SWAP_NUM	256

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT		21
#endif


static void usage(void)
{
	fprintf(stdout, "testsuite mm swap -l <size> -k\n");
	fprintf(stdout, "\n");
	fprintf(stdout, "-l: Length of memory to be mapped\n");
	fprintf(stdout, "-w: Length of memory to be copied-on-write\n");
	fprintf(stdout, "-k: Stop at various stages\n");
	fprintf(stdout, "\n");
}

static int swap_init_data(struct swap_struct *m)
{
	m->page_size	= getpagesize();

	m->fd		= -1;
	m->flags	= (MAP_PRIVATE | MAP_ANONYMOUS);
	m->len		= SWAP_DEFAULT_SIZE;
	m->key_break	= 0;

	return 0;
}

static int swap_handler(int argc, char **argv)
{
	struct swap_struct m;
	unsigned long *pval;
	int i, opt, ret;

	ret = swap_init_data(&m);
	if (ret)
		return ret;

	while ((opt = getopt(argc, argv, "l:w:kh")) != -1) {
		switch (opt) {
		case 'l':
			m.len = util_memory_parse_size(optarg);
			if (m.len <= SWAP_DEFAULT_SIZE) {
				fprintf(stderr, "%s: length 0x%lx less than 0x%x\n",
					__func__, m.len, SWAP_DEFAULT_SIZE);
				return -1;
			}

			break;
		case 'k':
			m.key_break = 1;
			break;
		case 'h':
			usage();
			return 0;
		}
	}

	/*
	 * Set up the area. The area should be backed by huge pages
	 * if possible. Write to the area to ensure it is populated
	 * completely.
	 */
	m.buf = mmap(NULL, m.len, PROT_READ | PROT_WRITE, m.flags, m.fd, 0);
	if (m.buf == (void *)-1) {
		fprintf(stderr, "Unable do mmap()\n");
		goto out;
	}

	memset(m.buf, 0xff, m.len);

	/* Force to split the huge page */
	util_misc_key_press(m.key_break, "  ", "Any key to split THP");
	ret = madvise(m.buf + SWAP_PAGE_TO_SPLIT * m.page_size,
		      m.page_size, MADV_FREE);
	if (ret) {
		fprintf(stderr, "Error %d to split THP\n", ret); 
		goto out;
	}

	/* Swap one sub-page */
	util_misc_key_press(m.key_break, "  ", "Any key to swap sub-pages");
	ret = madvise(m.buf + SWAP_PAGE_TO_SWAP * m.page_size,
		      SWAP_PAGE_TO_SWAP_NUM * m.page_size,
		      MADV_PAGEOUT);
	if (ret) {
		fprintf(stderr, "Error %d to swap one sub-page\n", ret);
		goto out;
	}

	/* Read the swapped sub-page */
	util_misc_key_press(m.key_break, "  ", "Any key to read the swapped sub-pages");
	for (i = 0; i < SWAP_PAGE_TO_SWAP_NUM; i++) {
		pval = (unsigned long *)(m.buf + (SWAP_PAGE_TO_SWAP + i) * m.page_size);
		fprintf(stdout, "  Page[%03d]: 0x%016lx\n", i, *pval);
	}

	/* Exit */
	util_misc_key_press(m.key_break, "  ", "Any key to exit");

out:
	if (m.buf != (void *)-1)
		munmap(m.buf, m.len);
	if (m.fd > 0)
		close(m.fd);

	return 0;
}

static struct command swap_command = {
	.name		= "swap",
	.handler	= swap_handler,
	.children	= LIST_HEAD_INIT(swap_command.children),
	.link		= LIST_HEAD_INIT(swap_command.link),
};

int mm_swap_init(void)
{
	return command_add(&mm_command, &swap_command);
}



* Re: [PATCH v2] mm: migrate: Fix THP's mapcount on isolation
  2022-11-24 12:38       ` Zi Yan
@ 2022-11-24 13:20         ` David Hildenbrand
  0 siblings, 0 replies; 10+ messages in thread
From: David Hildenbrand @ 2022-11-24 13:20 UTC (permalink / raw)
  To: Zi Yan
  Cc: Gavin Shan, linux-mm, linux-kernel, akpm, william.kucharski,
	kirill.shutemov, zhenyzha, apopple, hughd, willy, shan.gavin,
	Huang Ying

On 24.11.22 13:38, Zi Yan wrote:
> 
> On 24 Nov 2022, at 5:43, David Hildenbrand wrote:
> 
>> On 24.11.22 11:21, Gavin Shan wrote:
>>> On 11/24/22 6:09 PM, David Hildenbrand wrote:
>>>> On 24.11.22 10:55, Gavin Shan wrote:
>>>>> The issue is reported when removing memory through virtio_mem device.
>>>>> The transparent huge page, experienced copy-on-write fault, is wrongly
>>>>> regarded as pinned. The transparent huge page is escaped from being
>>>>> isolated in isolate_migratepages_block(). The transparent huge page
>>>>> can't be migrated and the corresponding memory block can't be put
>>>>> into offline state.
>>>>>
>>>>> Fix it by replacing page_mapcount() with total_mapcount(). With this,
>>>>> the transparent huge page can be isolated and migrated, and the memory
>>>>> block can be put into offline state. Besides, The page's refcount is
>>>>> increased a bit earlier to avoid the page is released when the check
>>>>> is executed.
>>>>
>>>> Did you look into handling pages that are in the swapcache case as well?
>>>>
>>>> See is_refcount_suitable() in mm/khugepaged.c.
>>>>
>>>> Should be easy to reproduce, let me know if you need inspiration.
>>>>
>>>
>>> Nope, I didn't look into the case. Please elaborate the details so that
>>> I can reproduce it firstly.
>>
>>
>> A simple reproducer would be (on a system with ordinary swap (not zram))
>>
>> 1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP
>>
>> 2) Enable THP for that region (MADV_HUGEPAGE)
>>
>> 3) Populate a THP (e.g., write access)
>>
>> 4) PTE-map the THP, for example, using MADV_FREE on the last subpage
>>
>> 5) Trigger swapout of the THP, for example, using MADV_PAGEOUT
> 
> Added the original THP swapout code author, Ying.
> 
> At this step, the THP will be split, right?
> 
> https://elixir.bootlin.com/linux/latest/source/mm/vmscan.c#L1786
> 
> Even if a THP has PMD mapping, IIRC, it is split in the add_to_swap()
> then swapped out. But I cannot find that split code now.

I recall there was some sequence to achieve it. Maybe it was swapping
out the still PMD-mapped THP directly, without triggering a PTE-mapping
first.

mm/vmscan.c:shrink_folio_list()

if (folio_test_large(folio)) {
	/* cannot split folio, skip it */
	if (!can_split_folio(folio, NULL))
		goto activate_locked;
	/*
	 * Split folios without a PMD map right
	 * away. Chances are some or all of the
	 * tail pages can be freed without IO.
	 */
	if (!folio_entire_mapcount(folio) &&
	    split_folio_to_list(folio, folio_list))
		goto activate_locked;
}

So the sequence might have to be

1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP

2) Enable THP for that region (MADV_HUGEPAGE)

3) Populate a THP (e.g., write access)

4) Trigger swapout of the THP, for example, using MADV_PAGEOUT

5) Access some subpage

As we don't have PMD swap entries, we will PTE-map the
THP during try_to_unmap() IIRC.
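
Relative to the earlier sketch, only the ordering of the madvise() calls
changes (same assumptions as before; whether the THP really ends up
partially mapped plus in the swapcache depends on the swapout path):

	/* 3) populate the THP */
	memset(buf, 0xff, THP_SIZE);

	/* 4) swap out the THP while it is still PMD-mapped */
	madvise(buf, THP_SIZE, MADV_PAGEOUT);

	/* 5) fault one subpage back in from the swapcache */
	sum += buf[0];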



Independent of that, the check we have here also doesn't consider
ordinary order-0 pages that might be in the swapcache.

-- 
Thanks,

David / dhildenb




* Re: [PATCH v2] mm: migrate: Fix THP's mapcount on isolation
  2022-11-24 12:55       ` Gavin Shan
@ 2022-11-24 13:22         ` David Hildenbrand
  2022-11-24 14:09           ` David Hildenbrand
  0 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2022-11-24 13:22 UTC (permalink / raw)
  To: Gavin Shan, linux-mm
  Cc: linux-kernel, akpm, william.kucharski, ziy, kirill.shutemov,
	zhenyzha, apopple, hughd, willy, shan.gavin

On 24.11.22 13:55, Gavin Shan wrote:
> On 11/24/22 6:43 PM, David Hildenbrand wrote:
>> On 24.11.22 11:21, Gavin Shan wrote:
>>> On 11/24/22 6:09 PM, David Hildenbrand wrote:
>>>> On 24.11.22 10:55, Gavin Shan wrote:
>>>>> The issue is reported when removing memory through virtio_mem device.
>>>>> The transparent huge page, experienced copy-on-write fault, is wrongly
>>>>> regarded as pinned. The transparent huge page is escaped from being
>>>>> isolated in isolate_migratepages_block(). The transparent huge page
>>>>> can't be migrated and the corresponding memory block can't be put
>>>>> into offline state.
>>>>>
>>>>> Fix it by replacing page_mapcount() with total_mapcount(). With this,
>>>>> the transparent huge page can be isolated and migrated, and the memory
>>>>> block can be put into offline state. Besides, The page's refcount is
>>>>> increased a bit earlier to avoid the page is released when the check
>>>>> is executed.
>>>>
>>>> Did you look into handling pages that are in the swapcache case as well?
>>>>
>>>> See is_refcount_suitable() in mm/khugepaged.c.
>>>>
>>>> Should be easy to reproduce, let me know if you need inspiration.
>>>>
>>>
>>> Nope, I didn't look into the case. Please elaborate the details so that
>>> I can reproduce it firstly.
>>
>>
>> A simple reproducer would be (on a system with ordinary swap (not zram))
>>
>> 1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP
>>
>> 2) Enable THP for that region (MADV_HUGEPAGE)
>>
>> 3) Populate a THP (e.g., write access)
>>
>> 4) PTE-map the THP, for example, using MADV_FREE on the last subpage
>>
>> 5) Trigger swapout of the THP, for example, using MADV_PAGEOUT
>>
>> 6) Read-access to some subpages to fault them in from the swapcache
>>
>>
>> Now you'd have a THP, which
>>
>> 1) Is partially PTE-mapped into the page table
>> 2) Is in the swapcache (each subpage should have one reference from the swapache)
>>
>>
>> Now we could test, if alloc_contig_range() will still succeed (e.g., using virtio-mem).
>>
> 
> Thanks for the details. Step (4) and (5) can be actually combined. To swap part of
> the THP (e.g. one sub-page) will force the THP to be split.
> 
> I followed your steps in the attached program, there is no issue to do memory hot-remove
> through virtio-mem with or without this patch.

Interesting. But I don't really see how we could pass this check with a
page that's in the swapcache; maybe I'm missing something else.

I'll try to see if I can reproduce it.

-- 
Thanks,

David / dhildenb




* Re: [PATCH v2] mm: migrate: Fix THP's mapcount on isolation
  2022-11-24 13:22         ` David Hildenbrand
@ 2022-11-24 14:09           ` David Hildenbrand
  2022-11-25  7:40             ` Zhenyu Zhang
  0 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2022-11-24 14:09 UTC (permalink / raw)
  To: Gavin Shan, linux-mm
  Cc: linux-kernel, akpm, william.kucharski, ziy, kirill.shutemov,
	zhenyzha, apopple, hughd, willy, shan.gavin

On 24.11.22 14:22, David Hildenbrand wrote:
> On 24.11.22 13:55, Gavin Shan wrote:
>> On 11/24/22 6:43 PM, David Hildenbrand wrote:
>>> On 24.11.22 11:21, Gavin Shan wrote:
>>>> On 11/24/22 6:09 PM, David Hildenbrand wrote:
>>>>> On 24.11.22 10:55, Gavin Shan wrote:
>>>>>> The issue is reported when removing memory through virtio_mem device.
>>>>>> The transparent huge page, experienced copy-on-write fault, is wrongly
>>>>>> regarded as pinned. The transparent huge page is escaped from being
>>>>>> isolated in isolate_migratepages_block(). The transparent huge page
>>>>>> can't be migrated and the corresponding memory block can't be put
>>>>>> into offline state.
>>>>>>
>>>>>> Fix it by replacing page_mapcount() with total_mapcount(). With this,
>>>>>> the transparent huge page can be isolated and migrated, and the memory
>>>>>> block can be put into offline state. Besides, The page's refcount is
>>>>>> increased a bit earlier to avoid the page is released when the check
>>>>>> is executed.
>>>>>
>>>>> Did you look into handling pages that are in the swapcache case as well?
>>>>>
>>>>> See is_refcount_suitable() in mm/khugepaged.c.
>>>>>
>>>>> Should be easy to reproduce, let me know if you need inspiration.
>>>>>
>>>>
>>>> Nope, I didn't look into the case. Please elaborate the details so that
>>>> I can reproduce it firstly.
>>>
>>>
>>> A simple reproducer would be (on a system with ordinary swap (not zram))
>>>
>>> 1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP
>>>
>>> 2) Enable THP for that region (MADV_HUGEPAGE)
>>>
>>> 3) Populate a THP (e.g., write access)
>>>
>>> 4) PTE-map the THP, for example, using MADV_FREE on the last subpage
>>>
>>> 5) Trigger swapout of the THP, for example, using MADV_PAGEOUT
>>>
>>> 6) Read-access to some subpages to fault them in from the swapcache
>>>
>>>
>>> Now you'd have a THP, which
>>>
>>> 1) Is partially PTE-mapped into the page table
>>> 2) Is in the swapcache (each subpage should have one reference from the swapache)
>>>
>>>
>>> Now we could test, if alloc_contig_range() will still succeed (e.g., using virtio-mem).
>>>
>>
>> Thanks for the details. Step (4) and (5) can be actually combined. To swap part of
>> the THP (e.g. one sub-page) will force the THP to be split.
>>
>> I followed your steps in the attached program, there is no issue to do memory hot-remove
>> through virtio-mem with or without this patch.
> 
> Interesting. But I don't really see how we could pass this check with a
> page that's in the swapcache, maybe I'm missing something else.
> 
> I'll try to see if I can reproduce it.
> 

After some unsuccessful attempts and many head-scratches, I realized
why we don't have to worry about swapcache pages here; it's quite
simple:

page_mapping() is != NULL for pages in the swapcache; folio_mapping()
makes this rather obvious:

if (unlikely(folio_test_swapcache(folio)))
	return swap_address_space(folio_swap_entry(folio));


I think the get_page_unless_zero() might also be a fix for the
page_mapping() call; it smells like something could blow up on
concurrent page freeing. (What about concurrent removal from the
swapcache? Nobody knows. :))


Thanks Gavin!

Acked-by: David Hildenbrand <david@redhat.com>


-- 
Thanks,

David / dhildenb




* Re: [PATCH v2] mm: migrate: Fix THP's mapcount on isolation
  2022-11-24 14:09           ` David Hildenbrand
@ 2022-11-25  7:40             ` Zhenyu Zhang
  0 siblings, 0 replies; 10+ messages in thread
From: Zhenyu Zhang @ 2022-11-25  7:40 UTC (permalink / raw)
  To: Guowen Shan, linux-mm
  Cc: linux-kernel, akpm, william.kucharski, ziy, kirill.shutemov,
	apopple, hughd, willy, shan.gavin, David Hildenbrand

With the patch applied, I'm unable to hit the memory hot-remove failure in
the environment where the issue was initially found.

Tested-by: Zhenyu Zhang <zhenyzha@redhat.com>

On Thu, Nov 24, 2022 at 10:09 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 24.11.22 14:22, David Hildenbrand wrote:
> > On 24.11.22 13:55, Gavin Shan wrote:
> >> On 11/24/22 6:43 PM, David Hildenbrand wrote:
> >>> On 24.11.22 11:21, Gavin Shan wrote:
> >>>> On 11/24/22 6:09 PM, David Hildenbrand wrote:
> >>>>> On 24.11.22 10:55, Gavin Shan wrote:
> >>>>>> The issue is reported when removing memory through virtio_mem device.
> >>>>>> The transparent huge page, experienced copy-on-write fault, is wrongly
> >>>>>> regarded as pinned. The transparent huge page is escaped from being
> >>>>>> isolated in isolate_migratepages_block(). The transparent huge page
> >>>>>> can't be migrated and the corresponding memory block can't be put
> >>>>>> into offline state.
> >>>>>>
> >>>>>> Fix it by replacing page_mapcount() with total_mapcount(). With this,
> >>>>>> the transparent huge page can be isolated and migrated, and the memory
> >>>>>> block can be put into offline state. Besides, The page's refcount is
> >>>>>> increased a bit earlier to avoid the page is released when the check
> >>>>>> is executed.
> >>>>>
> >>>>> Did you look into handling pages that are in the swapcache case as well?
> >>>>>
> >>>>> See is_refcount_suitable() in mm/khugepaged.c.
> >>>>>
> >>>>> Should be easy to reproduce, let me know if you need inspiration.
> >>>>>
> >>>>
> >>>> Nope, I didn't look into the case. Please elaborate the details so that
> >>>> I can reproduce it firstly.
> >>>
> >>>
> >>> A simple reproducer would be (on a system with ordinary swap (not zram))
> >>>
> >>> 1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP
> >>>
> >>> 2) Enable THP for that region (MADV_HUGEPAGE)
> >>>
> >>> 3) Populate a THP (e.g., write access)
> >>>
> >>> 4) PTE-map the THP, for example, using MADV_FREE on the last subpage
> >>>
> >>> 5) Trigger swapout of the THP, for example, using MADV_PAGEOUT
> >>>
> >>> 6) Read-access to some subpages to fault them in from the swapcache
> >>>
> >>>
> >>> Now you'd have a THP, which
> >>>
> >>> 1) Is partially PTE-mapped into the page table
> >>> 2) Is in the swapcache (each subpage should have one reference from the swapache)
> >>>
> >>>
> >>> Now we could test, if alloc_contig_range() will still succeed (e.g., using virtio-mem).
> >>>
> >>
> >> Thanks for the details. Step (4) and (5) can be actually combined. To swap part of
> >> the THP (e.g. one sub-page) will force the THP to be split.
> >>
> >> I followed your steps in the attached program, there is no issue to do memory hot-remove
> >> through virtio-mem with or without this patch.
> >
> > Interesting. But I don't really see how we could pass this check with a
> > page that's in the swapcache, maybe I'm missing something else.
> >
> > I'll try to see if I can reproduce it.
> >
>
> After some unsuccessful attempts and many head-scratches, I realized
> that it's quite simple why we don't have to worry about swapcache pages
> here:
>
> page_mapping() is != NULL for pages in the swapcache: folio_mapping()
> makes this rather obvious:
>
> if (unlikely(folio_test_swapcache(folio))
>         return swap_address_space(folio_swap_entry(folio));
>
>
> I think the get_page_unless_zero() might also be a fix for the
> page_mapping() call, smells like something could blow up on concurrent
> page freeing. (what about concurrent removal from the swapcache? nobody
> knows :) )
>
>
> Thanks Gavin!
>
> Acked-by: David Hildenbrand <david@redhat.com>
>
>
> --
> Thanks,
>
> David / dhildenb
>



