* [PATCH 0/3] mm,swap: cleanup VMA based swap readahead window calculation
@ 2024-05-31  8:12 Huang Ying
From: Huang Ying @ 2024-05-31  8:12 UTC
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Huang Ying, Hugh Dickins,
	Alistair Popple, Anshuman Khandual, David Hildenbrand,
	Mel Gorman, Miaohe Lin, Minchan Kim, Ryan Roberts, Yang Shi,
	Yu Zhao, Kairui Song, Barry Song, Chris Li, Yosry Ahmed

From: "Huang Ying" <ying.huang@intel.com>

When VMA based swap readahead was introduced in commit ec560175c0b6
("mm, swap: VMA based swap readahead"), "struct vma_swap_readahead"
was defined to describe the readahead window, because we wanted to
save the PTE entries in the struct at that time.  But after commit
4f8fcf4ced0b ("mm/swap: swap_vma_readahead() do the
pte_offset_map()"), we no longer save PTE entries in the struct.  The
struct has become so small that it's better to use its fields
directly.  This simplifies the code and improves its readability, and
reduces the number of source lines too.

A theoretical underflow issue is fixed and some related code cleanup
is done in the series too.
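
For reference, the readahead window description removed by this series
is tiny after commit 4f8fcf4ced0b; as it stands in mm/swap_state.c
before this series (field roles inferred from their use):

	struct vma_swap_readahead {
		unsigned short win;	/* readahead window size, in pages */
		unsigned short offset;	/* fault page's offset within the window */
		unsigned short nr_pte;	/* number of PTEs covered by the window */
	};

Patch 2 replaces this with three plain variables passed through
swap_vma_ra_win(), and patch 3 then drops two of them in favor of a
[start, end) address range.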

Best Regards,
Huang, Ying



* [PATCH 1/3] mm,swap: fix a theoretical underflow in readahead window calculation
From: Huang Ying @ 2024-05-31  8:12 UTC
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Huang Ying, Hugh Dickins,
	Alistair Popple, Anshuman Khandual, David Hildenbrand,
	Mel Gorman, Miaohe Lin, Minchan Kim, Ryan Roberts, Yang Shi,
	Yu Zhao, Kairui Song, Barry Song, Chris Li, Yosry Ahmed

In the swap readahead window calculation, if the fault PFN is smaller
than the readahead window size, underflow may occur.  This is only
possible in theory, because the start of the virtual address space
will not be used for anonymous pages in practice.  Even if underflow
occurs, there will be no functional bugs.  In the worst case, some
swap entries may be swapped in incorrectly and some pages may be
allocated on the wrong nodes.

Anyway, we still need to fix the issue with some underflow checking.
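
To illustrate, a minimal stand-alone sketch of the wraparound (a
hypothetical userspace demo, assuming a 64-bit unsigned long; lpfn and
the added check mirror swap_ra_info()):

	#include <stdio.h>

	int main(void)
	{
		unsigned long fpfn = 1, win = 8;	/* fault PFN near address 0 */
		unsigned long left = (win - 1) / 2;	/* 3 */
		unsigned long lpfn = fpfn - left;	/* wraps to ULONG_MAX - 1 */

		if ((long)lpfn < 0)			/* the check added below */
			lpfn = 0;
		printf("lpfn = %lu\n", lpfn);		/* prints 0 with the fix */
		return 0;
	}

Without the check, max3() would pick the huge wrapped lpfn as the
window start, so end - start (and thus nr_pte) would wrap as well.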

Fixes: ec560175c0b6 ("mm, swap: VMA based swap readahead")
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Barry Song <v-songbaohua@oppo.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Yosry Ahmed <yosryahmed@google.com>
---
 mm/swap_state.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/swap_state.c b/mm/swap_state.c
index 642c30d8376c..848c167df530 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -787,6 +787,8 @@ static void swap_ra_info(struct vm_fault *vmf,
 		lpfn = fpfn - left;
 		rpfn = fpfn + win - left;
 	}
+	if ((long)lpfn < 0)
+		lpfn = 0;
 	start = max3(lpfn, PFN_DOWN(vma->vm_start),
 		     PFN_DOWN(faddr & PMD_MASK));
 	end = min3(rpfn, PFN_DOWN(vma->vm_end),
-- 
2.39.2




* [PATCH 2/3] mm,swap: remove struct vma_swap_readahead
From: Huang Ying @ 2024-05-31  8:12 UTC
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Huang Ying, Hugh Dickins,
	Alistair Popple, Anshuman Khandual, David Hildenbrand,
	Mel Gorman, Miaohe Lin, Minchan Kim, Ryan Roberts, Yang Shi,
	Yu Zhao, Kairui Song, Barry Song, Chris Li, Yosry Ahmed

When VMA based swap readahead was introduced in commit
ec560175c0b6 ("mm, swap: VMA based swap readahead"), "struct
vma_swap_readahead" was defined to describe the readahead window,
because we wanted to save the PTE entries in the struct at that time.
But after commit 4f8fcf4ced0b ("mm/swap: swap_vma_readahead() do the
pte_offset_map()"), we no longer save PTE entries in the struct.  The
struct has become so small that it's better to use its fields
directly.  This simplifies the code and improves its readability, and
reduces the number of source lines too.

No functionality change is expected in this patch.
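
Condensed from the diff below, the call-site change in
swap_vma_readahead() looks like this:

	/* Before: an on-stack struct just to carry three shorts. */
	struct vma_swap_readahead ra_info = {
		.win = 1,
	};
	swap_ra_info(vmf, &ra_info);
	if (ra_info.win == 1)
		goto skip;

	/* After: the window size is returned, the rest via out-parameters. */
	unsigned short win, nr_pte, offset;

	win = swap_vma_ra_win(vmf, &offset, &nr_pte);
	if (win == 1)
		goto skip;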

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Barry Song <v-songbaohua@oppo.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Yosry Ahmed <yosryahmed@google.com>
---
 mm/swap_state.c | 48 ++++++++++++++++++++----------------------------
 1 file changed, 20 insertions(+), 28 deletions(-)

diff --git a/mm/swap_state.c b/mm/swap_state.c
index 848c167df530..e1dac70198a6 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -42,6 +42,8 @@ struct address_space *swapper_spaces[MAX_SWAPFILES] __read_mostly;
 static unsigned int nr_swapper_spaces[MAX_SWAPFILES] __read_mostly;
 static bool enable_vma_readahead __read_mostly = true;
 
+#define SWAP_RA_ORDER_CEILING	5
+
 #define SWAP_RA_WIN_SHIFT	(PAGE_SHIFT / 2)
 #define SWAP_RA_HITS_MASK	((1UL << SWAP_RA_WIN_SHIFT) - 1)
 #define SWAP_RA_HITS_MAX	SWAP_RA_HITS_MASK
@@ -738,16 +740,9 @@ void exit_swap_address_space(unsigned int type)
 	swapper_spaces[type] = NULL;
 }
 
-#define SWAP_RA_ORDER_CEILING	5
-
-struct vma_swap_readahead {
-	unsigned short win;
-	unsigned short offset;
-	unsigned short nr_pte;
-};
-
-static void swap_ra_info(struct vm_fault *vmf,
-			 struct vma_swap_readahead *ra_info)
+static unsigned short swap_vma_ra_win(struct vm_fault *vmf,
+				      unsigned short *offset,
+				      unsigned short *nr_pte)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	unsigned long ra_val;
@@ -757,10 +752,8 @@ static void swap_ra_info(struct vm_fault *vmf,
 
 	max_win = 1 << min_t(unsigned int, READ_ONCE(page_cluster),
 			     SWAP_RA_ORDER_CEILING);
-	if (max_win == 1) {
-		ra_info->win = 1;
-		return;
-	}
+	if (max_win == 1)
+		return 1;
 
 	faddr = vmf->address;
 	fpfn = PFN_DOWN(faddr);
@@ -768,12 +761,11 @@ static void swap_ra_info(struct vm_fault *vmf,
 	pfn = PFN_DOWN(SWAP_RA_ADDR(ra_val));
 	prev_win = SWAP_RA_WIN(ra_val);
 	hits = SWAP_RA_HITS(ra_val);
-	ra_info->win = win = __swapin_nr_pages(pfn, fpfn, hits,
-					       max_win, prev_win);
+	win = __swapin_nr_pages(pfn, fpfn, hits, max_win, prev_win);
 	atomic_long_set(&vma->swap_readahead_info,
 			SWAP_RA_VAL(faddr, win, 0));
 	if (win == 1)
-		return;
+		return 1;
 
 	if (fpfn == pfn + 1) {
 		lpfn = fpfn;
@@ -794,8 +786,10 @@ static void swap_ra_info(struct vm_fault *vmf,
 	end = min3(rpfn, PFN_DOWN(vma->vm_end),
 		   PFN_DOWN((faddr & PMD_MASK) + PMD_SIZE));
 
-	ra_info->nr_pte = end - start;
-	ra_info->offset = fpfn - start;
+	*nr_pte = end - start;
+	*offset = fpfn - start;
+
+	return win;
 }
 
 /**
@@ -826,19 +820,17 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
 	pgoff_t ilx;
 	unsigned int i;
 	bool page_allocated;
-	struct vma_swap_readahead ra_info = {
-		.win = 1,
-	};
+	unsigned short win, nr_pte, offset;
 
-	swap_ra_info(vmf, &ra_info);
-	if (ra_info.win == 1)
+	win = swap_vma_ra_win(vmf, &offset, &nr_pte);
+	if (win == 1)
 		goto skip;
 
-	addr = vmf->address - (ra_info.offset * PAGE_SIZE);
-	ilx = targ_ilx - ra_info.offset;
+	addr = vmf->address - offset * PAGE_SIZE;
+	ilx = targ_ilx - offset;
 
 	blk_start_plug(&plug);
-	for (i = 0; i < ra_info.nr_pte; i++, ilx++, addr += PAGE_SIZE) {
+	for (i = 0; i < nr_pte; i++, ilx++, addr += PAGE_SIZE) {
 		if (!pte++) {
 			pte = pte_offset_map(vmf->pmd, addr);
 			if (!pte)
@@ -858,7 +850,7 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
 			continue;
 		if (page_allocated) {
 			swap_read_folio(folio, false, &splug);
-			if (i != ra_info.offset) {
+			if (i != offset) {
 				folio_set_readahead(folio);
 				count_vm_event(SWAP_RA);
 			}
-- 
2.39.2




* [PATCH 3/3] mm,swap: simplify VMA based swap readahead window calculation
From: Huang Ying @ 2024-05-31  8:12 UTC
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Huang Ying, Hugh Dickins,
	Alistair Popple, Anshuman Khandual, David Hildenbrand,
	Mel Gorman, Miaohe Lin, Minchan Kim, Ryan Roberts, Yang Shi,
	Yu Zhao, Kairui Song, Barry Song, Chris Li, Yosry Ahmed

Replace PFNs with addresses in the readahead window calculation.  This
simplifies the logic and reduces the number of source lines.

No functionality change is expected.
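
For example, in the forward-sequential case (fault address one page
after the previous fault), the old PFN based and the new address based
computations describe the same window; a condensed sketch from the
diff below:

	/* Old: PFN based, fpfn = PFN_DOWN(faddr) */
	lpfn = fpfn;
	rpfn = fpfn + win;
	start = max3(lpfn, PFN_DOWN(vma->vm_start), PFN_DOWN(faddr & PMD_MASK));

	/* New: address based, same window expressed in bytes */
	left = faddr;
	right = left + (win << PAGE_SHIFT);
	*start = max3(left, vma->vm_start, faddr & PMD_MASK);

This works because vmf->address and vma->vm_start are page aligned, so
comparing addresses directly is equivalent to comparing their PFNs,
and the conversions between the two become unnecessary.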

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Barry Song <v-songbaohua@oppo.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Yosry Ahmed <yosryahmed@google.com>
---
 mm/swap_state.c | 66 +++++++++++++++++++------------------------------
 1 file changed, 25 insertions(+), 41 deletions(-)

diff --git a/mm/swap_state.c b/mm/swap_state.c
index e1dac70198a6..d2adbd7b571b 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -740,54 +740,40 @@ void exit_swap_address_space(unsigned int type)
 	swapper_spaces[type] = NULL;
 }
 
-static unsigned short swap_vma_ra_win(struct vm_fault *vmf,
-				      unsigned short *offset,
-				      unsigned short *nr_pte)
+static int swap_vma_ra_win(struct vm_fault *vmf, unsigned long *start,
+			   unsigned long *end)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	unsigned long ra_val;
-	unsigned long faddr, pfn, fpfn, lpfn, rpfn;
-	unsigned long start, end;
+	unsigned long faddr, prev_faddr, left, right;
 	unsigned int max_win, hits, prev_win, win;
 
-	max_win = 1 << min_t(unsigned int, READ_ONCE(page_cluster),
-			     SWAP_RA_ORDER_CEILING);
+	max_win = 1 << min(READ_ONCE(page_cluster), SWAP_RA_ORDER_CEILING);
 	if (max_win == 1)
 		return 1;
 
 	faddr = vmf->address;
-	fpfn = PFN_DOWN(faddr);
 	ra_val = GET_SWAP_RA_VAL(vma);
-	pfn = PFN_DOWN(SWAP_RA_ADDR(ra_val));
+	prev_faddr = SWAP_RA_ADDR(ra_val);
 	prev_win = SWAP_RA_WIN(ra_val);
 	hits = SWAP_RA_HITS(ra_val);
-	win = __swapin_nr_pages(pfn, fpfn, hits, max_win, prev_win);
-	atomic_long_set(&vma->swap_readahead_info,
-			SWAP_RA_VAL(faddr, win, 0));
+	win = __swapin_nr_pages(PFN_DOWN(prev_faddr), PFN_DOWN(faddr), hits,
+				max_win, prev_win);
+	atomic_long_set(&vma->swap_readahead_info, SWAP_RA_VAL(faddr, win, 0));
 	if (win == 1)
 		return 1;
 
-	if (fpfn == pfn + 1) {
-		lpfn = fpfn;
-		rpfn = fpfn + win;
-	} else if (pfn == fpfn + 1) {
-		lpfn = fpfn - win + 1;
-		rpfn = fpfn + 1;
-	} else {
-		unsigned int left = (win - 1) / 2;
-
-		lpfn = fpfn - left;
-		rpfn = fpfn + win - left;
-	}
-	if ((long)lpfn < 0)
-		lpfn = 0;
-	start = max3(lpfn, PFN_DOWN(vma->vm_start),
-		     PFN_DOWN(faddr & PMD_MASK));
-	end = min3(rpfn, PFN_DOWN(vma->vm_end),
-		   PFN_DOWN((faddr & PMD_MASK) + PMD_SIZE));
-
-	*nr_pte = end - start;
-	*offset = fpfn - start;
+	if (faddr == prev_faddr + PAGE_SIZE)
+		left = faddr;
+	else if (prev_faddr == faddr + PAGE_SIZE)
+		left = faddr - (win << PAGE_SHIFT) + PAGE_SIZE;
+	else
+		left = faddr - (((win - 1) / 2) << PAGE_SHIFT);
+	right = left + (win << PAGE_SHIFT);
+	if ((long)left < 0)
+		left = 0;
+	*start = max3(left, vma->vm_start, faddr & PMD_MASK);
+	*end = min3(right, vma->vm_end, (faddr & PMD_MASK) + PMD_SIZE);
 
 	return win;
 }
@@ -815,22 +801,20 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
 	struct swap_iocb *splug = NULL;
 	struct folio *folio;
 	pte_t *pte = NULL, pentry;
-	unsigned long addr;
+	int win;
+	unsigned long start, end, addr;
 	swp_entry_t entry;
 	pgoff_t ilx;
-	unsigned int i;
 	bool page_allocated;
-	unsigned short win, nr_pte, offset;
 
-	win = swap_vma_ra_win(vmf, &offset, &nr_pte);
+	win = swap_vma_ra_win(vmf, &start, &end);
 	if (win == 1)
 		goto skip;
 
-	addr = vmf->address - offset * PAGE_SIZE;
-	ilx = targ_ilx - offset;
+	ilx = targ_ilx - PFN_DOWN(vmf->address - start);
 
 	blk_start_plug(&plug);
-	for (i = 0; i < nr_pte; i++, ilx++, addr += PAGE_SIZE) {
+	for (addr = start; addr < end; ilx++, addr += PAGE_SIZE) {
 		if (!pte++) {
 			pte = pte_offset_map(vmf->pmd, addr);
 			if (!pte)
@@ -850,7 +834,7 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
 			continue;
 		if (page_allocated) {
 			swap_read_folio(folio, false, &splug);
-			if (i != offset) {
+			if (addr != vmf->address) {
 				folio_set_readahead(folio);
 				count_vm_event(SWAP_RA);
 			}
-- 
2.39.2



