* [PATCH v3 01/16] mm/huge_memory: use flush_pmd_tlb_range in move_huge_pmd
2022-07-04 13:21 [PATCH v3 00/16] A few cleanup patches for huge_memory Miaohe Lin
@ 2022-07-04 13:21 ` Miaohe Lin
2022-07-04 13:21 ` [PATCH v3 02/16] mm/huge_memory: access vm_page_prot with READ_ONCE in remove_migration_pmd Miaohe Lin
` (14 subsequent siblings)
15 siblings, 0 replies; 21+ messages in thread
From: Miaohe Lin @ 2022-07-04 13:21 UTC (permalink / raw)
To: akpm
Cc: shy828301, willy, zokeefe, songmuchun, linux-mm, linux-kernel, linmiaohe
Architectures with special requirements for evicting THP-backing TLB
entries can implement flush_pmd_tlb_range(). Even otherwise, it can help
optimize TLB flushing in the THP regime. Use flush_pmd_tlb_range() in
move_huge_pmd() to take advantage of this.
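For reference, on architectures without their own implementation the
generic fallback (roughly, from include/linux/pgtable.h around the time
of this series) degenerates to a plain flush_tlb_range(), so this change
is a no-op there:

    #ifndef __HAVE_ARCH_FLUSH_PMD_TLB_RANGE
    #ifdef CONFIG_TRANSPARENT_HUGEPAGE
    /* no special THP requirements: fall back to the normal range flush */
    #define flush_pmd_tlb_range(vma, addr, end)  flush_tlb_range(vma, addr, end)
    #endif
    #endif

Architectures with special requirements (powerpc, for one) provide their
own targeted huge-entry flush instead.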
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Zach O'Keefe <zokeefe@google.com>
---
mm/huge_memory.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 0243105d0cc6..f4e581eefb67 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1850,7 +1850,7 @@ bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr,
pmd = move_soft_dirty_pmd(pmd);
set_pmd_at(mm, new_addr, new_pmd, pmd);
if (force_flush)
- flush_tlb_range(vma, old_addr, old_addr + PMD_SIZE);
+ flush_pmd_tlb_range(vma, old_addr, old_addr + PMD_SIZE);
if (new_ptl != old_ptl)
spin_unlock(new_ptl);
spin_unlock(old_ptl);
--
2.23.0
* [PATCH v3 02/16] mm/huge_memory: access vm_page_prot with READ_ONCE in remove_migration_pmd
2022-07-04 13:21 [PATCH v3 00/16] A few cleanup patches for huge_memory Miaohe Lin
2022-07-04 13:21 ` [PATCH v3 01/16] mm/huge_memory: use flush_pmd_tlb_range in move_huge_pmd Miaohe Lin
@ 2022-07-04 13:21 ` Miaohe Lin
2022-07-04 13:21 ` [PATCH v3 03/16] mm/huge_memory: fix comment of __pud_trans_huge_lock Miaohe Lin
` (13 subsequent siblings)
15 siblings, 0 replies; 21+ messages in thread
From: Miaohe Lin @ 2022-07-04 13:21 UTC (permalink / raw)
To: akpm
Cc: shy828301, willy, zokeefe, songmuchun, linux-mm, linux-kernel, linmiaohe
vma->vm_page_prot is read locklessly from the rmap walk and may be
updated concurrently. Use READ_ONCE() to avoid the risk of reading an
intermediate, torn value.
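As a sketch of the pairing (the updater side, e.g. vma_set_page_prot() in
mm/mmap.c, uses WRITE_ONCE(); the local variable below is illustrative):

    /* updater side: publish the new protection in one store */
    WRITE_ONCE(vma->vm_page_prot, vm_page_prot);

    /* lockless reader side: a plain load here could be torn */
    pgprot_t prot = READ_ONCE(vma->vm_page_prot);
    pmde = pmd_mkold(mk_huge_pmd(new, prot));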
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
mm/huge_memory.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f4e581eefb67..a010f9ba15ce 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3309,7 +3309,7 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
entry = pmd_to_swp_entry(*pvmw->pmd);
get_page(new);
- pmde = pmd_mkold(mk_huge_pmd(new, vma->vm_page_prot));
+ pmde = pmd_mkold(mk_huge_pmd(new, READ_ONCE(vma->vm_page_prot)));
if (pmd_swp_soft_dirty(*pvmw->pmd))
pmde = pmd_mksoft_dirty(pmde);
if (is_writable_migration_entry(entry))
--
2.23.0
* [PATCH v3 03/16] mm/huge_memory: fix comment of __pud_trans_huge_lock
2022-07-04 13:21 [PATCH v3 00/16] A few cleanup patches for huge_memory Miaohe Lin
2022-07-04 13:21 ` [PATCH v3 01/16] mm/huge_memory: use flush_pmd_tlb_range in move_huge_pmd Miaohe Lin
2022-07-04 13:21 ` [PATCH v3 02/16] mm/huge_memory: access vm_page_prot with READ_ONCE in remove_migration_pmd Miaohe Lin
@ 2022-07-04 13:21 ` Miaohe Lin
2022-07-04 13:21 ` [PATCH v3 04/16] mm/huge_memory: use helper touch_pud in huge_pud_set_accessed Miaohe Lin
` (12 subsequent siblings)
15 siblings, 0 replies; 21+ messages in thread
From: Miaohe Lin @ 2022-07-04 13:21 UTC (permalink / raw)
To: akpm
Cc: shy828301, willy, zokeefe, songmuchun, linux-mm, linux-kernel, linmiaohe
__pud_trans_huge_lock() has returned the page table lock pointer, not
'true', when a given pud maps a thp ever since it was introduced. Fix the
corresponding comments.
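The calling convention the fixed comment describes looks roughly like
this (zap_huge_pud() already follows the pattern):

    ptl = __pud_trans_huge_lock(pud, vma);
    if (!ptl)
        return 0;   /* pud does not map a thp */
    /* ... operate on the huge pud with the lock held ... */
    spin_unlock(ptl);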
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Muchun Song <songmuchun@bytedance.com>
---
mm/huge_memory.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a010f9ba15ce..212e092d8ad0 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2007,10 +2007,10 @@ spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma)
}
/*
- * Returns true if a given pud maps a thp, false otherwise.
+ * Returns page table lock pointer if a given pud maps a thp, NULL otherwise.
*
- * Note that if it returns true, this routine returns without unlocking page
- * table lock. So callers must unlock it.
+ * Note that if it returns page table lock pointer, this routine returns without
+ * unlocking page table lock. So callers must unlock it.
*/
spinlock_t *__pud_trans_huge_lock(pud_t *pud, struct vm_area_struct *vma)
{
--
2.23.0
* [PATCH v3 04/16] mm/huge_memory: use helper touch_pud in huge_pud_set_accessed
2022-07-04 13:21 [PATCH v3 00/16] A few cleanup patches for huge_memory Miaohe Lin
` (2 preceding siblings ...)
2022-07-04 13:21 ` [PATCH v3 03/16] mm/huge_memory: fix comment of __pud_trans_huge_lock Miaohe Lin
@ 2022-07-04 13:21 ` Miaohe Lin
2022-07-04 13:52 ` Muchun Song
2022-07-04 13:21 ` [PATCH v3 05/16] mm/huge_memory: use helper touch_pmd in huge_pmd_set_accessed Miaohe Lin
` (11 subsequent siblings)
15 siblings, 1 reply; 21+ messages in thread
From: Miaohe Lin @ 2022-07-04 13:21 UTC (permalink / raw)
To: akpm
Cc: shy828301, willy, zokeefe, songmuchun, linux-mm, linux-kernel, linmiaohe
Use the helper touch_pud() to mark the pud accessed, simplifying the code
and improving readability. No functional change intended.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
mm/huge_memory.c | 18 +++++-------------
1 file changed, 5 insertions(+), 13 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 212e092d8ad0..30acb3b994cf 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1285,15 +1285,15 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
static void touch_pud(struct vm_area_struct *vma, unsigned long addr,
- pud_t *pud, int flags)
+ pud_t *pud, bool write)
{
pud_t _pud;
_pud = pud_mkyoung(*pud);
- if (flags & FOLL_WRITE)
+ if (write)
_pud = pud_mkdirty(_pud);
if (pudp_set_access_flags(vma, addr & HPAGE_PUD_MASK,
- pud, _pud, flags & FOLL_WRITE))
+ pud, _pud, write))
update_mmu_cache_pud(vma, addr, pud);
}
@@ -1320,7 +1320,7 @@ struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr,
return NULL;
if (flags & FOLL_TOUCH)
- touch_pud(vma, addr, pud, flags);
+ touch_pud(vma, addr, pud, flags & FOLL_WRITE);
/*
* device mapped pages can only be returned if the
@@ -1385,21 +1385,13 @@ int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
void huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud)
{
- pud_t entry;
- unsigned long haddr;
bool write = vmf->flags & FAULT_FLAG_WRITE;
vmf->ptl = pud_lock(vmf->vma->vm_mm, vmf->pud);
if (unlikely(!pud_same(*vmf->pud, orig_pud)))
goto unlock;
- entry = pud_mkyoung(orig_pud);
- if (write)
- entry = pud_mkdirty(entry);
- haddr = vmf->address & HPAGE_PUD_MASK;
- if (pudp_set_access_flags(vmf->vma, haddr, vmf->pud, entry, write))
- update_mmu_cache_pud(vmf->vma, vmf->address, vmf->pud);
-
+ touch_pud(vmf->vma, vmf->address, vmf->pud, write);
unlock:
spin_unlock(vmf->ptl);
}
--
2.23.0
* [PATCH v3 05/16] mm/huge_memory: use helper touch_pmd in huge_pmd_set_accessed
2022-07-04 13:21 [PATCH v3 00/16] A few cleanup patches for huge_memory Miaohe Lin
` (3 preceding siblings ...)
2022-07-04 13:21 ` [PATCH v3 04/16] mm/huge_memory: use helper touch_pud in huge_pud_set_accessed Miaohe Lin
@ 2022-07-04 13:21 ` Miaohe Lin
2022-07-04 13:53 ` Muchun Song
2022-07-04 13:21 ` [PATCH v3 06/16] mm/huge_memory: rename mmun_start to haddr in remove_migration_pmd Miaohe Lin
` (10 subsequent siblings)
15 siblings, 1 reply; 21+ messages in thread
From: Miaohe Lin @ 2022-07-04 13:21 UTC (permalink / raw)
To: akpm
Cc: shy828301, willy, zokeefe, songmuchun, linux-mm, linux-kernel, linmiaohe
Use the helper touch_pmd() to mark the pmd accessed, simplifying the code
and improving readability. No functional change intended.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
mm/huge_memory.c | 22 +++++++---------------
1 file changed, 7 insertions(+), 15 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 30acb3b994cf..f9b6eb3f2215 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1121,15 +1121,15 @@ EXPORT_SYMBOL_GPL(vmf_insert_pfn_pud_prot);
#endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
static void touch_pmd(struct vm_area_struct *vma, unsigned long addr,
- pmd_t *pmd, int flags)
+ pmd_t *pmd, bool write)
{
pmd_t _pmd;
_pmd = pmd_mkyoung(*pmd);
- if (flags & FOLL_WRITE)
+ if (write)
_pmd = pmd_mkdirty(_pmd);
if (pmdp_set_access_flags(vma, addr & HPAGE_PMD_MASK,
- pmd, _pmd, flags & FOLL_WRITE))
+ pmd, _pmd, write))
update_mmu_cache_pmd(vma, addr, pmd);
}
@@ -1162,7 +1162,7 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
return NULL;
if (flags & FOLL_TOUCH)
- touch_pmd(vma, addr, pmd, flags);
+ touch_pmd(vma, addr, pmd, flags & FOLL_WRITE);
/*
* device mapped pages can only be returned if the
@@ -1399,21 +1399,13 @@ void huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud)
void huge_pmd_set_accessed(struct vm_fault *vmf)
{
- pmd_t entry;
- unsigned long haddr;
bool write = vmf->flags & FAULT_FLAG_WRITE;
- pmd_t orig_pmd = vmf->orig_pmd;
vmf->ptl = pmd_lock(vmf->vma->vm_mm, vmf->pmd);
- if (unlikely(!pmd_same(*vmf->pmd, orig_pmd)))
+ if (unlikely(!pmd_same(*vmf->pmd, vmf->orig_pmd)))
goto unlock;
- entry = pmd_mkyoung(orig_pmd);
- if (write)
- entry = pmd_mkdirty(entry);
- haddr = vmf->address & HPAGE_PMD_MASK;
- if (pmdp_set_access_flags(vmf->vma, haddr, vmf->pmd, entry, write))
- update_mmu_cache_pmd(vmf->vma, vmf->address, vmf->pmd);
+ touch_pmd(vmf->vma, vmf->address, vmf->pmd, write);
unlock:
spin_unlock(vmf->ptl);
@@ -1549,7 +1541,7 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
return ERR_PTR(-ENOMEM);
if (flags & FOLL_TOUCH)
- touch_pmd(vma, addr, pmd, flags);
+ touch_pmd(vma, addr, pmd, flags & FOLL_WRITE);
page += (addr & ~HPAGE_PMD_MASK) >> PAGE_SHIFT;
VM_BUG_ON_PAGE(!PageCompound(page) && !is_zone_device_page(page), page);
--
2.23.0
* [PATCH v3 06/16] mm/huge_memory: rename mmun_start to haddr in remove_migration_pmd
2022-07-04 13:21 [PATCH v3 00/16] A few cleanup patches for huge_memory Miaohe Lin
` (4 preceding siblings ...)
2022-07-04 13:21 ` [PATCH v3 05/16] mm/huge_memory: use helper touch_pmd in huge_pmd_set_accessed Miaohe Lin
@ 2022-07-04 13:21 ` Miaohe Lin
2022-07-04 13:21 ` [PATCH v3 07/16] mm/huge_memory: use helper function vma_lookup in split_huge_pages_pid Miaohe Lin
` (9 subsequent siblings)
15 siblings, 0 replies; 21+ messages in thread
From: Miaohe Lin @ 2022-07-04 13:21 UTC (permalink / raw)
To: akpm
Cc: shy828301, willy, zokeefe, songmuchun, linux-mm, linux-kernel, linmiaohe
mmun_start indicates an mmu_notifier start address, but there is no
mmu_notifier usage in remove_migration_pmd(), which makes the meaning of
mmun_start hard to grasp. Rename it to haddr to avoid confusing readers
and to improve readability.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
---
mm/huge_memory.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f9b6eb3f2215..f2856cfac900 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3284,7 +3284,7 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
struct vm_area_struct *vma = pvmw->vma;
struct mm_struct *mm = vma->vm_mm;
unsigned long address = pvmw->address;
- unsigned long mmun_start = address & HPAGE_PMD_MASK;
+ unsigned long haddr = address & HPAGE_PMD_MASK;
pmd_t pmde;
swp_entry_t entry;
@@ -3307,12 +3307,12 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
if (!is_readable_migration_entry(entry))
rmap_flags |= RMAP_EXCLUSIVE;
- page_add_anon_rmap(new, vma, mmun_start, rmap_flags);
+ page_add_anon_rmap(new, vma, haddr, rmap_flags);
} else {
page_add_file_rmap(new, vma, true);
}
VM_BUG_ON(pmd_write(pmde) && PageAnon(new) && !PageAnonExclusive(new));
- set_pmd_at(mm, mmun_start, pvmw->pmd, pmde);
+ set_pmd_at(mm, haddr, pvmw->pmd, pmde);
/* No need to invalidate - it was non-present before */
update_mmu_cache_pmd(vma, address, pvmw->pmd);
--
2.23.0
* [PATCH v3 07/16] mm/huge_memory: use helper function vma_lookup in split_huge_pages_pid
2022-07-04 13:21 [PATCH v3 00/16] A few cleanup patches for huge_memory Miaohe Lin
` (5 preceding siblings ...)
2022-07-04 13:21 ` [PATCH v3 06/16] mm/huge_memory: rename mmun_start to haddr in remove_migration_pmd Miaohe Lin
@ 2022-07-04 13:21 ` Miaohe Lin
2022-07-04 13:54 ` Muchun Song
2022-07-04 13:21 ` [PATCH v3 08/16] mm/huge_memory: use helper macro __ATTR_RW Miaohe Lin
` (8 subsequent siblings)
15 siblings, 1 reply; 21+ messages in thread
From: Miaohe Lin @ 2022-07-04 13:21 UTC (permalink / raw)
To: akpm
Cc: shy828301, willy, zokeefe, songmuchun, linux-mm, linux-kernel, linmiaohe
Use the helper function vma_lookup() to look up the needed vma,
simplifying the code. Minor readability improvement.
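vma_lookup() only returns a vma that actually contains the given address,
folding the open-coded bounds check into the helper; its definition in
include/linux/mm.h is roughly:

    static inline
    struct vm_area_struct *vma_lookup(struct mm_struct *mm, unsigned long addr)
    {
        struct vm_area_struct *vma = find_vma(mm, addr);

        /* find_vma() may return the next vma above addr; reject it */
        if (vma && addr < vma->vm_start)
            vma = NULL;

        return vma;
    }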
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
mm/huge_memory.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f2856cfac900..5f5123130b28 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3045,10 +3045,10 @@ static int split_huge_pages_pid(int pid, unsigned long vaddr_start,
* table filled with PTE-mapped THPs, each of which is distinct.
*/
for (addr = vaddr_start; addr < vaddr_end; addr += PAGE_SIZE) {
- struct vm_area_struct *vma = find_vma(mm, addr);
+ struct vm_area_struct *vma = vma_lookup(mm, addr);
struct page *page;
- if (!vma || addr < vma->vm_start)
+ if (!vma)
break;
/* skip special VMA and hugetlb VMA */
--
2.23.0
* [PATCH v3 08/16] mm/huge_memory: use helper macro __ATTR_RW
2022-07-04 13:21 [PATCH v3 00/16] A few cleanup patches for huge_memory Miaohe Lin
` (6 preceding siblings ...)
2022-07-04 13:21 ` [PATCH v3 07/16] mm/huge_memory: use helper function vma_lookup in split_huge_pages_pid Miaohe Lin
@ 2022-07-04 13:21 ` Miaohe Lin
2022-07-04 13:21 ` [PATCH v3 09/16] mm/huge_memory: fix comment in zap_huge_pud Miaohe Lin
` (7 subsequent siblings)
15 siblings, 0 replies; 21+ messages in thread
From: Miaohe Lin @ 2022-07-04 13:21 UTC (permalink / raw)
To: akpm
Cc: shy828301, willy, zokeefe, songmuchun, linux-mm, linux-kernel, linmiaohe
Use the helper macro __ATTR_RW to define use_zero_page_attr, defrag_attr
and enabled_attr, making the code clearer. Minor readability improvement.
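__ATTR_RW (include/linux/sysfs.h) expands to the same 0644 __ATTR() form,
deriving the callback names from the attribute name, roughly:

    /* requires handlers named <name>_show and <name>_store */
    #define __ATTR_RW(_name) __ATTR(_name, 0644, _name##_show, _name##_store)

The conversion works here because these attributes' handlers already
follow that naming convention.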
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
---
mm/huge_memory.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 5f5123130b28..32a45a1e98b7 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -277,8 +277,8 @@ static ssize_t enabled_store(struct kobject *kobj,
}
return ret;
}
-static struct kobj_attribute enabled_attr =
- __ATTR(enabled, 0644, enabled_show, enabled_store);
+
+static struct kobj_attribute enabled_attr = __ATTR_RW(enabled);
ssize_t single_hugepage_flag_show(struct kobject *kobj,
struct kobj_attribute *attr, char *buf,
@@ -367,8 +367,7 @@ static ssize_t defrag_store(struct kobject *kobj,
return count;
}
-static struct kobj_attribute defrag_attr =
- __ATTR(defrag, 0644, defrag_show, defrag_store);
+static struct kobj_attribute defrag_attr = __ATTR_RW(defrag);
static ssize_t use_zero_page_show(struct kobject *kobj,
struct kobj_attribute *attr, char *buf)
@@ -382,8 +381,7 @@ static ssize_t use_zero_page_store(struct kobject *kobj,
return single_hugepage_flag_store(kobj, attr, buf, count,
TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG);
}
-static struct kobj_attribute use_zero_page_attr =
- __ATTR(use_zero_page, 0644, use_zero_page_show, use_zero_page_store);
+static struct kobj_attribute use_zero_page_attr = __ATTR_RW(use_zero_page);
static ssize_t hpage_pmd_size_show(struct kobject *kobj,
struct kobj_attribute *attr, char *buf)
--
2.23.0
* [PATCH v3 09/16] mm/huge_memory: fix comment in zap_huge_pud
2022-07-04 13:21 [PATCH v3 00/16] A few cleanup patches for huge_memory Miaohe Lin
` (7 preceding siblings ...)
2022-07-04 13:21 ` [PATCH v3 08/16] mm/huge_memory: use helper macro __ATTR_RW Miaohe Lin
@ 2022-07-04 13:21 ` Miaohe Lin
2022-07-04 13:21 ` [PATCH v3 10/16] mm/huge_memory: check pmd_present first in is_huge_zero_pmd Miaohe Lin
` (6 subsequent siblings)
15 siblings, 0 replies; 21+ messages in thread
From: Miaohe Lin @ 2022-07-04 13:21 UTC (permalink / raw)
To: akpm
Cc: shy828301, willy, zokeefe, songmuchun, linux-mm, linux-kernel, linmiaohe
The comment about the deposited pgtable was borrowed from zap_huge_pmd(),
but there is no deposited pgtable for a huge pud in zap_huge_pud(). Remove
it to avoid confusion.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
mm/huge_memory.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 32a45a1e98b7..8a40dc8edb7a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2014,12 +2014,7 @@ int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma,
ptl = __pud_trans_huge_lock(pud, vma);
if (!ptl)
return 0;
- /*
- * For architectures like ppc64 we look at deposited pgtable
- * when calling pudp_huge_get_and_clear. So do the
- * pgtable_trans_huge_withdraw after finishing pudp related
- * operations.
- */
+
pudp_huge_get_and_clear_full(tlb->mm, addr, pud, tlb->fullmm);
tlb_remove_pud_tlb_entry(tlb, pud, addr);
if (vma_is_special_huge(vma)) {
--
2.23.0
* [PATCH v3 10/16] mm/huge_memory: check pmd_present first in is_huge_zero_pmd
2022-07-04 13:21 [PATCH v3 00/16] A few cleanup patches for huge_memory Miaohe Lin
` (8 preceding siblings ...)
2022-07-04 13:21 ` [PATCH v3 09/16] mm/huge_memory: fix comment in zap_huge_pud Miaohe Lin
@ 2022-07-04 13:21 ` Miaohe Lin
2022-07-04 13:21 ` [PATCH v3 11/16] mm/huge_memory: try to free subpage in swapcache when possible Miaohe Lin
` (5 subsequent siblings)
15 siblings, 0 replies; 21+ messages in thread
From: Miaohe Lin @ 2022-07-04 13:21 UTC (permalink / raw)
To: akpm
Cc: shy828301, willy, zokeefe, songmuchun, linux-mm, linux-kernel, linmiaohe
When the pmd is non-present (e.g. a migration entry), its pfn bits hold
part of a swap entry encoding, so pmd_pfn() returns a meaningless value.
Check pmd_present() first so the && short-circuit avoids acquiring such a
value, and also avoids touching the possibly cold huge_zero_pfn cache
line when the pmd isn't present.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
---
include/linux/huge_mm.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index ae3d8e2fd9e2..12b297f9951d 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -273,7 +273,7 @@ static inline bool is_huge_zero_page(struct page *page)
static inline bool is_huge_zero_pmd(pmd_t pmd)
{
- return READ_ONCE(huge_zero_pfn) == pmd_pfn(pmd) && pmd_present(pmd);
+ return pmd_present(pmd) && READ_ONCE(huge_zero_pfn) == pmd_pfn(pmd);
}
static inline bool is_huge_zero_pud(pud_t pud)
--
2.23.0
* [PATCH v3 11/16] mm/huge_memory: try to free subpage in swapcache when possible
2022-07-04 13:21 [PATCH v3 00/16] A few cleanup patches for huge_memory Miaohe Lin
` (9 preceding siblings ...)
2022-07-04 13:21 ` [PATCH v3 10/16] mm/huge_memory: check pmd_present first in is_huge_zero_pmd Miaohe Lin
@ 2022-07-04 13:21 ` Miaohe Lin
2022-07-04 13:21 ` [PATCH v3 12/16] mm/huge_memory: minor cleanup for split_huge_pages_all Miaohe Lin
` (4 subsequent siblings)
15 siblings, 0 replies; 21+ messages in thread
From: Miaohe Lin @ 2022-07-04 13:21 UTC (permalink / raw)
To: akpm
Cc: shy828301, willy, zokeefe, songmuchun, linux-mm, linux-kernel, linmiaohe
Subpages left in the swap cache won't be freed, even when the last user
puts the page, until the next reclaim pass. That shouldn't really hurt,
but we can try to free these pages right away to save more memory for the
system.
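free_page_and_swap_cache() tries to drop the swap cache reference before
putting the page, so an otherwise-unused subpage is freed immediately;
abridged, its shape in mm/swap_state.c is roughly:

    void free_page_and_swap_cache(struct page *page)
    {
        /* try to remove the page from the swap cache (needs it unmapped) */
        free_swap_cache(page);
        put_page(page);
    }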
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
mm/huge_memory.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 8a40dc8edb7a..6d95751ebfc9 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2643,7 +2643,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
* requires taking the lru_lock so we do the put_page
* of the tail pages after the split is complete.
*/
- put_page(subpage);
+ free_page_and_swap_cache(subpage);
}
}
--
2.23.0
* [PATCH v3 12/16] mm/huge_memory: minor cleanup for split_huge_pages_all
2022-07-04 13:21 [PATCH v3 00/16] A few cleanup patches for huge_memory Miaohe Lin
` (10 preceding siblings ...)
2022-07-04 13:21 ` [PATCH v3 11/16] mm/huge_memory: try to free subpage in swapcache when possible Miaohe Lin
@ 2022-07-04 13:21 ` Miaohe Lin
2022-07-04 13:21 ` [PATCH v3 13/16] mm/huge_memory: fix comment of page_deferred_list Miaohe Lin
` (3 subsequent siblings)
15 siblings, 0 replies; 21+ messages in thread
From: Miaohe Lin @ 2022-07-04 13:21 UTC (permalink / raw)
To: akpm
Cc: shy828301, willy, zokeefe, songmuchun, linux-mm, linux-kernel, linmiaohe
There is nothing to do if a zone doesn't have any pages managed by the
buddy allocator, so check managed_zone() instead of populated_zone().
Also, when a thp is found, there is no need to traverse its subpages
again.
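The difference between the two zone predicates (include/linux/mmzone.h)
is roughly:

    /* has pages the buddy allocator actually manages */
    static inline bool managed_zone(struct zone *zone)
    {
        return zone_managed_pages(zone);
    }

    /* merely has physical pages present, managed or not */
    static inline bool populated_zone(struct zone *zone)
    {
        return zone->present_pages;
    }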
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
mm/huge_memory.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 6d95751ebfc9..77be7dec1420 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2961,9 +2961,12 @@ static void split_huge_pages_all(void)
unsigned long total = 0, split = 0;
pr_debug("Split all THPs\n");
- for_each_populated_zone(zone) {
+ for_each_zone(zone) {
+ if (!managed_zone(zone))
+ continue;
max_zone_pfn = zone_end_pfn(zone);
for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++) {
+ int nr_pages;
if (!pfn_valid(pfn))
continue;
@@ -2979,8 +2982,10 @@ static void split_huge_pages_all(void)
total++;
lock_page(page);
+ nr_pages = thp_nr_pages(page);
if (!split_huge_page(page))
split++;
+ pfn += nr_pages - 1;
unlock_page(page);
next:
put_page(page);
--
2.23.0
* [PATCH v3 13/16] mm/huge_memory: fix comment of page_deferred_list
2022-07-04 13:21 [PATCH v3 00/16] A few cleanup patches for huge_memory Miaohe Lin
` (11 preceding siblings ...)
2022-07-04 13:21 ` [PATCH v3 12/16] mm/huge_memory: minor cleanup for split_huge_pages_all Miaohe Lin
@ 2022-07-04 13:21 ` Miaohe Lin
2022-07-04 13:55 ` Muchun Song
2022-07-04 13:21 ` [PATCH v3 14/16] mm/huge_memory: correct comment of prep_transhuge_page Miaohe Lin
` (2 subsequent siblings)
15 siblings, 1 reply; 21+ messages in thread
From: Miaohe Lin @ 2022-07-04 13:21 UTC (permalink / raw)
To: akpm
Cc: shy828301, willy, zokeefe, songmuchun, linux-mm, linux-kernel, linmiaohe
The current comment is confusing: if the global or memcg deferred list
in the second tail page is occupied by compound_head, why would we still
use page[2].deferred_list here? What it means to say is that the
corresponding space in the first tail page is occupied by
compound_mapcount and compound_pincount, so the second tail page's
deferred_list is used instead.
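Abridged from the struct page definition around the time of this series,
the relevant tail-page layout is roughly:

    struct {    /* First tail page of compound page */
        unsigned long compound_head;    /* Bit zero is set */
        ...
        atomic_t compound_mapcount;
        atomic_t compound_pincount;
    };
    struct {    /* Second tail page of compound page */
        unsigned long _compound_pad_1;  /* compound_head */
        unsigned long _compound_pad_2;
        /* For both global and memcg */
        struct list_head deferred_list;
    };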
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
include/linux/huge_mm.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 12b297f9951d..37f2f11a6d7e 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -294,8 +294,8 @@ static inline bool thp_migration_supported(void)
static inline struct list_head *page_deferred_list(struct page *page)
{
/*
- * Global or memcg deferred list in the second tail pages is
- * occupied by compound_head.
+ * See organization of tail pages of compound page in
+ * "struct page" definition.
*/
return &page[2].deferred_list;
}
--
2.23.0
* Re: [PATCH v3 13/16] mm/huge_memory: fix comment of page_deferred_list
2022-07-04 13:21 ` [PATCH v3 13/16] mm/huge_memory: fix comment of page_deferred_list Miaohe Lin
@ 2022-07-04 13:55 ` Muchun Song
0 siblings, 0 replies; 21+ messages in thread
From: Muchun Song @ 2022-07-04 13:55 UTC (permalink / raw)
To: Miaohe Lin; +Cc: akpm, shy828301, willy, zokeefe, linux-mm, linux-kernel
On Mon, Jul 04, 2022 at 09:21:58PM +0800, Miaohe Lin wrote:
> The current comment is confusing: if the global or memcg deferred list
> in the second tail page is occupied by compound_head, why would we still
> use page[2].deferred_list here? What it means to say is that the
> corresponding space in the first tail page is occupied by
> compound_mapcount and compound_pincount, so the second tail page's
> deferred_list is used instead.
>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Thanks.
* [PATCH v3 14/16] mm/huge_memory: correct comment of prep_transhuge_page
2022-07-04 13:21 [PATCH v3 00/16] A few cleanup patches for huge_memory Miaohe Lin
` (12 preceding siblings ...)
2022-07-04 13:21 ` [PATCH v3 13/16] mm/huge_memory: fix comment of page_deferred_list Miaohe Lin
@ 2022-07-04 13:21 ` Miaohe Lin
2022-07-04 13:22 ` [PATCH v3 15/16] mm/huge_memory: comment the subtle logic in __split_huge_pmd Miaohe Lin
2022-07-04 13:22 ` [PATCH v3 16/16] mm/huge_memory: use helper macro IS_ERR_OR_NULL in split_huge_pages_pid Miaohe Lin
15 siblings, 0 replies; 21+ messages in thread
From: Miaohe Lin @ 2022-07-04 13:21 UTC (permalink / raw)
To: akpm
Cc: shy828301, willy, zokeefe, songmuchun, linux-mm, linux-kernel, linmiaohe
We use page->mapping and page->index, not page->indexlru, in the second
tail page as a list_head. Correct the comment.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
---
mm/huge_memory.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 77be7dec1420..36f3fc2e7306 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -682,7 +682,7 @@ static inline void split_queue_unlock_irqrestore(struct deferred_split *queue,
void prep_transhuge_page(struct page *page)
{
/*
- * we use page->mapping and page->indexlru in second tail page
+ * we use page->mapping and page->index in second tail page
* as list_head: assuming THP order >= 2
*/
--
2.23.0
* [PATCH v3 15/16] mm/huge_memory: comment the subtle logic in __split_huge_pmd
2022-07-04 13:21 [PATCH v3 00/16] A few cleanup patches for huge_memory Miaohe Lin
` (13 preceding siblings ...)
2022-07-04 13:21 ` [PATCH v3 14/16] mm/huge_memory: correct comment of prep_transhuge_page Miaohe Lin
@ 2022-07-04 13:22 ` Miaohe Lin
2022-07-04 13:22 ` [PATCH v3 16/16] mm/huge_memory: use helper macro IS_ERR_OR_NULL in split_huge_pages_pid Miaohe Lin
15 siblings, 0 replies; 21+ messages in thread
From: Miaohe Lin @ 2022-07-04 13:22 UTC (permalink / raw)
To: akpm
Cc: shy828301, willy, zokeefe, songmuchun, linux-mm, linux-kernel, linmiaohe
It would be dangerous and wrong to call page_folio(pmd_page(*pmd)) when
the pmd isn't present, but the caller guarantees the pmd is present
whenever folio is set, so we are safe here. Add a comment to make that
clear.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
mm/huge_memory.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 36f3fc2e7306..8380912b39fd 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2336,6 +2336,10 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) ||
is_pmd_migration_entry(*pmd)) {
+ /*
+ * It's safe to call pmd_page when folio is set because it's
+ * guaranteed that pmd is present.
+ */
if (folio && folio != page_folio(pmd_page(*pmd)))
goto out;
__split_huge_pmd_locked(vma, pmd, range.start, freeze);
--
2.23.0
* [PATCH v3 16/16] mm/huge_memory: use helper macro IS_ERR_OR_NULL in split_huge_pages_pid
2022-07-04 13:21 [PATCH v3 00/16] A few cleanup patches for huge_memory Miaohe Lin
` (14 preceding siblings ...)
2022-07-04 13:22 ` [PATCH v3 15/16] mm/huge_memory: comment the subtle logic in __split_huge_pmd Miaohe Lin
@ 2022-07-04 13:22 ` Miaohe Lin
15 siblings, 0 replies; 21+ messages in thread
From: Miaohe Lin @ 2022-07-04 13:22 UTC (permalink / raw)
To: akpm
Cc: shy828301, willy, zokeefe, songmuchun, linux-mm, linux-kernel, linmiaohe
Use the helper macro IS_ERR_OR_NULL to check the validity of the page,
simplifying the code. Minor readability improvement.
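IS_ERR_OR_NULL() (include/linux/err.h) folds both checks into one,
roughly:

    static inline bool __must_check IS_ERR_OR_NULL(__force const void *ptr)
    {
        return unlikely(!ptr) || IS_ERR_VALUE((unsigned long)ptr);
    }

Either way, the combined condition still short-circuits before
is_zone_device_page() can see an error or NULL pointer.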
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
---
mm/huge_memory.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 8380912b39fd..fd9d502aadc4 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3062,9 +3062,7 @@ static int split_huge_pages_pid(int pid, unsigned long vaddr_start,
/* FOLL_DUMP to ignore special (like zero) pages */
page = follow_page(vma, addr, FOLL_GET | FOLL_DUMP);
- if (IS_ERR(page))
- continue;
- if (!page || is_zone_device_page(page))
+ if (IS_ERR_OR_NULL(page) || is_zone_device_page(page))
continue;
if (!is_transparent_hugepage(page))
--
2.23.0