From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ua1-f71.google.com (mail-ua1-f71.google.com [209.85.222.71]) by kanga.kvack.org (Postfix) with ESMTP id 7C82D6B02CF for ; Wed, 31 Oct 2018 02:18:36 -0400 (EDT) Received: by mail-ua1-f71.google.com with SMTP id y3so805841uao.23 for ; Tue, 30 Oct 2018 23:18:36 -0700 (PDT) Received: from mail-sor-f65.google.com (mail-sor-f65.google.com. [209.85.220.65]) by mx.google.com with SMTPS id 188sor6908741vsi.53.2018.10.30.23.18.35 for (Google Transport Security); Tue, 30 Oct 2018 23:18:35 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <1539621759-5967-4-git-send-email-schwidefsky@de.ibm.com> References: <1539621759-5967-1-git-send-email-schwidefsky@de.ibm.com> <1539621759-5967-4-git-send-email-schwidefsky@de.ibm.com> From: Li Wang Date: Wed, 31 Oct 2018 14:18:33 +0800 Message-ID: Subject: Re: [PATCH 3/3] s390/mm: fix mis-accounting of pgtable_bytes Content-Type: multipart/alternative; boundary="000000000000a832be0579804529" Sender: owner-linux-mm@kvack.org List-ID: To: Martin Schwidefsky Cc: Guenter Roeck , Janosch Frank , "Kirill A. Shutemov" , Heiko Carstens , linux-kernel , Linux-MM --000000000000a832be0579804529 Content-Type: text/plain; charset="UTF-8" On Tue, Oct 16, 2018 at 12:42 AM, Martin Schwidefsky wrote: > In case a fork or a clone system fails in copy_process and the error > handling does the mmput() at the bad_fork_cleanup_mm label, the > following warning messages will appear on the console: > > BUG: non-zero pgtables_bytes on freeing mm: 16384 > > The reason for that is the tricks we play with mm_inc_nr_puds() and > mm_inc_nr_pmds() in init_new_context(). > > A normal 64-bit process has 3 levels of page table, the p4d level and > the pud level are folded. On process termination the free_pud_range() > function in mm/memory.c will subtract 16KB from pgtable_bytes with a > mm_dec_nr_puds() call, but there actually is not really a pud table. > > One issue with this is the fact that pgtable_bytes is usually off > by a few kilobytes, but the more severe problem is that for a failed > fork or clone the free_pgtables() function is not called. In this case > there is no mm_dec_nr_puds() or mm_dec_nr_pmds() that go together with > the mm_inc_nr_puds() and mm_inc_nr_pmds in init_new_context(). > The pgtable_bytes will be off by 16384 or 32768 bytes and we get the > BUG message. The message itself is purely cosmetic, but annoying. > > To fix this override the mm_pmd_folded, mm_pud_folded and mm_p4d_folded > function to check for the true size of the address space. > I can confirm that it works to the problem, the warning message is gone after applying this patch on s390x. And I also done ltp syscalls/cve test for the patch set on x86_64 arch, there has no new regression. Tested-by: Li Wang > Reported-by: Li Wang > Signed-off-by: Martin Schwidefsky > --- > arch/s390/include/asm/mmu_context.h | 5 ----- > arch/s390/include/asm/pgalloc.h | 6 +++--- > arch/s390/include/asm/pgtable.h | 18 ++++++++++++++++++ > arch/s390/include/asm/tlb.h | 6 +++--- > 4 files changed, 24 insertions(+), 11 deletions(-) > > diff --git a/arch/s390/include/asm/mmu_context.h > b/arch/s390/include/asm/mmu_context.h > index 0717ee76885d..f1ab9420ccfb 100644 > --- a/arch/s390/include/asm/mmu_context.h > +++ b/arch/s390/include/asm/mmu_context.h > @@ -45,8 +45,6 @@ static inline int init_new_context(struct task_struct > *tsk, > mm->context.asce_limit = STACK_TOP_MAX; > mm->context.asce = __pa(mm->pgd) | _ASCE_TABLE_LENGTH | > _ASCE_USER_BITS | _ASCE_TYPE_REGION3; > - /* pgd_alloc() did not account this pud */ > - mm_inc_nr_puds(mm); > break; > case -PAGE_SIZE: > /* forked 5-level task, set new asce with new_mm->pgd */ > @@ -62,9 +60,6 @@ static inline int init_new_context(struct task_struct > *tsk, > /* forked 2-level compat task, set new asce with new > mm->pgd */ > mm->context.asce = __pa(mm->pgd) | _ASCE_TABLE_LENGTH | > _ASCE_USER_BITS | _ASCE_TYPE_SEGMENT; > - /* pgd_alloc() did not account this pmd */ > - mm_inc_nr_pmds(mm); > - mm_inc_nr_puds(mm); > } > crst_table_init((unsigned long *) mm->pgd, pgd_entry_type(mm)); > return 0; > diff --git a/arch/s390/include/asm/pgalloc.h > b/arch/s390/include/asm/pgalloc.h > index f0f9bcf94c03..5ee733720a57 100644 > --- a/arch/s390/include/asm/pgalloc.h > +++ b/arch/s390/include/asm/pgalloc.h > @@ -36,11 +36,11 @@ static inline void crst_table_init(unsigned long > *crst, unsigned long entry) > > static inline unsigned long pgd_entry_type(struct mm_struct *mm) > { > - if (mm->context.asce_limit <= _REGION3_SIZE) > + if (mm_pmd_folded(mm)) > return _SEGMENT_ENTRY_EMPTY; > - if (mm->context.asce_limit <= _REGION2_SIZE) > + if (mm_pud_folded(mm)) > return _REGION3_ENTRY_EMPTY; > - if (mm->context.asce_limit <= _REGION1_SIZE) > + if (mm_p4d_folded(mm)) > return _REGION2_ENTRY_EMPTY; > return _REGION1_ENTRY_EMPTY; > } > diff --git a/arch/s390/include/asm/pgtable.h > b/arch/s390/include/asm/pgtable.h > index 0e7cb0dc9c33..de05466ce50c 100644 > --- a/arch/s390/include/asm/pgtable.h > +++ b/arch/s390/include/asm/pgtable.h > @@ -485,6 +485,24 @@ static inline int is_module_addr(void *addr) > _REGION_ENTRY_PROTECT | \ > _REGION_ENTRY_NOEXEC) > > +static inline bool mm_p4d_folded(struct mm_struct *mm) > +{ > + return mm->context.asce_limit <= _REGION1_SIZE; > +} > +#define mm_p4d_folded(mm) mm_p4d_folded(mm) > + > +static inline bool mm_pud_folded(struct mm_struct *mm) > +{ > + return mm->context.asce_limit <= _REGION2_SIZE; > +} > +#define mm_pud_folded(mm) mm_pud_folded(mm) > + > +static inline bool mm_pmd_folded(struct mm_struct *mm) > +{ > + return mm->context.asce_limit <= _REGION3_SIZE; > +} > +#define mm_pmd_folded(mm) mm_pmd_folded(mm) > + > static inline int mm_has_pgste(struct mm_struct *mm) > { > #ifdef CONFIG_PGSTE > diff --git a/arch/s390/include/asm/tlb.h b/arch/s390/include/asm/tlb.h > index 457b7ba0fbb6..b31c779cf581 100644 > --- a/arch/s390/include/asm/tlb.h > +++ b/arch/s390/include/asm/tlb.h > @@ -136,7 +136,7 @@ static inline void pte_free_tlb(struct mmu_gather > *tlb, pgtable_t pte, > static inline void pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd, > unsigned long address) > { > - if (tlb->mm->context.asce_limit <= _REGION3_SIZE) > + if (mm_pmd_folded(tlb->mm)) > return; > pgtable_pmd_page_dtor(virt_to_page(pmd)); > tlb_remove_table(tlb, pmd); > @@ -152,7 +152,7 @@ static inline void pmd_free_tlb(struct mmu_gather > *tlb, pmd_t *pmd, > static inline void p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d, > unsigned long address) > { > - if (tlb->mm->context.asce_limit <= _REGION1_SIZE) > + if (mm_p4d_folded(tlb->mm)) > return; > tlb_remove_table(tlb, p4d); > } > @@ -167,7 +167,7 @@ static inline void p4d_free_tlb(struct mmu_gather > *tlb, p4d_t *p4d, > static inline void pud_free_tlb(struct mmu_gather *tlb, pud_t *pud, > unsigned long address) > { > - if (tlb->mm->context.asce_limit <= _REGION2_SIZE) > + if (mm_pud_folded(tlb->mm)) > return; > tlb_remove_table(tlb, pud); > } > -- > 2.16.4 > > -- Regards, Li Wang --000000000000a832be0579804529 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


On Tue, Oct 16, 2018 at 12:42 AM, Martin Schwidefsky <schwidef= sky@de.ibm.com> wrote:
In case a fork or a clone system fails in copy_process and t= he error
handling does the mmput() at the bad_fork_cleanup_mm label, the
following warning messages will appear on the console:

=C2=A0 BUG: non-zero pgtables_bytes on freeing mm: 16384

The reason for that is the tricks we play with mm_inc_nr_puds() and
mm_inc_nr_pmds() in init_new_context().

A normal 64-bit process has 3 levels of page table, the p4d level and
the pud level are folded. On process termination the free_pud_range()
function in mm/memory.c will subtract 16KB from pgtable_bytes with a
mm_dec_nr_puds() call, but there actually is not really a pud table.

One issue with this is the fact that pgtable_bytes is usually off
by a few kilobytes, but the more severe problem is that for a failed
fork or clone the free_pgtables() function is not called. In this case
there is no mm_dec_nr_puds() or mm_dec_nr_pmds() that go together with
the mm_inc_nr_puds() and mm_inc_nr_pmds in init_new_context().
The pgtable_bytes will be off by 16384 or 32768 bytes and we get the
BUG message. The message itself is purely cosmetic, but annoying.

To fix this override the mm_pmd_folded, mm_pud_folded and mm_p4d_folded
function to check for the true size of the address space.
<= div>
I= can confirm that it works to the problem, the warning message is gone afte= r applying this patch on s390x. And I also done ltp syscalls/cve test for t= he patch set on x86_64 arch, there has no new regression.
<= br>
Tested= -by: Li Wang <liwang@redhat.com>

<= /div>

Reported-by: Li Wang <liwang@= redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---
=C2=A0arch/s390/include/asm/mmu_context.h |=C2=A0 5 -----
=C2=A0arch/s390/include/asm/pgalloc.h=C2=A0 =C2=A0 =C2=A0|=C2=A0 6 +++= ---
=C2=A0arch/s390/include/asm/pgtable.h=C2=A0 =C2=A0 =C2=A0| 18 ++++++++= ++++++++++
=C2=A0arch/s390/include/asm/tlb.h=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0|=C2=A0 = 6 +++---
=C2=A04 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/arch/s390/include/asm/mmu_context.h b/arch/s390/include/a= sm/mmu_context.h
index 0717ee76885d..f1ab9420ccfb 100644
--- a/arch/s390/include/asm/mmu_context.h
+++ b/arch/s390/include/asm/mmu_context.h
@@ -45,8 +45,6 @@ static inline int init_new_context(struct task_struct *ts= k,
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 mm->context.asce= _limit =3D STACK_TOP_MAX;
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 mm->context.asce= =3D __pa(mm->pgd) | _ASCE_TABLE_LENGTH |
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0_ASCE_USER_BITS | _ASCE= _TYPE_REGION3;
-=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/* pgd_alloc() did = not account this pud */
-=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0mm_inc_nr_puds(mm);=
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 break;
=C2=A0 =C2=A0 =C2=A0 =C2=A0 case -PAGE_SIZE:
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 /* forked 5-level t= ask, set new asce with new_mm->pgd */
@@ -62,9 +60,6 @@ static inline int init_new_context(struct task_struct *ts= k,
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 /* forked 2-level c= ompat task, set new asce with new mm->pgd */
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 mm->context.asce= =3D __pa(mm->pgd) | _ASCE_TABLE_LENGTH |
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0_ASCE_USER_BITS | _ASCE= _TYPE_SEGMENT;
-=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/* pgd_alloc() did = not account this pmd */
-=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0mm_inc_nr_pmds(mm);=
-=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0mm_inc_nr_puds(mm);=
=C2=A0 =C2=A0 =C2=A0 =C2=A0 }
=C2=A0 =C2=A0 =C2=A0 =C2=A0 crst_table_init((unsigned long *) mm->pgd, p= gd_entry_type(mm));
=C2=A0 =C2=A0 =C2=A0 =C2=A0 return 0;
diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/p= galloc.h
index f0f9bcf94c03..5ee733720a57 100644
--- a/arch/s390/include/asm/pgalloc.h
+++ b/arch/s390/include/asm/pgalloc.h
@@ -36,11 +36,11 @@ static inline void crst_table_init(unsigned long *crst,= unsigned long entry)

=C2=A0static inline unsigned long pgd_entry_type(struct mm_struct *mm)
=C2=A0{
-=C2=A0 =C2=A0 =C2=A0 =C2=A0if (mm->context.asce_limit <=3D _REGION3_= SIZE)
+=C2=A0 =C2=A0 =C2=A0 =C2=A0if (mm_pmd_folded(mm))
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 return _SEGMENT_ENT= RY_EMPTY;
-=C2=A0 =C2=A0 =C2=A0 =C2=A0if (mm->context.asce_limit <=3D _REGION2_= SIZE)
+=C2=A0 =C2=A0 =C2=A0 =C2=A0if (mm_pud_folded(mm))
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 return _REGION3_ENT= RY_EMPTY;
-=C2=A0 =C2=A0 =C2=A0 =C2=A0if (mm->context.asce_limit <=3D _REGION1_= SIZE)
+=C2=A0 =C2=A0 =C2=A0 =C2=A0if (mm_p4d_folded(mm))
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 return _REGION2_ENT= RY_EMPTY;
=C2=A0 =C2=A0 =C2=A0 =C2=A0 return _REGION1_ENTRY_EMPTY;
=C2=A0}
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/p= gtable.h
index 0e7cb0dc9c33..de05466ce50c 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -485,6 +485,24 @@ static inline int is_module_addr(void *addr)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0_REGION_ENTRY_PROTECT |= \
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0_REGION_ENTRY_NOEXEC)
+static inline bool mm_p4d_folded(struct mm_struct *mm)
+{
+=C2=A0 =C2=A0 =C2=A0 =C2=A0return mm->context.asce_limit <=3D _REGIO= N1_SIZE;
+}
+#define mm_p4d_folded(mm) mm_p4d_folded(mm)
+
+static inline bool mm_pud_folded(struct mm_struct *mm)
+{
+=C2=A0 =C2=A0 =C2=A0 =C2=A0return mm->context.asce_limit <=3D _REGIO= N2_SIZE;
+}
+#define mm_pud_folded(mm) mm_pud_folded(mm)
+
+static inline bool mm_pmd_folded(struct mm_struct *mm)
+{
+=C2=A0 =C2=A0 =C2=A0 =C2=A0return mm->context.asce_limit <=3D _REGIO= N3_SIZE;
+}
+#define mm_pmd_folded(mm) mm_pmd_folded(mm)
+
=C2=A0static inline int mm_has_pgste(struct mm_struct *mm)
=C2=A0{
=C2=A0#ifdef CONFIG_PGSTE
diff --git a/arch/s390/include/asm/tlb.h b/arch/s390/include/asm/tlb.h
index 457b7ba0fbb6..b31c779cf581 100644
--- a/arch/s390/include/asm/tlb.h
+++ b/arch/s390/include/asm/tlb.h
@@ -136,7 +136,7 @@ static inline void pte_free_tlb(struct mmu_gather *tlb,= pgtable_t pte,
=C2=A0static inline void pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd, =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 unsigned long address)
=C2=A0{
-=C2=A0 =C2=A0 =C2=A0 =C2=A0if (tlb->mm->context.asce_limit <=3D _= REGION3_SIZE)
+=C2=A0 =C2=A0 =C2=A0 =C2=A0if (mm_pmd_folded(tlb->mm))
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 return;
=C2=A0 =C2=A0 =C2=A0 =C2=A0 pgtable_pmd_page_dtor(virt_to_page(pmd));<= br> =C2=A0 =C2=A0 =C2=A0 =C2=A0 tlb_remove_table(tlb, pmd);
@@ -152,7 +152,7 @@ static inline void pmd_free_tlb(struct mmu_gather *tlb,= pmd_t *pmd,
=C2=A0static inline void p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d, =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 unsigned long address)
=C2=A0{
-=C2=A0 =C2=A0 =C2=A0 =C2=A0if (tlb->mm->context.asce_limit <=3D _= REGION1_SIZE)
+=C2=A0 =C2=A0 =C2=A0 =C2=A0if (mm_p4d_folded(tlb->mm))
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 return;
=C2=A0 =C2=A0 =C2=A0 =C2=A0 tlb_remove_table(tlb, p4d);
=C2=A0}
@@ -167,7 +167,7 @@ static inline void p4d_free_tlb(struct mmu_gather *tlb,= p4d_t *p4d,
=C2=A0static inline void pud_free_tlb(struct mmu_gather *tlb, pud_t *pud, =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 unsigned long address)
=C2=A0{
-=C2=A0 =C2=A0 =C2=A0 =C2=A0if (tlb->mm->context.asce_limit <=3D _= REGION2_SIZE)
+=C2=A0 =C2=A0 =C2=A0 =C2=A0if (mm_pud_folded(tlb->mm))
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 return;
=C2=A0 =C2=A0 =C2=A0 =C2=A0 tlb_remove_table(tlb, pud);
=C2=A0}
-= -
2.16.4




--
Regards,
Li Wang
--000000000000a832be0579804529--