From: Christophe Leroy <christophe.leroy@csgroup.eu>
To: Chih-En Lin <shiyn.lin@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Christian Brauner <brauner@kernel.org>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	William Kucharski <william.kucharski@oracle.com>,
	John Hubbard <jhubbard@nvidia.com>,
	Yunsheng Lin <linyunsheng@huawei.com>,
	Arnd Bergmann <arnd@arndb.de>,
	Suren Baghdasaryan <surenb@google.com>,
	Colin Cross <ccross@google.com>, Feng Tang <feng.tang@intel.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Mike Rapoport <rppt@kernel.org>,
	Geert Uytterhoeven <geert@linux-m68k.org>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>,
	Daniel Axtens <dja@axtens.net>,
	Jonathan Marek <jonathan@marek.ca>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	Peter Xu <peterx@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Andy Lutomirski <luto@kernel.org>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Fenghua Yu <fenghua.yu@intel.com>,
	David Hildenbrand <david@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Kaiyang Zhao <zhao776@purdue.edu>,
	Huichun Feng <foxhoundsk.tw@gmail.com>,
	Jim Huang <jserv.tw@gmail.com>
Subject: Re: [RFC PATCH 4/6] mm: Add COW PTE fallback function
Date: Fri, 20 May 2022 14:21:54 +0000
Message-ID: <68c8a99e-52b5-9bbf-4847-3337165d99a8@csgroup.eu>
In-Reply-To: <20220519183127.3909598-5-shiyn.lin@gmail.com>



On 19/05/2022 at 20:31, Chih-En Lin wrote:
> The lifetime of COW PTE will handle by ownership and a reference count.
> When the process wants to write the COW PTE, which reference count is 1,
> it will reuse the COW PTE instead of copying then free.
> 
> Only the owner will update its RSS state and the record of page table
> bytes allocation. So we need to handle when the non-owner process gets
> the fallback COW PTE.
> 
> This commit prepares for the following implementation of the reference
> count for COW PTE.
> 
> Signed-off-by: Chih-En Lin <shiyn.lin@gmail.com>
> ---
>   mm/memory.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 66 insertions(+)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 76e3af9639d9..dcb678cbb051 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1000,6 +1000,34 @@ page_copy_prealloc(struct mm_struct *src_mm, struct vm_area_struct *vma,
>          return new_page;
>   }
> 
> +static inline void cow_pte_rss(struct mm_struct *mm, struct vm_area_struct *vma,
> +       pmd_t *pmdp, unsigned long addr, unsigned long end, bool inc_dec)

Parenthesis alignment is not correct: the continuation line should start
just after the opening parenthesis.

You should run 'scripts/checkpatch.pl --strict' on your patch.

> +{
> +       int rss[NR_MM_COUNTERS];
> +       pte_t *orig_ptep, *ptep;
> +       struct page *page;
> +
> +       init_rss_vec(rss);
> +
> +       ptep = pte_offset_map(pmdp, addr);
> +       orig_ptep = ptep;
> +       arch_enter_lazy_mmu_mode();
> +       do {
> +               if (pte_none(*ptep) || pte_special(*ptep))
> +                       continue;
> +
> +               page = vm_normal_page(vma, addr, *ptep);
> +               if (page) {
> +                       if (inc_dec)
> +                               rss[mm_counter(page)]++;
> +                       else
> +                               rss[mm_counter(page)]--;
> +               }
> +       } while (ptep++, addr += PAGE_SIZE, addr != end);
> +       arch_leave_lazy_mmu_mode();
> +       add_mm_rss_vec(mm, rss);
> +}
> +
>   static int
>   copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
>                 pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr,
> @@ -4554,6 +4582,44 @@ static vm_fault_t wp_huge_pud(struct vm_fault *vmf, pud_t orig_pud)
>          return VM_FAULT_FALLBACK;
>   }
> 
> +/* COW PTE fallback to normal PTE:
> + * - two state here
> + *   - After break child :   [parent, rss=1, ref=1, write=NO , owner=parent]
> + *                        to [parent, rss=1, ref=1, write=YES, owner=NULL  ]
> + *   - After break parent:   [child , rss=0, ref=1, write=NO , owner=NULL  ]
> + *                        to [child , rss=1, ref=1, write=YES, owner=NULL  ]
> + */
> +void cow_pte_fallback(struct vm_area_struct *vma, pmd_t *pmd,
> +               unsigned long addr)

A non-static function should have a prototype in a header.

You are encouraged to run 'make mm/memory.o C=2' to check sparse reports.
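
A minimal sketch of the shape being asked for, written as a standalone file
so it can be compiled outside the kernel. The forward declaration and the
pmd_t typedef are stand-ins for the real kernel types, the function body is
elided, and which header should carry the prototype is left open here:

```c
/* Stand-ins for the real kernel types (assumption, for this sketch only). */
struct vm_area_struct;
typedef struct { unsigned long pmd; } pmd_t;

/*
 * Prototype, as it would appear in a header. With this declaration
 * visible, sparse has no reason to ask whether the symbol should be
 * static.
 */
void cow_pte_fallback(struct vm_area_struct *vma, pmd_t *pmd,
		      unsigned long addr);

/* Definition in the .c file; the real body is elided in this sketch. */
void cow_pte_fallback(struct vm_area_struct *vma, pmd_t *pmd,
		      unsigned long addr)
{
	(void)vma;
	(void)pmd;
	(void)addr;
}
```

Note that the continuation lines are aligned just after the opening
parenthesis, which is what 'checkpatch.pl --strict' checks for.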

> +{
> +       struct mm_struct *mm = vma->vm_mm;
> +       unsigned long start, end;
> +       pmd_t new;
> +
> +       BUG_ON(pmd_write(*pmd));

You seem to add a lot of BUG_ON()s. Are they really necessary? See
https://docs.kernel.org/process/deprecated.html?highlight=bug_on#bug-and-bug-on

You may also use VM_BUG_ON().
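
As a rough illustration of the alternative that document recommends, the
entry check could warn and bail out instead of taking the machine down.
The sketch below is a userspace model only: WARN_ON() here is a stand-in
that, like the kernel macro, returns the condition it tested, and
struct fake_pmd, warn_on() and fallback_checked() are hypothetical names
that are not part of the patch. Whether returning early is actually safe
at this point in cow_pte_fallback() is for the patch author to judge.

```c
#include <stdbool.h>
#include <stdio.h>

static int warn_count;	/* number of warnings emitted so far */

/* Stand-in for the kernel's WARN_ON(): report and return the condition. */
static bool warn_on(bool cond, const char *expr)
{
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", expr);
	}
	return cond;
}
#define WARN_ON(cond) warn_on((cond), #cond)

/* Hypothetical model of a PMD entry: only the write bit matters here. */
struct fake_pmd {
	bool writable;
};

/* Hypothetical model of the entry check in cow_pte_fallback(). */
static int fallback_checked(struct fake_pmd *pmd)
{
	/* Instead of BUG_ON(pmd_write(*pmd)): warn and let the caller cope. */
	if (WARN_ON(pmd->writable))
		return -1;

	pmd->writable = true;	/* models pmd_mkwrite() + set_pmd_at() */
	return 0;
}
```

In the kernel the same pattern would read
'if (WARN_ON_ONCE(pmd_write(*pmd))) return;'.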

> +
> +       start = addr & PMD_MASK;
> +       end = (addr + PMD_SIZE) & PMD_MASK;
> +
> +       /* If pmd is not owner, it needs to increase the rss.
> +        * Since only the owner has the RSS state for the COW PTE.
> +        */
> +       if (!cow_pte_owner_is_same(pmd, pmd)) {
> +               cow_pte_rss(mm, vma, pmd, start, end, true /* inc */);
> +               mm_inc_nr_ptes(mm);
> +               smp_wmb();
> +               pmd_populate(mm, pmd, pmd_page(*pmd));
> +       }
> +
> +       /* Reuse the pte page */
> +       set_cow_pte_owner(pmd, NULL);
> +       new = pmd_mkwrite(*pmd);
> +       set_pmd_at(mm, addr, pmd, new);
> +
> +       BUG_ON(!pmd_write(*pmd));
> +       BUG_ON(pmd_page(*pmd)->cow_pte_owner);
> +}
> +
>   /*
>    * These routines also need to handle stuff like marking pages dirty
>    * and/or accessed for architectures that don't do it in hardware (most
> --
> 2.36.1
> 
