From: Suren Baghdasaryan <surenb@google.com>
Date: Mon, 28 Apr 2025 16:31:07 -0700
Subject: Re: [PATCH v3 3/4] mm: move dup_mmap() to mm
To: "Liam R. Howlett", Lorenzo Stoakes, Andrew Morton, Vlastimil Babka,
    Jann Horn, Pedro Falcato, David Hildenbrand, Kees Cook, Alexander Viro,
    Christian Brauner, Jan Kara, Suren Baghdasaryan, linux-mm@kvack.org,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Howlett" , Lorenzo Stoakes , Andrew Morton , Vlastimil Babka , Jann Horn , Pedro Falcato , David Hildenbrand , Kees Cook , Alexander Viro , Christian Brauner , Jan Kara , Suren Baghdasaryan , linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: 5E2B41C0008 X-Rspam-User: X-Rspamd-Server: rspam07 X-Stat-Signature: h3b5ebcxypyy7gfo9pukbihjtm8e6mdd X-HE-Tag: 1745883079-606206 X-HE-Meta: U2FsdGVkX18eVUhQyyHh7kwKLrzBHUTpmUH7yYXYiVZFjtOYxSxt1Q5yrOC84DPAWb/Gl7355WbXY3SZ5igeee/L/MvZSvsb9kgVP7/nzwTcXH7y8vM0fnMOLH8QGT3RxJZE/UzNbL0SfZAVqzBPRi0VNZrlFZzrbrJIWYKW7E3BFxSQmx+Tot/TBfktRVcK2bxWFeBVZvjWP8O0vhQRZ/o64U1/gO72pavVWUZSTItP4w9/P3bfQbrI+MKFoNDOGv+51t03Lo+Nh/d29N1pkMGOmwVSxxaQyugNkYpIr6lojWG66w0wSMPlNjHsTH45s/CqJXRWL4McNYERHEZZutmgVaa/EhpB1lpI9pfdN9JQhU9L3OqUhRijb24M7U+hAfIXOu7HyhvTIuBG0ZpKBy0YAYC9n/i21xGMfPaV7YvyaSb+Xvim2MlzyarVwKmUMpZmh8rU8pKu5iway7pZH+vXsA7kAHj2UWyZPrZN3n0lC6fv/l8x2IZuMMLbiFqbY1dwslcvg+Rx8scNuEKR2XqzYOnGNojgp4V5EIVsLGzs3uj6GzzYq4aShu0ggnBFjcUgekg0WyzfBA0wk9sf2D+KpB6p1bmW5aytmFj4fQH3szqZSYVjCgq3ZQyzqR8afldT/AdaxKnSbkoM3UAbd6hv2/3ORwMskhzKhPusD5caRmdHn0L1cxYvSh96MTKAdGhcfqszU0YbINxxK42WfHMPQu73JL59+TNCTv4Yr5Lyznu/Kt4nWqheJ8z6PRPz/0eilxAFZNKPnC5S5vkTNVB4eKNoYHxNo9/ptuuOE/uYqGgx3eKpHkf61xr3mAq738IvocJI/4kqGrKI1oY/G4WsSH+7D36bL0hO/gSpodPbCGbHwC7YpqoRXjhREh/8gmSDF8UNj4seFM1JYf34cn/1c2aWuZremUQ8UqZkdCmbTRYvHJZVl1RVCTGA5EvnoZu901z0PrHlLBgP6Yz 3H3OIRFH chnNaxqz5QKVYZSinFa0M/kfaMi7K+Iav6zTARvfUwrk89TnjaT8742vaZ6hxC8G7WsijpY4vZWr/g0bo7rfF2pNqbox1SkuWaUNJ5+aBjHolxGfuK1V1Oi5imbwyVAhnWfmRvsb8Qt6GLFFHIvx+14xfTvdkO0PeVu/ELLGWqiq7UDzPk4xIApYLjoVVxxTfBvgPs/hvnTrG8Fo1u4GBfPzddOxN8cV66esXyT8BIxl1tBA4d9+SKrj6pxeLOrOkttcGh0xz4/RGmhcAEyRN5Wsyt73GjL9dah5L X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Mon, Apr 28, 2025 at 12:13=E2=80=AFPM Liam R. Howlett wrote: > > * Lorenzo Stoakes [250428 11:28]: > > This is a key step in our being able to abstract and isolate VMA alloca= tion > > and destruction logic. > > > > This function is the last one where vm_area_free() and vm_area_dup() ar= e > > directly referenced outside of mmap, so having this in mm allows us to > > isolate these. > > > > We do the same for the nommu version which is substantially simpler. > > > > We place the declaration for dup_mmap() in mm/internal.h and have > > kernel/fork.c import this in order to prevent improper use of this > > functionality elsewhere in the kernel. > > > > While we're here, we remove the useless #ifdef CONFIG_MMU check around > > mmap_read_lock_maybe_expand() in mmap.c, mmap.c is compiled only if > > CONFIG_MMU is set. > > > > Signed-off-by: Lorenzo Stoakes > > Suggested-by: Pedro Falcato > > Reviewed-by: Pedro Falcato > > Reviewed-by: Liam R. Howlett Reviewed-by: Suren Baghdasaryan > > > --- > > kernel/fork.c | 189 ++------------------------------------------------ > > mm/internal.h | 2 + > > mm/mmap.c | 181 +++++++++++++++++++++++++++++++++++++++++++++-- > > mm/nommu.c | 8 +++ > > 4 files changed, 189 insertions(+), 191 deletions(-) > > > > diff --git a/kernel/fork.c b/kernel/fork.c > > index 168681fc4b25..ac9f9267a473 100644 > > --- a/kernel/fork.c > > +++ b/kernel/fork.c > > @@ -112,6 +112,9 @@ > > #include > > #include > > > > +/* For dup_mmap(). 
> > +#include "../mm/internal.h"
> > +
> >  #include
> >
> >  #define CREATE_TRACE_POINTS
> > @@ -589,7 +592,7 @@ void free_task(struct task_struct *tsk)
> >  }
> >  EXPORT_SYMBOL(free_task);
> >
> > -static void dup_mm_exe_file(struct mm_struct *mm, struct mm_struct *oldmm)
> > +void dup_mm_exe_file(struct mm_struct *mm, struct mm_struct *oldmm)
> >  {
> >         struct file *exe_file;
> >
> > @@ -604,183 +607,6 @@ static void dup_mm_exe_file(struct mm_struct *mm, struct mm_struct *oldmm)
> >  }
> >
> >  #ifdef CONFIG_MMU
> > -static __latent_entropy int dup_mmap(struct mm_struct *mm,
> > -                                    struct mm_struct *oldmm)
> > -{
> > -       struct vm_area_struct *mpnt, *tmp;
> > -       int retval;
> > -       unsigned long charge = 0;
> > -       LIST_HEAD(uf);
> > -       VMA_ITERATOR(vmi, mm, 0);
> > -
> > -       if (mmap_write_lock_killable(oldmm))
> > -               return -EINTR;
> > -       flush_cache_dup_mm(oldmm);
> > -       uprobe_dup_mmap(oldmm, mm);
> > -       /*
> > -        * Not linked in yet - no deadlock potential:
> > -        */
> > -       mmap_write_lock_nested(mm, SINGLE_DEPTH_NESTING);
> > -
> > -       /* No ordering required: file already has been exposed. */
> > -       dup_mm_exe_file(mm, oldmm);
> > -
> > -       mm->total_vm = oldmm->total_vm;
> > -       mm->data_vm = oldmm->data_vm;
> > -       mm->exec_vm = oldmm->exec_vm;
> > -       mm->stack_vm = oldmm->stack_vm;
> > -
> > -       /* Use __mt_dup() to efficiently build an identical maple tree. */
> > -       retval = __mt_dup(&oldmm->mm_mt, &mm->mm_mt, GFP_KERNEL);
> > -       if (unlikely(retval))
> > -               goto out;
> > -
> > -       mt_clear_in_rcu(vmi.mas.tree);
> > -       for_each_vma(vmi, mpnt) {
> > -               struct file *file;
> > -
> > -               vma_start_write(mpnt);
> > -               if (mpnt->vm_flags & VM_DONTCOPY) {
> > -                       retval = vma_iter_clear_gfp(&vmi, mpnt->vm_start,
> > -                                                   mpnt->vm_end, GFP_KERNEL);
> > -                       if (retval)
> > -                               goto loop_out;
> > -
> > -                       vm_stat_account(mm, mpnt->vm_flags, -vma_pages(mpnt));
> > -                       continue;
> > -               }
> > -               charge = 0;
> > -               /*
> > -                * Don't duplicate many vmas if we've been oom-killed (for
> > -                * example)
> > -                */
> > -               if (fatal_signal_pending(current)) {
> > -                       retval = -EINTR;
> > -                       goto loop_out;
> > -               }
> > -               if (mpnt->vm_flags & VM_ACCOUNT) {
> > -                       unsigned long len = vma_pages(mpnt);
> > -
> > -                       if (security_vm_enough_memory_mm(oldmm, len)) /* sic */
> > -                               goto fail_nomem;
> > -                       charge = len;
> > -               }
> > -               tmp = vm_area_dup(mpnt);
> > -               if (!tmp)
> > -                       goto fail_nomem;
> > -
> > -               /* track_pfn_copy() will later take care of copying internal state. */
> > -               if (unlikely(tmp->vm_flags & VM_PFNMAP))
> > -                       untrack_pfn_clear(tmp);
> > -
> > -               retval = vma_dup_policy(mpnt, tmp);
> > -               if (retval)
> > -                       goto fail_nomem_policy;
> > -               tmp->vm_mm = mm;
> > -               retval = dup_userfaultfd(tmp, &uf);
> > -               if (retval)
> > -                       goto fail_nomem_anon_vma_fork;
> > -               if (tmp->vm_flags & VM_WIPEONFORK) {
> > -                       /*
> > -                        * VM_WIPEONFORK gets a clean slate in the child.
> > -                        * Don't prepare anon_vma until fault since we don't
> > -                        * copy page for current vma.
> > -                        */
> > -                       tmp->anon_vma = NULL;
> > -               } else if (anon_vma_fork(tmp, mpnt))
> > -                       goto fail_nomem_anon_vma_fork;
> > -               vm_flags_clear(tmp, VM_LOCKED_MASK);
> > -               /*
> > -                * Copy/update hugetlb private vma information.
> > -                */
> > -               if (is_vm_hugetlb_page(tmp))
> > -                       hugetlb_dup_vma_private(tmp);
> > -
> > -               /*
> > -                * Link the vma into the MT. After using __mt_dup(), memory
> > -                * allocation is not necessary here, so it cannot fail.
> > -                */
> > -               vma_iter_bulk_store(&vmi, tmp);
> > -
> > -               mm->map_count++;
> > -
> > -               if (tmp->vm_ops && tmp->vm_ops->open)
> > -                       tmp->vm_ops->open(tmp);
> > -
> > -               file = tmp->vm_file;
> > -               if (file) {
> > -                       struct address_space *mapping = file->f_mapping;
> > -
> > -                       get_file(file);
> > -                       i_mmap_lock_write(mapping);
> > -                       if (vma_is_shared_maywrite(tmp))
> > -                               mapping_allow_writable(mapping);
> > -                       flush_dcache_mmap_lock(mapping);
> > -                       /* insert tmp into the share list, just after mpnt */
> > -                       vma_interval_tree_insert_after(tmp, mpnt,
> > -                                                      &mapping->i_mmap);
> > -                       flush_dcache_mmap_unlock(mapping);
> > -                       i_mmap_unlock_write(mapping);
> > -               }
> > -
> > -               if (!(tmp->vm_flags & VM_WIPEONFORK))
> > -                       retval = copy_page_range(tmp, mpnt);
> > -
> > -               if (retval) {
> > -                       mpnt = vma_next(&vmi);
> > -                       goto loop_out;
> > -               }
> > -       }
> > -       /* a new mm has just been created */
> > -       retval = arch_dup_mmap(oldmm, mm);
> > -loop_out:
> > -       vma_iter_free(&vmi);
> > -       if (!retval) {
> > -               mt_set_in_rcu(vmi.mas.tree);
> > -               ksm_fork(mm, oldmm);
> > -               khugepaged_fork(mm, oldmm);
> > -       } else {
> > -
> > -               /*
> > -                * The entire maple tree has already been duplicated. If the
> > -                * mmap duplication fails, mark the failure point with
> > -                * XA_ZERO_ENTRY. In exit_mmap(), if this marker is encountered,
> > -                * stop releasing VMAs that have not been duplicated after this
> > -                * point.
> > -                */
> > -               if (mpnt) {
> > -                       mas_set_range(&vmi.mas, mpnt->vm_start, mpnt->vm_end - 1);
> > -                       mas_store(&vmi.mas, XA_ZERO_ENTRY);
> > -                       /* Avoid OOM iterating a broken tree */
> > -                       set_bit(MMF_OOM_SKIP, &mm->flags);
> > -               }
> > -               /*
> > -                * The mm_struct is going to exit, but the locks will be dropped
> > -                * first.  Set the mm_struct as unstable is advisable as it is
> > -                * not fully initialised.
> > -                */
> > -               set_bit(MMF_UNSTABLE, &mm->flags);
> > -       }
> > -out:
> > -       mmap_write_unlock(mm);
> > -       flush_tlb_mm(oldmm);
> > -       mmap_write_unlock(oldmm);
> > -       if (!retval)
> > -               dup_userfaultfd_complete(&uf);
> > -       else
> > -               dup_userfaultfd_fail(&uf);
> > -       return retval;
> > -
> > -fail_nomem_anon_vma_fork:
> > -       mpol_put(vma_policy(tmp));
> > -fail_nomem_policy:
> > -       vm_area_free(tmp);
> > -fail_nomem:
> > -       retval = -ENOMEM;
> > -       vm_unacct_memory(charge);
> > -       goto loop_out;
> > -}
> > -
> >  static inline int mm_alloc_pgd(struct mm_struct *mm)
> >  {
> >         mm->pgd = pgd_alloc(mm);
> > @@ -794,13 +620,6 @@ static inline void mm_free_pgd(struct mm_struct *mm)
> >         pgd_free(mm, mm->pgd);
> >  }
> >  #else
> > -static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
> > -{
> > -       mmap_write_lock(oldmm);
> > -       dup_mm_exe_file(mm, oldmm);
> > -       mmap_write_unlock(oldmm);
> > -       return 0;
> > -}
> >  #define mm_alloc_pgd(mm)       (0)
> >  #define mm_free_pgd(mm)
> >  #endif /* CONFIG_MMU */
> > diff --git a/mm/internal.h b/mm/internal.h
> > index 40464f755092..b3e011976f74 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -1631,5 +1631,7 @@ static inline bool reclaim_pt_is_enabled(unsigned long start, unsigned long end,
> >  }
> >  #endif /* CONFIG_PT_RECLAIM */
> >
> > +void dup_mm_exe_file(struct mm_struct *mm, struct mm_struct *oldmm);
> > +int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm);
> >
> >  #endif /* __MM_INTERNAL_H */
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 9e09eac0021c..5259df031e15 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -1675,7 +1675,6 @@ static int __meminit init_reserve_notifier(void)
> >  }
> >  subsys_initcall(init_reserve_notifier);
> >
> > -#ifdef CONFIG_MMU
> >  /*
> >   * Obtain a read lock on mm->mmap_lock, if the specified address is below the
> >   * start of the VMA, the intent is to perform a write, and it is a
> > @@ -1719,10 +1718,180 @@ bool mmap_read_lock_maybe_expand(struct mm_struct *mm,
> >         mmap_write_downgrade(mm);
> >         return true;
> >  }
> > -#else
> > -bool mmap_read_lock_maybe_expand(struct mm_struct *mm, struct vm_area_struct *vma,
> > -                                unsigned long addr, bool write)
> > +
> > +__latent_entropy int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
> >  {
> > -       return false;
> > +       struct vm_area_struct *mpnt, *tmp;
> > +       int retval;
> > +       unsigned long charge = 0;
> > +       LIST_HEAD(uf);
> > +       VMA_ITERATOR(vmi, mm, 0);
> > +
> > +       if (mmap_write_lock_killable(oldmm))
> > +               return -EINTR;
> > +       flush_cache_dup_mm(oldmm);
> > +       uprobe_dup_mmap(oldmm, mm);
> > +       /*
> > +        * Not linked in yet - no deadlock potential:
> > +        */
> > +       mmap_write_lock_nested(mm, SINGLE_DEPTH_NESTING);
> > +
> > +       /* No ordering required: file already has been exposed. */
> > +       dup_mm_exe_file(mm, oldmm);
> > +
> > +       mm->total_vm = oldmm->total_vm;
> > +       mm->data_vm = oldmm->data_vm;
> > +       mm->exec_vm = oldmm->exec_vm;
> > +       mm->stack_vm = oldmm->stack_vm;
> > +
> > +       /* Use __mt_dup() to efficiently build an identical maple tree. */
> > +       retval = __mt_dup(&oldmm->mm_mt, &mm->mm_mt, GFP_KERNEL);
> > +       if (unlikely(retval))
> > +               goto out;
> > +
> > +       mt_clear_in_rcu(vmi.mas.tree);
> > +       for_each_vma(vmi, mpnt) {
> > +               struct file *file;
> > +
> > +               vma_start_write(mpnt);
> > +               if (mpnt->vm_flags & VM_DONTCOPY) {
> > +                       retval = vma_iter_clear_gfp(&vmi, mpnt->vm_start,
> > +                                                   mpnt->vm_end, GFP_KERNEL);
> > +                       if (retval)
> > +                               goto loop_out;
> > +
> > +                       vm_stat_account(mm, mpnt->vm_flags, -vma_pages(mpnt));
> > +                       continue;
> > +               }
> > +               charge = 0;
> > +               /*
> > +                * Don't duplicate many vmas if we've been oom-killed (for
> > +                * example)
> > +                */
> > +               if (fatal_signal_pending(current)) {
> > +                       retval = -EINTR;
> > +                       goto loop_out;
> > +               }
> > +               if (mpnt->vm_flags & VM_ACCOUNT) {
> > +                       unsigned long len = vma_pages(mpnt);
> > +
> > +                       if (security_vm_enough_memory_mm(oldmm, len)) /* sic */
> > +                               goto fail_nomem;
> > +                       charge = len;
> > +               }
> > +
> > +               tmp = vm_area_dup(mpnt);
> > +               if (!tmp)
> > +                       goto fail_nomem;
> > +
> > +               /* track_pfn_copy() will later take care of copying internal state. */
> > +               if (unlikely(tmp->vm_flags & VM_PFNMAP))
> > +                       untrack_pfn_clear(tmp);
> > +
> > +               retval = vma_dup_policy(mpnt, tmp);
> > +               if (retval)
> > +                       goto fail_nomem_policy;
> > +               tmp->vm_mm = mm;
> > +               retval = dup_userfaultfd(tmp, &uf);
> > +               if (retval)
> > +                       goto fail_nomem_anon_vma_fork;
> > +               if (tmp->vm_flags & VM_WIPEONFORK) {
> > +                       /*
> > +                        * VM_WIPEONFORK gets a clean slate in the child.
> > +                        * Don't prepare anon_vma until fault since we don't
> > +                        * copy page for current vma.
> > +                        */
> > +                       tmp->anon_vma = NULL;
> > +               } else if (anon_vma_fork(tmp, mpnt))
> > +                       goto fail_nomem_anon_vma_fork;
> > +               vm_flags_clear(tmp, VM_LOCKED_MASK);
> > +               /*
> > +                * Copy/update hugetlb private vma information.
> > +                */
> > +               if (is_vm_hugetlb_page(tmp))
> > +                       hugetlb_dup_vma_private(tmp);
> > +
> > +               /*
> > +                * Link the vma into the MT. After using __mt_dup(), memory
> > +                * allocation is not necessary here, so it cannot fail.
> > +                */
> > +               vma_iter_bulk_store(&vmi, tmp);
> > +
> > +               mm->map_count++;
> > +
> > +               if (tmp->vm_ops && tmp->vm_ops->open)
> > +                       tmp->vm_ops->open(tmp);
> > +
> > +               file = tmp->vm_file;
> > +               if (file) {
> > +                       struct address_space *mapping = file->f_mapping;
> > +
> > +                       get_file(file);
> > +                       i_mmap_lock_write(mapping);
> > +                       if (vma_is_shared_maywrite(tmp))
> > +                               mapping_allow_writable(mapping);
> > +                       flush_dcache_mmap_lock(mapping);
> > +                       /* insert tmp into the share list, just after mpnt */
> > +                       vma_interval_tree_insert_after(tmp, mpnt,
> > +                                                      &mapping->i_mmap);
> > +                       flush_dcache_mmap_unlock(mapping);
> > +                       i_mmap_unlock_write(mapping);
> > +               }
> > +
> > +               if (!(tmp->vm_flags & VM_WIPEONFORK))
> > +                       retval = copy_page_range(tmp, mpnt);
> > +
> > +               if (retval) {
> > +                       mpnt = vma_next(&vmi);
> > +                       goto loop_out;
> > +               }
> > +       }
> > +       /* a new mm has just been created */
> > +       retval = arch_dup_mmap(oldmm, mm);
> > +loop_out:
> > +       vma_iter_free(&vmi);
> > +       if (!retval) {
> > +               mt_set_in_rcu(vmi.mas.tree);
> > +               ksm_fork(mm, oldmm);
> > +               khugepaged_fork(mm, oldmm);
> > +       } else {
> > +
> > +               /*
> > +                * The entire maple tree has already been duplicated. If the
> > +                * mmap duplication fails, mark the failure point with
> > +                * XA_ZERO_ENTRY. In exit_mmap(), if this marker is encountered,
> > +                * stop releasing VMAs that have not been duplicated after this
> > +                * point.
> > +                */
> > +               if (mpnt) {
> > +                       mas_set_range(&vmi.mas, mpnt->vm_start, mpnt->vm_end - 1);
> > +                       mas_store(&vmi.mas, XA_ZERO_ENTRY);
> > +                       /* Avoid OOM iterating a broken tree */
> > +                       set_bit(MMF_OOM_SKIP, &mm->flags);
> > +               }
> > +               /*
> > +                * The mm_struct is going to exit, but the locks will be dropped
> > +                * first.  Set the mm_struct as unstable is advisable as it is
> > +                * not fully initialised.
> > +                */
> > +               set_bit(MMF_UNSTABLE, &mm->flags);
> > +       }
> > +out:
> > +       mmap_write_unlock(mm);
> > +       flush_tlb_mm(oldmm);
> > +       mmap_write_unlock(oldmm);
> > +       if (!retval)
> > +               dup_userfaultfd_complete(&uf);
> > +       else
> > +               dup_userfaultfd_fail(&uf);
> > +       return retval;
> > +
> > +fail_nomem_anon_vma_fork:
> > +       mpol_put(vma_policy(tmp));
> > +fail_nomem_policy:
> > +       vm_area_free(tmp);
> > +fail_nomem:
> > +       retval = -ENOMEM;
> > +       vm_unacct_memory(charge);
> > +       goto loop_out;
> >  }
> > -#endif
> > diff --git a/mm/nommu.c b/mm/nommu.c
> > index 2b4d304c6445..a142fc258d39 100644
> > --- a/mm/nommu.c
> > +++ b/mm/nommu.c
> > @@ -1874,3 +1874,11 @@ static int __meminit init_admin_reserve(void)
> >         return 0;
> >  }
> >  subsys_initcall(init_admin_reserve);
> > +
> > +int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
> > +{
> > +       mmap_write_lock(oldmm);
> > +       dup_mm_exe_file(mm, oldmm);
> > +       mmap_write_unlock(oldmm);
> > +       return 0;
> > +}
> > --
> > 2.49.0
> >