From: Suren Baghdasaryan <surenb@google.com>
Date: Tue, 6 Sep 2022 13:13:05 -0700
Subject: Re: [RFC PATCH RESEND 06/28] mm: mark VMA as locked whenever vma->vm_flags are modified
In-Reply-To: <20220906195949.7nln7y6urs6rfyyd@revolver>
To: Liam Howlett
Cc: Laurent Dufour, Andrew Morton, Michel
 Lespinasse, Jerome Glisse, Michal Hocko, Vlastimil Babka,
 Johannes Weiner, Mel Gorman, Davidlohr Bueso, Matthew Wilcox,
 Peter Zijlstra, "Paul E. McKenney", Andy Lutomirski, Song Liu,
 Peter Xu, David Hildenbrand, dhowells@redhat.com, Hugh Dickins,
 Sebastian Andrzej Siewior, Kent Overstreet, David Rientjes,
 Axel Rasmussen, Joel Fernandes, Minchan Kim, kernel-team, linux-mm,
 linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org,
 x86@kernel.org, LKML

On Tue, Sep 6, 2022 at 1:00 PM Liam Howlett wrote:
>
> * Suren Baghdasaryan [220906 15:01]:
> > On Tue, Sep 6, 2022 at 7:27 AM Laurent Dufour wrote:
> > >
> > > On 01/09/2022 at 19:34, Suren Baghdasaryan wrote:
> > > > VMA flag modifications should be done under VMA lock to prevent
> > > > concurrent page fault handling in that area.
> > > >
> > > > Signed-off-by: Suren Baghdasaryan
> > > > ---
> > > >  fs/proc/task_mmu.c | 1 +
> > > >  fs/userfaultfd.c   | 6 ++++++
> > > >  mm/madvise.c       | 1 +
> > > >  mm/mlock.c         | 2 ++
> > > >  mm/mmap.c          | 1 +
> > > >  mm/mprotect.c      | 1 +
> > > >  6 files changed, 12 insertions(+)
> > >
> > > There are a few changes also done in driver space, for instance:
> > >
> > > *** arch/x86/kernel/cpu/sgx/driver.c:
> > > sgx_mmap[98]        vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND |
> > >                                      VM_DONTDUMP | VM_IO;
> > > *** arch/x86/kernel/cpu/sgx/virt.c:
> > > sgx_vepc_mmap[108]  vma->vm_flags |= VM_PFNMAP | VM_IO |
> > >                                      VM_DONTDUMP | VM_DONTCOPY;
> > > *** drivers/dax/device.c:
> > > dax_mmap[311]       vma->vm_flags |= VM_HUGEPAGE;
> > >
> > > I guess these changes to vm_flags should be protected as well, or be
> > > checked one by one.
> >
> > Thanks for noting these! I'll add the necessary locking here and will
> > look for other places I might have missed (a sketch of the one-line
> > pattern for such call sites is at the end of this mail).
>
> Would an inline set/clear bit function be worthwhile for vm_flags? If
> it is, then a name change to vm_flags may get the compiler to catch any
> missed cases. There don't seem to be many cases (12 insertions) so
> maybe not.

That would probably simplify the maintenance of the flags in the future,
and we could add vma_mark_locked() directly in the set/clear functions.
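Something along these lines, maybe (untested and only to illustrate the
idea; the names vm_flags_set()/vm_flags_clear() and the rename of
vm_flags to __vm_flags, which would make any remaining direct write fail
to build, are made up here and not part of this series):

static inline void vm_flags_set(struct vm_area_struct *vma,
				unsigned long flags)
{
	/* vm_flags writers must hold mmap_lock in write mode */
	mmap_assert_write_locked(vma->vm_mm);
	/* exclude concurrent page faults handled under the per-VMA lock */
	vma_mark_locked(vma);
	vma->__vm_flags |= flags;
}

static inline void vm_flags_clear(struct vm_area_struct *vma,
				  unsigned long flags)
{
	mmap_assert_write_locked(vma->vm_mm);
	vma_mark_locked(vma);
	vma->__vm_flags &= ~flags;
}

With the field renamed, leftover "vma->vm_flags |= ..." spots like the
driver ones Laurent found above would become compile errors instead of
being silently missed.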
> >
> > >
> > > >
> > > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> > > > index 4e0023643f8b..ceffa5c2c650 100644
> > > > --- a/fs/proc/task_mmu.c
> > > > +++ b/fs/proc/task_mmu.c
> > > > @@ -1285,6 +1285,7 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
> > > >                 for (vma = mm->mmap; vma; vma = vma->vm_next) {
> > > >                         if (!(vma->vm_flags & VM_SOFTDIRTY))
> > > >                                 continue;
> > > > +                       vma_mark_locked(vma);
> > > >                         vma->vm_flags &= ~VM_SOFTDIRTY;
> > > >                         vma_set_page_prot(vma);
> > > >                 }
> > > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > > > index 175de70e3adf..fe557b3d1c07 100644
> > > > --- a/fs/userfaultfd.c
> > > > +++ b/fs/userfaultfd.c
> > > > @@ -620,6 +620,7 @@ static void userfaultfd_event_wait_completion(struct userfaultfd_ctx *ctx,
> > > >         mmap_write_lock(mm);
> > > >         for (vma = mm->mmap; vma; vma = vma->vm_next)
> > > >                 if (vma->vm_userfaultfd_ctx.ctx == release_new_ctx) {
> > > > +                       vma_mark_locked(vma);
> > > >                         vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
> > > >                         vma->vm_flags &= ~__VM_UFFD_FLAGS;
> > > >                 }
> > > > @@ -653,6 +654,7 @@ int dup_userfaultfd(struct vm_area_struct *vma, struct list_head *fcs)
> > > >
> > > >         octx = vma->vm_userfaultfd_ctx.ctx;
> > > >         if (!octx || !(octx->features & UFFD_FEATURE_EVENT_FORK)) {
> > > > +               vma_mark_locked(vma);
> > > >                 vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
> > > >                 vma->vm_flags &= ~__VM_UFFD_FLAGS;
> > > >                 return 0;
> > > > @@ -734,6 +736,7 @@ void mremap_userfaultfd_prep(struct vm_area_struct *vma,
> > > >                 atomic_inc(&ctx->mmap_changing);
> > > >         } else {
> > > >                 /* Drop uffd context if remap feature not enabled */
> > > > +               vma_mark_locked(vma);
> > > >                 vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
> > > >                 vma->vm_flags &= ~__VM_UFFD_FLAGS;
> > > >         }
> > > > @@ -891,6 +894,7 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
> > > >                         vma = prev;
> > > >                 else
> > > >                         prev = vma;
> > > > +               vma_mark_locked(vma);
> > > >                 vma->vm_flags = new_flags;
> > > >                 vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
> > > >         }
> > > > @@ -1449,6 +1453,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
> > > >                  * the next vma was merged into the current one and
> > > >                  * the current one has not been updated yet.
> > > >                  */
> > > > +               vma_mark_locked(vma);
> > > >                 vma->vm_flags = new_flags;
> > > >                 vma->vm_userfaultfd_ctx.ctx = ctx;
> > > >
> > > > @@ -1630,6 +1635,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
> > > >                  * the next vma was merged into the current one and
> > > >                  * the current one has not been updated yet.
> > > >                  */
> > > > +               vma_mark_locked(vma);
> > > >                 vma->vm_flags = new_flags;
> > > >                 vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
> > > >
> > > > diff --git a/mm/madvise.c b/mm/madvise.c
> > > > index 5f0f0948a50e..a173f0025abd 100644
> > > > --- a/mm/madvise.c
> > > > +++ b/mm/madvise.c
> > > > @@ -181,6 +181,7 @@ static int madvise_update_vma(struct vm_area_struct *vma,
> > > >         /*
> > > >          * vm_flags is protected by the mmap_lock held in write mode.
> > > >          */
> > > > +       vma_mark_locked(vma);
> > > >         vma->vm_flags = new_flags;
> > > >         if (!vma->vm_file) {
> > > >                 error = replace_anon_vma_name(vma, anon_name);
> > > > diff --git a/mm/mlock.c b/mm/mlock.c
> > > > index b14e929084cc..f62e1a4d05f2 100644
> > > > --- a/mm/mlock.c
> > > > +++ b/mm/mlock.c
> > > > @@ -380,6 +380,7 @@ static void mlock_vma_pages_range(struct vm_area_struct *vma,
> > > >          */
> > > >         if (newflags & VM_LOCKED)
> > > >                 newflags |= VM_IO;
> > > > +       vma_mark_locked(vma);
> > > >         WRITE_ONCE(vma->vm_flags, newflags);
> > > >
> > > >         lru_add_drain();
> > > > @@ -456,6 +457,7 @@ static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
> > > >
> > > >         if ((newflags & VM_LOCKED) && (oldflags & VM_LOCKED)) {
> > > >                 /* No work to do, and mlocking twice would be wrong */
> > > > +               vma_mark_locked(vma);
> > > >                 vma->vm_flags = newflags;
> > > >         } else {
> > > >                 mlock_vma_pages_range(vma, start, end, newflags);
> > > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > > index 693e6776be39..f89c9b058105 100644
> > > > --- a/mm/mmap.c
> > > > +++ b/mm/mmap.c
> > > > @@ -1818,6 +1818,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> > > > out:
> > > >         perf_event_mmap(vma);
> > > >
> > > > +       vma_mark_locked(vma);
> > > >         vm_stat_account(mm, vm_flags, len >> PAGE_SHIFT);
> > > >         if (vm_flags & VM_LOCKED) {
> > > >                 if ((vm_flags & VM_SPECIAL) || vma_is_dax(vma) ||
> > >
> > > I guess this doesn't really have much impact, but the call to
> > > vma_mark_locked(vma) could be done only in the case where the
> > > vm_flags field is actually touched. Something like this:
> > >
> > >         vm_stat_account(mm, vm_flags, len >> PAGE_SHIFT);
> > >         if (vm_flags & VM_LOCKED) {
> > >                 if ((vm_flags & VM_SPECIAL) || vma_is_dax(vma) ||
> > >                                         is_vm_hugetlb_page(vma) ||
> > > -                                       vma == get_gate_vma(current->mm))
> > > +                                       vma == get_gate_vma(current->mm)) {
> > > +                       vma_mark_locked(vma);
> > >                         vma->vm_flags &= VM_LOCKED_CLEAR_MASK;
> > > -               else
> > > +               } else
> > >                         mm->locked_vm += (len >> PAGE_SHIFT);
> > >         }
> > > >
> > > > diff --git a/mm/mprotect.c b/mm/mprotect.c
> > > > index bc6bddd156ca..df47fc21b0e4 100644
> > > > --- a/mm/mprotect.c
> > > > +++ b/mm/mprotect.c
> > > > @@ -621,6 +621,7 @@ mprotect_fixup(struct mmu_gather *tlb, struct vm_area_struct *vma,
> > > >          * vm_flags and vm_page_prot are protected by the mmap_lock
> > > >          * held in write mode.
> > > >          */
> > > > +       vma_mark_locked(vma);
> > > >         vma->vm_flags = newflags;
> > > >         /*
> > > >          * We want to check manually if we can change individual PTEs writable
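And for completeness, the driver call sites Laurent listed would get the
same one-line treatment as every hunk in this patch; e.g. for sgx_mmap()
(illustrative only, not yet posted as a patch):

	/* mark the VMA locked before touching vm_flags */
	vma_mark_locked(vma);
	vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO;

Note that, as in the hunks above, vma_mark_locked() comes before the
flag update, so page faults handled under the per-VMA lock cannot race
with the modification.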