From: Suren Baghdasaryan
Date: Mon, 22 Nov 2021 17:47:14 -0800
Subject: Re: [PATCH 1/2] mm: protect free_pgtables with mmap_lock write lock in exit_mmap
To: akpm@linux-foundation.org
Cc: mhocko@kernel.org, mhocko@suse.com, rientjes@google.com,
 willy@infradead.org, hannes@cmpxchg.org, guro@fb.com, riel@surriel.com,
 minchan@kernel.org, kirill@shutemov.name, aarcange@redhat.com,
 christian@brauner.io, hch@infradead.org, oleg@redhat.com, david@redhat.com,
 jannh@google.com, shakeelb@google.com, luto@kernel.org,
 christian.brauner@ubuntu.com, fweimer@redhat.com, jengelh@inai.de,
 timmurray@google.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 kernel-team@android.com
In-Reply-To: <20211116215715.645231-1-surenb@google.com>

On Tue, Nov 16, 2021 at 1:57 PM Suren Baghdasaryan wrote:
>
> oom-reaper and process_mrelease system call should protect against
> races with exit_mmap which can destroy page tables while they
> walk the VMA tree. oom-reaper protects from that race by setting
> MMF_OOM_VICTIM and by relying on exit_mmap to set MMF_OOM_SKIP
> before taking and releasing mmap_write_lock. process_mrelease has
> to elevate mm->mm_users to prevent such race. Both oom-reaper and
> process_mrelease hold mmap_read_lock when walking the VMA tree.
> The locking rules and mechanisms could be simpler if exit_mmap takes
> mmap_write_lock while executing destructive operations such as
> free_pgtables.
> Change exit_mmap to hold the mmap_write_lock when calling
> free_pgtables. Operations like unmap_vmas() and unlock_range() are not
> destructive and could run under mmap_read_lock but for simplicity we
> take one mmap_write_lock during almost the entire operation. Note
> also that because oom-reaper checks VM_LOCKED flag, unlock_range()
> should not be allowed to race with it.
> In most cases this lock should be uncontended. Previously, Kirill
> reported ~4% regression caused by a similar change [1]. We reran the
> same test and although the individual results are quite noisy, the
> percentiles show lower regression with 1.6% being the worst case [2].
> The change allows oom-reaper and process_mrelease to execute safely
> under mmap_read_lock without worries that exit_mmap might destroy page
> tables from under them.
>
> [1] https://lore.kernel.org/all/20170725141723.ivukwhddk2voyhuc@node.shutemov.name/
> [2] https://lore.kernel.org/all/CAJuCfpGC9-c9P40x7oy=jy5SphMcd0o0G_6U1-+JAziGKG6dGA@mail.gmail.com/

Friendly nudge.
Michal, Matthew, from our discussion in
https://lore.kernel.org/all/YXKhOKIIngIuJaYi@casper.infradead.org I was
under the impression this change would be interesting for you. Any
feedback?

>
> Signed-off-by: Suren Baghdasaryan
> ---
>  mm/mmap.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index bfb0ea164a90..69b3036c6dee 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -3142,25 +3142,27 @@ void exit_mmap(struct mm_struct *mm)
>  		 * to mmu_notifier_release(mm) ensures mmu notifier callbacks in
>  		 * __oom_reap_task_mm() will not block.
>  		 *
> -		 * This needs to be done before calling munlock_vma_pages_all(),
> +		 * This needs to be done before calling unlock_range(),
>  		 * which clears VM_LOCKED, otherwise the oom reaper cannot
>  		 * reliably test it.
>  		 */
>  		(void)__oom_reap_task_mm(mm);
>
>  		set_bit(MMF_OOM_SKIP, &mm->flags);
> -		mmap_write_lock(mm);
> -		mmap_write_unlock(mm);
>  	}
>
> +	mmap_write_lock(mm);
>  	if (mm->locked_vm)
>  		unlock_range(mm->mmap, ULONG_MAX);
>
>  	arch_exit_mmap(mm);
>
>  	vma = mm->mmap;
> -	if (!vma)	/* Can happen if dup_mmap() received an OOM */
> +	if (!vma) {
> +		/* Can happen if dup_mmap() received an OOM */
> +		mmap_write_unlock(mm);
>  		return;
> +	}
>
>  	lru_add_drain();
>  	flush_cache_mm(mm);
> @@ -3170,6 +3172,7 @@ void exit_mmap(struct mm_struct *mm)
>  	unmap_vmas(&tlb, vma, 0, -1);
>  	free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
>  	tlb_finish_mmu(&tlb);
> +	mmap_write_unlock(mm);
>
>  	/*
>  	 * Walk the list again, actually closing and freeing it,
> --
> 2.34.0.rc1.387.gb447b232ab-goog
>