Date: Wed, 1 Jun 2022 14:36:38 -0700
From: Andrew Morton <akpm@linux-foundation.org>
To: Suren Baghdasaryan <surenb@google.com>
Cc: mhocko@suse.com, rientjes@google.com, willy@infradead.org,
	hannes@cmpxchg.org, guro@fb.com, minchan@kernel.org,
	kirill@shutemov.name, aarcange@redhat.com, brauner@kernel.org,
	hch@infradead.org, oleg@redhat.com, david@redhat.com,
	jannh@google.com, shakeelb@google.com, peterx@redhat.com,
	jhubbard@nvidia.com, shuah@kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linux-kselftest@vger.kernel.org,
	kernel-team@android.com, Liam Howlett
Subject: Re: [PATCH RESEND v2 1/2] mm: drop oom code from exit_mmap
Message-Id: <20220601143638.9e78c470d2c980053cc8059a@linux-foundation.org>
In-Reply-To: <20220531223100.510392-1-surenb@google.com>
References: <20220531223100.510392-1-surenb@google.com>

On Tue, 31 May 2022 15:30:59 -0700 Suren Baghdasaryan <surenb@google.com> wrote:

> The primary reason to invoke the oom reaper from the exit_mmap path used
> to be a prevention of an excessive oom killing if the oom victim exit
> races with the oom reaper (see [1] for more details).
> The invocation has
> moved around since then because of the interaction with the munlock
> logic but the underlying reason has remained the same (see [2]).
>
> Munlock code is no longer a problem since [3] and there shouldn't be
> any blocking operation before the memory is unmapped by exit_mmap so
> the oom reaper invocation can be dropped. The unmapping part can be done
> with the non-exclusive mmap_sem and the exclusive one is only required
> when page tables are freed.
>
> Remove the oom_reaper from exit_mmap which will make the code easier to
> read. This is really unlikely to make any observable difference although
> some microbenchmarks could benefit from one less branch that needs to be
> evaluated even though it almost never is true.
>
> [1] 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
> [2] 27ae357fa82b ("mm, oom: fix concurrent munlock and oom reaper unmap, v3")
> [3] a213e5cf71cb ("mm/munlock: delete munlock_vma_pages_all(), allow oomreap")

I've just reinstated the mapletree patchset so there are some conflicting
changes.

> --- a/include/linux/oom.h
> +++ b/include/linux/oom.h
> @@ -106,8 +106,6 @@ static inline vm_fault_t check_stable_address_space(struct mm_struct *mm)
> 	return 0;
> }
> 
> -bool __oom_reap_task_mm(struct mm_struct *mm);
> -
> long oom_badness(struct task_struct *p,
> 		unsigned long totalpages);
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 2b9305ed0dda..b7918e6bb0db 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -3110,30 +3110,13 @@ void exit_mmap(struct mm_struct *mm)
> 	/* mm's last user has gone, and its about to be pulled down */
> 	mmu_notifier_release(mm);
> 
> -	if (unlikely(mm_is_oom_victim(mm))) {
> -		/*
> -		 * Manually reap the mm to free as much memory as possible.
> -		 * Then, as the oom reaper does, set MMF_OOM_SKIP to disregard
> -		 * this mm from further consideration.
> -		 * Taking mm->mmap_lock for
> -		 * write after setting MMF_OOM_SKIP will guarantee that the oom
> -		 * reaper will not run on this mm again after mmap_lock is
> -		 * dropped.
> -		 *
> -		 * Nothing can be holding mm->mmap_lock here and the above call
> -		 * to mmu_notifier_release(mm) ensures mmu notifier callbacks in
> -		 * __oom_reap_task_mm() will not block.
> -		 */
> -		(void)__oom_reap_task_mm(mm);
> -		set_bit(MMF_OOM_SKIP, &mm->flags);
> -	}
> -
> -	mmap_write_lock(mm);
> +	mmap_read_lock(mm);

Unclear why this patch fiddles with the mm_struct locking in this fashion
- changelogging that would have been helpful.

But iirc mapletree wants to retain a write_lock here, so I ended up with

void exit_mmap(struct mm_struct *mm)
{
	struct mmu_gather tlb;
	struct vm_area_struct *vma;
	unsigned long nr_accounted = 0;
	MA_STATE(mas, &mm->mm_mt, 0, 0);
	int count = 0;

	/* mm's last user has gone, and its about to be pulled down */
	mmu_notifier_release(mm);

	mmap_write_lock(mm);
	arch_exit_mmap(mm);
	vma = mas_find(&mas, ULONG_MAX);
	if (!vma) {
		/* Can happen if dup_mmap() received an OOM */
		mmap_write_unlock(mm);
		return;
	}

	lru_add_drain();
	flush_cache_mm(mm);
	tlb_gather_mmu_fullmm(&tlb, mm);
	/* update_hiwater_rss(mm) here? but nobody should be looking */
	/* Use ULONG_MAX here to ensure all VMAs in the mm are unmapped */
	unmap_vmas(&tlb, &mm->mm_mt, vma, 0, ULONG_MAX);

	/*
	 * Set MMF_OOM_SKIP to hide this task from the oom killer/reaper
	 * because the memory has been already freed. Do not bother checking
	 * mm_is_oom_victim because setting a bit unconditionally is cheaper.
	 */
	set_bit(MMF_OOM_SKIP, &mm->flags);

	free_pgtables(&tlb, &mm->mm_mt, vma, FIRST_USER_ADDRESS,
		      USER_PGTABLES_CEILING);
	tlb_finish_mmu(&tlb);

	/*
	 * Walk the list again, actually closing and freeing it, with preemption
	 * enabled, without holding any MM locks besides the unreachable
	 * mmap_write_lock.
	 */
	do {
		if (vma->vm_flags & VM_ACCOUNT)
			nr_accounted += vma_pages(vma);
		remove_vma(vma);
		count++;
		cond_resched();
	} while ((vma = mas_find(&mas, ULONG_MAX)) != NULL);

	BUG_ON(count != mm->map_count);

	trace_exit_mmap(mm);
	__mt_destroy(&mm->mm_mt);
	mm->mmap = NULL;
	mmap_write_unlock(mm);
	vm_unacct_memory(nr_accounted);
}