From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-20.3 required=3.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, HTML_MESSAGE,INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 02FA9C433DB for ; Wed, 10 Mar 2021 22:08:46 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 8726164FC1 for ; Wed, 10 Mar 2021 22:08:45 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8726164FC1 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 224968D0240; Wed, 10 Mar 2021 17:08:45 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 1D4F68D023C; Wed, 10 Mar 2021 17:08:45 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 04E418D0240; Wed, 10 Mar 2021 17:08:45 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0026.hostedemail.com [216.40.44.26]) by kanga.kvack.org (Postfix) with ESMTP id DA0CB8D023C for ; Wed, 10 Mar 2021 17:08:44 -0500 (EST) Received: from smtpin10.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 9C593181AF5C3 for ; Wed, 10 Mar 2021 22:08:44 +0000 (UTC) X-FDA: 77905354968.10.2426B2C Received: from mail-ot1-f52.google.com (mail-ot1-f52.google.com [209.85.210.52]) by imf07.hostedemail.com (Postfix) with ESMTP id D21B3A0009ED for ; Wed, 10 Mar 2021 22:08:43 +0000 (UTC) Received: by mail-ot1-f52.google.com with SMTP id f8so12954399otp.8 for ; Wed, 10 Mar 2021 14:08:43 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=Am6OklDomKz9rlru4aZG4nX4jC63FQxI+gvaQmzr4jo=; b=aloAbWiwyRF+6G4o6fp0oyBXb4EYokuKkIjA19/p0z22WqyEO5aHsC3Q8c1fFLD/bq 8p0WxVGv6ApwkRUX3OPlenxnQwUMZoPfkrKRhOH4Prlf2KI4/c+olDHaU2C4pDr/ONQY gSTtckmdmuagEmMaqQGWbphaRfDwOpS61vff1czUQIWS/kwJ/ywxiiZRnhaQWGQMEknF ilMPISelgysuk+04y2HAJCee84lbuj+Mz1HO9LY5O1fUoQ3mgxP/95MHC2QK0LlonJ77 jODeG2T62EHDdZWbPg4y0FJsvygEFyK/joBzcM+YRdTl+unbQnPqeQpOIKq6yCuWqDO/ rEqA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=Am6OklDomKz9rlru4aZG4nX4jC63FQxI+gvaQmzr4jo=; b=esdvx1CeqxINdSj7+4LIByhUkVQysYVqNgS7StxYVl6yWxqMs+B8pykWPi0zhpYy4K om/WbadrNzaFY9Uq3JaUuzk4Dw6sGTFEoahqoRUafHnnuvgTWRy5jUpVFpc4uEy4ETPD jyPo397vthW7Uec4JJ/hQFf9eSg3437hzKpSm+WqJfXgIj2sj+DuT2SRC9lOnsRnHQeb JcHTxcJEPo1RzAvsgNFm17d9ytDJI/HiDrQxGefp2aP8xG14Z+kvVKfUw1r7kMkt3aAL ZXzO2c/gajJWR+EdHWkgnZw16OQbbIjz7TzmPZPEcMNlQU9XFHOBe8ps0EAs25Z3tXCd sTQQ== X-Gm-Message-State: AOAM5331jE4Cjt066FyiYgVRgrZt2ihfM3ziRA9d2R+y2MppE3qIYK// rJlU7aG/j+d65xDqXjPhY70SCloXK8x/FyZYCJevhQ== X-Google-Smtp-Source: ABdhPJzOxcgPaKckZ+hb8vOFm17NanTmOrRjQtLnyQrQq2PKMPovRgkrY5ke9TkZl2Z5tfETxtUa3qSHeENQQPZqamc= X-Received: by 2002:a05:6830:1399:: with SMTP id d25mr4143771otq.249.1615414123094; Wed, 10 Mar 2021 14:08:43 -0800 (PST) MIME-Version: 1.0 References: <20210310213117.1444147-1-seanjc@google.com> In-Reply-To: <20210310213117.1444147-1-seanjc@google.com> From: Ben Gardon Date: Wed, 10 Mar 2021 14:08:31 -0800 Message-ID: Subject: Re: [PATCH] mm/oom_kill: Ensure MMU notifier range_end() is paired with range_start() To: Sean Christopherson Cc: Andrew Morton , linux-mm@kvack.org, LKML , David Rientjes , Jason Gunthorpe , Michal Hocko , =?UTF-8?B?SsOpcsO0bWUgR2xpc3Nl?= , Andrea Arcangeli , Johannes Weiner , Dimitri Sivanich Content-Type: multipart/alternative; boundary="000000000000f9739a05bd35e864" X-Stat-Signature: 9n346pszu7j8qtxycturc8bnsza1ui6w X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: D21B3A0009ED Received-SPF: none (google.com>: No applicable sender policy available) receiver=imf07; identity=mailfrom; envelope-from=""; helo=mail-ot1-f52.google.com; client-ip=209.85.210.52 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1615414123-755932 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: --000000000000f9739a05bd35e864 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Wed, Mar 10, 2021 at 1:31 PM Sean Christopherson wrote: > Invoke the MMU notifier's .invalidate_range_end() callbacks even if one > of the .invalidate_range_start() callbacks failed. If there are multiple > notifiers, the notifier that did not fail may have performed actions in > its ...start() that it expects to unwind via ...end(). Per the > mmu_notifier_ops documentation, ...start() and ...end() must be paired. > > The only in-kernel usage that is fatally broken is the SGI UV GRU driver, > which effectively blocks and sleeps fault handlers during ...start(), and > unblocks/wakes the handlers during ...end(). But, the only users that > can fail ...start() are the i915 and Nouveau drivers, which are unlikely > to collide with the SGI driver. > > KVM is the only other user of ...end(), and while KVM also blocks fault > handlers in ...start(), the fault handlers do not sleep and originate in > killable ioctl() calls. So while it's possible for the i915 and Nouveau > drivers to collide with KVM, the bug is benign for KVM since the process > is dying and KVM's guest is about to be terminated. > > So, as of today, the bug is likely benign. But, that may not always be > true, e.g. there is a potential use case for blocking memslot updates in > KVM while an invalidation is in-progress, and failure to unblock would > result in said updates being blocked indefinitely and hanging. > > Found by inspection. Verified by adding a second notifier in KVM that > periodically returns -EAGAIN on non-blockable ranges, triggering OOM, > and observing that KVM exits with an elevated notifier count. > > Fixes: 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu > notifiers") > Cc: stable@vger.kernel.org > Cc: David Rientjes > Cc: Ben Gardon > Cc: Jason Gunthorpe > Cc: Michal Hocko > Cc: "J=C3=A9r=C3=B4me Glisse" > Cc: Andrea Arcangeli > Cc: Johannes Weiner > Cc: Dimitri Sivanich > Signed-off-by: Sean Christopherson > Reviewed-by: Ben Gardon Thanks for catching and fixing this. > --- > mm/oom_kill.c | 8 +++----- > 1 file changed, 3 insertions(+), 5 deletions(-) > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > index bc65ba4f5192..acc3ba8b2ed7 100644 > --- a/mm/oom_kill.c > +++ b/mm/oom_kill.c > @@ -546,12 +546,10 @@ bool __oom_reap_task_mm(struct mm_struct *mm) > vma, mm, vma->vm_start, > vma->vm_end); > tlb_gather_mmu(&tlb, mm); > - if > (mmu_notifier_invalidate_range_start_nonblock(&range)) { > - tlb_finish_mmu(&tlb); > + if > (!mmu_notifier_invalidate_range_start_nonblock(&range)) > + unmap_page_range(&tlb, vma, range.start, > range.end, NULL); > + else > ret =3D false; > - continue; > - } > - unmap_page_range(&tlb, vma, range.start, > range.end, NULL); > mmu_notifier_invalidate_range_end(&range); > tlb_finish_mmu(&tlb); > } > -- > 2.30.1.766.gb4fecdf3b7-goog > > --000000000000f9739a05bd35e864 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


=
On Wed, Mar 10, 2021 at 1:31 PM Sean = Christopherson <seanjc@google.com> wrote:
In= voke the MMU notifier's .invalidate_range_end() callbacks even if one of the .invalidate_range_start() callbacks failed.=C2=A0 If there are multi= ple
notifiers, the notifier that did not fail may have performed actions in
its ...start() that it expects to unwind via ...end().=C2=A0 Per the
mmu_notifier_ops documentation, ...start() and ...end() must be paired.

The only in-kernel usage that is fatally broken is the SGI UV GRU driver, which effectively blocks and sleeps fault handlers during ...start(), and unblocks/wakes the handlers during ...end().=C2=A0 But, the only users that=
can fail ...start() are the i915 and Nouveau drivers, which are unlikely to collide with the SGI driver.

KVM is the only other user of ...end(), and while KVM also blocks fault
handlers in ...start(), the fault handlers do not sleep and originate in killable ioctl() calls.=C2=A0 So while it's possible for the i915 and N= ouveau
drivers to collide with KVM, the bug is benign for KVM since the process is dying and KVM's guest is about to be terminated.

So, as of today, the bug is likely benign.=C2=A0 But, that may not always b= e
true, e.g. there is a potential use case for blocking memslot updates in KVM while an invalidation is in-progress, and failure to unblock would
result in said updates being blocked indefinitely and hanging.

Found by inspection.=C2=A0 Verified by adding a second notifier in KVM that=
periodically returns -EAGAIN on non-blockable ranges, triggering OOM,
and observing that KVM exits with an elevated notifier count.

Fixes: 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu noti= fiers")
Cc:
stable@vger= .kernel.org
Cc: David Rientjes <rientjes@google.com>
Cc: Ben Gardon <= bgardon@google.com>
Cc: Jason Gunthorpe <j= gg@ziepe.ca>
Cc: Michal Hocko <m= hocko@suse.com>
Cc: "J=C3=A9r=C3=B4me Glisse" <jglisse@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Ben Gardon <bga= rdon@google.com>

Thanks for catching and fi= xing this.
=C2=A0
---
=C2=A0mm/oom_kill.c | 8 +++-----
=C2=A01 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index bc65ba4f5192..acc3ba8b2ed7 100644
--- a/mm/oom_kill.c
+++ b/= mm/oom_kill.c
@@ -546,12 +546,10 @@ bool __oom_reap_task_mm(struct mm_struct *mm)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 vma, mm, vma->vm_start,
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 vma->vm_end);
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 tlb_gather_mmu(&tlb, mm);
-=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0if (mmu_notifier_invalidate_range_start_nonblock(&range)) { -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0tlb_finish_mmu(&tlb);
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0if (!mmu_notifier_invalidate_range_start_nonblock(&range)) +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0unmap_page_range(&tlb, vma, range= .start, range.end, NULL);
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0else
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ret =3D false;
-=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0continue;
-=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0}
-=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0unmap_page_range(&tlb, vma, range.start, range.end, NULL); =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 mmu_notifier_invalidate_range_end(&range);
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 tlb_finish_mmu(&tlb);
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 }
--
2.30.1.766.gb4fecdf3b7-goog

--000000000000f9739a05bd35e864--