Date: Wed, 10 Mar 2021 20:28:07 -0400
From: Jason Gunthorpe
To: Sean Christopherson
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    David Rientjes, Ben Gardon, Michal Hocko, Jérôme Glisse,
    Andrea Arcangeli, Johannes Weiner, Dimitri Sivanich
Subject: Re: [PATCH] mm/oom_kill: Ensure MMU notifier range_end() is paired with range_start()
Message-ID: <20210311002807.GQ444867@ziepe.ca>
References: <20210310213117.1444147-1-seanjc@google.com>
In-Reply-To: <20210310213117.1444147-1-seanjc@google.com>

On Wed, Mar 10, 2021 at 01:31:17PM -0800, Sean Christopherson wrote:
> Invoke the MMU notifier's .invalidate_range_end() callbacks even if one
> of the .invalidate_range_start() callbacks failed.  If there are multiple
> notifiers, the notifier that did not fail may have performed actions in
> its ...start() that it expects to unwind via ...end().  Per the
> mmu_notifier_ops documentation, ...start() and ...end() must be paired.

No this is not OK, if invalidate_start returns EBUSY invalidate_end
should *not* be called.

As you observed:

> The only in-kernel usage that is fatally broken is the SGI UV GRU driver,
> which effectively blocks and sleeps fault handlers during ...start(), and
> unblocks/wakes the handlers during ...end().  But, the only users that
> can fail ...start() are the i915 and Nouveau drivers, which are unlikely
> to collide with the SGI driver.

It used to be worse, but I've since moved most of the other problematic
users to the itree notifier, which doesn't have the problem.

> KVM is the only other user of ...end(), and while KVM also blocks fault
> handlers in ...start(), the fault handlers do not sleep and originate in

KVM will have its mmu_notifier_count become imbalanced:

static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
					const struct mmu_notifier_range *range)
{
	kvm->mmu_notifier_count++;

static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn,
					const struct mmu_notifier_range *range)
{
	kvm->mmu_notifier_count--;

Which I believe is fatal to kvm? These notifiers certainly do not only
happen at process exit.

So, both of the remaining _end users become corrupted with this patch!

I've tried to fix this before; the only thing that seems like it will
work is to sort the hlist and only call the ends whose starts succeeded,
by comparing pointers with <.

This is because the hlist can have items removed concurrently under
SRCU, so there is no easy way to compute the subset that succeeded in
calling start.

I had a prior effort to just ban more than one hlist notifier with end,
but it turns out kvm on ARM uses two all the time (IIRC).

> Found by inspection.  Verified by adding a second notifier in KVM
> that

AFAIK it is a non-problem in real life because kvm is not mixed with
notifier_start's that fail (and GRU is dead?). Everything else was
fixed by moving to itree.

Jason
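
For illustration only, here is a minimal single-threaded userspace sketch of the "sort the list and compare pointers with <" idea described above. It is not kernel code and not an actual patch: struct notifier, notifier_register(), notifier_unregister(), range_start(), range_end() and demo_end_count() are all hypothetical names, the per-notifier count stands in for kvm->mmu_notifier_count, and the sketch assumes start() stops at the first failure. SRCU and real concurrency are not modelled; the removal of the failed notifier between start and end is simulated by an explicit unlink.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct mmu_notifier; not the kernel type. */
struct notifier {
	struct notifier *next;
	int count;	/* models kvm->mmu_notifier_count */
	int fail_start;	/* non-zero: start() fails, as if returning -EBUSY */
};

/* Insert n so the list stays sorted by ascending address. */
static void notifier_register(struct notifier **link, struct notifier *n)
{
	while (*link && (uintptr_t)*link < (uintptr_t)n)
		link = &(*link)->next;
	n->next = *link;
	*link = n;
}

/* Unlink n if present; stands in for a removal that races with end(). */
static void notifier_unregister(struct notifier **link, struct notifier *n)
{
	while (*link && *link != n)
		link = &(*link)->next;
	if (*link)
		*link = n->next;
}

/* Run start() in list order; return the notifier that failed, or NULL. */
static struct notifier *range_start(struct notifier *head)
{
	for (struct notifier *n = head; n; n = n->next) {
		if (n->fail_start)
			return n;
		n->count++;
	}
	return NULL;
}

/*
 * Run end() only on notifiers whose address sorts before 'failed'.
 * With failed == NULL, end() runs on every notifier in the list.
 */
static void range_end(struct notifier *head, const struct notifier *failed)
{
	for (struct notifier *n = head; n; n = n->next) {
		if (failed && (uintptr_t)n >= (uintptr_t)failed)
			break;
		n->count--;
	}
}

/*
 * Three notifiers; the middle one fails start() and is then removed.
 * Returns the count of the notifier that sorts after the failed one,
 * i.e. the one whose start() never ran.
 */
int demo_end_count(int filter_by_pointer)
{
	struct notifier n[3] = { {0}, {0}, {0} };
	struct notifier *head = NULL;
	struct notifier *failed;

	n[1].fail_start = 1;
	notifier_register(&head, &n[2]);	/* registration order is */
	notifier_register(&head, &n[0]);	/* deliberately scrambled */
	notifier_register(&head, &n[1]);

	failed = range_start(head);		/* n[0] starts, n[1] fails */
	notifier_unregister(&head, failed);	/* failed one leaves the list */
	range_end(head, filter_by_pointer ? failed : NULL);
	return n[2].count;
}
```

With the pointer filter, demo_end_count(1) leaves the never-started notifier at 0. Calling end() on everything, demo_end_count(0), drives it to -1, which is the imbalance described above for mmu_notifier_count. The comparison keeps working after the failed notifier has been unlinked because only its address is needed, not its position in the list.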