Subject: Re: [PATCH v3] mm/mmu_notifier: prevent unpaired invalidate_start and invalidate_end
From: Qian Cai
Date: Thu, 26 Mar 2020 09:06:12 -0400
To: Jason Gunthorpe
Cc: Linux Memory Management List, Michal Hocko, Jérôme Glisse, Christoph Hellwig
In-Reply-To: <20200211205252.GA10003@ziepe.ca>
References: <20200211205252.GA10003@ziepe.ca>

> On Feb 11, 2020, at 3:52 PM, Jason Gunthorpe wrote:
>
> Many users of the mmu_notifier invalidate_range callbacks maintain
> locking/counters/etc on a paired basis and have long expected that
> invalidate_range_start/end() are always paired.
>
> For instance kvm_mmu_notifier_invalidate_range_end() undoes
> kvm->mmu_notifier_count which was incremented during start().
>
> The recent change to add non-blocking notifiers breaks this assumption
> when multiple notifiers are present in the list. When EAGAIN is returned
> from an invalidate_range_start() then no invalidate_range_end()s are
> called, even if the subscription's start had previously been called.
>
> Unfortunately, due to the RCU list traversal we can't reliably generate a
> subset of the linked list representing the notifiers already called to
> generate an invalidate_range_end() pairing.
>
> One case works correctly: if only one subscription requires
> invalidate_range_end() and it is the last entry in the hlist.
> In this case, when invalidate_range_start() returns -EAGAIN there will be
> nothing to unwind.
>
> Keep the notifier hlist sorted so that notifiers that require
> invalidate_range_end() are always last, and if two are added then disable
> non-blocking invalidation for the mm.
>
> A warning is printed for this case; if in future we determine this never
> happens then we can simply fail during registration when there are
> unsupported combinations of notifiers.

This generates a warning when running a simple qemu-kvm guest on arm64:

  qemu-kvm (37712) created two mmu_notifier's with invalidate_range_end(): kvm_mmu_notifier_invalidate_range_end and kvm_mmu_notifier_invalidate_range_end, non-blocking notifiers disabled

> Fixes: 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers")
> Cc: Michal Hocko
> Cc: "Jérôme Glisse"
> Cc: Christoph Hellwig
> Signed-off-by: Jason Gunthorpe
> ---
>  mm/mmu_notifier.c | 53 ++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 50 insertions(+), 3 deletions(-)
>
> v1: https://lore.kernel.org/linux-mm/20190724152858.GB28493@ziepe.ca/
> v2: https://lore.kernel.org/linux-mm/20190807191627.GA3008@ziepe.ca/
>  * Abandon attempting to fix it by calling invalidate_range_end() during an
>    EAGAIN start
>  * Just trivially ban multiple subscriptions
> v3:
>  * Be more sophisticated, ban only multiple subscriptions if the result is
>    a failure.
>    Allows multiple subscriptions without invalidate_range_end
>  * Include a printk when this condition is hit (Michal)
>
> At this point the rework Christoph requested during the first posting
> is completed and there are now only 3 drivers using
> invalidate_range_end():
>
> drivers/misc/mic/scif/scif_dma.c:   .invalidate_range_end = scif_mmu_notifier_invalidate_range_end};
> drivers/misc/sgi-gru/grutlbpurge.c: .invalidate_range_end = gru_invalidate_range_end,
> virt/kvm/kvm_main.c:                .invalidate_range_end = kvm_mmu_notifier_invalidate_range_end,
>
> While I think it is unlikely that any of these drivers will be used in
> combination with each other, display a printk in the hope of catching it.
>
> Someday I expect to just fail the registration on this condition.
>
> I think this also addresses Michal's concern about a 'big hammer' as
> it probably won't ever trigger now.
>
> Regards,
> Jason
>
> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index ef3973a5d34a94..f3aba7a970f576 100644
> --- a/mm/mmu_notifier.c
> +++ b/mm/mmu_notifier.c
> @@ -37,7 +37,8 @@ struct lockdep_map __mmu_notifier_invalidate_range_start_map = {
>  struct mmu_notifier_subscriptions {
>  	/* all mmu notifiers registered in this mm are queued in this list */
>  	struct hlist_head list;
> -	bool has_itree;
> +	u8 has_itree;
> +	u8 no_blocking;
>  	/* to serialize the list modifications and hlist_unhashed */
>  	spinlock_t lock;
>  	unsigned long invalidate_seq;
> @@ -475,6 +476,10 @@ static int mn_hlist_invalidate_range_start(
>  	int ret = 0;
>  	int id;
>
> +	if (unlikely(subscriptions->no_blocking &&
> +		     !mmu_notifier_range_blockable(range)))
> +		return -EAGAIN;
> +
>  	id = srcu_read_lock(&srcu);
>  	hlist_for_each_entry_rcu(subscription, &subscriptions->list, hlist) {
>  		const struct mmu_notifier_ops *ops = subscription->ops;
> @@ -590,6 +595,48 @@ void __mmu_notifier_invalidate_range(struct mm_struct *mm,
>  	srcu_read_unlock(&srcu, id);
>  }
>
> +/*
> + * Add a hlist subscription to the list. The list is kept sorted by the
> + * existence of ops->invalidate_range_end. If there is more than one
> + * invalidate_range_end in the list then this process can no longer support
> + * non-blocking invalidation.
> + *
> + * Non-blocking invalidation is problematic as a requirement to block results
> + * in the invalidation being aborted, however due to the use of RCU we have no
> + * reliable way to ensure that every successful invalidate_range_start()
> + * results in a call to invalidate_range_end().
> + *
> + * Thus to support non-blocking invalidation only the last subscription in
> + * the list can have invalidate_range_end() set.
> + */
> +static void
> +mn_hist_add_subscription(struct mmu_notifier_subscriptions *subscriptions,
> +			 struct mmu_notifier *subscription)
> +{
> +	struct mmu_notifier *last = NULL;
> +	struct mmu_notifier *itr;
> +
> +	hlist_for_each_entry(itr, &subscriptions->list, hlist)
> +		last = itr;
> +
> +	if (last && last->ops->invalidate_range_end &&
> +	    subscription->ops->invalidate_range_end) {
> +		subscriptions->no_blocking = true;
> +		pr_warn_once(
> +			"%s (%d) created two mmu_notifier's with invalidate_range_end(): %ps and %ps, non-blocking notifiers disabled\n",
> +			current->comm, current->pid,
> +			last->ops->invalidate_range_end,
> +			subscription->ops->invalidate_range_end);
> +	}
> +	if (!last || !last->ops->invalidate_range_end)
> +		subscriptions->no_blocking = false;
> +
> +	if (last && subscription->ops->invalidate_range_end)
> +		hlist_add_behind_rcu(&subscription->hlist, &last->hlist);
> +	else
> +		hlist_add_head_rcu(&subscription->hlist, &subscriptions->list);
> +}
> +
>  /*
>   * Same as mmu_notifier_register but here the caller must hold the mmap_sem in
>   * write mode.
>   * A NULL mn signals the notifier is being registered for itree
> @@ -660,8 +707,8 @@ int __mmu_notifier_register(struct mmu_notifier *subscription,
>  		subscription->users = 1;
>
>  		spin_lock(&mm->notifier_subscriptions->lock);
> -		hlist_add_head_rcu(&subscription->hlist,
> -				   &mm->notifier_subscriptions->list);
> +		mn_hist_add_subscription(mm->notifier_subscriptions,
> +					 subscription);
>  		spin_unlock(&mm->notifier_subscriptions->lock);
>  	} else
>  		mm->notifier_subscriptions->has_itree = true;
> --
> 2.25.0
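For anyone reading along: the ordering rule that mn_hist_add_subscription() enforces can be modeled in userspace with a plain singly linked list. This is a hypothetical, simplified sketch, not kernel code; `struct sub`, `struct subs`, `add_subscription()`, and the `has_end` flag are names invented here, standing in for struct mmu_notifier, the hlist helpers, and ops->invalidate_range_end.

```c
/*
 * Hypothetical userspace model of the insertion rule in the patch's
 * mn_hist_add_subscription(): all names here are stand-ins, not kernel APIs.
 */
#include <stdbool.h>
#include <stddef.h>

struct sub {
	bool has_end;		/* models ops->invalidate_range_end != NULL */
	struct sub *next;
};

struct subs {
	struct sub *head;
	bool no_blocking;	/* models subscriptions->no_blocking */
};

/*
 * Keep end-capable subscriptions last: a new subscription that has an
 * end callback is inserted behind the current last entry; everything
 * else is pushed on the head. If the last entry already had an end
 * callback, a second one disables non-blocking invalidation.
 */
static void add_subscription(struct subs *s, struct sub *n)
{
	struct sub *last = NULL, *itr;

	for (itr = s->head; itr; itr = itr->next)
		last = itr;

	if (last && last->has_end && n->has_end)
		s->no_blocking = true;	/* two end() users in one list */
	if (!last || !last->has_end)
		s->no_blocking = false;

	if (last && n->has_end) {
		n->next = last->next;	/* add behind the last entry */
		last->next = n;
	} else {
		n->next = s->head;	/* add at the head */
		s->head = n;
	}
}
```

The invariant this maintains is what makes a failed non-blocking start safe to abort: since at most the tail entry has an end callback, returning -EAGAIN partway through the list never leaves a started-but-not-ended subscription behind.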