From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.4 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 24603C33CB1 for ; Thu, 16 Jan 2020 14:11:44 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id D99B02077B for ; Thu, 16 Jan 2020 14:11:43 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ziepe.ca header.i=@ziepe.ca header.b="JFr3/SZ3" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D99B02077B Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=ziepe.ca Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 647488E006E; Thu, 16 Jan 2020 09:11:43 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 5F8178E003F; Thu, 16 Jan 2020 09:11:43 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 50ED88E006E; Thu, 16 Jan 2020 09:11:43 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0136.hostedemail.com [216.40.44.136]) by kanga.kvack.org (Postfix) with ESMTP id 39DD48E003F for ; Thu, 16 Jan 2020 09:11:43 -0500 (EST) Received: from smtpin23.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with SMTP id 0718D2466 for ; Thu, 16 Jan 2020 14:11:43 +0000 (UTC) X-FDA: 76383685686.23.pump21_83588c3e9ed23 X-HE-Tag: pump21_83588c3e9ed23 X-Filterd-Recvd-Size: 7284 Received: from mail-qk1-f196.google.com (mail-qk1-f196.google.com [209.85.222.196]) by imf44.hostedemail.com (Postfix) with ESMTP for ; Thu, 16 Jan 2020 14:11:42 +0000 (UTC) Received: by mail-qk1-f196.google.com with SMTP id z14so19175672qkg.9 for ; Thu, 16 Jan 2020 06:11:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ziepe.ca; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=E4TXCvIZZmY4SQCWQzeLR1MFoYNB31lrG2Qln+AaBO4=; b=JFr3/SZ3UQPbE6tN0WIDrmGEvvrNPfsLSUjntfSPf4H8CaB9Ec8m1JVzlwKdxQI7bR 3Wf7o15qkWvfU2BrXgUqDa/P5/s3JU86YwmHmVDJgHJVZpC48Wq+GwpiH70SVGZXzNCW tUB0BZs20iU/qGmkxxbEKTg+rfI7uWyRi6HD7SwwKfn6eeOx5NXa2L4khYzmYqdU/Q73 XJzZwnq1+SkVZNFtHtVAhdji1nJv/bWhk/VqEp4wX7VCiTOc5nBu+Cv/vpBioS4yMhnI nFwmiDzAw7e0YnFmBSZ6FJ2C6Eprq+qfGs/8uk/SzOBENKSHugI2ZSccx15T3V2aqTrv rb8Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=E4TXCvIZZmY4SQCWQzeLR1MFoYNB31lrG2Qln+AaBO4=; b=nQRuv7IJu7FnZl0N3MS343Qy59M3jYcJ6nWZRWhgPHyx6+zlCE0U+cGglmiZamjKGg ew+jlXRskWRAw3F8VS3uRLqOknN/aVhTNhWA6CrghGSCi3lkd7DbEEeduGa1LeaZYUcG QHTAcvAx902ixEgYaZUVchqUPoCjdI+q3hDahEaSccEufG7KEhadIwLlfMCoA9pJJfag /h8yzclp/l+8LMIl1UoZbCDiMj5RBPQ9SH6KhMtkkwfdp+CNdBjodyALyiWk0qb286yj p5XF094ETiWZalf3uIugPveHHL92sUbzx+2GUq3GadijK25Yy+sRmdsGRQTncfRtrLdD 5PtA== X-Gm-Message-State: APjAAAXsO1AtuI4cPw716Yc+CfwCddaRYWPtvKbPamhRFLOeDhA3/VUf J+GvQomTTNuorEoDEsCWSH/Z3Q== X-Google-Smtp-Source: APXvYqx6uZpu4GcW1N0sXEyw9K5yx2S6JPwxGtld6dzchDnf5W2zeuoRuG/8THysf1WiLlTh8VyUTg== X-Received: by 2002:a37:5fc2:: with SMTP id t185mr32372460qkb.271.1579183901908; Thu, 16 Jan 2020 06:11:41 -0800 (PST) Received: from ziepe.ca (hlfxns017vw-142-68-57-212.dhcp-dynamic.fibreop.ns.bellaliant.net. [142.68.57.212]) by smtp.gmail.com with ESMTPSA id g53sm11292147qtk.76.2020.01.16.06.11.40 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Thu, 16 Jan 2020 06:11:41 -0800 (PST) Received: from jgg by mlx.ziepe.ca with local (Exim 4.90_1) (envelope-from ) id 1is5s4-0003yp-CZ; Thu, 16 Jan 2020 10:11:40 -0400 Date: Thu, 16 Jan 2020 10:11:40 -0400 From: Jason Gunthorpe To: Ralph Campbell Cc: "linux-rdma@vger.kernel.org" , "linux-mm@kvack.org" , "linux-kernel@vger.kernel.org" , "nouveau@lists.freedesktop.org" , "linux-kselftest@vger.kernel.org" , Jerome Glisse , John Hubbard , Christoph Hellwig , Andrew Morton , Ben Skeggs , Shuah Khan Subject: Re: [PATCH v6 4/6] mm/mmu_notifier: add mmu_interval_notifier_find() Message-ID: <20200116141140.GA10759@ziepe.ca> References: <20200113224703.5917-1-rcampbell@nvidia.com> <20200113224703.5917-5-rcampbell@nvidia.com> <20200114124956.GN20978@mellanox.com> <528c1cff-608c-d342-1e72-90d780555204@nvidia.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <528c1cff-608c-d342-1e72-90d780555204@nvidia.com> User-Agent: Mutt/1.9.4 (2018-02-28) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, Jan 15, 2020 at 02:05:24PM -0800, Ralph Campbell wrote: > > On 1/14/20 4:49 AM, Jason Gunthorpe wrote: > > On Mon, Jan 13, 2020 at 02:47:01PM -0800, Ralph Campbell wrote: > > > diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c > > > index 47ad9cc89aab..4efecc0f13cb 100644 > > > +++ b/mm/mmu_notifier.c > > > @@ -1171,6 +1171,39 @@ void mmu_interval_notifier_update(struct mmu_interval_notifier *mni, > > > } > > > EXPORT_SYMBOL_GPL(mmu_interval_notifier_update); > > > +struct mmu_interval_notifier *mmu_interval_notifier_find(struct mm_struct *mm, > > > + const struct mmu_interval_notifier_ops *ops, > > > + unsigned long start, unsigned long last) > > > +{ > > > + struct mmu_notifier_mm *mmn_mm = mm->mmu_notifier_mm; > > > + struct interval_tree_node *node; > > > + struct mmu_interval_notifier *mni; > > > + struct mmu_interval_notifier *res = NULL; > > > + > > > + spin_lock(&mmn_mm->lock); > > > + node = interval_tree_iter_first(&mmn_mm->itree, start, last); > > > + if (node) { > > > + mni = container_of(node, struct mmu_interval_notifier, > > > + interval_tree); > > > + while (true) { > > > + if (mni->ops == ops) { > > > + res = mni; > > > + break; > > > + } > > > + node = interval_tree_iter_next(&mni->interval_tree, > > > + start, last); > > > + if (!node) > > > + break; > > > + mni = container_of(node, struct mmu_interval_notifier, > > > + interval_tree); > > > + } > > > + } > > > + spin_unlock(&mmn_mm->lock); > > > > This doesn't seem safe at all, here we are returning a pointer to > > memory from the interval tree with out any kind of lifetime > > protection. > > It is memory that the driver has allocated and has full control over > the lifetime since the driver does all the insertions and removals. > The driver does have to hold the HW page table lock so lookups are > synchronized with interval insertions and removals and page table > entry insertions and removals. No.. the ->release is async, so having the driver hold a lock around all the mmu_interval_ APIS still doesn't make it safe. The element could be on the defered list and it could become freed at any moment. > > If the interval tree is read it must be left in the read lock state > > until the caller is done with the pointer. > > > > .. and this poses all sorts of questions about consistency with items > > on the deferred list. Should find return an item undergoing deletion? > > I don't think so. The deferred operations are all complete when > mmu_interval_read_begin() returns, and the sequence number check > with mmu_interval_read_retry() guarantees there have been no changes > while not holding the driver page table lock and calling hmm_range_fault(). It seems very dangerous to say, on one hand, that the driver is serialized because it holds a lock around all mmu_interval_* calls, while on the other saying that on rare edge cases find does not return a result that matches the serial-program-order sequence. This seems like a way to create bugs. For instance, if find is consistent with the defered list then it will not return any element that has a pending deletion and the above issue with lifetime wouldn't happen. However, I'm still not sure that providing an API tha requires the driver to provide tricky locking is the best idea. This basically says that if a driver uses find then every single other call to mmu_interval_* must be serialized with a single lock. Jason