From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 37025C433F5 for ; Thu, 30 Sep 2021 11:11:24 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id A5699619A6 for ; Thu, 30 Sep 2021 11:11:23 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org A5699619A6 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.cz Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id C10F5940097; Thu, 30 Sep 2021 07:11:22 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id BC03C94003A; Thu, 30 Sep 2021 07:11:22 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A6086940097; Thu, 30 Sep 2021 07:11:22 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0139.hostedemail.com [216.40.44.139]) by kanga.kvack.org (Postfix) with ESMTP id 92E4D94003A for ; Thu, 30 Sep 2021 07:11:22 -0400 (EDT) Received: from smtpin40.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 421F42CBD5 for ; Thu, 30 Sep 2021 11:11:22 +0000 (UTC) X-FDA: 78643973604.40.BB20137 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by imf06.hostedemail.com (Postfix) with ESMTP id BAA72801AB2B for ; Thu, 30 Sep 2021 11:11:21 +0000 (UTC) Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id 9DA4822531; Thu, 30 Sep 2021 11:11:20 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1633000280; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=7xjS+An1ybzYPzXktYFVSP9hALV7kusAyZYGb55xAAc=; b=2Py6yv994VzbECfMKbQc4WamLyJvewa5HmuvCh0S5TugYigvQhFVS1DbitI1XW+NYb6sW7 s4K/Am2C1CzJ3iN+aoTI3rZVs1B0gqkCkpea6g3BHY6UBDZXNq+H3oeKXL7LpbUYMSaSq7 Al9mtflvww5JGU653l2BRt550qL4834= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1633000280; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=7xjS+An1ybzYPzXktYFVSP9hALV7kusAyZYGb55xAAc=; b=UN9C48lxv7NpN8DOD9QyVBr73ltzfJzr4DG83FnjRQblinHLuK7LaSntNrfesdBYInfky1 E2qNEajRxwokS3Bg== Received: from quack2.suse.cz (unknown [10.100.200.198]) by relay2.suse.de (Postfix) with ESMTP id 92972A3B81; Thu, 30 Sep 2021 11:11:20 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id 295A11F2BA4; Thu, 30 Sep 2021 13:11:20 +0200 (CEST) Date: Thu, 30 Sep 2021 13:11:20 +0200 From: Jan Kara To: John Hubbard Cc: Linus Torvalds , Peter Xu , Linux MM Mailing List , Jason Gunthorpe , Jan Kara , Andrew Morton , Andrea Arcangeli Subject: Re: Possible race with page_maybe_dma_pinned? Message-ID: <20210930111120.GA21575@quack2.suse.cz> References: <9636a101-4445-0ede-e3ad-4cecd531f433@nvidia.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <9636a101-4445-0ede-e3ad-4cecd531f433@nvidia.com> User-Agent: Mutt/1.10.1 (2018-07-13) X-Rspamd-Queue-Id: BAA72801AB2B X-Stat-Signature: hcfg4fkp9ty81jt6o8e7xwrfez4hqcrg Authentication-Results: imf06.hostedemail.com; dkim=pass header.d=suse.cz header.s=susede2_rsa header.b=2Py6yv99; dkim=pass header.d=suse.cz header.s=susede2_ed25519 header.b=UN9C48lx; spf=pass (imf06.hostedemail.com: domain of jack@suse.cz designates 195.135.220.28 as permitted sender) smtp.mailfrom=jack@suse.cz; dmarc=none X-Rspamd-Server: rspam06 X-HE-Tag: 1633000281-676601 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed 29-09-21 18:57:33, John Hubbard wrote: > On 9/29/21 15:47, Linus Torvalds wrote: > > On Wed, Sep 29, 2021 at 12:57 PM Peter Xu wrote: > > > > > > Now we have 3 callers of page_maybe_dma_pinned(): > > > > > > 1. page_needs_cow_for_dma > > > 2. pte_is_pinned > > > 3. shrink_page_list > > > > > > The 1st one is good as it takes the seqlock for write properly. The 2nd & 3rd > > > are missing, we may need to add them. > > > > Well, the pte_is_pinned() case at least could do the seqlock in > > clear_soft_dirty() - it has the vma and mm available. > > That part seems straightforward, OK. > > > > > The page shrinker has always been problematic since it doesn't have > > the vm (and by "always" I mean "modern times" - long ago we used to > > scan virtually, in the days before rmap) > > > > One option might be for fast-gup to give up on locked pages. I think > > the page lock is the only thing that shrink_page_list() serializes > > with. > > > > In order to avoid locked pages in gup fast, it is easiest to do a > check for locked pages *after* fast-pinning them, and unpin them before > returning to the caller. This makes the change much smaller. > > However, doing so leaves a window of time during which the pages are > still marked as maybe-dma-pinned, although those pages are never > returned to the caller as such. There is already code that is subject to > this in lockless_pages_from_mm(), for the case of a failed seqlock read. > I'm thinking it's probably OK, because the pages are not going to be > held long-term. They will be unpinned before returning from > lockless_pages_from_mm(). > > The counter argument is that this is merely making the race window > smaller, which is usually something that I argue against because it just > leads to harder-to-find bugs... Yeah, what you propose actually does not guarantee that the reclaim and page pinning cannot race in a way that the page gets unmapped and gup_fast() returns a pinned page. Which is what the code in shrink_page_list() tries to prevent AFAIU. The only place where I can see us doing a sanely race-free check in shrink_page_list() path is inside try_to_unmap() - we could implement unmap_if_not_pinned semantics and inside the rmap walk we can bump up the seqcounts (it even conceptually makes some sense) to make sure page_maybe_dma_pinned() check we'd do during the rmap walk is not racing with pup_fast(). By the time we leave the seqcount critical section, the page will be unmapped so we can be sure there will be no new pins of the page. Honza > > To be specific, here's what I mean: > > diff --git a/mm/gup.c b/mm/gup.c > index 886d6148d3d03..8ba871a927668 100644 > --- a/mm/gup.c > +++ b/mm/gup.c > @@ -2657,6 +2657,7 @@ static unsigned long lockless_pages_from_mm(unsigned long start, > { > unsigned long flags; > int nr_pinned = 0; > + int i; > unsigned seq; > > if (!IS_ENABLED(CONFIG_HAVE_FAST_GUP) || > @@ -2693,7 +2694,23 @@ static unsigned long lockless_pages_from_mm(unsigned long start, > unpin_user_pages(pages, nr_pinned); > return 0; > } > + > + /* > + * Avoiding locked pages, in this fast/lockless context, will > + * avoid interfering with shrink_page_list(), in particular. > + * Give up upon finding the first locked page, but keep the > + * earlier pages, so that slow gup does not have to re-pin them. > + */ > + for (i = 0; i < nr_pinned; i++) { > + if (PageLocked(pages[i])) { > + unpin_user_pages(&pages[i], nr_pinned - i); > + nr_pinned = i + 1; > + break; > + } > + } > } > + > + > return nr_pinned; > } > > > thanks, > -- > John Hubbard > NVIDIA -- Jan Kara SUSE Labs, CR