Subject: Re: [PATCH 1/3] mm/mlock.c: convert put_page() to put_user_page*()
From: John Hubbard <jhubbard@nvidia.com>
To: Vlastimil Babka, Michal Hocko
CC: Andrew Morton, Christoph Hellwig, Ira Weiny, Jan Kara, Jason Gunthorpe,
 Jerome Glisse, LKML, Dan Williams, Daniel Black, Matthew Wilcox,
 Mike Kravetz
Date: Thu, 8 Aug 2019 15:59:15 -0700
In-Reply-To: <306128f9-8cc6-761b-9b05-578edf6cce56@nvidia.com>
References: <20190805222019.28592-1-jhubbard@nvidia.com>
 <20190805222019.28592-2-jhubbard@nvidia.com>
 <20190807110147.GT11812@dhcp22.suse.cz>
 <01b5ed91-a8f7-6b36-a068-31870c05aad6@nvidia.com>
 <20190808062155.GF11812@dhcp22.suse.cz>
 <875dca95-b037-d0c7-38bc-4b4c4deea2c7@suse.cz>
 <306128f9-8cc6-761b-9b05-578edf6cce56@nvidia.com>

On 8/8/19 12:20 PM, John Hubbard wrote:
> On 8/8/19 4:09 AM, Vlastimil Babka wrote:
>> On 8/8/19 8:21 AM, Michal Hocko wrote:
>>> On Wed 07-08-19 16:32:08, John Hubbard wrote:
>>>> On 8/7/19 4:01 AM, Michal Hocko wrote:
>>>>> On Mon 05-08-19 15:20:17, john.hubbard@gmail.com wrote:
>>>>>> From: John Hubbard
>>>> Actually, I think follow_page_mask() gets all the pages, right? And the
>>>> get_page() in __munlock_pagevec_fill() is there to allow a
>>>> pagevec_release() later.
>>>
>>> Maybe I am misreading the code (looking at Linus' tree), but
>>> munlock_vma_pages_range() calls follow_page() for the start address, and
>>> then, if not THP, tries to fill up the pagevec with a few more pages (up
>>> to end), doing the shortcut via a manual pte walk as an optimization and
>>> using generic get_page() there.
>>
> 
> Yes, I see it finally, thanks. :)
> 
>> That's true. However, I'm not sure munlocking is where the
>> put_user_page() machinery is intended to be used anyway? These are
>> short-term pins for struct page manipulation, not e.g. dirtying of page
>> contents. Reading commit fc1d8e7cca2d I don't think this case falls
>> within the reasoning there. Perhaps not all GUP users should be
>> converted to the planned separate GUP tracking, and instead we should
>> have a GUP/follow_page_mask() variant that keeps using get_page/put_page?
>>
> 
> Interesting. So far, the approach has been to get all the gup callers to
> release via put_user_page(), but if we add in Jan's and Ira's
> vaddr_pin_pages() wrapper, then maybe we could leave some sites unconverted.
> 
> However, in order to do so, we would have to change things so that we have
> one set of APIs (gup) that do *not* increment a pin count, and another set
> (vaddr_pin_pages) that do.
> 
> Is that where we want to go...?
> 

Oh, and meanwhile, I'm leaning toward a cheap fix: just use gup_fast()
instead of get_page(), and also fix the releasing code. So this incremental
patch, on top of the existing one, should do it:

diff --git a/mm/mlock.c b/mm/mlock.c
index b980e6270e8a..2ea272c6fee3 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -318,18 +318,14 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
 		/*
 		 * We won't be munlocking this page in the next phase
 		 * but we still need to release the follow_page_mask()
-		 * pin. We cannot do it under lru_lock however. If it's
-		 * the last pin, __page_cache_release() would deadlock.
+		 * pin.
 		 */
-		pagevec_add(&pvec_putback, pvec->pages[i]);
+		put_user_page(pvec->pages[i]);
 		pvec->pages[i] = NULL;
 	}
 	__mod_zone_page_state(zone, NR_MLOCK, delta_munlocked);
 	spin_unlock_irq(&zone->zone_pgdat->lru_lock);
 
-	/* Now we can release pins of pages that we are not munlocking */
-	pagevec_release(&pvec_putback);
-
 	/* Phase 2: page munlock */
 	for (i = 0; i < nr; i++) {
 		struct page *page = pvec->pages[i];
@@ -394,6 +390,8 @@ static unsigned long __munlock_pagevec_fill(struct pagevec *pvec,
 	start += PAGE_SIZE;
 	while (start < end) {
 		struct page *page = NULL;
+		int ret;
+
 		pte++;
 		if (pte_present(*pte))
 			page = vm_normal_page(vma, start, *pte);
@@ -411,7 +409,13 @@ static unsigned long __munlock_pagevec_fill(struct pagevec *pvec,
 		if (PageTransCompound(page))
 			break;
 
-		get_page(page);
+		/*
+		 * Use get_user_pages_fast() instead of get_page(), so that the
+		 * releasing code can unconditionally call put_user_page().
+		 */
+		ret = get_user_pages_fast(start, 1, 0, &page);
+		if (ret != 1)
+			break;
 		/*
 		 * Increase the address that will be returned *before* the
 		 * eventual break due to pvec becoming full by adding the page
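
(An aside, to make the pairing rule explicit: the sketch below is a
hypothetical helper, not part of this patch, with simplified error handling.
It only illustrates that a page acquired via get_user_pages_fast() is
released with put_user_page(), never plain put_page().)

#include <linux/mm.h>

/*
 * Hypothetical example, not in mm/mlock.c: pin one user page and release
 * it through the gup-aware path.
 */
static int pin_then_release_one(unsigned long start)
{
	struct page *page;
	int ret;

	/* gup_flags == 0: plain read pin, matching the call in the patch */
	ret = get_user_pages_fast(start, 1, 0, &page);
	if (ret != 1)
		return ret < 0 ? ret : -EFAULT;

	/* ... use the struct page here ... */

	/*
	 * Matching release: put_user_page() keeps this call site correct
	 * once separate gup-pin tracking lands (see commit fc1d8e7cca2d).
	 */
	put_user_page(page);
	return 0;
}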

thanks,
-- 
John Hubbard
NVIDIA