Date: Mon, 30 Aug 2021 10:07:41 -0300
From: Jason Gunthorpe <jgg@ziepe.ca>
To: Joao Martins
Cc: linux-mm@kvack.org, Dan Williams, Vishal Verma, Dave Jiang,
	Naoya Horiguchi, Matthew Wilcox, John Hubbard, Jane Chu,
	Muchun Song, Mike Kravetz, Andrew Morton, Jonathan Corbet,
	Christoph Hellwig, nvdimm@lists.linux.dev, linux-doc@vger.kernel.org
Subject: Re: [PATCH v4 08/14] mm/gup: grab head page refcount once for group of subpages
Message-ID: <20210830130741.GO1200268@ziepe.ca>
References: <20210827145819.16471-1-joao.m.martins@oracle.com>
	<20210827145819.16471-9-joao.m.martins@oracle.com>
	<20210827162552.GK1200268@ziepe.ca>

On Fri, Aug 27, 2021 at 07:34:54PM +0100, Joao Martins wrote:
> On 8/27/21 5:25 PM, Jason Gunthorpe wrote:
> > On Fri, Aug 27, 2021 at 03:58:13PM +0100, Joao Martins wrote:
> >
> >> #if defined(CONFIG_ARCH_HAS_PTE_DEVMAP) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
> >>  static int __gup_device_huge(unsigned long pfn, unsigned long addr,
> >>  			     unsigned long end, unsigned int flags,
> >>  			     struct page **pages, int *nr)
> >>  {
> >> -	int nr_start = *nr;
> >> +	int refs, nr_start = *nr;
> >>  	struct dev_pagemap *pgmap = NULL;
> >>  	int ret = 1;
> >>
> >>  	do {
> >> -		struct page *page = pfn_to_page(pfn);
> >> +		struct page *head, *page = pfn_to_page(pfn);
> >> +		unsigned long next = addr + PAGE_SIZE;
> >>
> >>  		pgmap = get_dev_pagemap(pfn, pgmap);
> >>  		if (unlikely(!pgmap)) {
> >> @@ -2252,16 +2265,25 @@ static int __gup_device_huge(unsigned long pfn, unsigned long addr,
> >>  			ret = 0;
> >>  			break;
> >>  		}
> >> -		SetPageReferenced(page);
> >> -		pages[*nr] = page;
> >> -		if (unlikely(!try_grab_page(page, flags))) {
> >> -			undo_dev_pagemap(nr, nr_start, flags, pages);
> >> +
> >> +		head = compound_head(page);
> >> +		/* @end is assumed to be limited at most one compound page */
> >> +		if (PageHead(head))
> >> +			next = end;
> >> +		refs = record_subpages(page, addr, next, pages + *nr);
> >> +
> >> +		SetPageReferenced(head);
> >> +		if (unlikely(!try_grab_compound_head(head, refs, flags))) {
> >> +			if (PageHead(head))
> >> +				ClearPageReferenced(head);
> >> +			else
> >> +				undo_dev_pagemap(nr, nr_start, flags, pages);
> >>  			ret = 0;
> >>  			break;
> >
> > Why is this special cased for devmap?
> >
> > Shouldn't everything processing pud/pmds/etc use the same basic loop
> > that is similar in idea to the 'for_each_compound_head' scheme in
> > unpin_user_pages_dirty_lock()?
> >
> > Doesn't that work for all the special page type cases here?
>
> We are iterating over PFNs to create an array of base pages (regardless
> of page table type), rather than iterating over an array of pages to
> work on.

That is part of it, yes, but the slow bit here is to minimally find the
head pages and do the atomics on them, much like
unpin_user_pages_dirty_lock().

I would think this should be designed similarly to how things work on
the unpin side.
Sweep the page tables to find a proper start/end - eg even if a
compound is spread across multiple pte/pmd/pud/etc we should find a
linear range of starting PFN (ie a starting page *) and npages across
as much of the page tables as we can manage. This is the same as where
things end up in the unpin case, where all the contiguous PFNs are
grouped together into a range.

Then 'assign' that range to the output array, which requires walking
over each compound_head in the range and pinning it, then writing out
the tail pages to the output struct page array.

And this approach should apply universally no matter what is under the
PTEs - ie huge pages, THPs and devmaps should all be treated the same
way. Currently each case is different, like the above, which is unique
to __gup_device_huge.

The more we can process groups of pages in bulk, the faster the whole
thing will be.

Jason

> Given that all these gup functions already give you the boundary (end
> of pmd or end of pud, etc), then we just need to grab the ref to the
> pgmap and head page and save the tails. But sadly we need to handle
> the base page case, which is why this outer loop exists. If it was
> just head pages we wouldn't need the outer loop (and hence no
> for_each_compound_head, like the hugetlb variant).
>
> But maybe I am being dense and you just mean to replace the outer loop
> with for_each_compound_range(). I am a little stuck on the part that I
> anyway need to record back the tail pages when iterating over the
> (single) head page. So I feel that it wouldn't bring that much
> improvement, unless I missed your point.
>
> Joao
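[Editor's note] The bulk scheme described above - take a contiguous
(pfn, npages) range, then do one reference-count update per compound
head covering all of its tail pages while writing every page into the
output array - can be sketched in a small userspace model. Everything
here (toy_page, pin_pfn_range, the toy memory map) is hypothetical and
only illustrates the grouping idea; it is not the real kernel API:

```c
/* Toy model of range-based pinning: one ref bump per compound head,
 * all tail pages recorded into the output array in a single pass.
 * All names are illustrative; this is not kernel code. */
#include <assert.h>
#include <stddef.h>

#define NPAGES 16

struct toy_page {
	int refcount;
	struct toy_page *head;	/* compound head; heads point to themselves */
	unsigned int order;	/* compound order, valid on heads */
};

static struct toy_page mem[NPAGES];

/* Carve the toy "memory map" into two order-3 (8-page) compounds. */
static void init_compounds(void)
{
	for (int i = 0; i < NPAGES; i++) {
		mem[i].head = &mem[i & ~7];
		mem[i].refcount = 0;
		mem[i].order = 3;
	}
}

/* Pin a contiguous pfn range.  For each compound head overlapped by the
 * range, do a single refcount update covering all of its pages in the
 * range (the "atomics on head pages only" point above), then emit the
 * pages into the caller's array.  Returns the number of pages pinned. */
static int pin_pfn_range(unsigned long pfn, int npages, struct toy_page **out)
{
	int done = 0;

	while (done < npages) {
		struct toy_page *page = &mem[pfn + done];
		struct toy_page *head = page->head;
		/* pages left in this compound from 'page' onward */
		int in_compound = (1 << head->order) - (int)(page - head);
		int refs = npages - done;

		if (refs > in_compound)
			refs = in_compound;

		head->refcount += refs;	/* one update covers all tails */
		for (int i = 0; i < refs; i++)
			out[done + i] = page + i;
		done += refs;
	}
	return done;
}
```

For example, pinning pfns 4..11 crosses two compounds: the loop runs
twice, bumping each head's refcount once (by 4), instead of doing eight
per-page atomic operations as the base-page loop would.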