From: David Hildenbrand
Organization: Red Hat
To: Mike Kravetz, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-ia64@vger.kernel.org
Cc: Baolin Wang, "Aneesh Kumar K . V", Naoya Horiguchi, Michael Ellerman, Muchun Song, Andrew Morton
Subject: Re: [PATCH] hugetlb: simplify hugetlb handling in follow_page_mask
Date: Tue, 30 Aug 2022 10:11:49 +0200
Message-ID: <608934d4-466d-975e-6458-34a91ccb4669@redhat.com>
In-Reply-To: <20220829234053.159158-1-mike.kravetz@oracle.com>
References: <20220829234053.159158-1-mike.kravetz@oracle.com>

On 30.08.22 01:40, Mike Kravetz wrote:
> During discussions of this series [1], it was suggested that hugetlb
> handling code in follow_page_mask could be simplified. At the beginning

Feel free to use a Suggested-by if you consider it appropriate.

> of follow_page_mask, there currently is a call to follow_huge_addr which
> 'may' handle hugetlb pages. ia64 is the only architecture which provides
> a follow_huge_addr routine that does not return error.
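For context -- and this is from memory, so please double-check -- the
generic fallback in mm/hugetlb.c is just a __weak stub that always fails,
which is why only ia64's override can ever hand back a real page here.
Roughly:

	/* generic fallback; only ia64 provides its own version, IIRC */
	struct page * __weak
	follow_huge_addr(struct mm_struct *mm, unsigned long address, int write)
	{
		return ERR_PTR(-EINVAL);
	}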
> Instead, at each
> level of the page table a check is made for a hugetlb entry. If a hugetlb
> entry is found, a call to a routine associated with that entry is made.
> 
> Currently, there are two checks for hugetlb entries at each page table
> level. The first check is of the form:
> 	if (p?d_huge())
> 		page = follow_huge_p?d();
> the second check is of the form:
> 	if (is_hugepd())
> 		page = follow_huge_pd().

BTW, what about all this hugepd stuff in mm/pagewalk.c? Isn't this all
dead code as we're essentially routing all hugetlb VMAs via
walk_hugetlb_range?

[yes, all that hugepd stuff in generic code that overcomplicates things
has been annoying me for a long time]

> 
> We can replace these checks, as well as the special handling routines
> such as follow_huge_p?d() and follow_huge_pd(), with a single routine to
> handle hugetlb vmas.
> 
> A new routine, hugetlb_follow_page_mask, is called for hugetlb vmas at the
> beginning of follow_page_mask. hugetlb_follow_page_mask will use the
> existing routine huge_pte_offset to walk page tables looking for hugetlb
> entries. huge_pte_offset can be overwritten by architectures, and already
> handles special cases such as hugepd entries.
> 
> [1] https://lore.kernel.org/linux-mm/cover.1661240170.git.baolin.wang@linux.alibaba.com/
> Signed-off-by: Mike Kravetz

[...]

> +static struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
> +				unsigned long address, unsigned int flags)
> +{
> +	/* should never happen, but do not want to BUG */
> +	return ERR_PTR(-EINVAL);

Should there be a WARN_ON_ONCE() instead, or could we use a BUILD_BUG_ON()?

> +}

[...]

> @@ -851,10 +814,15 @@ static struct page *follow_page_mask(struct vm_area_struct *vma,
>  
>  	ctx->page_mask = 0;
>  
> -	/* make this handle hugepd */
> -	page = follow_huge_addr(mm, address, flags & FOLL_WRITE);
> -	if (!IS_ERR(page)) {
> -		WARN_ON_ONCE(flags & (FOLL_GET | FOLL_PIN));
> +	/*
> +	 * Call hugetlb_follow_page_mask for hugetlb vmas as it will use
> +	 * special hugetlb page table walking code. This eliminates the
> +	 * need to check for hugetlb entries in the general walking code.
> +	 */

Maybe also comment that ordinary GUP never ends up in here and instead
directly uses follow_hugetlb_page(). This is for follow_page() handling
only.
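Something along these lines, maybe (untested, and the wording is only a
rough suggestion -- it's the same hunk as below with the comment extended):

	/*
	 * Call hugetlb_follow_page_mask for hugetlb vmas as it will use
	 * special hugetlb page table walking code. This eliminates the
	 * need to check for hugetlb entries in the general walking code.
	 *
	 * Ordinary GUP never ends up here: it uses follow_hugetlb_page()
	 * directly. This path only serves follow_page()/follow_page_mask().
	 */
	if (is_vm_hugetlb_page(vma)) {
		page = hugetlb_follow_page_mask(vma, address, flags);
		if (!page)
			page = no_page_table(vma, flags);
		return page;
	}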
[my suggestion to rename follow_hugetlb_page() still stands ;) ]

> +	if (is_vm_hugetlb_page(vma)) {
> +		page = hugetlb_follow_page_mask(vma, address, flags);
> +		if (!page)
> +			page = no_page_table(vma, flags);
>  		return page;
>  	}
>  
> @@ -863,21 +831,6 @@ static struct page *follow_page_mask(struct vm_area_struct *vma,
>  	if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
>  		return no_page_table(vma, flags);
>  
> -	if (pgd_huge(*pgd)) {
> -		page = follow_huge_pgd(mm, address, pgd, flags);
> -		if (page)
> -			return page;
> -		return no_page_table(vma, flags);
> -	}
> -	if (is_hugepd(__hugepd(pgd_val(*pgd)))) {
> -		page = follow_huge_pd(vma, address,
> -				      __hugepd(pgd_val(*pgd)), flags,
> -				      PGDIR_SHIFT);
> -		if (page)
> -			return page;
> -		return no_page_table(vma, flags);
> -	}
> -
>  	return follow_p4d_mask(vma, address, pgd, flags, ctx);
>  }
>  
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index d0617d64d718..b3da421ba5be 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6190,6 +6190,62 @@ static inline bool __follow_hugetlb_must_fault(unsigned int flags, pte_t *pte,
>  	return false;
>  }
>  
> +struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
> +				unsigned long address, unsigned int flags)
> +{
> +	struct hstate *h = hstate_vma(vma);
> +	struct mm_struct *mm = vma->vm_mm;
> +	unsigned long haddr = address & huge_page_mask(h);
> +	struct page *page = NULL;
> +	spinlock_t *ptl;
> +	pte_t *pte, entry;
> +
> +	/*
> +	 * FOLL_PIN is not supported for follow_page(). Ordinary GUP goes via
> +	 * follow_hugetlb_page().
> +	 */
> +	if (WARN_ON_ONCE(flags & FOLL_PIN))
> +		return NULL;
> +
> +	pte = huge_pte_offset(mm, haddr, huge_page_size(h));
> +	if (!pte)
> +		return NULL;
> +
> +retry:
> +	ptl = huge_pte_lock(h, mm, pte);
> +	entry = huge_ptep_get(pte);
> +	if (pte_present(entry)) {
> +		page = pte_page(entry) +
> +			((address & ~huge_page_mask(h)) >> PAGE_SHIFT);
> +		/*
> +		 * Note that page may be a sub-page, and with vmemmap
> +		 * optimizations the page struct may be read only.
> +		 * try_grab_page() will increase the ref count on the
> +		 * head page, so this will be OK.
> +		 *
> +		 * try_grab_page() should always succeed here, because we hold
> +		 * the ptl lock and have verified pte_present().
> +		 */
> +		if (WARN_ON_ONCE(!try_grab_page(page, flags))) {
> +			page = NULL;
> +			goto out;
> +		}
> +	} else {
> +		if (is_hugetlb_entry_migration(entry)) {
> +			spin_unlock(ptl);
> +			__migration_entry_wait_huge(pte, ptl);
> +			goto retry;
> +		}
> +		/*
> +		 * hwpoisoned entry is treated as no_page_table in
> +		 * follow_page_mask().
> +		 */
> +	}
> +out:
> +	spin_unlock(ptl);
> +	return page;
> +}
> +
>  long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  			 struct page **pages, struct vm_area_struct **vmas,
>  			 unsigned long *position, unsigned long *nr_pages,
> @@ -7140,123 +7196,6 @@ __weak unsigned long hugetlb_mask_last_page(struct hstate *h)
>   * These functions are overwritable if your architecture needs its own
>   * behavior.
>   */

[...]

Numbers speak for themselves.

Acked-by: David Hildenbrand

-- 
Thanks,

David / dhildenb