Message-ID: <887ca2e2-a7c5-93a7-46cb-185daccd4444@redhat.com>
Date: Thu, 25 Aug 2022 09:25:31 +0200
From: David Hildenbrand <david@redhat.com>
Organization: Red Hat
To: Mike Kravetz, Baolin Wang
Cc: akpm@linux-foundation.org, songmuchun@bytedance.com, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 1/5] mm/hugetlb: fix races when looking up a CONT-PTE
 size hugetlb page
References: <0e5d92da043d147a867f634b17acbcc97a7f0e64.1661240170.git.baolin.wang@linux.alibaba.com>
 <4c24b891-04ce-2608-79d2-a75dc236533f@redhat.com>
 <376d2e0a-d28a-984b-903c-1f6451b04a15@linux.alibaba.com>
 <7d4e7f47-30a5-3cc6-dc9f-aa89120847d8@redhat.com>
 <64669c0a-4a6e-f034-a15b-c4a8deea9e5d@linux.alibaba.com>
 <7ee73879-e402-9175-eae8-41471d80d59e@redhat.com>

> Is the primary concern the locking? If so, I am not sure we have an issue.
> As mentioned in your commit message, current code will use
> pte_offset_map_lock(). pte_offset_map_lock uses pte_lockptr, and pte_lockptr
> will either be the mm wide lock or pmd_page lock. To me, it seems that
> either would provide correct synchronization for CONT-PTE entries. Am I
> missing something or misreading the code?
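As a side note on the locking question: the logic referred to here is
pte_lockptr(). A rough sketch of it (simplified from include/linux/mm.h
from memory, not a verbatim quote):

/*
 * Sketch of pte_lockptr(): with split PTE locks, the lock lives in the
 * struct page backing the PTE table; otherwise the single mm-wide
 * page_table_lock is used. All CONT-PTE entries of one hugetlb page sit
 * in the same PTE table, so they share one lock either way.
 */
#if USE_SPLIT_PTE_PTLOCKS
static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
{
	/* per-PTE-table lock */
	return ptlock_ptr(pmd_page(*pmd));
}
#else
static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
{
	/* one lock for the whole mm */
	return &mm->page_table_lock;
}
#endif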
> 
> I started looking at code cleanup suggested by David. Here is a quick
> patch (not tested and likely containing errors) to see if this is a step
> in the right direction.
> 
> I like it because we get rid of/combine all those follow_huge_p*d
> routines.
> 

Yes, see comments below.

> From 35d117a707c1567ddf350554298697d40eace0d7 Mon Sep 17 00:00:00 2001
> From: Mike Kravetz
> Date: Wed, 24 Aug 2022 15:59:15 -0700
> Subject: [PATCH] hugetlb: call hugetlb_follow_page_mask for hugetlb pages in
>  follow_page_mask
> 
> At the beginning of follow_page_mask, there currently is a call to
> follow_huge_addr which 'may' handle hugetlb pages. ia64 is the only
> architecture which (incorrectly) provides a follow_huge_addr routine
> that does not return error. Instead, at each level of the page table a
> check is made for a hugetlb entry. If a hugetlb entry is found, a call
> to a routine associated with that page table level such as
> follow_huge_pmd is made.
> 
> All the follow_huge_p*d routines are basically the same. In addition
> huge page size can be derived from the vma, so we know where in the page
> table a huge page would reside. So, replace follow_huge_addr with a
> new architecture independent routine which will provide the same
> functionality as the follow_huge_p*d routines. We can then eliminate
> the p*d_huge checks in follow_page_mask page table walking as well as
> the follow_huge_p*d routines themselves.
> 
> follow_page_mask still has is_hugepd hugetlb checks during page table
> walking. This is due to these checks and follow_huge_pd being
> architecture specific. These can be eliminated if
> hugetlb_follow_page_mask can be overwritten by architectures (powerpc)
> that need to do follow_huge_pd processing.

But won't the

> +	/* hugetlb is special */
> +	if (is_vm_hugetlb_page(vma))
> +		return hugetlb_follow_page_mask(vma, address, flags);

code route everything via hugetlb_follow_page_mask() and all these
(beloved) hugepd checks would essentially be unreachable?

At least my understanding is that hugepd only applies to hugetlb.

Can't we move the hugepd handling code into hugetlb_follow_page_mask()
as well? I mean, doesn't follow_hugetlb_page() also have to handle that
hugepd stuff already ... ?
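To make the concern concrete, the resulting walk would look roughly
like this. This is a simplified sketch of mm/gup.c with the proposed
change applied, from memory and heavily trimmed, not the literal
source:

/*
 * Sketch: with the early hugetlb dispatch, a hugetlb VMA never reaches
 * the per-level walk below, so the is_hugepd()/follow_huge_pd() checks
 * could only trigger for non-hugetlb VMAs -- but hugepd is hugetlb-only.
 */
static struct page *follow_page_mask(struct vm_area_struct *vma,
				     unsigned long address, unsigned int flags,
				     struct follow_page_context *ctx)
{
	pgd_t *pgd;
	struct mm_struct *mm = vma->vm_mm;

	ctx->page_mask = 0;

	/* hugetlb is special */
	if (is_vm_hugetlb_page(vma))
		return hugetlb_follow_page_mask(vma, address, flags);

	pgd = pgd_offset(mm, address);
	if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
		return no_page_table(vma, flags);

	/* effectively unreachable for hugetlb after the early return */
	if (is_hugepd(__hugepd(pgd_val(*pgd))))
		return follow_huge_pd(vma, address,
				      __hugepd(pgd_val(*pgd)), flags,
				      PGDIR_SHIFT);

	return follow_p4d_mask(vma, address, pgd, flags, ctx);
}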
[...]

> 
> +struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
> +			unsigned long address, unsigned int flags)
> +{
> +	struct hstate *h = hstate_vma(vma);
> +	struct mm_struct *mm = vma->vm_mm;
> +	unsigned long haddr = address & huge_page_mask(h);
> +	struct page *page = NULL;
> +	spinlock_t *ptl;
> +	pte_t *pte, entry;
> +
> +	/*
> +	 * FOLL_PIN is not supported for follow_page(). Ordinary GUP goes via
> +	 * follow_hugetlb_page().
> +	 */
> +	if (WARN_ON_ONCE(flags & FOLL_PIN))
> +		return NULL;
> +
> +	pte = huge_pte_offset(mm, haddr, huge_page_size(h));
> +	if (!pte)
> +		return NULL;
> +
> +retry:
> +	ptl = huge_pte_lock(h, mm, pte);
> +	entry = huge_ptep_get(pte);
> +	if (pte_present(entry)) {
> +		page = pte_page(entry);
> +		/*
> +		 * try_grab_page() should always succeed here, because we hold
> +		 * the ptl lock and have verified pte_present().
> +		 */
> +		if (WARN_ON_ONCE(!try_grab_page(page, flags))) {
> +			page = NULL;
> +			goto out;
> +		}
> +	} else {
> +		if (is_hugetlb_entry_migration(entry)) {
> +			spin_unlock(ptl);
> +			__migration_entry_wait_huge(pte, ptl);
> +			goto retry;
> +		}
> +		/*
> +		 * hwpoisoned entry is treated as no_page_table in
> +		 * follow_page_mask().
> +		 */
> +	}
> +out:
> +	spin_unlock(ptl);
> +	return page;

This is neat and clean enough to justify not reusing
follow_hugetlb_page(). I wonder if we want to add a comment to the
function explaining how it differs from follow_hugetlb_page().

... or do we maybe want to rename follow_hugetlb_page() to something
like __hugetlb_get_user_pages(), to make it clearer in which context it
will get called?

I guess it might be feasible in the future to eliminate
follow_hugetlb_page() and centralize the faulting code. For now, this
certainly improves the situation.

-- 
Thanks,

David / dhildenb