From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2C1BEC4332F for ; Fri, 9 Dec 2022 15:41:41 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 706258E0005; Fri, 9 Dec 2022 10:41:40 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 6B5F58E0001; Fri, 9 Dec 2022 10:41:40 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 57D458E0005; Fri, 9 Dec 2022 10:41:40 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 486818E0001 for ; Fri, 9 Dec 2022 10:41:40 -0500 (EST) Received: from smtpin03.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 144DB412CA for ; Fri, 9 Dec 2022 15:41:40 +0000 (UTC) X-FDA: 80223182760.03.6BBFC2E Received: from mail-wm1-f43.google.com (mail-wm1-f43.google.com [209.85.128.43]) by imf24.hostedemail.com (Postfix) with ESMTP id 6519818001D for ; Fri, 9 Dec 2022 15:41:38 +0000 (UTC) Authentication-Results: imf24.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=feC9CFwS; spf=pass (imf24.hostedemail.com: domain of jthoughton@google.com designates 209.85.128.43 as permitted sender) smtp.mailfrom=jthoughton@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1670600498; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=zLl4Tz9dOxODdkz805/Ivhudgy+WqekBga7RITekvs0=; b=48QcMcazL8jpWxBr2IkHiown3NOxj0czSzC98OaX3omdK/2uZz8EOJWk96UxA54sZ71jBG YRP4iT20wqATp4wtcdZiJjIEQnvKC+vmjtGcRQyCQyr8aJ03EktcIuQuK1bDJjT5NKgDBo daMv+FrLaWCT/HJoamdvAJ1ZY7R0IeA= ARC-Authentication-Results: i=1; imf24.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=feC9CFwS; spf=pass (imf24.hostedemail.com: domain of jthoughton@google.com designates 209.85.128.43 as permitted sender) smtp.mailfrom=jthoughton@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1670600498; a=rsa-sha256; cv=none; b=ctzCrZd6lNkGvVSa/z88xg3H2XdQdAgWNd+Zm1+rDXS3UYo9KuPBlu5KCqyCtIV0F4uHiy YzIG8rlOqMnOQIzFi8JYfjzTp4GKRhnLOUK++R1OZvmreU/v6bqfslpPBv7soCiCYYOvE8 c9a9H5nOiOxbR7uCRV59khIYY9Jb+x0= Received: by mail-wm1-f43.google.com with SMTP id c65-20020a1c3544000000b003cfffd00fc0so120144wma.1 for ; Fri, 09 Dec 2022 07:41:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=zLl4Tz9dOxODdkz805/Ivhudgy+WqekBga7RITekvs0=; b=feC9CFwS0Yiuqn3p/A3RGjuWnRA9669IU/0/uzpCLL70yZ9pTra2n57v6GDb73pzdM bKtVnBSeXqDuwkhZayfoVJO5HfHoqpUJd9jmYhO8knmEccpqfYzVuceNJu6P8OwkCJcq P/rjNhPDzp2RoH56JsWnRjSSanSqDLHaogGwa5CKxAmuSbYIClvX9YtBhTkI55fxayAx fDPXz4s6sBKRQw0HYvm+X19SZWRRprgNwgOxsMR2eViYi6Ibsd9f7GGVP8XHisZgj5WP pwNkWRat8aoDtNtnM9p1LCmn+pejNEKlJoW9o5DQ7gB9oLBeMmIVHMOYgi4laYZvv/Ph XtQQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=zLl4Tz9dOxODdkz805/Ivhudgy+WqekBga7RITekvs0=; b=C90bc/2og+C7C+8uUeaaLZdMGMN+2DoDfcILvJnanuCOi+4trLlNSwaBpRGK5/kUqL IwpflIUTf9gW9Htcys5QhO2ka4Zbxb6SidQypBDVDPGmYZ/T7dY/MqiKqmzHKK1P+N6G gkJI+uDXu733WmhPnJ4zfMtrXRIeJPagzCxeejpXrKEVCtZbKLUeWVL4l8rKdRrUPzDK W/rQN/8GiK07xW885REK4vDz3dW7b3cZPuP7CwYqJDLqQ+Zr966xHAPPhLkOELdTbzui T941D5Tz7G7/2bn8s/yqfYcD4oizToJVCwMeinYEOGsVw+54FJpZL+MNnkB4misJHzyr 9uiA== X-Gm-Message-State: ANoB5pnO9JqFpjsti4CXK8CXtJGOamhZ0hjBcmAxwtB7wyosWKkaSD6R T+aC3Ci9W7QnGqryegcervOwuw/2h17oJtO0s/8yoA== X-Google-Smtp-Source: AA0mqf5ghm4O86DURsnNnI21lgPhQIY7jhcggNQ9RFoccdgwQ2nVJocTrTVPIRbJWii1x6VANb8KhVvy1gEc3URiSoo= X-Received: by 2002:a05:600c:5124:b0:3cf:878c:6555 with SMTP id o36-20020a05600c512400b003cf878c6555mr58256487wms.38.1670600496778; Fri, 09 Dec 2022 07:41:36 -0800 (PST) MIME-Version: 1.0 References: <20221021163703.3218176-1-jthoughton@google.com> <20221021163703.3218176-9-jthoughton@google.com> In-Reply-To: From: James Houghton Date: Fri, 9 Dec 2022 10:41:24 -0500 Message-ID: Subject: Re: [RFC PATCH v2 08/47] hugetlb: add HGM enablement functions To: Mina Almasry Cc: Mike Kravetz , Muchun Song , Peter Xu , David Hildenbrand , David Rientjes , Axel Rasmussen , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr . David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 6519818001D X-Stat-Signature: g7zao3yxjujranj96u4s48u8nazhzpin X-Rspam-User: X-HE-Tag: 1670600498-778200 X-HE-Meta: U2FsdGVkX19RHjEqzhCV+eWhwVfQPb3RpNRPICZFRgZPRn6tAwJOf2jQL+DSxjQoINUclw74lmLaPxRKMeFNDSQWG9dgZNLHuMXB3l49fi+4BMof7V9/W9MXNqS+nQ3lDtw+yMD3e542Nlu9bsbommFItwKo7Twr1FxDqvUFn752OifN+/tB8LNadcgaxWEOmY4Mo41qMr7Zx/3lf55gOnjFFoqRzJ+Snl250vUsnThJUQFqDNiaJDrR6kgXd127GjFu6Mq01R9iuPjaSxQm44R/Z1pnqpyabaaFWixd1crKmFNUsmsSbP486FgkRuDsAIpY+KUV1cPF4IVqwQuy7iDvSDcSQqWpwMxgdtEpHPSSOD1HBbCKc/eTXTxc6zobFqPnmYoUY7gnTlGeB2hMsoEV2Cp8k9xYxG+npv3DfyF/HGyDaN55diH4uOyDB8Oxya9EAUWLfUWn39pP8+w8pOLy4KiUUOEtKYwbS2KEblF7T3jn5UOCTivJKZZ4nZvvp9pfMDcmIRHlFhlNxwO0mvyjykXuy/cmd7aXI+WHk4yo7RyfTKGUOiCzJ7dIj+XTRhkXNlWgB9ArMg6vxJ8LZVscDUGwEpPrlPPe3cZ0+E3E+eFEOjLDq0Ljwb55RfS/e2Y3srj7IO8UGM9Uj0sIGqCPhXg0l4vL8iipZo4+pNGGLVNFbvrRaFwhDSV4+UndX823r4e7NlhMobdMRxLTryEs26HKl6wJKKdZiR3/lwi9gIwzrioe7NnBJ5IMjGDtKJ0loZDQz/5T3l8YWGFwfqsxX0Nwt7iNvhKsSFJo2YdtWEXZB9wNB0YZ6Q2cyzAG1u19h37aICm24ItYKDBm1Yhv3oWZdLHYG9/R9ew90vyADGjKTFjOLmFr4KHqAOfdGKU5lYjb+hTZq40FoK+1a+M6uMkyEnBdWORWjThckRS4tFYneC4WO1bv3dj00A7u/j1kwK4Pfp19mub37P2 6ux3QtGN VCX9I0YD7YG7/Uek= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, Dec 7, 2022 at 7:26 PM Mina Almasry wrote: > > On Fri, Oct 21, 2022 at 9:37 AM James Houghton wrote: > > > > Currently it is possible for all shared VMAs to use HGM, but it must be > > enabled first. This is because with HGM, we lose PMD sharing, and page > > table walks require additional synchronization (we need to take the VMA > > lock). > > > > Signed-off-by: James Houghton > > --- > > include/linux/hugetlb.h | 22 +++++++++++++ > > mm/hugetlb.c | 69 +++++++++++++++++++++++++++++++++++++++++ > > 2 files changed, 91 insertions(+) > > > > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h > > index 534958499ac4..6e0c36b08a0c 100644 > > --- a/include/linux/hugetlb.h > > +++ b/include/linux/hugetlb.h > > @@ -123,6 +123,9 @@ struct hugetlb_vma_lock { > > > > struct hugetlb_shared_vma_data { > > struct hugetlb_vma_lock vma_lock; > > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING > > + bool hgm_enabled; > > +#endif > > }; > > > > extern struct resv_map *resv_map_alloc(void); > > @@ -1179,6 +1182,25 @@ static inline void hugetlb_unregister_node(struct node *node) > > } > > #endif /* CONFIG_HUGETLB_PAGE */ > > > > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING > > +bool hugetlb_hgm_enabled(struct vm_area_struct *vma); > > +bool hugetlb_hgm_eligible(struct vm_area_struct *vma); > > +int enable_hugetlb_hgm(struct vm_area_struct *vma); > > +#else > > +static inline bool hugetlb_hgm_enabled(struct vm_area_struct *vma) > > +{ > > + return false; > > +} > > +static inline bool hugetlb_hgm_eligible(struct vm_area_struct *vma) > > +{ > > + return false; > > +} > > +static inline int enable_hugetlb_hgm(struct vm_area_struct *vma) > > +{ > > + return -EINVAL; > > +} > > +#endif > > + > > static inline spinlock_t *huge_pte_lock(struct hstate *h, > > struct mm_struct *mm, pte_t *pte) > > { > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > > index 5ae8bc8c928e..a18143add956 100644 > > --- a/mm/hugetlb.c > > +++ b/mm/hugetlb.c > > @@ -6840,6 +6840,10 @@ static bool pmd_sharing_possible(struct vm_area_struct *vma) > > #ifdef CONFIG_USERFAULTFD > > if (uffd_disable_huge_pmd_share(vma)) > > return false; > > +#endif > > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING > > + if (hugetlb_hgm_enabled(vma)) > > + return false; > > #endif > > /* > > * Only shared VMAs can share PMDs. > > @@ -7033,6 +7037,9 @@ static int hugetlb_vma_data_alloc(struct vm_area_struct *vma) > > kref_init(&data->vma_lock.refs); > > init_rwsem(&data->vma_lock.rw_sema); > > data->vma_lock.vma = vma; > > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING > > + data->hgm_enabled = false; > > +#endif > > vma->vm_private_data = data; > > return 0; > > } > > @@ -7290,6 +7297,68 @@ __weak unsigned long hugetlb_mask_last_page(struct hstate *h) > > > > #endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */ > > > > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING > > +bool hugetlb_hgm_eligible(struct vm_area_struct *vma) > > +{ > > + /* > > + * All shared VMAs may have HGM. > > + * > > + * HGM requires using the VMA lock, which only exists for shared VMAs. > > + * To make HGM work for private VMAs, we would need to use another > > + * scheme to prevent collapsing/splitting from invalidating other > > + * threads' page table walks. > > + */ > > + return vma && (vma->vm_flags & VM_MAYSHARE); > > +} > > +bool hugetlb_hgm_enabled(struct vm_area_struct *vma) > > +{ > > + struct hugetlb_shared_vma_data *data = vma->vm_private_data; > > + > > + if (!vma || !(vma->vm_flags & VM_MAYSHARE)) > > + return false; > > + > > + return data && data->hgm_enabled; > > Don't you need to lock data->vma_lock before you access data? Or did I > misunderstand the locking? Or are you assuming this is safe before > hgm_enabled can't be disabled? This should be protected by the mmap_lock (we must be holding it for at least reading here). `data` and `data->hgm_enabled` are only changed when holding the mmap_lock for writing. > > +} > > + > > +/* > > + * Enable high-granularity mapping (HGM) for this VMA. Once enabled, HGM > > + * cannot be turned off. > > + * > > + * PMDs cannot be shared in HGM VMAs. > > + */ > > +int enable_hugetlb_hgm(struct vm_area_struct *vma) > > +{ > > + int ret; > > + struct hugetlb_shared_vma_data *data; > > + > > + if (!hugetlb_hgm_eligible(vma)) > > + return -EINVAL; > > + > > + if (hugetlb_hgm_enabled(vma)) > > + return 0; > > + > > + /* > > + * We must hold the mmap lock for writing so that callers can rely on > > + * hugetlb_hgm_enabled returning a consistent result while holding > > + * the mmap lock for reading. > > + */ > > + mmap_assert_write_locked(vma->vm_mm); > > + > > + /* HugeTLB HGM requires the VMA lock to synchronize collapsing. */ > > + ret = hugetlb_vma_data_alloc(vma); > > Confused we need to vma_data_alloc() here. Shouldn't this be done by > hugetlb_vm_op_open()? hugetlb_vma_data_alloc() can fail. In hugetlb_vm_op_open()/other places, it is allowed to fail, and so we call it again here and check that it succeeded so that we can rely on the VMA lock. I think I need to be a little bit more careful with how I handle VMA splitting, though. It's possible for `data` not to be allocated after we split, but for some things to be mapped at high-granularity. The easiest solution here would be to disallow splitting when HGM is enabled; not sure what the best solution is though. Thanks for the review, Mina! > > > + if (ret) > > + return ret; > > + > > + data = vma->vm_private_data; > > + BUG_ON(!data); > > + data->hgm_enabled = true; > > + > > + /* We don't support PMD sharing with HGM. */ > > + hugetlb_unshare_all_pmds(vma); > > + return 0; > > +} > > +#endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */ > > + > > /* > > * These functions are overwritable if your architecture needs its own > > * behavior. > > -- > > 2.38.0.135.g90850a2211-goog > >