Subject: Re: [RFC PATCH v2 08/47] hugetlb: add HGM enablement functions
From: Mina Almasry
Date: Wed, 7 Dec 2022 16:26:11 -0800
To: James Houghton
Cc: Mike Kravetz, Muchun Song, Peter Xu, David Hildenbrand, David Rientjes, Axel Rasmussen, Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert, Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org
In-Reply-To: <20221021163703.3218176-9-jthoughton@google.com>
References: <20221021163703.3218176-1-jthoughton@google.com> <20221021163703.3218176-9-jthoughton@google.com>

On Fri, Oct 21, 2022 at 9:37 AM James Houghton wrote:
>
> Currently it is possible for all shared VMAs to use HGM, but it must be
> enabled first. This is because with HGM, we lose PMD sharing, and page
> table walks require additional synchronization (we need to take the VMA
> lock).
>
> Signed-off-by: James Houghton
> ---
>  include/linux/hugetlb.h | 22 +++++++++++++
>  mm/hugetlb.c            | 69 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 91 insertions(+)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 534958499ac4..6e0c36b08a0c 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -123,6 +123,9 @@ struct hugetlb_vma_lock {
>
>  struct hugetlb_shared_vma_data {
>  	struct hugetlb_vma_lock vma_lock;
> +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
> +	bool hgm_enabled;
> +#endif
>  };
>
>  extern struct resv_map *resv_map_alloc(void);
> @@ -1179,6 +1182,25 @@ static inline void hugetlb_unregister_node(struct node *node)
>  }
>  #endif /* CONFIG_HUGETLB_PAGE */
>
> +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
> +bool hugetlb_hgm_enabled(struct vm_area_struct *vma);
> +bool hugetlb_hgm_eligible(struct vm_area_struct *vma);
> +int enable_hugetlb_hgm(struct vm_area_struct *vma);
> +#else
> +static inline bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
> +{
> +	return false;
> +}
> +static inline bool hugetlb_hgm_eligible(struct vm_area_struct *vma)
> +{
> +	return false;
> +}
> +static inline int enable_hugetlb_hgm(struct vm_area_struct *vma)
> +{
> +	return -EINVAL;
> +}
> +#endif
> +
>  static inline spinlock_t *huge_pte_lock(struct hstate *h,
>  					struct mm_struct *mm, pte_t *pte)
>  {
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 5ae8bc8c928e..a18143add956 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6840,6 +6840,10 @@ static bool pmd_sharing_possible(struct vm_area_struct *vma)
>  #ifdef CONFIG_USERFAULTFD
>  	if (uffd_disable_huge_pmd_share(vma))
>  		return false;
> +#endif
> +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
> +	if (hugetlb_hgm_enabled(vma))
> +		return false;
>  #endif
>  	/*
>  	 * Only shared VMAs can share PMDs.
> @@ -7033,6 +7037,9 @@ static int hugetlb_vma_data_alloc(struct vm_area_struct *vma)
>  	kref_init(&data->vma_lock.refs);
>  	init_rwsem(&data->vma_lock.rw_sema);
>  	data->vma_lock.vma = vma;
> +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
> +	data->hgm_enabled = false;
> +#endif
>  	vma->vm_private_data = data;
>  	return 0;
>  }
> @@ -7290,6 +7297,68 @@ __weak unsigned long hugetlb_mask_last_page(struct hstate *h)
>
>  #endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */
>
> +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
> +bool hugetlb_hgm_eligible(struct vm_area_struct *vma)
> +{
> +	/*
> +	 * All shared VMAs may have HGM.
> +	 *
> +	 * HGM requires using the VMA lock, which only exists for shared VMAs.
> +	 * To make HGM work for private VMAs, we would need to use another
> +	 * scheme to prevent collapsing/splitting from invalidating other
> +	 * threads' page table walks.
> +	 */
> +	return vma && (vma->vm_flags & VM_MAYSHARE);
> +}
> +bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
> +{
> +	struct hugetlb_shared_vma_data *data = vma->vm_private_data;
> +
> +	if (!vma || !(vma->vm_flags & VM_MAYSHARE))
> +		return false;
> +
> +	return data && data->hgm_enabled;

Don't you need to take data->vma_lock before you access data? Or did I
misunderstand the locking? Or are you assuming this is safe because
hgm_enabled can't be disabled once it is set?

> +}
> +
> +/*
> + * Enable high-granularity mapping (HGM) for this VMA. Once enabled, HGM
> + * cannot be turned off.
> + *
> + * PMDs cannot be shared in HGM VMAs.
> + */
> +int enable_hugetlb_hgm(struct vm_area_struct *vma)
> +{
> +	int ret;
> +	struct hugetlb_shared_vma_data *data;
> +
> +	if (!hugetlb_hgm_eligible(vma))
> +		return -EINVAL;
> +
> +	if (hugetlb_hgm_enabled(vma))
> +		return 0;
> +
> +	/*
> +	 * We must hold the mmap lock for writing so that callers can rely on
> +	 * hugetlb_hgm_enabled returning a consistent result while holding
> +	 * the mmap lock for reading.
> +	 */
> +	mmap_assert_write_locked(vma->vm_mm);
> +
> +	/* HugeTLB HGM requires the VMA lock to synchronize collapsing. */
> +	ret = hugetlb_vma_data_alloc(vma);

I'm confused why we need to call hugetlb_vma_data_alloc() here.
Shouldn't this already have been done by hugetlb_vm_op_open()?

> +	if (ret)
> +		return ret;
> +
> +	data = vma->vm_private_data;
> +	BUG_ON(!data);
> +	data->hgm_enabled = true;
> +
> +	/* We don't support PMD sharing with HGM. */
> +	hugetlb_unshare_all_pmds(vma);
> +	return 0;
> +}
> +#endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
> +
>  /*
>   * These functions are overwritable if your architecture needs its own
>   * behavior.
> --
> 2.38.0.135.g90850a2211-goog
>