From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Tue, 3 Jan 2023 18:04:52 -0500
From: Peter Xu <peterx@redhat.com>
To: James Houghton <jthoughton@google.com>
Cc: Mike Kravetz, Muchun Song, Axel Rasmussen, Andrew Morton,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] hugetlb: unshare some PMDs when splitting VMAs
References: <20230101230042.244286-1-jthoughton@google.com>
In-Reply-To: <20230101230042.244286-1-jthoughton@google.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
On Sun, Jan 01, 2023 at 11:00:42PM +0000, James Houghton wrote:
> PMD sharing can only be done in PUD_SIZE-aligned pieces of VMAs;
> however, it is possible that HugeTLB VMAs are split without unsharing
> the PMDs first.
> 
> In some (most?) cases, this is a non-issue, like userfaultfd_register
> and mprotect, where PMDs are unshared before anything is done. However,
> mbind() and madvise() (like MADV_DONTDUMP) can cause a split without
> unsharing first.
> 
> It might seem ideal to unshare in hugetlb_vm_op_open, but that would
> only unshare PMDs in the new VMA.
> 
> Signed-off-by: James Houghton <jthoughton@google.com>
> ---
>  mm/hugetlb.c | 42 +++++++++++++++++++++++++++++++++---------
>  1 file changed, 33 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index b39b74e0591a..bf7a1f628357 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -94,6 +94,8 @@ static int hugetlb_acct_memory(struct hstate *h, long delta);
>  static void hugetlb_vma_lock_free(struct vm_area_struct *vma);
>  static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma);
>  static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma);
> +static void hugetlb_unshare_pmds(struct vm_area_struct *vma,
> +			unsigned long start, unsigned long end);
>  
>  static inline bool subpool_is_free(struct hugepage_subpool *spool)
>  {
> @@ -4828,6 +4830,23 @@ static int hugetlb_vm_op_split(struct vm_area_struct *vma, unsigned long addr)
>  {
>  	if (addr & ~(huge_page_mask(hstate_vma(vma))))
>  		return -EINVAL;
> +
> +	/* We require PUD_SIZE VMA alignment for PMD sharing. */

I can get the point, but it reads slightly awkward.
How about:

	/*
	 * If the address to split can be in the middle of a shared pmd
	 * range, unshare before splitting the vma.
	 */

I remember you had a helper to check pmd sharing possibility.  It could
be used here, depending on whether it exists in the code base already or
only in your hgm series (or just pick that patch up with this one?).

> +	if (addr & ~PUD_MASK) {
> +		/*
> +		 * hugetlb_vm_op_split is called right before we attempt to
> +		 * split the VMA. We will need to unshare PMDs in the old and
> +		 * new VMAs, so let's unshare before we split.
> +		 */
> +		unsigned long floor = addr & PUD_MASK;
> +		unsigned long ceil = floor + PUD_SIZE;
> +
> +		if (floor < vma->vm_start || ceil >= vma->vm_end)

s/>=/>/?

> +			/* PMD sharing is already impossible. */
> +			return 0;

IMHO it's slightly cleaner to write this the reversed way and let it
fall through:

	if (floor >= vma->vm_start && ceil <= vma->vm_end)
		hugetlb_unshare_pmds(vma, floor, ceil);

Thanks,

> +		hugetlb_unshare_pmds(vma, floor, ceil);
> +	}
> +
>  	return 0;
>  }
>  
> @@ -7313,26 +7332,21 @@ void move_hugetlb_state(struct folio *old_folio, struct folio *new_folio, int re
>  	}
>  }
>  
> -/*
> - * This function will unconditionally remove all the shared pmd pgtable entries
> - * within the specific vma for a hugetlbfs memory range.
> - */
> -void hugetlb_unshare_all_pmds(struct vm_area_struct *vma)
> +static void hugetlb_unshare_pmds(struct vm_area_struct *vma,
> +				unsigned long start,
> +				unsigned long end)
>  {
>  	struct hstate *h = hstate_vma(vma);
>  	unsigned long sz = huge_page_size(h);
>  	struct mm_struct *mm = vma->vm_mm;
>  	struct mmu_notifier_range range;
> -	unsigned long address, start, end;
> +	unsigned long address;
>  	spinlock_t *ptl;
>  	pte_t *ptep;
>  
>  	if (!(vma->vm_flags & VM_MAYSHARE))
>  		return;
>  
> -	start = ALIGN(vma->vm_start, PUD_SIZE);
> -	end = ALIGN_DOWN(vma->vm_end, PUD_SIZE);
> -
>  	if (start >= end)
>  		return;
>  
> @@ -7364,6 +7378,16 @@ void hugetlb_unshare_all_pmds(struct vm_area_struct *vma)
>  	mmu_notifier_invalidate_range_end(&range);
>  }
>  
> +/*
> + * This function will unconditionally remove all the shared pmd pgtable entries
> + * within the specific vma for a hugetlbfs memory range.
> + */
> +void hugetlb_unshare_all_pmds(struct vm_area_struct *vma)
> +{
> +	hugetlb_unshare_pmds(vma, ALIGN(vma->vm_start, PUD_SIZE),
> +			ALIGN_DOWN(vma->vm_end, PUD_SIZE));
> +}
> +
>  #ifdef CONFIG_CMA
>  static bool cma_reserve_called __initdata;
>  
> -- 
> 2.39.0.314.g84b9a713c41-goog
> 

-- 
Peter Xu