From: David Hildenbrand
Organization: Red Hat
Date: Mon, 5 Sep 2022 11:46:10 +0200
To: Christophe Leroy, Mike Kravetz
Cc: "linux-mm@kvack.org", "linux-kernel@vger.kernel.org",
 "linux-ia64@vger.kernel.org", Baolin Wang, "Aneesh Kumar K . V",
 Naoya Horiguchi, Michael Ellerman, Muchun Song, Andrew Morton,
 "linuxppc-dev@lists.ozlabs.org"
Subject: Re: [PATCH] hugetlb: simplify hugetlb handling in follow_page_mask
References: <20220829234053.159158-1-mike.kravetz@oracle.com>
 <608934d4-466d-975e-6458-34a91ccb4669@redhat.com>
 <739dc825-ece3-a59f-adc5-65861676e0ae@redhat.com>
 <323fdb0f-c5a5-e0e5-1ff4-ab971bc295cc@redhat.com>
 <5f6a7c6b-5073-f050-8dae-6ee279a8bb0b@redhat.com>

On 05.09.22 11:33, Christophe Leroy wrote:
>
>
> On 05/09/2022 at 10:37, David Hildenbrand wrote:
>> On 03.09.22 09:07, Christophe Leroy wrote:
>>> +Resending with valid powerpc list address
>>>
>>> On 02/09/2022 at 20:52, David Hildenbrand wrote:
>>>>>>> Adding Christophe on Cc:
>>>>>>>
>>>>>>> Christophe do you know if is_hugepd is true for all hugetlb
>>>>>>> entries, not just hugepd?
>>>
>>> is_hugepd() is true if and only if the directory entry points to a huge
>>> page directory and not to the normal lower level directory.
>>>
>>> As far as I understand if the directory entry is not pointing to any
>>> lower directory but is a huge page entry, pXd_leaf() is true.
>>>
>>>>>>>
>>>>>>> On systems without hugepd entries, I guess ptdump skips all
>>>>>>> hugetlb entries.
>>>>>>> Sigh!
>>>
>>> As far as I can see, ptdump_pXd_entry() handles the pXd_leaf() case.
>>>
>>>>>>
>>>>>> IIUC, the idea of ptdump_walk_pgd() is to dump page tables even
>>>>>> outside VMAs (for debugging purposes?).
>>>>>>
>>>>>> I cannot convince myself that that's a good idea when only holding the
>>>>>> mmap lock in read mode, because we can just see page tables getting
>>>>>> freed concurrently e.g., during concurrent munmap() ... while holding
>>>>>> the mmap lock in read we may only walk inside VMA boundaries.
>>>>>>
>>>>>> That then raises the question of whether we're only calling this on
>>>>>> special MMs (e.g., init_mm) whereby we cannot really see concurrent
>>>>>> munmap() and where we shouldn't have hugetlb mappings or hugepd entries.
>>>
>>> At least on powerpc, PTDUMP handles only init_mm.
>>>
>>> Huge pages are used at least on powerpc 8xx for linear memory mapping, see
>>>
>>> commit 34536d780683 ("powerpc/8xx: Add a function to early map kernel
>>> via huge pages")
>>> commit cf209951fa7f ("powerpc/8xx: Map linear memory with huge pages")
>>>
>>> hugepds may also be used in the future to use huge pages for vmap and
>>> vmalloc, see commit a6a8f7c4aa7e ("powerpc/8xx: add support for huge
>>> pages on VMAP and VMALLOC")
>>>
>>> As far as I know, ppc64 also uses huge pages for VMAP and VMALLOC, see
>>>
>>> commit d909f9109c30 ("powerpc/64s/radix: Enable HAVE_ARCH_HUGE_VMAP")
>>> commit 8abddd968a30 ("powerpc/64s/radix: Enable huge vmalloc mappings")
>>
>> There is a difference between an ordinary huge mapping (e.g., as used
>> for THP) and a hugetlb mapping.
>>
>> Our current understanding is that hugepd only applies to hugetlb.
>> Wouldn't vmap/vmalloc use ordinary huge pmd entries instead of hugepd?
>>
>
> 'hugepd' stands for huge page directory. It is independent of whether a
> huge page is used for hugetlb or for anything else, it represents the
> way pages are described in the page tables.

This patch here makes the assumption that hugepd only applies to
hugetlb, because it removes any such handling from the !hugetlb path in
GUP. Is that incorrect or are there valid cases where that could happen?
(init_mm is special in that regard, I don't think it interacts with GUP
at all).

>
> I don't know what you mean by _ordinary_ huge pmd entry.
>

Essentially, what we use for THP. Let me try to understand how hugepd
interacts with the rest of the system.

Do systems that support hugepd currently implement THP? Reading about
32-bit systems below, I assume not?

> Let's take the example of powerpc 8xx which is the one I know best. This
> is a powerpc32, so it has two levels : PGD and PTE. PGD has 1024 entries
> and each entry covers a 4Mbytes area. Normal PTE has 1024 entries and
> each entry is a 4k page. When you use 8Mbytes pages, you don't use PTEs
> as it would be a waste of memory. You use a huge page directory that has
> a single entry, and you have two PGD entries pointing to the huge page
> directory.

Thanks, I assume there are no 8MB THP, correct?

The 8MB example with 4MB PGD entries makes it sound a bit like the
cont-PTE/cont-PMD handling on aarch64: they don't use a hugepd but would
simply let two consecutive PGD entries point at the relevant (sub) parts
of the hugetlb page. No hugepd involved.

>
> Some time ago, hugepd was also used for 512kbytes pages and 16kbytes
> pages:
> - there were huge page directories with 8x 512kbytes pages,
> - there were huge page directories with 256x 16kbytes pages,
>
> And the PGD/PMD entry points to a huge page directory (HUGEPD) instead
> of pointing to a page table directory (PTE).

Thanks for the example.

>
> Since commit b250c8c08c79 ("powerpc/8xx: Manage 512k huge pages as
> standard pages."), the 8xx no longer uses hugepd for 512k huge
> pages, but other platforms like powerpc book3e extensively use huge page
> directories.
>
> I hope this clarifies the subject, otherwise I'm happy to provide
> further details.

Thanks, it would be valuable to know if the assumption in this patch is
correct: hugepd will only be found in hugetlb areas in ordinary MMs (not
init_mm).

-- 
Thanks,

David / dhildenb
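
For reference, the 8xx figures Christophe quotes in the thread can be
sanity-checked with a few lines of standalone C. This is only an
arithmetic sketch, not kernel code: the macro names and the two helpers
(hugepd_entries(), pgd_entries_per_hugepd()) are made up for illustration
and do not exist in the kernel. It assumes exactly the layout described
in the thread: a 1024-entry PGD where each entry covers 4 MB, and a
1024-entry PTE table of 4 KB pages.

```c
#include <assert.h>

/* Sizes mentioned in the thread for powerpc 8xx (32-bit, two levels). */
#define SZ_4K   (4UL * 1024)
#define SZ_16K  (16UL * 1024)
#define SZ_512K (512UL * 1024)
#define SZ_4M   (4UL * 1024 * 1024)
#define SZ_8M   (8UL * 1024 * 1024)

#define PGD_ENTRIES 1024UL /* entries in the top-level directory */
#define PGD_SPAN    SZ_4M  /* address range covered by one PGD entry */
#define PTE_ENTRIES 1024UL /* entries in a normal PTE table */

/* A normal PTE table of 4k pages fills exactly one PGD entry's span. */
_Static_assert(PTE_ENTRIES * SZ_4K == PGD_SPAN, "PTE table must span 4 MB");

/* The whole PGD covers the 32-bit address space. */
_Static_assert((unsigned long long)PGD_ENTRIES * PGD_SPAN == (1ULL << 32),
               "PGD must span 4 GB");

/*
 * Hypothetical helper: number of entries in a huge page directory for
 * a given huge page size. Pages at least as large as the PGD span need
 * only a single entry.
 */
static unsigned long hugepd_entries(unsigned long page_size)
{
	return page_size >= PGD_SPAN ? 1 : PGD_SPAN / page_size;
}

/*
 * Hypothetical helper: number of PGD entries that point at the same
 * huge page directory for a given huge page size.
 */
static unsigned long pgd_entries_per_hugepd(unsigned long page_size)
{
	return page_size <= PGD_SPAN ? 1 : page_size / PGD_SPAN;
}
```

With these assumptions, an 8 MB page gives a single-entry hugepd shared
by two PGD entries, while the historical 512 KB and 16 KB cases give 8
and 256 hugepd entries under one PGD entry, matching the numbers above.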