From: Nicholas Piggin <npiggin@gmail.com>
Date: Fri, 04 Dec 2020 18:12:58 +1000
Subject: Re: [PATCH v8 11/12] mm/vmalloc: Hugepage vmalloc mappings
To: akpm@linux-foundation.org, linux-mm@kvack.org, "Edgecombe, Rick P"
Cc: christophe.leroy@csgroup.eu, hch@infradead.org, Jonathan.Cameron@Huawei.com, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, lizefan@huawei.com
References: <20201128152559.999540-1-npiggin@gmail.com> <20201128152559.999540-12-npiggin@gmail.com>
Message-Id: <1607068679.lfd133za4h.astroid@bobo.none>

Excerpts from Edgecombe, Rick P's message of December 1, 2020 6:21 am:
> On Sun, 2020-11-29 at 01:25 +1000, Nicholas Piggin wrote:
>> Support huge page vmalloc mappings. Config option HAVE_ARCH_HUGE_VMALLOC
>> enables support on architectures that define HAVE_ARCH_HUGE_VMAP and
>> support PMD-sized vmap mappings.
>>
>> vmalloc will attempt to allocate PMD-sized pages if allocating PMD size
>> or larger, and fall back to small pages if that was unsuccessful.
>>
>> Allocations that do not use PAGE_KERNEL prot are not permitted to use
>> huge pages, because not all callers expect this (e.g., module
>> allocations vs strict module rwx).
>
> Several architectures (x86, arm64, others?) allocate modules initially
> with PAGE_KERNEL, so I think this test will not exclude module
> allocations in those cases.

Ah, thanks. I guess archs must additionally ensure that their PAGE_KERNEL
allocations are suitable for huge page mappings before enabling the
option.

If there is interest from those archs in supporting this, I have an early
(un-posted) patch that adds an explicit VM_HUGE flag that could override
the pessimistic arch default. It's not much trouble to add this to the
large system hash allocations. It's very out of date now, but I can at
least give what I have to anyone doing an arch support that wants it.
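For reference, the gating described in the changelog reduces to roughly
the following shape (a simplified sketch of the idea, not the code as
posted; the two vmalloc_* helpers are invented names standing in for the
real allocation paths):

	/*
	 * Sketch: only PAGE_KERNEL allocations of at least PMD_SIZE are
	 * eligible for a huge mapping; everything else, and any failed
	 * huge attempt, takes the existing small-page path.
	 */
	static void *vmalloc_sketch(unsigned long size, pgprot_t prot)
	{
		if (IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) &&
		    size >= PMD_SIZE &&
		    pgprot_val(prot) == pgprot_val(PAGE_KERNEL)) {
			void *p = vmalloc_try_huge(size);	/* invented helper */
			if (p)
				return p;
			/* huge attempt failed: fall back to small pages */
		}
		return vmalloc_small_pages(size, prot);	/* invented helper */
	}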
> [snip]
>
>> @@ -2400,6 +2453,7 @@ static inline void set_area_direct_map(const struct vm_struct *area,
>>  {
>>  	int i;
>>
>> +	/* HUGE_VMALLOC passes small pages to set_direct_map */
>>  	for (i = 0; i < area->nr_pages; i++)
>>  		if (page_address(area->pages[i]))
>>  			set_direct_map(area->pages[i]);
>> @@ -2433,11 +2487,12 @@ static void vm_remove_mappings(struct vm_struct *area, int deallocate_pages)
>>  	 * map. Find the start and end range of the direct mappings to make sure
>>  	 * the vm_unmap_aliases() flush includes the direct map.
>>  	 */
>> -	for (i = 0; i < area->nr_pages; i++) {
>> +	for (i = 0; i < area->nr_pages; i += 1U << area->page_order) {
>>  		unsigned long addr = (unsigned long)page_address(area->pages[i]);
>>  		if (addr) {
>> +			unsigned long page_size = PAGE_SIZE << area->page_order;
>>  			start = min(addr, start);
>> -			end = max(addr + PAGE_SIZE, end);
>> +			end = max(addr + page_size, end);
>>  			flush_dmap = 1;
>>  		}
>>  	}
>
> The logic around this is a bit tangled. The reset of the direct map has
> to succeed, but if the set_direct_map_() functions require a split they
> could fail. For x86, set_memory_ro() calls on a vmalloc alias will
> mirror the page size and permission onto the direct map, so the direct
> map will be broken to 4k pages if it's a RO vmalloc allocation.
>
> But after this, module vmalloc()s could have large pages, which would
> result in large RO pages on the direct map. Then it could possibly fail
> when trying to reset a 4k page out of a large RO direct-map mapping.
>
> I think either module allocations need to be actually excluded from
> having large pages (it seems like you might have seen other issues as
> well?), or another option could be to use the changes here:
> https://lore.kernel.org/lkml/20201125092208.12544-4-rppt@kernel.org/
> to reset the direct map for a large page range at a time for large
> vmalloc pages.

Right, x86 would have to do something about that before enabling. A
VM_HUGE flag might be quick and easy, but maybe other options are not
too difficult either.

Thanks,
Nick
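To make the hazard described above concrete, the sequence is roughly as
follows (an illustrative sketch, assuming the x86 behaviour Rick
outlines; vmalloc_try_huge() and MODULE_TEXT_SIZE are invented, while
set_memory_ro(), set_direct_map_invalid_noflush() and vmalloc_to_page()
are the real interfaces under discussion; error handling omitted):

	#define MODULE_TEXT_SIZE PMD_SIZE	/* invented: one huge page of text */

	static void sketch_module_lifecycle(void)
	{
		/* Module text lands in a PMD-sized vmalloc mapping. */
		void *text = vmalloc_try_huge(MODULE_TEXT_SIZE);

		/*
		 * Making the text RO mirrors the protection onto the
		 * direct-map alias at the same large granularity, leaving
		 * a huge RO direct-map entry.
		 */
		set_memory_ro((unsigned long)text, MODULE_TEXT_SIZE >> PAGE_SHIFT);

		/*
		 * Teardown resets the direct map one small page at a time;
		 * carving a 4k page out of the huge RO entry requires a
		 * split, which is the step that can fail.
		 */
		set_direct_map_invalid_noflush(vmalloc_to_page(text));
	}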