From: Dan Williams
Date: Mon, 9 Dec 2019 13:00:13 -0800
Subject: Re: [PATCH 5/6] mm, memory_hotplug: Provide argument for the pgprot_t in arch_add_memory()
To: Michal Hocko
Cc: Logan Gunthorpe, David Hildenbrand, Linux Kernel Mailing List, Linux ARM,
 linux-ia64@vger.kernel.org, linuxppc-dev, linux-s390, Linux-sh,
 platform-driver-x86@vger.kernel.org, Linux MM, Christoph Hellwig,
 Andrew Morton, Catalin Marinas, Will Deacon, Benjamin Herrenschmidt,
 Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
 Andy Lutomirski, Peter Zijlstra
In-Reply-To: <20191209204128.GC7658@dhcp22.suse.cz>
References: <20191209191346.5197-1-logang@deltatee.com>
 <20191209191346.5197-6-logang@deltatee.com>
 <20191209204128.GC7658@dhcp22.suse.cz>

On Mon, Dec 9, 2019 at 12:47 PM Michal Hocko wrote:
>
> On Mon 09-12-19 13:24:19, Logan Gunthorpe wrote:
> >
> >
> > On 2019-12-09 12:23 p.m., David Hildenbrand wrote:
> > > On 09.12.19 20:13, Logan Gunthorpe wrote:
> > >> devm_memremap_pages() is currently used by the PCI P2PDMA code to create
> > >> struct page mappings for IO memory. At present, these mappings are created
> > >> with PAGE_KERNEL, which implies setting the PAT bits to be WB. However, on
> > >> x86, an MTRR register will typically override this and force the cache
> > >> type to be UC-. If the firmware doesn't set this register, the mapping is
> > >> effectively WB and accessing it will typically result in a machine check
> > >> exception.
> > >>
> > >> Other arches are not currently likely to function correctly, seeing they
> > >> don't have any MTRR registers to fall back on.
> > >>
> > >> To solve this, add an argument to arch_add_memory() to explicitly
> > >> set the pgprot value to a specific value.
> > >>
> > >> Of the arches that support MEMORY_HOTPLUG: for x86_64, s390 and arm64 it is
> > >> a simple change to pass the pgprot_t down to their respective functions
> > >> which set up the page tables. For x86_32, set the page tables explicitly
> > >> using _set_memory_prot() (seeing they are already mapped). For sh, reject
> > >> anything but PAGE_KERNEL settings -- this should be fine, for now, seeing
> > >> sh doesn't support ZONE_DEVICE anyway.
> > >>
> > >> Cc: Dan Williams
> > >> Cc: David Hildenbrand
> > >> Cc: Michal Hocko
> > >> Signed-off-by: Logan Gunthorpe
> > >> ---
> > >>  arch/arm64/mm/mmu.c            | 4 ++--
> > >>  arch/ia64/mm/init.c            | 5 ++++-
> > >>  arch/powerpc/mm/mem.c          | 4 ++--
> > >>  arch/s390/mm/init.c            | 4 ++--
> > >>  arch/sh/mm/init.c              | 5 ++++-
> > >>  arch/x86/mm/init_32.c          | 7 ++++++-
> > >>  arch/x86/mm/init_64.c          | 4 ++--
> > >>  include/linux/memory_hotplug.h | 2 +-
> > >>  mm/memory_hotplug.c            | 2 +-
> > >>  mm/memremap.c                  | 2 +-
> > >>  10 files changed, 25 insertions(+), 14 deletions(-)
> > >>
> > >> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> > >> index 60c929f3683b..48b65272df15 100644
> > >> --- a/arch/arm64/mm/mmu.c
> > >> +++ b/arch/arm64/mm/mmu.c
> > >> @@ -1050,7 +1050,7 @@ int p4d_free_pud_page(p4d_t *p4d, unsigned long addr)
> > >>  }
> > >>
> > >>  #ifdef CONFIG_MEMORY_HOTPLUG
> > >> -int arch_add_memory(int nid, u64 start, u64 size,
> > >> +int arch_add_memory(int nid, u64 start, u64 size, pgprot_t prot,
> > >>  		    struct mhp_restrictions *restrictions)
> > >
> > > Can we fiddle that into "struct mhp_restrictions" instead?
> >
> > Yes, if that's what people want, it's pretty trivial to do. I chose not
> > to do it that way because it doesn't get passed down to add_pages() and
> > it's not really a "restriction". If I don't hear any objections, I will
> > do that for v2.
>
> I do agree that "restriction" is not the best fit. But I consider the prot
> argument to complicate the API for all users even though it is not really
> clear whether we are going to have many users really benefiting from it.
> Look at the vmalloc API and try to find how many users of __vmalloc do
> not use PAGE_KERNEL.

For this I can foresee at least one more user in the pipeline:
encrypted memory support for persistent memory mappings, which will store
the key-id in the ptes.

> So I can see two options. One of them is to add arch_add_memory_prot,
> which would allow giving an extra prot argument, or simply call an
> arch-independent API to change the protection after arch_add_memory.
> The latter sounds like much less code. The memory shouldn't be in use by
> anybody at that stage yet AFAIU. Maybe there even is an API like that.

I'm ok with passing it the same way as altmap or with a new
arch_add_memory_prot(); my only hangup with after-the-fact changes is the
wasted effort it inflicts on the init path for potentially large address
ranges.
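
For illustration only, a rough sketch (not from the patch under discussion;
the pgprot field and the example caller below are assumptions) of what
David's suggestion of folding the protection into "struct mhp_restrictions"
could look like, instead of a new positional argument:

	/* hypothetical: extend the existing hotplug restrictions struct */
	struct mhp_restrictions {
		unsigned long flags;		/* existing MHP_* flags */
		struct vmem_altmap *altmap;	/* existing altmap pointer */
		pgprot_t pgprot;		/* assumed: protection for the new mapping */
	};

	/* arch_add_memory() keeps its current signature */
	int arch_add_memory(int nid, u64 start, u64 size,
			    struct mhp_restrictions *restrictions);

	/*
	 * A caller such as devm_memremap_pages() could then request a
	 * non-default cache type for IO memory, e.g.:
	 */
	struct mhp_restrictions restrictions = {
		.pgprot = pgprot_noncached(PAGE_KERNEL),
	};
	err = arch_add_memory(nid, start, size, &restrictions);

Whether the pgprot then gets plumbed further down into add_pages() is the
open question Logan raises above.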