To: Oscar Salvador, Andrew Morton
Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
 Ingo Molnar, Borislav Petkov, x86@kernel.org, "H. Peter Anvin",
 Michal Hocko, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20210129064045.18471-1-osalvador@suse.de>
From: David Hildenbrand
Organization: Red Hat GmbH
Subject: Re: [PATCH v2] x86/vmemmap: Handle unpopulated sub-pmd ranges
Date: Fri, 29 Jan 2021 13:46:33 +0100
In-Reply-To: <20210129064045.18471-1-osalvador@suse.de>

On 29.01.21 07:40, Oscar Salvador wrote:
> When the size of a struct page is not a multiple of 2MB, sections do
> not span a PMD anymore and so when populating them some parts of the
> PMD will remain unused.
> Because of this, PMDs will be left behind when depopulating sections
> since remove_pmd_table() thinks that those unused parts are still in
> use.
> 
> Fix this by marking the unused parts with PAGE_INUSE, so memchr_inv() will
> do the right thing and will let us free the PMD when the last user of it
> is gone.
> 
> This patch is based on a similar patch by David Hildenbrand:
> 
> https://lore.kernel.org/linux-mm/20200722094558.9828-9-david@redhat.com/
> https://lore.kernel.org/linux-mm/20200722094558.9828-10-david@redhat.com/
> 
> Signed-off-by: Oscar Salvador
> ---
> 
> v1 -> v2:
>  - Rename PAGE_INUSE to PAGE_UNUSED as it better describes what we do
> 
> ---
>  arch/x86/mm/init_64.c | 91 +++++++++++++++++++++++++++++++++++++------
>  1 file changed, 79 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index b5a3fa4033d3..dbb76160ed52 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -871,7 +871,72 @@ int arch_add_memory(int nid, u64 start, u64 size,
>  	return add_pages(nid, start_pfn, nr_pages, params);
>  }
>  
> -#define PAGE_INUSE 0xFD
> +#define PAGE_UNUSED 0xFD
> +
> +/*
> + * The unused vmemmap range, which was not yet memset(PAGE_UNUSED) ranges
> + * from unused_pmd_start to next PMD_SIZE boundary.
> + */
> +static unsigned long unused_pmd_start __meminitdata;
> +
> +static void __meminit vmemmap_flush_unused_pmd(void)
> +{
> +	if (!unused_pmd_start)
> +		return;
> +	/*
> +	 * Clears (unused_pmd_start, PMD_END]
> +	 */
> +	memset((void *)unused_pmd_start, PAGE_UNUSED,
> +	       ALIGN(unused_pmd_start, PMD_SIZE) - unused_pmd_start);
> +	unused_pmd_start = 0;
> +}
> +
> +/* Returns true if the PMD is completely unused and thus it can be freed */
> +static bool __meminit vmemmap_unuse_sub_pmd(unsigned long addr, unsigned long end)
> +{
> +	unsigned long start = ALIGN_DOWN(addr, PMD_SIZE);
> +
> +	vmemmap_flush_unused_pmd();
> +	memset((void *)addr, PAGE_UNUSED, end - addr);
> +
> +	return !memchr_inv((void *)start, PAGE_UNUSED, PMD_SIZE);
> +}
> +
> +static void __meminit vmemmap_use_sub_pmd(unsigned long start, unsigned long end)
> +{
> +	/*
> +	 * We only optimize if the new used range directly follows the
> +	 * previously unused range (esp., when populating consecutive sections).
> +	 */
> +	if (unused_pmd_start == start) {
> +		if (likely(IS_ALIGNED(end, PMD_SIZE)))
> +			unused_pmd_start = 0;
> +		else
> +			unused_pmd_start = end;
> +		return;
> +	}
> +
> +	vmemmap_flush_unused_pmd();
> +}
> +
> +static void __meminit vmemmap_use_new_sub_pmd(unsigned long start, unsigned long end)
> +{
> +	vmemmap_flush_unused_pmd();
> +
> +	/*
> +	 * Mark the unused parts of the new memmap range
> +	 */
> +	if (!IS_ALIGNED(start, PMD_SIZE))
> +		memset((void *)start, PAGE_UNUSED,
> +		       start - ALIGN_DOWN(start, PMD_SIZE));
> +	/*
> +	 * We want to avoid memset(PAGE_UNUSED) when populating the vmemmap of
> +	 * consecutive sections. Remember for the last added PMD the last
> +	 * unused range in the populated PMD.
> +	 */
> +	if (!IS_ALIGNED(end, PMD_SIZE))
> +		unused_pmd_start = end;
> +}
>  
>  static void __meminit free_pagetable(struct page *page, int order)
>  {
> @@ -1008,10 +1073,10 @@ remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
>  			 * with 0xFD, and remove the page when it is wholly
>  			 * filled with 0xFD.
>  			 */
> -			memset((void *)addr, PAGE_INUSE, next - addr);
> +			memset((void *)addr, PAGE_UNUSED, next - addr);
>  
>  			page_addr = page_address(pte_page(*pte));
> -			if (!memchr_inv(page_addr, PAGE_INUSE, PAGE_SIZE)) {
> +			if (!memchr_inv(page_addr, PAGE_UNUSED, PAGE_SIZE)) {
>  				free_pagetable(pte_page(*pte), 0);
>  

I remember already raising this, in the context of other cleanups, but
let's start anew:

How could we ever even end up with "!PAGE_ALIGNED(addr) &&
PAGE_ALIGNED(next)"? As the comment correctly indicates, it would only
make sense for "freeing vmemmap pages".

This would mean we are removing parts of a vmemmap page (4k), calling
vmemmap_free()->remove_pagetable() on sub-page granularity.

Even sub-sections (2MB, i.e. 512 base pages) have a memmap size that is
a whole number of base pages:
- 56 bytes: 7 pages
- 64 bytes: 8 pages
- 72 bytes: 9 pages

sizeof(struct page) is always a multiple of 8 bytes, so that will hold.

E.g., in __populate_section_memmap(), we already enforce proper
subsection alignment.

IMHO, we should rip out that code here and enforce page alignment in
vmemmap_populate()/vmemmap_free().

Am I missing something?

-- 
Thanks,

David / dhildenb
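
For reference, the sub-section arithmetic above can be double-checked with a
small userspace sketch. The 4 KiB base-page size and the 512-pages-per-2MB
sub-section constant are stated here as assumptions matching the values
discussed in this thread, not pulled from kernel headers:

#include <stdio.h>

#define BASE_PAGE_SIZE		4096UL	/* assumed 4 KiB base pages */
#define PAGES_PER_SUBSECTION	512UL	/* assumed: 2 MiB / 4 KiB */

int main(void)
{
	/* plausible sizeof(struct page) values, all multiples of 8 bytes */
	const unsigned long page_struct_sizes[] = { 56, 64, 72 };

	for (int i = 0; i < 3; i++) {
		unsigned long memmap_bytes =
			PAGES_PER_SUBSECTION * page_struct_sizes[i];

		/* 512 * 8 == 4096, so any multiple of 8 fills whole pages */
		printf("struct page: %2lu bytes -> memmap: %lu bytes = %lu pages, remainder %lu\n",
		       page_struct_sizes[i], memmap_bytes,
		       memmap_bytes / BASE_PAGE_SIZE,
		       memmap_bytes % BASE_PAGE_SIZE);
	}
	return 0;
}

Running it prints 7, 8 and 9 pages with remainder 0 for 56, 64 and 72 bytes
respectively, matching the list in the reply above.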