From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7076AC433E0 for ; Wed, 6 Jan 2021 09:55:24 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id EC92F2310E for ; Wed, 6 Jan 2021 09:55:23 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org EC92F2310E Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=suse.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 799198D00FD; Wed, 6 Jan 2021 04:55:23 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 74A718D0090; Wed, 6 Jan 2021 04:55:23 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 660B58D00FD; Wed, 6 Jan 2021 04:55:23 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0254.hostedemail.com [216.40.44.254]) by kanga.kvack.org (Postfix) with ESMTP id 4E5A98D0090 for ; Wed, 6 Jan 2021 04:55:23 -0500 (EST) Received: from smtpin28.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 19FC9181AEF1D for ; Wed, 6 Jan 2021 09:55:23 +0000 (UTC) X-FDA: 77674892526.28.honey29_3617a18274e0 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin28.hostedemail.com (Postfix) with ESMTP id DC4F7C5B1 for ; Wed, 6 Jan 2021 09:55:22 +0000 (UTC) X-HE-Tag: honey29_3617a18274e0 X-Filterd-Recvd-Size: 6142 Received: from mx2.suse.de (mx2.suse.de [195.135.220.15]) by imf50.hostedemail.com (Postfix) with ESMTP for ; Wed, 6 Jan 2021 09:55:22 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1609926920; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=AyhuOFryeJUwIu0CJ6EAPjrR45s4g3oEm7P1mBVda9Y=; b=V4znQ33PtuoQSebKcH/T9BBg2BulSvISub/eOka4m9ye1jOQxtwpqbgeGAo17I8/m7ta0d HByYX8Nsb4kPXIWsNVFB1nVkDIlsaczb9CVXor7VEDSg6jy/5+Iq0B6PX+tNv8uV30yR+J aBl6pac+4QobMUMA8Dw2Q9wncTDE75E= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id CE951ACAF; Wed, 6 Jan 2021 09:55:20 +0000 (UTC) Date: Wed, 6 Jan 2021 10:55:20 +0100 From: Michal Hocko To: Dan Williams Cc: linux-mm@kvack.org, Andrew Morton , David Hildenbrand , linux-kernel@vger.kernel.org Subject: Re: [PATCH] mm: Teach pfn_to_online_page() about ZONE_DEVICE section collisions Message-ID: <20210106095520.GJ13207@dhcp22.suse.cz> References: <160990599013.2430134.11556277600719835946.stgit@dwillia2-desk3.amr.corp.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <160990599013.2430134.11556277600719835946.stgit@dwillia2-desk3.amr.corp.intel.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue 05-01-21 20:07:18, Dan Williams wrote: > While pfn_to_online_page() is able to determine pfn_valid() at > subsection granularity it is not able to reliably determine if a given > pfn is also online if the section is mixed with ZONE_DEVICE pfns. I would call out the problem more explicitly. E.g. something like " This means that pfn_to_online_page can lead to false positives and allow to return a struct page which is not fully initialized because it belongs to ZONE_DEVICE (PUT AN EXAMPLE OF SUCH AN UNITIALIZED PAGE HERE). That can lead to either crash on PagePoisoned assertion or a silently broken page state with debugging disabled. " I would also appreciate a more specific note about how the existing HW can trigger this. You have mentioned 64MB subsection examples in the other email. It would be great to mention it here as well. > Update move_pfn_range_to_zone() to flag (SECTION_TAINT_ZONE_DEVICE) a > section that mixes ZONE_DEVICE pfns with other online pfns. With > SECTION_TAINT_ZONE_DEVICE to delineate, pfn_to_online_page() can fall > back to a slow-path check for ZONE_DEVICE pfns in an online section. > > With this implementation of pfn_to_online_page() pfn-walkers mostly only > need to check section metadata to determine pfn validity. In the > rare case of mixed-zone sections the pfn-walker will skip offline > ZONE_DEVICE pfns as expected. The above paragraph is slightly confusing. You do not require pfn-walkers to check anything right? Section metadata is something that is and should be hidden from them. They are asking for an online page and get it or NULL. Nothing more nothing less. > Other notes: > > Because the collision case is rare, and for simplicity, the > SECTION_TAINT_ZONE_DEVICE flag is never cleared once set. I do not see a problem with that. > pfn_to_online_page() was already borderline too large to be a macro / > inline function, but the additional logic certainly pushed it over that > threshold, and so it is moved to an out-of-line helper. Worth a separate patch. The approach is sensible. Thanks! I was worried that we do not have sufficient space for a new flag but the comment explains we have 6 bits available. I haven't double checked that for the current state of the code. The comment is quite recent and I do not remember any substantial changes in this area. Still something that is rather fragile because an unexpected alignment would be a runtime failure which is good to stop random corruptions but not ideal. Is there any way to explicitly test for this? E.g. enforce a shared section by qemu and then trigger a pfn walk? > Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug") > Cc: Andrew Morton > Reported-by: Michal Hocko > Reported-by: David Hildenbrand > Signed-off-by: Dan Williams [...] > +static int zone_id(const struct zone *zone) > +{ > + struct pglist_data *pgdat = zone->zone_pgdat; > + > + return zone - pgdat->node_zones; > +} We already have zone_idx() > + > +static void section_taint_zone_device(struct zone *zone, unsigned long pfn) > +{ > + struct mem_section *ms = __nr_to_section(pfn_to_section_nr(pfn)); > + > + if (zone_id(zone) != ZONE_DEVICE) > + return; > + > + if (IS_ALIGNED(pfn, PAGES_PER_SECTION)) > + return; > + > + ms->section_mem_map |= SECTION_TAINT_ZONE_DEVICE; > +} > + > /* > * Associate the pfn range with the given zone, initializing the memmaps > * and resizing the pgdat/zone data to span the added pages. After this > @@ -707,6 +769,15 @@ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, > resize_pgdat_range(pgdat, start_pfn, nr_pages); > pgdat_resize_unlock(pgdat, &flags); > > + /* > + * Subsection population requires care in pfn_to_online_page(). > + * Set the taint to enable the slow path detection of > + * ZONE_DEVICE pages in an otherwise ZONE_{NORMAL,MOVABLE} > + * section. > + */ > + section_taint_zone_device(zone, start_pfn); > + section_taint_zone_device(zone, start_pfn + nr_pages); I think it would be better to add the checks here and only set the flag in the called function. SECTION_TAINT_ZONE_DEVICE should go to where we define helpers for ther flags. > + > /* > * TODO now we have a visible range of pages which are not associated > * with their zone properly. Not nice but set_pfnblock_flags_mask > -- Michal Hocko SUSE Labs