Subject: Re: [PATCH 5/9] mm, page_alloc: make per_cpu_pageset accessible only after init
To: Michal Hocko
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Pavel Tatashin, David Hildenbrand, Oscar Salvador, Joonsoo Kim
References: <20200922143712.12048-1-vbabka@suse.cz> <20200922143712.12048-6-vbabka@suse.cz> <20201005132445.GA4555@dhcp22.suse.cz>
In-Reply-To: <20201005132445.GA4555@dhcp22.suse.cz>
From: Vlastimil Babka
Date: Wed, 7 Oct 2020 00:28:03 +0200

On 10/5/20 3:24 PM, Michal Hocko wrote:
> On Tue 22-09-20 16:37:08, Vlastimil Babka wrote:
>> setup_zone_pageset() replaces the boot_pageset by allocating and initializing a
>> proper percpu one. Currently it assigns zone->pageset with the newly allocated
>> one before initializing it. That's currently not an issue, because the zone
>> should not be in any zonelist, thus not visible to allocators at this point.
>>
>> Memory ordering between the pcplist contents and its visibility is also not
>> guaranteed here, but that also shouldn't be an issue because online_pages()
>> does a spin_unlock(pgdat->node_size_lock) before building the zonelists.
>>
>> However it's best that we don't silently rely on operations that can be changed
>> in the future. Make sure only properly initialized pcplists are visible, using
>> smp_store_release(). The read side has a data dependency via the zone->pageset
>> pointer instead of an explicit read barrier.
>
> Heh, this looks like inventing a similar trap the previous patch was
> removing. But more seriously, considering that we need locking for the
> whole setup, wouldn't it be better to simply document the locking
> requirements rather than adding scary-looking barriers that our future selves
> or somebody else will need to scratch their heads about? I am pretty sure we
> don't do anything like that when initializing a NUMA node or other data
> structures that might be allocated during memory hotadd.

Yeah, it will be best to drop this for now. I just looked closely at the
build_zonelists() implementation and got scared by how it modifies, in place,
a list that others might be reading at the same time. zoneref_set_zone() sets
zoneref->zone and zoneref->zone_idx without any synchronization. What if
somebody, e.g. for_next_zone_zonelist_nodemask(), makes a decision based on
zone_idx and then picks up the wrong zone pointer? Etc.

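
To make the zoneref concern above concrete, here is a minimal userspace sketch of one possible interleaving. It is not kernel code: `zone_sketch`, `zoneref_sketch`, and the index values 1 and 3 are hypothetical stand-ins, and the "reader" is simulated by checking the pair between the two stores rather than by running a second thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for struct zone and struct zoneref. */
struct zone_sketch {
	int idx;			/* plays the role of zone_idx(zone) */
};

struct zoneref_sketch {
	struct zone_sketch *zone;	/* pointer to the zone */
	int zone_idx;			/* cached index of that zone */
};

/* A reader's view: does the cached index match the zone it points to? */
static bool zoneref_consistent(const struct zoneref_sketch *ref)
{
	return ref->zone->idx == ref->zone_idx;
}

/* Returns 0 if the mismatch window is observed as described. */
static int torn_update_demo(void)
{
	struct zone_sketch normal = { .idx = 1 };	/* arbitrary values */
	struct zone_sketch movable = { .idx = 3 };
	struct zoneref_sketch ref = { .zone = &normal, .zone_idx = 1 };

	assert(zoneref_consistent(&ref));

	/* First store of the two-step update: the pointer changes. */
	ref.zone = &movable;

	/* A reader running here sees the new pointer with the old index. */
	bool mid_update = zoneref_consistent(&ref);

	/* Second store: the cached index catches up. */
	ref.zone_idx = movable.idx;
	bool after_update = zoneref_consistent(&ref);

	return (!mid_update && after_update) ? 0 : 1;
}
```

Calling `torn_update_demo()` from a test program returns 0: the pair is inconsistent between the two stores and consistent again afterwards. With real concurrency, that window is where a reader could act on a stale zone_idx.
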
>> Signed-off-by: Vlastimil Babka
>> ---
>>  mm/page_alloc.c | 6 ++++--
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 99b74c1c2b0a..de3b48bda45c 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -6246,15 +6246,17 @@ static void zone_set_pageset_high_and_batch(struct zone *zone)
>>  
>>  void __meminit setup_zone_pageset(struct zone *zone)
>>  {
>> +	struct per_cpu_pageset __percpu * new_pageset;
>>  	struct per_cpu_pageset *p;
>>  	int cpu;
>>  
>> -	zone->pageset = alloc_percpu(struct per_cpu_pageset);
>> +	new_pageset = alloc_percpu(struct per_cpu_pageset);
>>  	for_each_possible_cpu(cpu) {
>> -		p = per_cpu_ptr(zone->pageset, cpu);
>> +		p = per_cpu_ptr(new_pageset, cpu);
>>  		pageset_init(p);
>>  	}
>>  
>> +	smp_store_release(&zone->pageset, new_pageset);
>>  	zone_set_pageset_high_and_batch(zone);
>>  }
>>  
>> -- 
>> 2.28.0
>
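
For readers less familiar with the pattern in the quoted patch, the initialize-then-publish ordering can be sketched in userspace with C11 atomics. This is only an analogue under stated assumptions, not the kernel implementation: `zone_demo`, `pcp_demo`, `NR_CPUS_DEMO`, and the high/batch values are hypothetical, a plain array stands in for the percpu allocation, and the reader uses an acquire load where the patch relies on a data dependency through the pointer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_CPUS_DEMO 4		/* hypothetical CPU count for the sketch */

struct pcp_demo {		/* stand-in for struct per_cpu_pageset */
	int high;
	int batch;
};

struct zone_demo {		/* stand-in for struct zone */
	_Atomic(struct pcp_demo *) pageset;
};

/* Writer: initialize every slot first, publish the pointer last. */
static int setup_pageset_demo(struct zone_demo *zone)
{
	struct pcp_demo *new_pageset = calloc(NR_CPUS_DEMO, sizeof(*new_pageset));

	if (!new_pageset)
		return -1;
	for (int cpu = 0; cpu < NR_CPUS_DEMO; cpu++) {
		new_pageset[cpu].high = 6;	/* arbitrary demo values */
		new_pageset[cpu].batch = 1;
	}
	/* Release store: the writes above are ordered before publication. */
	atomic_store_explicit(&zone->pageset, new_pageset, memory_order_release);
	return 0;
}

/* Reader: returns the batch value, or -1 if nothing is published yet. */
static int read_batch_demo(struct zone_demo *zone, int cpu)
{
	struct pcp_demo *p = atomic_load_explicit(&zone->pageset,
						  memory_order_acquire);

	return p ? p[cpu].batch : -1;
}

/* Returns 0 when the reader only ever sees "unpublished" or "initialized". */
static int publish_demo(void)
{
	struct zone_demo zone;
	int batch;

	atomic_init(&zone.pageset, NULL);
	if (read_batch_demo(&zone, 0) != -1)
		return 1;
	if (setup_pageset_demo(&zone) != 0)
		return 2;
	batch = read_batch_demo(&zone, NR_CPUS_DEMO - 1);
	free(atomic_load_explicit(&zone.pageset, memory_order_relaxed));
	return batch == 1 ? 0 : 3;
}
```

The sketch runs single-threaded, so it only shows the shape of the pattern. The ordering guarantee itself matters when `read_batch_demo()` runs concurrently with `setup_pageset_demo()` on another thread.
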