From mboxrd@z Thu Jan 1 00:00:00 1970
From: Vlastimil Babka
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Michal Hocko, Pavel Tatashin,
 David Hildenbrand, Oscar Salvador, Joonsoo Kim, Vlastimil Babka,
 Michal Hocko
Subject: [PATCH v2 4/7] mm, page_alloc: simplify pageset_update()
Date: Thu, 8 Oct 2020 13:41:58 +0200
Message-Id: <20201008114201.18824-5-vbabka@suse.cz>
In-Reply-To: <20201008114201.18824-1-vbabka@suse.cz>
References: <20201008114201.18824-1-vbabka@suse.cz>

pageset_update() attempts to update pcplist's high and batch values in a
way that readers don't observe batch > high. It uses smp_wmb() to order
the updates so as to achieve this. However, without properly paired read
barriers in the readers this guarantee doesn't hold, and there are no
such barriers in e.g. free_unref_page_commit().
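For the ordering to help, each smp_wmb() on the update side would need a
paired smp_rmb() between the loads on the read side. A minimal sketch of
the pairing the old scheme implicitly assumed; the reader shown below is
hypothetical, no current reader issues such a barrier:

	/* update side, as in the old pageset_update() */
	pcp->batch = 1;		/* fail safe value */
	smp_wmb();
	pcp->high = high;
	smp_wmb();
	pcp->batch = batch;

	/* a reader could rely on the ordering only like this: */
	batch = pcp->batch;
	smp_rmb();		/* pairs with the writer's smp_wmb() */
	high = pcp->high;

Without the smp_rmb(), the reader's loads of pcp->batch and pcp->high
may be reordered by the compiler or the CPU, so the write-side ordering
is simply not observable.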
Commit 88e8ac11d2ea ("mm, page_alloc: fix core hung in
free_pcppages_bulk()") already showed this is problematic, and solved it
by ultimately trusting only pcp->count of the current CPU, with
interrupts disabled.

The update dance with unpaired write barriers thus makes no sense.
Replace them with plain WRITE_ONCE to prevent store tearing, and
document that the values can change asynchronously and should not be
trusted for correctness.

All current readers appear to be OK after 88e8ac11d2ea. Convert them to
READ_ONCE to prevent unnecessary read tearing, but mainly to alert
anybody making future changes to the code that special care is needed.

Signed-off-by: Vlastimil Babka
Acked-by: David Hildenbrand
Acked-by: Michal Hocko
---
(A short sketch of the store/load tearing that WRITE_ONCE/READ_ONCE
guard against follows after the patch.)

 mm/page_alloc.c | 40 ++++++++++++++++++----------------------
 1 file changed, 18 insertions(+), 22 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f827b42a2475..f33c36312eb5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1344,7 +1344,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 {
 	int migratetype = 0;
 	int batch_free = 0;
-	int prefetch_nr = 0;
+	int prefetch_nr = READ_ONCE(pcp->batch);
 	bool isolated_pageblocks;
 	struct page *page, *tmp;
 	LIST_HEAD(head);
@@ -1395,8 +1395,10 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 		 * avoid excessive prefetching due to large count, only
 		 * prefetch buddy for the first pcp->batch nr of pages.
 		 */
-		if (prefetch_nr++ < pcp->batch)
+		if (prefetch_nr) {
 			prefetch_buddy(page);
+			prefetch_nr--;
+		}
 	} while (--count && --batch_free && !list_empty(list));
 }

@@ -3190,10 +3192,8 @@ static void free_unref_page_commit(struct page *page, unsigned long pfn)
 	pcp = &this_cpu_ptr(zone->pageset)->pcp;
 	list_add(&page->lru, &pcp->lists[migratetype]);
 	pcp->count++;
-	if (pcp->count >= pcp->high) {
-		unsigned long batch = READ_ONCE(pcp->batch);
-		free_pcppages_bulk(zone, batch, pcp);
-	}
+	if (pcp->count >= READ_ONCE(pcp->high))
+		free_pcppages_bulk(zone, READ_ONCE(pcp->batch), pcp);
 }

 /*
@@ -3378,7 +3378,7 @@ static struct page *__rmqueue_pcplist(struct zone *zone, int migratetype,
 	do {
 		if (list_empty(list)) {
 			pcp->count += rmqueue_bulk(zone, 0,
-					pcp->batch, list,
+					READ_ONCE(pcp->batch), list,
 					migratetype, alloc_flags);
 			if (unlikely(list_empty(list)))
 				return NULL;
@@ -6250,13 +6250,16 @@ static int zone_batchsize(struct zone *zone)
 }

 /*
- * pcp->high and pcp->batch values are related and dependent on one another:
- * ->batch must never be higher then ->high.
- * The following function updates them in a safe manner without read side
- * locking.
+ * pcp->high and pcp->batch values are related and generally batch is lower
+ * than high. They are also related to pcp->count such that count is lower
+ * than high, and as soon as it reaches high, the pcplist is flushed.
  *
- * Any new users of pcp->batch and pcp->high should ensure they can cope with
- * those fields changing asynchronously (acording to the above rule).
+ * However, guaranteeing these relations at all times would require e.g. write
+ * barriers here but also careful usage of read barriers at the read side, and
+ * thus be prone to error and bad for performance. Thus the update only prevents
+ * store tearing. Any new users of pcp->batch and pcp->high should ensure they
+ * can cope with those fields changing asynchronously, and fully trust only the
+ * pcp->count field on the local CPU with interrupts disabled.
  *
  * mutex_is_locked(&pcp_batch_high_lock) required when calling this function
  * outside of boot time (or some other assurance that no concurrent updaters
@@ -6265,15 +6268,8 @@ static int zone_batchsize(struct zone *zone)
 static void pageset_update(struct per_cpu_pages *pcp, unsigned long high,
 		unsigned long batch)
 {
-	/* start with a fail safe value for batch */
-	pcp->batch = 1;
-	smp_wmb();
-
-	/* Update high, then batch, in order */
-	pcp->high = high;
-	smp_wmb();
-
-	pcp->batch = batch;
+	WRITE_ONCE(pcp->batch, batch);
+	WRITE_ONCE(pcp->high, high);
 }

 static void pageset_init(struct per_cpu_pageset *p)
-- 
2.28.0
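As referenced above the diffstat: an illustrative sketch of the
store/load tearing that WRITE_ONCE()/READ_ONCE() prevent. This is not
part of the patch; the variable b and the bare accesses are
hypothetical:

	unsigned long b;

	/* plain accesses: the compiler may split them */
	pcp->batch = batch;	/* may be emitted as several partial stores */
	b = pcp->batch;		/* may be emitted as several partial loads */

	/* single full-width accesses, never torn */
	WRITE_ONCE(pcp->batch, batch);
	b = READ_ONCE(pcp->batch);

A torn load can observe a mix of old and new bytes, i.e. a value that
was never actually stored. READ_ONCE()/WRITE_ONCE() rule that out, but
the fields still change asynchronously between accesses, which is why
only pcp->count on the local CPU with interrupts disabled can be fully
trusted.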