Subject: Re: [PATCH 0/2] mm/page_alloc: Remote per-cpu lists drain support
From: Nicolas Saenz Julienne
To: Mel Gorman
Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    frederic@kernel.org, tglx@linutronix.de, mtosatti@redhat.com,
    linux-rt-users@vger.kernel.org, vbabka@suse.cz, cl@linux.com,
    paulmck@kernel.org, willy@infradead.org
Date: Mon, 28 Mar 2022 15:51:43 +0200
In-Reply-To: <20220325104800.GI4363@suse.de>
References: <20220208100750.1189808-1-nsaenzju@redhat.com>
 <20220303114550.GE4363@suse.de>
 <3c24840e8378c69224974f321ec5c06a36a33dd3.camel@redhat.com>
 <20220325104800.GI4363@suse.de>

Hi Mel,

On Fri, 2022-03-25 at 10:48 +0000, Mel Gorman wrote:
> > [1] It follows this pattern:
> >
> >   struct per_cpu_pages *pcp;
> >
> >   pcp = raw_cpu_ptr(page_zone(page)->per_cpu_pageset);
> >   // <- Migration here is OK: spin_lock protects vs eventual pcplist
> >   // access from local CPU as long as all list access happens through the
> >   // pcp pointer.
> >   spin_lock(&pcp->lock);
> >   do_stuff_with_pcp_lists(pcp);
> >   spin_unlock(&pcp->lock);
> >
>
> And this was the part I am concerned with. We are accessing a PCP
> structure that is not necessarily the one belonging to the CPU we
> are currently running on. This type of pattern is warned about in
> Documentation/locking/locktypes.rst
>
> ---8<---
> A typical scenario is protection of per-CPU variables in thread context::
>
>   struct foo *p = get_cpu_ptr(&var1);
>
>   spin_lock(&p->lock);
>   p->count += this_cpu_read(var2);
>
> This is correct code on a non-PREEMPT_RT kernel, but on a PREEMPT_RT kernel
> this breaks. The PREEMPT_RT-specific change of spinlock_t semantics does
> not allow to acquire p->lock because get_cpu_ptr() implicitly disables
> preemption. The following substitution works on both kernels::
> ---8<---
>
> Now we don't explicitly have this pattern because there isn't an
> obvious this_cpu_read() for example but it can accidentally happen for
> counting. __count_zid_vm_events -> __count_vm_events -> raw_cpu_add is
> an example although a harmless one.
>
> Any of the mod_page_state ones are more problematic though because we
> lock one PCP but potentially update the per-cpu pcp stats of another CPU
> of a different PCP that we have not locked and those counters must be
> accurate.

But IIUC vmstats don't track pcplist usage (i.e. adding a page into the local
pcplist doesn't affect the count at all). It is only when interacting with the
buddy allocator that they get updated. It makes sense for the CPU that
adds/removes pages from the allocator to do the stat update, regardless of the
page's journey.
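To make the argument above concrete, here is a single-threaded user-space model in C. It is a sketch under loose assumptions, not kernel code: `model_spinlock_t`, `struct model_pcp`, `pcp_free_page()`, `pcp_drain()` and `free_pages_stat()` are hypothetical stand-ins, "CPUs" are just array indices, and the stat is a plain per-CPU counter rather than the real vmstat machinery. It models the two claims made so far: list access only goes through the looked-up `pcp` pointer under `pcp->lock`, so migrating after the lookup is harmless; and the free-page stat only moves when pages reach the buddy allocator, charged to whichever CPU does that work.

```c
/*
 * User-space model of the pcplist pattern from this thread. All names are
 * hypothetical stand-ins, not kernel APIs; "CPUs" are just array indices.
 */
#include <assert.h>
#include <stdatomic.h>

#define MODEL_NR_CPUS 4

/* Minimal test-and-set lock standing in for spinlock_t. */
typedef struct { atomic_int held; } model_spinlock_t;

static void model_lock(model_spinlock_t *l)
{
	while (atomic_exchange_explicit(&l->held, 1, memory_order_acquire))
		; /* spin until the holder releases the lock */
}

static void model_unlock(model_spinlock_t *l)
{
	atomic_store_explicit(&l->held, 0, memory_order_release);
}

/* Stand-in for struct per_cpu_pages: only a count of cached pages. */
struct model_pcp {
	model_spinlock_t lock;
	int count;
};

static struct model_pcp pcpsets[MODEL_NR_CPUS];

/* Stand-in for the zone: buddy free pages and per-CPU stat deltas. */
static model_spinlock_t zone_lock;
static long zone_free_pages;
static long vm_stat_diff[MODEL_NR_CPUS];

/*
 * Free a page to the pcplist of 'lookup_cpu', the CPU the raw_cpu_ptr()-like
 * lookup ran on. The task may migrate before taking the lock; that is fine
 * because only 'pcp' is touched, under its own lock. No stat changes: the
 * page does not reach the buddy allocator. Returns the new list count.
 */
static int pcp_free_page(int lookup_cpu)
{
	struct model_pcp *pcp = &pcpsets[lookup_cpu];
	int count;

	/* <- migration may happen here */
	model_lock(&pcp->lock);
	count = ++pcp->count;
	model_unlock(&pcp->lock);
	return count;
}

/*
 * Drain 'target_cpu's pcplist into the buddy allocator while running on
 * 'running_cpu' (a remote drain when they differ). The free-page stat is
 * charged to the CPU that touches the buddy allocator, under the zone lock.
 * Returns the number of pages moved.
 */
static int pcp_drain(int target_cpu, int running_cpu)
{
	struct model_pcp *pcp = &pcpsets[target_cpu];
	int moved;

	model_lock(&pcp->lock);
	moved = pcp->count;
	pcp->count = 0;

	model_lock(&zone_lock);
	zone_free_pages += moved;
	vm_stat_diff[running_cpu] += moved;
	model_unlock(&zone_lock);

	model_unlock(&pcp->lock);
	return moved;
}

/* vmstat-style reader: sums the per-CPU deltas into a snapshot. */
static long free_pages_stat(void)
{
	long sum = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		sum += vm_stat_diff[cpu];
	return sum;
}
```

For example, freeing two pages to CPU 1's list leaves the stat at 0; a remote drain of CPU 1's list from CPU 3 then moves both pages to the buddy allocator and records +2 in CPU 3's delta.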
> It *might* still be safe but it's subtle, it could be easily accidentally
> broken in the future and it would be hard to detect because it would be
> very slow corruption of VM counters like NR_FREE_PAGES that must be
> accurate.

What does accurate mean here? vmstat consumers don't get accurate data, only
snapshots. And as I commented above, you can't infer information about pcplist
usage from these stats. So I see no real need for CPU locality when updating
them (which we're still retaining nonetheless, as per my comment above). The
only thing that is really needed is atomicity, achieved by disabling IRQs (and
preemption on RT). And even with your solution, this is achieved through the
struct zone's spin_lock (plus a preempt_disable() on RT).

All in all, my point is that none of the stats are affected by the change, nor
do they depend on the pcplist handling. And if we ever need to pin vmstat
updates to pcplist usage, they should share the same pcp structure.

That said, I'm happy with either solution as long as we get remote pcplist
draining. So if you're still unconvinced, let me know how I can help. I have
access to all sorts of machines to validate perf results, and I have time to
review or even to move the series forward.

Thanks!

-- 
Nicolás Sáenz
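The point about vmstat accuracy can be illustrated with a similar user-space model: a per-CPU delta counter that is folded into a global value once it crosses a threshold, loosely following how the kernel batches vmstat updates. `mod_free_pages()`, `nr_free_approx()`, `nr_free_snapshot()` and `VM_STAT_THRESHOLD` are hypothetical, and the real thresholds and folding logic differ. The sketch shows that readers only ever see a point-in-time value, and that the total does not depend on which CPU's slot absorbed a given update; what each update needs is not to be interleaved with another update of the same slot.

```c
/*
 * User-space model of threshold-batched per-CPU stat counters. Names and the
 * threshold are hypothetical; this is not the kernel's vmstat code.
 */
#include <assert.h>
#include <stdatomic.h>

#define VM_NR_CPUS 4
#define VM_STAT_THRESHOLD 8

static atomic_long global_nr_free;    /* folded value, read without locks */
static long cpu_diff[VM_NR_CPUS];     /* per-CPU deltas not yet folded */

/*
 * Apply 'delta' to 'cpu's pending diff, folding it into the global counter
 * once its magnitude exceeds the threshold. The caller must make this
 * read-modify-write of cpu_diff[cpu] atomic with respect to other updates of
 * the same slot (in the kernel: IRQs off, and preemption off on PREEMPT_RT).
 * Returns the pending diff left for 'cpu'.
 */
static long mod_free_pages(int cpu, long delta)
{
	long x = cpu_diff[cpu] + delta;

	if (x > VM_STAT_THRESHOLD || x < -VM_STAT_THRESHOLD) {
		atomic_fetch_add(&global_nr_free, x);
		x = 0;
	}
	cpu_diff[cpu] = x;
	return x;
}

/* Cheap read: may be off by up to VM_NR_CPUS * VM_STAT_THRESHOLD. */
static long nr_free_approx(void)
{
	return atomic_load(&global_nr_free);
}

/* Also folds in pending diffs, but is still only a point-in-time snapshot. */
static long nr_free_snapshot(void)
{
	long sum = atomic_load(&global_nr_free);

	for (int cpu = 0; cpu < VM_NR_CPUS; cpu++)
		sum += cpu_diff[cpu];
	return sum;
}
```

Charging CPU 2 instead of CPU 0 for pages freed during a remote drain changes which slot holds the delta, not the total that `nr_free_snapshot()` reports.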