From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <20210723100034.13353-1-mgorman@techsingularity.net> <20210723100034.13353-3-mgorman@techsingularity.net> <87czqu2iew.ffs@tglx> <20210804095425.GA6464@techsingularity.net> <91b2f893-eb6a-d91d-3769-baba8601b0f6@suse.cz> <20210804142306.GE6464@techsingularity.net> <87h7g4123u.ffs@tglx> <20210805140458.GF6464@techsingularity.net>
In-Reply-To:
<20210805140458.GF6464@techsingularity.net>
From: Daniel Vacek <neelxg@gmail.com>
Date: Thu, 5 Aug 2021 20:47:58 +0200
Message-ID:
Subject: Re: [PATCH 2/2] mm/vmstat: Protect per cpu variables with preempt disable on RT
To: Mel Gorman
Cc: Thomas Gleixner, Vlastimil Babka, Andrew Morton, Ingo Molnar, Hugh Dickins, Linux-MM, Linux-RT-Users, LKML, Peter Zijlstra
Content-Type: text/plain; charset="UTF-8"
List-ID: <linux-mm.kvack.org>

Hi Mel.

On Thu, Aug 5, 2021 at 4:48 PM Mel Gorman wrote:
>
> On Thu, Aug 05, 2021 at 02:56:53PM +0200, Thomas Gleixner wrote:
> > On Wed, Aug 04 2021 at 15:23, Mel Gorman wrote:
> >
> > Mel,
> >
> > > On Wed, Aug 04, 2021 at 03:42:25PM +0200, Vlastimil Babka wrote:
> > >> The idea was not build-time, but runtime (hidden behind lockdep, VM_DEBUG or
> > >> whatnot), i.e.:
> > >>
> > >> what that code needs is switch(item) { case foo1: case foo2:
> > >> lockdep_assert_irqs_disabled(); break; case bar1: case bar2:
> > >> lockdep_assert_preempt_disabled(); lockdep_assert_no_in_irq(); break; } or
> > >> something along those lines
> > >>
> > > Ok, that would potentially work. It may not even need to split the stats
> > > into different enums. Simply document which stats need protection from
> > > IRQ or preemption and use PROVE_LOCKING to check if preemption or IRQs
> > > are disabled depending on the kernel config.
I don't think it gets rid
> > > of preempt_disable_rt unless the API was completely reworked with entry
> > > points that describe the locking requirements. That would be tricky
> > > because the requirements differ between kernel configurations.
> >
> > Right. This won't get rid of the preempt disabling on RT, but I think we
> > should rather open code this
> >
> >	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> >		preempt_dis/enable();
> >
> > instead of proliferating these helper macros which have only one user left.
> >
> Ok, that is reasonable. I tried creating a vmstat-specific helper but the
> names were misleading so I ended up with the patch below which open-codes
> it as you suggest. The comment is not accurate because "locking/local_lock:
> Add RT support" is not upstream but it'll eventually be accurate.
>
> Is this ok?
>
> --8<--
> From e5b7a2ffcf55e4b4030fd54e49f5c5a1d1864ebe Mon Sep 17 00:00:00 2001
> From: Ingo Molnar
> Date: Fri, 3 Jul 2009 08:30:13 -0500
> Subject: [PATCH] mm/vmstat: Protect per cpu variables with preempt disable on
>  RT
>
> Disable preemption on -RT for the vmstat code. On vanilla the code runs
> in IRQ-off regions while on -RT it may not when stats are updated under
> a local_lock. "preempt_disable" ensures that the same resource is not
> updated in parallel due to preemption.
>
> This patch differs from the preempt-rt version where __count_vm_event and
> __count_vm_events are also protected. The counters are explicitly "allowed
> to be racy" so there is no need to protect them from preemption. Only
> the accurate page stats that are updated by a read-modify-write need
> protection. This patch also differs in that a preempt_[en|dis]able_rt
> helper is not used. As vmstat is the only user of the helper, it was
> suggested that it be open-coded in vmstat.c instead of risking the helper
> being used in unnecessary contexts.
>
> Signed-off-by: Ingo Molnar
> Signed-off-by: Thomas Gleixner
> Signed-off-by: Mel Gorman
> ---
>  mm/vmstat.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 48 insertions(+)
>
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index b0534e068166..2c7e7569a453 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -319,6 +319,16 @@ void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
>  	long x;
>  	long t;
>
> +	/*
> +	 * Accurate vmstat updates require a RMW. On !PREEMPT_RT kernels,
> +	 * atomicity is provided by IRQs being disabled -- either explicitly
> +	 * or via local_lock_irq. On PREEMPT_RT, local_lock_irq only disables
> +	 * CPU migrations and preemption potentially corrupts a counter so
> +	 * disable preemption.
> +	 */
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +		preempt_disable();
> +
>  	x = delta + __this_cpu_read(*p);
>
>  	t = __this_cpu_read(pcp->stat_threshold);
> @@ -328,6 +338,9 @@ void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
>  		x = 0;
>  	}
>  	__this_cpu_write(*p, x);
> +
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +		preempt_enable();
>  }
>  EXPORT_SYMBOL(__mod_zone_page_state);
>
> @@ -350,6 +363,10 @@ void __mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,
>  		delta >>= PAGE_SHIFT;
>  	}
>
> +	/* See __mod_node_page_state */

__mod_zone_page_state?

> +	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +		preempt_disable();
> +
>  	x = delta + __this_cpu_read(*p);
>
>  	t = __this_cpu_read(pcp->stat_threshold);
> @@ -359,6 +376,9 @@ void __mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,
>  		x = 0;
>  	}
>  	__this_cpu_write(*p, x);
> +
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +		preempt_enable();
>  }
>  EXPORT_SYMBOL(__mod_node_page_state);
>
> @@ -391,6 +411,10 @@ void __inc_zone_state(struct zone *zone, enum zone_stat_item item)
>  	s8 __percpu *p = pcp->vm_stat_diff + item;
>  	s8 v, t;
>
> +	/* See __mod_node_page_state */

ditto.
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +		preempt_disable();
> +
>  	v = __this_cpu_inc_return(*p);
>  	t = __this_cpu_read(pcp->stat_threshold);
>  	if (unlikely(v > t)) {
> @@ -399,6 +423,9 @@ void __inc_zone_state(struct zone *zone, enum zone_stat_item item)
>  		zone_page_state_add(v + overstep, zone, item);
>  		__this_cpu_write(*p, -overstep);
>  	}
> +
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +		preempt_enable();
>  }
>
>  void __inc_node_state(struct pglist_data *pgdat, enum node_stat_item item)
> @@ -409,6 +436,10 @@ void __inc_node_state(struct pglist_data *pgdat, enum node_stat_item item)
>
>  	VM_WARN_ON_ONCE(vmstat_item_in_bytes(item));
>
> +	/* See __mod_node_page_state */

""

> +	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +		preempt_disable();
> +
>  	v = __this_cpu_inc_return(*p);
>  	t = __this_cpu_read(pcp->stat_threshold);
>  	if (unlikely(v > t)) {
> @@ -417,6 +448,9 @@ void __inc_node_state(struct pglist_data *pgdat, enum node_stat_item item)
>  		node_page_state_add(v + overstep, pgdat, item);
>  		__this_cpu_write(*p, -overstep);
>  	}
> +
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +		preempt_enable();
>  }
>
>  void __inc_zone_page_state(struct page *page, enum zone_stat_item item)
> @@ -437,6 +471,10 @@ void __dec_zone_state(struct zone *zone, enum zone_stat_item item)
>  	s8 __percpu *p = pcp->vm_stat_diff + item;
>  	s8 v, t;
>
> +	/* See __mod_node_page_state */

...
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +		preempt_disable();
> +
>  	v = __this_cpu_dec_return(*p);
>  	t = __this_cpu_read(pcp->stat_threshold);
>  	if (unlikely(v < - t)) {
> @@ -445,6 +483,9 @@ void __dec_zone_state(struct zone *zone, enum zone_stat_item item)
>  		zone_page_state_add(v - overstep, zone, item);
>  		__this_cpu_write(*p, overstep);
>  	}
> +
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +		preempt_enable();
>  }
>
>  void __dec_node_state(struct pglist_data *pgdat, enum node_stat_item item)
> @@ -455,6 +496,10 @@ void __dec_node_state(struct pglist_data *pgdat, enum node_stat_item item)
>
>  	VM_WARN_ON_ONCE(vmstat_item_in_bytes(item));
>
> +	/* See __mod_node_page_state */

and one more time here

--nX

> +	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +		preempt_disable();
> +
>  	v = __this_cpu_dec_return(*p);
>  	t = __this_cpu_read(pcp->stat_threshold);
>  	if (unlikely(v < - t)) {
> @@ -463,6 +508,9 @@ void __dec_node_state(struct pglist_data *pgdat, enum node_stat_item item)
>  		node_page_state_add(v - overstep, pgdat, item);
>  		__this_cpu_write(*p, overstep);
>  	}
> +
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +		preempt_enable();
>  }
>
>  void __dec_zone_page_state(struct page *page, enum zone_stat_item item)