Date: Wed, 14 Sep 2022 07:15:08 -0700
From: Joe Damato
To: Dave Hansen
Cc: x86@kernel.org, linux-mm@kvack.org, Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov, "H. Peter Anvin", Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [RFC 1/1] mm: Add per-task struct tlb counters
Message-ID: <20220914141507.GA4422@fastly.com>
References: <1663120270-2673-1-git-send-email-jdamato@fastly.com> <1663120270-2673-2-git-send-email-jdamato@fastly.com>
On Wed, Sep 14, 2022 at 12:40:55AM -0700, Dave Hansen wrote:
> On 9/13/22 18:51, Joe Damato wrote:
> > TLB shootdowns are tracked globally, but on a busy system it can be
> > difficult to disambiguate the source of TLB shootdowns.
> >
> > Add two counter fields:
> > - nrtlbflush: number of tlb flush events received
> > - ngtlbflush: number of tlb flush events generated
> >
> > Expose those fields in /proc/[pid]/stat so that they can be analyzed
> > alongside similar metrics (e.g. min_flt and maj_flt).
>
> On x86 at least, we already have two other ways to count flushes. You
> even quoted them with your patch:
>
> >  	count_vm_tlb_event(NR_TLB_REMOTE_FLUSH);
> > +	current->ngtlbflush++;
> >  	if (info->end == TLB_FLUSH_ALL)
> >  		trace_tlb_flush(TLB_REMOTE_SEND_IPI, TLB_FLUSH_ALL);
>
> Granted, the count_vm_tlb...() one is debugging only. But, did you try
> to use those other mechanisms? For instance, could you patch
> count_vm_tlb_event()?

I tried to address this in my cover letter[1], but the count_vm_tlb_event()
counters are system-wide, AFAICT. That is certainly useful, but without
finer granularity it's difficult to know how many TLB shootdowns are being
generated by which tasks. The goal was to account for these events on a
per-task basis.

I could patch count_vm_tlb_event() to account on a per-task basis. That
seems reasonable to me... assuming you and others are convinced that it's a
better approach than tracepoints ;)

> Why didn't the tracepoints work for you?
Tracepoints do work, but IMHO the trouble with tracepoints in this case is:

- You need to actually be running perf to gather the data at the right
  time; if you stop running perf too soon, or if the TLB shootdown storm is
  caused by some anomalous event while you weren't running perf... you are
  out of luck.

- On heavily loaded systems with O(10,000) or O(100,000) tasks, perf
  tracepoint data is hard to analyze, events can be dropped, and collecting
  it can consume significant resources.

In addition, there is existing tooling on Linux that scrapes
/proc/[pid]/stat for graphing, analysis, etc.

IMO, an easier way to debug large TLB shootdown storms on a system might be
(using a form of this patch):

1. Examine /proc/[pid]/stat to see which process or processes are
   responsible for the majority of the shootdowns. Perhaps you have a script
   scraping this data at various intervals and recording deltas.

2. Once you know the timeline of the events, which processes are
   responsible, and the magnitude of the deltas, perf tracepoints can help
   you determine exactly when and where they occur.

What do you think?

> Can this be done in a more arch-generic way? It's a shame to
> unconditionally add counters to the task struct and only use them on
> x86. If someone wanted to generalize the x86 tracepoints, or make them
> available to other architectures, I think that would be fine even if
> they have to change a bit (queue the inevitable argument about
> tracepoint ABI).

I'm not sure. Maybe if I tweaked count_vm_tlb_event() instead, then any
architecture other than x86 that supports it in the future would
automatically get support for this as well.

> P.S. I'm not a fan of the structure member naming.

Fair enough; I was inspired by nvcsw and nivcsw :) but if you think this is
worth pursuing, I'll use clearer names in the future.

Thanks for taking a look!
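The idea of patching count_vm_tlb_event() so that a single counting helper updates both the system-wide counter and the per-task counters can be modeled in plain userspace C. This is a hypothetical, simplified sketch and not kernel code: `vm_events`, `current_task`, `struct task`, and `count_tlb_event()` are stand-ins invented here for the kernel's vm event counters, `current`, `task_struct`, and count_vm_tlb_event(); only the ngtlbflush/nrtlbflush names come from the RFC, and the enum values are merely modeled on the kernel's NR_TLB_REMOTE_FLUSH* event names.

```c
/*
 * Hypothetical userspace model, NOT kernel code: one counting helper
 * updates both a system-wide event array and per-task counters.
 */
enum tlb_event {
	NR_TLB_REMOTE_FLUSH,          /* current task triggered a remote flush */
	NR_TLB_REMOTE_FLUSH_RECEIVED, /* flush request handled while task ran */
	NR_TLB_EVENTS
};

struct task {
	unsigned long ngtlbflush; /* flush events generated (RFC naming) */
	unsigned long nrtlbflush; /* flush events received (RFC naming) */
};

/* System-wide totals, standing in for the kernel's vm event counters. */
static unsigned long vm_events[NR_TLB_EVENTS];

/* Stand-in for the kernel's `current`; may be unset (0) in this model. */
static struct task *current_task;

static void count_tlb_event(enum tlb_event item)
{
	vm_events[item]++;            /* existing global accounting */

	if (!current_task)
		return;
	/* Extra per-task accounting, attributed to whichever task is running. */
	switch (item) {
	case NR_TLB_REMOTE_FLUSH:
		current_task->ngtlbflush++;
		break;
	case NR_TLB_REMOTE_FLUSH_RECEIVED:
		current_task->nrtlbflush++;
		break;
	default:
		break;
	}
}
```

Because the per-task increments live behind the same helper, any architecture that calls it would pick them up too, which is the arch-generic angle raised in the thread. The cost Dave points out, two extra fields in every task, remains, and in the real kernel the helper only counts in debug configurations, as the quoted text notes.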
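Step 1 of the debugging workflow (scraping /proc/[pid]/stat at intervals and recording deltas) could start from something like the following C sketch. It extracts numeric fields from a stat line by their 1-based position as numbered in proc(5), skipping the comm field by searching for the last ')' because a command name may itself contain spaces or parentheses. The field positions of the proposed counters are not given in this thread, so the example uses the existing minflt (field 10) and majflt (field 12) counters; `stat_field()`, `read_stat_field()`, and `counter_delta()` are hypothetical helper names.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Extract field `field` (1-based, as numbered in proc(5)) from one
 * /proc/[pid]/stat line. Intended for the unsigned counter fields;
 * fields 1 and 2 (pid, comm) are not supported.
 * Returns 0 on success, -1 if the field is missing or not numeric.
 */
static int stat_field(const char *line, int field, unsigned long long *out)
{
	/* comm may contain spaces and ')', so start after the LAST ')' */
	const char *p = strrchr(line, ')');

	if (!p || field < 3)
		return -1;
	p++;
	for (int i = 3; ; i++) {
		while (*p == ' ')
			p++;
		if (*p == '\0' || *p == '\n')
			return -1;                 /* ran out of fields */
		if (i == field) {
			char *end;

			*out = strtoull(p, &end, 10);
			return end == p ? -1 : 0;  /* e.g. state 'S' is not numeric */
		}
		while (*p != ' ' && *p != '\0' && *p != '\n')
			p++;                       /* skip over this field */
	}
}

/* Read one field for `pid` from /proc/<pid>/stat (Linux only). */
static int read_stat_field(int pid, int field, unsigned long long *out)
{
	char path[64], buf[4096];
	FILE *f;
	char *ok;

	snprintf(path, sizeof(path), "/proc/%d/stat", pid);
	f = fopen(path, "r");
	if (!f)
		return -1;                     /* process may have exited */
	ok = fgets(buf, sizeof(buf), f);
	fclose(f);
	return ok ? stat_field(buf, field, out) : -1;
}

/*
 * Delta between two samples of a monotonic counter. A decrease (e.g. the
 * pid was reused by a new process) is reported as 0 instead of wrapping.
 */
static unsigned long long counter_delta(unsigned long long prev,
					unsigned long long cur)
{
	return cur >= prev ? cur - prev : 0;
}
```

A scraper would call read_stat_field() for each pid every few seconds, keep the previous sample per pid, and rank processes by counter_delta() to find the biggest contributors. With a form of the RFC applied, the same loop could read the new counters once their field numbers are known.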