Date: Mon, 16 Jan 2023 13:11:40 -0300
From: Marcelo Tosatti
To: Christoph Lameter
Cc: Frederic Weisbecker, atomlin@atomlin.com, tglx@linutronix.de,
    mingo@kernel.org, peterz@infradead.org, pauld@redhat.com,
    neelx@redhat.com, oleksandr@natalenko.name,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v13 2/6] mm/vmstat: Use vmstat_dirty to track CPU-specific vmstat discrepancies

On Mon, Jan 16, 2023 at 10:51:40AM +0100, Christoph Lameter wrote:
> On Wed, 11 Jan 2023, Marcelo Tosatti wrote:
>
> > OK, can replace this_cpu operations with this_cpu_ptr + standard C operators
> > (and in fact can do that for interrupt disabled functions as well, that
> > is CONFIG_HAVE_CMPXCHG_LOCAL not defined).
> >
> > Is that it?
>
> No that was hypothetical.
>
> I do not know how to get out of this dilemma. We surely want to keep fast
> vmstat operations working.

Honestly, to me, there is no dilemma:

* There is a requirement from applications to be uninterrupted
  by operating system activities. Examples include radio access
  network software and software-defined PLCs for industrial
  automation [1].

* There exist VM statistics counters (which count the number of pages
  in different states, for example, number of free pages, locked pages,
  pages under writeback, pagetable pages, file pages, etc).
  To reduce the number of accesses to the global counters, each CPU
  maintains its own delta relative to the global VM counters
  (which can be cached in the local processor cache, and is
  therefore fast).

  The per-CPU deltas are synchronized to the global counters:
        1) If the per-CPU counters exceed a given threshold.
        2) Periodically, with a low frequency compared to
           CPU events (on the order of seconds).
        3) Upon an event that requires accurate counters.

* The periodic synchronization interrupts a given CPU if that CPU
  holds a counter delta relative to the global counters.

  To avoid this interruption, because of [1], the proposed patchset
  synchronizes any pending per-CPU deltas to the global counters,
  for nohz_full= CPUs, when returning to userspace
  (which is a very fast path).

  Since return to userspace is a very fast path, synchronizing
  per-CPU counter deltas by reading their contents is undesired.

  Therefore a single bit is introduced to condense the following
  information: does this CPU contain any delta relative to the
  global counters that should be written back?

  This bit is set when a per-CPU delta is increased.
  This bit is cleared when the per-CPU deltas are written back to
  the global counters.

Since for the following two operations:

        modify per-CPU delta (for current CPU) of counter X by Y
        set bit (for current CPU) indicating the per-CPU delta exists

"current CPU" can change, it is necessary to disable CPU preemption
when executing the pair of operations.

vmstat operations still achieve their main goal, which is to
keep accesses local to the CPU when incrementing the counters
(for most counter modifications). The preempt_disable/enable pair
also operates on a per-CPU variable.

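For readers who prefer code, the scheme described above can be sketched
in plain user-space C. This is only an illustration under stated
assumptions, not the patchset's code: per-CPU data is simulated with
arrays indexed by an explicit cpu argument, and every name used here
(sketch_mod_state, sketch_sync_on_user_return, SKETCH_THRESHOLD, and so
on) is hypothetical. In the kernel, the two marked steps would run with
preemption disabled so that "current CPU" cannot change between them.

```c
#include <assert.h>
#include <stdbool.h>

/* All names and sizes below are hypothetical, chosen for illustration. */
#define SKETCH_NR_CPUS   4
#define SKETCH_NR_ITEMS  3
#define SKETCH_THRESHOLD 32

static long sketch_global[SKETCH_NR_ITEMS];                      /* global counters */
static signed char sketch_delta[SKETCH_NR_CPUS][SKETCH_NR_ITEMS]; /* per-CPU deltas  */
static bool sketch_dirty[SKETCH_NR_CPUS];                        /* one bit per CPU */

/* Modify counter `item` by `y` on behalf of `cpu`. */
static void sketch_mod_state(int cpu, int item, int y)
{
    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
    assert(item >= 0 && item < SKETCH_NR_ITEMS);

    /* kernel: preempt_disable() would go here */
    int v = sketch_delta[cpu][item] + y;

    /* Case 1 from the list above: delta exceeds the threshold. */
    if (v > SKETCH_THRESHOLD || v < -SKETCH_THRESHOLD) {
        sketch_global[item] += v;
        v = 0;
    }
    sketch_delta[cpu][item] = (signed char)v; /* step 1: update delta   */
    sketch_dirty[cpu] = true;                 /* step 2: mark CPU dirty */
    /* kernel: preempt_enable() would go here */
}

/* Called on the (simulated) return-to-userspace path of `cpu`. */
static void sketch_sync_on_user_return(int cpu)
{
    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);

    /* Fast path: a single flag read, no scan over the deltas. */
    if (!sketch_dirty[cpu])
        return;

    for (int i = 0; i < SKETCH_NR_ITEMS; i++) {
        sketch_global[i] += sketch_delta[cpu][i];
        sketch_delta[cpu][i] = 0;
    }
    sketch_dirty[cpu] = false;
}
```

The point of the flag is visible in sketch_sync_on_user_return(): a CPU
with nothing pending pays for one read instead of a walk over every
per-CPU delta.
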
Now you are objecting to this patchset because:

It increases the number of cycles needed to execute the functions
that modify the counters by 6. Can you mention any benchmark where
this increase is significant?

By searching for mod_zone_page_state/mod_node_page_state one can see
the following: the codepaths that call them touch multiple pages
and other data structures, so the preempt_enable/preempt_disable
pair should be a very small contribution (in terms of percentage) to any
meaningful benchmark.

> The fundamental issue that causes the vmstat discrepancies is likely that
> the fast this_cpu ops can increment the counter on any random cpu and that
> this is the reason you get vmstat discrepancies.

Yes.

> Give up the assumption that an increment of a this_cpu counter on a
> specific cpu means that something occurred on that specific cpu. Maybe
> that will get you on a path to resolve the issues you are seeing.

But it can't. To be able to condense the information "does a delta exist
on this CPU" from a number of cacheline reads to a single cacheline read,
one bit can be used. And the write to that bit and to the counters is not
atomic.

Alternatives:

        1) Disable periodic synchronization for nohz_full CPUs.
        2) Processor instructions which can modify more than
           one address in memory.
        3) Synchronize the per-CPU stats remotely (which
           increases per-CPU and per-node accesses).
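
To make the non-atomicity concrete, here is a deterministic,
single-threaded simulation in user-space C. It is a sketch under stated
assumptions, not kernel code: migration between the two writes is
modelled by passing two different CPU numbers, and all names (race_mod,
race_sync_if_dirty, race_delta, ...) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define RACE_NR_CPUS 2 /* hypothetical size for the simulation */

static long race_global;               /* one global counter   */
static int race_delta[RACE_NR_CPUS];   /* per-CPU delta        */
static bool race_dirty[RACE_NR_CPUS];  /* per-CPU "delta" bit  */

/*
 * The two writes of a counter update. If the task is migrated between
 * them, cpu_at_delta and cpu_at_bit differ; with preemption disabled
 * they would always be the same CPU.
 */
static void race_mod(int cpu_at_delta, int cpu_at_bit, int y)
{
    assert(cpu_at_delta >= 0 && cpu_at_delta < RACE_NR_CPUS);
    assert(cpu_at_bit >= 0 && cpu_at_bit < RACE_NR_CPUS);

    race_delta[cpu_at_delta] += y;  /* write 1: the delta */
    race_dirty[cpu_at_bit] = true;  /* write 2: the bit   */
}

/* Return-to-userspace sync that trusts the bit and nothing else. */
static void race_sync_if_dirty(int cpu)
{
    assert(cpu >= 0 && cpu < RACE_NR_CPUS);

    if (!race_dirty[cpu])
        return;
    race_global += race_delta[cpu];
    race_delta[cpu] = 0;
    race_dirty[cpu] = false;
}
```

Calling race_mod(0, 1, 3) and then syncing both CPUs leaves the delta
stranded on CPU 0: CPU 0 holds the delta but not the bit, and CPU 1
holds the bit but no delta. Calling race_mod(0, 0, 3) instead, which is
what disabling preemption guarantees, lets race_sync_if_dirty(0) fold
the delta as intended.
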