Date: Thu, 8 May 2025 09:33:50 -0400
From: Kent Overstreet
To: David Wang <00107082@163.com>
Cc: Suren Baghdasaryan, akpm@linux-foundation.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Subject: Re: [PATCH] alloc_tag: avoid mem alloc and iter reset when reading allocinfo
References: <20250507175500.204569-1-00107082@163.com>
 <289b58f1.352d.196addbf31d.Coremail.00107082@163.com>
 <1ed4c8f7.3e12.196adf621a2.Coremail.00107082@163.com>
 <52tsrapmkfywv4kkdpravtfmxkhxchyua4wttpugihld4iws3r@atfgtbd5wwhx>

On Thu, May 08, 2025 at 01:51:48PM +0800, David Wang wrote:
> At 2025-05-08 12:07:40, "Kent Overstreet" wrote:
> >Another thing to note is that memory layout - avoiding pointer chasing -
> >is hugely important, but it'll almost never show up as allocator calls.
> >
> >To give you some examples, mempools and biosets used to be separately
> >allocated. This was mainly to make error paths in outer object
> >constructors/destructors easier and safer: instead of keeping track of
> >what's initialized and what's not, if you've got a pointer to a
> >mempool/bioset you call *_free() on it.
> >
> >(People hadn't yet clued that you can just kzalloc() the entire outer
> >object, and then if the inner object is zeroed it wasn't initialized).
> >
> >But that means you're adding a pointer chase to every mempool_alloc()
> >call, and since bioset itself has mempools allocating bios had _two_
> >unnecessary pointer derefs. That's death for performance when you're
> >running cache cold, but since everyone benchmarks cache-hot...
> >
> >(I was the one who fixed that).
> >
> >Another big one was generic_file_buffered_read(). Main buffered read
> >path, everyone wants it to be as fast as possible.
> >
> >But the core is (was) a loop that walks the pagecache radix tree to get
> >the page, then copies 4k of data out to userspace (there goes l1), then
> >repeats all that pointer chasing for the next 4k. Pre large folios, it
> >was horrific.
> >
> >Solution - vectorize it.
> >Look up all the pages we're copying from all at
> >once, stuff them in a (dynamically allocated! for each read!) vector,
> >and then do the copying out to userspace all at once. Massive
> >performance gain.
> >
> >Of course, to do that I first had to clean up a tangled 250+ line
> >monstrosity of half baked, poorly thought out "optimizations" (the worst
> >spaghetti of gotos you'd ever seen) and turn it into something
> >manageable...
> >
> >So - keep things simple, don't overthink the little stuff, so you can
> >spot and tackle the big algorithmic wins :)
> I will keep this in mind~! :)
>
> And thanks for the enlightening notes~!!
>
> Though I could not quite catch up with the first one, I think I got
> the point: avoid unnecessary pointer chasing and keep the pointer
> chasing as short(balanced) as possible~

To illustrate - DRAM latency is 30-70ns. At 4GHz, that's 120-280 cycles,
and a properly fed CPU can do multiple instructions per clock - so a
cache miss all the way to DRAM can cost you hundreds of instructions.

> The second one, about copy 4k by 4k, seems quite similar to seq_file,
> at least the "4k" part, literally. seq_file read() defaults to alloc
> 4k buffer, and read data until EOF or the 4k buffer is full, and
> start over again for the next read().
>
> One solution could be make changes to seq_file, do not stop until user
> buffer is full for each read. kind of similar to your second note, in
> a sequential style, I think.
>
> If user read with 128K buffer, and seq_file fill the buffer 4k by
> 4k, it would only need ~3 read calls for allocinfo. (I did post a
> patch for seq_file to fill user buffer, but start/stop still happens
> at 4k boundary, so no help for
> the iterator rewinding when read /proc/allocinfo yet.
> https://lore.kernel.org/lkml/20241220140819.9887-1-00107082@163.com/ )
> The solution in this patch is keeping the iterator alive and valid
> cross read boundary, this can also avoid the cost for each start
> over.

The first question is - does it matter? If the optimization is just for
/proc/allocinfo, who's reading it at a high enough rate that we care?

If it's only being used interactively, it doesn't matter. If it's being
read at a high rate by some sort of profiling program, we'd want to skip
the text interface entirely and add an ioctl to read the data out in a
binary format.

The idea of changing seq_file to continue until the user buffer is full
- that'd be a good one, if you're making changes that benefit all
seq_file users.
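
To put rough numbers on the read-count argument in this thread, here is a
minimal userspace sketch in C. It is not seq_file code and does not model
the iterator start/stop behaviour. The helper names and the 300 KiB output
size are hypothetical, picked only so that a 128K user buffer needs 3
reads, matching the "~3 read calls" estimate quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; this is NOT the kernel's seq_file code. */

#define SEQ_CHUNK   4096u            /* 4k default buffer mentioned in the thread */
#define USER_BUF    (128u * 1024u)   /* 128K user buffer from the quoted example */
#define TOTAL_BYTES (300u * 1024u)   /* assumed output size, illustration only */

/*
 * Number of read() calls needed to drain `total` bytes when each call
 * returns at most `per_call` bytes (ceiling division).
 */
static inline size_t reads_needed(size_t total, size_t per_call)
{
	assert(per_call > 0);
	return (total + per_call - 1) / per_call;
}

/* Behaviour as described in the thread: each read() yields at most one 4k chunk. */
static inline size_t reads_chunk_limited(size_t total)
{
	return reads_needed(total, SEQ_CHUNK);
}

/* Proposed behaviour: keep filling until the user buffer is full. */
static inline size_t reads_fill_user_buffer(size_t total, size_t user_buf)
{
	return reads_needed(total, user_buf);
}
```

Under these assumed sizes, reads_chunk_limited(TOTAL_BYTES) comes to 75
calls and reads_fill_user_buffer(TOTAL_BYTES, USER_BUF) to 3. Whether that
difference matters still depends on how often the file is actually read.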