Date: Fri, 17 May 2024 20:08:41 +0200
From: Mateusz Guzik
To: Kairui Song
Cc: "zhangpeng (AS)", Rongwei Wang, linux-mm@kvack.org, LKML, Andrew Morton, dennisszhou@gmail.com, shakeelb@google.com, jack@suse.cz, Suren Baghdasaryan, kent.overstreet@linux.dev, mhocko@suse.cz, vbabka@suse.cz, Yu Zhao, yu.ma@intel.com, wangkefeng.wang@huawei.com, sunnanyong@huawei.com
Subject: Re: [RFC PATCH v2 2/2] mm: convert mm's rss stats to use atomic mode
References: <20240418142008.2775308-1-zhangpeng362@huawei.com> <20240418142008.2775308-3-zhangpeng362@huawei.com> <5nv3r7qilsye5jgqcjrrbiry6on7wtjmce6twqbxg6nmvczue3@ikc22ggphg3h>

On Fri, May 17, 2024 at 11:29:57AM +0800, Kairui Song wrote:
> On Thu, May 16, 2024 at 23:14, Mateusz Guzik wrote:
> > A part of The Real Solution(tm) would make counter allocations scale
> > (including mcid, not just rss) or dodge them (while maintaining the
> > per-cpu distribution, see below for one idea), but that boils down to
> > balancing scalability versus total memory usage. It is trivial to just
> > slap together a per-cpu cache of these allocations and have the problem
> > go away for benchmarking purposes, while probably being too memory
> > hungry for actual usage.
> >
> > I was pondering an allocator with caches per some number of cores (say 4
> > or 8). Microbenchmarks aside I suspect real workloads would not suffer
> > from contention at this kind of granularity. This would trivially reduce
> > memory usage compared to per-cpu caching. I suspect things like
> > mm_struct, task_struct, task stacks and similar would be fine with it.
> >
> > Suppose mm_struct is allocated from a more coarse grained allocator than
> > per-cpu. Total number of cached objects would be lower than it is now.
> > That would also mean these allocated but not currently used mms could
> > hold on to other stuff, for example per-cpu rss and mcid counters. Then
> > should someone fork or exit, alloc/free_percpu would be avoided for most
> > cases. This would scale better and be faster single-threaded than the
> > current state.
>
> And what is the issue with using only one CPU cache, and flush on mm
> switch? No more alloc after boot, and the total (and fixed) memory
> usage is just about a few unsigned long per CPU, which should be even
> lower than the old RSS cache solution (4 unsigned long per task). And
> it scaled very well with the many kinds of microbenchmarks and workloads
> I've tested.
>
> The exception is a workload that keeps doing something like "alloc one
> page then switch to another mm", but there I think the performance will
> be horrible already due to cache invalidations and many switch_*()s, so
> RSS isn't really a concern.
>

I only skimmed through your patchset.
I do think it has a legitimate approach, but personally I would not do it
like that due to the extra work on context switches. However, I have no
say about this, so you will need to prod the mm overlords to get this
moving forward.

Maybe I was not clear enough in my opening e-mail, so I'm going to
reiterate some bits: there are scalability problems in execve even with
your patchset or the one which uses atomics. One of them concerns another
bit which allocates per-cpu memory (the mcid thing).

I note that sorting it out would possibly also take care of the rss
problem; I outlined an example approach above.
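
To make the example approach quoted above a bit more concrete, here is a
minimal userspace sketch, not kernel code and not taken from any posted
patch. It assumes one object cache shared per group of 4 CPUs, and lets an
idle object keep its per-cpu counter array so that reuse skips the
expensive counter allocation. All names (fake_mm, group_cache, mm_get,
mm_put, mm_percpu_allocs) and the sizes are made up for illustration, and
there is no locking; a real version would need at least a lock per cache.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_CPUS        16                 /* assumed machine size */
#define CPUS_PER_CACHE 4                  /* assumed sharing granularity */
#define NR_CACHES      (NR_CPUS / CPUS_PER_CACHE)
#define CACHE_DEPTH    2                  /* idle objects kept per cache */

/* Hypothetical stand-in for mm_struct: counters stay attached while cached. */
struct fake_mm {
	long *rss_percpu;                 /* NR_CPUS slots, stand-in for rss counters */
	struct fake_mm *next;
};

struct group_cache {
	struct fake_mm *free_list;
	int nr_cached;
};

static struct group_cache caches[NR_CACHES];
static unsigned long percpu_allocs;       /* how often the slow path ran */

static struct group_cache *cache_for_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return &caches[cpu / CPUS_PER_CACHE];
}

struct fake_mm *mm_get(int cpu)
{
	struct group_cache *c = cache_for_cpu(cpu);
	struct fake_mm *mm = c->free_list;

	if (mm) {
		/* Fast path: reuse the object together with its counters. */
		c->free_list = mm->next;
		c->nr_cached--;
		mm->next = NULL;
		return mm;
	}

	/* Slow path: this is where the alloc_percpu-like cost is paid. */
	mm = malloc(sizeof(*mm));
	if (!mm)
		return NULL;
	mm->rss_percpu = calloc(NR_CPUS, sizeof(*mm->rss_percpu));
	if (!mm->rss_percpu) {
		free(mm);
		return NULL;
	}
	mm->next = NULL;
	percpu_allocs++;
	return mm;
}

void mm_put(struct fake_mm *mm, int cpu)
{
	struct group_cache *c = cache_for_cpu(cpu);

	if (c->nr_cached < CACHE_DEPTH) {
		/* Reset the counters but keep the allocation for the next user. */
		for (int i = 0; i < NR_CPUS; i++)
			mm->rss_percpu[i] = 0;
		mm->next = c->free_list;
		c->free_list = mm;
		c->nr_cached++;
		return;
	}

	/* Cache is full: really free the object and its counters. */
	free(mm->rss_percpu);
	free(mm);
}

unsigned long mm_percpu_allocs(void)
{
	return percpu_allocs;
}
```

With these made-up sizes there are 4 caches holding at most 8 idle objects
in total, where a per-cpu cache of the same depth would hold up to 32. A
get on CPU 3 after a put on CPU 1 hits the same cache and does not bump
mm_percpu_allocs(), which is the fork/exit case I had in mind.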