Date: Wed, 17 Jan 2024 09:06:02 +1100
From: Dave Chinner <david@fromorbit.com>
To: Uladzislau Rezki
Cc: linux-mm@kvack.org, Andrew Morton, LKML, Baoquan He,
	Lorenzo Stoakes, Christoph Hellwig, Matthew Wilcox,
	"Liam R. Howlett", "Paul E. McKenney", Joel Fernandes,
	Oleksiy Avramchenko
Subject: Re: [PATCH v3 10/11] mm: vmalloc: Set nr_nodes based on CPUs in a system
References: <20240102184633.748113-1-urezki@gmail.com>
	<20240102184633.748113-11-urezki@gmail.com>
McKenney" , Joel Fernandes , Oleksiy Avramchenko Subject: Re: [PATCH v3 10/11] mm: vmalloc: Set nr_nodes based on CPUs in a system Message-ID: References: <20240102184633.748113-1-urezki@gmail.com> <20240102184633.748113-11-urezki@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Rspam-User: X-Stat-Signature: crq7715bcgaef16c9ko68sdt4soo4npt X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 03C2618000F X-HE-Tag: 1705442766-364127 X-HE-Meta: U2FsdGVkX1804qqIJHdYleg+KrNxgRJWR9RpPWGLs7YHBi9RH2TaHDn0pVf5Qh9R5No21si5eOGJ1jbYNwRjJm8VvPMFnN7uJurly69jJVVF+z81lPjMM4M0uNi18CQfyMnnVLNSURGE9a8hv7gIKObkIpvynrKDHGH+pcrwU9b4pnf8QeVIzGUyCGgSI3P5bbYr8JfQU6rRf3JtR3q95DNOWPZjUNL+BSi4omPCZaPzLOZk5VEtzYoEkt2DcjmuYlAoho1+280TO4bChjErpSzM1pSoGG1oAgRMpkk3aVmapl3aXT4fEhmsQwZO9j/TjUzmlJ6gitQPnMbWHA8O9Bs9884wjUhkG9djn3bdxjRsdLzTv18s/x38vgOCUKlU7WFd1W/2ZuBopR0kL66mumg3t71GvLkdU+QX9NuHG5W629VO8USF5RKJrGuCshBV8513C/qgyAFC1fXcrhASp8jkRBxNQc60AIdlUHI0n85zJnDCAJRKaI9cUWZnO3iTZkCobvshfAVQyAf+UJ0H6RijBQR8AJKun2WLqToX1vkaoiwGqxFR1h2TqzzW6B5wz69e8OLfblHmNcqJhfRcQv9laLHSbAdnl1SfuZ6tr30K4TfFRg9AI1jd1uLPzRv3cvEp91wi2wzJOQp5y6adljj5drmENoPjx58rmxRPXTu7eiMU4QHM75ftNkOcxJSvJ1WYrlfTbBrFc5+9aA3oNzfvHUa4x3dmKp/L52yJ+eM0QkagFFDu35Xfit/KkXBp1OjWy6D16hwxeIhhJD2dOMMani0inuy/J3HOgolDE2UiBM1rKCPPuNcagAif6Kcz5phyaDK/rfhyNYEefnARyzH3lNOWesbYda+HMOEPEF694/kbRipCODyUQ+p33YzPci4KIDBYCs21qZ3jQ9VGCbZpKclz8xlAqq09LjaDB30WKfRTXOtea+a1UHsF9Am0mTEON3RSwI3R2+I+mIF QT9E/7ZU bF/DqY6yjTQllI35c8zJFEJWpnIZcHDZJDhkTBQVBvz63KeBSYMJ35xEgtmcvr7KGbnzVgRWJlBwnnHHiUcCIYT8NGbcrMpOogZz18bQw7u/A68/5mJ1zf+zgO+mi4/XnJxJ/pDJ3SF2t5QzJ7NU9eTeFLCb0RSLfGgLyoeOayjF/oX3w4+O7HkmY35fP+J5BbWR0ysJVcDIJWcPCa9P9J+gSqSBdVrLzDka112eImb7CPP6wvvASZz4YFmB2njb7li/no3Zyx/arIG+aJfb4Rgqw80/JgrAWn47BFHgfxhLriuyZUT1xMwwsmM6TJpEyLAs4o4B0+6JcML04wpa8kNSjvymnlgSHMpxJVWlV9en9BtxarrQdd8qnz2GkvmFYfiACAAT8akmxMh/wyANN0VFV3EwGfturhYBvxW1XlBLrXEWMkJivJJxYtBWPL6e7SszeL2107rZdr4xLWcEiY7R/x+z78HtmvKWFUYx8+PzAkGcwDIZvNDCyDBxruLWEtALAyvYEaCn7iu0hgSXbeF7t0HnNaU6AMMZqpzC8JHiygUA= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Mon, Jan 15, 2024 at 08:09:29PM +0100, Uladzislau Rezki wrote: > > On Tue, Jan 02, 2024 at 07:46:32PM +0100, Uladzislau Rezki (Sony) wrote: > > > A number of nodes which are used in the alloc/free paths is > > > set based on num_possible_cpus() in a system. Please note a > > > high limit threshold though is fixed and corresponds to 128 > > > nodes. > > > > Large CPU count machines are NUMA machines. ALl of the allocation > > and reclaim is NUMA node based i.e. a pgdat per NUMA node. > > > > Shrinkers are also able to be run in a NUMA aware mode so that > > per-node structures can be reclaimed similar to how per-node LRU > > lists are scanned for reclaim. > > > > Hence I'm left to wonder if it would be better to have a vmalloc > > area per pgdat (or sub-node cluster) rather than just base the > > number on CPU count and then have an arbitrary maximum number when > > we get to 128 CPU cores. We can have 128 CPU cores in a > > single socket these days, so not being able to scale the vmalloc > > areas beyond a single socket seems like a bit of a limitation. 
> Currently I fix the maximum number of nodes at 128. This is because I
> do not have access to such big NUMA systems, whereas I do have access
> to ~128-CPU ones. That is why I have decided to stop at that number
> for now.

I suspect you are confusing the number of CPUs with the number of NUMA
nodes. A NUMA system with 128 nodes is a large NUMA system that will
have thousands of CPU cores, whilst above you talk about basing the
count on CPU cores and note that a single socket can have 128 cores.

> We can easily set nr_nodes to num_possible_cpus() and let it scale
> for anyone. But before doing this, I would like to give the current
> approach a try as a first step, because I have not tested it well on
> really big NUMA systems.

I don't think you need to have large NUMA systems to test it. We have
the "fakenuma" feature for a reason. Essentially, once you have enough
CPU cores that catastrophic lock contention can be generated in a fast
path (which can take as few as 4-5 CPU cores), you can effectively test
NUMA scalability with fakenuma by creating nodes with >= 8 CPUs each
(an illustrative boot setup is sketched below the signature).

This is how I've done testing of NUMA aware algorithms (like
shrinkers!) for the past decade - I haven't had direct access to a big
NUMA machine since 2008, yet it's relatively trivial to test NUMA based
scalability algorithms without them these days.

-Dave.

-- 
Dave Chinner
david@fromorbit.com
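
For reference, fakenuma is enabled with a kernel boot parameter rather
than any code change. A purely illustrative setup (the node and CPU
counts are arbitrary examples, not taken from this thread):

# Illustrative fakenuma setup for NUMA scalability testing on x86-64.
# Add to the kernel command line (e.g. in the bootloader config):
#
#	numa=fake=8
#
# This splits system RAM into 8 emulated NUMA nodes, with the CPUs
# distributed across the fake nodes by the kernel; on a 64-CPU box
# that yields roughly 8 CPUs per fake node. Inspect the resulting
# topology after boot with:
$ numactl --hardware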