From: Mateusz Guzik
Date: Thu, 24 Apr 2025 19:26:35 +0200
Subject: Re: [RFC PATCH 0/7] Reviving the slab destructor to tackle the percpu allocator scalability problem
To: "Christoph Lameter (Ampere)"
Cc: Harry Yoo, Vlastimil Babka, David Rientjes, Andrew Morton, Dennis Zhou, Tejun Heo, Jamal Hadi Salim, Cong Wang, Jiri Pirko, Vlad Buslov, Yevgeny Kliteynik, Jan Kara, Byungchul Park, linux-mm@kvack.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
References: <20250424080755.272925-1-harry.yoo@oracle.com> <80208a6c-ec42-6260-5f6f-b3c5c2788fcd@gentwo.org>

On Thu, Apr 24, 2025 at 6:39 PM Christoph Lameter (Ampere) wrote:
>
> On Thu, 24 Apr 2025, Mateusz Guzik wrote:
>
> > > You could allocate larger percpu areas for a batch of them and
> > > then assign as needed.
> >
> > I was considering a mechanism like that earlier, but the changes
> > needed to make it happen would result in a worse state for the
> > alloc/free path.
> >
> > RSS counters are embedded into mm with only the per-cpu areas being a
> > pointer. The machinery maintains a global list of all of their
> > instances, i.e. the pointers internal to mm_struct. That is to say,
> > even if you deserialized allocation of percpu memory itself, you would
> > still globally serialize on adding/removing the counters to the global
> > list.
> >
> > But suppose this got reworked somehow and this bit ceases to be a problem.
> >
> > Another spot where mm alloc/free globally serializes (at least on
> > x86_64) is pgd_alloc/free on the global pgd_lock.
> >
> > Suppose you managed to decompose the lock into a finer granularity, to
> > the point where it does not pose a problem from a contention standpoint.
> > Even then that's work which does not have to happen there.
> >
> > The general theme is that there is a lot of expensive work happening
> > when dealing with mm lifecycle (*both* from single- and multi-threaded
> > standpoint) and preferably it would only be dealt with once per
> > object's existence.
>
> Maybe change the lifecycle? Allocate a batch nr of entries initially from
> the slab allocator and use them for multiple mm_structs as the need
> arises.
>
> Do not free them to the slab allocator until too many of them are
> sitting around idle?
>
> You may also want to avoid counter updates with this scheme if you only
> count the batches used. It will become a bit fuzzy but you improve
> scalability.
>

If I get this right, this proposal boils down to caching all the state,
but hiding the objects from reclaim?

If going this route, perhaps it would be simpler to prevent direct
reclaim on mm objs and instead, if there is a memory shortage, let a
different thread take care of them?

-- 
Mateusz Guzik
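
For readers following the thread, here is a minimal userspace sketch of
the caching idea discussed above. It is not kernel code and not the
patchset's implementation: struct obj, struct obj_cache, cache_alloc(),
cache_free() and cache_shrink() are hypothetical names made up for
illustration. The expensive setup (standing in for percpu counter
allocation and list registration) runs once per object's existence, and
teardown only happens in a separate shrink step, which a different
thread could call on memory shortage.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NR_COUNTERS 4

/* Hypothetical stand-in for an object with expensive setup; not mm_struct. */
struct obj {
	struct obj *next;	/* free-list link while the object is cached */
	long *counters;		/* stands in for expensive per-object state */
};

struct obj_cache {
	struct obj *free_list;	/* constructed but currently unused objects */
	size_t nr_cached;
	size_t ctor_calls;	/* times the expensive setup ran */
	size_t dtor_calls;	/* times the expensive teardown ran */
};

/* Hand out a cached object if there is one, construct a new one otherwise. */
static struct obj *cache_alloc(struct obj_cache *c)
{
	struct obj *o = c->free_list;

	if (o) {
		/* Fast path: the expensive state already exists. */
		c->free_list = o->next;
		c->nr_cached--;
		return o;
	}

	o = malloc(sizeof(*o));
	if (!o)
		return NULL;
	o->counters = calloc(NR_COUNTERS, sizeof(*o->counters));
	if (!o->counters) {
		free(o);
		return NULL;
	}
	o->next = NULL;
	c->ctor_calls++;
	return o;
}

/* Return an object to the cache; only the cheap state is reset. */
static void cache_free(struct obj_cache *c, struct obj *o)
{
	memset(o->counters, 0, NR_COUNTERS * sizeof(*o->counters));
	o->next = c->free_list;
	c->free_list = o;
	c->nr_cached++;
}

/* Tear down cached objects until at most 'keep' remain; returns how many
 * were destroyed. Meant to be called from a separate reclaim context. */
static size_t cache_shrink(struct obj_cache *c, size_t keep)
{
	size_t freed = 0;

	while (c->nr_cached > keep) {
		struct obj *o = c->free_list;

		assert(o != NULL);
		c->free_list = o->next;
		c->nr_cached--;
		free(o->counters);
		free(o);
		c->dtor_calls++;
		freed++;
	}
	return freed;
}
```

In this sketch an alloc/free/alloc sequence constructs the object only
once; the teardown cost is paid only when cache_shrink() runs. Locking
is left out entirely, so it says nothing about how the real
serialization points would be handled.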