From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Date: Mon, 7 Aug 2023 17:39:35 +0900
Subject: Re: [RFC 2/2] mm/slub: prefer NUMA locality over slight memory saving on NUMA machines
To: Vlastimil Babka
Cc: Christoph Lameter, Pekka Enberg, Joonsoo Kim, David Rientjes, Andrew Morton, Roman Gushchin, Feng Tang, "Sang, Oliver", Jay Patel, Binder Makin, aneesh.kumar@linux.ibm.com, tsahu@linux.ibm.com, piyushs@linux.ibm.com, fengwei.yin@intel.com, ying.huang@intel.com, lkp, oe-lkp@lists.linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org
In-Reply-To: <1f88aff2-8027-1020-71b2-6a6528f82207@suse.cz>
References: <20230723190906.4082646-1-42.hyeyoo@gmail.com> <20230723190906.4082646-3-42.hyeyoo@gmail.com> <1f88aff2-8027-1020-71b2-6a6528f82207@suse.cz>
On Thu, Aug 3, 2023 at 11:54 PM Vlastimil Babka wrote:
>
> On 7/23/23 21:09, Hyeonggon Yoo wrote:
> > By default, SLUB sets remote_node_defrag_ratio to 1000, which makes it
> > (in most cases) take slabs from remote nodes first before trying to allocate
> > new folios on the local node from buddy.
> >
> > Documentation/ABI/testing/sysfs-kernel-slab says:
> >> The file remote_node_defrag_ratio specifies the percentage of
> >> times SLUB will attempt to refill the cpu slab with a partial
> >> slab from a remote node as opposed to allocating a new slab on
> >> the local node. This reduces the amount of wasted memory over
> >> the entire system but can be expensive.
> >
> > Although this made sense when it was introduced, the portion of
> > per node partial lists in the overall SLUB memory usage has decreased
> > since the introduction of per cpu partial lists. Therefore, it's worth
> > reevaluating its overhead on performance and memory usage.
> >
> > [
> >    XXX: Add performance data. I tried to measure its impact on
> >         hackbench with a 2 socket NUMA machine, but it seems hackbench is
> >         too synthetic to benefit from this, because the skbuff_head_cache's
> >         size fits into the last level cache.
> >
> >         Probably more realistic workloads like netperf would benefit
> >         from this?
> > ]
> >
> > Set remote_node_defrag_ratio to zero by default, and the new behavior is:
> >       1) try refilling the per-CPU partial list from the local node
> >       2) try allocating new slabs from the local node without reclamation
> >       3) try refilling the per-CPU partial list from remote nodes
> >       4) try allocating new slabs from the local node or remote nodes
> >
> > If the user specified remote_node_defrag_ratio, it probabilistically tries
> > 3) first and then tries 2) and 4) in order, to avoid an unexpected behavioral
> > change from the user's perspective.
>
> It makes sense to me, but as you note it would be great to demonstrate
> the benefits, because it adds complexity, especially in the already complex
> ___slab_alloc(). Networking has indeed historically been a workload very
> sensitive to slab performance, so it seems a good candidate.

Thank you for looking at it!
Yeah, it was a PoC for something I thought might be useful,
and I will definitely try to measure it.

> We could also postpone this until we have tried the percpu arrays
> improvements discussed at LSF/MM.

Possibly, but can you please share your plans/opinions on that?

I think one possible approach, if we want to minimize changes, is simply
to allow the per-CPU freelist to mix objects from different slabs;
another is to introduce a per-CPU array similar to what SLAB uses now.

And one thing I'm having difficulty understanding is: what is the rationale
behind (and the impact of) managing objects on a per-slab basis, other than
avoiding the array queues used back in 2007?
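
For illustration, here is a rough C sketch of the 1)-4) ordering described
in the RFC above. This is not the actual patch: refill_partial_from() and
alloc_new_slab() are made-up stand-ins for the real helpers in
___slab_alloc(), and using __GFP_THISNODE plus clearing
__GFP_DIRECT_RECLAIM is just one way of expressing "without reclamation".

	/*
	 * Illustration only -- not the actual patch. refill_partial_from()
	 * and alloc_new_slab() are made-up stand-ins for the real SLUB
	 * slow-path helpers.
	 */
	static void *slab_alloc_order_sketch(struct kmem_cache *s,
					     gfp_t gfpflags, int node)
	{
		void *object;

		/* 1) refill the per-CPU partial list from the local node */
		object = refill_partial_from(s, node);
		if (object)
			return object;

		/*
		 * 2) allocate a new slab from the local node without
		 *    triggering reclaim (flag choice is an assumption)
		 */
		object = alloc_new_slab(s, (gfpflags | __GFP_THISNODE) &
					   ~__GFP_DIRECT_RECLAIM, node);
		if (object)
			return object;

		/* 3) refill the per-CPU partial list from remote nodes */
		object = refill_partial_from(s, NUMA_NO_NODE);
		if (object)
			return object;

		/* 4) allocate a new slab from any node, reclaim permitted */
		return alloc_new_slab(s, gfpflags, NUMA_NO_NODE);
	}

With a user-specified remote_node_defrag_ratio, step 3) would be attempted
probabilistically before 2), as described in the commit message above.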