From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Date: Fri, 11 Aug 2023 02:54:17 +0900
Subject: Re: [RFC PATCH v4] mm/slub: Optimize slub memory usage
To: Jay Patel
Cc: linux-mm@kvack.org, cl@linux.com, penberg@kernel.org, rientjes@google.com, iamjoonsoo.kim@lge.com, akpm@linux-foundation.org, vbabka@suse.cz, aneesh.kumar@linux.ibm.com, tsahu@linux.ibm.com, piyushs@linux.ibm.com
In-Reply-To: <20230720102337.2069722-1-jaypatel@linux.ibm.com>

On Thu, Jul 20, 2023 at 7:24 PM Jay Patel wrote:
>
> In the current implementation of the slub memory allocator, the slab
> order selection process follows these criteria:
>
> 1) Determine the minimum order required to serve the minimum number of
> objects (min_objects). This calculation is based on the formula
> (order = min_objects * object_size / PAGE_SIZE).
>
> 2) If the minimum order is greater than the maximum allowed order
> (slub_max_order), use slub_max_order as the order for this slab.
>
> 3) If the minimum order is less than slub_max_order, iterate from the
> minimum order up to slub_max_order and check whether the condition
> (rem <= slab_size / fract_leftover) holds. Here, slab_size is
> (PAGE_SIZE << order), rem is (slab_size % object_size), and
> fract_leftover can take the values 16, 8, or 4. If the condition
> holds, select that order for the slab.
>
> However, in point 3, the allowed leftover (slab_size / fract_leftover)
> spans a wide range of values relative to the remainder (rem): from
> 256 bytes to 1 KB at order 0 with a 4K page size, and from 4 KB to
> 16 KB at order 0 with a 64K page size, growing further at higher
> orders. This can lead to the selection of an order that wastes more
> memory. To mitigate such wastage, we modify point 3 to adjust the
> value of fract_leftover based on the page size, while retaining the
> current value as the default for a 4K page size.
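
(To make the heuristic above concrete, below is a minimal userspace
sketch of the selection logic in points 1-3. It is not the kernel
implementation: PAGE_SIZE, SLUB_MAX_ORDER, and the 704-byte example
object size are illustrative stand-ins, and mm/slub.c's real
order_objects()/calc_slab_order() differ in detail.)

/*
 * Standalone sketch of the order-selection heuristic described above.
 * Not kernel code: constants and helpers are simplified stand-ins.
 */
#include <stdio.h>

#define PAGE_SIZE      4096UL   /* assume 4K pages */
#define SLUB_MAX_ORDER 3

/* Objects that fit in a slab of the given order (point 1). */
static unsigned long order_objects(unsigned int order, unsigned long size)
{
        return (PAGE_SIZE << order) / size;
}

/* Smallest order whose leftover stays within slab_size / fract_leftover. */
static unsigned int calc_slab_order(unsigned long size,
                                    unsigned long min_objects,
                                    unsigned int max_order,
                                    unsigned int fract_leftover)
{
        unsigned int order;

        for (order = 0; order <= max_order; order++) {
                unsigned long slab_size = PAGE_SIZE << order;
                unsigned long rem = slab_size % size;

                if (order_objects(order, size) < min_objects)
                        continue;           /* too few objects (point 1) */
                if (rem <= slab_size / fract_leftover)
                        return order;       /* acceptable waste (point 3) */
        }
        return max_order;                   /* cap at the maximum (point 2) */
}

int main(void)
{
        unsigned long size = 704;           /* hypothetical object size */
        unsigned int fraction;

        /* Progressively loosen the waste limit, as the loop in point 3 does. */
        for (fraction = 16; fraction >= 4; fraction /= 2)
                printf("fraction %2u -> order %u\n", fraction,
                       calc_slab_order(size, 4, SLUB_MAX_ORDER, fraction));
        return 0;
}

(With a 704-byte object on 4K pages this prints order 1 for fractions
16 and 8 but order 0 for fraction 4, illustrating the wide waste
tolerance the paragraph above describes.)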
> Test results are as follows:
>
> 1) On 160 CPUs with 64K page size
>
> +-----------------+----------------+----------------+
> |           Total wastage in slub memory            |
> +-----------------+----------------+----------------+
> |                 | After Boot     | After Hackbench|
> | Normal          | 932 KB         | 1812 KB        |
> | With Patch      | 729 KB         | 1636 KB        |
> | Wastage reduce  | ~22%           | ~10%           |
> +-----------------+----------------+----------------+
>
> +-----------------+----------------+----------------+
> |                 Total slub memory                 |
> +-----------------+----------------+----------------+
> |                 | After Boot     | After Hackbench|
> | Normal          | 1855296        | 2944576        |
> | With Patch      | 1544576        | 2692032        |
> | Memory reduce   | ~17%           | ~9%            |
> +-----------------+----------------+----------------+
>
> hackbench-process-sockets
> +-------+-----+----------+------------+-----------+
> |       | Grp | Normal   | With Patch | Change    |
> +-------+-----+----------+------------+-----------+
> | Amean | 1   | 1.2727   | 1.2450     | ( 2.22%)  |
> | Amean | 4   | 1.6063   | 1.5810     | ( 1.60%)  |
> | Amean | 7   | 2.4190   | 2.3983     | ( 0.86%)  |
> | Amean | 12  | 3.9730   | 3.9347     | ( 0.97%)  |
> | Amean | 21  | 6.9823   | 6.8957     | ( 1.26%)  |
> | Amean | 30  | 10.1867  | 10.0600    | ( 1.26%)  |
> | Amean | 48  | 16.7490  | 16.4853    | ( 1.60%)  |
> | Amean | 79  | 28.1870  | 27.8673    | ( 1.15%)  |
> | Amean | 110 | 39.8363  | 39.3793    | ( 1.16%)  |
> | Amean | 141 | 51.5277  | 51.4907    | ( 0.07%)  |
> | Amean | 172 | 62.9700  | 62.7300    | ( 0.38%)  |
> | Amean | 203 | 74.5037  | 74.0630    | ( 0.59%)  |
> | Amean | 234 | 85.6560  | 85.3587    | ( 0.35%)  |
> | Amean | 265 | 96.9883  | 96.3770    | ( 0.63%)  |
> | Amean | 296 | 108.6893 | 108.0870   | ( 0.56%)  |
> +-------+-----+----------+------------+-----------+
>
> 2) On 16 CPUs with 64K page size
>
> +-----------------+----------------+----------------+
> |           Total wastage in slub memory            |
> +-----------------+----------------+----------------+
> |                 | After Boot     | After Hackbench|
> | Normal          | 273 KB         | 544 KB         |
> | With Patch      | 260 KB         | 500 KB         |
> | Wastage reduce  | ~5%            | ~9%            |
> +-----------------+----------------+----------------+
>
> +-----------------+----------------+----------------+
> |                 Total slub memory                 |
> +-----------------+----------------+----------------+
> |                 | After Boot     | After Hackbench|
> | Normal          | 275840         | 412480         |
> | With Patch      | 272768         | 406208         |
> | Memory reduce   | ~1%            | ~2%            |
> +-----------------+----------------+----------------+
>
> hackbench-process-sockets
> +-------+-----+----------+------------+-----------+
> |       | Grp | Normal   | With Patch | Change    |
> +-------+-----+----------+------------+-----------+
> | Amean | 1   | 0.9513   | 0.9250     | ( 2.77%)  |
> | Amean | 4   | 2.9630   | 2.9570     | ( 0.20%)  |
> | Amean | 7   | 5.1780   | 5.1763     | ( 0.03%)  |
> | Amean | 12  | 8.8833   | 8.8817     | ( 0.02%)  |
> | Amean | 21  | 15.7577  | 15.6883    | ( 0.44%)  |
> | Amean | 30  | 22.2063  | 22.2843    | ( -0.35%) |
> | Amean | 48  | 36.0587  | 36.1390    | ( -0.22%) |
> | Amean | 64  | 49.7803  | 49.3457    | ( 0.87%)  |
> +-------+-----+----------+------------+-----------+
>
> Signed-off-by: Jay Patel
> ---
> Changes from V3
> 1) Resolved errors and optimized the logic for all architectures.
>
> Changes from V2
> 1) Removed all page order selection logic for slab caches based on
> wastage.
> 2) Increased the fraction size based on page size (keeping the current
> value as the default for a 4K page size).
>
> Changes from V1
> 1) If min_objects * object_size > PAGE_ALLOC_COSTLY_ORDER, return
> PAGE_ALLOC_COSTLY_ORDER.
> 2) Similarly, if min_objects * object_size < PAGE_SIZE, return
> slub_min_order.
> 3) Additionally, changed slub_max_order to 2. There is no specific
> reason for using the value 2, but it provided the best results in
> terms of performance without any noticeable impact.
>
>  mm/slub.c | 17 +++++++----------
>  1 file changed, 7 insertions(+), 10 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index c87628cd8a9a..8f6f38083b94 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -287,6 +287,7 @@ static inline bool kmem_cache_has_cpu_partial(struct kmem_cache *s)
>  #define OO_SHIFT                16
>  #define OO_MASK                 ((1 << OO_SHIFT) - 1)
>  #define MAX_OBJS_PER_PAGE       32767 /* since slab.objects is u15 */
> +#define SLUB_PAGE_FRAC_SHIFT    12
>
>  /* Internal SLUB flags */
>  /* Poison object */
> @@ -4117,6 +4118,7 @@ static inline int calculate_order(unsigned int size)
>         unsigned int min_objects;
>         unsigned int max_objects;
>         unsigned int nr_cpus;
> +       unsigned int page_size_frac;
>
>         /*
>          * Attempt to find best configuration for a slab. This
> @@ -4145,10 +4147,13 @@ static inline int calculate_order(unsigned int size)
>         max_objects = order_objects(slub_max_order, size);
>         min_objects = min(min_objects, max_objects);
>
> -       while (min_objects > 1) {
> +       page_size_frac = ((PAGE_SIZE >> SLUB_PAGE_FRAC_SHIFT) == 1) ? 0
> +                       : PAGE_SIZE >> SLUB_PAGE_FRAC_SHIFT;
> +
> +       while (min_objects >= 1) {
>                 unsigned int fraction;
>
> -               fraction = 16;
> +               fraction = 16 + page_size_frac;
>                 while (fraction >= 4) {

Sorry, I'm a bit late to the review. IIRC hexagon/powerpc can have
ridiculously large page sizes (1 MB or 256 KB), but I don't know if
such configs are actually used, tbh. So I think there should be an
upper bound.
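
(To put rough numbers on that concern, here is a small standalone
sketch, not kernel code, that evaluates the patch's page_size_frac
expression for a few page sizes; the 256 KB and 1 MB entries are
illustrative examples corresponding to the configs mentioned above.)

/*
 * Standalone illustration of how the patch scales the starting
 * fraction with page size. Not kernel code; the page sizes below
 * are just examples.
 */
#include <stdio.h>

#define SLUB_PAGE_FRAC_SHIFT 12

int main(void)
{
        /* 4K, 16K, 64K, 256K and 1M pages. */
        unsigned long page_sizes[] = { 4096, 16384, 65536, 262144, 1048576 };
        unsigned int i;

        for (i = 0; i < sizeof(page_sizes) / sizeof(page_sizes[0]); i++) {
                unsigned long ps = page_sizes[i];
                unsigned long frac = ((ps >> SLUB_PAGE_FRAC_SHIFT) == 1) ? 0
                                : ps >> SLUB_PAGE_FRAC_SHIFT;

                printf("PAGE_SIZE %7lu: page_size_frac = %3lu, initial fraction = %3lu\n",
                       ps, frac, 16 + frac);
        }
        return 0;
}

(A 256 KB page would start the inner loop at fraction 16 + 64 = 80,
and a 1 MB page at 16 + 256 = 272, i.e. an initial waste tolerance of
slab_size/272, which is why a cap on page_size_frac seems reasonable.)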
>                         order = calc_slab_order(size, min_objects,
>                                                 slub_max_order, fraction);
> @@ -4159,14 +4164,6 @@ static inline int calculate_order(unsigned int size)
>                 min_objects--;
>         }
>
> -       /*
> -        * We were unable to place multiple objects in a slab. Now
> -        * lets see if we can place a single object there.
> -        */
> -       order = calc_slab_order(size, 1, slub_max_order, 1);
> -       if (order <= slub_max_order)
> -               return order;

I'm not sure it's okay to remove this. It was fine in v2, because
there the least wasteful order was chosen regardless of fraction, but
that's not true anymore.

Otherwise, everything looks fine to me. I'm too dumb to anticipate the
outcome of increasing the slab order :P but this patch does not sound
crazy to me.

Thanks!

--
Hyeonggon