From: Suren Baghdasaryan <surenb@google.com>
Date: Mon, 12 Aug 2024 08:27:27 -0700
Subject: Re: [RFC PATCH] vm: align vma allocation and move the lock back into the struct
To: Mateusz Guzik, Mel Gorman
Cc: Vlastimil Babka, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Liam.Howlett@oracle.com, pedro.falcato@gmail.com, Lorenzo Stoakes, Mel Gorman
References: <20240808185949.1094891-1-mjguzik@gmail.com>
On Sun, Aug 11, 2024 at 9:29 PM Mateusz Guzik wrote:
>
> On Mon, Aug 12, 2024 at 12:50 AM Suren Baghdasaryan wrote:
> > Ok, disabling adjacent cacheline prefetching seems to do the trick (or
> > at least cuts down the regression drastically):
> >
> > Hmean     faults/cpu-1    470577.6434 (   0.00%)    470745.2649 *   0.04%*
> > Hmean     faults/cpu-4    445862.9701 (   0.00%)    445572.2252 *  -0.07%*
> > Hmean     faults/cpu-7    422516.4002 (   0.00%)    422677.5591 *   0.04%*
> > Hmean     faults/cpu-12   344483.7047 (   0.00%)    330476.7911 *  -4.07%*
> > Hmean     faults/cpu-21   192836.0188 (   0.00%)    195266.8071 *   1.26%*
> > Hmean     faults/cpu-30   140745.9472 (   0.00%)    140655.0459 *  -0.06%*
> > Hmean     faults/cpu-48   110507.4310 (   0.00%)    103802.1839 *  -6.07%*
> > Hmean     faults/cpu-56    93507.7919 (   0.00%)     95105.1875 *   1.71%*
> > Hmean     faults/sec-1    470232.3887 (   0.00%)    470404.6525 *   0.04%*
> > Hmean     faults/sec-4   1757368.9266 (   0.00%)   1752852.8697 *  -0.26%*
> > Hmean     faults/sec-7   2909554.8150 (   0.00%)   2915885.8739 *   0.22%*
> > Hmean     faults/sec-12  4033840.8719 (   0.00%)   3845165.3277 *  -4.68%*
> > Hmean     faults/sec-21  3845857.7079 (   0.00%)   3890316.8799 *   1.16%*
> > Hmean     faults/sec-30  3838607.4530 (   0.00%)   3838861.8142 *   0.01%*
> > Hmean     faults/sec-48  4882118.9701 (   0.00%)   4608985.0530 *  -5.59%*
> > Hmean     faults/sec-56  4933535.7567 (   0.00%)   5004208.3329 *   1.43%*
> >
> > Now, how do we disable prefetching extra cachelines for vm_area_structs only?
>
> I'm unaware of any mechanism of the sort.
>
> The good news is that Broadwell is an old yeller and if memory serves
> right the impact is not anywhere near this bad on newer
> microarchitectures, making "merely" 64 alignment (used all over in the
> kernel for amd64) a practical choice (not just for vma).

That's indeed good news if other archs are not that sensitive to this.

> Also note that in your setup you are losing out on performance in
> other multithreaded cases, unrelated to anything vma.
>
> That aside as I mentioned earlier the dedicated vma lock cache results
> in false sharing between separate vmas, except this particular
> benchmark does not test for it (which in your setup should be visible
> even if the cache grows the SLAB_HWCACHE_ALIGN flag).

When implementing VMA locks I did experiment with SLAB_HWCACHE_ALIGN
for the vm_lock cache using different benchmarks and didn't see
improvements above noise level.
Do you know of some specific benchmark that would possibly show improvement?

> I think the thing to do here is to bench on other cpus and ignore the
> Broadwell + adjacent cache line prefetcher result if they come back
> fine -- the code should not be held hostage by an old yeller.

That sounds like a good idea. Mel Gorman first reported this regression
when I was developing VMA locks and I believe he has a farm of different
machines to run mmtests on. CC'ing Mel.
Mel, would you be able to run PFT tests with the patch at
https://lore.kernel.org/all/20240808185949.1094891-1-mjguzik@gmail.com/
vs baseline on your farm? The goal is to see if any architecture other
than Broadwell shows a performance regression.

> To that end I think it would be best to ask the LKP folks at Intel.
> They are very approachable so there should be no problem arranging it
> provided they have some spare capacity. I believe grabbing the From
> person and the cc list from this thread will do it:
> https://lore.kernel.org/oe-lkp/ZriCbCPF6I0JnbKi@xsang-OptiPlex-9020/ .
> By default they would run their own suite, which presumably has some
> overlap with this particular benchmark in terms of generated workload
> (but I don't think they run *this* particular benchmark itself,
> perhaps it would make sense to ask them to add it?). It's your call
> here.

Thanks for the suggestion. Let's see if Mel can use his farm first and
then we'll ask the Intel folks.

> If there are still problems and the lock needs to remain separate, the
> bare minimum damage-controlling measure would be to hwalign the vma
> lock cache -- it won't affect the pts benchmark, but it should help
> others.

Sure, but I'll need to measure the improvement, and for that I need a
benchmark or a workload. Any suggestions?

> Should the decision be to bring the lock back into the struct, I'll
> note my patch is merely slapped together to a state where it can be
> benchmarked and I have no interest in beating it into a committable
> shape.
> You stated you already had an equivalent (modulo keeping
> something in a space previously occupied by the pointer to the vma
> lock), so as far as I'm concerned you can submit that with your
> authorship.

Thanks! If we end up doing that I'll keep you as Suggested-by and will
add a link to this thread.

Thanks,
Suren.

> --
> Mateusz Guzik
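P.S. For concreteness, the "hwalign the vma lock cache" fallback discussed
above would amount to something like the following against kernel/fork.c.
This is an untested sketch; the existing flags are assumed from the current
KMEM_CACHE() call that creates the vm_lock cache:

```c
/* Sketch only: align each vm_lock to a hardware cache line so locks
 * from adjacent slab objects never share a line (false sharing). */
vma_lock_cachep = KMEM_CACHE(vma_lock,
			     SLAB_PANIC | SLAB_ACCOUNT | SLAB_HWCACHE_ALIGN);
```

Note that per the discussion above this only guarantees 64-byte separation;
on parts with the adjacent cache line prefetcher enabled, neighboring locks
can still land in the same 128-byte prefetch pair.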