From: Mateusz Guzik
Date: Mon, 12 Aug 2024 06:29:38 +0200
Subject: Re: [RFC PATCH] vm: align vma allocation and move the lock back into the struct
To: Suren Baghdasaryan
Cc: Vlastimil Babka, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Liam.Howlett@oracle.com, pedro.falcato@gmail.com, Lorenzo Stoakes
References: <20240808185949.1094891-1-mjguzik@gmail.com>

On Mon, Aug 12, 2024 at 12:50 AM Suren Baghdasaryan wrote:
> Ok, disabling adjacent cacheline prefetching seems to do the trick (or
> at least cuts down the regression drastically):
>
> Hmean     faults/cpu-1    470577.6434 (   0.00%)   470745.2649 *   0.04%*
> Hmean     faults/cpu-4    445862.9701 (   0.00%)   445572.2252 *  -0.07%*
> Hmean     faults/cpu-7    422516.4002 (   0.00%)   422677.5591 *   0.04%*
> Hmean     faults/cpu-12   344483.7047 (   0.00%)   330476.7911 *  -4.07%*
> Hmean     faults/cpu-21   192836.0188 (   0.00%)   195266.8071 *   1.26%*
> Hmean     faults/cpu-30   140745.9472 (   0.00%)   140655.0459 *  -0.06%*
> Hmean     faults/cpu-48   110507.4310 (   0.00%)   103802.1839 *  -6.07%*
> Hmean     faults/cpu-56    93507.7919 (   0.00%)    95105.1875 *   1.71%*
> Hmean     faults/sec-1    470232.3887 (   0.00%)   470404.6525 *   0.04%*
> Hmean     faults/sec-4   1757368.9266 (   0.00%)  1752852.8697 *  -0.26%*
> Hmean     faults/sec-7   2909554.8150 (   0.00%)  2915885.8739 *   0.22%*
> Hmean     faults/sec-12  4033840.8719 (   0.00%)  3845165.3277 *  -4.68%*
> Hmean     faults/sec-21  3845857.7079 (   0.00%)  3890316.8799 *   1.16%*
> Hmean     faults/sec-30  3838607.4530 (   0.00%)  3838861.8142 *   0.01%*
> Hmean     faults/sec-48  4882118.9701 (   0.00%)  4608985.0530 *  -5.59%*
> Hmean     faults/sec-56  4933535.7567 (   0.00%)  5004208.3329 *   1.43%*
>
> Now, how do we disable prefetching extra cachelines for vm_area_structs only?

I'm unaware of any mechanism of the sort.

The good news is that Broadwell is an old yeller and, if memory serves
right, the impact is not anywhere near this bad on newer
microarchitectures, making "merely" 64-byte alignment (used all over in
the kernel for amd64) a practical choice (not just for vma).

Also note that in your setup you are losing out on performance in other
multithreaded cases, unrelated to anything vma.

That aside, as I mentioned earlier, the dedicated vma lock cache results
in false sharing between separate vmas, except this particular benchmark
does not test for it (which in your setup should be visible even if the
cache grows the SLAB_HWCACHE_ALIGN flag).

I think the thing to do here is to bench on other cpus and ignore the
Broadwell + adjacent cache line prefetcher result if they come back fine
-- the code should not be held hostage by an old yeller.

To that end I think it would be best to ask the LKP folks at Intel.
They are very approachable so there should be no problem arranging it,
provided they have some spare capacity. I believe grabbing the From
person and the cc list from this thread will do it:
https://lore.kernel.org/oe-lkp/ZriCbCPF6I0JnbKi@xsang-OptiPlex-9020/ .
By default they would run their own suite, which presumably has some
overlap with this particular benchmark in terms of generated workload
(but I don't think they run *this* particular benchmark itself, perhaps
it would make sense to ask them to add it?).

It's your call here.

If there are still problems and the lock needs to remain separate, the
bare minimum damage-controlling measure would be to hwalign the vma lock
cache -- it won't affect the pts benchmark, but it should help others.

Should the decision be to bring the lock back into the struct, I'll note
my patch is merely slapped together to a state where it can be
benchmarked and I have no interest in beating it into a committable
shape. You stated you already had an equivalent (modulo keeping
something in a space previously occupied by the pointer to the vma
lock), so as far as I'm concerned you can submit that with your
authorship.

-- 
Mateusz Guzik