Date: Fri, 17 Feb 2023 13:47:48 -0800 (PST)
From: Hugh Dickins <hughd@google.com>
To: Huang Ying <ying.huang@intel.com>
cc: Andrew Morton, Jan Kara, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, Zi Yan, Yang Shi, Baolin Wang,
    Oscar Salvador, Matthew Wilcox, Bharata B Rao, Alistair Popple,
    Xin Hao, Minchan Kim, Mike Kravetz, Hyeonggon Yoo <42.hyeyoo@gmail.com>
Subject: Re: [PATCH -v5 0/9] migrate_pages(): batch TLB flushing
In-Reply-To: <20230213123444.155149-1-ying.huang@intel.com>
Message-ID: <87a6c8c-c5c1-67dc-1e32-eb30831d6e3d@google.com>
References: <20230213123444.155149-1-ying.huang@intel.com>

On Mon, 13 Feb 2023, Huang Ying wrote:

> From: "Huang, Ying" <ying.huang@intel.com>
>
> Now, migrate_pages() migrates folios one by one, as in the fake code
> below:
>
>   for each folio
>     unmap
>     flush TLB
>     copy
>     restore map
>
> If multiple folios are passed to migrate_pages(), there are
> opportunities to batch the TLB flushing and copying.  That is, we can
> change the code to something like the following:
>
>   for each folio
>     unmap
>   for each folio
>     flush TLB
>   for each folio
>     copy
>   for each folio
>     restore map
>
> The total number of TLB flushing IPIs can be reduced considerably.
> And we may use a hardware accelerator such as DSA to accelerate the
> folio copying.
>
> So in this patchset, we refactor the migrate_pages() implementation
> and implement batched TLB flushing.  Based on this, hardware
> accelerated folio copying can be implemented.
>
> If too many folios are passed to migrate_pages(), the naive batched
> implementation may unmap too many folios at the same time.  The
> possibility that a task has to wait for the migrated folios to be
> mapped again then increases, so latency may suffer.  To deal with
> this issue, the maximum number of folios unmapped in a batch is
> restricted to no more than HPAGE_PMD_NR, counted in pages.  That is,
> the impact is at the same level as THP migration.
>
> We use the following test to measure the performance impact of the
> patchset.
>
> On a 2-socket Intel server:
>
> - Run the pmbench memory accessing benchmark.
>
> - Run `migratepages` to migrate the pages of pmbench between node 0
>   and node 1 back and forth.
>
> With the patchset, the number of TLB flushing IPIs is reduced by
> 99.1% during the test, and the number of pages migrated successfully
> per second increases by 291.7%.
>
> Xin Hao helped to test the patchset on an ARM64 server with 128 cores
> and 2 NUMA nodes.  Test results show that the page migration
> performance increases by up to 78%.
>
> This patchset is based on mm-unstable 2023-02-10.

And back in linux-next this week: I tried next-20230217 overnight.

There is a deadlock in this patchset (and in previous versions: sorry
it's taken me so long to report), but I think one that's easily solved.
I've not bisected to precisely which patch (load can take several hours
to hit the deadlock), but it doesn't really matter, and I expect that
you can guess.

My root and home filesystems are ext4 (4kB blocks with 4kB PAGE_SIZE),
and so is the filesystem I'm testing, ext4 on /dev/loop0 on tmpfs.
So, plenty of ext4 page cache and buffer_heads.

Again and again, the deadlock is seen with buffer_migrate_folio_norefs(),
either in kcompactd0 or in khugepaged trying to compact, or in both:
it ends up calling __lock_buffer(), and that schedules away, waiting
forever to get BH_lock.
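For reference, the shape of the locking helper involved is roughly the
following; this is a condensed sketch paraphrased from mm/migrate.c and
fs/buffer.c, not the verbatim source.  Only MIGRATE_ASYNC takes the
trylock path; every other mode sleeps in lock_buffer() while the folio
locks already taken by the batch remain held.

/* Condensed sketch, paraphrased from mm/migrate.c -- not verbatim source */
static bool buffer_migrate_lock_buffers(struct buffer_head *head,
					enum migrate_mode mode)
{
	struct buffer_head *bh = head;

	if (mode != MIGRATE_ASYNC) {
		/* Sync modes: sleep until every buffer lock is acquired */
		do {
			lock_buffer(bh);  /* may enter __lock_buffer() and schedule */
			bh = bh->b_this_page;
		} while (bh != head);
		return true;
	}

	/* Async mode: trylock only; on contention, undo and report failure */
	do {
		if (!trylock_buffer(bh)) {
			struct buffer_head *failed = bh;

			for (bh = head; bh != failed; bh = bh->b_this_page)
				unlock_buffer(bh);
			return false;	/* caller then returns -EAGAIN */
		}
		bh = bh->b_this_page;
	} while (bh != head);
	return true;
}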
I have not identified who is holding BH_lock, but I imagine a jbd2
journalling thread, and presume that it wants one of the folio locks
which migrate_pages_batch() is already holding; or maybe it's all more
convoluted than that.  Other tasks then back up waiting on those folio
locks held in the batch.

Never a problem with buffer_migrate_folio(), always with the "more
careful" buffer_migrate_folio_norefs().  And the patch below fixes it
for me: I've had enough hours with it now, on enough occasions, to be
confident of that.

Cc'ing Jan Kara, who knows buffer_migrate_folio_norefs() and jbd2 very
well, and I hope can assure us that there is an understandable deadlock
here, from holding several random folio locks, then trying to lock
buffers.  Cc'ing fsdevel, because there's a risk that mm folk think
something is safe, when it's not sufficient to cope with the diversity
of filesystems.  I hope nothing more than the below is needed (and I've
had no other problems with the patchset: good job), but cannot be sure.

[PATCH next] migrate_pages: fix deadlock on buffer heads

When __buffer_migrate_folio() is called from buffer_migrate_folio_norefs(),
force MIGRATE_ASYNC mode so that buffer_migrate_lock_buffers() will only
trylock_buffer(), failing with -EAGAIN as usual if that does not succeed.

Signed-off-by: Hugh Dickins <hughd@google.com>

--- next-20230217/mm/migrate.c
+++ fixed/mm/migrate.c
@@ -748,7 +748,8 @@ static int __buffer_migrate_folio(struct
 	if (folio_ref_count(src) != expected_count)
 		return -EAGAIN;
 
-	if (!buffer_migrate_lock_buffers(head, mode))
+	if (!buffer_migrate_lock_buffers(head,
+				check_refs ? MIGRATE_ASYNC : mode))
 		return -EAGAIN;
 
 	if (check_refs) {
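Put another way, the suspected cycle which the patch breaks would look
like the sketch below.  This is illustrative only, based on the guess
above about a journalling thread; it is not taken from any trace, and
the right-hand column is unconfirmed.

/*
 * Suspected lock ordering cycle (illustrative, unconfirmed):
 *
 *   migrate_pages_batch()                    suspected jbd2/writeback thread
 *   ---------------------                    -------------------------------
 *   folio_lock(F1) ... folio_lock(Fn)        lock_buffer(bh)    <- holds BH_lock
 *   __buffer_migrate_folio(Fk), norefs:      waits on folio_lock(Fk)
 *     __lock_buffer(bh)   <- sleeps forever
 *
 * With buffer_migrate_lock_buffers() forced to MIGRATE_ASYNC behaviour in
 * the norefs case, the left-hand side only trylocks, fails with -EAGAIN and
 * backs out, instead of sleeping on BH_lock with a batch of folio locks held.
 */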