Date: Thu, 26 Jan 2023 10:32:31 +0100
From: Jan Kara
To: "Bhatnagar, Rishabh"
Cc: Jan Kara, tytso@mit.edu, akpm@linux-foundation.org, linux-mm@kvack.org,
	linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
	abuehaze@amazon.com
Subject: Re: EXT4 IOPS degradation between 4.14 and 5.10
Message-ID: <20230126093231.ujn6yaxhexwzizp5@quack3>
In-Reply-To: <053b60a6-133e-5d59-0732-464d5160772a@amazon.com>
References: <053b60a6-133e-5d59-0732-464d5160772a@amazon.com>

Hello!
On Wed 25-01-23 16:33:54, Bhatnagar, Rishabh wrote:
> As discussed in the previous thread I'm chasing IOPS regression between
> 4.14 -> 5.10 kernels.
> https://lore.kernel.org/lkml/20230112113820.hjwvieq3ucbwreql@quack3/T/
> 
> Last issue we discussed was difficult to resolve so keeping it on the
> back burner for now.
> 
> I did some more bisecting and saw another series of patches that
> potentially impacts the IOPS score:
> 72b045aecdd856b083521f2a963705b4c2e59680 (mm: implement
> find_get_pages_range_tag())
> 
> Running fio tests on tip as 9c19a9cb1642c074aa8bc7693cd4c038643960ae
> (including the 16 patch series) vs tip as
> 6b4c54e3787bc03e810062bd257a3b05fd9c72d6 (without the above series)
> shows an IOPS jump.

Ok, thanks for pinpointing this. That series landed a long time ago ;).

> Fio with buffered io/fsync=1/randwrite

So I'm curious. Do you have any workload that actually does these
synchronous random buffered IOs? Or is it just a benchmarking exercise?

> With HEAD as 9c19a9cb1642c074aa8bc7693cd4c038643960ae (with the above
> series)
> 
>  write: io=445360KB, bw=7418.6KB/s, *iops=463*, runt= 60033msec
>     clat (usec): min=4, max=32132, avg=311.90, stdev=1812.74
>      lat (usec): min=5, max=32132, avg=312.28, stdev=1812.74
>     clat percentiles (usec):
>      |  1.00th=[    8],  5.00th=[   10], 10.00th=[   16], 20.00th=[   25],
>      | 30.00th=[   36], 40.00th=[   47], 50.00th=[   60], 60.00th=[   71],
>      | 70.00th=[   84], 80.00th=[   97], 90.00th=[  111], 95.00th=[  118],
>      | 99.00th=[11840], 99.50th=[15936], 99.90th=[21888], 99.95th=[23936],
> 
> With HEAD as 6b4c54e3787bc03e810062bd257a3b05fd9c72d6 (without the above
> series)
> 
>  write: io=455184KB, bw=7583.4KB/s, *iops=473*, runt= 60024msec
>     clat (usec): min=6, max=24325, avg=319.72, stdev=1694.52
>      lat (usec): min=6, max=24326, avg=319.99, stdev=1694.53
>     clat percentiles (usec):
>      |  1.00th=[    9],  5.00th=[   11], 10.00th=[   17], 20.00th=[   26],
>      | 30.00th=[   38], 40.00th=[   50], 50.00th=[   60], 60.00th=[   73],
>      | 70.00th=[   85], 80.00th=[   98], 90.00th=[  111], 95.00th=[  118],
>      | 99.00th=[ 9792], 99.50th=[14016], 99.90th=[21888], 99.95th=[22400],
>      | 99.99th=[24192]

OK, about 2% regression. How stable is that across multiple runs?

> I also see that the number of handles per transaction was much higher
> before this patch series
> 
> 0ms waiting for transaction
> 0ms request delay
> 20ms running transaction
> 0ms transaction was being locked
> 0ms flushing data (in ordered mode)
> 10ms logging transaction
> *13524us average transaction commit time*
> *73 handles per transaction*
> 0 blocks per transaction
> 1 logged blocks per transaction
> 
> vs after the patch series.
> 
> 0ms waiting for transaction
> 0ms request delay
> 20ms running transaction
> 0ms transaction was being locked
> 0ms flushing data (in ordered mode)
> 20ms logging transaction
> *21468us average transaction commit time*
> *66 handles per transaction*
> 1 blocks per transaction
> 1 logged blocks per transaction
> 
> This is probably again helping in bunching the writeback transactions and
> increasing throughput.

Yeah, probably.

> I looked at the code to understand what might be going on.
> It seems like commit 72b045aecdd856b083521f2a963705b4c2e59680 changes the
> behavior of find_get_pages_range_tag.
> Before this commit, if find_get_pages_tag cannot find nr_pages
> (PAGEVEC_SIZE) it returns the number of pages found as ret and sets
> *index to the last page it found + 1. After the commit the behavior
> changes such that if we don't find nr_pages pages we set the index to
> end and not to the last found page. (diff from the above commit added
> below)
> Since pagevec_lookup_range_tag is always called in a while loop
> (index <= end), the code before the commit helps in coalescing writeback
> of pages if there are multiple threads doing writes, as it might keep
> finding new dirty (tagged) pages since it doesn't set index to end.
> 
> +	/*
> +	 * We come here when we got at @end. We take care to not overflow the
> +	 * index @index as it confuses some of the callers. This breaks the
> +	 * iteration when there is page at index -1 but that is already broken
> +	 * anyway.
> +	 */
> +	if (end == (pgoff_t)-1)
> +		*index = (pgoff_t)-1;
> +	else
> +		*index = end + 1;
> +out:
> 	rcu_read_unlock();
> 
> -	if (ret)
> -		*index = pages[ret - 1]->index + 1;
> -
> 
> From the description of the patch I didn't see any mention of this
> functional change.
> Was this change intentional, and did it help some use case or general
> performance?

So the change was intentional. When I was working on the series, I was
somewhat concerned that the old code could end up in a pathological
situation like:

We scan range 0-1000000, find the only dirty page at index 0, return it.
We scan range 1-1000000, find the only dirty page at index 1, return it.
...

This way we end up with rather inefficient scanning and in theory a
malicious user could livelock writeback like this. That being said, this
was/is mostly a theoretical concern.
								Honza
-- 
Jan Kara
SUSE Labs, CR
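For reference, a fio job of the shape implied by the "Fio with buffered
io/fsync=1/randwrite" line quoted above might look like the following. The
block size, file size, run time, number of jobs and target directory were
not given in the thread, so the values here are assumptions:

[global]
# assumed mount point of the ext4 filesystem under test
directory=/mnt/ext4
# buffered (page cache) I/O, no O_DIRECT
ioengine=psync
direct=0
# random writes with an fsync after every write
rw=randwrite
fsync=1
# assumed block size and per-job file size
bs=4k
size=1g
# the quoted results show ~60s runs
runtime=60
time_based
# assumed; the thread mentions multiple writer threads
numjobs=4

[randwrite-fsync]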