From: Byungchul Park
Date: Sat, 1 Jun 2024 03:04:24 +0900
Subject: Re: [PATCH v11 09/12] mm: implement LUF(Lazy Unmap Flush) defering tlb flush when folios get unmapped
To: Dave Hansen
Cc: Byungchul Park, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    kernel_team@skhynix.com, akpm@linux-foundation.org, ying.huang@intel.com,
    vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com,
    willy@infradead.org, david@redhat.com, peterz@infradead.org,
    luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
    dave.hansen@linux.intel.com, rjgolo@gmail.com
References: <20240531092001.30428-1-byungchul@sk.com>
    <20240531092001.30428-10-byungchul@sk.com>

Dave Hansen wrote:
>
> On 5/31/24 02:19, Byungchul Park wrote:
> ..
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 0283cf366c2a..03683bf66031 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -2872,6 +2872,12 @@ static inline void file_end_write(struct file *file)
> >       if (!S_ISREG(file_inode(file)->i_mode))
> >               return;
> >       sb_end_write(file_inode(file)->i_sb);
> > +
> > +     /*
> > +      * XXX: If needed, can be optimized by avoiding luf_flush() if
> > +      * the address space of the file has never been involved by luf.
> > +      */
> > +     luf_flush();
> >  }
> ..
> > +void luf_flush(void)
> > +{
> > +     unsigned long flags;
> > +     unsigned short int ugen;
> > +
> > +     /*
> > +      * Obtain the latest ugen number.
> > +      */
> > +     spin_lock_irqsave(&luf_lock, flags);
> > +     ugen = luf_gen;
> > +     spin_unlock_irqrestore(&luf_lock, flags);
> > +
> > +     check_luf_flush(ugen);
> > +}
>
> Am I reading this right? There's now an unconditional global spinlock

Until version 11, splitting the lock into several locks, as RCU does,
looked like *too much*. However, this code introduced in v11 looks
problematic.

> acquired in the sys_write() path? How can this possibly scale?

I should find a better way.

> So, yeah, I think an optimization is absolutely needed. But, on a more
> fundamental level, I just don't believe these patches are being tested.
> Even a simple microbenchmark should show a pretty nasty regression on
> any decently large system:
>
>       https://github.com/antonblanchard/will-it-scale/blob/master/tests/write1.c
>
> Second, I was just pointing out sys_write() as an example of how the
> page cache could change. Couldn't a separate, read/write mmap() of the
> file do the same thing and *not* go through sb_end_write()?
>
> So:
>
>       fd = open("foo");
>       ptr1 = mmap(fd, PROT_READ);
>       ptr2 = mmap(fd, PROT_READ|PROT_WRITE);
>
>       foo = *ptr1; // populate the page cache
>       ... page cache page is reclaimed and LUF'd
>       *ptr2 = bar; // new page cache page is allocated and written to

I think this part would work, but I'm not convinced.
I will check again.

>       printk("*ptr1: %d\n", *ptr1);
>
> Doesn't the printk() see stale data?
>
> I think tglx would call all of this "tinkering". The approach to this
> series is to "fix" narrow, specific cases that reviewers point out, make
> it compile, then send it out again, hoping someone will apply it.

Sorry for the imperfect work and for bothering you, but you know what?
I can see what is happening in this community too. Of course, I bet
you would post better-quality mm patches than me from the first
version, but maybe not in other subsystems.

> So, for me, until the approach to this series changes: NAK, for x86.

I understand why you got mad, and I'm sorry, but I didn't anticipate
the regression you mentioned above. And I admit the patches have had
problems that I couldn't find in advance, until you, Hildenbrand and
Ying pointed them out. I will do better.

> Andrew, please don't take this series. Or, if you do, please drop the
> patch enabling it on x86.

I don't want to ask for it to be merged either, while there are still
issues.

> I also have the feeling our VFS friends won't take kindly to having

That is what I thought, too. What should I do, then?

I don't think you disagree with the concept itself. The thing is, the
current version is not good enough. I will do my best with what I can
do.

> random luf_foo() hooks in their hot paths, optimized or not. I don't
> see any of them on cc.

Yes. I should've cc'd them. I will.

        Byungchul
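For illustration only, here is a minimal userspace C11 sketch of two directions for the luf_flush() scalability concern quoted above: reading the generation counter with a lock-free acquire load instead of taking a global spinlock on every write(), and the per-file fast path that the XXX comment hints at. Every name here (luf_gen_sketch, struct mapping_sketch, luf_involved, luf_flush_needed()) is hypothetical and is not the patch's code. It assumes writers publish whatever state check_luf_flush() relies on before advancing the generation with release ordering, and that luf_involved is set before any of a file's pages are LUF'd. In the kernel, the counterparts would more likely be primitives such as smp_load_acquire() than <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace analog of luf_flush(); not the patch's code.
 * Assumption: writers publish the state that check_luf_flush() reads
 * before advancing the generation with release ordering.
 */

/* Global generation counter; unsigned short, as in the quoted patch. */
static _Atomic unsigned short luf_gen_sketch;

/* Hypothetical per-file state: set once the file's pages were ever LUF'd. */
struct mapping_sketch {
	atomic_bool luf_involved;
};

/* Writer side: advance the generation and return the new value. */
static unsigned short luf_gen_advance(void)
{
	unsigned short old = atomic_fetch_add_explicit(&luf_gen_sketch, 1,
						       memory_order_release);
	/* Cast back so the result wraps exactly like the atomic itself. */
	return (unsigned short)(old + 1);
}

/* Reader side: no lock, just an acquire load paired with the release. */
static unsigned short luf_gen_read(void)
{
	return atomic_load_explicit(&luf_gen_sketch, memory_order_acquire);
}

/*
 * Fast path suggested by the XXX comment: files never involved in LUF
 * skip the global counter entirely. Returns true, and stores the current
 * generation in *ugen, when a check_luf_flush()-style call is needed.
 */
static bool luf_flush_needed(struct mapping_sketch *m, unsigned short *ugen)
{
	assert(m && ugen);
	if (!atomic_load_explicit(&m->luf_involved, memory_order_acquire))
		return false;
	*ugen = luf_gen_read();
	return true;
}
```

The acquire load only removes the lock from the read side; writers that advance the generation still need their own serialization for the state they publish, and an unsigned short generation wraps after 65536 advances, so any comparison between generations has to be wrap-aware.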
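Dave's ptr1/ptr2 sequence above is pseudocode; the real mmap() also takes an address hint, a length, MAP_SHARED/MAP_PRIVATE flags and a file offset. Below is a minimal userspace sketch of the same sequence under these assumptions: /tmp is writable, a single int at offset 0 stands in for foo/bar, and MADV_PAGEOUT (Linux 5.4 and later) is used only as a reclaim hint where the headers define it. It cannot force the window between unmapping and the deferred TLB flush, so one clean run proves nothing either way. On a kernel with correct TLB handling it should always return 0. The function name check_two_mapping_coherence() is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Userspace sketch of the quoted ptr1/ptr2 scenario: two MAP_SHARED
 * mappings of one file, read-only (ptr1) and read/write (ptr2).
 * Returns 0 if ptr1 observes the value written through ptr2, 1 if it
 * sees stale data, and -1 if setup fails.
 */
static int check_two_mapping_coherence(int bar)
{
	char path[] = "/tmp/luf-two-mappings-XXXXXX";	/* assumed writable */
	long pgsz = sysconf(_SC_PAGESIZE);
	int *ptr1 = MAP_FAILED, *ptr2 = MAP_FAILED;
	int ret = -1, seen, fd;

	/* Sanity check: a page must hold at least one int. */
	assert(pgsz >= (long)sizeof(int));

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);	/* the open fd keeps the file alive */
	if (ftruncate(fd, pgsz) < 0)
		goto out;

	ptr1 = mmap(NULL, (size_t)pgsz, PROT_READ, MAP_SHARED, fd, 0);
	ptr2 = mmap(NULL, (size_t)pgsz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (ptr1 == MAP_FAILED || ptr2 == MAP_FAILED)
		goto out;

	(void)*(volatile int *)ptr1;	/* populate the page cache */
#ifdef MADV_PAGEOUT
	/* Reclaim hint only; the kernel is free to ignore it. */
	madvise(ptr1, (size_t)pgsz, MADV_PAGEOUT);
#endif
	*(volatile int *)ptr2 = bar;	/* write through the second mapping */

	seen = *(volatile int *)ptr1;
	printf("*ptr1: %d\n", seen);
	ret = (seen == bar) ? 0 : 1;
out:
	if (ptr1 != MAP_FAILED)
		munmap(ptr1, (size_t)pgsz);
	if (ptr2 != MAP_FAILED)
		munmap(ptr2, (size_t)pgsz);
	close(fd);
	return ret;
}
```

Running it in a loop under memory pressure would widen coverage, but since the stale read depends on hitting the deferred-flush window, only a non-zero return is meaningful.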