Subject: Re: [PATCH RFC 0/8] dcache: increase poison resistance
From: Junxiao Bi
To: Konstantin Khlebnikov
Cc: Konstantin Khlebnikov, Linux Kernel Mailing List, linux-fsdevel, linux-mm@kvack.org, Alexander Viro, Waiman Long, Gautham Ananthakrishna, matthew.wilcox@oracle.com
Date: Wed, 16 Dec 2020 10:46:46 -0800
Message-ID: <4bcbd2e7-b5e3-6f45-51cf-8658f9c9009d@oracle.com>
In-Reply-To: <04b4d5cf-780d-83a9-2b2b-80ae6029ae2c@oracle.com>
References: <158893941613.200862.4094521350329937435.stgit@buzz> <97ece625-2799-7ae6-28b5-73c52c7c497b@oracle.com> <04b4d5cf-780d-83a9-2b2b-80ae6029ae2c@oracle.com>

Hi Konstantin,

How would you like to proceed with this patch set?
This patch set, as it is, already fixed the customer issue we faced: it
stops the memory fragmentation caused by negative dentries, and it showed
no performance regression in our tests. In production workloads it is
common for some applications to keep creating and removing temporary
files. Over time this leaves a lot of negative dentries behind; eventually
memory becomes fragmented, and the system falls into memory compaction and
becomes unresponsive. It would be good to get it merged upstream. If you
are busy, we can try to push it again.

Thanks,

Junxiao.

On 12/14/20 3:10 PM, Junxiao Bi wrote:
> On 12/13/20 11:43 PM, Konstantin Khlebnikov wrote:
>
>> On Sun, Dec 13, 2020 at 9:52 PM Junxiao Bi wrote:
>>
>>     On 12/11/20 11:32 PM, Konstantin Khlebnikov wrote:
>>
>>     > On Thu, Dec 10, 2020 at 2:01 AM Junxiao Bi wrote:
>>     >
>>     >     Hi Konstantin,
>>     >
>>     >     We tested this patch set recently and found that it limits
>>     >     negative dentries to a small part of total memory. We also don't
>>     >     see any performance regression with it. Do you have any plan to
>>     >     integrate it into mainline?
>>     >     It will help a lot with the memory fragmentation issue caused by
>>     >     the dentry slab. There were a lot of customer cases where sys% was
>>     >     very high because most CPUs were doing memory compaction; the
>>     >     dentry slab was taking too much memory, and nearly all dentries in
>>     >     it were negative.
>>     >
>>     >
>>     > Right now I don't have any plans for this. I suspect such problems
>>     > will appear much more often since machines are getting bigger.
>>     > So, somebody will take care of it.
>>
>>     We already had a lot of customer cases. It makes no sense to leave so
>>     many negative dentries in the system; they cause memory fragmentation
>>     and bring little benefit.
>>
>>
>> The dcache can grow this big only if the system lacks memory pressure.
>>
>> The simplest solution is a cronjob which provides such pressure by
>> creating a sparse file on a disk-based fs and then reading it.
>> This should wash away all inactive caches with no IO and zero chance
>> of OOM.
>
> Sounds good, will try.
>>
>>     >
>>     > The first part, which collects negative dentries at the end of the
>>     > list of siblings, could be done in a more obvious way by splitting
>>     > the list in two. But this touches much more code.
>>
>>     That would add a new field to dentry?
>>
>>
>> Yep. The decision is up to the maintainers.
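A minimal sketch of the cronjob Konstantin suggests, assuming a Linux host and a target directory on a disk-based filesystem (not tmpfs). The function names, the 1 MiB chunk size and sizing the file to physical RAM are hypothetical choices, not from the thread. It creates a sparse file with ftruncate(), unlinks it immediately so nothing is left behind, and reads the holes back, which fills the page cache with zero pages and pushes reclaim (and with it the slab shrinkers) to drop inactive caches.

```python
import os
import tempfile

CHUNK = 1 << 20  # read back in 1 MiB chunks (an arbitrary choice)


def apply_cache_pressure(directory: str, size_bytes: int) -> int:
    """Create a sparse file of size_bytes in directory, read it, and drop it.

    Returns the number of bytes read. The file is unlinked right after
    creation, so its page cache goes away once fd is closed.
    """
    fd, path = tempfile.mkstemp(dir=directory, prefix="pressure-")
    try:
        os.unlink(path)
        # Extending with ftruncate() leaves a hole: no data blocks allocated.
        os.ftruncate(fd, size_bytes)
        total = 0
        while True:
            # Reading a hole returns zeros; on typical disk-based filesystems
            # this fills the page cache without issuing disk IO.
            chunk = os.read(fd, CHUNK)
            if not chunk:
                break
            total += len(chunk)
        return total
    finally:
        os.close(fd)


def physical_memory_bytes() -> int:
    """Total physical RAM, using Linux/glibc sysconf names."""
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
```

A cron entry could call apply_cache_pressure("/var/tmp", physical_memory_bytes()); whether /var/tmp is disk-backed depends on the system, so the directory should be checked first.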
>>
>>     >
>>     > The last patch isn't very rigid but makes non-trivial changes.
>>     > Probably it's better to call some garbage-collector thingy
>>     > periodically. The LRU list needs pressure to age and reorder entries
>>     > properly.
>>
>>     Swap the negative dentry to the head of the hash list when it gets
>>     accessed? Extra ones could easily be trimmed while swapping; is using
>>     GC meant to reduce the performance impact?
>>
>>
>> The reclaimer/shrinker scans dentries in the LRU lists; that's another
>> list.
>
> Ah, you mean a GC that reclaims from the LRU list. I am not sure it could
> keep up with the rate at which negative dentries are generated.
>
> Thanks,
>
> Junxiao.
>
>> My patch uses the order in hash lists in a very unusual way. Don't be
>> confused.
>>
>> There are four lists:
>>   parent - siblings
>>   hashtable - hashchain
>>   LRU
>>   inode - alias
>>
>>
>>     Thanks,
>>
>>     Junxiao.
>>
>>     >
>>     > GC could be off by default, or its thresholds set very high (50% of
>>     > RAM, for example). The final setup could be left up to the owners of
>>     > large systems, which need fine tuning.
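A toy user-space model of the hash-chain idea Junxiao floats in the thread: on each lookup hit the entry moves to the head of its chain, and negative entries beyond a per-chain cap are trimmed from the cold end during that move. This is an illustration under stated assumptions, not Konstantin's patch and not the kernel's dcache code (which uses hlist_bl chains under RCU and frees dentries through the LRU/shrinker path); Dentry, HashChain and max_negative are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Dentry:
    name: str
    negative: bool  # True when the name has no inode (cached "does not exist")


@dataclass
class HashChain:
    """Toy hash chain: index 0 is the head, the most recently used entry."""

    max_negative: int  # hypothetical cap on negative entries per chain
    entries: List[Dentry] = field(default_factory=list)

    def insert(self, dentry: Dentry) -> None:
        self.entries.insert(0, dentry)
        self._trim_negative()

    def lookup(self, name: str) -> Optional[Dentry]:
        for i, d in enumerate(self.entries):
            if d.name == name:
                # Move-to-front: recently used entries drift toward the head.
                self.entries.insert(0, self.entries.pop(i))
                self._trim_negative()
                return d
        return None

    def _trim_negative(self) -> None:
        # Keep the max_negative negatives closest to the head; those further
        # down were accessed least recently. A real implementation would also
        # have to free the dropped entries; here they are simply discarded.
        kept, seen = [], 0
        for d in self.entries:
            if d.negative:
                seen += 1
                if seen > self.max_negative:
                    continue
            kept.append(d)
        self.entries = kept
```

With max_negative=2, inserting three negative names a, b and c keeps only c and b, and a later hit on b moves it ahead of everything else. The cap is applied only when a chain is touched, so cold chains never shrink; that gap is roughly what the periodic GC or the LRU shrinker discussed above would have to cover.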