From: Paul Menzel <pmenzel@molgen.mpg.de>
Subject: Minimum inode cache size? (was: Slow file operations on file
 server with 30 TB hardware RAID and 100 TB software RAID)
To: LKML
Cc: it+linux-xfs@molgen.mpg.de, linux-fsdevel@vger.kernel.org,
 linux-xfs@vger.kernel.org, linux-mm@kvack.org
Date: Thu, 26 Aug 2021 12:41:25 +0200
Message-ID: <58e701f4-6af1-d47a-7b3e-5cadf9e27296@molgen.mpg.de>
In-Reply-To: <3e380495-5f85-3226-f0cf-4452e2b77ccb@molgen.mpg.de>

Dear Linux folks,


On 20.08.21 at 16:39, Paul Menzel wrote:
> On 20.08.21 at 16:31, Paul Menzel wrote:
>
>> Short problem statement: Sometimes changing into a directory on a
>> file server with 30 TB hardware RAID and 100 TB software RAID, both
>> formatted with XFS, takes several seconds.
>>
>>
>> On a Dell PowerEdge T630 with two Xeon CPU E5-2603 v4 @ 1.70GHz and
>> 96 GB RAM, a 30 TB hardware RAID is served by the hardware RAID
>> controller, and a 100 TB MDRAID software RAID is connected to a
>> Microchip 1100-8e; both are formatted using XFS. Currently, Linux
>> 5.4.39 runs on it.
>>
>> ```
>> $ more /proc/version
>> Linux version 5.4.39.mx64.334 (root@lol.molgen.mpg.de) (gcc version 7.5.0 (GCC)) #1 SMP Thu May 7 14:27:50 CEST 2020
>> $ dmesg | grep megar
>> [   10.322823] megaraid cmm: 2.20.2.7 (Release Date: Sun Jul 16 00:01:03 EST 2006)
>> [   10.331910] megaraid: 2.20.5.1 (Release Date: Thu Nov 16 15:32:35 EST 2006)
>> [   10.345055] megaraid_sas 0000:03:00.0: BAR:0x1  BAR's base_addr(phys):0x0000000092100000  mapped virt_addr:0x0000000059ea5995
>> [   10.345057] megaraid_sas 0000:03:00.0: FW now in Ready state
>> [   10.351868] megaraid_sas 0000:03:00.0: 63 bit DMA mask and 32 bit consistent mask
>> [   10.361655] megaraid_sas 0000:03:00.0: firmware supports msix    : (96)
>> [   10.369433] megaraid_sas 0000:03:00.0: requested/available msix 13/13
>> [   10.377113] megaraid_sas 0000:03:00.0: current msix/online cpus    : (13/12)
>> [   10.385190] megaraid_sas 0000:03:00.0: RDPQ mode    : (disabled)
>> [   10.392092] megaraid_sas 0000:03:00.0: Current firmware supports maximum commands: 928     LDIO threshold: 0
>> [   10.403895] megaraid_sas 0000:03:00.0: Configured max firmware commands: 927
>> [   10.416840] megaraid_sas 0000:03:00.0: Performance mode :Latency
>> [   10.424029] megaraid_sas 0000:03:00.0: FW supports sync cache    : No
>> [   10.431417] megaraid_sas 0000:03:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
>> [   10.486158] megaraid_sas 0000:03:00.0: FW provided supportMaxExtLDs: 1    max_lds: 64
>> [   10.495502] megaraid_sas 0000:03:00.0: controller type    : MR(2048MB)
>> [   10.502988] megaraid_sas 0000:03:00.0: Online Controller Reset(OCR)    : Enabled
>> [   10.511445] megaraid_sas 0000:03:00.0: Secure JBOD support    : No
>> [   10.518543] megaraid_sas 0000:03:00.0: NVMe passthru support    : No
>> [   10.525834] megaraid_sas 0000:03:00.0: FW provided TM TaskAbort/Reset timeout: 0 secs/0 secs
>> [   10.536251] megaraid_sas 0000:03:00.0: JBOD sequence map support    : No
>> [   10.543931] megaraid_sas 0000:03:00.0: PCI Lane Margining support    : No
>> [   10.574406] megaraid_sas 0000:03:00.0: megasas_enable_intr_fusion is called outbound_intr_mask:0x40000000
>> [   10.585995] megaraid_sas 0000:03:00.0: INIT adapter done
>> [   10.592409] megaraid_sas 0000:03:00.0: JBOD sequence map is disabled megasas_setup_jbod_map 5660
>> [   10.603273] megaraid_sas 0000:03:00.0: pci id        : (0x1000)/(0x005d)/(0x1028)/(0x1f42)
>> [   10.612815] megaraid_sas 0000:03:00.0: unevenspan support    : yes
>> [   10.619919] megaraid_sas 0000:03:00.0: firmware crash dump    : no
>> [   10.627013] megaraid_sas 0000:03:00.0: JBOD sequence map    : disabled
>> $ dmesg | grep 1100-8e
>> [   25.853170] smartpqi 0000:84:00.0: added 11:2:0:0 0000000000000000 RAID             Adaptec  1100-8e
>> [   25.867069] scsi 11:2:0:0: RAID             Adaptec  1100-8e  2.93 PQ: 0 ANSI: 5
>> $ xfs_info /dev/sdc
>> meta-data=/dev/sdc               isize=512    agcount=28, agsize=268435455 blks
>>          =                       sectsz=512   attr=2, projid32bit=1
>>          =                       crc=1        finobt=1, sparse=0, rmapbt=0
>>          =                       reflink=0
>> data     =                       bsize=4096   blocks=7323648000, imaxpct=5
>>          =                       sunit=0      swidth=0 blks
>> naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
>> log      =internal log           bsize=4096   blocks=521728, version=2
>>          =                       sectsz=512   sunit=0 blks, lazy-count=1
>> realtime =none                   extsz=4096   blocks=0, rtextents=0
>> $ xfs_info /dev/md0
>> meta-data=/dev/md0               isize=512    agcount=102, agsize=268435328 blks
>>          =                       sectsz=4096  attr=2, projid32bit=1
>>          =                       crc=1        finobt=1, sparse=0, rmapbt=0
>>          =                       reflink=0
>> data     =                       bsize=4096   blocks=27348633088, imaxpct=1
>>          =                       sunit=128    swidth=1792 blks
>> naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
>> log      =internal log           bsize=4096   blocks=521728, version=2
>>          =                       sectsz=4096  sunit=1 blks, lazy-count=1
>> realtime =none                   extsz=4096   blocks=0, rtextents=0
>> $ df -i /dev/sdc
>> Filesystem         Inodes   IUsed      IFree IUse% Mounted on
>> /dev/sdc       2929459200 4985849 2924473351    1% /home/pmenzel
>> $ df -i /dev/md0
>> Filesystem         Inodes   IUsed      IFree IUse% Mounted on
>> /dev/md0       2187890624 5331603 2182559021    1% /jbod/M8015
>> ```
>>
>> After not using a directory for a while (over 24 hours), changing
>> into it (locally) takes over five seconds, as do some git operations,
>> for example in the Linux kernel source git tree located in my home
>> directory. (My shell has some git integration showing the branch name
>> in the prompt (`/usr/share/git-contrib/completion/git-prompt.sh`).)
>> Once in that directory, everything reacts instantly again. While
>> waiting, the Linux pressure stall information (PSI) shows IO resource
>> contention.
>>
>> Before:
>>
>>     $ grep -R . /proc/pressure/
>>     /proc/pressure/io:some avg10=0.40 avg60=0.10 avg300=0.10 total=48330841502
>>     /proc/pressure/io:full avg10=0.40 avg60=0.10 avg300=0.10 total=48067233340
>>     /proc/pressure/cpu:some avg10=0.00 avg60=0.00 avg300=0.00 total=755842910
>>     /proc/pressure/memory:some avg10=0.00 avg60=0.00 avg300=0.00 total=2530206336
>>     /proc/pressure/memory:full avg10=0.00 avg60=0.00 avg300=0.00 total=2318140732
>>
>> During `git log stable/linux-5.10.y`:
>>
>>     $ grep -R . /proc/pressure/
>>     /proc/pressure/io:some avg10=26.20 avg60=9.72 avg300=2.37 total=48337351849
>>     /proc/pressure/io:full avg10=26.20 avg60=9.72 avg300=2.37 total=48073742033
>>     /proc/pressure/cpu:some avg10=0.00 avg60=0.00 avg300=0.00 total=755843898
>>     /proc/pressure/memory:some avg10=0.00 avg60=0.00 avg300=0.00 total=2530209046
>>     /proc/pressure/memory:full avg10=0.00 avg60=0.00 avg300=0.00 total=2318143440
>>
>> The current explanation is that over night several maintenance
>> scripts, like backup/mirroring and accounting scripts, are run, which
>> touch all files on the devices. Additionally, sometimes other users
>> run cluster jobs with millions of files on the software RAID. Such
>> things invalidate the inode cache, and “my” inodes are thrown out.
>> When I use the directory afterward, it’s slow in the beginning. There
>> is still free memory during these times according to `top`.
>
>     $ free -h
>                   total        used        free      shared  buff/cache   available
>     Mem:            94G        8.3G        5.3G        2.3M         80G         83G
>     Swap:            0B          0B          0B
>
>> Does that sound reasonable with ten million inodes? Is that easily
>> verifiable?
>
> If an inode consumes 512 bytes, with ten million inodes that would be
> around 500 MB, which should easily fit into the cache, so it does not
> need to be invalidated?

Something is wrong with that calculation (512 bytes times ten million
is closer to 5 GB than 500 MB), and the cache size is much bigger.
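As for verifying it easily: the cold-cache theory can be tested
directly by evicting the dentry and inode caches by hand and timing the
operation that is slow in practice. A minimal sketch, run as root; the
kernel tree path is only an example:

    # Write back dirty data first, then ask the kernel to reclaim
    # dentries and inodes (drop_caches value 2; 1 would drop the page
    # cache, 3 both).
    sync
    echo 2 > /proc/sys/vm/drop_caches

    # Time the same kind of operation that is slow after the nightly
    # jobs.
    time git -C /home/pmenzel/linux status > /dev/null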
Looking into `/proc/slabinfo` and XFS’ runtime/internal statistics [1],
it turns out that the inode cache is likely the problem.

XFS’ internal stats show that only one third of the inode requests are
answered from the cache.

$ grep ^ig /sys/fs/xfs/stats/stats
ig 1791207386 647353522 20111 1143854223 394 1142080045 10683174
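If I read [1] correctly, the second number (`ig_found`) counts lookups
answered from the cache and the fourth (`ig_missed`) those that had to
read the inode from disk, so the hit rate falls out of that line
directly; a sketch:

    # Inode cache hit rate: ig_found / (ig_found + ig_missed).
    # $1 is the literal "ig", so found is $3 and missed is $5.
    awk '/^ig / { printf "hit rate: %.1f%%\n", 100 * $3 / ($3 + $5) }' /sys/fs/xfs/stats/stats

With the numbers above this prints about 36 %, matching the one-third
figure.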
During the problematic time, the SLAB size is around 4 GB and, according
to slabinfo, the inode cache only has around 200,000 entries (sometimes
even as low as 50,000).

$ sudo grep inode /proc/slabinfo
nfs_inode_cache            16     24   1064    3    1 : tunables   24   12    8 : slabdata      8      8      0
rpc_inode_cache            94    138    640    6    1 : tunables   54   27    8 : slabdata     23     23      0
mqueue_inode_cache          1      4    896    4    1 : tunables   54   27    8 : slabdata      1      1      0
xfs_inode             1693683 1722284  960    4    1 : tunables   54   27    8 : slabdata 430571 430571      0
ext2_inode_cache            0      0    768    5    1 : tunables   54   27    8 : slabdata      0      0      0
reiser_inode_cache          0      0    760    5    1 : tunables   54   27    8 : slabdata      0      0      0
hugetlbfs_inode_cache       2     12    608    6    1 : tunables   54   27    8 : slabdata      2      2      0
sock_inode_cache          346    670    768    5    1 : tunables   54   27    8 : slabdata    134    134      0
proc_inode_cache          121    288    656    6    1 : tunables   54   27    8 : slabdata     48     48      0
shmem_inode_cache        2249   2827    696   11    2 : tunables   54   27    8 : slabdata    257    257      0
inode_cache            209098 209482    584    7    1 : tunables   54   27    8 : slabdata  29926  29926      0

(What is the difference between `xfs_inode` and `inode_cache`?)
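If I am not mistaken, the memory pinned by each of these caches follows
from the slabinfo columns, <num_objs> times <objsize>; a sketch:

    # Memory held by each inode slab cache: number of allocated objects
    # ($3, <num_objs>) times the object size in bytes ($4, <objsize>).
    sudo awk '/inode/ { printf "%-22s %9.1f MiB\n", $1, $3 * $4 / 1048576 }' /proc/slabinfo

By that measure `xfs_inode` alone holds roughly 1.5 GiB (1722284 × 960
bytes), while the generic `inode_cache` is only a bit over 100 MiB.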
Then going through all the files with `find -ls`, the inode cache grows
to four to five million entries, and the SLAB size grows to around 8 GB.
Over night it shrinks back to the numbers above, and the page cache
grows back.

In the discussions [2], adjusting `vfs_cache_pressure` is recommended,
but – besides setting it to 0 – it only seems to delay the shrinking of
the cache. (As it is an integer, 1 is the lowest non-zero (positive)
value, which would delay it by a factor of 100.)
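For reference, this is the knob from those discussions; a sketch of the
setting (the sysctl.d file name is just a convention):

    # Bias reclaim toward keeping the dentry/inode caches; the default
    # is 100, and 0 means they are never reclaimed, which is documented
    # to risk out-of-memory conditions.
    sudo sysctl vm.vfs_cache_pressure=1

    # Persist it across reboots.
    echo 'vm.vfs_cache_pressure = 1' | sudo tee /etc/sysctl.d/99-vfs-cache.conf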
Is there a way to specify the minimum number of entries in the inode
cache, or a minimum SLAB size below which the caches should not be
shrunk?


Kind regards,

Paul


[1]: https://xfs.org/index.php/Runtime_Stats#ig
[2]: https://linux-xfs.oss.sgi.narkive.com/qa0AYeBS/improving-xfs-file-system-inode-performance
     "Improving XFS file system inode performance" from 2010