From: Kairui Song
Date: Tue, 30 Jan 2024 08:39:34 +0800
Subject: Re: [PATCH v2 9/9] mm/swap, shmem: use new swapin helper to skip readahead conditionally
To: "Huang, Ying"
Cc: linux-mm, Andrew Morton, Chris Li, Hugh Dickins, Johannes Weiner, Matthew Wilcox, Michal Hocko, Yosry Ahmed, David Hildenbrand, LKML
References: <20240102175338.62012-1-ryncsn@gmail.com> <20240102175338.62012-10-ryncsn@gmail.com> <871qar9sb2.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Wed, Jan 10, 2024 at 11:35 AM Kairui Song <ryncsn@gmail.com> wrote:
>
> Huang, Ying <ying.huang@intel.com> wrote on Tue, Jan 9, 2024 at 10:05:
> >
> > Kairui Song <ryncsn@gmail.com> writes:
> >
> > > From: Kairui Song <kasong@tencent.com>
> > >
> > > Currently, shmem uses cluster readahead for all swap backends. Cluster
> > > readahead is not a good solution for ramdisk based device (ZRAM) at all.
> > >
> > > After switching to the new helper, most benchmarks showed a good result:
> > >
> > > - Single file sequence read:
> > >   perf stat --repeat 20 dd if=/tmpfs/test of=/dev/null bs=1M count=8192
> > >   (/tmpfs/test is a zero filled file, using brd as swap, 4G memcg limit)
> > >   Before: 22.248 +- 0.549
> > >   After:  22.021 +- 0.684 (-1.1%)
> > >
> > > - Random read stress test:
> > >   fio -name=tmpfs --numjobs=16 --directory=/tmpfs \
> > >   --size=256m --ioengine=mmap --rw=randread --random_distribution=random \
> > >   --time_based --ramp_time=1m --runtime=5m --group_reporting
> > >   (using brd as swap, 2G memcg limit)
> > >
> > >   Before: 1818MiB/s
> > >   After:  1888MiB/s (+3.85%)
> > >
> > > - Zipf biased random read stress test:
> > >   fio -name=tmpfs --numjobs=16 --directory=/tmpfs \
> > >   --size=256m --ioengine=mmap --rw=randread --random_distribution=zipf:1.2 \
> > >   --time_based --ramp_time=1m --runtime=5m --group_reporting
> > >   (using brd as swap, 2G memcg limit)
> > >
> > >   Before: 31.1GiB/s
> > >   After:  32.3GiB/s (+3.86%)
> > >
> > > So cluster readahead doesn't help much even for single sequence read,
> > > and for random stress test, the performance is better without it.
> > >
> > > Considering both memory and swap device will get more fragmented
> > > slowly, and commonly used ZRAM consumes much more CPU than plain
> > > ramdisk, false readahead could occur more frequently and waste
> > > more CPU. Direct SWAP is cheaper, so use the new helper and skip
> > > read ahead for SWP_SYNCHRONOUS_IO device.
> >
> > It's good to take advantage of swap_direct (no readahead).  I also hope
> > we can take advantage of VMA based swapin if shmem is accessed via mmap.
> > That appears possible.
>
> Good idea, that should be doable, will update the series.

Hi Ying,

It turns out it's quite complex to do VMA-based swapin readahead for shmem: the VMA address range / page tables don't contain the swap entries for shmem. For anon pages, simply reading the nearby page table entries is easy and good enough, but for shmem the swap entries are stored in the inode mapping, so the readahead would need to walk the inode mapping instead. That's doable, but it requires more work to make it actually usable. I've sent V3 without this feature; this readahead extension is worth a separate series.