From: Hsin-Yi Wang <hsinyi@chromium.org>
To: Phillip Lougher <phillip@squashfs.org.uk>
Cc: Xiongwei Song <sxwjean@gmail.com>,
Matthew Wilcox <willy@infradead.org>,
Zheng Liang <zhengliang6@huawei.com>,
Zhang Yi <yi.zhang@huawei.com>, Hou Tao <houtao1@huawei.com>,
Miao Xie <miaoxie@huawei.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
"Song, Xiongwei" <Xiongwei.Song@windriver.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"squashfs-devel@lists.sourceforge.net"
<squashfs-devel@lists.sourceforge.net>
Subject: Re: squashfs performance regression and readahead
Date: Thu, 12 May 2022 14:23:21 +0800 [thread overview]
Message-ID: <CAJMQK-gQ+LD6t74FUwuxYVcmETgJxK8Q5_ZtuJvELm+yr=f8Yg@mail.gmail.com> (raw)
In-Reply-To: <adf436dd-d17d-7a84-68ba-0dd2125620cf@squashfs.org.uk>
[-- Attachment #1: Type: text/plain, Size: 11745 bytes --]
On Thu, May 12, 2022 at 3:13 AM Phillip Lougher <phillip@squashfs.org.uk> wrote:
>
> On 11/05/2022 16:12, Hsin-Yi Wang wrote:
> > On Tue, May 10, 2022 at 9:19 PM Xiongwei Song <sxwjean@gmail.com> wrote:
> >>
> >> On Tue, May 10, 2022 at 8:47 PM Hsin-Yi Wang <hsinyi@chromium.org> wrote:
> >>>
> >>> On Tue, May 10, 2022 at 8:31 PM Xiongwei Song <sxwjean@gmail.com> wrote:
> >>>>
> >>>> Hi Hsin-Yi,
> >>>>
> >>>> On Mon, May 9, 2022 at 10:29 PM Hsin-Yi Wang <hsinyi@chromium.org> wrote:
> >>>>>
> >>>>> On Mon, May 9, 2022 at 9:21 PM Matthew Wilcox <willy@infradead.org> wrote:
> >>>>>>
> >>>>>> On Mon, May 09, 2022 at 08:43:45PM +0800, Xiongwei Song wrote:
> >>>>>>> Hi Hsin-Yi and Matthew,
> >>>>>>>
> >>>>>>> With the patch from the attachment on linux 5.10, ran the command as I
> >>>>>>> mentioned earlier,
> >>>>>>> got the results below:
> >>>>>>> 1:40.65 (1m + 40.65s)
> >>>>>>> 1:10.12
> >>>>>>> 1:11.10
> >>>>>>> 1:11.47
> >>>>>>> 1:11.59
> >>>>>>> 1:11.94
> >>>>>>> 1:11.86
> >>>>>>> 1:12.04
> >>>>>>> 1:12.21
> >>>>>>> 1:12.06
> >>>>>>>
> >>>>>>> The performance has clearly improved, but compared to Linux 4.18 it
> >>>>>>> is still not as good.
> >>>>>>>
> >>>>> I think you shouldn't compare the performance with 4.18 directly,
> >>>>> since there might be other factors that impact the performance.
> >>>>
> >>>> Make sense.
> >>>>
> >>>>> I'd suggest comparing the same kernel version with:
> >>>>> a) with this patch
> >>>>> b) with c1f6925e1091 ("mm: put readahead pages in cache earlier") reverted.
> >>>>
> >>>> With 9eec1d897139 ("squashfs: provide backing_dev_info in order to disable
> >>>> read-ahead") reverted and applied 0001-WIP-squashfs-implement-readahead.patch,
> >>>> test result on linux 5.18:
> >>>> 1:41.51 (1m + 41.51s)
> >>>> 1:08.11
> >>>> 1:10.37
> >>>> 1:11.17
> >>>> 1:11.32
> >>>> 1:11.59
> >>>> 1:12.23
> >>>> 1:12.08
> >>>> 1:12.76
> >>>> 1:12.51
> >>>>
> >>>> Performance is 1~2s worse than vanilla Linux 5.18.
> >>>>
> >>>
> >>> Can you share the pack file you used for testing? Thanks
> >>
> >> Do you mean the files that are put in the squashfs partitions? If so,
> >> I just put some dynamic libraries in the partitions:
> >> -rwxr-xr-x 1 root root 200680 Apr 20 03:57 ld-2.33.so
> >> lrwxrwxrwx 1 root root 10 Apr 20 03:57 ld-linux-x86-64.so.2 -> ld-2.33.so
> >> -rwxr-xr-x 1 root root 18776 Apr 20 03:57 libanl-2.33.so
> >> lrwxrwxrwx 1 root root 14 Apr 20 03:57 libanl.so.1 -> libanl-2.33.so
> >> lrwxrwxrwx 1 root root 17 Apr 20 04:08 libblkid.so.1 -> libblkid.so.1.1.0
> >> -rwxr-xr-x 1 root root 330776 Apr 20 04:08 libblkid.so.1.1.0
> >> -rwxr-xr-x 1 root root 1823192 Apr 20 03:57 libc-2.33.so
> >> ...... snip ......
> >>
> >> There are 110 files (55 libs + 55 soft links to libs). I have 90 squashfs
> >> partitions which contain identical copies of the files. Each partition is
> >> 11M, nothing special.
> >>
> >> Thanks.
> >>
> >
> > I noticed that there's a crash at
> > https://elixir.bootlin.com/linux/latest/source/lib/lzo/lzo1x_decompress_safe.c#L218
> > when testing on my system.
> > (I have CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS enabled)
> >
> > Full logs:
> > [ 119.062420] Unable to handle kernel paging request at virtual
> > address ffffffc017337000
> > [ 119.062437] Mem abort info:
> > [ 119.062442] ESR = 0x96000047
> > [ 119.062447] EC = 0x25: DABT (current EL), IL = 32 bits
> > [ 119.062451] SET = 0, FnV = 0
> > [ 119.062454] EA = 0, S1PTW = 0
> > [ 119.062457] Data abort info:
> > [ 119.062460] ISV = 0, ISS = 0x00000047
> > [ 119.062464] CM = 0, WnR = 1
> > [ 119.062469] swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000041099000
> > [ 119.062473] [ffffffc017337000] pgd=000000010014a003,
> > p4d=000000010014a003, pud=000000010014a003, pmd=000000010ba59003,
> > pte=0000000000000000
> > [ 119.062489] Internal error: Oops: 96000047 [#1] PREEMPT SMP
> > [ 119.062494] Modules linked in: vhost_vsock vhost vhost_iotlb
> > vmw_vsock_virtio_transport_common vsock rfcomm algif_hash
> > algif_skcipher af_alg veth uinput xt_cgroup mtk_dip mtk_cam_isp
> > mtk_vcodec_enc mtk_vcodec_dec hci_uart mtk_fd mtk_mdp3 v4l2_h264
> > mtk_vcodec_common mtk_vpu xt_MASQUERADE mtk_jpeg cros_ec_rpmsg btqca
> > videobuf2_dma_contig v4l2_fwnode v4l2_mem2mem btrtl elants_i2c mtk_scp
> > mtk_rpmsg rpmsg_core mtk_scp_ipi mt8183_cci_devfreq ip6table_nat fuse
> > 8021q bluetooth ecdh_generic ecc iio_trig_sysfs cros_ec_lid_angle
> > cros_ec_sensors cros_ec_sensors_core industrialio_triggered_buffer
> > kfifo_buf cros_ec_sensorhub cros_ec_typec typec hid_google_hammer
> > ath10k_sdio lzo_rle lzo_compress ath10k_core ath mac80211 zram
> > cfg80211 uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2
> > videobuf2_common cdc_ether usbnet r8152 mii joydev
> > [ 119.062613] CPU: 4 PID: 4161 Comm: chrome Tainted: G W
> > 5.10.112 #105 39f11bffda227eaae4c704733b9bf01db22d8b4d
> > [ 119.062617] Hardware name: Google burnet board (DT)
> > [ 119.062623] pstate: 20400005 (nzCv daif +PAN -UAO -TCO BTYPE=--)
> > [ 119.062636] pc : lzo1x_decompress_safe+0x1dc/0x564
> > [ 119.062643] lr : lzo_uncompress+0x134/0x1f0
> > [ 119.062647] sp : ffffffc01837b860
> > [ 119.062650] x29: ffffffc01837b860 x28: 0000000000000000
> > [ 119.062656] x27: 0000000000005451 x26: ffffffc0171c9445
> > [ 119.062662] x25: 0000000000000000 x24: ffffffc017437000
> > [ 119.062668] x23: ffffffc0171c944f x22: ffffffc017136000
> > [ 119.062673] x21: ffffffc017336ff1 x20: ffffffc017237000
> > [ 119.062679] x19: ffffffc01837b8d0 x18: 0000000000000000
> > [ 119.062684] x17: 00000000000001eb x16: 0000000000000012
> > [ 119.062689] x15: 000000000010000f x14: d600120202000001
> > [ 119.062695] x13: ffffffc017336ff1 x12: ffffffc017336ff4
> > [ 119.062700] x11: 0000000000000002 x10: 01010101010100ff
> > [ 119.062705] x9 : ffffffffffffffff x8 : ffffffc0171c944d
> > [ 119.062710] x7 : d15d3aaabd294330 x6 : 0206397115fe28ab
> > [ 119.062715] x5 : ffffffc0171c944f x4 : 000000000009344f
> > [ 119.062721] x3 : ffffffc01837b8d0 x2 : ffffffc017237000
> > [ 119.062726] x1 : 000000000009344f x0 : ffffffc0171c9447
> > [ 119.062731] Call trace:
> > [ 119.062738] lzo1x_decompress_safe+0x1dc/0x564
> > [ 119.062742] lzo_uncompress+0x134/0x1f0
> > [ 119.062746] squashfs_decompress+0x6c/0xb4
> > [ 119.062753] squashfs_read_data+0x1a8/0x298
> > [ 119.062758] squashfs_readahead+0x308/0x474
> > [ 119.062765] read_pages+0x74/0x280
> > [ 119.062769] page_cache_ra_unbounded+0x1d0/0x228
> > [ 119.062773] do_page_cache_ra+0x44/0x50
> > [ 119.062779] do_sync_mmap_readahead+0x188/0x1a0
> > [ 119.062783] filemap_fault+0x100/0x350
> > [ 119.062789] __do_fault+0x48/0x10c
> > [ 119.062793] do_cow_fault+0x58/0x12c
> > [ 119.062797] handle_mm_fault+0x544/0x904
> > [ 119.062804] do_page_fault+0x260/0x384
> > [ 119.062809] do_translation_fault+0x44/0x5c
> > [ 119.062813] do_mem_abort+0x48/0xb4
> > [ 119.062819] el0_da+0x28/0x34
> > [ 119.062824] el0_sync_compat_handler+0xb8/0xcc
> > [ 119.062829] el0_sync_compat+0x188/0x1c0
> > [ 119.062837] Code: f94001ae f90002ae f94005ae 910041ad (f90006ae)
> > [ 119.062842] ---[ end trace 3e9828c7360fd7be ]---
> > [ 119.090436] Kernel panic - not syncing: Oops: Fatal exception
> > [ 119.090455] SMP: stopping secondary CPUs
> > [ 119.090467] Kernel Offset: 0x2729c00000 from 0xffffffc010000000
> > [ 119.090471] PHYS_OFFSET: 0xffffffd880000000
> > [ 119.090477] CPU features: 0x08240002,2188200c
> >
> > 1) Traces near when the crash happened:
> > [ 79.495580] Block @ 0x60eea9c, compressed size 65744, src size 1048576
> > [ 80.363573] Block @ 0x1f9f000, compressed size 200598, src size 1048576
> > [ 80.371256] Block @ 0x1fcff96, compressed size 80772, src size 1048576
> > [ 80.428388] Block @ 0x1fe3b1a, compressed size 83941, src size 1048576
> > [ 80.435319] Block @ 0x1ff82ff, compressed size 77936, src size 1048576
> > [ 80.724331] Block @ 0x4501000, compressed size 364069, src size 1048576
> > [ 80.738683] Block @ 0x4dccf28, compressed size 603215, src size 2097152
>
> Src size 2097152 is clearly wrong, as the maximum data block size is 1
> Mbyte or 1048576.
>
> That debug line comes from
>
> https://elixir.bootlin.com/linux/latest/source/fs/squashfs/block.c#L156
>
> ----
> TRACE("Block @ 0x%llx, %scompressed size %d, src size %d\n",
> index, compressed ? "" : "un", length, output->length);
> ----
>
> Which indicates your code has created a page_actor of 2 Mbytes in size
> (output->length).
>
> This is completely incorrect, as the page_actor should never be larger
> than the size of the block to be read in question. In most cases that
> will be msblk->block_size, but it may be less at the end of the file.
>
> You appear to be trying to read the amount of readahead requested. But
> you should always read the lesser of the readahead amount and the
> size of the block in question.
>
> Hope that helps.
>
> Phillip
>
Hi Phillip,

Thanks for the explanation. After restricting the size fed to the
page_actor, the crash no longer happened.
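
For reference, the core of the change is to cap the page_actor at one
block (a minimal sketch using the same names as the attached patch; the
attachment below is the authoritative version):

    /* Never build an actor larger than one squashfs block. */
    unsigned int max_pages = 1UL << (msblk->block_log - PAGE_SHIFT);
    unsigned int nr_pages = min(readahead_count(ractl), max_pages);

    /* output->length is now at most msblk->block_size. */
    actor = squashfs_page_actor_init_special(pages, nr_pages, 0);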

Hi Xiongwei,

Can you test this version (sent as an attachment) again? I've tested it on
my platform:
- arm64
- kernel 5.10
- pack_data size ~ 300K
- time ureadahead pack_data
1. with c1f6925e1091 ("mm: put readahead pages in cache earlier") reverted:
0.633s
0.755s
0.804s
2. with the patch applied:
0.625s
0.656s
0.768s

Hi Matthew,

Thanks for reviewing the patch previously. Does this version look good
to you? If so, I can send it to the list.

Thanks for all of your help.
> >
> > I also noticed that when the crash happened, the nr_pages obtained by
> > readahead_count() was 512.
> > nr_pages = readahead_count(ractl); // this line
> >
> > 2) Normal cases that won't crash:
> > [ 22.651750] Block @ 0xb3bbca6, compressed size 42172, src size 262144
> > [ 22.653580] Block @ 0xb3c6162, compressed size 29815, src size 262144
> > [ 22.656692] Block @ 0xb4a293f, compressed size 17484, src size 131072
> > [ 22.666099] Block @ 0xb593881, compressed size 39742, src size 262144
> > [ 22.668699] Block @ 0xb59d3bf, compressed size 37841, src size 262144
> > [ 22.695739] Block @ 0x13698673, compressed size 65907, src size 131072
> > [ 22.698619] Block @ 0x136a87e6, compressed size 3155, src size 131072
> > [ 22.703400] Block @ 0xb1babe8, compressed size 99391, src size 131072
> > [ 22.706288] Block @ 0x1514abc6, compressed size 4627, src size 131072
> >
> > nr_pages was observed to be 32, 64, 256... These don't cause a crash.
> > Other values (max_pages, bsize, block...) look normal.
> >
> > I'm not sure why the crash happened, but I tried modifying the mask.
> > After changing the mask value to the below, the crash is gone
> > (nr_pages is <= 256).
> > Based on my testing on a 300K pack file, there's no performance change.
> >
> > diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
> > index 20ec48cf97c5..f6d9b6f88ed9 100644
> > --- a/fs/squashfs/file.c
> > +++ b/fs/squashfs/file.c
> > @@ -499,8 +499,8 @@ static void squashfs_readahead(struct readahead_control *ractl)
> > {
> > struct inode *inode = ractl->mapping->host;
> > struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
> > - size_t mask = (1UL << msblk->block_log) - 1;
> > size_t shift = msblk->block_log - PAGE_SHIFT;
> > + size_t mask = (1UL << shift) - 1;
> >
> >
> > Any pointers are appreciated. Thanks!
>
[-- Attachment #2: 0001-WIP-squashfs-implement-readahead.patch --]
[-- Type: text/x-patch, Size: 3224 bytes --]
From d50220684beed2b6d370d5e63a7dfb31a2b0788b Mon Sep 17 00:00:00 2001
From: Hsin-Yi Wang <hsinyi@chromium.org>
Date: Sun, 10 Oct 2021 21:22:25 +0800
Subject: [PATCH] squashfs: implement readahead
Implement a readahead callback for squashfs. It reads the datablocks
that cover the pages in the readahead request. In a few cases it will
not mark pages as uptodate:
- the file end is 0.
- the current batch of pages isn't within the same datablock, or there
  aren't enough pages for a whole datablock.
Otherwise pages will be marked uptodate. The unhandled pages will be
read by readpage later.

Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
---
fs/squashfs/file.c | 74 +++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 73 insertions(+), 1 deletion(-)
diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
index 89d492916dea..7cd57e0d88de 100644
--- a/fs/squashfs/file.c
+++ b/fs/squashfs/file.c
@@ -39,6 +39,7 @@
#include "squashfs_fs_sb.h"
#include "squashfs_fs_i.h"
#include "squashfs.h"
+#include "page_actor.h"
/*
* Locate cache slot in range [offset, index] for specified inode. If
@@ -494,7 +495,78 @@ static int squashfs_readpage(struct file *file, struct page *page)
return 0;
}
+static void squashfs_readahead(struct readahead_control *ractl)
+{
+ struct inode *inode = ractl->mapping->host;
+ struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
+ size_t mask = (1UL << msblk->block_log) - 1;
+ size_t shift = msblk->block_log - PAGE_SHIFT;
+ loff_t req_end = readahead_pos(ractl) + readahead_length(ractl);
+ loff_t start = readahead_pos(ractl) &~ mask;
+ size_t len = readahead_length(ractl) + readahead_pos(ractl) - start;
+ struct squashfs_page_actor *actor;
+ unsigned int nr_pages = 0;
+ struct page **pages;
+ u64 block = 0;
+ int bsize, res, i, index;
+ int file_end = i_size_read(inode) >> msblk->block_log;
+ unsigned int max_pages = 1UL << shift;
+
+ readahead_expand(ractl, start, (len | mask) + 1);
+
+ if (readahead_pos(ractl) + readahead_length(ractl) < req_end ||
+ file_end == 0)
+ return;
+
+ nr_pages = min(readahead_count(ractl), max_pages);
+
+ pages = kmalloc_array(nr_pages, sizeof(void *), GFP_KERNEL);
+ if (!pages)
+ return;
+
+ actor = squashfs_page_actor_init_special(pages, nr_pages, 0);
+ if (!actor)
+ goto out;
+
+ for (;;) {
+ nr_pages = __readahead_batch(ractl, pages, max_pages);
+ if (!nr_pages)
+ break;
+
+ if (readahead_pos(ractl) >= i_size_read(inode) ||
+ nr_pages < max_pages)
+ goto skip_pages;
+
+ index = pages[0]->index >> shift;
+ bsize = read_blocklist(inode, index, &block);
+ res = squashfs_read_data(inode->i_sb, block, bsize, NULL,
+ actor);
+
+ if (res >= 0 && (pages[nr_pages - 1]->index >> shift) == index)
+ for (i = 0; i < nr_pages; i++)
+ SetPageUptodate(pages[i]);
+
+ for (i = 0; i < nr_pages; i++) {
+ unlock_page(pages[i]);
+ put_page(pages[i]);
+ }
+ }
+
+ kfree(actor);
+ return;
+
+skip_pages:
+ for (i = 0; i < nr_pages; i++) {
+ unlock_page(pages[i]);
+ put_page(pages[i]);
+ }
+
+ kfree(actor);
+out:
+ kfree(pages);
+}
const struct address_space_operations squashfs_aops = {
- .readpage = squashfs_readpage
+ .readpage = squashfs_readpage,
+ .readahead = squashfs_readahead
};
--
2.31.0