From: Keith Busch <kbusch@kernel.org>
To: Luis Chamberlain <mcgrof@kernel.org>
Cc: "Matthew Wilcox" <willy@infradead.org>,
"Theodore Ts'o" <tytso@mit.edu>,
"Pankaj Raghav" <p.raghav@samsung.com>,
"Daniel Gomez" <da.gomez@samsung.com>,
"Javier González" <javier.gonz@samsung.com>,
lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, linux-block@vger.kernel.org
Subject: Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
Date: Fri, 3 Mar 2023 15:07:55 -0700
Message-ID: <ZAJvu2hZrHu816gj@kbusch-mbp.dhcp.thefacebook.com>
In-Reply-To: <ZAJqjM6qLrraFrrn@bombadil.infradead.org>

On Fri, Mar 03, 2023 at 01:45:48PM -0800, Luis Chamberlain wrote:
>
> You'd hope most of it is left to FS + MM, but I'm not yet sure that's
> quite it yet. Initial experimentation shows that NVMe devices with
> physical & logical block sizes > PAGE_SIZE simply get brought down to
> 512 bytes. That seems odd, to say the least. Would changing this be an
> issue now?
I think you're talking about removing this part:
---
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index c2730b116dc68..2c528f56c2973 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1828,17 +1828,7 @@ static void nvme_update_disk_info(struct gendisk *disk,
 	unsigned short bs = 1 << ns->lba_shift;
 	u32 atomic_bs, phys_bs, io_opt = 0;
 
-	/*
-	 * The block layer can't support LBA sizes larger than the page size
-	 * yet, so catch this early and don't allow block I/O.
-	 */
-	if (ns->lba_shift > PAGE_SHIFT) {
-		capacity = 0;
-		bs = (1 << 9);
-	}
-
 	blk_integrity_unregister(disk);
-
 	atomic_bs = phys_bs = bs;
 	if (id->nabo == 0) {
 		/*
--
This is what happens today if the driver were to let the disk be created with
its actual block size (testing an 8k LBA format on x86):
BUG: kernel NULL pointer dereference, address: 0000000000000008
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP
CPU: 10 PID: 115 Comm: kworker/u32:2 Not tainted 6.2.0-00032-gdb7183e3c314-dirty #105
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: nvme-wq nvme_scan_work
RIP: 0010:create_empty_buffers+0x24/0x240
Code: 66 0f 1f 44 00 00 0f 1f 44 00 00 41 54 49 89 d4 ba 01 00 00 00 55 53 48 89 fb e8 17 f5 ff ff 48 89 c5 48 89 c2 eb 03 48 89 ca <48> 8b 4a 08 4c 09 22 48 85 c9 75 f1 48 89 6a 08 48 8b 43 18 48 8d
RSP: 0000:ffffc900004578f0 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffffea0000152580 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffea0000152580
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000013
R10: ffff88803ecb6c18 R11: 0000000000000000 R12: 0000000000000000
R13: ffffea0000152580 R14: 0000000000100cc0 R15: ffff888017030288
FS: 0000000000000000(0000) GS:ffff88803ec80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 0000000002c2a001 CR4: 0000000000770ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
? blkdev_readahead+0x20/0x20
create_page_buffers+0x79/0x90
block_read_full_folio+0x58/0x410
? blkdev_write_begin+0x20/0x20
? xas_store+0x56/0x5b0
? xas_load+0x8/0x40
? xa_get_order+0x51/0xe0
? __mod_memcg_lruvec_state+0x41/0x90
? blkdev_readahead+0x20/0x20
? blkdev_readahead+0x20/0x20
filemap_read_folio+0x41/0x2a0
? scan_shadow_nodes+0x30/0x30
? blkdev_readahead+0x20/0x20
? folio_add_lru+0x2d/0x40
? blkdev_readahead+0x20/0x20
do_read_cache_folio+0x103/0x420
? __switch_to_asm+0x3a/0x60
? __switch_to_asm+0x34/0x60
? get_page_from_freelist+0x735/0x1070
read_part_sector+0x2f/0xa0
read_lba+0xa2/0x150
efi_partition+0xdb/0x760
? snprintf+0x49/0x60
? is_gpt_valid.part.5+0x3f0/0x3f0
bdev_disk_changed+0x1ce/0x560
blkdev_get_whole+0x73/0x80
blkdev_get_by_dev+0x199/0x2e0
disk_scan_partitions+0x63/0xd0
device_add_disk+0x3c0/0x3d0
nvme_scan_ns+0x574/0xcc0
? nvme_scan_work+0x23a/0x3f0
nvme_scan_work+0x23a/0x3f0
process_one_work+0x1da/0x3a0
worker_thread+0x205/0x3a0
? process_one_work+0x3a0/0x3a0
kthread+0xc0/0xe0
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>