Message-Id: <200005232158.OAA77313@getafix.engr.sgi.com>
Subject: Re: PATCH: Enhance queueing/scsi-midlayer to handle kiobufs. [Re: Request splits]
In-reply-to: Your message of "Fri, 19 May 2000 16:09:58 BST." <20000519160958.C9961@redhat.com>
Date: Tue, 23 May 2000 14:58:34 -0700
From: Chaitanya Tumuluri
To: "Stephen C. Tweedie"
Cc: chait@sgi.com, Eric Youngdale, Alan Cox, Douglas Gilbert,
    Brian Pomerantz, linux-scsi@vger.rutgers.edu, linux-mm@kvack.org

On Fri, 19 May 2000 16:09:58 BST, "Stephen C. Tweedie" wrote:
>Hi,
>
>On Thu, May 18, 2000 at 12:55:04PM -0700, Chaitanya Tumuluri wrote:
>
> < stuff deleted >
>
>> So, I enhanced Stephen Tweedie's
>> raw I/O and the queueing/scsi layers to handle kiobufs-based requests. This is
>> in addition to the current buffer_head based request processing.
>
>The "current" kiobuf code is in ftp.uk.linux.org:/pub/linux/sct/fs/raw-io/.
>It includes a number of bug fixes (mainly rationalising the error returns),
>plus a few new significant bits of functionality. If you can get me a
>patch against those diffs, I'll include your new code in the main kiobuf
>patchset. (I'm still maintaining the different kiobuf patches as
>separate patches within that patchset tarball.)
>

Stephen and others,

Here's my patch against the 2.3.99.pre9-2 patchset from your site. The main
differences from my earlier post are:

	- removed the #ifdefs around my code, as Stephen Tweedie suggested,
	- corrected the indentation problems pointed out earlier (Eric/Alan).

Finally, I'd like to repeat that, given the consensus about moving away from
buffer-head based I/O in the future, it makes sense for me to retain the
little bit of code duplication. This is in the interests of easy surgery when
we do remove the buffer-head I/O paths.

While I see a decent (up to 10%) improvement in b/w and turnaround time for
I/O to a single disk, the biggest impact is the (almost 40%) reduction in CPU
utilization with the new codepath. These numbers are from simple `lmdd' tests
timed with /usr/bin/time.

Based on further feedback from this audience, I would like to propose this
change to Linus at some point as a general SCSI mechanism to handle
kiobuf-based requests.

Thanks much,
-Chait.

----------------------------CUT HERE---------------------------------------

--- pre9.2-sct/drivers/block/ll_rw_blk.c	Tue May 23 14:24:22 2000
+++ pre9.2-sct+mine/drivers/block/ll_rw_blk.c	Tue May 23 14:38:20 2000
@@ -4,6 +4,7 @@
  * Copyright (C) 1991, 1992 Linus Torvalds
  * Copyright (C) 1994, Karl Keyte: Added support for disk statistics
  * Elevator latency, (C) 2000 Andrea Arcangeli SuSE
+ * Support for kiobuf-based I/O requests: Chaitanya Tumuluri [chait@sgi.com]
  */
 
 /*
@@ -639,7 +640,8 @@
 			starving = 1;
 		if (latency < 0)
 			continue;
-
+		if (req->kiobuf)
+			continue;
 		if (req->sem)
 			continue;
 		if (req->cmd != rw)
@@ -744,6 +746,7 @@
 	req->nr_hw_segments = 1;	/* Always 1 for a new request. */
 	req->buffer = bh->b_data;
 	req->sem = NULL;
+	req->kiobuf = NULL;
 	req->bh = bh;
 	req->bhtail = bh;
 	req->q = q;
@@ -886,6 +889,311 @@
 	__ll_rw_block(rw, nr, bh, 1);
 }
 
+/*
+ * Function:    __make_kio_request()
+ *
+ * Purpose:     Construct a kiobuf-based request and insert into request queue.
+ *
+ * Arguments:   q       - request queue of device
+ *              rw      - read/write
+ *              kiobuf  - collection of pages
+ *              dev     - device against which I/O requested
+ *              blocknr - dev block number at which to start I/O
+ *              blksize - units (512B or other) of blocknr
+ *
+ * Lock status: No lock held upon entry.
+ * + * Returns: Nothing + * + * Notes: Requests generated by this function should _NOT_ be merged by + * the __make_request() (new check for `req->kiobuf') + * + * All (relevant) req->Y parameters are expressed in sector size + * of 512B for kiobuf based I/O. This is assumed in the scsi + * mid-layer as well. + */ +static inline void __make_kio_request(request_queue_t * q, + int rw, + struct kiobuf * kiobuf, + kdev_t dev, + unsigned long blocknr, + size_t blksize) +{ + int major = MAJOR(dev); + unsigned int sector, count, nr_bytes, total_bytes, nr_seg; + struct request * req; + int rw_ahead, max_req; + unsigned long flags; + struct list_head * head = &q->queue_head; + size_t curr_offset; + int orig_latency; + elevator_t * elevator; + int correct_size, i, kioind; + + /* + * Sanity Tests: + * + * The input arg. `blocknr' is in units of the + * input arg. `blksize' (inode->i_sb->s_blocksize). + * Convert to 512B unit used in blk_size[] array. + */ + count = kiobuf->length >> 9; + sector = blocknr * (blksize >> 9); + + if (blk_size[major]) { + unsigned long maxsector = (blk_size[major][MINOR(dev)] << 1) + 1; + + if (maxsector < count || maxsector - count < sector) { + if (!blk_size[major][MINOR(dev)]) { + kiobuf->errno = -EINVAL; + goto end_io; + } + /* This may well happen - the kernel calls bread() + without checking the size of the device, e.g., + when mounting a device. */ + printk(KERN_INFO + "attempt to access beyond end of device\n"); + printk(KERN_INFO "%s: rw=%d, want=%d, limit=%d\n", + kdevname(dev), rw, + (sector + count)>>1, + blk_size[major][MINOR(dev)]); + kiobuf->errno = -ESPIPE; + goto end_io; + } + } + /* + * Allow only basic block size multiples in the + * kiobuf->length. + */ + correct_size = BLOCK_SIZE; + if (blksize_size[major]) { + i = blksize_size[major][MINOR(dev)]; + if (i) + correct_size = i; + } + if ((kiobuf->length % correct_size) != 0) { + printk(KERN_NOTICE "ll_rw_kio: " + "request size [%d] not a multiple of device [%s] block-size [%d]\n", + kiobuf->length, + kdevname(dev), + correct_size); + kiobuf->errno = -EINVAL; + goto end_io; + } + rw_ahead = 0; /* normal case; gets changed below for READA */ + switch (rw) { + case READA: + rw_ahead = 1; + rw = READ; /* drop into READ */ + case READ: + kstat.pgpgin++; + max_req = NR_REQUEST; /* reads take precedence */ + break; + case WRITERAW: + rw = WRITE; + goto do_write; /* Skip the buffer refile */ + case WRITE: + do_write: + /* + * We don't allow the write-requests to fill up the + * queue completely: we want some room for reads, + * as they take precedence. The last third of the + * requests are only for reads. + */ + kstat.pgpgout++; + max_req = (NR_REQUEST * 2) / 3; + break; + default: + BUG(); + kiobuf->errno = -EINVAL; + goto end_io; + } + + /* + * Creation of bounce buffers for data in high memory + * should (is) be handled lower in the food-chain. + * Ccurrently done in scsi_merge.c for scsi disks. + * + * Look for a free request with spinlock held. + * Apart from atomic queue access, it prevents + * another thread that has already queued a kiobuf-request + * into this queue from starting it, till we are done. + */ + elevator = &q->elevator; + orig_latency = elevator_request_latency(elevator, rw); + spin_lock_irqsave(&io_request_lock,flags); + + if (list_empty(head)) + q->plug_device_fn(q, dev); + /* + * The scsi disk and cdrom drivers completely remove the request + * from the queue when they start processing an entry. 
For this
+ *              reason it is safe to continue to add links to the top entry
+ *              for those devices.
+ *
+ *              All other drivers need to jump over the first entry, as that
+ *              entry may be busy being processed and we thus can't change
+ *              it.
+ */
+	if (q->head_active && !q->plugged)
+		head = head->next;
+
+	/* find an unused request. */
+	req = get_request(max_req, dev);
+
+	/*
+	 * if no request available: if rw_ahead, forget it,
+	 * otherwise try again blocking..
+	 */
+	if (!req) {
+		spin_unlock_irqrestore(&io_request_lock,flags);
+		if (rw_ahead){
+			kiobuf->errno = -EBUSY;
+			goto end_io;
+		}
+		req = __get_request_wait(max_req, dev);
+		spin_lock_irqsave(&io_request_lock,flags);
+
+		/* revalidate elevator */
+		head = &q->queue_head;
+		if (q->head_active && !q->plugged)
+			head = head->next;
+	}
+
+	/* fill up the request-info, and add it to the queue */
+	req->cmd = rw;
+	req->errors = 0;
+	req->sector = sector;
+	req->nr_hw_segments = 1;	/* Always 1 for a new request. */
+	req->nr_sectors = count;	/* Length of kiobuf */
+	req->sem = NULL;
+	req->kiobuf = kiobuf;
+	req->bh = NULL;
+	req->bhtail = NULL;
+	req->q = q;
+	/* Calculate req->buffer */
+	curr_offset = kiobuf->offset;
+	for (kioind = 0; kioind < kiobuf->nr_pages; kioind++)
+		if (curr_offset >= PAGE_SIZE)
+			curr_offset -= PAGE_SIZE;
+		else
+			break;
+	req->buffer = (char *) page_address(kiobuf->maplist[kioind]) +
+			curr_offset;
+
+	/* Calculate current_nr_sectors and # of scatter gather segments needed */
+	total_bytes = kiobuf->length;
+	nr_bytes = (PAGE_SIZE - curr_offset) > total_bytes ?
+			total_bytes : (PAGE_SIZE - curr_offset);
+	req->current_nr_sectors = nr_bytes >> 9;
+
+	for (nr_seg = 1;
+	     kioind < kiobuf->nr_pages && nr_bytes != total_bytes;
+	     kioind++) {
+		++nr_seg;
+		if ((nr_bytes + PAGE_SIZE) > total_bytes) {
+			break;
+		} else {
+			nr_bytes += PAGE_SIZE;
+		}
+	}
+	req->nr_segments = nr_seg;
+
+	add_request(q, req, head, orig_latency);
+	elevator_account_request(elevator, req);
+
+	spin_unlock_irqrestore(&io_request_lock, flags);
+
+end_io:
+	return;
+}
+
+
+
+/*
+ * Function:    ll_rw_kio()
+ *
+ * Purpose:     Insert kiobuf-based request into request queue.
+ *
+ * Arguments:   rw      - read/write
+ *              kiobuf  - collection of pages
+ *              dev     - device against which I/O requested
+ *              blocknr - dev block number at which to start I/O
+ *              sector  - units (512B or other) of blocknr
+ *              error   - return status
+ *
+ * Lock status: Assumed no lock held upon entry.
+ *              Assumed that the pages in the kiobuf ___ARE LOCKED DOWN___.
+ *
+ * Returns:     Nothing
+ *
+ * Notes:       This function is called from any subsystem using kiovec[]
+ *              collection of kiobufs for I/O (e.g. `pagebufs', raw-io).
+ *              Relies on "kiobuf" field in the request structure.
+ */
+void ll_rw_kio(int rw,
+	       struct kiobuf *kiobuf,
+	       kdev_t dev,
+	       unsigned long blocknr,
+	       size_t sector,
+	       int *error)
+{
+	request_queue_t *q;
+	/*
+	 * Only support SCSI disk for now.
+	 *
+	 * ENOSYS to indicate caller
+	 * should try ll_rw_block()
+	 * for non-SCSI (e.g. IDE) disks
+	 * and for MD requests.
+ */ + if (!SCSI_DISK_MAJOR(MAJOR(dev)) || + (MAJOR(dev) == MD_MAJOR)) { + *error = -ENOSYS; + goto end_io; + } + /* + * Sanity checks + */ + q = blk_get_queue(dev); + if (!q) { + printk(KERN_ERR + "ll_rw_kio: Nnonexistent block-device %s\n", + kdevname(dev)); + *error = -ENODEV; + goto end_io; + } + if ((rw & WRITE) && is_read_only(dev)) { + printk(KERN_NOTICE "Can't write to read-only device %s\n", + kdevname(dev)); + *error = -EPERM; + goto end_io; + } + if (q->make_request_fn) { + printk(KERN_ERR + "ll_rw_kio: Unexpected device [%s] queueing function encountered\n", + kdevname(dev)); + *error = -ENOSYS; + goto end_io; + } + + __make_kio_request(q, rw, kiobuf, dev, blocknr, sector); + if (kiobuf->errno != 0) { + *error = kiobuf->errno; + goto end_io; + } + + return; +end_io: + /* + * We come here only on an error so, just set + * kiobuf->errno and call the completion fn. + */ + if(kiobuf->errno == 0) + kiobuf->errno = *error; +} + + #ifdef CONFIG_STRAM_SWAP extern int stram_device_init (void); #endif @@ -1079,3 +1387,5 @@ EXPORT_SYMBOL(blk_queue_pluggable); EXPORT_SYMBOL(blk_queue_make_request); EXPORT_SYMBOL(generic_make_request); +EXPORT_SYMBOL(__make_kio_request); +EXPORT_SYMBOL(ll_rw_kio); --- pre9.2-sct/drivers/char/raw.c Tue May 23 14:25:36 2000 +++ pre9.2-sct+mine/drivers/char/raw.c Mon May 22 19:00:09 2000 @@ -238,6 +238,63 @@ #define SECTOR_SIZE (1U << SECTOR_BITS) #define SECTOR_MASK (SECTOR_SIZE - 1) +/* + * IO completion routine for a kiobuf-based request. + */ +static void end_kiobuf_io_kiobuf(struct kiobuf *kiobuf) +{ + kiobuf->locked = 0; + if (atomic_dec_and_test(&kiobuf->io_count)) + wake_up(&kiobuf->wait_queue); +} + +/* + * Send I/O down the ll_rw_kio() path first. + * It is assumed that any requisite locking + * and unlocking of pages in the kiobuf has + * been taken care of by the caller. + * + * Return 0 if I/O should be retried on buffer_head path. + * Return number of transferred bytes if successful. + * Return -1 value, if there was an I/O error. + */ +static inline int try_kiobuf_io(struct kiobuf *iobuf, + int rw, + unsigned long blocknr, + kdev_t dev, + char *buf, + size_t sector_size) +{ + int err, retval; + + iobuf->end_io = end_kiobuf_io_kiobuf; + iobuf->errno = 0; + iobuf->locked = 1; + atomic_inc(&iobuf->io_count); + err = 0; + ll_rw_kio(rw, iobuf, dev, blocknr, sector_size, &err); + + if ( err == 0 ) { + kiobuf_wait_for_io(iobuf); + if (iobuf->errno == 0) { + retval = iobuf->length; /* Success */ + } else { + retval = -1; /* I/O error */ + } + } else { + atomic_dec(&iobuf->io_count); + if ( err == -ENOSYS ) { + retval = 0; /* Retry the buffer_head path */ + } else { + retval = -1; /* I/O error */ + } + } + + iobuf->locked = 0; + return retval; +} + + ssize_t rw_raw_dev(int rw, struct file *filp, char *buf, size_t size, loff_t *offp) { @@ -254,7 +311,7 @@ int sector_size, sector_bits, sector_mask; int max_sectors; - + int kiobuf_io = 1; /* * First, a few checks on device size limits */ @@ -290,17 +347,17 @@ if (err) return err; + blocknr = *offp >> sector_bits; /* - * Split the IO into KIO_MAX_SECTORS chunks, mapping and - * unmapping the single kiobuf as we go to perform each chunk of - * IO. + * Try sending down the entire kiobuf first via ll_rw_kio(). + * If not successful then, split the IO into KIO_MAX_SECTORS + * chunks, mapping and unmapping the single kiobuf as we go + * to perform each chunk of IO. 
*/ - - transferred = 0; - blocknr = *offp >> sector_bits; + err = transferred = 0; while (size > 0) { blocks = size >> sector_bits; - if (blocks > max_sectors) + if ((blocks > max_sectors) && (kiobuf_io == 0)) blocks = max_sectors; if (blocks > limit - blocknr) blocks = limit - blocknr; @@ -318,11 +375,19 @@ if (err) break; #endif - - for (i=0; i < blocks; i++) - b[i] = blocknr++; - - err = brw_kiovec(rw, 1, &iobuf, dev, b, sector_size); + if (kiobuf_io == 0) { + for (i=0; i < blocks; i++) + b[i] = blocknr++; + err = brw_kiovec(rw, 1, &iobuf, dev, b, sector_size); + } else { + err = try_kiobuf_io(iobuf, rw, blocknr, dev, buf, sector_size); + if ( err > 0 ) { + blocknr += (err >> sector_bits); + } else if ( err == 0 ) { + kiobuf_io = 0; + continue; + } /* else (err<0) => (err!=iosize); exit loop below */ + } if (err >= 0) { transferred += err; --- pre9.2-sct/drivers/scsi/scsi_lib.c Tue May 23 14:24:21 2000 +++ pre9.2-sct+mine/drivers/scsi/scsi_lib.c Tue May 23 14:42:31 2000 @@ -15,6 +15,8 @@ * a low-level driver if they wished. Note however that this file also * contains the "default" versions of these functions, as we don't want to * go through and retrofit queueing functions into all 30 some-odd drivers. + * + * Support for kiobuf-based I/O requests. [Chaitanya Tumuluri, chait@sgi.com] */ #define __NO_VERSION__ @@ -370,6 +372,161 @@ spin_unlock_irqrestore(&io_request_lock, flags); } + +/* + * Function: __scsi_collect_bh_sectors() + * + * Purpose: Helper routine for __scsi_end_request() to mark some number + * (or all, if that is the case) of sectors complete. + * + * Arguments: req - request struct. from scsi command block. + * uptodate - 1 if I/O indicates success, 0 for I/O error. + * sectors - number of sectors we want to mark. + * leftovers- indicates if any sectors were not done. + * + * Lock status: Assumed that lock is not held upon entry. + * + * Returns: Nothing + * + * Notes: Separate buffer-head processing from kiobuf processing + */ +__inline static void __scsi_collect_bh_sectors(struct request *req, + int uptodate, + int sectors, + char **leftovers) +{ + struct buffer_head *bh; + + do { + if ((bh = req->bh) != NULL) { + req->bh = bh->b_reqnext; + req->nr_sectors -= bh->b_size >> 9; + req->sector += bh->b_size >> 9; + bh->b_reqnext = NULL; + sectors -= bh->b_size >> 9; + bh->b_end_io(bh, uptodate); + if ((bh = req->bh) != NULL) { + req->current_nr_sectors = bh->b_size >> 9; + if (req->nr_sectors < req->current_nr_sectors) { + req->nr_sectors = req->current_nr_sectors; + printk("collect_bh: buffer-list destroyed\n"); + } + } + } + } while (sectors && bh); + + /* Check for leftovers */ + if (req->bh) { + *leftovers = req->bh->b_data; + } + return; + +} + + +/* + * Function: __scsi_collect_kio_sectors() + * + * Purpose: Helper routine for __scsi_end_request() to mark some number + * (or all) of the I/O sectors and attendant pages complete. + * Updates the request nr_segments, nr_sectors accordingly. + * + * Arguments: req - request struct. from scsi command block. + * uptodate - 1 if I/O indicates success, 0 for I/O error. + * sectors - number of sectors we want to mark. + * leftovers- indicates if any sectors were not done. + * + * Lock status: Assumed that lock is not held upon entry. + * + * Returns: Nothing + * + * Notes: Separate buffer-head processing from kiobuf processing. + * We don't know if this was a single or multi-segment sgl + * request. Treat it as though it were a multi-segment one. 
+ */
+__inline static void __scsi_collect_kio_sectors(struct request *req,
+						int uptodate,
+						int sectors,
+						char **leftovers)
+{
+	int pgcnt, nr_pages;
+	size_t curr_offset;
+	unsigned long va = 0;
+	unsigned int nr_bytes, total_bytes, page_sectors;
+
+	nr_pages = req->kiobuf->nr_pages;
+	total_bytes = (req->nr_sectors << 9);
+	curr_offset = req->kiobuf->offset;
+
+	/*
+	 * In the case of leftover requests, the kiobuf->length
+	 * remains the same, but req->nr_sectors would be smaller.
+	 * Adjust curr_offset in this case. If not a leftover,
+	 * the following makes no difference.
+	 */
+	curr_offset += (((req->kiobuf->length >> 9) - req->nr_sectors) << 9);
+
+	/* How far into the kiobuf is the offset? */
+	for (pgcnt = 0; pgcnt < nr_pages; pgcnt++) {
+		if (curr_offset >= PAGE_SIZE) {
+			curr_offset -= PAGE_SIZE;
+			continue;
+		} else {
+			break;
+		}
+	}
+	/*
+	 * Reusing the pgcnt and va value from above:
+	 * Harvest pages to account for number of sectors
+	 * passed into function.
+	 */
+	for (nr_bytes = 0;
+	     pgcnt < nr_pages && sectors != 0;
+	     pgcnt++) {
+		va = page_address(req->kiobuf->maplist[pgcnt]) +
+			curr_offset;
+		/* First page or final page? Partial page? */
+		if (curr_offset != 0) {
+			page_sectors = (PAGE_SIZE - curr_offset) > total_bytes ?
+				total_bytes >> 9 : (PAGE_SIZE - curr_offset) >> 9;
+			curr_offset = 0;
+		} else if ((nr_bytes + PAGE_SIZE) > total_bytes) {
+			page_sectors = (total_bytes - nr_bytes) >> 9;
+		} else {
+			page_sectors = PAGE_SIZE >> 9;
+		}
+		nr_bytes += (page_sectors << 9);
+		/* Leftover sectors in this page (onward)? */
+		if (sectors < page_sectors) {
+			req->nr_sectors -= sectors;
+			req->sector += sectors;
+			req->current_nr_sectors = page_sectors - sectors;
+			va += (sectors << 9);	/* Update for req->buffer */
+			sectors = 0;
+			break;
+		} else {
+			/* Mark this page as done */
+			req->nr_segments--;	/* No clustering for kiobuf */
+			req->nr_sectors -= page_sectors;
+			req->sector += page_sectors;
+			if (!uptodate && (req->kiobuf->errno != 0)) {
+				req->kiobuf->errno = -EIO;
+			}
+			sectors -= page_sectors;
+		}
+	}
+
+	/* Check for leftovers */
+	if (req->nr_sectors) {
+		*leftovers = (char *)va;
+	} else if (req->kiobuf->end_io) {
+		req->kiobuf->end_io(req->kiobuf);
+	}
+
+	return;
+}
+
+
+
 /*
  * Function: scsi_end_request()
  *
@@ -397,7 +554,7 @@
 			   int requeue)
 {
 	struct request *req;
-	struct buffer_head *bh;
+	char * leftovers = NULL;
 
 	ASSERT_LOCK(&io_request_lock, 0);
 
@@ -407,39 +564,29 @@
 		printk(" I/O error: dev %s, sector %lu\n",
 		       kdevname(req->rq_dev), req->sector);
 	}
-	do {
-		if ((bh = req->bh) != NULL) {
-			req->bh = bh->b_reqnext;
-			req->nr_sectors -= bh->b_size >> 9;
-			req->sector += bh->b_size >> 9;
-			bh->b_reqnext = NULL;
-			sectors -= bh->b_size >> 9;
-			bh->b_end_io(bh, uptodate);
-			if ((bh = req->bh) != NULL) {
-				req->current_nr_sectors = bh->b_size >> 9;
-				if (req->nr_sectors < req->current_nr_sectors) {
-					req->nr_sectors = req->current_nr_sectors;
-					printk("scsi_end_request: buffer-list destroyed\n");
-				}
-			}
-		}
-	} while (sectors && bh);
+	leftovers = NULL;
+	if (req->bh != NULL) {			/* Buffer head based request */
+		__scsi_collect_bh_sectors(req, uptodate, sectors, &leftovers);
+	} else if (req->kiobuf != NULL) {	/* Kiobuf based request */
+		__scsi_collect_kio_sectors(req, uptodate, sectors, &leftovers);
+	} else {
+		panic("Both bh and kiobuf pointers are unset in request!\n");
+	}
 
 	/*
 	 * If there are blocks left over at the end, set up the command
 	 * to queue the remainder of them.
*/ - if (req->bh) { + if (leftovers != NULL) { request_queue_t *q; - if( !requeue ) - { + if( !requeue ) { return SCpnt; } q = &SCpnt->device->request_queue; - req->buffer = bh->b_data; + req->buffer = leftovers; /* * Bleah. Leftovers again. Stick the leftovers in * the front of the queue, and goose the queue again. --- pre9.2-sct/drivers/scsi/scsi_merge.c Tue May 23 14:24:22 2000 +++ pre9.2-sct+mine/drivers/scsi/scsi_merge.c Tue May 23 14:23:29 2000 @@ -6,6 +6,7 @@ * Based upon conversations with large numbers * of people at Linux Expo. * Support for dynamic DMA mapping: Jakub Jelinek (jakub@redhat.com). + * Support for kiobuf-based I/O requests. [Chaitanya Tumuluri, chait@sgi.com] */ /* @@ -90,12 +91,13 @@ printk("nr_segments is %x\n", req->nr_segments); printk("counted segments is %x\n", segments); printk("Flags %d %d\n", use_clustering, dma_host); - for (bh = req->bh; bh->b_reqnext != NULL; bh = bh->b_reqnext) - { - printk("Segment 0x%p, blocks %d, addr 0x%lx\n", - bh, - bh->b_size >> 9, - virt_to_phys(bh->b_data - 1)); + if (req->bh != NULL) { + for (bh = req->bh; bh->b_reqnext != NULL; bh = bh->b_reqnext) { + printk("Segment 0x%p, blocks %d, addr 0x%lx\n", + bh, + bh->b_size >> 9, + virt_to_phys(bh->b_data - 1)); + } } panic("Ththththaats all folks. Too dangerous to continue.\n"); } @@ -298,9 +300,22 @@ SHpnt = SCpnt->host; SDpnt = SCpnt->device; - req->nr_segments = __count_segments(req, - CLUSTERABLE_DEVICE(SHpnt, SDpnt), - SHpnt->unchecked_isa_dma, NULL); + if (req->kiobuf) { + /* Since there is no clustering/merging in kiobuf + * requests, the nr_segments is simply a count of + * the number of pages needing I/O. nr_segments is + * updated in __scsi_collect_kio_sectors() called + * from scsi_end_request(), for the leftover case. + * [chait@sgi.com] + */ + return; + } else if (req->bh) { + req->nr_segments = __count_segments(req, + CLUSTERABLE_DEVICE(SHpnt, SDpnt), + SHpnt->unchecked_isa_dma, NULL); + } else { + panic("Both kiobuf and bh pointers are NULL!"); + } } #define MERGEABLE_BUFFERS(X,Y) \ @@ -745,6 +760,191 @@ MERGEREQFCT(scsi_merge_requests_fn_, 0, 0) MERGEREQFCT(scsi_merge_requests_fn_c, 1, 0) MERGEREQFCT(scsi_merge_requests_fn_dc, 1, 1) + + + +/* + * Function: scsi_bh_sgl() + * + * Purpose: Helper routine to construct S(catter) G(ather) L(ist) + * assuming buffer_head-based request in the Scsi_Cmnd. + * + * Arguments: SCpnt - Command descriptor + * use_clustering - 1 if host uses clustering + * dma_host - 1 if this host has ISA DMA issues (bus doesn't + * expose all of the address lines, so that DMA cannot + * be done from an arbitrary address). + * sgpnt - pointer to sgl + * + * Returns: Number of sg segments in the sgl. + * + * Notes: Only the SCpnt argument should be a non-constant variable. + * This functionality was abstracted out of the original code + * in __init_io(). + */ +__inline static int scsi_bh_sgl(Scsi_Cmnd * SCpnt, + int use_clustering, + int dma_host, + struct scatterlist * sgpnt) +{ + int count; + struct buffer_head * bh; + struct buffer_head * bhprev; + + bhprev = NULL; + + for (count = 0, bh = SCpnt->request.bh; + bh; bh = bh->b_reqnext) { + if (use_clustering && bhprev != NULL) { + if (dma_host && + virt_to_phys(bhprev->b_data) - 1 == ISA_DMA_THRESHOLD) { + /* Nothing - fall through */ + } else if (CONTIGUOUS_BUFFERS(bhprev, bh)) { + /* + * This one is OK. Let it go. Note that we + * do not have the ability to allocate + * bounce buffer segments > PAGE_SIZE, so + * for now we limit the thing. 
+				 */
+				if( dma_host ) {
+#ifdef DMA_SEGMENT_SIZE_LIMITED
+					if( virt_to_phys(bh->b_data) - 1 < ISA_DMA_THRESHOLD
+					    || sgpnt[count - 1].length + bh->b_size <= PAGE_SIZE ) {
+						sgpnt[count - 1].length += bh->b_size;
+						bhprev = bh;
+						continue;
+					}
+#else
+					sgpnt[count - 1].length += bh->b_size;
+					bhprev = bh;
+					continue;
+#endif
+				} else {
+					sgpnt[count - 1].length += bh->b_size;
+					SCpnt->request_bufflen += bh->b_size;
+					bhprev = bh;
+					continue;
+				}
+			}
+		}
+		count++;
+		sgpnt[count - 1].address = bh->b_data;
+		sgpnt[count - 1].length += bh->b_size;
+		if (!dma_host) {
+			SCpnt->request_bufflen += bh->b_size;
+		}
+		bhprev = bh;
+	}
+
+	return count;
+}
+
+
+/*
+ * Function:    scsi_kio_sgl()
+ *
+ * Purpose:     Helper routine to construct S(catter) G(ather) L(ist)
+ *              assuming kiobuf-based request in the Scsi_Cmnd.
+ *
+ * Arguments:   SCpnt    - Command descriptor
+ *              dma_host - 1 if this host has ISA DMA issues (bus doesn't
+ *                         expose all of the address lines, so that DMA cannot
+ *                         be done from an arbitrary address).
+ *              sgpnt    - pointer to sgl
+ *
+ * Returns:     Number of sg segments in the sgl.
+ *
+ * Notes:       Only the SCpnt argument should be a non-constant variable.
+ *              This functionality was created out of __init_io() in the
+ *              original implementation for constructing the sgl for
+ *              kiobuf-based I/Os as well.
+ *
+ *              Constructs SCpnt->use_sg sgl segments for the kiobuf.
+ *
+ *              No clustering of pages is attempted, unlike the buffer_head
+ *              case, primarily because the pages in a kiobuf are unlikely to
+ *              be contiguous. Bears checking.
+ */
+__inline static int scsi_kio_sgl(Scsi_Cmnd * SCpnt,
+				 int dma_host,
+				 struct scatterlist * sgpnt)
+{
+	int pgcnt, nr_seg, curr_seg, nr_sectors;
+	size_t curr_offset;
+	unsigned long va;
+	unsigned int nr_bytes, total_bytes, sgl_seg_bytes;
+
+	curr_seg = SCpnt->use_sg;	/* This many sgl segments */
+	nr_sectors = SCpnt->request.nr_sectors;
+	total_bytes = (nr_sectors << 9);
+	curr_offset = SCpnt->request.kiobuf->offset;
+
+	/*
+	 * In the case of leftover requests, the kiobuf->length
+	 * remains the same, but req->nr_sectors would be smaller.
+	 * Use this difference to adjust curr_offset in this case.
+	 * If not a leftover, the following makes no difference.
+	 */
+	curr_offset += (((SCpnt->request.kiobuf->length >> 9) - nr_sectors) << 9);
+	/* How far into the kiobuf is the offset? */
+	for (pgcnt = 0; pgcnt < SCpnt->request.kiobuf->nr_pages; pgcnt++) {
+		if (curr_offset >= PAGE_SIZE) {
+			curr_offset -= PAGE_SIZE;
+			continue;
+		} else {
+			break;
+		}
+	}
+	/*
+	 * Reusing the pgcnt value from above:
+	 * Starting at the right page and offset, build curr_seg
+	 * sgl segments (one per page). Account for both a
+	 * potentially partial last page and unrequired pages
+	 * at the end of the kiobuf.
+	 */
+	nr_bytes = 0;
+	for (nr_seg = 0; nr_seg < curr_seg; nr_seg++) {
+		va = page_address(SCpnt->request.kiobuf->maplist[pgcnt]) +
+			curr_offset;
+		++pgcnt;
+
+		/*
+		 * If this is the first page, account for offset.
+		 * If this is the final (maybe partial) page, get remainder.
+ */ + if (curr_offset != 0) { + sgl_seg_bytes = PAGE_SIZE - curr_offset; + curr_offset = 0; + } else if((nr_bytes + PAGE_SIZE) > total_bytes) { + sgl_seg_bytes = total_bytes - nr_bytes; + } else { + sgl_seg_bytes = PAGE_SIZE; + } + + nr_bytes += sgl_seg_bytes; + sgpnt[nr_seg].address = (char *)va; + sgpnt[nr_seg].alt_address = 0; + sgpnt[nr_seg].length = sgl_seg_bytes; + + if (!dma_host) { + SCpnt->request_bufflen += sgl_seg_bytes; + } + } + /* Sanity Check */ + if ((nr_bytes > total_bytes) || + (pgcnt > SCpnt->request.kiobuf->nr_pages)) { + printk(KERN_ERR + "scsi_kio_sgl: sgl bytes[%d], request bytes[%d]\n" + "scsi_kio_sgl: pgcnt[%d], kiobuf->pgcnt[%d]!\n", + nr_bytes, total_bytes, pgcnt, SCpnt->request.kiobuf->nr_pages); + BUG(); + } + return nr_seg; + +} + + + /* * Function: __init_io() * @@ -777,6 +977,9 @@ * gather list, the sg count in the request won't be valid * (mainly because we don't need queue management functions * which keep the tally uptodate. + * + * Modified to handle kiobuf argument in the SCpnt->request + * structure. */ __inline static int __init_io(Scsi_Cmnd * SCpnt, int sg_count_valid, @@ -784,7 +987,6 @@ int dma_host) { struct buffer_head * bh; - struct buffer_head * bhprev; char * buff; int count; int i; @@ -799,11 +1001,11 @@ * needed any more. Need to play with it and see if we hit the * panic. If not, then don't bother. */ - if (!SCpnt->request.bh) { + if ((!SCpnt->request.bh && !SCpnt->request.kiobuf) || + (SCpnt->request.bh && SCpnt->request.kiobuf)) { /* - * Case of page request (i.e. raw device), or unlinked buffer - * Typically used for swapping, but this isn't how we do - * swapping any more. + * Case of unlinked buffer. Typically used for swapping, + * but this isn't how we do swapping any more. */ panic("I believe this is dead code. If we hit this, I was wrong"); #if 0 @@ -819,6 +1021,12 @@ req = &SCpnt->request; /* * First we need to know how many scatter gather segments are needed. + * + * Redundant test per comment below indicating sg_count_valid is always + * set to 1.(ll_rw_blk.c's estimate of req->nr_segments is always trusted). + * + * count is initialized in ll_rw_kio() for the kiobuf path and since these + * requests are never merged, the counts are stay valid. */ if (!sg_count_valid) { count = __count_segments(req, use_clustering, dma_host, NULL); @@ -842,12 +1050,24 @@ this_count = SCpnt->request.nr_sectors; goto single_segment; } + /* Check if size of the sgl would be greater than the size + * of the host sgl table. In which case, limit the sgl size. + * When the request sectors are harvested after completion of + * I/O in __scsi_collect_kio_sectors, the additional sectors + * will be reinjected into the request queue as a special cmd. + * This will be done till all the request sectors are done. + * [chait@sgi.com] + */ + if((SCpnt->request.kiobuf != NULL) && + (count > SCpnt->host->sg_tablesize)) { + count = SCpnt->host->sg_tablesize - 1; + } SCpnt->use_sg = count; - /* * Allocate the actual scatter-gather table itself. 
* scsi_malloc can only allocate in chunks of 512 bytes */ + SCpnt->sglist_len = (SCpnt->use_sg * sizeof(struct scatterlist) + 511) & ~511; @@ -872,51 +1092,14 @@ memset(sgpnt, 0, SCpnt->use_sg * sizeof(struct scatterlist)); SCpnt->request_buffer = (char *) sgpnt; SCpnt->request_bufflen = 0; - bhprev = NULL; - for (count = 0, bh = SCpnt->request.bh; - bh; bh = bh->b_reqnext) { - if (use_clustering && bhprev != NULL) { - if (dma_host && - virt_to_phys(bhprev->b_data) - 1 == ISA_DMA_THRESHOLD) { - /* Nothing - fall through */ - } else if (CONTIGUOUS_BUFFERS(bhprev, bh)) { - /* - * This one is OK. Let it go. Note that we - * do not have the ability to allocate - * bounce buffer segments > PAGE_SIZE, so - * for now we limit the thing. - */ - if( dma_host ) { -#ifdef DMA_SEGMENT_SIZE_LIMITED - if( virt_to_phys(bh->b_data) - 1 < ISA_DMA_THRESHOLD - || sgpnt[count - 1].length + bh->b_size <= PAGE_SIZE ) { - sgpnt[count - 1].length += bh->b_size; - bhprev = bh; - continue; - } -#else - sgpnt[count - 1].length += bh->b_size; - bhprev = bh; - continue; -#endif - } else { - sgpnt[count - 1].length += bh->b_size; - SCpnt->request_bufflen += bh->b_size; - bhprev = bh; - continue; - } - } - } - count++; - sgpnt[count - 1].address = bh->b_data; - sgpnt[count - 1].length += bh->b_size; - if (!dma_host) { - SCpnt->request_bufflen += bh->b_size; - } - bhprev = bh; + if (SCpnt->request.bh){ + count = scsi_bh_sgl(SCpnt, use_clustering, dma_host, sgpnt); + } else if (SCpnt->request.kiobuf) { + count = scsi_kio_sgl(SCpnt, dma_host, sgpnt); + } else { + panic("Yowza! Both kiobuf and buffer_head pointers are null!"); } - /* * Verify that the count is correct. */ @@ -1009,6 +1192,17 @@ scsi_free(SCpnt->request_buffer, SCpnt->sglist_len); /* + * Shouldn't ever get here for a kiobuf request. + * + * Since each segment is a page and also, we couldn't + * allocate bounce buffers for even the first page, + * this means that the DMA buffer pool is exhausted! + */ + if (SCpnt->request.kiobuf){ + dma_exhausted(SCpnt, 0); + } + + /* * Make an attempt to pick up as much as we reasonably can. * Just keep adding sectors until the pool starts running kind of * low. The limit of 30 is somewhat arbitrary - the point is that @@ -1043,7 +1237,6 @@ * segment. Possibly the entire request, or possibly a small * chunk of the entire request. */ - bh = SCpnt->request.bh; buff = SCpnt->request.buffer; if (dma_host) { @@ -1052,7 +1245,7 @@ * back and allocate a really small one - enough to satisfy * the first buffer. */ - if (virt_to_phys(SCpnt->request.bh->b_data) + if (virt_to_phys(SCpnt->request.buffer) + (this_count << 9) - 1 > ISA_DMA_THRESHOLD) { buff = (char *) scsi_malloc(this_count << 9); if (!buff) { @@ -1152,3 +1345,21 @@ SDpnt->scsi_init_io_fn = scsi_init_io_vdc; } } +/* + * Overrides for Emacs so that we almost follow Linus's tabbing style. + * Emacs will notice this stuff at the end of the file and automatically + * adjust the settings for this buffer only. This must remain at the end + * of the file. 
+ * --------------------------------------------------------------------------- + * Local variables: + * c-indent-level: 4 + * c-brace-imaginary-offset: 0 + * c-brace-offset: -4 + * c-argdecl-indent: 4 + * c-label-offset: -4 + * c-continued-statement-offset: 4 + * c-continued-brace-offset: 0 + * indent-tabs-mode: nil + * tab-width: 8 + * End: + */ --- pre9.2-sct/drivers/scsi/sd.c Tue May 23 14:24:21 2000 +++ pre9.2-sct+mine/drivers/scsi/sd.c Mon May 22 17:53:29 2000 @@ -546,6 +546,7 @@ static void rw_intr(Scsi_Cmnd * SCpnt) { int result = SCpnt->result; + #if CONFIG_SCSI_LOGGING char nbuff[6]; #endif @@ -575,8 +576,14 @@ (SCpnt->sense_buffer[4] << 16) | (SCpnt->sense_buffer[5] << 8) | SCpnt->sense_buffer[6]; - if (SCpnt->request.bh != NULL) - block_sectors = SCpnt->request.bh->b_size >> 9; + + /* Tweak to support kiobuf-based I/O requests, [chait@sgi.com] */ + if (SCpnt->request.kiobuf != NULL) + block_sectors = SCpnt->request.kiobuf->length >> 9; + else if (SCpnt->request.bh != NULL) + block_sectors = SCpnt->request.bh->b_size >> 9; + else + panic("Both kiobuf and bh pointers are null!\n"); switch (SCpnt->device->sector_size) { case 1024: error_sector <<= 1; --- pre9.2-sct/include/linux/blkdev.h Tue May 23 14:24:35 2000 +++ pre9.2-sct+mine/include/linux/blkdev.h Tue May 23 13:48:35 2000 @@ -6,6 +6,7 @@ #include #include #include +#include struct request_queue; typedef struct request_queue request_queue_t; @@ -39,6 +40,7 @@ void * special; char * buffer; struct semaphore * sem; + struct kiobuf * kiobuf; struct buffer_head * bh; struct buffer_head * bhtail; request_queue_t * q; --- pre9.2-sct/include/linux/elevator.h Tue May 23 14:24:36 2000 +++ pre9.2-sct+mine/include/linux/elevator.h Mon May 22 19:05:15 2000 @@ -107,7 +107,12 @@ elevator->sequence++; if (req->cmd == READ) elevator->read_pendings++; - elevator->nr_segments++; + + if (req->kiobuf != NULL) { + elevator->nr_segments += req->nr_segments; + } else { + elevator->nr_segments++; + } } static inline int elevator_request_latency(elevator_t * elevator, int rw) --- pre9.2-sct/include/linux/fs.h Tue May 23 14:24:34 2000 +++ pre9.2-sct+mine/include/linux/fs.h Mon May 22 17:56:47 2000 @@ -1063,6 +1063,7 @@ extern struct buffer_head * get_hash_table(kdev_t, int, int); extern struct buffer_head * getblk(kdev_t, int, int); extern void ll_rw_block(int, int, struct buffer_head * bh[]); +extern void ll_rw_kio(int , struct kiobuf *, kdev_t, unsigned long, size_t, int *); extern int is_read_only(kdev_t); extern void __brelse(struct buffer_head *); static inline void brelse(struct buffer_head *buf) --- pre9.2-sct/include/linux/iobuf.h Tue May 23 14:25:30 2000 +++ pre9.2-sct+mine/include/linux/iobuf.h Mon May 22 18:01:30 2000 @@ -56,6 +56,7 @@ atomic_t io_count; /* IOs still in progress */ int errno; /* Status of completed IO */ void (*end_io) (struct kiobuf *); /* Completion callback */ + void *k_dev_id; /* Store kiovec (or pagebuf) here */ wait_queue_head_t wait_queue; }; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/
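P.S. For anyone who wants to drive the new interface from another subsystem
(pagebuf, etc.), here is a minimal, illustrative sketch of the calling
pattern. It is not part of the patch; it simply restates what try_kiobuf_io()
and end_kiobuf_io_kiobuf() in the raw.c hunk above do, using the ll_rw_kio()
prototype and the kiobuf fields this patch adds. The helper names
submit_kiobuf() and kio_end_io() are made up for the example.

/* Illustrative only -- mirrors try_kiobuf_io()/end_kiobuf_io_kiobuf() above. */
static void kio_end_io(struct kiobuf *iobuf)
{
	/* Completion callback: wake up the submitter once all I/O is done. */
	if (atomic_dec_and_test(&iobuf->io_count))
		wake_up(&iobuf->wait_queue);
}

/*
 * Returns bytes transferred on success, 0 if the caller should fall back
 * to the buffer_head path (brw_kiovec), or a negative errno on failure.
 */
static int submit_kiobuf(int rw, struct kiobuf *iobuf, kdev_t dev,
			 unsigned long blocknr, size_t sector_size)
{
	int err = 0;

	iobuf->end_io = kio_end_io;
	iobuf->errno = 0;
	atomic_inc(&iobuf->io_count);

	/* Hand the whole (locked-down) kiobuf to the block/SCSI layers. */
	ll_rw_kio(rw, iobuf, dev, blocknr, sector_size, &err);

	if (err) {
		atomic_dec(&iobuf->io_count);
		/* -ENOSYS => non-SCSI or MD device: retry via the bh path. */
		return (err == -ENOSYS) ? 0 : err;
	}

	/* Success: wait for completion, as rw_raw_dev() does. */
	kiobuf_wait_for_io(iobuf);
	return iobuf->errno ? iobuf->errno : iobuf->length;
}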