From: Chaitanya Tumuluri <chait@getafix.engr.sgi.com>
To: "Stephen C. Tweedie" <sct@redhat.com>
Cc: chait@sgi.com, Eric Youngdale <eric@andante.org>,
Alan Cox <alan@lxorguk.ukuu.org.uk>,
Douglas Gilbert <dgilbert@interlog.com>,
Brian Pomerantz <bapper@piratehaven.org>,
linux-scsi@vger.rutgers.edu, linux-mm@kvack.org
Subject: Re: PATCH: Enhance queueing/scsi-midlayer to handle kiobufs. [Re: Request splits]
Date: Tue, 23 May 2000 14:58:34 -0700
Message-ID: <200005232158.OAA77313@getafix.engr.sgi.com>
In-Reply-To: Your message of "Fri, 19 May 2000 16:09:58 BST." <20000519160958.C9961@redhat.com>
On Fri, 19 May 2000 16:09:58 BST, "Stephen C. Tweedie" <sct@redhat.com> wrote:
>Hi,
>
>On Thu, May 18, 2000 at 12:55:04PM -0700, Chaitanya Tumuluri wrote:
>
> < stuff deleted >
>
>> So, I enhanced Stephen Tweedie's
>> raw I/O and the queueing/scsi layers to handle kiobufs-based requests. This is
>> in addition to the current buffer_head based request processing.
>
>The "current" kiobuf code is in ftp.uk.linux.org:/pub/linux/sct/fs/raw-io/.
>It includes a number of bug fixes (mainly rationalising the error returns),
>plus a few new significant bits of functionality. If you can get me a
>patch against those diffs, I'll include your new code in the main kiobuf
>patchset. (I'm still maintaining the different kiobuf patches as
>separate patches within that patchset tarball.)
>
Stephen and others,
Here's my patch against the 2.3.99.pre9-2 patchset from your site. The main
differences from my earlier post are:
- removed the #ifdefs around my code as Stephen Tweedie suggested,
- corrected indentation problems pointed out earlier (Eric/Alan).
Finally, I'd like to repeat that, given the consensus about moving away from
buffer_head-based I/O in the future, it makes sense to retain the
small amount of code duplication. This is in the interest of easy surgery
when we do remove the buffer_head I/O paths.
While I see a decent (up to 10%) improvement in bandwidth and turnaround time for
I/O to a single disk, the biggest impact is the (almost 40%) reduction
in CPU utilization with the new code path. These figures are from simple `lmdd' tests
timed with /usr/bin/time.
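For reference, the timing runs were along the lines of the following (the exact
lmdd arguments, transfer sizes and device names here are illustrative rather than
the precise invocations used):

    # sequential write and then read against the raw device, timed
    /usr/bin/time lmdd if=internal of=/dev/raw/raw1 bs=64k count=4096
    /usr/bin/time lmdd if=/dev/raw/raw1 of=internal bs=64k count=4096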
Based on further feedback from this audience, I would like to propose this
change to Linus at some point as a general SCSI-layer mechanism for handling
kiobuf-based requests.
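To make the proposed interface concrete, here is a rough sketch of how a
subsystem would drive the new entry point. The names submit_kio() and
my_end_io() are hypothetical, the error handling is trimmed, and it is assumed
the kiobuf has already been mapped and its pages locked down, as ll_rw_kio()
requires (this mirrors the try_kiobuf_io() helper in the raw.c part of the patch):

    /* Hypothetical caller sketch -- details and locking elided. */
    static void my_end_io(struct kiobuf *iobuf)
    {
            /* last outstanding I/O done: wake up the waiter */
            if (atomic_dec_and_test(&iobuf->io_count))
                    wake_up(&iobuf->wait_queue);
    }

    static int submit_kio(int rw, struct kiobuf *iobuf, kdev_t dev,
                          unsigned long blocknr, size_t blksize)
    {
            int err = 0;

            iobuf->end_io = my_end_io;      /* completion callback */
            iobuf->errno  = 0;
            atomic_inc(&iobuf->io_count);

            /* Queue the whole kiobuf as a single request. */
            ll_rw_kio(rw, iobuf, dev, blocknr, blksize, &err);
            if (err) {
                    atomic_dec(&iobuf->io_count);   /* never queued */
                    return err;     /* -ENOSYS => retry via brw_kiovec() */
            }

            kiobuf_wait_for_io(iobuf);      /* sleep until the I/O completes */
            return iobuf->errno;
    }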
Thanks much,
-Chait.
----------------------------CUT HERE---------------------------------------
--- pre9.2-sct/drivers/block/ll_rw_blk.c Tue May 23 14:24:22 2000
+++ pre9.2-sct+mine/drivers/block/ll_rw_blk.c Tue May 23 14:38:20 2000
@@ -4,6 +4,7 @@
* Copyright (C) 1991, 1992 Linus Torvalds
* Copyright (C) 1994, Karl Keyte: Added support for disk statistics
* Elevator latency, (C) 2000 Andrea Arcangeli <andrea@suse.de> SuSE
+ * Support for kiobuf-based I/O requests: Chaitanya Tumuluri [chait@sgi.com]
*/
/*
@@ -639,7 +640,8 @@
starving = 1;
if (latency < 0)
continue;
-
+ if (req->kiobuf)
+ continue;
if (req->sem)
continue;
if (req->cmd != rw)
@@ -744,6 +746,7 @@
req->nr_hw_segments = 1; /* Always 1 for a new request. */
req->buffer = bh->b_data;
req->sem = NULL;
+ req->kiobuf = NULL;
req->bh = bh;
req->bhtail = bh;
req->q = q;
@@ -886,6 +889,311 @@
__ll_rw_block(rw, nr, bh, 1);
}
+/*
+ * Function: __make_kio_request()
+ *
+ * Purpose: Construct a kiobuf-based request and insert into request queue.
+ *
+ * Arguments: q - request queue of device
+ * rw - read/write
+ * kiobuf - collection of pages
+ * dev - device against which I/O requested
+ * blocknr - dev block number at which to start I/O
+ * blksize - units (512B or other) of blocknr
+ *
+ * Lock status: No lock held upon entry.
+ *
+ * Returns: Nothing
+ *
+ * Notes: Requests generated by this function should _NOT_ be merged by
+ * __make_request() (hence the new check for `req->kiobuf').
+ *
+ * All (relevant) req->Y parameters are expressed in sector size
+ * of 512B for kiobuf based I/O. This is assumed in the scsi
+ * mid-layer as well.
+ */
+static inline void __make_kio_request(request_queue_t * q,
+ int rw,
+ struct kiobuf * kiobuf,
+ kdev_t dev,
+ unsigned long blocknr,
+ size_t blksize)
+{
+ int major = MAJOR(dev);
+ unsigned int sector, count, nr_bytes, total_bytes, nr_seg;
+ struct request * req;
+ int rw_ahead, max_req;
+ unsigned long flags;
+ struct list_head * head = &q->queue_head;
+ size_t curr_offset;
+ int orig_latency;
+ elevator_t * elevator;
+ int correct_size, i, kioind;
+
+ /*
+ * Sanity Tests:
+ *
+ * The input arg. `blocknr' is in units of the
+ * input arg. `blksize' (inode->i_sb->s_blocksize).
+ * Convert to 512B unit used in blk_size[] array.
+ */
+ count = kiobuf->length >> 9;
+ sector = blocknr * (blksize >> 9);
+
+ if (blk_size[major]) {
+ unsigned long maxsector = (blk_size[major][MINOR(dev)] << 1) + 1;
+
+ if (maxsector < count || maxsector - count < sector) {
+ if (!blk_size[major][MINOR(dev)]) {
+ kiobuf->errno = -EINVAL;
+ goto end_io;
+ }
+ /* This may well happen - the kernel calls bread()
+ without checking the size of the device, e.g.,
+ when mounting a device. */
+ printk(KERN_INFO
+ "attempt to access beyond end of device\n");
+ printk(KERN_INFO "%s: rw=%d, want=%d, limit=%d\n",
+ kdevname(dev), rw,
+ (sector + count)>>1,
+ blk_size[major][MINOR(dev)]);
+ kiobuf->errno = -ESPIPE;
+ goto end_io;
+ }
+ }
+ /*
+ * Allow only basic block size multiples in the
+ * kiobuf->length.
+ */
+ correct_size = BLOCK_SIZE;
+ if (blksize_size[major]) {
+ i = blksize_size[major][MINOR(dev)];
+ if (i)
+ correct_size = i;
+ }
+ if ((kiobuf->length % correct_size) != 0) {
+ printk(KERN_NOTICE "ll_rw_kio: "
+ "request size [%d] not a multiple of device [%s] block-size [%d]\n",
+ kiobuf->length,
+ kdevname(dev),
+ correct_size);
+ kiobuf->errno = -EINVAL;
+ goto end_io;
+ }
+ rw_ahead = 0; /* normal case; gets changed below for READA */
+ switch (rw) {
+ case READA:
+ rw_ahead = 1;
+ rw = READ; /* drop into READ */
+ case READ:
+ kstat.pgpgin++;
+ max_req = NR_REQUEST; /* reads take precedence */
+ break;
+ case WRITERAW:
+ rw = WRITE;
+ goto do_write; /* Skip the buffer refile */
+ case WRITE:
+ do_write:
+ /*
+ * We don't allow the write-requests to fill up the
+ * queue completely: we want some room for reads,
+ * as they take precedence. The last third of the
+ * requests are only for reads.
+ */
+ kstat.pgpgout++;
+ max_req = (NR_REQUEST * 2) / 3;
+ break;
+ default:
+ BUG();
+ kiobuf->errno = -EINVAL;
+ goto end_io;
+ }
+
+ /*
+ * Creation of bounce buffers for data in high memory
+ * is handled lower down in the food-chain;
+ * currently done in scsi_merge.c for scsi disks.
+ *
+ * Look for a free request with spinlock held.
+ * Apart from atomic queue access, it prevents
+ * another thread that has already queued a kiobuf-request
+ * into this queue from starting it, till we are done.
+ */
+ elevator = &q->elevator;
+ orig_latency = elevator_request_latency(elevator, rw);
+ spin_lock_irqsave(&io_request_lock,flags);
+
+ if (list_empty(head))
+ q->plug_device_fn(q, dev);
+ /*
+ * The scsi disk and cdrom drivers completely remove the request
+ * from the queue when they start processing an entry. For this
+ * reason it is safe to continue to add links to the top entry
+ * for those devices.
+ *
+ * All other drivers need to jump over the first entry, as that
+ * entry may be busy being processed and we thus can't change
+ * it.
+ */
+ if (q->head_active && !q->plugged)
+ head = head->next;
+
+ /* find an unused request. */
+ req = get_request(max_req, dev);
+
+ /*
+ * if no request available: if rw_ahead, forget it,
+ * otherwise try again blocking..
+ */
+ if (!req) {
+ spin_unlock_irqrestore(&io_request_lock,flags);
+ if (rw_ahead){
+ kiobuf->errno = -EBUSY;
+ goto end_io;
+ }
+ req = __get_request_wait(max_req, dev);
+ spin_lock_irqsave(&io_request_lock,flags);
+
+ /* revalidate elevator */
+ head = &q->queue_head;
+ if (q->head_active && !q->plugged)
+ head = head->next;
+ }
+
+ /* fill up the request-info, and add it to the queue */
+ req->cmd = rw;
+ req->errors = 0;
+ req->sector = sector;
+ req->nr_hw_segments = 1; /* Always 1 for a new request. */
+ req->nr_sectors = count; /* Length of kiobuf */
+ req->sem = NULL;
+ req->kiobuf = kiobuf;
+ req->bh = NULL;
+ req->bhtail = NULL;
+ req->q = q;
+ /* Calculate req->buffer */
+ curr_offset = kiobuf->offset;
+ for (kioind=0; kioind<kiobuf->nr_pages; kioind++)
+ if (curr_offset >= PAGE_SIZE)
+ curr_offset -= PAGE_SIZE;
+ else
+ break;
+ req->buffer = (char *) page_address(kiobuf->maplist[kioind]) +
+ curr_offset;
+
+ /* Calculate current_nr_sectors and # of scatter gather segments needed */
+ total_bytes = kiobuf->length;
+ nr_bytes = (PAGE_SIZE - curr_offset) > total_bytes ?
+ total_bytes : (PAGE_SIZE - curr_offset);
+ req->current_nr_sectors = nr_bytes >> 9;
+
+ for (nr_seg = 1;
+ kioind<kiobuf->nr_pages && nr_bytes != total_bytes;
+ kioind++) {
+ ++nr_seg;
+ if((nr_bytes + PAGE_SIZE) > total_bytes){
+ break;
+ } else {
+ nr_bytes += PAGE_SIZE;
+ }
+ }
+ req->nr_segments = nr_seg;
+
+ add_request(q, req, head, orig_latency);
+ elevator_account_request(elevator, req);
+
+ spin_unlock_irqrestore(&io_request_lock, flags);
+
+end_io:
+ return;
+}
+
+
+
+/*
+ * Function: ll_rw_kio()
+ *
+ * Purpose: Insert kiobuf-based request into request queue.
+ *
+ * Arguments: rw - read/write
+ * kiobuf - collection of pages
+ * dev - device against which I/O requested
+ * blocknr - dev block number at which to start I/O
+ * sector - units (512B or other) of blocknr
+ * error - return status
+ *
+ * Lock status: Assumed no lock held upon entry.
+ * Assumed that the pages in the kiobuf ___ARE LOCKED DOWN___.
+ *
+ * Returns: Nothing
+ *
+ * Notes: This function is called from any subsystem using kiovec[]
+ * collection of kiobufs for I/O (e.g. `pagebufs', raw-io).
+ * Relies on "kiobuf" field in the request structure.
+ */
+void ll_rw_kio(int rw,
+ struct kiobuf *kiobuf,
+ kdev_t dev,
+ unsigned long blocknr,
+ size_t sector,
+ int *error)
+{
+ request_queue_t *q;
+ /*
+ * Only support SCSI disk for now.
+ *
+ * Return -ENOSYS to indicate the caller
+ * should try ll_rw_block()
+ * for non-SCSI (e.g. IDE) disks
+ * and for MD requests.
+ */
+ if (!SCSI_DISK_MAJOR(MAJOR(dev)) ||
+ (MAJOR(dev) == MD_MAJOR)) {
+ *error = -ENOSYS;
+ goto end_io;
+ }
+ /*
+ * Sanity checks
+ */
+ q = blk_get_queue(dev);
+ if (!q) {
+ printk(KERN_ERR
+ "ll_rw_kio: Nnonexistent block-device %s\n",
+ kdevname(dev));
+ *error = -ENODEV;
+ goto end_io;
+ }
+ if ((rw & WRITE) && is_read_only(dev)) {
+ printk(KERN_NOTICE "Can't write to read-only device %s\n",
+ kdevname(dev));
+ *error = -EPERM;
+ goto end_io;
+ }
+ if (q->make_request_fn) {
+ printk(KERN_ERR
+ "ll_rw_kio: Unexpected device [%s] queueing function encountered\n",
+ kdevname(dev));
+ *error = -ENOSYS;
+ goto end_io;
+ }
+
+ __make_kio_request(q, rw, kiobuf, dev, blocknr, sector);
+ if (kiobuf->errno != 0) {
+ *error = kiobuf->errno;
+ goto end_io;
+ }
+
+ return;
+end_io:
+ /*
+ * We come here only on an error, so just set
+ * kiobuf->errno and call the completion fn.
+ */
+ if(kiobuf->errno == 0)
+ kiobuf->errno = *error;
+}
+
+
#ifdef CONFIG_STRAM_SWAP
extern int stram_device_init (void);
#endif
@@ -1079,3 +1387,5 @@
EXPORT_SYMBOL(blk_queue_pluggable);
EXPORT_SYMBOL(blk_queue_make_request);
EXPORT_SYMBOL(generic_make_request);
+EXPORT_SYMBOL(__make_kio_request);
+EXPORT_SYMBOL(ll_rw_kio);
--- pre9.2-sct/drivers/char/raw.c Tue May 23 14:25:36 2000
+++ pre9.2-sct+mine/drivers/char/raw.c Mon May 22 19:00:09 2000
@@ -238,6 +238,63 @@
#define SECTOR_SIZE (1U << SECTOR_BITS)
#define SECTOR_MASK (SECTOR_SIZE - 1)
+/*
+ * IO completion routine for a kiobuf-based request.
+ */
+static void end_kiobuf_io_kiobuf(struct kiobuf *kiobuf)
+{
+ kiobuf->locked = 0;
+ if (atomic_dec_and_test(&kiobuf->io_count))
+ wake_up(&kiobuf->wait_queue);
+}
+
+/*
+ * Send I/O down the ll_rw_kio() path first.
+ * It is assumed that any requisite locking
+ * and unlocking of pages in the kiobuf has
+ * been taken care of by the caller.
+ *
+ * Return 0 if I/O should be retried on buffer_head path.
+ * Return number of transferred bytes if successful.
+ * Return -1 if there was an I/O error.
+ */
+static inline int try_kiobuf_io(struct kiobuf *iobuf,
+ int rw,
+ unsigned long blocknr,
+ kdev_t dev,
+ char *buf,
+ size_t sector_size)
+{
+ int err, retval;
+
+ iobuf->end_io = end_kiobuf_io_kiobuf;
+ iobuf->errno = 0;
+ iobuf->locked = 1;
+ atomic_inc(&iobuf->io_count);
+ err = 0;
+ ll_rw_kio(rw, iobuf, dev, blocknr, sector_size, &err);
+
+ if ( err == 0 ) {
+ kiobuf_wait_for_io(iobuf);
+ if (iobuf->errno == 0) {
+ retval = iobuf->length; /* Success */
+ } else {
+ retval = -1; /* I/O error */
+ }
+ } else {
+ atomic_dec(&iobuf->io_count);
+ if ( err == -ENOSYS ) {
+ retval = 0; /* Retry the buffer_head path */
+ } else {
+ retval = -1; /* I/O error */
+ }
+ }
+
+ iobuf->locked = 0;
+ return retval;
+}
+
+
ssize_t rw_raw_dev(int rw, struct file *filp, char *buf,
size_t size, loff_t *offp)
{
@@ -254,7 +311,7 @@
int sector_size, sector_bits, sector_mask;
int max_sectors;
-
+ int kiobuf_io = 1;
/*
* First, a few checks on device size limits
*/
@@ -290,17 +347,17 @@
if (err)
return err;
+ blocknr = *offp >> sector_bits;
/*
- * Split the IO into KIO_MAX_SECTORS chunks, mapping and
- * unmapping the single kiobuf as we go to perform each chunk of
- * IO.
+ * Try sending down the entire kiobuf first via ll_rw_kio().
+ * If not successful then, split the IO into KIO_MAX_SECTORS
+ * chunks, mapping and unmapping the single kiobuf as we go
+ * to perform each chunk of IO.
*/
-
- transferred = 0;
- blocknr = *offp >> sector_bits;
+ err = transferred = 0;
while (size > 0) {
blocks = size >> sector_bits;
- if (blocks > max_sectors)
+ if ((blocks > max_sectors) && (kiobuf_io == 0))
blocks = max_sectors;
if (blocks > limit - blocknr)
blocks = limit - blocknr;
@@ -318,11 +375,19 @@
if (err)
break;
#endif
-
- for (i=0; i < blocks; i++)
- b[i] = blocknr++;
-
- err = brw_kiovec(rw, 1, &iobuf, dev, b, sector_size);
+ if (kiobuf_io == 0) {
+ for (i=0; i < blocks; i++)
+ b[i] = blocknr++;
+ err = brw_kiovec(rw, 1, &iobuf, dev, b, sector_size);
+ } else {
+ err = try_kiobuf_io(iobuf, rw, blocknr, dev, buf, sector_size);
+ if ( err > 0 ) {
+ blocknr += (err >> sector_bits);
+ } else if ( err == 0 ) {
+ kiobuf_io = 0;
+ continue;
+ } /* else (err<0) => (err!=iosize); exit loop below */
+ }
if (err >= 0) {
transferred += err;
--- pre9.2-sct/drivers/scsi/scsi_lib.c Tue May 23 14:24:21 2000
+++ pre9.2-sct+mine/drivers/scsi/scsi_lib.c Tue May 23 14:42:31 2000
@@ -15,6 +15,8 @@
* a low-level driver if they wished. Note however that this file also
* contains the "default" versions of these functions, as we don't want to
* go through and retrofit queueing functions into all 30 some-odd drivers.
+ *
+ * Support for kiobuf-based I/O requests. [Chaitanya Tumuluri, chait@sgi.com]
*/
#define __NO_VERSION__
@@ -370,6 +372,161 @@
spin_unlock_irqrestore(&io_request_lock, flags);
}
+
+/*
+ * Function: __scsi_collect_bh_sectors()
+ *
+ * Purpose: Helper routine for __scsi_end_request() to mark some number
+ * (or all, if that is the case) of sectors complete.
+ *
+ * Arguments: req - request struct. from scsi command block.
+ * uptodate - 1 if I/O indicates success, 0 for I/O error.
+ * sectors - number of sectors we want to mark.
+ * leftovers- indicates if any sectors were not done.
+ *
+ * Lock status: Assumed that lock is not held upon entry.
+ *
+ * Returns: Nothing
+ *
+ * Notes: Separate buffer-head processing from kiobuf processing
+ */
+__inline static void __scsi_collect_bh_sectors(struct request *req,
+ int uptodate,
+ int sectors,
+ char **leftovers)
+{
+ struct buffer_head *bh;
+
+ do {
+ if ((bh = req->bh) != NULL) {
+ req->bh = bh->b_reqnext;
+ req->nr_sectors -= bh->b_size >> 9;
+ req->sector += bh->b_size >> 9;
+ bh->b_reqnext = NULL;
+ sectors -= bh->b_size >> 9;
+ bh->b_end_io(bh, uptodate);
+ if ((bh = req->bh) != NULL) {
+ req->current_nr_sectors = bh->b_size >> 9;
+ if (req->nr_sectors < req->current_nr_sectors) {
+ req->nr_sectors = req->current_nr_sectors;
+ printk("collect_bh: buffer-list destroyed\n");
+ }
+ }
+ }
+ } while (sectors && bh);
+
+ /* Check for leftovers */
+ if (req->bh) {
+ *leftovers = req->bh->b_data;
+ }
+ return;
+
+}
+
+
+/*
+ * Function: __scsi_collect_kio_sectors()
+ *
+ * Purpose: Helper routine for __scsi_end_request() to mark some number
+ * (or all) of the I/O sectors and attendant pages complete.
+ * Updates the request nr_segments, nr_sectors accordingly.
+ *
+ * Arguments: req - request struct. from scsi command block.
+ * uptodate - 1 if I/O indicates success, 0 for I/O error.
+ * sectors - number of sectors we want to mark.
+ * leftovers- indicates if any sectors were not done.
+ *
+ * Lock status: Assumed that lock is not held upon entry.
+ *
+ * Returns: Nothing
+ *
+ * Notes: Separate buffer-head processing from kiobuf processing.
+ * We don't know if this was a single or multi-segment sgl
+ * request. Treat it as though it were a multi-segment one.
+ */
+__inline static void __scsi_collect_kio_sectors(struct request *req,
+ int uptodate,
+ int sectors,
+ char **leftovers)
+{
+ int pgcnt, nr_pages;
+ size_t curr_offset;
+ unsigned long va = 0;
+ unsigned int nr_bytes, total_bytes, page_sectors;
+
+ nr_pages = req->kiobuf->nr_pages;
+ total_bytes = (req->nr_sectors << 9);
+ curr_offset = req->kiobuf->offset;
+
+ /*
+ * In the case of leftover requests, the kiobuf->length
+ * remains the same, but req->nr_sectors would be smaller.
+ * Adjust curr_offset in this case. If not a leftover,
+ * the following makes no difference.
+ */
+ curr_offset += (((req->kiobuf->length >> 9) - req->nr_sectors) << 9);
+
+ /* How far into the kiobuf is the offset? */
+ for (pgcnt=0; pgcnt<nr_pages; pgcnt++) {
+ if(curr_offset >= PAGE_SIZE) {
+ curr_offset -= PAGE_SIZE;
+ continue;
+ } else {
+ break;
+ }
+ }
+ /*
+ * Reusing the pgcnt and va value from above:
+ * Harvest pages to account for number of sectors
+ * passed into function.
+ */
+ for (nr_bytes = 0;
+ pgcnt<nr_pages && nr_bytes != total_bytes;
+ pgcnt++) {
+ va = page_address(req->kiobuf->maplist[pgcnt])
+ + curr_offset;
+ /* First page or final page? Partial page? */
+ if (curr_offset != 0) {
+ page_sectors = (PAGE_SIZE - curr_offset) > total_bytes ?
+ total_bytes >> 9 : (PAGE_SIZE - curr_offset) >> 9;
+ curr_offset = 0;
+ } else if((nr_bytes + PAGE_SIZE) > total_bytes) {
+ page_sectors = (total_bytes - nr_bytes) >> 9;
+ } else {
+ page_sectors = PAGE_SIZE >> 9;
+ }
+ nr_bytes += (page_sectors << 9);
+ /* Leftover sectors in this page (onward)? */
+ if (sectors < page_sectors) {
+ req->nr_sectors -= sectors;
+ req->sector += sectors;
+ req->current_nr_sectors = page_sectors - sectors;
+ va += (sectors << 9); /* Update for req->buffer */
+ sectors = 0;
+ break;
+ } else {
+ /* Mark this page as done */
+ req->nr_segments--; /* No clustering for kiobuf */
+ req->nr_sectors -= page_sectors;
+ req->sector += page_sectors;
+ if (!uptodate && (req->kiobuf->errno != 0)){
+ req->kiobuf->errno = -EIO;
+ }
+ sectors -= page_sectors;
+ }
+ }
+
+ /* Check for leftovers */
+ if (req->nr_sectors) {
+ *leftovers = (char *)va;
+ } else if (req->kiobuf->end_io) {
+ req->kiobuf->end_io(req->kiobuf);
+ }
+
+ return;
+}
+
+
/*
* Function: scsi_end_request()
*
@@ -397,7 +554,7 @@
int requeue)
{
struct request *req;
- struct buffer_head *bh;
+ char * leftovers = NULL;
ASSERT_LOCK(&io_request_lock, 0);
@@ -407,39 +564,29 @@
printk(" I/O error: dev %s, sector %lu\n",
kdevname(req->rq_dev), req->sector);
}
- do {
- if ((bh = req->bh) != NULL) {
- req->bh = bh->b_reqnext;
- req->nr_sectors -= bh->b_size >> 9;
- req->sector += bh->b_size >> 9;
- bh->b_reqnext = NULL;
- sectors -= bh->b_size >> 9;
- bh->b_end_io(bh, uptodate);
- if ((bh = req->bh) != NULL) {
- req->current_nr_sectors = bh->b_size >> 9;
- if (req->nr_sectors < req->current_nr_sectors) {
- req->nr_sectors = req->current_nr_sectors;
- printk("scsi_end_request: buffer-list destroyed\n");
- }
- }
- }
- } while (sectors && bh);
+ leftovers = NULL;
+ if (req->bh != NULL) { /* Buffer head based request */
+ __scsi_collect_bh_sectors(req, uptodate, sectors, &leftovers);
+ } else if (req->kiobuf != NULL) { /* Kiobuf based request */
+ __scsi_collect_kio_sectors(req, uptodate, sectors, &leftovers);
+ } else {
+ panic("Both bh and kiobuf pointers are unset in request!\n");
+ }
/*
* If there are blocks left over at the end, set up the command
* to queue the remainder of them.
*/
- if (req->bh) {
+ if (leftovers != NULL) {
request_queue_t *q;
- if( !requeue )
- {
+ if( !requeue ) {
return SCpnt;
}
q = &SCpnt->device->request_queue;
- req->buffer = bh->b_data;
+ req->buffer = leftovers;
/*
* Bleah. Leftovers again. Stick the leftovers in
* the front of the queue, and goose the queue again.
--- pre9.2-sct/drivers/scsi/scsi_merge.c Tue May 23 14:24:22 2000
+++ pre9.2-sct+mine/drivers/scsi/scsi_merge.c Tue May 23 14:23:29 2000
@@ -6,6 +6,7 @@
* Based upon conversations with large numbers
* of people at Linux Expo.
* Support for dynamic DMA mapping: Jakub Jelinek (jakub@redhat.com).
+ * Support for kiobuf-based I/O requests. [Chaitanya Tumuluri, chait@sgi.com]
*/
/*
@@ -90,12 +91,13 @@
printk("nr_segments is %x\n", req->nr_segments);
printk("counted segments is %x\n", segments);
printk("Flags %d %d\n", use_clustering, dma_host);
- for (bh = req->bh; bh->b_reqnext != NULL; bh = bh->b_reqnext)
- {
- printk("Segment 0x%p, blocks %d, addr 0x%lx\n",
- bh,
- bh->b_size >> 9,
- virt_to_phys(bh->b_data - 1));
+ if (req->bh != NULL) {
+ for (bh = req->bh; bh->b_reqnext != NULL; bh = bh->b_reqnext) {
+ printk("Segment 0x%p, blocks %d, addr 0x%lx\n",
+ bh,
+ bh->b_size >> 9,
+ virt_to_phys(bh->b_data - 1));
+ }
}
panic("Ththththaats all folks. Too dangerous to continue.\n");
}
@@ -298,9 +300,22 @@
SHpnt = SCpnt->host;
SDpnt = SCpnt->device;
- req->nr_segments = __count_segments(req,
- CLUSTERABLE_DEVICE(SHpnt, SDpnt),
- SHpnt->unchecked_isa_dma, NULL);
+ if (req->kiobuf) {
+ /* Since there is no clustering/merging in kiobuf
+ * requests, the nr_segments is simply a count of
+ * the number of pages needing I/O. nr_segments is
+ * updated in __scsi_collect_kio_sectors() called
+ * from scsi_end_request(), for the leftover case.
+ * [chait@sgi.com]
+ */
+ return;
+ } else if (req->bh) {
+ req->nr_segments = __count_segments(req,
+ CLUSTERABLE_DEVICE(SHpnt, SDpnt),
+ SHpnt->unchecked_isa_dma, NULL);
+ } else {
+ panic("Both kiobuf and bh pointers are NULL!");
+ }
}
#define MERGEABLE_BUFFERS(X,Y) \
@@ -745,6 +760,191 @@
MERGEREQFCT(scsi_merge_requests_fn_, 0, 0)
MERGEREQFCT(scsi_merge_requests_fn_c, 1, 0)
MERGEREQFCT(scsi_merge_requests_fn_dc, 1, 1)
+
+
+
+/*
+ * Function: scsi_bh_sgl()
+ *
+ * Purpose: Helper routine to construct S(catter) G(ather) L(ist)
+ * assuming buffer_head-based request in the Scsi_Cmnd.
+ *
+ * Arguments: SCpnt - Command descriptor
+ * use_clustering - 1 if host uses clustering
+ * dma_host - 1 if this host has ISA DMA issues (bus doesn't
+ * expose all of the address lines, so that DMA cannot
+ * be done from an arbitrary address).
+ * sgpnt - pointer to sgl
+ *
+ * Returns: Number of sg segments in the sgl.
+ *
+ * Notes: Only the SCpnt argument should be a non-constant variable.
+ * This functionality was abstracted out of the original code
+ * in __init_io().
+ */
+__inline static int scsi_bh_sgl(Scsi_Cmnd * SCpnt,
+ int use_clustering,
+ int dma_host,
+ struct scatterlist * sgpnt)
+{
+ int count;
+ struct buffer_head * bh;
+ struct buffer_head * bhprev;
+
+ bhprev = NULL;
+
+ for (count = 0, bh = SCpnt->request.bh;
+ bh; bh = bh->b_reqnext) {
+ if (use_clustering && bhprev != NULL) {
+ if (dma_host &&
+ virt_to_phys(bhprev->b_data) - 1 == ISA_DMA_THRESHOLD) {
+ /* Nothing - fall through */
+ } else if (CONTIGUOUS_BUFFERS(bhprev, bh)) {
+ /*
+ * This one is OK. Let it go. Note that we
+ * do not have the ability to allocate
+ * bounce buffer segments > PAGE_SIZE, so
+ * for now we limit the thing.
+ */
+ if( dma_host ) {
+#ifdef DMA_SEGMENT_SIZE_LIMITED
+ if( virt_to_phys(bh->b_data) - 1 < ISA_DMA_THRESHOLD
+ || sgpnt[count - 1].length + bh->b_size <= PAGE_SIZE ) {
+ sgpnt[count - 1].length += bh->b_size;
+ bhprev = bh;
+ continue;
+ }
+#else
+ sgpnt[count - 1].length += bh->b_size;
+ bhprev = bh;
+ continue;
+#endif
+ } else {
+ sgpnt[count - 1].length += bh->b_size;
+ SCpnt->request_bufflen += bh->b_size;
+ bhprev = bh;
+ continue;
+ }
+ }
+ }
+ count++;
+ sgpnt[count - 1].address = bh->b_data;
+ sgpnt[count - 1].length += bh->b_size;
+ if (!dma_host) {
+ SCpnt->request_bufflen += bh->b_size;
+ }
+ bhprev = bh;
+ }
+
+ return count;
+}
+
+
+/*
+ * Function: scsi_kio_sgl()
+ *
+ * Purpose: Helper routine to construct S(catter) G(ather) L(ist)
+ * assuming kiobuf-based request in the Scsi_Cmnd.
+ *
+ * Arguments: SCpnt - Command descriptor
+ * dma_host - 1 if this host has ISA DMA issues (bus doesn't
+ * expose all of the address lines, so that DMA cannot
+ * be done from an arbitrary address).
+ * sgpnt - pointer to sgl
+ *
+ * Returns: Number of sg segments in the sgl.
+ *
+ * Notes: Only the SCpnt argument should be a non-constant variable.
+ * This functionality was created out of __init_io() in the
+ * original implementation for constructing the sgl for
+ * kiobuf-based I/Os as well.
+ *
+ * Constructs SCpnt->use_sg sgl segments for the kiobuf.
+ *
+ * No clustering of pages is attempted unlike the buffer_head
+ * case. Primarily because the pages in a kiobuf are unlikely to
+ * be contiguous. Bears checking.
+ */
+__inline static int scsi_kio_sgl(Scsi_Cmnd * SCpnt,
+ int dma_host,
+ struct scatterlist * sgpnt)
+{
+ int pgcnt, nr_seg, curr_seg, nr_sectors;
+ size_t curr_offset;
+ unsigned long va;
+ unsigned int nr_bytes, total_bytes, sgl_seg_bytes;
+
+ curr_seg = SCpnt->use_sg; /* This many sgl segments */
+ nr_sectors = SCpnt->request.nr_sectors;
+ total_bytes = (nr_sectors << 9);
+ curr_offset = SCpnt->request.kiobuf->offset;
+
+ /*
+ * In the case of leftover requests, the kiobuf->length
+ * remains the same, but req->nr_sectors would be smaller.
+ * Use this difference to adjust curr_offset in this case.
+ * If not a leftover, the following makes no difference.
+ */
+ curr_offset += (((SCpnt->request.kiobuf->length >> 9) - nr_sectors) << 9);
+ /* How far into the kiobuf is the offset? */
+ for (pgcnt=0; pgcnt<SCpnt->request.kiobuf->nr_pages; pgcnt++) {
+ if(curr_offset >= PAGE_SIZE) {
+ curr_offset -= PAGE_SIZE;
+ continue;
+ } else {
+ break;
+ }
+ }
+ /*
+ * Reusing the pgcnt value from above:
+ * Starting at the right page and offset, build curr_seg
+ * sgl segments (one per page). Account for both a
+ * potentially partial last page and unrequired pages
+ * at the end of the kiobuf.
+ */
+ nr_bytes = 0;
+ for (nr_seg = 0; nr_seg < curr_seg; nr_seg++) {
+ va = page_address(SCpnt->request.kiobuf->maplist[pgcnt])
+ + curr_offset;
+ ++pgcnt;
+
+ /*
+ * If this is the first page, account for offset.
+ * If this the final (maybe partial) page, get remainder.
+ */
+ if (curr_offset != 0) {
+ sgl_seg_bytes = PAGE_SIZE - curr_offset;
+ curr_offset = 0;
+ } else if((nr_bytes + PAGE_SIZE) > total_bytes) {
+ sgl_seg_bytes = total_bytes - nr_bytes;
+ } else {
+ sgl_seg_bytes = PAGE_SIZE;
+ }
+
+ nr_bytes += sgl_seg_bytes;
+ sgpnt[nr_seg].address = (char *)va;
+ sgpnt[nr_seg].alt_address = 0;
+ sgpnt[nr_seg].length = sgl_seg_bytes;
+
+ if (!dma_host) {
+ SCpnt->request_bufflen += sgl_seg_bytes;
+ }
+ }
+ /* Sanity Check */
+ if ((nr_bytes > total_bytes) ||
+ (pgcnt > SCpnt->request.kiobuf->nr_pages)) {
+ printk(KERN_ERR
+ "scsi_kio_sgl: sgl bytes[%d], request bytes[%d]\n"
+ "scsi_kio_sgl: pgcnt[%d], kiobuf->pgcnt[%d]!\n",
+ nr_bytes, total_bytes, pgcnt, SCpnt->request.kiobuf->nr_pages);
+ BUG();
+ }
+ return nr_seg;
+
+}
+
+
+
/*
* Function: __init_io()
*
@@ -777,6 +977,9 @@
* gather list, the sg count in the request won't be valid
* (mainly because we don't need queue management functions
* which keep the tally uptodate.
+ *
+ * Modified to handle kiobuf argument in the SCpnt->request
+ * structure.
*/
__inline static int __init_io(Scsi_Cmnd * SCpnt,
int sg_count_valid,
@@ -784,7 +987,6 @@
int dma_host)
{
struct buffer_head * bh;
- struct buffer_head * bhprev;
char * buff;
int count;
int i;
@@ -799,11 +1001,11 @@
* needed any more. Need to play with it and see if we hit the
* panic. If not, then don't bother.
*/
- if (!SCpnt->request.bh) {
+ if ((!SCpnt->request.bh && !SCpnt->request.kiobuf) ||
+ (SCpnt->request.bh && SCpnt->request.kiobuf)) {
/*
- * Case of page request (i.e. raw device), or unlinked buffer
- * Typically used for swapping, but this isn't how we do
- * swapping any more.
+ * Case of unlinked buffer. Typically used for swapping,
+ * but this isn't how we do swapping any more.
*/
panic("I believe this is dead code. If we hit this, I was wrong");
#if 0
@@ -819,6 +1021,12 @@
req = &SCpnt->request;
/*
* First we need to know how many scatter gather segments are needed.
+ *
+ * Redundant test per comment below indicating sg_count_valid is always
+ * set to 1 (ll_rw_blk.c's estimate of req->nr_segments is always trusted).
+ *
+ * count is initialized in ll_rw_kio() for the kiobuf path and since these
+ * requests are never merged, the counts stay valid.
*/
if (!sg_count_valid) {
count = __count_segments(req, use_clustering, dma_host, NULL);
@@ -842,12 +1050,24 @@
this_count = SCpnt->request.nr_sectors;
goto single_segment;
}
+ /* Check if the size of the sgl would be greater than the size
+ * of the host sgl table; in that case, limit the sgl size.
+ * When the request sectors are harvested after completion of
+ * I/O in __scsi_collect_kio_sectors, the additional sectors
+ * will be reinjected into the request queue as a special cmd.
+ * This will be done till all the request sectors are done.
+ * [chait@sgi.com]
+ */
+ if((SCpnt->request.kiobuf != NULL) &&
+ (count > SCpnt->host->sg_tablesize)) {
+ count = SCpnt->host->sg_tablesize - 1;
+ }
SCpnt->use_sg = count;
-
/*
* Allocate the actual scatter-gather table itself.
* scsi_malloc can only allocate in chunks of 512 bytes
*/
+
SCpnt->sglist_len = (SCpnt->use_sg
* sizeof(struct scatterlist) + 511) & ~511;
@@ -872,51 +1092,14 @@
memset(sgpnt, 0, SCpnt->use_sg * sizeof(struct scatterlist));
SCpnt->request_buffer = (char *) sgpnt;
SCpnt->request_bufflen = 0;
- bhprev = NULL;
- for (count = 0, bh = SCpnt->request.bh;
- bh; bh = bh->b_reqnext) {
- if (use_clustering && bhprev != NULL) {
- if (dma_host &&
- virt_to_phys(bhprev->b_data) - 1 == ISA_DMA_THRESHOLD) {
- /* Nothing - fall through */
- } else if (CONTIGUOUS_BUFFERS(bhprev, bh)) {
- /*
- * This one is OK. Let it go. Note that we
- * do not have the ability to allocate
- * bounce buffer segments > PAGE_SIZE, so
- * for now we limit the thing.
- */
- if( dma_host ) {
-#ifdef DMA_SEGMENT_SIZE_LIMITED
- if( virt_to_phys(bh->b_data) - 1 < ISA_DMA_THRESHOLD
- || sgpnt[count - 1].length + bh->b_size <= PAGE_SIZE ) {
- sgpnt[count - 1].length += bh->b_size;
- bhprev = bh;
- continue;
- }
-#else
- sgpnt[count - 1].length += bh->b_size;
- bhprev = bh;
- continue;
-#endif
- } else {
- sgpnt[count - 1].length += bh->b_size;
- SCpnt->request_bufflen += bh->b_size;
- bhprev = bh;
- continue;
- }
- }
- }
- count++;
- sgpnt[count - 1].address = bh->b_data;
- sgpnt[count - 1].length += bh->b_size;
- if (!dma_host) {
- SCpnt->request_bufflen += bh->b_size;
- }
- bhprev = bh;
+ if (SCpnt->request.bh){
+ count = scsi_bh_sgl(SCpnt, use_clustering, dma_host, sgpnt);
+ } else if (SCpnt->request.kiobuf) {
+ count = scsi_kio_sgl(SCpnt, dma_host, sgpnt);
+ } else {
+ panic("Yowza! Both kiobuf and buffer_head pointers are null!");
}
-
/*
* Verify that the count is correct.
*/
@@ -1009,6 +1192,17 @@
scsi_free(SCpnt->request_buffer, SCpnt->sglist_len);
/*
+ * Shouldn't ever get here for a kiobuf request.
+ *
+ * Since each segment is a page and also, we couldn't
+ * allocate bounce buffers for even the first page,
+ * this means that the DMA buffer pool is exhausted!
+ */
+ if (SCpnt->request.kiobuf){
+ dma_exhausted(SCpnt, 0);
+ }
+
+ /*
* Make an attempt to pick up as much as we reasonably can.
* Just keep adding sectors until the pool starts running kind of
* low. The limit of 30 is somewhat arbitrary - the point is that
@@ -1043,7 +1237,6 @@
* segment. Possibly the entire request, or possibly a small
* chunk of the entire request.
*/
- bh = SCpnt->request.bh;
buff = SCpnt->request.buffer;
if (dma_host) {
@@ -1052,7 +1245,7 @@
* back and allocate a really small one - enough to satisfy
* the first buffer.
*/
- if (virt_to_phys(SCpnt->request.bh->b_data)
+ if (virt_to_phys(SCpnt->request.buffer)
+ (this_count << 9) - 1 > ISA_DMA_THRESHOLD) {
buff = (char *) scsi_malloc(this_count << 9);
if (!buff) {
@@ -1152,3 +1345,21 @@
SDpnt->scsi_init_io_fn = scsi_init_io_vdc;
}
}
+/*
+ * Overrides for Emacs so that we almost follow Linus's tabbing style.
+ * Emacs will notice this stuff at the end of the file and automatically
+ * adjust the settings for this buffer only. This must remain at the end
+ * of the file.
+ * ---------------------------------------------------------------------------
+ * Local variables:
+ * c-indent-level: 4
+ * c-brace-imaginary-offset: 0
+ * c-brace-offset: -4
+ * c-argdecl-indent: 4
+ * c-label-offset: -4
+ * c-continued-statement-offset: 4
+ * c-continued-brace-offset: 0
+ * indent-tabs-mode: nil
+ * tab-width: 8
+ * End:
+ */
--- pre9.2-sct/drivers/scsi/sd.c Tue May 23 14:24:21 2000
+++ pre9.2-sct+mine/drivers/scsi/sd.c Mon May 22 17:53:29 2000
@@ -546,6 +546,7 @@
static void rw_intr(Scsi_Cmnd * SCpnt)
{
int result = SCpnt->result;
+
#if CONFIG_SCSI_LOGGING
char nbuff[6];
#endif
@@ -575,8 +576,14 @@
(SCpnt->sense_buffer[4] << 16) |
(SCpnt->sense_buffer[5] << 8) |
SCpnt->sense_buffer[6];
- if (SCpnt->request.bh != NULL)
- block_sectors = SCpnt->request.bh->b_size >> 9;
+
+ /* Tweak to support kiobuf-based I/O requests, [chait@sgi.com] */
+ if (SCpnt->request.kiobuf != NULL)
+ block_sectors = SCpnt->request.kiobuf->length >> 9;
+ else if (SCpnt->request.bh != NULL)
+ block_sectors = SCpnt->request.bh->b_size >> 9;
+ else
+ panic("Both kiobuf and bh pointers are null!\n");
switch (SCpnt->device->sector_size) {
case 1024:
error_sector <<= 1;
--- pre9.2-sct/include/linux/blkdev.h Tue May 23 14:24:35 2000
+++ pre9.2-sct+mine/include/linux/blkdev.h Tue May 23 13:48:35 2000
@@ -6,6 +6,7 @@
#include <linux/genhd.h>
#include <linux/tqueue.h>
#include <linux/list.h>
+#include <linux/iobuf.h>
struct request_queue;
typedef struct request_queue request_queue_t;
@@ -39,6 +40,7 @@
void * special;
char * buffer;
struct semaphore * sem;
+ struct kiobuf * kiobuf;
struct buffer_head * bh;
struct buffer_head * bhtail;
request_queue_t * q;
--- pre9.2-sct/include/linux/elevator.h Tue May 23 14:24:36 2000
+++ pre9.2-sct+mine/include/linux/elevator.h Mon May 22 19:05:15 2000
@@ -107,7 +107,12 @@
elevator->sequence++;
if (req->cmd == READ)
elevator->read_pendings++;
- elevator->nr_segments++;
+
+ if (req->kiobuf != NULL) {
+ elevator->nr_segments += req->nr_segments;
+ } else {
+ elevator->nr_segments++;
+ }
}
static inline int elevator_request_latency(elevator_t * elevator, int rw)
--- pre9.2-sct/include/linux/fs.h Tue May 23 14:24:34 2000
+++ pre9.2-sct+mine/include/linux/fs.h Mon May 22 17:56:47 2000
@@ -1063,6 +1063,7 @@
extern struct buffer_head * get_hash_table(kdev_t, int, int);
extern struct buffer_head * getblk(kdev_t, int, int);
extern void ll_rw_block(int, int, struct buffer_head * bh[]);
+extern void ll_rw_kio(int , struct kiobuf *, kdev_t, unsigned long, size_t, int *);
extern int is_read_only(kdev_t);
extern void __brelse(struct buffer_head *);
static inline void brelse(struct buffer_head *buf)
--- pre9.2-sct/include/linux/iobuf.h Tue May 23 14:25:30 2000
+++ pre9.2-sct+mine/include/linux/iobuf.h Mon May 22 18:01:30 2000
@@ -56,6 +56,7 @@
atomic_t io_count; /* IOs still in progress */
int errno; /* Status of completed IO */
void (*end_io) (struct kiobuf *); /* Completion callback */
+ void *k_dev_id; /* Store kiovec (or pagebuf) here */
wait_queue_head_t wait_queue;
};
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/