
Commit 6c51038

Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
  Documentation/iostats.txt: bit-size reference etc.
  cfq-iosched: removing unnecessary think time checking
  cfq-iosched: Don't clear queue stats when preempt.
  blk-throttle: Reset group slice when limits are changed
  blk-cgroup: Only give unaccounted_time under debug
  cfq-iosched: Don't set active queue in preempt
  block: fix non-atomic access to genhd inflight structures
  block: attempt to merge with existing requests on plug flush
  block: NULL dereference on error path in __blkdev_get()
  cfq-iosched: Don't update group weights when on service tree
  fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
  block: Require subsystems to explicitly allocate bio_set integrity mempool
  jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
  jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
  fs: make fsync_buffers_list() plug
  mm: make generic_writepages() use plugging
  blk-cgroup: Add unaccounted time to timeslice_used.
  block: fixup plugging stubs for !CONFIG_BLOCK
  block: remove obsolete comments for blkdev_issue_zeroout.
  blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
  ...

Fix up conflicts in fs/{aio.c,super.c}
2 parents: 3dab04e + 9d2e157
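
Several of the merged commits (the jbd/jbd2 conversions, "fs: make fsync_buffers_list() plug", "mm: make generic_writepages() use plugging") move callers onto the explicit on-stack plugging API that this branch introduces. A rough sketch of the pattern, assuming the 2.6.39 blk_start_plug()/blk_finish_plug() interface; the submitter function and its arguments are hypothetical:

#include <linux/blkdev.h>
#include <linux/bio.h>

/*
 * Hypothetical submitter illustrating explicit plugging: bios issued
 * between blk_start_plug() and blk_finish_plug() are held on the
 * on-stack plug list, where merges can be attempted, and are flushed
 * when the plug is finished or the task schedules.
 */
static void submit_many_bios(struct bio **bios, int nr)
{
	struct blk_plug plug;
	int i;

	blk_start_plug(&plug);
	for (i = 0; i < nr; i++)
		submit_bio(READ, bios[i]);
	blk_finish_plug(&plug);
}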

File tree: 172 files changed, +1520 -2112 lines changed


Documentation/block/biodoc.txt (-5 lines)

@@ -963,11 +963,6 @@ elevator_dispatch_fn* fills the dispatch queue with ready requests.
 
 elevator_add_req_fn*		called to add a new request into the scheduler
 
-elevator_queue_empty_fn	returns true if the merge queue is empty.
-			Drivers shouldn't use this, but rather check
-			if elv_next_request is NULL (without losing the
-			request if one exists!)
-
 elevator_former_req_fn
 elevator_latter_req_fn	These return the request before or after the
 			one specified in disk sort order. Used by the
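
The removed paragraph told drivers to check whether the next request is NULL rather than call a "queue empty" hook. In the request-based model of this kernel series that check is folded into the fetch loop itself; a minimal sketch of such a driver loop, assuming the blk_fetch_request() helper and a hypothetical request function:

#include <linux/blkdev.h>

/*
 * Sketch of a driver request function: blk_fetch_request() returns
 * NULL when no request is ready, so no separate "queue empty" hook is
 * needed and a pending request is not lost.
 */
static void example_request_fn(struct request_queue *q)
{
	struct request *rq;

	while ((rq = blk_fetch_request(q)) != NULL) {
		/* process rq here, then complete it */
		__blk_end_request_all(rq, 0);
	}
}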

Documentation/cgroups/blkio-controller.txt (+1, -29 lines)

@@ -140,7 +140,7 @@ Proportional weight policy files
 	- Specifies per cgroup weight. This is default weight of the group
 	  on all the devices until and unless overridden by per device rule.
 	  (See blkio.weight_device).
-	  Currently allowed range of weights is from 100 to 1000.
+	  Currently allowed range of weights is from 10 to 1000.
 
 - blkio.weight_device
 	- One can specify per cgroup per device rules using this interface.

@@ -343,34 +343,6 @@ Common files among various policies
 
 CFQ sysfs tunable
 =================
-/sys/block/<disk>/queue/iosched/group_isolation
------------------------------------------------
-
-If group_isolation=1, it provides stronger isolation between groups at the
-expense of throughput. By default group_isolation is 0. In general that
-means that if group_isolation=0, expect fairness for sequential workload
-only. Set group_isolation=1 to see fairness for random IO workload also.
-
-Generally CFQ will put random seeky workload in sync-noidle category. CFQ
-will disable idling on these queues and it does a collective idling on group
-of such queues. Generally these are slow moving queues and if there is a
-sync-noidle service tree in each group, that group gets exclusive access to
-disk for certain period. That means it will bring the throughput down if
-group does not have enough IO to drive deeper queue depths and utilize disk
-capacity to the fullest in the slice allocated to it. But the flip side is
-that even a random reader should get better latencies and overall throughput
-if there are lots of sequential readers/sync-idle workload running in the
-system.
-
-If group_isolation=0, then CFQ automatically moves all the random seeky queues
-in the root group. That means there will be no service differentiation for
-that kind of workload. This leads to better throughput as we do collective
-idling on root sync-noidle tree.
-
-By default one should run with group_isolation=0. If that is not sufficient
-and one wants stronger isolation between groups, then set group_isolation=1
-but this will come at cost of reduced throughput.
-
 /sys/block/<disk>/queue/iosched/slice_idle
 ------------------------------------------
 On a faster hardware CFQ can be slow, especially with sequential workload.
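
The weight floor drops from 100 to 10 here, matching the BLKIO_WEIGHT_MIN change in block/blk-cgroup.h below. A hedged userspace sketch of setting a now-legal low weight, where the cgroup mount point and group name are illustrative assumptions:

#include <stdio.h>
#include <stdlib.h>

/*
 * Write a per-cgroup weight; values outside [10, 1000] are rejected
 * by the kernel after this series.  The path below is an example, not
 * a fixed location.
 */
int main(void)
{
	FILE *f = fopen("/cgroup/blkio/low/blkio.weight", "w");

	if (!f) {
		perror("open blkio.weight");
		return EXIT_FAILURE;
	}
	fprintf(f, "%d\n", 10);	/* minimum allowed weight after this change */
	if (fclose(f) != 0) {
		perror("write blkio.weight");
		return EXIT_FAILURE;
	}
	return EXIT_SUCCESS;
}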

Documentation/iostats.txt (+8, -9 lines)

@@ -1,8 +1,6 @@
 I/O statistics fields
 ---------------
 
-Last modified Sep 30, 2003
-
 Since 2.4.20 (and some versions before, with patches), and 2.5.45,
 more extensive disk statistics have been introduced to help measure disk
 activity. Tools such as sar and iostat typically interpret these and do

@@ -46,11 +44,12 @@ the above example, the first field of statistics would be 446216.
 By contrast, in 2.6 if you look at /sys/block/hda/stat, you'll
 find just the eleven fields, beginning with 446216. If you look at
 /proc/diskstats, the eleven fields will be preceded by the major and
-minor device numbers, and device name. Each of these formats provide
+minor device numbers, and device name. Each of these formats provides
 eleven fields of statistics, each meaning exactly the same things.
 All fields except field 9 are cumulative since boot. Field 9 should
-go to zero as I/Os complete; all others only increase. Yes, these are
-32 bit unsigned numbers, and on a very busy or long-lived system they
+go to zero as I/Os complete; all others only increase (unless they
+overflow and wrap). Yes, these are (32-bit or 64-bit) unsigned long
+(native word size) numbers, and on a very busy or long-lived system they
 may wrap. Applications should be prepared to deal with that; unless
 your observations are measured in large numbers of minutes or hours,
 they should not wrap twice before you notice them.

@@ -96,11 +95,11 @@ introduced when changes collide, so (for instance) adding up all the
 read I/Os issued per partition should equal those made to the disks ...
 but due to the lack of locking it may only be very close.
 
-In 2.6, there are counters for each cpu, which made the lack of locking
-almost a non-issue. When the statistics are read, the per-cpu counters
-are summed (possibly overflowing the unsigned 32-bit variable they are
+In 2.6, there are counters for each CPU, which make the lack of locking
+almost a non-issue. When the statistics are read, the per-CPU counters
+are summed (possibly overflowing the unsigned long variable they are
 summed to) and the result given to the user. There is no convenient
-user interface for accessing the per-cpu counters themselves.
+user interface for accessing the per-CPU counters themselves.
 
 Disks vs Partitions
 -------------------
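
The patched text tells applications to be prepared for wrap of these cumulative counters. A minimal sketch of wrap-tolerant delta computation in a monitoring tool, assuming two samples of one counter at the counter's exported width (32-bit shown for illustration):

#include <stdint.h>

/*
 * Delta between two samples of a monotonically increasing counter.
 * Unsigned modular arithmetic makes (cur - prev) correct even if the
 * counter wrapped once between samples; a double wrap between samples
 * is undetectable, as the document notes.
 */
static uint32_t counter_delta(uint32_t prev, uint32_t cur)
{
	return cur - prev;	/* correct across a single wrap */
}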

block/blk-cgroup.c (+15, -1 lines)

@@ -371,12 +371,14 @@ void blkiocg_update_io_remove_stats(struct blkio_group *blkg,
 }
 EXPORT_SYMBOL_GPL(blkiocg_update_io_remove_stats);
 
-void blkiocg_update_timeslice_used(struct blkio_group *blkg, unsigned long time)
+void blkiocg_update_timeslice_used(struct blkio_group *blkg, unsigned long time,
+				unsigned long unaccounted_time)
 {
 	unsigned long flags;
 
 	spin_lock_irqsave(&blkg->stats_lock, flags);
 	blkg->stats.time += time;
+	blkg->stats.unaccounted_time += unaccounted_time;
 	spin_unlock_irqrestore(&blkg->stats_lock, flags);
 }
 EXPORT_SYMBOL_GPL(blkiocg_update_timeslice_used);

@@ -604,6 +606,9 @@ static uint64_t blkio_get_stat(struct blkio_group *blkg,
 		return blkio_fill_stat(key_str, MAX_KEY_LEN - 1,
 					blkg->stats.sectors, cb, dev);
 #ifdef CONFIG_DEBUG_BLK_CGROUP
+	if (type == BLKIO_STAT_UNACCOUNTED_TIME)
+		return blkio_fill_stat(key_str, MAX_KEY_LEN - 1,
+					blkg->stats.unaccounted_time, cb, dev);
 	if (type == BLKIO_STAT_AVG_QUEUE_SIZE) {
 		uint64_t sum = blkg->stats.avg_queue_size_sum;
 		uint64_t samples = blkg->stats.avg_queue_size_samples;

@@ -1125,6 +1130,9 @@ static int blkiocg_file_read_map(struct cgroup *cgrp, struct cftype *cft,
 		return blkio_read_blkg_stats(blkcg, cft, cb,
 						BLKIO_STAT_QUEUED, 1);
 #ifdef CONFIG_DEBUG_BLK_CGROUP
+	case BLKIO_PROP_unaccounted_time:
+		return blkio_read_blkg_stats(blkcg, cft, cb,
+						BLKIO_STAT_UNACCOUNTED_TIME, 0);
 	case BLKIO_PROP_dequeue:
 		return blkio_read_blkg_stats(blkcg, cft, cb,
 						BLKIO_STAT_DEQUEUE, 0);

@@ -1382,6 +1390,12 @@ struct cftype blkio_files[] = {
 					BLKIO_PROP_dequeue),
 		.read_map = blkiocg_file_read_map,
 	},
+	{
+		.name = "unaccounted_time",
+		.private = BLKIOFILE_PRIVATE(BLKIO_POLICY_PROP,
+				BLKIO_PROP_unaccounted_time),
+		.read_map = blkiocg_file_read_map,
+	},
 #endif
 };
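
With the extra parameter, a caller now reports both the slice time charged to the group and the portion that should not be charged. A hedged sketch of a caller of the new signature (how CFQ actually derives the two values is not shown here; the wrapper is illustrative):

/*
 * Illustrative caller: "used" is added to the group's time statistic,
 * "unaccounted" only surfaces in the debug unaccounted_time file.
 */
static void example_charge_slice(struct blkio_group *blkg,
				 unsigned long used,
				 unsigned long unaccounted)
{
	blkiocg_update_timeslice_used(blkg, used, unaccounted);
}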

block/blk-cgroup.h (+11, -3 lines)

@@ -49,6 +49,8 @@ enum stat_type {
 	/* All the single valued stats go below this */
 	BLKIO_STAT_TIME,
 	BLKIO_STAT_SECTORS,
+	/* Time not charged to this cgroup */
+	BLKIO_STAT_UNACCOUNTED_TIME,
 #ifdef CONFIG_DEBUG_BLK_CGROUP
 	BLKIO_STAT_AVG_QUEUE_SIZE,
 	BLKIO_STAT_IDLE_TIME,

@@ -81,6 +83,7 @@ enum blkcg_file_name_prop {
 	BLKIO_PROP_io_serviced,
 	BLKIO_PROP_time,
 	BLKIO_PROP_sectors,
+	BLKIO_PROP_unaccounted_time,
 	BLKIO_PROP_io_service_time,
 	BLKIO_PROP_io_wait_time,
 	BLKIO_PROP_io_merged,

@@ -114,6 +117,8 @@ struct blkio_group_stats {
 	/* total disk time and nr sectors dispatched by this group */
 	uint64_t time;
 	uint64_t sectors;
+	/* Time not charged to this cgroup */
+	uint64_t unaccounted_time;
 	uint64_t stat_arr[BLKIO_STAT_QUEUED + 1][BLKIO_STAT_TOTAL];
 #ifdef CONFIG_DEBUG_BLK_CGROUP
 	/* Sum of number of IOs queued across all samples */

@@ -240,7 +245,7 @@ static inline char *blkg_path(struct blkio_group *blkg) { return NULL; }
 
 #endif
 
-#define BLKIO_WEIGHT_MIN	100
+#define BLKIO_WEIGHT_MIN	10
 #define BLKIO_WEIGHT_MAX	1000
 #define BLKIO_WEIGHT_DEFAULT	500
 
@@ -293,7 +298,8 @@ extern int blkiocg_del_blkio_group(struct blkio_group *blkg);
 extern struct blkio_group *blkiocg_lookup_group(struct blkio_cgroup *blkcg,
 						void *key);
 void blkiocg_update_timeslice_used(struct blkio_group *blkg,
-					unsigned long time);
+					unsigned long time,
+					unsigned long unaccounted_time);
 void blkiocg_update_dispatch_stats(struct blkio_group *blkg, uint64_t bytes,
 					bool direction, bool sync);
 void blkiocg_update_completion_stats(struct blkio_group *blkg,

@@ -319,7 +325,9 @@ blkiocg_del_blkio_group(struct blkio_group *blkg) { return 0; }
 static inline struct blkio_group *
 blkiocg_lookup_group(struct blkio_cgroup *blkcg, void *key) { return NULL; }
 static inline void blkiocg_update_timeslice_used(struct blkio_group *blkg,
-				unsigned long time) {}
+				unsigned long time,
+				unsigned long unaccounted_time)
+{}
 static inline void blkiocg_update_dispatch_stats(struct blkio_group *blkg,
 				uint64_t bytes, bool direction, bool sync) {}
 static inline void blkiocg_update_completion_stats(struct blkio_group *blkg,
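
The lower BLKIO_WEIGHT_MIN means a weight written to blkio.weight is now validated against [10, 1000] rather than [100, 1000]. A minimal sketch of that range check against the macros above (the function name is hypothetical; the real check lives in the weight-file write path in blk-cgroup.c):

#include <linux/errno.h>

/* Hypothetical range check using the updated weight bounds. */
static int example_validate_weight(unsigned int val)
{
	if (val < BLKIO_WEIGHT_MIN || val > BLKIO_WEIGHT_MAX)
		return -EINVAL;
	return 0;
}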
