
Re: new WRR sched, updated TBF and "real" ingress implementation




Sorry, couldn't respond sooner, part of that survivor island ;->
(now watch me disappear again ;->)

On Mon, 14 Aug 2000 devik@cdi.cz wrote:

> Ahhh.. Just now I took a look at 8390.c (used by ne2000) and there is a
> function:
> ei_tx_intr(struct device *dev), which is called after packet completion
> and contains a mark_bh(NET_BH) call. BHs are then run at the end of the ISR, so
> net_bh should be called soon after a packet finishes.

A bit out of the ordinary. Do you know anyone else doing this?
So in addition to input interrupts, they also generate a tx_complete
interrupt per packet? Doesn't sound very sensible, unless there is another
reason hidden in this particular case.
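
For reference, the chain being described looks roughly like this (a sketch
from memory of the 2.2-era code, names approximate -- don't hold me to the
exact lines):

    /* 8390-style driver: the tx-complete interrupt just marks NET_BH,
     * which is run at the end of the ISR. */
    static void ei_tx_intr(struct device *dev)
    {
        /* ... ack the tx interrupt, account for the sent packet ... */
        mark_bh(NET_BH);
    }

    /* net_bh runs the device output queues before touching the input
     * backlog, so a qdisc dequeue attempt follows shortly after every
     * packet completion (in addition to the input-driven kicks). */
    void net_bh(void)
    {
        if (qdisc_head.forw != &qdisc_head)
            qdisc_run_queues();   /* ends up calling the qdisc's dequeue() */
        /* ... then deliver the input backlog ... */
    }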

> And thus dequeue event too.

Not really. It _might_ initiate a packet dequeue if there was a packet
enqueued (otherwise it is 'wasted effort' -- it's like the chicken and the
egg: which came first).
But yes, if you had a way to have everything 'pulled' by the tx_complete
interrupt only, then you are set. The above scenario obviously doesn't make
sense (two interrupts per packet, and how do you bootstrap it, etc.).

> But you are right that there are too many factors which can delay the
> dequeue event from the previous packet's finish.
> 
> Let's talk about the current CBQ impl. Let's assume it is correct.
> I spent some time trying to understand ANK's time integrator. Now I'm
> nearly sure that it is used to minimize errors in the EOS times.
> These errors are introduced by queuing inside of the drivers.

Errors are not introduced by queueing inside the drivers in this case,
but rather because you can't do accurate shaping, without approximations,
with variable-size packets. Errors are caused by deviations of estimated
interpacket departures vs. real departure times, deviations of estimated
average packet size vs. real packet size, etc.
Like I said earlier, it is easy to shape for cells: you can make them
depart at predefined interpacket times and you don't have to worry about
packet size variance.
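
To put a number on it, here is a throwaway user-space toy (not kernel code;
the 1 Mbit/s rate and 576-byte "average" are just my picks for illustration)
showing how much the per-packet wire time swings with length, which is
exactly the error a shaper budgeting with an average packet size picks up:

    #include <stdio.h>

    int main(void)
    {
        const double rate = 1e6;               /* target rate: 1 Mbit/s (my pick) */
        const double avg  = 576.0;             /* assumed "average" packet, bytes (my pick) */
        const int lens[]  = { 64, 576, 1500 };
        int i;

        for (i = 0; i < 3; i++) {
            double real = lens[i] * 8 / rate * 1e3;  /* real wire time, ms */
            double est  = avg * 8 / rate * 1e3;      /* time budgeted from the average, ms */
            printf("%4d bytes: real %.2f ms, budgeted %.2f ms, error %+.2f ms\n",
                   lens[i], real, est, est - real);
        }
        return 0;
    }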
 
> The q->now time is always max(real_time, expected_send_time).
> So q->now == q->now_rt as long as dequeue events come in at the
> HW device's bandwidth (at most). When they come faster (because
> empty driver queues are being filled), q->now > q->now_rt, to ensure
> that q->now's rate doesn't exceed the HW rate.
> After some time q->now will catch up with q->now_rt again.
> OK, so q->now is a somewhat corrected time of the dequeue event. But
> it still IS based on the dequeue event! Yet ANK calls it the EOS time
> and uses it as the EOS time.

It is used as the EOS mostly because you can't determine when the last bit of
the packet hit the wire (i.e. left the NIC).
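
For what it's worth, the integrator behaviour you describe boils down to
something like this toy model (my paraphrase of the idea, NOT a copy of
ANK's sch_cbq.c):

    /* "now" is the artificial clock (estimated EOS times),
     * "now_rt" is the last observed real time. */
    struct integ {
        double now;      /* artificial clock: estimated EOS times */
        double now_rt;   /* last observed real time */
        double rate;     /* link rate, bytes/sec */
    };

    /* called on every dequeue event; prev_len is the previous packet's size */
    static void dequeue_event(struct integ *q, double real_time, int prev_len)
    {
        double incr  = real_time - q->now_rt;   /* real time elapsed */
        double incr2 = prev_len / q->rate;      /* expected wire time of prev packet */

        q->now += incr2;                        /* warp the artificial clock */
        incr -= incr2;
        if (incr < 0)
            incr = 0;                           /* clamp: never exceed the HW rate */
        q->now += incr;                         /* otherwise track real time */
        q->now_rt = real_time;
    }

When dequeue events arrive no faster than the HW rate, the clamp never fires
and q->now simply tracks real time; when they come faster (driver queue
filling up), q->now advances at exactly the link rate and runs ahead of
q->now_rt, which is the max(real_time, expected_send_time) behaviour you
quote.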

> Yes, it is a workaround (as mentioned in ANK's note) but he also
> said that "it is very close to the ideal solution".

Under moderate to high input loads, you are very close to the ideal
solution. Imagine a 1Mbps input: arriving packets keep kicking net_bh, so
dequeue events come frequently.

> Moreover, the dequeue could be called more often than EOSes occur,
> because net_bh is activated by all net drivers, by timers and
> by packet receives. But calling the qdisc's dequeue is bounded
> by the dev->tbusy test, so the mis-dequeues should not be
> a big problem.
> So when you say that I can't use the dequeue event as the EOS time,
> why does ANK's cbq code use it??

Under constant packet arrival there is almost no issue: the dequeue event
is almost good, because it is kicked by arriving packets.
Under low loads you have to resort to the timer/HZ granularity, and the
dequeue event is a bad thing to trust. CBQ makes that assumption (as you
point out), which is quite valid.
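
Back-of-the-envelope on why the timer is a bad crutch (user-space toy,
HZ=100 being the i386 default at the time):

    #include <stdio.h>

    int main(void)
    {
        const double hz   = 100.0;           /* i386 default HZ back then */
        const double tick = 1.0 / hz;        /* one jiffy = 10 ms */
        const double rates[] = { 64e3, 1e6, 10e6 };  /* bit/s */
        int i;

        for (i = 0; i < 3; i++)
            printf("%8.0f bit/s: one tick is worth %.0f bytes of wire time\n",
                   rates[i], rates[i] * tick / 8);
        return 0;
    }

At a few Mbit/s and up, one tick is already one or more MTU-size packets
worth of uncertainty about when the previous packet really finished, so a
timer-driven dequeue event tells you very little.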

> Ahh yes. If I understood you correctly, TBF with the peak rate set still
> has a problem. The peak bucket size is set to the MTU; when two smaller
> packets arrive, they are still sent at wire speed, aren't they?
> 

No.
If you specify the peak rate, then there is further bounding "in the short
term" by the peak rate; otherwise that idle-flow problem I described
earlier hits you and you send at wire speed. Over the long term you will
still be sending at the specified rate.
The peak bucket is not restricted to the MTU. In fact, by default it is 2KB,
but you can change this.
In summary: "over a long period of time" you will see a flow rate
equivalent to the rate you requested. In the short term, you will see
what the peak rate defines. If you don't define the peak rate, then in the
short term you'll see wire-rate output, i.e. the peak rate becomes the
wire rate. That is not to say that at bit-times the packets are not sent at
wire speed. Not sure whether this helped or confused you further.
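
If it helps, the mechanism is just two token buckets that both have to
agree (toy paraphrase below, not a copy of sch_tbf.c, and the names are
mine):

    /* A packet may go only when BOTH buckets hold enough tokens: the
     * rate bucket (deep, refilled at the configured rate) bounds the
     * long-term rate, the peak bucket (shallow, e.g. that 2KB default,
     * refilled at the peak rate) bounds the short-term rate. */
    struct bucket {
        double tokens;   /* current fill, bytes */
        double depth;    /* maximum fill (burst size), bytes */
        double rate;     /* refill rate, bytes/sec */
    };

    static void refill(struct bucket *b, double dt)
    {
        b->tokens += b->rate * dt;
        if (b->tokens > b->depth)
            b->tokens = b->depth;
    }

    /* dt = seconds since the last check; returns 1 if 'len' bytes may go now */
    static int tbf_ok(struct bucket *rate, struct bucket *peak, double dt, int len)
    {
        refill(rate, dt);
        refill(peak, dt);
        if (rate->tokens < len || peak->tokens < len)
            return 0;            /* one of the buckets is short: wait */
        rate->tokens -= len;
        peak->tokens -= len;
        return 1;
    }

With no peak bucket the second test disappears, and a full rate bucket
drains at wire speed -- which is the short-term wire-rate behaviour
described above.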

Werner, I think we need to update the doc ;->

cheers,
jamal