Re: new WRR sched, updated TBF and "real" ingress implementation



Hello Jamal! Let's continue our discussion. Hopefully this thread
can help anyone who wants to learn the internals of QoS and Linux
networking ;-).

> > > Did you limit the peak rate as well on TBF? I think it would work nicely
> > > if you did. In fact I think peak rate control would help a lot.
> >
> > not now, probably I will in the near future
> >
> 
> If you are using TBF as is, it is a config option.

Yes, but because I only used the TBF estimator inside of CBQ,
I had to reimplement it. I have implemented only one rate
limiter so far, mainly because I reused the existing CBQ code,
which has only one rate per class.
But I will probably reimplement it as a completely new
qdisc with an independent tc command set.

> At high input rates this is true. Dequeueing will also be caused/kicked
> by arriving packets; under low arrival rates you are dependent mostly
> on the timer. At Hz levels, this is not a very precise thing.
> Definitely tx-complete interrupts forcing a dequeue at this point would
> help, but it will screw up too much of the existing infrastructure; it's
> not worth it.

Ahhh.. I just took a look at 8390.c (used by ne2000) and there is a
function ei_tx_intr(struct device *dev) which is called on TX
completion, and it contains a mark_bh(NET_BH) call. BHs are then run
at the end of the ISR, so net_bh should be called soon after a packet
finishes, and thus a dequeue event should follow too.
But you are right that there are too many factors which can delay the
dequeue event relative to the previous packet's completion.
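For anyone following along, this is how I read that path; a condensed,
from-memory sketch of the 2.2-era 8390.c, not the verbatim code (error
and statistics handling and the double-buffer logic are omitted):

static void ei_tx_intr(struct device *dev)
{
        /* ... acknowledge the TX interrupt, update TX statistics ... */

        dev->tbusy = 0;     /* the device can accept another packet */
        mark_bh(NET_BH);    /* schedule net_bh; it runs when the ISR
                             * exits and re-runs the device queues,
                             * so the next qdisc dequeue follows soon
                             * after the packet really left the wire */
}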

Let's talk about the current CBQ implementation, and let's assume it
is correct. I spent some time understanding ANK's time integrator, and
now I'm fairly sure that it is used to minimize errors in EOS
(end-of-send) times. These errors are introduced by queueing inside
the drivers.
The q->now time is always max(real_time, expected_send_time), so
q->now == q->now_rt as long as dequeue events arrive at the HW
device's bandwidth (at most). When they come faster (because the
driver's empty queue is being filled), q->now > q->now_rt, which
ensures that q->now never advances faster than the HW rate. After
some time q->now catches up with q->now_rt again.
OK, so q->now is a somewhat corrected time of the dequeue event. But
it is still based on the dequeue event! Yet ANK calls it the EOS time
and uses it as the EOS time.
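To make that concrete, here is how I read the integrator in
cbq_dequeue(); a paraphrased sketch of sch_cbq.c, not the verbatim
code (the PSCHED_* time macros are written as plain arithmetic for
readability, and L2T() converts a packet length to its transmission
time at the link's configured HW rate):

now = get_current_time();            /* real time                    */
incr = now - q->now_rt;              /* real time since last dequeue */

if (q->tx_class) {                   /* a packet was just dequeued   */
        incr2 = L2T(&q->link, q->tx_len); /* its expected TX time at
                                             the configured HW rate  */
        q->now += incr2;             /* advance the artificial clock
                                        by the expected send time... */
        if ((incr -= incr2) < 0)     /* ...and add the remaining real
                                        time only if more than incr2
                                        has actually elapsed         */
                incr = 0;
}
q->now += incr;                      /* net effect: q->now ends up as
                                        max(real time, expected EOS) */
q->now_rt = now;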
Yes, it is a workaround (as ANK's own comment admits), but he also
said that "it is very close to ideal solution".
Moreover, dequeue can be called more often than EOSes occur, because
net_bh is activated by all net drivers, by timers and by packet
receives. But calls into the qdisc's dequeue are gated by the
dev->tbusy test, so the spurious dequeues should not be a big problem.
So when you say that I can't use the dequeue event as the EOS time,
why does ANK's CBQ code use it??
By the way, this is the second (and last) place where ANK uses the
HW rate: to "simulate" a correct q->now while the driver queues are
being filled.

But you ARE right that left-edge estimation will not save us from
needing to know the device's HW rate. I can eliminate the use of the
HW rate from the undertime computation, but I CAN'T eliminate it from
the integrator, and the integrator will still be needed in a left-edge
estimator. Furthermore, left-edge estimation will delay overlimit
actions by one packet, and thanks to you I now understand that a
shaper MUST be precise even at the scope of a single packet.
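To put a number on that one-packet error (my own back-of-the-envelope
example): on 10 Mbit/s Ethernet a 1500-byte packet occupies the wire
for 1.2 ms, so a left-edge estimator lets a class slip by up to one
full packet per decision, and for a class rated at 100 kbit/s that
single extra packet already represents 120 ms worth of its budget.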

> Token bucket for example has a problem in that if a flow is idle for a
> period of time such that the bucket becomes full, then a big burst of
> packets arrive and claim all the tokens, the packets might be sent out at
> wire speed instead of the allocated rate (i.e. during that small period
> from bucket full to bucket empty). For this you need that extra controller;
> either a leaky bucket or even another token bucket would do. In Linux this
> is the "peak rate parameter" -- this is why I said in my earlier email you
> need the peak control to do 'shaping'.

Ahh yes. If I understood you correctly, TBF with the peak rate set
still has a problem: the peak bucket size is set to MTU, so when two
smaller packets arrive, they are still sent at wire speed, aren't
they?
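To check my own reasoning I sketched the dual-bucket test in plain C.
This is hypothetical, simplified code with made-up names (the real
sch_tbf.c works in scaled psched ticks), but the conformance decision
is the same:

#include <stdio.h>
#include <stdbool.h>

/* Hypothetical, simplified dual token bucket: a long-term rate bucket
 * plus a peak bucket whose depth is one MTU. Tokens are bytes, time
 * is in seconds.
 */
struct tbf {
        double rate, peak;      /* bytes/sec: long-term and peak rate */
        double depth, mtu;      /* bucket sizes in bytes              */
        double tokens, ptokens; /* current fill of each bucket        */
        double t_last;          /* time of the last refill            */
};

static bool tbf_may_send(struct tbf *b, double now, int len)
{
        double dt = now - b->t_last;
        b->t_last = now;

        /* refill both buckets, capped at their depths */
        b->tokens += dt * b->rate;
        if (b->tokens > b->depth)
                b->tokens = b->depth;
        b->ptokens += dt * b->peak;
        if (b->ptokens > b->mtu)
                b->ptokens = b->mtu;

        /* a packet conforms only if BOTH buckets can cover it */
        if (b->tokens >= len && b->ptokens >= len) {
                b->tokens -= len;
                b->ptokens -= len;
                return true;
        }
        return false;
}

int main(void)
{
        /* 100 kB/s rate, 500 kB/s peak, 10 kB burst, 1500 B MTU,
         * both buckets start full.
         */
        struct tbf b = { 100e3, 500e3, 10e3, 1500, 10e3, 1500, 0 };

        /* two 700-byte packets arriving at the same instant */
        printf("%d\n", tbf_may_send(&b, 0.0, 700));  /* prints 1 */
        printf("%d\n", tbf_may_send(&b, 0.0, 700));  /* prints 1 */
        return 0;
}

Both checks pass: with the peak bucket full at one MTU (1500 B), both
700-byte packets conform immediately and leave back-to-back at wire
speed. The peak bucket caps any burst at one MTU's worth but cannot
space packets smaller than that.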

regards,
devik