
Re: new WRR sched, updated TBF and "real" ingres implementation

> Sorry, couldnt respond sooner, part of that survivor island ;->
> (now watch me disappear again;->)

Hello Jamal,

Also, sorry for my last sentence in the private mail. I had
not noticed this mail ;-)

> > and it contains mark_bh (NET_BH) call. BH's are then called at end of ISR so
> > that net_bh should be called soon after packet finish.
> A bit out of the ordinary. Do you know any one else doing this?

Ok, I have 2.3.99pre7 here, so take the 3c509 driver as an
example. The interrupt handler at line 704 tests whether we
are handling an RX or a TX irq. At line 736 netif_wake_queue
is called when the irq was raised because the output FIFO is
ready to transmit a new packet. It has 2 FIFO positions, so
it is an EOS event.
I looked through 3 net drivers in both 2.2 and 2.3 and all
of them have RX & TX irq handlers which call netif_wake_queue
(net_bh in 2.2, respectively).
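The pattern I mean can be sketched like this (a toy model in
C; the status bit names and counters are hypothetical, only
the RX/TX split and the wake-on-TX behaviour follow the real
drivers):

```c
/* Hypothetical irq status bits, loosely after the 3c509 style;
 * these names are illustrative, not the real driver's. */
#define IRQ_RX_COMPLETE  0x01
#define IRQ_TX_AVAILABLE 0x02  /* output FIFO has room again (EOS) */

static int rx_handled;   /* packets passed up for net_bh/netif_rx */
static int queue_woken;  /* times we rescheduled the tx queue */

/* Sketch of the handler pattern described above: test whether we
 * are handling an RX or a TX irq, and on a TX irq wake the device
 * queue (netif_wake_queue in 2.3, mark_bh(NET_BH) in 2.2). */
static void isr(int status)
{
    if (status & IRQ_RX_COMPLETE)
        rx_handled++;           /* RX path: queue packet, run bh later */
    if (status & IRQ_TX_AVAILABLE)
        queue_woken++;          /* TX path: pull the next packet */
}
```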

> So in addition to input interrupts, they also generate tx_complete
> interupts per packet? doesnt sound very sensible unless there is another
> reason hidden for this particular case.

Yes, exactly. The reason is simple. When an application is
sending packets (UDP without acks, for example), then after
a packet is sent you need some event to tell you that a new
packet can be sent.
Yes, in a simple router it may be overhead, but that is how
it works: one irq from the receiving iface and a second one
from the TX iface per packet.

> > And thus dequeue event too.
> Not really. It _might_ initiate a packet dequeue if there was a packet
> enqueued (otherwise it is 'wasted effort' -- its like the chicken and egg,
> which came first)

Yes, it is wasted effort when the tx queue is empty. But
only one irq is wasted - no other will come.

> But yes if you had a way to have everything 'pulled' by the tx_complete
> interupt only, then you are set. The above scenario obviously doesnt make
> sense (two interupts per packet, and how do you bootstrap it etc)

Does it make sense now? Bootstrapping is simple: a process
which wants to send, the qdisc timer, the net watchdog timer
or an RX irq. TX irqs are necessary to send back-to-back
packets without delays.
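The whole pull model can be simulated in a few lines of C
(all names here are mine, not kernel code): an enqueue on an
idle device bootstraps the first dequeue, and each
TX-complete irq pulls the next packet, so packets go out
back-to-back until the queue is empty.

```c
#include <stdbool.h>

static int  queued;  /* packets waiting in the qdisc */
static bool busy;    /* device currently transmitting? */
static int  sent;    /* packets handed to the device */

static void start_tx(void)
{
    if (queued == 0) {          /* nothing to pull: the "wasted" case */
        busy = false;
        return;
    }
    queued--;
    busy = true;
    sent++;
}

static void enqueue(void)
{
    queued++;
    if (!busy)
        start_tx();             /* bootstrap: sender kicks the device */
}

static void tx_complete_irq(void)
{
    start_tx();                 /* EOS event pulls the next packet */
}
```

Note that when the queue runs dry, exactly one TX irq is
"wasted" and the device simply goes idle until the next
enqueue bootstraps it again.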

> > nearly sure that it is used to minimize errors in EOS times.
> > These errors are introduced by queuing inside of drivers.
> Errors are not introduced by queueing inside the drivers for this case,
> but rather because you cant do accurate shaping without approximations
> with variable size packets. 

Sorry, but I have to disagree with you. First, the NS
implementation doesn't contain the integrator and it works.
And ANK's comments in the code say that he used it to
minimize errors from internal queuing.
Packet sizes are used in the virtual idle time computation.
When I studied the materials about FSCQ (which you sent me)
I found a description of virtual-time based algorithms. They
all compute the virtual finish time of a packet, compare it
with real time and make decisions about send times.
There was no word about the need for a time integrator.
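The common idea of those algorithms can be written down in a
few lines (a generic WFQ-style sketch, not ANK's code; the
details differ from paper to paper): each packet gets a
virtual finish time F = max(V, F_prev) + len/weight, and the
scheduler sends the packet with the smallest F first.

```c
/* Virtual finish time of a packet, WFQ-style (generic sketch):
 * a busy flow chains finish times off its previous packet,
 * an idle flow restarts from the current virtual time vnow. */
static double virt_finish(double vnow, double prev_finish,
                          int len, double weight)
{
    double start = vnow > prev_finish ? vnow : prev_finish;
    return start + (double)len / weight;
}
```

Packet size enters directly through len/weight here, which is
why no separate time integrator shows up in the descriptions.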

> Errors are caused by: deviations of estimated
> interpacket departures vs real departure times and deviations of estimated
> average packet size vs real packet size etc.
> Like i said earlier, it is easy to shape for cells, you can make them
> depart at predefined interpacket times and you dont have to worry about
> packet size variance

Yes. But that is accomplished by other code - it is in
cbq_update. I was talking about the integrator in
cbq_dequeue.

> > Ok so now q->now is a bit corrected time of dequeue event. But
> > still it IS based on dequeue event ! But ANK calls it EOS time
> > and uses it as EOS time.
> It is EOS mostly because you cant determine when the last bit of
> the packet hit the wire (i.e left the NIC).
> > Yes it is workaround (as mentioned by AKN's note) but he also
> > said that "it is very close to ideal solution".
> Under moderate to high input loads, you are very close to the ideal
> solution. Imagine a 1Mbps input.

Oh. Now you tell me that it is ok. Uff. I think that in all
the mails I tried to understand why I couldn't use dequeue
as an EOS approximation, and now I don't understand what we
were talking about. ;-}
But in any case it was a very useful talk!

> > Ahh yes. If I understood you correctly, TBF with peak set has still
> > problem. Peak has bucket size set to MTU, when two smaller packets
> > arive, they are still send at wire's speed, aren't it ?
> No.
> If you specify the peak rate, then there is further bounding "in the short
> term" by the peak rate otherwise, that idle-flow problem i described
> earlier hits you and you send at wire speed otherwise you will be sending
> at the specified rate.
> The peak bucket is not restricted to MTU. Infact by default it is 2KB
> but you can change this.
> In summary, "over a long period of time", you will see a flow rate
> equivalent to the rate you requested. In the short term, you will see
> what the peak rate defines. If you dont define the peak rate, then in the
> short term you'll see wire-rate output i.e peak rate becomes
> wire-rate. That is not to say at bit-times the packets are not sent at
> wire speed. Not sure this helped or confused you further.

Ahh yes. You are right. I was speaking about the limiter's
accuracy when the peak bucket > 0. Then two 56-byte packets
will still be sent together, as opposed to virtual clock
queues (FSCQ) where a delay can be inserted between these
packets as well.
But as you said - at MTU sizes you are approaching the
bit-times and you can't regulate it more precisely.
Right. Thanks, I understand this part (TBF) now.
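For what it's worth, the effect shows up in a toy dual token
bucket (my own simplified model, not the kernel's sch_tbf):
the big bucket enforces the long-term rate, the small peak
bucket bounds short bursts, and as long as the peak bucket
holds more than one small packet, two 56-byte packets still
pass back-to-back.

```c
struct bucket { double tokens, max, rate; };

/* add rate*dt tokens, capped at the bucket size */
static void refill(struct bucket *b, double dt)
{
    b->tokens += b->rate * dt;
    if (b->tokens > b->max)
        b->tokens = b->max;
}

/* 1 if a packet of `len` bytes may be sent now; both the rate
 * bucket r and the peak bucket p must have enough tokens */
static int tbf_ok(struct bucket *r, struct bucket *p, int len)
{
    if (r->tokens < len || p->tokens < len)
        return 0;
    r->tokens -= len;
    p->tokens -= len;
    return 1;
}
```

With a 128-byte peak bucket, two 56-byte packets pass
immediately and only the third has to wait for the peak
bucket to refill - exactly the granularity limit above.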

have a nice day, devik