Re: new WRR sched, updated TBF and "real" ingres implementation




Martin,

On Wed, 23 Aug 2000 devik@cdi.cz wrote:

> Ok, I've 2.3.99pre7 here, so driver for 3c509 for example.
> Interrupt handler at line 704 test whether we are handling
> RX or TX irq. At line 736 netif_wake_queue is called when
> irq was due to output FIFO is ready to transmit new packet.
> It has 2 FIFO positions so that it is EOS event.
> I looked thru 3 netdrivers in both 2.2 and 2.3 and all have
> RX & TX irq handlers which calls netif_wake_queue (resp.
> net_bh in 2.2).
> 

Three drivers are really in the minority.
Take a look at Donald Becker's driver programming style (which is
generally copied across most network drivers). The Tx interrupt is
raised on an as-needed basis, not on each packet completion. This is for
very sane reasons, as I stated earlier (to avoid interrupt overload on the
CPU).

> > So in addition to input interrupts, they also generate tx_complete
> > interrupts per packet? doesn't sound very sensible unless there is another
> > reason hidden for this particular case.
> 
> Yes exactly. The reason is simple. When application is sending
> packets (UDP without acks for example), after packet is sent
> you need some event to tell you that new packet can be sent.
> Yes, in simple router it may be overhead but it is how it
> works. One irq from receiving iface and second from tx iface
> per each packet.
> 

A driver that does this is very broken. As I said earlier, I suspect
there could be a good reason specific to that particular driver which
is not clear on first inspection. Email the author/maintainer and ask.

> yes it is wasted effort when tx queue is empty. But only one
> irq is wasted - no other will come.
> 

If your DMA ring holds 32 packets and generates 32 tx-complete interrupts
and there are no more packets to dequeue, then all those interrupts are
'wasted'. This is a bad example for me to give but serves to illustrate
the point.
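
To put numbers on the ring example, here is a rough sketch (a toy model,
not taken from any real driver). The function name `simulate` and the
`batch` parameter are hypothetical: `batch=1` stands for a tx-complete
interrupt per packet, and a larger `batch` is just one assumed way of
modelling "as needed" interrupts. An interrupt counts as wasted when it
fires and the queue behind the ring is empty.

```python
def simulate(ring_size, backlog, batch):
    """Drain a full tx ring and count tx-complete interrupts.

    ring_size: packets sitting in the DMA ring at the start
    backlog:   packets waiting in the queue behind the ring
    batch:     completions per interrupt (1 = interrupt on every packet)
    Returns (raised, wasted); an interrupt is wasted when it finds
    nothing left to dequeue.
    """
    in_ring = ring_size
    raised = wasted = 0
    pending = 0  # completions not yet signalled by an interrupt
    while in_ring > 0:
        in_ring -= 1  # hardware finishes transmitting one packet
        pending += 1
        if pending == batch or in_ring == 0:
            raised += 1
            if backlog == 0:
                wasted += 1  # nothing to pull, the irq did no useful work
            else:
                # Refill the slots freed since the last interrupt.
                moved = min(pending, backlog)
                backlog -= moved
                in_ring += moved
            pending = 0
    return raised, wasted


if __name__ == "__main__":
    for batch in (1, 8, 32):
        print(batch, simulate(ring_size=32, backlog=0, batch=batch))
```

With an empty queue behind a 32-packet ring, `batch=1` reproduces the 32
wasted interrupts described above, while larger batches waste
proportionally fewer in this model.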

> > But yes if you had a way to have everything 'pulled' by the tx_complete
> > interrupt only, then you are set. The above scenario obviously doesn't make
> > sense (two interrupts per packet, and how do you bootstrap it etc)
> 
> Does it sense now ? Bootstrap is simple: process which want
> to send, qdisc timer, net watchdog timer or RX irq. TX irqs
> are necessary to send back-to-back packets without delays.
> 

I think interrupt overload is a big problem in general; CPU overload is
generally the issue. Additional interrupts are unnecessary when you can
live without them; more interrupts imply less load capacity. You can
choose to trade between scalability + 99.9% accuracy and 99.999%
accuracy + less load capacity; your choice ;->

Summary: yes, you get much better granularity/accuracy with a driver
that does a per-packet tx interrupt in addition to a per-packet rx
interrupt, but you lose because you are overloading the CPU. This is
probably not evident until you start going above 10Mbps.
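
A back-of-the-envelope way to see how the interrupt rate grows with link
speed is sketched below. The assumptions are mine: the link is fully
loaded, all packets have the same size, and Ethernet preamble and
inter-frame gap are ignored. `interrupts_per_second` is a hypothetical
helper name.

```python
def interrupts_per_second(link_bps, packet_bytes, irqs_per_packet):
    """Interrupt rate on a saturated link with fixed-size packets.

    Framing overhead (preamble, inter-frame gap) is ignored, so real
    rates would be somewhat lower than this estimate.
    """
    packets_per_second = link_bps / (packet_bytes * 8)
    return packets_per_second * irqs_per_packet


if __name__ == "__main__":
    for mbps in (10, 100):
        for size in (64, 1500):
            rx_only = interrupts_per_second(mbps * 1_000_000, size, 1)
            rx_and_tx = interrupts_per_second(mbps * 1_000_000, size, 2)
            print(f"{mbps} Mbps, {size}-byte packets: "
                  f"rx only {rx_only:.0f}/s, rx+tx {rx_and_tx:.0f}/s")
```

The per-packet tx interrupt doubles whatever the rx rate already is, and
both scale linearly with link speed, which is why the cost is easy to miss
at low rates.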
 
> > > nearly sure that it is used to minimize errors in EOS times.
> > > These errors are introduced by queuing inside of drivers.
> > 
> > Errors are not introduced by queueing inside the drivers for this case,
> > but rather because you can't do accurate shaping without approximations
> > with variable size packets. 
> 
> sorry but I've to disagree with you. first, NS implementation
> doesn't contain the integrator and it works. And ANK's comments
> in the code says that he used it to minimize errors from 
> internal queuing.

Investigate their event handling mechanism; I have a feeling it is
designed such that you dequeue exactly on the dot. It would be evil to
think of academics being capable of designing a system that is not
ideal ;->
Unfortunately, the world is not what theory says.

> Packet sizes are used in virtual idle time computation. When
> I studied materials about FSCQ (you sent me) I found description
> of virtual time based algorithms. They all compute virtual
> time of packet finish, compare it with real times and takes
> decisions about send times.
> There was no word about need of time integrator.
> 

Sorry, I haven't looked at FSCQ in detail, but you could be right that the
issue is not there.
I'll read it carefully if you plan on implementing it (so I can give you
feedback).
Generally CBQ's question is answered on the dequeue, i.e. "is this packet
conforming?", whereas FSCQ's question is probably more wisely answered by
enqueue events: "When is the next conforming packet going to be here?"
The dequeue then does very little thinking; it just sends all conforming
packets. I am just hypothesizing.
If you do it as above, you don't have to worry about EOS.
Using variable sized packets you probably also don't need to worry about
CBR-ish evenly spaced packet units (or about what kind of jitter the
traffic stream introduces etc.)

>> Under moderate to high input loads, you are very close to the ideal 
>> solution. Imagine a 1Mbps input.
> Oh. Now you tell me that it is ok. Uff. I think that in all
> mails I tried to understand why can't I use dequeue as
> EOS approx. and now I don't understand what we talked
> about. ;-}
> But in any case it is very useful talk !

Here's what we talked about:
1) Under moderate to high load the rx interrupts will make the dequeue
event a good enough approximation, almost as good as if packets were
being 'pulled' by the tx-complete interrupt.
2) Under low loads the dequeue is not good enough, since you will resort to
the timer's help in the average case (at HZ granularity).

Now if you only design for case 1) then I am afraid your design is broken,
simply because you didn't take care of 2). You MUST factor in 2).
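
The two cases can be compared with a small sketch. It is only a toy model
under assumptions of my own: rx interrupts arrive at a fixed period, the
timer ticks every 1/HZ seconds with HZ=100, and a packet is dequeued at
the first such event at or after the moment it becomes eligible. The
names `next_event` and `dequeue_delay_us` are hypothetical.

```python
def next_event(t_us, period_us):
    """First multiple of period_us at or after t_us (integer microseconds)."""
    return -(-t_us // period_us) * period_us


def dequeue_delay_us(eligible_us, rx_period_us, hz=100):
    """How long a packet waits past its eligible time before a dequeue runs.

    eligible_us:  time at which the shaper would ideally send the packet
    rx_period_us: spacing of rx interrupts, or None when there is no input
    hz:           timer frequency (assumed 100, i.e. a 10 ms tick)
    """
    tick_us = 1_000_000 // hz
    candidates = [next_event(eligible_us, tick_us)]
    if rx_period_us is not None:
        candidates.append(next_event(eligible_us, rx_period_us))
    return min(candidates) - eligible_us


if __name__ == "__main__":
    # Case 1: steady input, one rx interrupt every 500 us.
    print("loaded:", dequeue_delay_us(12_300, rx_period_us=500), "us late")
    # Case 2: no input at all, only the HZ timer drives the dequeue.
    print("idle:  ", dequeue_delay_us(12_300, rx_period_us=None), "us late")
```

In this model the error under load is bounded by the rx inter-arrival
time, while with no input it can approach a full timer tick.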

cheers,
jamal