Re: Loop Bandwidth for 64B66B





Hi Vipul,

> It seems to me that the use of 64B66B code may adversely affect the
> design of a Clock Multiplier Unit in a Serial PMD, though I have not
> quantified it to decide the magnitude of the problem.  This is not a
> flaw in the 64B66B code, but rather an unfortunate consequence of the
> way frequency multiplication works out.  I think this problem can also
> occur if we use other low-overhead codes. 

Yes.  This is a tricky bit.  As you point out, it is inherent in any
low-overhead code.  If we only add a small overhead, then the clock
multiplication ratio is necessarily a ratio of large integers that is
close to 1.

> To simplify, let's take a specific example.  Suppose a Serial PMD's
> transmit path is designed such that data received at the HARI
> interface is decoded to 8B, then again encoded using 64B66B for
> optical transmission.  Width of SerDes data path is 8. So the output
> of the 64B66B encoder must find a way to ship out data in chunks of 8
> bits.  This requires a clock frequency conversion from f_in to f_out =
> f_in * 66/8 = f_in * 33/4. 

I agree.

> A typical Clock Multiplier Unit implementation will have a phase
> detector, comparing the phases of f_in/4 and f_out/33, where f_out is
> the output of a VCO.  In our example, f_in will be 156.25 MHz, and
> f_out will be 1.289 GHz.  The input to 8:1 SerDes will be this 1.289
> GHz clock, and 8-wide data. 

I would do it a bit differently. 

The HARI 1:10 demux generates 10-bit words at a rate of 312.5 MW/s.
The byte rate is the same after 10:8 decoding.

The ratio of the 64B66B serial rate (10.3125 Gb/s) to the recovered HARI
word rate (312.5 MHz) is then 33:1.
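
A quick sanity check of that 33:1 ratio, as a minimal Python sketch.  The
variable names are illustrative, and the 10 Gb/s payload rate and 3.125 Gb/s
HARI lane rate are assumed figures, not taken from this message:

```python
from fractions import Fraction

# Assumed figures: 10 Gb/s payload, 64B66B line coding, and a HARI lane
# running at 3.125 Gb/s that is demuxed 1:10 into 10-bit words.
payload_bps = Fraction(10_000_000_000)
line_bps = payload_bps * Fraction(66, 64)       # serial rate after 64B66B
hari_word_hz = Fraction(3_125_000_000, 10)      # recovered word clock, 312.5 MHz

ratio = line_bps / hari_word_hz
print(float(line_bps), float(hari_word_hz), ratio)
```

Exact fractions are used so that the ratio comes out as an integer rather
than a rounded float.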

I would run the final mux as a 16:1 mux, but preface it with a digital
"gearbox" that does a 66:33:16 conversion.  This operation takes two blocks
of 16 bits from the first 33-bit word, and saves the extra bit for the
next cycle.  It then concatenates the saved bit with 15 bits of the second
input word to produce a third output word.  Another 16 bits are then stripped
off, leaving 2 bits to be saved for the next input word...  and so on.  This
approach accepts extra circuit complexity in exchange for low latency.
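
The bit bookkeeping described above can be sketched in software.  This is
only a behavioral model under my own assumptions (MSB-first bit order, a
hypothetical function name gearbox_33_to_16), not a hardware design:

```python
def gearbox_33_to_16(words33):
    """Repack 33-bit words into 16-bit words, carrying leftover bits over.

    Assumes MSB-first ordering; returns (output words, leftover bit count).
    """
    acc = 0      # bits received but not yet emitted
    nbits = 0    # number of valid bits currently held in acc
    out = []
    for w in words33:
        acc = (acc << 33) | (w & ((1 << 33) - 1))
        nbits += 33
        while nbits >= 16:            # strip off 16-bit output words
            nbits -= 16
            out.append((acc >> nbits) & 0xFFFF)
        acc &= (1 << nbits) - 1       # keep only the saved residue
    return out, nbits


words = [(i * 0x1234567) & ((1 << 33) - 1) for i in range(16)]
out, residue = gearbox_33_to_16(words)
print(len(out), residue)
```

In this model the residue grows by one bit per input word, so 16 input
words produce 33 output words and the pattern repeats with nothing left over.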

Perhaps a simpler implementation uses a dual-port memory of 66 bytes.

Since 66*8 = 528 is divisible by both 16 and 33, the transfers become
synchronous.  The phase-locked input source writes 16 33-bit words (eight
encoded 66-bit blocks) at 312.5 MHz, and the TX data path reads them out
as 33 16-bit words at a rate of 644.53125 MHz.
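
The buffer arithmetic can be checked with a few lines of Python.  This is a
sketch that assumes a 33-bit write port and a 16-bit read port; the constant
names are illustrative only:

```python
from fractions import Fraction

BUFFER_BITS = 66 * 8                  # 66-byte dual-port memory
WRITE_WIDTH, READ_WIDTH = 33, 16      # assumed port widths in bits
write_hz = Fraction(312_500_000)      # write clock from the text

writes_per_fill = BUFFER_BITS // WRITE_WIDTH
reads_per_fill = BUFFER_BITS // READ_WIDTH
# Both ports must move the same number of bits per second.
read_hz = write_hz * WRITE_WIDTH / READ_WIDTH

print(writes_per_fill, reads_per_fill, float(read_hz))
```

Because the buffer holds a whole number of words on both ports, the read
and write pointers wrap at the same instant and no residue accumulates.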

> It is the integer 33 that is at the heart of this problem.  One can
> argue that this makes for a "stiff" PLL - the VCO, whose job is to put
> out f_out, is being refreshed at a rate of f_out/33 - resulting in an
> unusually low loop bandwidth.  

Let's figure this out.  The reference frequency is 312.5 MHz, and we
are locking our PLL to 33x the reference frequency.  The general rule
of thumb is that the PLL loop BW can be about 1/20th of the reference
frequency.  In this case, the loop BW can be 312.5/20 = 15.625 MHz.

My experience says that a loop BW of 2 MHz is adequate to completely
dominate the 1/f noise of a bipolar ring oscillator.  Other designers
could conceivably use higher-Q LC oscillators with even less BW
requirements.

So, I disagree with your feeling that the loop BW is "unusually low".  
 
From my experience, we have about 8x more BW than is strictly required.
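
The numbers behind that estimate, as a minimal sketch.  The 1/20 factor is
the rule of thumb quoted above, not a hard limit, and the last two lines
apply the same rule to the 156.25 MHz / 4 comparison frequency of the quoted
scheme purely for contrast:

```python
ref_mhz = 312.5            # PLL reference frequency from the text
rule_of_thumb_div = 20     # loop BW ~ f_ref / 20 (rule of thumb)
needed_bw_mhz = 2.0        # BW quoted as adequate for a bipolar ring VCO

max_bw_mhz = ref_mhz / rule_of_thumb_div    # available loop BW
margin = max_bw_mhz / needed_bw_mhz         # headroom over the 2 MHz figure

# Same rule applied to a phase detector running at f_in / 4.
quoted_compare_mhz = 156.25 / 4
quoted_bw_mhz = quoted_compare_mhz / rule_of_thumb_div

print(max_bw_mhz, margin, quoted_bw_mhz)
```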

> This may be generally regarded as a Bad
> Thing because:
> 
> 1. It increases the probability of VCO drift, making it difficult to
> meet the frequency tolerance specification. 

A 2 MHz BW is wide enough to tame the 1/f noise of the worst VCO that
anyone is likely to use.  We can have a 15 MHz BW if we use a classical
linear loop, and could effectively have a 150 MHz small-signal BW if we
use a bang-bang loop.

I see no problem here.

> 2. Alternatively, it forces the use of a large capacitor in the low
> pass filter that will reside at the output of the phase detector. 
> Large capacitors have to be external.  That increases noise
> susceptibility. 

This is likely to be true for a pure bipolar implementation due to the
low Rout of bipolar devices.  In a CMOS or SiGe/CMOS implementation, the
capacitor can be put on-chip due to the high charge-pump impedance.

> 3. It increases low-frequency jitter.  Phase noise in VCO output at
> all frequencies above the loop bandwidth will reach SerDes. 

I don't think so.  The low-frequency jitter will very nicely track
the incoming reference within the 2-15 MHz loop BW.

> 4. It increases lock acquisition time of a PLL. 

I would imagine using some kind of frequency-aided loop to address the
startup issue.  In any case, I don't think a power-on delay of even a
millisecond is significant for this application.

> My question: is this a big problem, or is this a small issue,
> routinely handled using careful design practices?  If a
> non-proprietary solution is known, what is it? 

Let me know if my reasoning makes sense to you.  I'll be happy to try to
hash it out in more detail if anything seems to be unclear. 

Short term, I agree that it is a bit of a hassle that no commercial
parts exist to implement this circuit directly.  However, I believe
that prototypes can be made relatively easily with FPGA devices and
off-the-shelf 16:1 serializers.  The FPGA does all the tricky bits, such
as barrel shifting, at a relatively low clock rate.  The 16:1 clock gen is
phase-locked through a 33:1 divider to the recovered HARI word clock.

This is what we are looking into for a demonstration vehicle.

Long term, I imagine a single BiCMOS chip doing everything: HARI RX/TX
+ coding/decoding + 10.3 Gb/s RX/TX.

Best regards,
--
Rick Walker