Abstract

In January [1], various architectures for 2.4 GHz PHY high-rate equalization were presented, including both a decision feedback equalizer (DFE) architecture and a combined Viterbi-DFE architecture. These tracking-mode techniques were shown to have relatively low complexity because the add/subtract feedback taps shorten the effective impulse response. This submission presents a simple technique for greatly improving the packet-error-rate performance of the DFE by selectively aligning it with the channel impulse response. Complexity is not increased. It will be shown that RMS delay spreads in excess of 200 nsec can be accommodated using only 2 feedforward (FF) DFE taps, so a combined MLSE/DFE is not necessary to reach RMS delay spreads of 200 nsec, as was suggested in [1]. It will also be shown that a DFE with no feedforward filtering (a single FF tap) and only 5 feedback (FB) taps can mitigate 100 nsec of RMS delay spread.
1. INTRODUCTION

This submission extends the DFE equalization techniques used with the QMBOK waveform [1,2]. It will be shown how greatly improved performance can be attained without an associated increase in complexity. The extra performance is obtained by selectively setting up the DFE on impulse response peaks that sit in front of the global impulse response peak. Simple DFE acquisition is provided by block-calculating the weights from a channel impulse response estimate [1,2]. Neither time-consuming recursive (LMS) training nor high-complexity (RLS) training is needed.

2. TX/RX FILTER SELECTION

Rather than presenting performance data for idealized matched filters, this submission presents packet-error-rate and bit-error-rate data for low-cost filters. This perspective should be more useful: the reader knows that the exhibited performance can be comfortably attained, which removes uncertainty about implementation complexity. The option remains to move performance closer to the theoretical bound by accepting increased complexity.

Using matched filters for the transmitter and receiver gives optimal performance in an additive white Gaussian noise (AWGN) environment. However, the implementation complexity and power draw required may be too high for the consumer marketplace. The most common matched filter pairs belong to the square-root raised-cosine family. These filters often must be implemented digitally as FIR structures, spanning several chip intervals with multiple samples per chip, to provide adequate transmit spectrum shaping (802.11 mask requirement), adequate inter-chip interference mitigation and adequate noise limiting. The required multiply-accumulate rate scales with the chip rate, the number of filter taps and the number of samples per chip, in both the modulator and the demodulator.
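To make that scaling concrete, the sketch below estimates the multiply-accumulate load of one direct-form FIR pulse-shaping filter. The chip rate, span and oversampling values are hypothetical placeholders, not parameters taken from this submission, and the model assumes a plain direct-form FIR that computes one output per sample (polyphase or interpolating structures would reduce the load).

```python
def fir_mac_rate(chip_rate_hz: int, span_chips: int, samples_per_chip: int) -> int:
    """Multiply-accumulates per second for one real direct-form FIR filter.

    Assumes the filter spans `span_chips` chip intervals at `samples_per_chip`
    samples per chip and produces one output per sample.
    """
    taps = span_chips * samples_per_chip            # filter length in samples
    output_rate = chip_rate_hz * samples_per_chip   # outputs per second
    return taps * output_rate                       # one MAC per tap per output


# Hypothetical example: 11 Mchip/s, 8-chip span, 4 samples per chip.
rate = fir_mac_rate(11_000_000, 8, 4)
print(f"{rate / 1e9:.3f} GMAC/s per real filter rail")  # prints 1.408 GMAC/s per real filter rail
```

In this model the load grows with the square of the oversampling factor, and it is incurred in both the modulator and the demodulator (and once per I/Q rail for complex signals).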

In contrast, the dominant signal-shaping transmit and receive filters used in this submission are 5th-order Butterworth filters, which can be implemented in low-complexity, low-power analog chip technology. This submission's end-to-end filter chain is shown in Fig. 2.1. The filter chain is stimulated with bi-level NRZ pulses. The Butterworth filters provide the dominant close-in spectral shaping and noise limiting; the SAW filters limit components farther out. Both the noise limiting and the ICI limiting are inferior to those of a matched filter. However, the implementation complexity is low.

**Figure 2.1** Simulated transmit/receive filter chain.

To reduce cost even further for half-duplex communications, a single pair of Butterworth filters is shared between transmit and receive, multiplexed in a half-duplex fashion. This concept is illustrated in Fig. 2.2. The performance drawback of this approach is the inability to select independent 3 dB corner frequencies for the transmit and receive filters. The SAW filters can also be shared in half-duplex fashion.
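The corner-frequency constraint can be illustrated with the ideal Butterworth magnitude response, \( |H(f)|^2 = 1/(1 + (f/f_c)^{2n}) \). This is a minimal sketch that ignores phase, the SAW filters and analog component tolerances; the corner frequency and the alternative receive corner are hypothetical values chosen only for illustration.

```python
import math


def butterworth_mag_db(f_hz: float, fc_hz: float, order: int = 5) -> float:
    """Magnitude response in dB of an ideal nth-order Butterworth low-pass filter."""
    return -10.0 * math.log10(1.0 + (f_hz / fc_hz) ** (2 * order))


def cascade_db(f_hz: float, fc_tx_hz: float, fc_rx_hz: float, order: int = 5) -> float:
    """End-to-end magnitude of a transmit filter followed by a receive filter."""
    return butterworth_mag_db(f_hz, fc_tx_hz, order) + butterworth_mag_db(f_hz, fc_rx_hz, order)


fc = 5e6  # hypothetical shared 3 dB corner frequency
for ratio in (0.5, 1.0, 1.5):
    f = ratio * fc
    shared = cascade_db(f, fc, fc)             # one filter pair used for TX and RX
    independent = cascade_db(f, fc, 1.2 * fc)  # hypothetical wider RX corner
    print(f"f = {ratio:.1f} fc: shared {shared:6.2f} dB, independent {independent:6.2f} dB")
```

With a shared pair, the end-to-end response at the common corner is always about 6 dB down (3 dB per pass); independent corners would let the designer trade spectral shaping against receive noise bandwidth separately.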

**Figure 2.2** Low-pass filter sharing using a multiplexing architecture in half-duplex communications.
The performance loss incurred by this simpler implementation is shown in Fig. 2.3, which contrasts the bit-error-rate of the theoretically ideal root-raised-cosine (matched) filters with that of the sub-optimal Butterworth filters. The curves on the left are for the classical AWGN channel; the curves on the right are for the frequency-nonselective (flat) Rayleigh fading channel. At an AWGN-channel bit-error-rate of $1.0\times10^{-5}$ the loss is less than 1.5 dB.

**Figure 2.3** Simulated performance comparison between the matched filter implementation and the simpler, transmit/receive-shared Butterworth filters. The AWGN curves are on the left and the flat Rayleigh fading curves are on the right.
3. BRIEF RECAP OF DFE-BASED MULTIPATH MITIGATION

This section briefly reviews the architecture and performance presented in IEEE 802.11-98/37 and IEEE 802.11-98/47. This information is the foundation for the innovations presented in the following sections.

A potent equalizer for combating multipath is the decision feedback equalizer (DFE). The basic architecture is shown in Fig. 3.1. The equalizer removes multipath degradation caused by inter-symbol and inter-chip interference. The feedforward weights must use multipliers, but the feedback weights can be implemented with adds/subtracts for BPSK and QPSK chips.

![DFE Structure](image)

**Figure 3.1** DFE architecture showing two feedforward taps \( w_{-1}, w_0 \) and three feedback taps \( w_1, w_2 \) and \( w_3 \).
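A minimal sketch of the Fig. 3.1 structure for real-valued BPSK chips, assuming a chip-spaced, noise-free channel and correct past decisions. The function names, the tap ordering and the example channel are hypothetical. The point it shows is the cost split: each feedforward tap needs a multiplier, while each feedback tap only adds or subtracts its weight because past decisions are ±1.

```python
import random


def convolve(x, h):
    """Full linear convolution y[n] = sum_m h[m] * x[n - m]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for m, hm in enumerate(h):
            y[n + m] += xn * hm
    return y


def dfe_detect(r, ff, fb, p, n_sym):
    """T-spaced BPSK decision feedback equalizer (hypothetical helper).

    r     : received samples, one per chip
    ff    : feedforward taps; ff[i] multiplies r[k + p + i] (i > 0 reaches
            later samples, which carry the precursor interference)
    fb    : feedback taps; fb[j - 1] cancels the postcursor of decision k - j
    p     : index of the decision tap within the channel impulse response
    n_sym : number of chips to detect
    """
    decisions = []
    for k in range(n_sym):
        # Feedforward section: one full multiplier per tap.
        z = 0.0
        for i, w in enumerate(ff):
            idx = k + p + i
            if idx < len(r):
                z += w * r[idx]
        # Feedback section: decisions are +/-1, so each tap is an add or subtract.
        for j, b in enumerate(fb, start=1):
            if k - j >= 0:
                z = z - b if decisions[k - j] > 0 else z + b
        decisions.append(1 if z >= 0 else -1)
    return decisions


rng = random.Random(1)
chips = [rng.choice((-1, 1)) for _ in range(200)]
h = [1.0, 0.5, -0.3]   # hypothetical channel: peak first, then two postcursors
r = convolve(chips, h)
ff = [1.0, 0.0]        # two FF taps; the later-sample tap is unused here
fb = [0.5, -0.3]       # FB taps equal to the postcursors
out = dfe_detect(r, ff, fb, p=0, n_sym=len(chips))
print(out == chips)    # prints True
```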

A typical channel impulse response for the indoor radio channel is shown in Fig. 3.2. It has a largely exponential power delay profile [7]. As shown in [3] and [5], a DFE with an infinite number of taps has a minimum mean-squared-error (MMSE) or zero-forcing (ZF) solution in which the feedforward taps target the non-minimum-phase portion (zeros outside the unit circle) of the channel impulse response, and the feedback taps target the minimum-phase portion (poles and zeros inside the unit circle). The peak of the channel impulse response does not necessarily correspond to the ideal decision point for a DFE with an infinite number of taps.
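An exponential power delay profile of this kind can be generated as follows. This sketch assumes the commonly used 802.11 exponential model, \( P_k = P_0 e^{-k T_s / T_{rms}} \) for \( k = 0 \ldots k_{max} \) with \( k_{max} = \lceil 10 T_{rms}/T_s \rceil \) and \( P_0 = 1 - e^{-T_s/T_{rms}} \), with independent complex Gaussian (Rayleigh) taps; whether the test conditions of [6] use exactly this form is an assumption, and the sample period below is a hypothetical value.

```python
import math
import random


def exponential_pdp(trms_ns, ts_ns):
    """Average tap powers of an exponential power delay profile (sums to about 1)."""
    kmax = math.ceil(10 * trms_ns / ts_ns)
    p0 = 1.0 - math.exp(-ts_ns / trms_ns)
    return [p0 * math.exp(-k * ts_ns / trms_ns) for k in range(kmax + 1)]


def rayleigh_channel(trms_ns, ts_ns, rng):
    """Draw one Rayleigh-fading impulse response with complex Gaussian taps."""
    taps = []
    for pk in exponential_pdp(trms_ns, ts_ns):
        sigma = math.sqrt(pk / 2.0)  # per-dimension standard deviation
        taps.append(complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)))
    return taps


rng = random.Random(7)
h = rayleigh_channel(trms_ns=100.0, ts_ns=25.0, rng=rng)  # hypothetical 25 ns spacing
peak = max(range(len(h)), key=lambda i: abs(h[i]))
print(len(h), peak)
```

Although the average power decays monotonically, a single Rayleigh realization often has its largest tap behind smaller taps, which is what places precursor energy ahead of the global peak.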
It is computationally expensive to calculate the roots (zeros) of the channel impulse response's z-transform to identify its minimum-phase and non-minimum-phase components. In the absence of pole/zero knowledge, a conservative DFE set-up uses the feedback taps to remove interference from the channel impulse response (CIR) components that follow the peak. This heuristic is easy to follow. In [1] and [2], the DFE decision point was centered on the CIR peak.

As shown in [1] and [2], the zero-forcing DFE solution can be easily computed from an estimate of the channel impulse response if only a small number of feedforward taps are used.
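One straightforward way to pose this block calculation is sketched below; the exact formulation used in [1,2] is not reproduced here, and the function name and indexing convention are assumptions. With the combined response \( g_m = \sum_i f_i h_{m+i} \), the \( N_{FF} \) feedforward taps are chosen so that \( g_p = 1 \) at the decision tap and the \( N_{FF}-1 \) precursors just ahead of it are zero; the feedback taps are then simply the remaining postcursors \( g_{p+1}, g_{p+2}, \ldots \). Only an \( N_{FF} \times N_{FF} \) linear system is solved, which is why a small number of FF taps keeps acquisition cheap.

```python
def zf_dfe_weights(h, p, n_ff, n_fb):
    """Block-compute zero-forcing DFE taps from a chip-spaced CIR estimate h.

    ff[i] multiplies r[k + p + i]; the combined response is
    g[m] = sum_i ff[i] * h[m + i], with the decision tap at g[p].
    FF taps force g[p] = 1 and g[p - n_ff + 1 .. p - 1] = 0.
    FB taps are set to the postcursors g[p + 1 .. p + n_fb].
    """
    def tap(m):
        return h[m] if 0 <= m < len(h) else 0.0

    # Augmented n_ff x (n_ff + 1) zero-forcing system, one row per constrained g[m].
    a = []
    for m in range(p - n_ff + 1, p + 1):
        a.append([tap(m + i) for i in range(n_ff)] + [1.0 if m == p else 0.0])

    # Gaussian elimination with partial pivoting.
    for col in range(n_ff):
        piv = max(range(col, n_ff), key=lambda row: abs(a[row][col]))
        if abs(a[piv][col]) < 1e-12:
            raise ValueError("zero-forcing system is singular")
        a[col], a[piv] = a[piv], a[col]
        for row in range(col + 1, n_ff):
            factor = a[row][col] / a[col][col]
            for c in range(col, n_ff + 1):
                a[row][c] -= factor * a[col][c]

    # Back substitution.
    ff = [0.0] * n_ff
    for row in reversed(range(n_ff)):
        acc = a[row][n_ff] - sum(a[row][c] * ff[c] for c in range(row + 1, n_ff))
        ff[row] = acc / a[row][row]

    def g(m):
        return sum(ff[i] * tap(m + i) for i in range(n_ff))

    fb = [g(p + j) for j in range(1, n_fb + 1)]
    return ff, fb


# Hypothetical CIR with one precursor ahead of the peak at index 1.
ff, fb = zf_dfe_weights([0.2, 1.0, 0.5, 0.25], p=1, n_ff=2, n_fb=2)
print(ff, fb)
```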

Given the channel test conditions described in [6], DFE performance was measured for the multipath-only condition; thermal noise was not included. Various architectures were examined in [1] and [2]. One particular architecture used 3 FF taps calculated using the zero-forcing condition. The January packet-error-rate (PER) results of [1] are duplicated in Fig. 3.3. As the number of feedback taps is increased, performance appears to reach a limit beyond which it does not improve further.
**Figure 3.3** Multipath-only mitigation results for a zero-forcing DFE using 3 feedforward taps and a variable number of feedback taps. Old-algorithm data presented in January [1].

Since feedback taps can be implemented with relatively low complexity (adds/subtracts), it is convenient to let the number of feedback taps equal the number of postcursor channel impulse response taps. The number of feedback taps then increases in proportion to the RMS delay spread. With this dynamic FB-tap allocation, the test conditions of Fig. 3.3 give the results shown in Fig. 3.4. Fig. 3.4 shows the true limiting performance, while Fig. 3.3 shows the performance trend as the number of feedback taps is increased.
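The peak-centered set-up with a dynamic feedback-tap count can be sketched as below. Deciding which trailing taps count as "postcursor channel impulse response taps" needs some cut-off in practice; the relative threshold used here is a hypothetical design parameter, not a value from this submission.

```python
def peak_setup(h, rel_threshold=0.01):
    """Peak-centered DFE set-up with a dynamic feedback-tap count.

    Returns (p, n_fb): the decision index at the CIR peak, and the number of
    postcursor taps up to the last tap whose magnitude is at least
    rel_threshold times the peak magnitude (rel_threshold is hypothetical).
    """
    mags = [abs(x) for x in h]
    p = max(range(len(h)), key=lambda i: mags[i])
    last = max(i for i, m in enumerate(mags) if m >= rel_threshold * mags[p])
    return p, last - p


print(peak_setup([0.2, 1.0, 0.5, 0.25, 0.005]))  # prints (1, 2)
```

Longer delay spreads produce longer postcursor tails, so `n_fb` grows with the RMS delay spread, as described above.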
**Figure 3.4** Multipath-only mitigation results for a zero-forcing DFE using 3 feedforward taps and a sufficient number of feedback taps to totally eliminate postcursor interference. 1000 byte packets were used.

When noise is included, this limit appears as Eb/No increases: a packet-error-rate (PER) floor exists below which the PER does not improve as Eb/No is increased. This effect is shown in Fig. 3.5.
The next section explains the failure mechanism producing the error floor. Section 5 will explain how to lower the error floor.

4. IDENTIFYING THE DOMINANT PACKET-ERROR MECHANISM

This section identifies the packet-error mechanism that produces the noise-free packet-error floor. In general, packet errors result from both thermal noise and uncompensated channel components. In a RAKE receiver, the channel components are combined in matched-filter fashion. In an equalized receiver, the extra channel components are subtracted from the selected chief component.
When operating noise-free, the DFE-based system makes a packet error when the equalized channel still has a closed eye pattern. For the proposed DFE, an adequate number of feedback taps is used to remove postcursor interference, since the implementation complexity of feedback taps is small. However, only 1, 2 or 3 feedforward taps are suggested, since each FF tap requires a full multiplier.

With only a few feedforward taps, the DFE may not adequately span the precursor portion of the channel impulse response, as shown in Fig. 4.1. When a DFE with 3 FF taps aligns its decision tap with the channel impulse response peak, the feedforward taps span only three chips. The feedforward taps zero-force over their span but cannot eliminate components that precede it. The probability of this unspanned, large-precursor event equals the noise-free packet-error rate.
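Whether the eye is closed can be checked with a peak-distortion (worst-case) measure: the magnitude of the equalized decision tap minus the summed magnitudes of all residual ISI that neither the FF zero-forcing nor the FB taps remove. This is a minimal noise-free sketch assuming correct past decisions and the same hypothetical FF indexing as a zero-forcing set-up (`ff[i]` multiplies `r[k + p + i]`); the example channel is illustrative, not taken from the simulations.

```python
def worst_case_eye(h, ff, p, n_fb):
    """Worst-case eye opening after a DFE, noise-free.

    The combined response is g[m] = sum_i ff[i] * h[m + i]. FB taps are assumed
    to cancel g[p + 1 .. p + n_fb] exactly; every other g[m] with m != p is
    residual ISI. A result <= 0 means the eye is closed.
    """
    def tap(m):
        return h[m] if 0 <= m < len(h) else 0.0

    def g(m):
        return sum(ff[i] * tap(m + i) for i in range(len(ff)))

    residual = 0.0
    for m in range(-(len(ff) - 1), len(h)):
        if m == p or p < m <= p + n_fb:
            continue  # decision tap, or postcursor removed by feedback
        residual += abs(g(m))
    return abs(g(p)) - residual


h = [0.9, 0.3, 1.0, 0.4]  # hypothetical CIR: large precursor ahead of the peak
print(f"{worst_case_eye(h, [1.0], p=2, n_fb=1):.2f}")  # -0.20 (closed, decision on global peak)
print(f"{worst_case_eye(h, [1.0], p=0, n_fb=3):.2f}")  # 0.90 (open, decision on first peak)
```

The second call previews the idea developed in Section 5: the same channel has an open eye when the decision point is moved forward to the precursor peak and the feedback taps absorb the rest.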

![Figure 4.1](image)

**Figure 4.1** A DFE packet error occurs when significant energy precedes the span of the feedforward taps.

An example taken from the simulation environment is shown in Fig. 4.2. Shown is the end-to-end channel impulse response formed by the modem filters cascaded with the exponential-power-delay-profile Rayleigh fading channel. Eight samples per chip were used in the simulation. The DFE set up its decision element on the channel impulse response (CIR) peak. The CIR was then decimated to one sample per chip; in other words, the DFE is T-spaced. The peak is scaled to one by an AGC function. The circled points in Fig. 4.2 are the decimation points at one sample per chip. The RMS delay spread of the channel was 200 nsec.
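The peak-phased decimation and AGC step can be sketched as follows. The 8 samples per chip follow the text; the function name and the magnitude-only AGC (no carrier-phase rotation) are assumptions made for illustration.

```python
def decimate_on_peak(cir_os, sps):
    """Decimate an oversampled CIR to one sample per chip, phased on its peak.

    Returns (chip_cir, p): the T-spaced CIR scaled so the peak magnitude is one
    (a magnitude-only AGC), and the index p of the peak within chip_cir.
    """
    peak = max(range(len(cir_os)), key=lambda i: abs(cir_os[i]))
    phase = peak % sps               # sampling phase that lands on the peak
    gain = 1.0 / abs(cir_os[peak])   # AGC: normalize the peak to unity
    chip_cir = [x * gain for x in cir_os[phase::sps]]
    return chip_cir, peak // sps


# Hypothetical 8-samples-per-chip CIR with its peak at sample 10.
cir = [0.0] * 16
cir[2], cir[10], cir[11] = 0.5, 2.0, 1.0
print(decimate_on_peak(cir, sps=8))  # prints ([0.25, 1.0], 1)
```

Sample 11, although large, falls between the chosen decimation points and therefore does not appear in the T-spaced CIR; this is why the choice of sampling phase matters.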

![CIR DFE Set-up](image)

**Figure 4.2** A channel impulse response which caused a packet error. The DFE set-up on the CIR peak is shown with circles. The three feedforward taps were insufficient to eliminate the eye closure caused by the precursor taps. A 200 nsec RMS delay channel was used.
5. SELECTIVE PRECURSOR SLIDING

This section presents a technique for minimizing packet errors by setting up the DFE about a point other than the channel impulse response peak. It will be shown that much better performance is obtained with little increase in complexity.

Optimally, the DFE would be set up about the point where the decision error is minimized, as shown in Fig. 5.1. In theory, one would compute the mean-squared-error (MSE) at each sampling phase of the CIR and use the sampling phase with the minimum MSE. This would be extremely complex to implement in practice.

![Figure 5.1](image)

**Figure 5.1** Ideal DFE set-up minimizes decision errors.

Fortunately, simple high-performing techniques can be used, even though they are not optimal. A good description of one heuristic technique is presented in [4]; others are easily devised.

The basic concept involves searching for alternative impulse response peaks sitting in front of the global peak. These alternative peaks become candidate DFE set-up points, as shown in Fig. 5.2. The peak that maximizes a quasi-SNR is selected. The quasi-signal power is the power of the selected peak; the quasi-noise power is the thermal noise variance plus the power in any uncompensated precursor taps; the quasi-SNR is their ratio. Variant algorithms are easily crafted.
**Figure 5.2** Selectively setting up the DFE on a peak other than the global peak.

A representative quasi-SNR metric to be maximized is shown in the following equation. The numerator is the power of the CIR component \( h_k \) at the DFE decision point, \( N_0 \) is the noise variance, and the sum in the denominator is the power of the uncompensated precursor taps \( h_m \).

\[
SNR_{DFE} \approx \frac{|h_k|^2}{2N_0 + \sum_{\text{precursors } m} |h_m|^2}
\]
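A minimal sketch of a sliding rule built on this metric. It is not the heuristic of [4] and not necessarily the algorithm behind the simulations; two choices are assumptions: candidates are the local magnitude peaks at or before the global peak, and the `n_ff` feedforward taps are taken to cover \( h_{k-N_{FF}+1} \ldots h_k \), so the uncompensated precursors are \( h_0 \ldots h_{k-N_{FF}} \). The channel and noise values are hypothetical.

```python
def candidate_peaks(h):
    """Local magnitude peaks at or before the global CIR peak."""
    mags = [abs(x) for x in h]
    g = max(range(len(h)), key=lambda i: mags[i])
    cands = []
    for k in range(g + 1):
        left = mags[k - 1] if k > 0 else 0.0
        right = mags[k + 1] if k + 1 < len(h) else 0.0
        if mags[k] >= left and mags[k] >= right:
            cands.append(k)
    return cands


def quasi_snr(h, k, n0, n_ff):
    """|h_k|^2 / (2 N0 + power of the precursors ahead of the FF span)."""
    precursor_power = sum(abs(h[m]) ** 2 for m in range(0, k - n_ff + 1))
    denom = 2.0 * n0 + precursor_power
    return float("inf") if denom == 0.0 else abs(h[k]) ** 2 / denom


def slide_decision_point(h, n0, n_ff):
    """Pick the candidate peak that maximizes the quasi-SNR."""
    return max(candidate_peaks(h), key=lambda k: quasi_snr(h, k, n0, n_ff))


h = [0.1, 0.7, 0.3, 0.2, 1.0, 0.5]  # hypothetical CIR: first peak at 1, global peak at 4
print(candidate_peaks(h))                        # prints [1, 4]
print(slide_decision_point(h, n0=0.01, n_ff=3))  # prints 1 (low noise: slide forward)
print(slide_decision_point(h, n0=1.0, n_ff=3))   # prints 4 (high noise: keep global peak)
```

The two calls show why the rule needs a thermal-noise estimate: at low noise the precursor power dominates the denominator and the earlier peak wins, while at high noise the larger global peak is preferred.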
The simulation result of a particular sliding algorithm is shown in Fig. 5.3. Rather than selecting the global peak (the second peak), the algorithm selected the first peak. The point chosen on the first peak minimized precursor noise yet kept the signal well above the thermal noise. The signal was then AGC'd to set the decision point to unity.

![CIR DFE Set-up](image)

**Figure 5.3** A simulation result with a sliding DFE. A packet error would have occurred otherwise.

A demonstration of the effectiveness of DFE sliding is shown in Fig. 5.4. This DFE has only 3 feedforward taps, and the number of feedback taps is adequate to eliminate postcursor ISI. This PER-versus-Eb/No plot was taken under the same simulation conditions as Fig. 3.5. The top curve corresponds to the old algorithm's CIR-peak-selecting performance, for which a PER floor is clearly observed. For the new sliding algorithm, no PER floor is observed.
6. PERFORMANCE GALLERY GIVEN AN ADEQUATE NUMBER OF FB TAPS

This section presents the performance provided by DFE sliding for increasing values of RMS delay spread. As the RMS delay spread increases, the number of feedback taps is allowed to float to the number needed to eliminate postcursor ISI. This is not a restrictive assumption, because the implementation complexity of feedback taps is low. For a particular application, the system designer would specify the required RMS delay spread, which in turn specifies the number of feedback taps. For example, 10 feedback taps can accommodate 150 nsec of delay spread with PER floors below 0.1%.

The simulations in this section use 3 feedforward taps.
Fig. 6-1 shows the performance for RMS delay spreads of 25, 50 and 100 nsec, together with the PER for the flat Rayleigh fading channel. Fig. 6-2 shows the performance for RMS delay spreads of 150, 200 and 300 nsec. No error floor is observed in any case.

![Figure 6-1](image)

**Figure 6-1** PER versus Eb/No for RMS delays of 25, 50 and 100 nsec. 1000 byte packets were used.
7. ESTIMATING THERMAL NOISE LEVELS

Any algorithm used for sliding the DFE forward on the channel impulse response precursor requires an estimate of the thermal noise level. This section describes methods for accomplishing this objective.

The ability to achieve good performance using precursor sliding does not depend critically on highly accurate noise-power estimates. This makes sense, considering that the techniques are largely ad hoc anyway; optimal calculations are too complex to be considered for most applications.

One simple way to roughly gauge SNR is to use a received signal strength indicator (RSSI) derived from a power detector. Prior to packet arrival, RSSI indicates the background noise level. When a packet is detected, RSSI indicates the signal level. The difference can be used to estimate the thermal noise level.
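The RSSI arithmetic can be sketched as below, working in linear power because the packet-time reading contains signal plus noise. The function names and dBm readings are hypothetical, and treating the result as the noise variance relative to a unit (AGC-normalized) signal, for use as \( N_0 \) in the quasi-SNR, is a rough assumption rather than a calibrated mapping.

```python
def dbm_to_mw(dbm):
    """Convert a power level in dBm to milliwatts."""
    return 10.0 ** (dbm / 10.0)


def relative_noise_from_rssi(rssi_idle_dbm, rssi_packet_dbm):
    """Noise power relative to signal power from two RSSI readings.

    rssi_idle_dbm   : reading before packet arrival (noise only)
    rssi_packet_dbm : reading during the packet (signal plus noise)
    """
    noise = dbm_to_mw(rssi_idle_dbm)
    total = dbm_to_mw(rssi_packet_dbm)
    if total <= noise:
        raise ValueError("packet RSSI must exceed the idle (noise) RSSI")
    return noise / (total - noise)


print(f"{relative_noise_from_rssi(-90.0, -80.0):.3f}")  # prints 0.111
```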

Another simple way to gauge the noise level is as part of the channel impulse response estimation process. Once a CIR estimate is made, it is usually easy to measure how well the CIR matches a set of received signal samples: in low-noise environments the match will be good, and in high-noise environments it will be poor. The mean-squared CIR estimation error is related to the noise level.
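A minimal sketch of that measurement: rebuild the expected received samples by convolving known chips (for example a preamble; this is an assumption) with the CIR estimate, then average the squared mismatch. The helper names are hypothetical, and the result also includes any CIR estimation error, so it is a loose upper bound on the thermal noise variance.

```python
def convolve(x, h):
    """Full linear convolution y[n] = sum_m h[m] * x[n - m]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for m, hm in enumerate(h):
            y[n + m] += xn * hm
    return y


def residual_noise_variance(r, known_chips, h_est):
    """Mean-squared mismatch between received samples and the CIR-based model."""
    model = convolve(known_chips, h_est)
    n = min(len(r), len(model))
    return sum(abs(r[i] - model[i]) ** 2 for i in range(n)) / n


chips = [1, -1, 1, 1]          # hypothetical known chips
h_est = [1.0, 0.5]             # hypothetical CIR estimate
noise = [0.1, -0.1, 0.1, -0.1, 0.1]
r = [m + e for m, e in zip(convolve(chips, h_est), noise)]
print(residual_noise_variance(r, chips, h_est))  # close to 0.01
```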

8. 2 FF TAPS WITH 10 FB TAPS

This section emphasizes the performance that a low-complexity design can provide using DFE sliding. Here only two feedforward taps and 10 feedback taps are used, with the same precursor sliding algorithm presented above. With only 10 feedback taps, the number of feedback taps becomes inadequate at some higher RMS delay spread. As the RMS delay spread increases, the postcursor ISI not eliminated by the feedback taps eventually closes the eye, as shown in Fig. 8.1. Note that this causes the noise-free PER floor to return.
**Figure 8.1** The packet-error-rate floor reemerges at high RMS delay spreads when a fixed number of feedback taps is used.

Fig. 8.2 shows the noise-free performance that can be obtained with only 2 FF taps and 10 FB taps. Note that the RMS delay spread eventually becomes too large for the 10 FB taps. At 10% PER the RMS delay spread is 183 nsec, which is excellent performance for only 2 FF taps and 10 FB taps. This is much better than the January results of [1] without DFE sliding, duplicated in Fig. 8.3, which reached only 100 nsec RMS delay spread at 10% PER.


**Figure 8.2** The thermal-noise-free multipath performance provided using DFE sliding, 2 feedforward taps and 10 feedback taps. 1000 byte packets were used.
9. 1 FF TAP WITH 5 OR 10 FB TAPS

This section emphasizes the performance that an ultra-low-complexity design can provide using DFE sliding. Here there is only one feedforward tap, with either 5 or 10 feedback taps, and the same precursor sliding algorithm presented above. With only 5 or 10 feedback taps, the number of feedback taps becomes inadequate at some higher RMS delay spread. The postcursor ISI not eliminated by the feedback taps eventually closes the eye, as shown in Fig. 9.1.

Fig. 9.1 shows the noise-free performance that can be obtained with only 1 FF tap and 5 or 10 FB taps. Note that the RMS delay spread eventually becomes too large for the 5 or 10 FB taps. At 10% PER the RMS delay spread is 180 nsec for the 10 FB tap case and 100 nsec for the 5 FB tap case. This is excellent performance for only 1 FF tap, and much better than the 10% PER performance presented in January [1] without DFE sliding, duplicated in Fig. 9.2.

![Diagram](image)

**Figure 9.1** The thermal-noise-free multipath performance provided using DFE sliding, 1 feedforward tap and 5 or 10 feedback taps. 1000 byte packets were used.
10. CONCLUSION

This submission has presented a new DFE algorithm that greatly extends multipath performance through intelligent positioning on the channel impulse response. This provides ultra-low complexity for a given targeted level of performance.

REFERENCES


