## MISMATCH INSENSITIVE VOLTAGE TO TIME CONVERSION AND CLOCK DISTRIBUTION TOPOLOGIES FOR THREE DIMENSIONAL INTEGRATED CIRCUITS

by

Tejinder Singh Sandhu

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

at

Dalhousie University, Halifax, Nova Scotia, September 2018

© Copyright by Tejinder Singh Sandhu, 2018

# Contents

- List of Tables
- List of Figures
- Abstract
- List of Abbreviations Used
- Acknowledgements
- Chapter 1: Introduction
  - 1.1 Thesis Objectives
  - 1.2 Thesis Contribution
    - 1.2.1 Clock Synchronization in 3-D ICs
    - 1.2.2 Supply Compensated Digitally Controlled Delay Lines for 3D-IC Clock Synchronization Topologies
    - 1.2.3 Beyond Rail-to-Rail Compliant Current Sources for Mismatch Insensitive Voltage to Time Conversion
  - 1.3 Thesis Outline
- Chapter 2: A Mismatch Insensitive Skew Compensation Architecture for Clock Synchronization in 3D ICs
  - 2.1 Introduction
  - 2.2 DTD operation and limitations
  - 2.3 Proposed MISC Architecture and Operation
    - 2.3.1 State 1
    - 2.3.2 State 2
    - 2.3.3 State 3
    - 2.3.4 State 22
    - 2.3.5 Tracking after synchronization
    - 2.3.6 Inverse locking resolution
  - 2.4 Additional skew sources in MISC
    - 2.4.1 Mismatch in Buffers and Frequency dividers
    - 2.4.2 Phase Detector dead zone
    - 2.4.3 Input jitter
  - 2.5 Circuit implementation
    - 2.5.1 Delay lines
    - 2.5.2 Time to digital converter
    - 2.5.3 Phase detector
    - 2.5.4 MISC detailed circuit block
  - 2.6 Results and Analysis
    - 2.6.1 MISC Post Synthesis Synchronization Cycle
    - 2.6.2 MISC Vs DTD under worst case conditions
      - 2.6.2.1 Why simulate a large number of fabrication runs?
      - 2.6.2.2 Testbench setup
      - 2.6.2.3 Testbench skew results
    - 2.6.3 Comparative Analysis
  - 2.7 TSV defects and lock time
    - 2.7.1 Defects in TSVf and TSVr
    - 2.7.2 Defects in TSVp
|                                           | 2.7.3                                                                                                                                      | Process variation and catastrophic TSV failure                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | 39                                                                                                                                                                     |
|                                           | 2.7.4                                                                                                                                      | Lock time                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             | 39                                                                                                                                                                     |
| 2.8 | | Conclusion | 40 |
|                                           |                                                                                                                                            |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |                                                                                                                                                                        |
| Chapter 3 | | Supply Compensated Digitally Controlled Delay Lines for 3D-IC Clock Synchronization Topologies | 41 |
| 3.1 | | Introduction | 42 |
| 3.2 | | Proposed Supply Compensated Delay Line | 43 |
| | 3.2.1 | Clock Buffer Compensation | 43 |
| | 3.2.2 | Supply Auto-Tuning Algorithm | 45 |
| 3.3 | | Supply Auto-tuning integration in the MISC 3D-IC clock synchronization an | 47 |
| | 3.3.1 | State 1 | 49 |
| | 3.3.2 | State 2 | 49 |
| 3.4 | | Circuit Implementation | 50 |
| | 3.4.1 | Delay Lines DLF and DLR | 50 |
| | 3.4.2 | | 52 |
| Chapte<br>3.1<br>3.2<br>3.3<br>3.4<br>3.4 | er <b>3</b><br>Introd<br>Propos<br>3.2.1<br>3.2.2<br>Supply<br>tion an<br>3.3.1<br>3.3.2<br>Circuir<br>3.4.1<br>3.4.2<br>Result            | Supply Compensated Digitally Controlled Delay Lines<br>for 3D-IC Clock Synchronization Topologies                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | <ul> <li>41</li> <li>42</li> <li>43</li> <li>43</li> <li>45</li> <li>47</li> <li>49</li> <li>49</li> <li>50</li> <li>50</li> <li>50</li> <li>52</li> <li>54</li> </ul> |
| Chapte<br>3.1<br>3.2<br>3.3<br>3.4<br>3.5 | er 3<br>Introd<br>Propos<br>3.2.1<br>3.2.2<br>Supply<br>tion an<br>3.3.1<br>3.3.2<br>Circuir<br>3.4.1<br>3.4.2<br>Result<br>3.5.1          | Supply Compensated Digitally Controlled Delay Lines<br>for 3D-IC Clock Synchronization Topologies         uction         sed Supply Compensated Delay Line         Clock Buffer Compensation         Supply Auto-Tuning Algorithm         V Auto-tuning integration in the MISC 3D-IC clock synchroniza-         State 1         State 2         V Implementation         Delay Lines DLF and DLR         Overview of the supply compensated MISC architecture         Supply noise compensation                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | <ul> <li>41</li> <li>42</li> <li>43</li> <li>43</li> <li>45</li> <li>47</li> <li>49</li> <li>49</li> <li>50</li> <li>50</li> <li>52</li> <li>54</li> <li>55</li> </ul> |
| Chapte<br>3.1<br>3.2<br>3.3<br>3.4<br>3.5 | er 3<br>Introd<br>Propos<br>3.2.1<br>3.2.2<br>Supply<br>tion an<br>3.3.1<br>3.3.2<br>Circuir<br>3.4.1<br>3.4.2<br>Result<br>3.5.1<br>3.5.2 | Supply Compensated Digitally Controlled Delay Lines         for 3D-IC Clock Synchronization Topologies         uction         sed Supply Compensated Delay Line         Clock Buffer Compensation         Supply Auto-Tuning Algorithm         v Auto-tuning integration in the MISC 3D-IC clock synchroniza-         rchitecture         State 1         State 2         t Implementation         Overview of the supply compensated MISC architecture         s         Supply noise compensation                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   | 41<br>42<br>43<br>43<br>45<br>47<br>49<br>49<br>50<br>50<br>52<br>50<br>52<br>54<br>55<br>57                                                                           |

| Chapter 4 | | Beyond Rail-to-Rail Compliant Current Sources for Mismatch Insensitive Voltage to Time Conversion | 62 |
|-----------|---|---|---|
| 4.1 | | Introduction | |
| 4.2 | | Proposed BR2R Current Source | 64 |
| | 4.2.1 | Compliance Voltage | 66 |
| | 4.2.2 | Output Impedance | 66 |
| 4.3 | | BR2R integration in a Differential Voltage to Time Converter | 67 |
| | 4.3.1 | Calibration Phase | 68 |
| | 4.3.1.1 | Offset Null | 68 |
| | 4.3.1.2 | Current Equalization | 68 |
| | 4.3.1.3 | Current Calibration | 68 |
| | 4.3.2 | Time Conversion Phase | 69 |
| 4.4 | | Results | |
| 4.5 | | Conclusion | |
| Chapter 5 | | Conclusion | |
| 5.1 | | Conclusions | |
| 5.2 | | Future Work | |
| Bibliography | | | |
| Appendix A | | Copyright Permission | |

# List of Tables

| Table 2.1 | Residual skew in the DTD and MISC topologies                        | 32 |
|-----------|---------------------------------------------------------------------|----|
| Table 2.2 | MISC comparison with prior topologies                               | 36 |
| Table 3.1 | MISC measurement results                                            | 61 |
| Table 4.1 | BR2R DVT measurement results                                        | 75 |

# List of Figures

| Figure 2.1  | Die-to-Die (DTD) synchronization topology                                      | 8  |
|-------------|--------------------------------------------------------------------------------|----|
| Figure 2.2  | Proposed MISC clock synchronization architecture                               | 10 |
| Figure 2.3  | MISC state flow chart                                                          | 13 |
| Figure 2.4  | Timing diagram of the MISC architecture                                        | 14 |
| Figure 2.5  | MISC timing diagram in states 1 and 2                                          | 18 |
| Figure 2.6  | Phase detector dead zone                                                       | 20 |
| Figure 2.7  | Jitter induced skew                                                            | 21 |
| Figure 2.8  | Delay line 'DLF' using coarse and fine delay elements, DLR is identical to DLF | 23 |
| Figure 2.9  | Mismatch between identical delay lines                                         | 23 |
| Figure 2.10 | Source time to digital converter                                               | 24 |
| Figure 2.11 | Load phase detector                                                            | 25 |
| Figure 2.12 | Detailed circuit blocks for the source equalization phase                      | 26 |
| Figure 2.13 | Post synthesis timing simulation of the MISC architecture                      | 27 |
| Figure 2.14 | Complete MISC layout                                                           | 28 |
| Figure 2.15 | Permutation sum and digital test flow                                          | 30 |
| Figure 2.16 | MISC testbench setup                                                           | 31 |
| Figure 2.17 | Residual skew                                                                  | 34 |
| Figure 2.18 | Residual skew with delay lines mismatch                                        | 34 |
| Figure 2.19 | Peak residual skew considering all mismatch error sources                      | 35 |
| Figure 2.20 | MISC delay paths                                                               | 38 |
| Figure 3.1  | Ideal power supply compensation in digital buffers                             | 44 |
| Figure 3.2  | Supply compensated digital buffer                                              | 44 |
| Figure 3.3  | Block diagram of the proposed auto-tuned supply compensated delay line | 45 |
| Figure 3.4  | Flow chart of the proposed supply auto-tuning algorithm | 46 |
| Figure 3.5  | Block diagram of the supply compensated MISC architecture | 47 |
| Figure 3.6  | Flow chart of the supply compensated MISC architecture | 48 |
| Figure 3.7  | Supply compensated binary delay line DLF | 51 |
| Figure 3.8  | The conventional (uncompensated) binary delay line DLR | 51 |
| Figure 3.9  | Timing diagram of the supply compensated MISC architecture | 52 |
| Figure 3.10 | Die photo of the supply compensated MISC architecture fabricated in 65nm CMOS | 54 |
| Figure 3.11 | Simulated supply sensitivity of the binary delay line DLF | 55 |
| Figure 3.12 | Measured oscilloscope jitter histogram of the uncompensated reverse path | 56 |
| Figure 3.13 | Measured oscilloscope jitter histogram of the supply compensated forward path | 57 |
| Figure 3.14 | Measured rms jitter of the supply compensated forward path at 1GHz | 57 |
| Figure 3.15 | Comparison of the measured rms jitter in the forward path vs the reverse path | 58 |
| Figure 3.16 | Measured delay mismatch between DLF and DLR | 58 |
| Figure 3.17 | Measured oscilloscope clock waveforms before MISC synchronization at 1GHz | 59 |
| Figure 3.18 | Measured oscilloscope clock waveforms after MISC synchronization at 1GHz | 59 |
| Figure 3.19 | Measured final residual skew | 60 |
| Figure 3.20 | Measured rms jitter through the forward and reverse MISC clock paths with quiet supply | 60 |
| Figure 4.1  | Cascode current source and its applications                                            | 64 |
| Figure 4.2  | Beyond rail-to-rail (BR2R) cascode current source                                      | 65 |
| Figure 4.3  | Differential Voltage-to-Time (DVT) architecture                                        | 67 |
| Figure 4.4  | Timing diagram of the DVT conversion cycle                                             | 68 |
| Figure 4.5  | The DVT current calibration loop                                               | 69 |
| Figure 4.6  | DVT in the time conversion phase                                               | 70 |
| Figure 4.7  | Simulated BR2R V-I curves                                                      | 71 |
| Figure 4.8  | Die photo of the DVT prototype fabricated in 65nm CMOS                         | 72 |
| Figure 4.9  | The DVT test setup                                           | 73 |
| Figure 4.10 | Measured DVT power spectral density                          | 73 |
| Figure 4.11 | Measured DVT characteristics utilizing the BR2R cascode pair | 74 |
| Figure 4.12 | Simulated transient response of the DVT architecture         | 75 |

### Abstract

Reduced voltage dynamic range and increased mismatch between identical circuit components become pressing challenges as transistor dimensions enter the nanometer scale. Recently, through silicon via (TSV) technology has allowed diverse analog and digital dies to be stacked vertically, forming a compact three dimensional integrated circuit (3D-IC). Yet, enhanced system integration in 3D-ICs comes at the cost of increased mismatch resulting from TSV defects and worse thermal gradients than in their 2D counterparts. This thesis presents mismatch insensitive circuit design techniques for two applications: differential voltage to time converters and digital clock distribution architectures for 3D-ICs.

This thesis proposes a mismatch insensitive skew compensation (MISC) architecture for 3D-ICs that can align a source clock in die-1 with a load clock in die-2 regardless of control code dependent mismatch between delay lines or defect induced delay disparity in TSVs. Additionally, an on-chip auto-tuning algorithm to reduce the supply voltage sensitivity of the delay lines utilized in MISC is presented. This supply compensated MISC architecture is fabricated in 65nm CMOS and occupies $0.016mm^2$ while dissipating 4.8mW at 1GHz from a 1V supply. The maximum residual skew between the die-1 and die-2 clocks measures under 30ps for up to 50% mismatch in delay lines and up to 1ns delay disparity between TSVs. The rms jitter of this supply compensated MISC design measures 3.0ps in the presence of a 25mV 1MHz supply noise at 1GHz operation, compared to 112.3ps for the conventional design.

Additionally, beyond rail-to-rail (BR2R) compliant cascode current sources are proposed that can linearly charge a load capacitor to beyond the supply rails $V_{dd}$ or gnd while maintaining an improved output impedance over an equivalent wide-swing cascode source. A mismatch insensitive differential voltage to time converter (DVT) employing these BR2R sources is fabricated in 65nm CMOS and dissipates $47\mu W$ at 1V. Within a 2MHz input bandwidth, the measured BR2R DVT SNDR is 50.2dB, compared to 38.7dB for the wide-swing cascode based DVT. The DVT achieves a CMRR of 35.1dB for a 0.4V to 0.6V input common-mode range.

# List of Abbreviations Used

| SNDR          | Signal to Noise and Distortion Ratio              |
|---------------|---------------------------------------------------|
| SNR           | Signal to Noise Ratio                             |
| CMRR          | Common Mode Rejection Ratio                       |
| DVT           | Differential Voltage to Time Converter            |
| VTC           | Voltage to Time Converter                         |
| PCB           | Printed Circuit Board                             |
| pk-pk         | Peak-to-Peak                                      |
| MISC          | Mismatch Insensitive Skew Compensation            |
| TSV           | Through Silicon Via                               |
| 3D-IC         | Three Dimensional Integrated Circuit              |
| 3D            | Three Dimensional                                 |
| 2D            | Two Dimensional                                   |
| DLL           | Delay Locked Loop                                 |
| BR2R          | Beyond Rail-to-Rail                               |
| PVT           | Process, Voltage and Temperature                  |
| PWM           | Pulse Width Modulation                            |
| DTD           | Die-to-Die                                        |
| GRO           | Gated Ring Oscillator                             |
| PLL           | Phase Locked Loop                                 |
| TDC           | Time to Digital Converter                         |
| MOSFET        | Metal Oxide Semiconductor Field Effect Transistor |
| CMOS          | Complementary Metal Oxide Semiconductor           |
| ADDLL         | All Digital Delay Locked Loop                     |
| SS            | Slow-Slow                                         |
| FF            | Fast-Fast                                         |
| SF            | Slow-Fast                                         |
| FS            | Fast-Slow                                         |
| TT            | Typical-Typical                                   |

### Acknowledgements

I would like to thank my supervisor Dr. Kamal El-Sankary for his guidance, funding and support throughout the research work for this thesis.

I also want to thank Dr. Jason Gu and Dr. William Phillips for being part of my supervisory committee, and Dr. Mourad N El-Gamal for serving as the external PhD examiner.

I wish to thank our department secretary Nicole Smith for her help and support on countless administrative matters during my PhD studies.

I express my heartfelt gratitude to my wife Gurbinder Kaur and my daughter Mehtab Sandhu for their unending patience and support throughout this long journey.

Finally, I express my sincere love for my parents Sahib Singh and Gurdarshan Kaur, and my brother Bikramjeet Singh.

### Chapter 1

### Introduction

Moore's law states that the transistor density in a given package doubles approximately every two years. As a result, the computational density of electronic circuits has improved steadily over the last few decades. System integration can be improved further by employing three dimensional integrated circuits (3D-IC), wherein diverse analog and digital blocks are fabricated on individual dies which are then stacked vertically using through silicon vias (TSV). This 3D-IC technology reduces the time lag between various system blocks, leading to greater throughput within a compact design. However, such continuous transistor size scaling or 3D-IC integration can have serious repercussions for both analog and digital circuit design.

For example, continuous transistor size scaling necessitates reduced supply voltages which severely limits the allowable voltage dynamic range, thereby degrading the signal to noise ratio in traditional voltage mode circuits. Moreover, smaller transistor dimensions exacerbate mismatch between identical circuit components which worsens distortion in fully differential analog building blocks. Hence, analog applications such as differential voltage to time (pulse width) converters suffer from increased mismatch and reduced dynamic range in their constituent current sources and comparators in advanced CMOS technologies.

Digital 3D-IC clock distribution topologies aim to synchronize data across the entire 3D tier. This is normally achieved via delay locked loops (DLL) which match the phase of the source clock in die-1 to that of the load clock in die-2 by forming a feedback loop consisting of forward and reverse delay lines and TSVs. However, the propagation delay of TSVs is susceptible to process, voltage and temperature variations [1]. Also, the delay through a fabricated TSV can increase significantly due to open defects [2], [3]. In general, 3D-ICs suffer from worse thermal gradients than their 2D (planar) counterparts [4]. Moreover, nonuniform thermal and voltage drop characteristics in 3D-ICs can exacerbate supply noise when compared to their 2D counterparts [5], [6]. This causes severe mismatch between the delay lines and TSVs of the DLL feedback loop which, combined with the supply noise induced jitter, leads to additional clock skew between the source and load clocks in the two dies, stifling data throughput. Therefore, mismatch insensitive circuit design techniques are needed to address some of the problems caused by reduced transistor dimensions in advanced CMOS technologies.

#### 1.1 Thesis Objectives

The primary objective of this thesis is to mitigate the adverse effects of circuit mismatch on the performance of differential voltage to time (DVT) converters and clock distribution topologies in three dimensional integrated circuits. A second goal is to extend the voltage dynamic range of the constituent current sources within the DVT converter and to reduce the supply voltage sensitivity of the digital delay lines embedded within the 3D-IC clock synchronization architecture. The final goal is to verify the proposed techniques in silicon by fabricating the DVT and clock distribution architectures on chip, followed by rigorous testing under various mismatch and supply noise conditions.

#### 1.2 Thesis Contribution

This work proposes mismatch insensitive circuit design techniques for differential voltage to time converters (DVT) and clock distribution topologies for three dimensional integrated circuits (3D-IC). The thesis contribution extends across two published journal papers, where the first [7] proposes a mismatch insensitive skew compensation architecture (MISC) for clock synchronization in 3D-ICs, while the second [8] proposes a mismatch insensitive differential voltage to time converter. The third, as yet unpublished, work proposes an auto-tuning algorithm to reduce the supply voltage sensitivity of digitally controlled delay lines for 3D-IC clock synchronization architectures. The contributions of the three works are presented in chapters 2-4 as follows:

- A Mismatch-Insensitive Skew Compensation Architecture for Clock Synchronization in 3-D ICs [7] in chapter 2.
- Supply Compensated Digitally Controlled Delay Lines for 3D-IC Clock Synchronization Topologies in chapter 3.
- Beyond Rail-to-Rail Compliant Current Sources for Mismatch Insensitive Voltage to Time Conversion [8] in chapter 4.

A brief summary of the problem statement and contribution in each thesis chapter is presented.

## 1.2.1 A Mismatch-Insensitive Skew Compensation Architecture for Clock Synchronization in 3-D ICs

Clock distribution across two dies in a 3D-IC is achieved via delay locked loops (DLL) which match the phase of the source clock in die-1 to that of the load clock in die-2 by forming a feedback loop consisting of forward and reverse delay lines and through silicon vias (TSV). Traditional solutions to skew compensation rely on perfect matching between these constituent delay lines and/or matched TSV delays. However, these assumptions are elusive given the worsening intra-die process variation in deep submicron CMOS technologies [9] coupled with TSV defects [2] and thermal gradients [4], which can cause severe clock skew between the source and load clocks in the two dies.
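
To make the matching assumption concrete, the following is a minimal sketch of a simplified round-trip model, not the topology of any cited work and not the MISC algorithm. It assumes a DLL that locks the total round-trip delay to two clock periods while driving both delay lines with one shared control code; the function name, the lock condition and all numeric values are illustrative assumptions.

```python
def round_trip_skew_ps(period_ps, tsv_fwd_ps, tsv_rev_ps, mismatch):
    """Skew at the load clock for a simplified round-trip DLL (assumed model).

    The loop is assumed to lock when
        d_f + tsv_fwd + tsv_rev + d_r = 2 * period,
    where the reverse delay line is slower than the forward one by the
    fractional `mismatch`, i.e. d_r = (1 + mismatch) * d_f.
    """
    # Solve the lock condition for the forward delay line setting d_f.
    d_f = (2.0 * period_ps - tsv_fwd_ps - tsv_rev_ps) / (2.0 + mismatch)
    forward_path_ps = d_f + tsv_fwd_ps
    # The load clock is aligned with the source only if the forward path
    # equals exactly one clock period; any difference is residual skew.
    return forward_path_ps - period_ps


if __name__ == "__main__":
    # Hypothetical numbers: 1 GHz clock (1000 ps period), 100 ps nominal TSV delay.
    print(round_trip_skew_ps(1000.0, 100.0, 100.0, 0.0))  # matched paths
    print(round_trip_skew_ps(1000.0, 100.0, 300.0, 0.0))  # unequal TSV delays
    print(round_trip_skew_ps(1000.0, 100.0, 100.0, 0.1))  # 10% delay line mismatch
```

In this model the skew vanishes only when the forward and reverse paths are identical; otherwise it equals half the difference between the two path delays, which is why schemes built on the matching assumption degrade under delay line mismatch or TSV defects.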

Hence, a mismatch insensitive skew compensation architecture (MISC) for 3D-ICs is presented in this work. The proposed MISC topology utilizes an all-digital iterative DLL algorithm to eliminate any clock skew resulting from control code dependent mismatch between delay lines or unequal TSV delays. The MISC performance is verified in theory and simulation in light of mismatch/finite resolution of delay lines, clock jitter, phase detector dead zone, TSV delay and buffer mismatch. Post synthesis timing verification of this cell based design is completed in a 65nm CMOS process. Under similar worst-case mismatch conditions, residual skew in the proposed MISC architecture is limited to 32ps at 1GHz, compared to 116ps for a recent die-to-die clock synchronization topology [18].

## 1.2.2 Supply Compensated Digitally Controlled Delay Lines for 3D-IC Clock Synchronization Topologies

Digitally controlled delay lines form a major component of many 3D-IC clock synchronization architectures, which are essentially DLLs aimed at aligning the source clock in die-1 with a load clock in die-2. However, the propagation delay through these delay lines is susceptible to the supply voltage noise generated across the entire 3D tier. Moreover, supply noise leads to additional clock jitter in these delay lines, which can exacerbate residual skew between the source and load clocks upon final synchronization [7]. Supply noise suppression via voltage regulators [10], [11] comes at the cost of reduced voltage headroom. Other oscillator based supply compensation techniques [12], [13] cannot be directly ported to single ended digitally controlled delay lines.

Therefore, an auto-tuning algorithm is presented to reduce the supply voltage sensitivity of digitally controlled delay lines regardless of process or temperature variations. This supply compensated delay line is further incorporated within the MISC 3D-IC clock synchronization architecture. The complete supply compensated MISC topology is fabricated in 65nm CMOS and demonstrates robust performance against the buffer supply noise at operating frequencies of 250MHz to 1GHz. The rms jitter of this supply compensated MISC design measures 3.0ps in the presence of a 25mV 1MHz supply noise at 1GHz operation, compared to 112.3ps for the conventional (uncompensated) design. The complete design occupies an active area of  $0.016mm^2$  and dissipates 4.8mW at 1GHz from a 1V supply.

## 1.2.3 Beyond Rail-to-Rail Compliant Current Sources for Mismatch Insensitive Voltage to Time Conversion

Switched current sources are widely used in applications ranging from ramp generators to PLL charge pumps. The extensively used wide-swing cascode current source has a compliance voltage of  $V_{dd} - 2 \cdot V_{ov}$ , where  $V_{dd}$  and  $V_{ov}$  are the supply and MOSFET overdrive voltages, respectively. Moreover, the dynamic range of aforesaid applications is often critically dependent upon the compliance voltage of their constituent current sources. For example, the input dynamic range of a ramp based voltage to pulse width converter will be limited by its current source to  $V_{dd} - 2 \cdot V_{ov}$ . Therefore, this work presents a beyond rail-to-rail (BR2R) current source to deliver a compliance voltage as high as  $V_{dd}+V_{th}$ , while maintaining a higher output impedance than an equivalent wide-swing cascode source, where  $V_{th}$  is the MOSFET threshold voltage. Additionally, a process and mismatch insensitive differential voltage to time converter (DVT) employing the proposed BR2R sources is presented. The proposed DVT design incorporates three calibration loops to achieve process and mismatch immunity. The complete DVT architecture is fabricated in 65nm CMOS and occupies  $0.021mm^2$ , while dissipating only  $47\mu W$  at 1V. The BR2R DVT achieves an SNDR of 50.2dB and a CMRR of 35.1dB for a 0.4V to 0.6V input common-mode range.
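
The two compliance expressions quoted above can be compared numerically. The sketch below only evaluates $V_{dd} - 2 \cdot V_{ov}$ and $V_{dd} + V_{th}$; the 1V supply is taken from the text, while the overdrive and threshold values (and the function names) are illustrative assumptions rather than figures for the fabricated design.

```python
def wide_swing_compliance(vdd: float, vov: float) -> float:
    """Compliance voltage of a wide-swing cascode source: Vdd - 2*Vov."""
    return vdd - 2.0 * vov


def br2r_compliance_limit(vdd: float, vth: float) -> float:
    """Upper bound quoted for the BR2R source ("as high as"): Vdd + Vth."""
    return vdd + vth


VDD = 1.0   # supply voltage of the fabricated DVT (from the text)
VOV = 0.15  # assumed MOSFET overdrive voltage, illustrative only
VTH = 0.40  # assumed MOSFET threshold voltage, illustrative only

if __name__ == "__main__":
    ws = wide_swing_compliance(VDD, VOV)
    br = br2r_compliance_limit(VDD, VTH)
    print(f"wide-swing cascode compliance: {ws:.2f} V")
    print(f"BR2R compliance upper bound:   {br:.2f} V")
    print(f"additional ramp headroom:      {br - ws:.2f} V")
```

Under these assumed device values, a ramp based converter limited by the wide-swing source would be confined to roughly 0.7V of swing, whereas the BR2R bound lies above the supply rail; the achievable value in practice depends on the actual overdrive and threshold voltages.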

#### 1.3 Thesis Outline

This thesis is organized as follows. Chapter 2 introduces a mismatch insensitive clock distribution architecture (MISC) for three dimensional integrated circuits to align the source clock in die-1 with a load clock in die-2 regardless of device mismatch or process variations. A comprehensive review of the various error sources is presented along with their relative contribution to residual skew in the proposed MISC architecture. The MISC performance is compared against prior clock synchronization topologies under diverse mismatch conditions. Next, chapter 3 proposes an auto-tuning algorithm to reduce the supply voltage sensitivity of digitally controlled delay lines, which are further incorporated within the MISC topology. Fabrication and measurement results of the complete supply compensated MISC architecture are presented under diverse mismatch and supply noise conditions. Chapter 4 proposes beyond rail-to-rail (BR2R) compliant current sources to improve the compliance voltage and output impedance of the extensively used wide-swing cascode current source. These BR2R sources are further integrated within a mismatch insensitive differential voltage to time (DVT) converter. Fabrication and measurement results of the complete BR2R based DVT architecture are presented, where its performance is verified under process, voltage and temperature variations. Finally, the conclusion is presented in chapter 5, which also includes suggestions for future research work.

Chapter 2

A Mismatch Insensitive Skew Compensation Architecture for Clock Synchronization in 3D ICs

Tejinder Singh Sandhu and Kamal El-Sankary

© 2015 IEEE Reprinted, with permission, from:

T. S. Sandhu and K. El-Sankary, "A Mismatch-Insensitive Skew Compensation Architecture for Clock Synchronization in 3-D ICs," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 24, no. 6, pp. 2026-2039, June 2016. doi: 10.1109/TVLSI.2015.2496312

#### Abstract

Traditional Die-to-Die (DTD) clock skew compensation topologies require matched delay lines or equal through silicon via (TSV) delays. Unlike previous techniques, the proposed mismatch insensitive skew compensation (MISC) architecture can maintain a synchronous clock signal between two dies while completely eliminating any skew arising from code dependent mismatch in delay lines or unequal TSV delays. The performance of our design is verified in theory and simulation in light of mismatch/finite resolution of delay lines, clock jitter, phase detector dead zone, TSV delay and buffer mismatch. Post synthesis timing verification of this cell based design was done in a 65nm CMOS process. Under similar worst-case mismatch conditions, residual skew in the proposed architecture was limited to 32ps at 1GHz, compared to 116ps for a recent DTD topology, while consuming only 2.1mW.

#### 2.1 Introduction

Through silicon via technology allows multiple stacked dies, while avoiding the excessive delay limitations of bonding wires. However, clock distribution in such a 3D die setup remains challenging due to the following reasons. Delay through TSVs is susceptible to process and temperature variations [1]. Also, the delay through a fabricated TSV could increase significantly due to open defects [2], [3], leading to significant skew among identical clock distribution networks. Moreover, cross-die process variation limits the slack for both on die critical paths and die-to-die paths using TSVs [14], thus requiring a tight constraint on clock skew.

Traditionally, delay locked loops are employed to match the phase of load clock with a given reference [15]. However, this requires skew free distribution of a reference clock signal, which itself is challenging given the non-ideal behavior of 3D ICs. Techniques mentioned in [16] and [17] do not require a distributed reference clock signal at load. Instead they rely on creating a return path by replicating the forward path delay to cancel skew. Nonetheless, such an approach requires perfect matching between these paths, which is quite improbable in a 3D setup given the diverse delay mismatch through TSVs.

Most recently, Die-to-Die (DTD) [18] and all-digital delay-locked loop (ADDLL) [19] techniques eliminate the need to exactly replicate forward path delay, thus removing one source of clock skew. These latest techniques rely on perfect matching between two averaging delay lines. However, this assumption is invalid for two reasons: first, because of the worse thermal gradients in a 3D IC compared to its 2D counterpart [4]; second, because of the increasing within-die process variation in deep sub-micron technologies [9], [20].

Figure 2.1: Die-to-Die (DTD) synchronization topology [18]

This thesis presents a mismatch insensitive skew compensation (MISC) topology for 3D ICs, which is truly independent of the inter-die wire delay or the mismatch between the averaging delay lines. The proposed architecture is implemented using standard cells in a 65nm CMOS process. Its performance is verified across 10571 unique post-layout simulation runs, resulting in a peak residual skew of 32ps from 250MHz to 1GHz under worst-case conditions. The proposed design consistently outperforms its recent counterparts [18], [19] under similar mismatch conditions.

#### 2.2 DTD operation and limitations

Replicating the forward wire delay to cancel skew was a major source of mismatch in [16] and [17], until the DTD topology [18] (Fig. 2.1) made synchronization independent of the forward wire delay. This topology operates in two phases. In the source equalization phase, 'dir' is set low. Subsequently, the phase detector in Die-1 increments delay in the source delay line (DLS) until $\phi s$ and the feedback clock $\phi f b$ are aligned to within '*lsb*'. This places $\phi f$ a delay of ' $\delta \pm lsb$ ' ahead of $\phi s$, where ' $\delta$ ' is the delay through the reverse path consisting of 'WireR', buffers B1, B3 and frequency divider div2, while 'lsb' is the delay resolution of DLS, DLL1 and DLL2.

$$\phi f = nT - \delta \pm lsb \tag{2.1}$$

Assuming matched buffers B1-B4 and frequency dividers  $(\div 2)$  div1 and div2, in the load equalization phase 'dir' is set high, so  $\phi f w$  is now a ' $\delta$ ' delayed version of  $\phi s$ .

$$\phi f w = (n-1)T + \delta \tag{2.2}$$

Here, 'T' is one cycle period of the source clock $\phi s$ and 'n' is an integer. From Fig. 2.1, $\phi f$ and $\phi f w$ are placed symmetrically around $\phi s$ at a mean delay of ' $\delta$ '. Again, the phase detector in Die-2 increments delay in load delay lines DLL1 and DLL2, until $\phi f d$ and $\phi f w$ are aligned to within $2 \cdot lsb$ (equation (2.3)). In doing so, $\phi f$ is delayed by ' $2 \cdot \delta$ ' to align its phase with $\phi f w$. Since DLL1 and DLL2 are matched, each provides a delay of ' $\delta$ '. Hence, the load clock $\phi l$ now lies at the midpoint between $\phi f$ and $\phi f d$, aligning it in phase with the source clock $\phi s$ and finishing synchronization. Mathematically:

$$\phi f d = nT + \delta \pm 2 \cdot lsb \tag{2.3}$$

$$d_1 + d_2 = |\phi f d - \phi f| \tag{2.4}$$

Here, ' $d_1$ ' and ' $d_2$ ' are the delays added in DLL1 and DLL2, respectively. If ' $\delta_m$ ' is the delay mismatch between DLL1 and DLL2 at the final control code, then $d_1 = d \pm \frac{\delta_m}{2}$ and $d_2 = d \mp \frac{\delta_m}{2}$. Using these values of $d_1$ and $d_2$ in equation (2.4) gives $d = \delta \pm \frac{lsb}{2}$. Now, knowing $\phi l = \phi f + d_1$ yields:

$$\phi l = nT \pm \frac{3}{2} \cdot lsb \pm \frac{\delta_m}{2} \tag{2.5}$$

If $t_{B1}-t_{B4}$ are the delays through B1–B4 and $t_{div1}, t_{div2}$ are the delays through frequency dividers div1, div2, respectively, then in the presence of phase detector dead zone width ' $\delta_w$ ', input jitter ' $\delta_j$ ' and buffers mismatch ' $\delta'_d = (t_{B1}+t_{B3}+t_{div2}) - (t_{B2}+t_{B4}+t_{div1})$ ', equation (2.5) modifies to:

$$\phi l = nT \pm \frac{3}{2} \cdot lsb \pm \frac{\delta_m}{2} \pm \frac{\delta'_d}{2} \pm \frac{\delta_w}{2} \pm \delta_j \tag{2.6}$$

A detailed analysis of these additional error sources is given in Section 2.4. From Fig. 2.9 and Table 2.1, the residual skew contribution from control code dependent mismatch ' $\delta_m$ ' between identical delay lines can be an order of magnitude higher (hundreds of picoseconds) than the sum of $\delta_d$, $\delta_w$ and $\delta_j$ (a few tens of picoseconds). This could lead to significant skew in the DTD architecture, which fails to provide a comprehensive synchronization solution under worst-case conditions. MISC, on the other hand, completely eliminates any residual skew contribution from $\delta_m$.

Figure 2.2: Proposed MISC architecture

#### 2.3 Proposed MISC Architecture and Operation

The MISC architecture is shown in Fig. 2.2 along with its timing diagram in Fig. 2.4. The MISC consists of two accumulators (ACMF, ACMR), two digitally controlled delay lines (DLF, DLR), five tri-state buffers (B1–B5), two clock frequency dividers $(s \div 2, r \div 2)$, a multiplexer (MUX) to choose between the source (PDS) or load (PDL) phase detector outputs and an XOR gate (X) for negation of the MUX output. Additionally, a controller (not shown) and coarse time to digital converters in both dies to reduce lock-in time (not shown) complete the MISC architecture. The goal of MISC is to align $\phi f d$ with $\phi s$, regardless of mismatch between DLF and DLR or between TSVf and TSVr, where TSVf, TSVp and TSVr are through silicon vias between the two dies.

The clock path direction from  $B3 \rightarrow TSVr \rightarrow r \div 2$  is opposite to that from  $B5 \rightarrow TSVr \rightarrow B4$ . The direction control bit 'dir' switches between these two paths.

• Load Equalization ('dir' = high)

B1 (snc = high), B5 and B4 are on while B2 and B3 are tri-stated, activating the following clock paths.

Forward Path: From $\phi s$ to $\phi f d$ through $s \to B1 \to TSVf \to DLF \to PDL$.

Reverse Path: From $\phi s$ to $\phi r d$ through $s \to B5 \to TSVr \to B4 \to DLR \to PDL$, i.e.,

$$\phi f d = t_{B1} + t_{TSVf} + t_{DLF} \tag{2.7}$$

$$\phi rd = t_{B5} + t_{TSVr} + t_{B4} + t_{DLR} \tag{2.8}$$

Here, $t_{B1}$, $t_{B4}$, $t_{B5}$ are delays through tri-state buffers B1, B4, B5; $t_{TSVf}$, $t_{TSVr}$ are the delays through TSVf, TSVr and $t_{DLF}$, $t_{DLR}$ are the delays through DLF, DLR, respectively. Upon activating these paths, PDL in conjunction with ACMF and ACMR adjusts delays in DLF and/or DLR to satisfy:

Condition 1:

$$\phi f d - \phi r d = 0 \tag{2.9}$$

• Source Equalization ('dir' = low)

B1 (snc = high), B2 and B3 are on while B4 and B5 are tri-stated, activating the following clock paths.

Source Path: From $\phi s$ to $\phi s div$ through $s \to s \div 2 \to PDS$.

Feedback Path: From $\phi s$ to $\phi r div$ through $s \to B1 \to TSVf \to DLF \to B2 \to DLR \to B3 \to TSVr \to r \div 2 \to PDS$, i.e.,

$$\phi sdiv = t_{s \div 2} \tag{2.10}$$

$$\phi r div = t_{B1} + t_{TSVf} + t_{DLF} + t_{B2} + t_{DLR} + t_{B3} + t_{TSVr} + t_{r \div 2} \tag{2.11}$$

Where  $t_{B2}$ ,  $t_{B3}$  are delays through tri-state buffers B2, B3 and  $t_{s\div 2}$ ,  $t_{r\div 2}$  are the delays through clock dividers  $s \div 2$ ,  $r \div 2$ , respectively. From equations (2.7), (2.8), (2.10) & (2.11) assuming matched buffers ( $t_{B1} = t_{B2} = t_{B3} = t_{B4} = t_{B5}$ ) and dividers ( $t_{s\div 2} = t_{r\div 2}$ ), the PDS determined delay difference between source and feedback paths becomes:

$$\phi r div - \phi s div = \phi f d + \phi r d \tag{2.12}$$
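The identity in equation (2.12) can be checked numerically with a minimal sketch. The component delay values below are hypothetical placeholders in picoseconds, not figures from this work, and the dictionary keys and function name are illustrative only. With matched buffers and dividers the two sides agree; when they are not matched, the gap equals the difference of the buffer and divider delays, which is the quantity defined as $\delta_d$ in equation (2.30).

```python
def path_delays(t):
    """Return (fd, rd, sdiv, rdiv) from component delays in ps.

    Implements equations (2.7), (2.8), (2.10) and (2.11) directly.
    The keys of `t` are hypothetical names for the blocks in Fig. 2.2.
    """
    fd = t["B1"] + t["TSVf"] + t["DLF"]
    rd = t["B5"] + t["TSVr"] + t["B4"] + t["DLR"]
    sdiv = t["sdiv2"]
    rdiv = (t["B1"] + t["TSVf"] + t["DLF"] + t["B2"] + t["DLR"]
            + t["B3"] + t["TSVr"] + t["rdiv2"])
    return fd, rd, sdiv, rdiv


def identity_gap(t):
    """(fd + rd) - (rdiv - sdiv): zero when equation (2.12) holds exactly."""
    fd, rd, sdiv, rdiv = path_delays(t)
    return (fd + rd) - (rdiv - sdiv)


# Assumed values: matched buffers/dividers, deliberately unequal TSVs and delay lines.
matched = {"B1": 20.0, "B2": 20.0, "B3": 20.0, "B4": 20.0, "B5": 20.0,
           "TSVf": 35.0, "TSVr": 50.0, "DLF": 300.0, "DLR": 280.0,
           "sdiv2": 40.0, "rdiv2": 40.0}
gap_matched = identity_gap(matched)

# Same setup with small, assumed buffer/divider mismatches.
mismatched = dict(matched, B2=19.0, B4=22.0, B5=21.0, sdiv2=41.0)
gap_mismatched = identity_gap(mismatched)

print(gap_matched, gap_mismatched)
```

Note that the TSV and delay line values cancel out of the gap in both cases; only the buffers and dividers that are not shared between the two measurements contribute.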

Now, $s \div 2$ and $r \div 2$ would enforce a delay of '2nT' around the feedback path i.e., $\phi r div - \phi s div = 2nT$, where 'n' is an integer and 'T' is the time period of the source clock $\phi s$. Using equation (2.12), upon activating source and feedback paths, PDS in conjunction with ACMF and ACMR adjusts delays in DLF and DLR to satisfy:

Condition 2:

$$\phi f d + \phi r d = 2nT \tag{2.13}$$

Therefore, we can quantify both the delay difference (load equalization) and delay sum (source equalization) of the forward and reverse clock paths. For matched delay lines DLF and DLR, a simple two state process i.e., load equalization followed by source equalization will fulfill both the conditions in equations (2.9) & (2.13), thus aligning  $\phi f d$  with  $\phi s$ . However, as shown in Fig. 2.3 and discussed next, satisfying these conditions in the presence of code dependent mismatch in DLF and DLR requires additional MISC processing before synchronization is achieved.
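As a brief worked step for the matched case, adding and subtracting conditions 1 and 2 shows why satisfying both places the forward path output on an edge of the source clock (delays measured relative to $\phi s$):

$$\begin{aligned}
(\phi f d + \phi r d) + (\phi f d - \phi r d) &= 2nT + 0 \;\Rightarrow\; \phi f d = nT \\
(\phi f d + \phi r d) - (\phi f d - \phi r d) &= 2nT - 0 \;\Rightarrow\; \phi r d = nT
\end{aligned}$$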

#### 2.3.1 State 1: dir = 1, snc = 1, neg = 1, MUX = PDL

From equations (2.7) & (2.8), initially the forward and reverse clock paths ( $\phi f d$ and $\phi r d$ ) are out of phase mainly due to the PVT induced delay mismatch in TSVf and TSVr. MISC compensates for this mismatch by performing load equalization, with the aim of satisfying condition 1 in equation (2.9) subject to the finite minimum resolution ('*lsb*') of DLF and DLR.

MISC equalizes $\phi f d$ with $\phi r d$ in Die-2 by adding delay to the path which leads i.e., if PDL indicates $\phi f d$ leading $\phi r d$, then ACMF is enabled (enF = 1) to add delay in DLF, while ACMR is disabled (enR = 0) to hold delay in DLR, and vice versa. From Fig. 2.4, at the end of state 1, $\phi f d$ and $\phi r d$ lie at a mean delay of ' $\delta$ ' from the source $\phi s$ and are aligned to within 'lsb':

$$\phi f d_{s1} = (n-1)T + \delta \pm \frac{1}{2} \cdot lsb \tag{2.14}$$

$$\phi r d_{s1} = (n-1)T + \delta \mp \frac{1}{2} \cdot lsb \tag{2.15}$$

In other words, state 1 ensures that the delay mismatch between TSVf and TSVr does not contribute to any phase error between $\phi f d$ and $\phi s$ upon final synchronization, while subsequent stages work to align $\phi f d$ with $\phi s$, regardless of control code dependent mismatch between delay lines DLF and DLR.



Figure 2.3: MISC state flow chart. Shaded blocks represent load equalization, while white blocks comprise source equalization

#### 2.3.2 State 2: dir = 0, snc = 1, neg = 0, MUX = PDS, enF = enR = 1

At this point, path delays at  $\phi f d$  and  $\phi r d$  satisfy condition 1 in equation (2.9) to within 'lsb'. However, their sum is still far from '2nT' required to satisfy condition 2 in equation (2.13). So, the MISC in state 2 performs source equalization by equally incrementing delays in DLF and DLR until condition 2 is satisfied (condition 1 continues to be satisfied).

The source and feedback paths are enabled, allowing PDS to simultaneously increment delays in DLF and DLR in an attempt to align $\phi sdiv$ with $\phi rdiv$. In light of equations (2.12) & (2.13), since both ACMF and ACMR are enabled, $\phi rdiv$ and $\phi sdiv$ are matched to within $2 \cdot lsb$ delay resolution:

$$\begin{aligned}
\phi r div - \phi s div &= 2nT \pm 2 \cdot lsb \\
&= \phi f d_{s1} + \phi r d_{s1} + D_f + D_r
\end{aligned} \tag{2.16}$$

Where  $D_f$  and  $D_r$  are the additional delays added in DLF and DLR, respectively. For a mean delay ' $d_2$ ' added in state 2 with 'm' percent mismatch, the delays through DLF and DLR become  $D_f = d_2 \cdot (1 \pm m/200)$  and  $D_r = d_2 \cdot (1 \mp m/200)$ , respectively. Using this in equations (2.14), (2.15) and (2.16) yields:

$$d_2 = T - \delta \pm lsb \tag{2.17}$$
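The step from equation (2.16) to equation (2.17) can be spelled out as follows, using only the definitions above. The $\pm lsb/2$ terms of equations (2.14) and (2.15) cancel in the sum, and so do the mismatch terms of $D_f$ and $D_r$:

$$\begin{aligned}
\phi f d_{s1} + \phi r d_{s1} &= 2(n-1)T + 2 \cdot \delta \\
D_f + D_r &= d_2 \cdot (1 \pm m/200) + d_2 \cdot (1 \mp m/200) = 2 \cdot d_2 \\
2nT \pm 2 \cdot lsb &= 2(n-1)T + 2 \cdot \delta + 2 \cdot d_2 \;\Rightarrow\; d_2 = T - \delta \pm lsb
\end{aligned}$$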



Figure 2.4: Timing diagram of the proposed MISC architecture, indicating forward and reverse path delays at the end of each state (not to scale). Also  $e = \frac{m}{200}$  and  $\alpha' = \alpha \pm lsb$ 

Here, the (n-1)T terms in equations (2.14) & (2.15) signify that the maximum delay increment in DLF and DLR in state 2 is ' $d_2 \approx T$ ' ( $\delta \approx 0$ ). Using equations (2.14), (2.15) & (2.17) and knowing that $\phi f d_{s2} = \phi f d_{s1} + D_f$ and $\phi r d_{s2} = \phi r d_{s1} + D_r$, gives:

$$\phi f d_{s2} = nT \pm lsb + \alpha \tag{2.18}$$

$$\phi r d_{s2} = nT \pm lsb - \alpha \tag{2.19}$$

Here, $\alpha = \pm lsb/2 \pm d_2 \cdot m/200$ is one half of the code dependent mismatch in DLF and DLR. In state 2, ACMF and ACMR are incremented equally, yet control code dependent mismatch causes DLF to accumulate more delay than DLR. The resulting error ' $2 \cdot \alpha$ ' is split differentially between the forward and reverse clock paths, around an integer multiple 'n' of the source clock period 'T' (Fig. 2.4).

#### 2.3.3 State 3: dir = 1, snc = 1, neg = 1, MUX = PDL, enF = enR = 1

From equations (2.18) & (2.19), the sum of forward and reverse path delays satisfies condition 2 in equation (2.13) to within $2 \cdot lsb$. However, mismatch in DLF and DLR causes ' $\alpha$ ' to appear in path delays $\phi f d$ and $\phi r d$, violating both condition 1 in equation (2.9) and the equality check at the beginning of state 3 (Fig. 2.3). MISC eliminates this error in state 3 by switching back to load equalization, wherein DLF can be decremented by ' $\alpha$ ' while incrementing DLR by the same amount to align $\phi f d$ with $\phi r d$. Therefore, state 2 adds common mode delay while state 3 adds differential delays to DLF and DLR.

The forward and reverse clock paths are enabled and the MUX selects PDL to determine if they align within  $2 \cdot lsb$  (see Fig. 2.3). Upon entering state 3 the first time, mismatch between DLF and DLR could cause a phase error of greater than  $2 \cdot lsb$  to appear between  $\phi f d_{s2}$  and  $\phi r d_{s2}$ . In that case, the input to ACMF is negated with respect to ACMR, and the two accumulators begin counting in opposite directions until the clocks at  $\phi f d$  and  $\phi r d$  are aligned to within  $2 \cdot lsb$  i.e.,

$$D_f + D_r = |\phi f d_{s2} - \phi r d_{s2}| \pm 2 \cdot lsb \tag{2.20}$$

Now, for a mean delay ' $d_3$ ' added in state 3 with 'm' percent mismatch, the additional delays through DLF and DLR become  $D_f = d_3 \cdot (1 \pm m/200)$  and  $D_r = d_3 \cdot (1 \mp m/200)$ , respectively. Using this in equations (2.18), (2.19) and (2.20) yields:

$$d_3 = \alpha \pm lsb \tag{2.21}$$

DLF and DLR are incremented differentially, so  $\phi f d_{s3} = \phi f d_{s2} - D_f$  and  $\phi r d_{s3} = \phi r d_{s2} + D_r$ . Using these results and equations (2.18), (2.19) and (2.21) gives:

$$\phi f d_{s3} = nT \mp \frac{m}{200} \cdot (\alpha \pm lsb) \tag{2.22}$$

$$\phi rd_{s3} = nT \mp \frac{m}{200} \cdot (\alpha \pm lsb) \pm 2 \cdot lsb \tag{2.23}$$

Hence, the error term ' $\alpha$ ' in equations (2.18) & (2.19) is reduced by the factor m/200. These results are illustrated in Fig. 2.4 (state 3), using e = m/200 and  $\alpha' = \alpha \pm lsb$ . In state 3, ACMR/ACMF are incremented/decremented equally, yet control code dependent mismatch causes DLF to decrement more delay compared to the incremented delay in DLR. At this point, paths  $\phi f d$  and  $\phi r d$  satisfy condition 1 in equation (2.9) to within  $2 \cdot lsb$ . Yet, depending upon 'm' and ' $\alpha$ ' these path delays might still violate condition 2 in equation (2.13).

#### 2.3.4 State $2_2$ : dir = 0, snc = 1, neg = 0, MUX = PDS, enF = enR = 1

Following the failed condition check and the subsequent load equalization in state 3 (Fig. 2.3), we now move to the second iteration of state 2 and perform source equalization. Similar to the previous analysis, the error term $m/200 \cdot (\alpha \pm lsb)$ in the above equations (2.22) & (2.23) is further reduced by the factor m/200, with the resulting error distributed differentially around the forward and reverse paths:

$$\phi f d_{s2_2} = nT \pm \left(\frac{m}{200}\right)^2 \cdot (\alpha \pm lsb) \tag{2.24}$$

$$\phi r d_{s2_2} = nT \mp \left(\frac{m}{200}\right)^2 \cdot (\alpha \pm lsb) \pm 2 \cdot lsb \tag{2.25}$$

Again, for equal change in ACMF and ACMR, control code dependent mismatch causes DLF to accumulate more delay than DLR. Interestingly, notice that at the end of state $2_2$, $\phi f d$ is closer to 'nT' ( $\alpha' \cdot e^2$ away) than at any end point in the preceding states (Fig. 2.4). In other words, comparing equations (2.18), (2.22), (2.24) and Fig. 2.4, jumping back and forth between source and load equalization states reduces the mismatch induced phase difference between $\phi f d$ and $\phi s$ by 'm/200' at every iterative step. In general, the $k^{th}$ iteration of state 2 yields:

$$\phi f d_{s2_k} = nT \pm \left(\frac{m}{200}\right)^{2(k-1)} \cdot (\alpha \pm lsb) \tag{2.26}$$

$$\phi rd_{s2_k} = nT \mp \left(\frac{m}{200}\right)^{2(k-1)} \cdot (\alpha \pm lsb) \pm 2 \cdot lsb \tag{2.27}$$

From Figs. 2.3 and 2.4, for the $k^{th}$ such iteration of state 2, the error term $(m/200)^{2(k-1)} \cdot (\alpha \pm lsb)$ in the above equations (2.26) & (2.27) becomes much smaller than $2 \cdot lsb$. Hence, upon entering state $3_k$, the condition $|\phi fd - \phi rd| \leq 2 \cdot lsb$ is satisfied, ending the synchronization cycle. Finally:

$$\phi f d_{s3_k} = nT \tag{2.28}$$

$$\phi r d_{s3_k} = nT \pm 2 \cdot lsb \tag{2.29}$$

In equations (2.28) & (2.29), the $2 \cdot lsb$ factor in $\phi rd_{s3_k}$ would instead end up in $\phi fd_{s3_k}$ if the expressions for $\phi fd_{s1}$ and $\phi rd_{s1}$ are interchanged in equations (2.14) & (2.15). Hence, $\phi fd$ and $\phi s$ are aligned to within $2 \cdot lsb$. In retrospect, from Fig. 2.2, delay elements in the feedback path that are common to the forward or reverse paths (B1, TSVf, TSVr, DLF, DLR) do not cause any skew at the output $\phi fd$. In other words, MISC completely eliminates any residual skew resulting from mismatch between DLF and DLR or between TSVf and TSVr.
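The geometric error reduction described across states 2 and 3 can be illustrated with a small numerical sketch. It is an idealized model rather than the fabricated design: it assumes $lsb \to 0$, a constant mismatch of 'm' percent with DLF as the slower line (a code change `c` produces `c*(1+e)` in DLF and `c*(1-e)` in DLR, with `e = m/200`), `n = 1`, and perfectly matched paths after state 1. The function name and the numerical values are hypothetical.

```python
def misc_error_sequence(T, delta, m_percent, steps):
    """Return phi_fd - T after each alternating state 2 / state 3 step.

    Idealized model (assumptions): lsb -> 0, constant mismatch, n = 1,
    and phi_fd = phi_rd = delta at the end of state 1.
    """
    e = m_percent / 200.0
    fd = rd = delta
    errors = []
    for step in range(steps):
        if step % 2 == 0:
            # State 2 (source equalization): equal code change in both
            # lines until fd + rd = 2T, as in condition 2.
            c = (2.0 * T - fd - rd) / 2.0
            fd += c * (1.0 + e)
            rd += c * (1.0 - e)
        else:
            # State 3 (load equalization): opposite code changes until
            # fd = rd, as in condition 1.
            c = (fd - rd) / 2.0
            fd -= c * (1.0 + e)
            rd += c * (1.0 - e)
        errors.append(fd - T)
    return errors


# Assumed example: T = 1000 ps, delta = 200 ps, m = 10 % (e = 0.05).
errors = misc_error_sequence(T=1000.0, delta=200.0, m_percent=10.0, steps=4)
for i, err in enumerate(errors):
    print(f"step {i + 1}: phi_fd - T = {err:+.4f} ps")
```

Under these assumptions the first state 2 leaves an error of $\alpha = (T - \delta) \cdot e$, and each later step shrinks the magnitude by a further factor of `e`, consistent with the $(m/200)$ scaling in equations (2.22) and (2.24).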

To simplify analysis, DLF and DLR were assumed to have a constant mismatch of 'm%'. However, MISC is equally capable of eliminating mismatch between non-linear delay lines. In such a case, the $(\frac{m}{200})^{2(k-1)}$ factor in equations (2.26) & (2.27) simply becomes $(\frac{m_1}{200} \times \frac{m_2}{200} \times \frac{m_3}{200} \cdots \times \frac{m_{2(k-1)}}{200})$, where $m_1, m_2, m_3 \ldots m_{2(k-1)}$ are the instantaneous percentage mismatches between DLF and DLR at every iterative step. Also, DLF and DLR are not confined to the same delay resolution 'lsb' i.e., the $2 \cdot lsb$ factor in equations (2.28) or (2.29) becomes $lsb_F + lsb_R$, where $lsb_F$ and $lsb_R$ are the instantaneous minimum delays through DLF and DLR, respectively.

#### 2.3.5 Tracking after synchronization

Once the synchronization cycle is completed, MISC enters dynamic tracking mode, wherein it continues to fine tune delays in DLF and DLR in a time-interleaved manner in order to keep $\phi f d$ aligned with $\phi s$. This compensates for temperature and long term jitter induced skews that could drift $\phi f d$ away from $\phi s$ over time. Tracking begins by performing load equalization in state 3, followed by source equalization in state 2, and continues until the equality check at the beginning of state 3 is satisfied (Fig. 2.3). Thereafter, the tracking operation repeats itself at specific time intervals.

#### 2.3.6 Inverse locking resolution

Inverse locking occurs when the rising edge of $\phi f d$ is aligned to the falling edge of $\phi s$ i.e., a 180° phase error. Assuming matched delay lines with very fine resolution $(lsb \approx 0)$, the synchronization scheme requires only two steps. In the first step, load equalization matches forward and reverse path delays as in state 1. So, $\phi f d_{s1}$ and $\phi r d_{s1}$ are now placed at a delay of ' $\delta$ ' from $\phi s$ (Fig. 2.5). Moreover, from equation (2.12), the phase difference between $\phi s/\phi s div$ and $\phi r^*/\phi r div^*$ in the following state 2 is ' $2 \cdot \delta$ ', where $\phi r^*$ and $\phi r div^*$ are the initial states of signals $\phi r$ and $\phi r div$, respectively. For the second step, source equalization is activated, resulting in two possible cases as shown in Fig. 2.5. In state 2–case a, $\phi r div^*$ aligns with the first rising edge of $\phi r^*$. Therefore, to align the rising edges of $\phi s div$ and $\phi r div$, DLF and DLR are evenly incremented with their delay sum equaling $2 \cdot (T - \delta)$. This places $\phi f d_{s2}$ and $\phi r d_{s2}$ at a delay of 'T' from $\phi s$, completing synchronization. However, if $\phi r div^*$ aligns with the second rising edge of $\phi r^*$ as in state 2–case b, then the delay sum increment (DLF+DLR) necessary to align $\phi s div$ with $\phi r div$ is only $T - 2 \cdot \delta$. Hence, $\phi f d_{s2}$ and $\phi r d_{s2}$ are now placed 180° out of phase with respect to $\phi s$.



Figure 2.5: Timing diagram of the MISC in states 1 and 2 for two typical cases. Case 'a' results in correct phase alignment of  $\phi s$  and  $\phi f d$ , whereas inverse locking occurs in case 'b'

Inverse locking occurs in state 2 (Figs. 2.2 & 2.5) because $s \div 2$ and $r \div 2$ cannot differentiate between the first and second rising edges of $\phi s$ and $\phi r$, respectively. MISC solves this problem by tri-stating the input buffer B1 for four cycles at the beginning of state 2 (snc is low) i.e., no clock signal appears at $\phi r$. At the end of four cycles, snc is made high, allowing $\phi sdiv$ and $\phi rdiv$ to capture the first rising edges of $\phi s$ and $\phi r$, respectively. Hence, the inverse locking phenomenon in Fig. 2.5 state 2–case b is avoided altogether. Upon first edge capturing, the lead/lag phase relation between $\phi rdiv^*$ and $\phi sdiv$ is latched. Later, $s \div 2$ and $r \div 2$ act as simple buffers (no frequency division) and delays in DLF, DLR are incremented in light of this latched decision until $\phi sdiv$ aligns with $\phi rdiv$.

#### 2.4 Additional skew sources in MISC

The preceding analysis assumed matched tri-state buffers B1–B5 and frequency dividers $s \div 2, r \div 2$, along with a jitter free input $\phi s$ and zero dead zone phase detectors PDL, PDS. The following analysis quantifies the effects of these error sources on the MISC residual skew. Using superposition, DLF and DLR are assumed to be matched with very fine resolution i.e., the factor '*lsb*' is ignored in state equations.

#### 2.4.1 Mismatch in Buffers and Frequency dividers

From Fig. 2.2 and equations (2.7), (2.8), (2.10) and (2.11), difference between the PDS assumed and the actual sum of forward and reverse path delays is given by:

$$\begin{aligned}
\delta_d &= (\phi f d + \phi r d) - (\phi r div - \phi s div) \\
&= (t_{B4} + t_{B5} + t_{s \div 2}) - (t_{B2} + t_{B3} + t_{r \div 2})
\end{aligned} \tag{2.30}$$

Where  $\delta_d$ , defined as the directional buffers mismatch, is induced by PVT variations. From equations (2.14) & (2.15), forward and reverse path delays at the end of state 1 are:

$$\phi f d_{s1,d} = (n-1)T + \delta \tag{2.31}$$

$$\phi r d_{s1,d} = (n-1)T + \delta \tag{2.32}$$

Accounting for the directional buffers mismatch ' $\delta_d$ ' quantified in equation (2.30), the source equalization equation (2.16) at the end of state 2 modifies to:

$$\begin{aligned}
\phi r div - \phi s div + \delta_d &= 2nT + \delta_d \\
&= \phi f d + \phi r d + D_f + D_r
\end{aligned} \tag{2.33}$$

Where  $D_f$  and  $D_r$  are the additional delays added in DLF and DLR, respectively. For matched delay lines  $D_f = D_r = d$ , so equations (2.31), (2.32) & (2.33) yield:  $d = T - \delta + \frac{\delta_d}{2}$ . Updating path delay values i.e.,  $\phi f d_{s2,d} = \phi f d_{s1,d} + d$  and  $\phi r d_{s2,d} = \phi r d_{s1,d} + d$  gives:

$$\phi f d_{s2,d} = nT + \frac{\delta_d}{2} \tag{2.34}$$

$$\phi r d_{s2,d} = nT + \frac{\delta_d}{2} \tag{2.35}$$



Figure 2.6: Phase detector dead zone (a)  $\delta_w/2$  is the minimum phase error between  $\phi f d$  and  $\phi r d$  in state 1 (b) dead zone induced phase error  $(\delta_w/2)$  in state 2 (c) residual skew caused by PDL and PDS dead zones

Load equalization in the following state 3 cannot eliminate the common mode error ' $\delta_d/2$ ' in equations (2.34) & (2.35). Hence, only one half of the directional buffers mismatch $(\delta_d/2)$ appears as additional skew at MISC output $\phi f d$. Parameter $\delta_d$ depends upon the width, threshold voltage and current factor mismatch of the constituent MOS devices in B2–B5, $s \div 2$ and $r \div 2$. Therefore, the variance of $\delta_d$ can be reduced by using larger sized buffers [21]. Similarly, from Fig. 2.1, buffers mismatch in the DTD topology $(\delta'_d)$ modifies equations (2.2) and (2.3) to $\phi f w = (n-1)T + \delta \pm \delta'_d$ and $\phi f d = nT + \delta \pm 2 \cdot lsb \pm \delta'_d$, respectively, resulting in an additional skew of $\delta'_d/2$ at the DTD output $\phi l$ in equation (2.6). Notice that $\delta'_d = \delta_d$ because both terms represent delay mismatch across two tri-state buffers and a frequency divider.

#### 2.4.2 Phase Detector dead zone

Phase detectors PDS and PDL might go metastable if their inputs transition very close to each other. Even if sufficient time is allocated for this metastability resolution, the final output could still result in a wrong decision. From Fig. 2.6(a), for a dead zone width of $\delta_w$, the minimum phase difference between PD inputs to guarantee a correct decision becomes $\delta_w/2$ (setup & hold time violations cannot occur simultaneously).

From Figs. 2.2 & 2.6(a), state 1 adjusts delays in DLF or DLR until the phase difference between  $\phi f d$  and  $\phi r d$  reduces to  $\delta_w/2$ , reaching the valid output limit of PDL i.e.,

$$\phi f d_{s1,w} = (n-1)T + \delta + \frac{\delta_w}{4} \tag{2.36}$$

$$\phi r d_{s1,w} = (n-1)T + \delta - \frac{\delta_w}{4} \tag{2.37}$$



Figure 2.7: Jitter induced skew (a) shift in the rising edges of  $\phi f d$  and  $\phi r d$  for an input jitter of  $\delta_j$  in state 1 (b) instantaneous phase error between  $\phi s div$  and  $\phi r div$  for a peak input jitter of  $\delta_j$  (c) residual skew caused by input jitter

Similarly, PDS dead zone will cause a maximum phase error of $\delta_w/2$ to appear between $\phi s div$ and $\phi r div$ in state 2 (Fig. 2.6(b)). Mathematically, the analysis is similar to that of Section 2.4.1, with $\delta_d$ in equation (2.33) replaced by $\delta_w/2$, giving:

$$\phi f d_{s2,w} = nT + \frac{\delta_w}{2} \tag{2.38}$$

$$\phi r d_{s2,w} = nT \tag{2.39}$$

The following state 3 cannot eliminate differential skew less than $\delta_w/2$ in the above equations (2.38) & (2.39). Therefore, one half of the dead zone width ( $\delta_w/2$ ) appears as additional skew at the MISC output $\phi f d$. Likewise, modifying equations (2.1) & (2.3) to include the effect of phase detector dead zones in the DTD topology gives $\phi f = nT - \delta \pm \delta_w/2 \pm lsb$ and $\phi f d = nT + \delta \pm \delta_w/2 \pm 2 \cdot lsb$. This results in an additional skew term $\delta_w/2$ at the DTD output $\phi l$ (equation (2.6)).

#### 2.4.3 Input jitter

In steady state, both  $\phi f d$  and  $\phi r d$  lie at an equal delay of approximately ' $\delta$ ' from  $\phi s$ . Now, an input jitter of ' $\delta_j$ ' in Fig. 2.7(a), would simply shift the rising edges of  $\phi f d$   $(f \to f_j)$  and  $\phi r d$   $(r \to r_j)$  without introducing phase difference between them. Consequently, the forward and reverse path delays are matched at the end of state 1, regardless of input jitter (equations (2.31) & (2.32)).

However, the MISC maintains a delay of 2nT between the source and feedback paths in state 2 i.e.,  $\phi r div - \phi s div = 2nT$  in Fig. 2.2. Therefore, any jitter in the input clock  $\phi s$  is coupled to  $\phi r div$  after a delay of  $2nT + t_{r \div 2}$ , whereas it appears with a delay of only  $t_{s \div 2}$  at  $\phi s div$ . Hence, an instantaneous peak jitter of  $\delta_j$  in  $\phi s$  could falsely align  $\phi r div$  with  $\phi s div \pm \delta_j$  (Fig. 2.7(b)). So, for the source equalization phase in state 2, the factor  $\delta_d$  in equation (2.33) is replaced with  $\delta_j$ , giving:

$$\phi f d_{s2,j} = nT + \frac{\delta_j}{2} \tag{2.40}$$

$$\phi r d_{s2,j} = nT + \frac{\delta_j}{2} \tag{2.41}$$

Therefore, a maximum of one half of the input jitter ($\delta_j/2$) appears as additional skew at the MISC output $\phi f d$ (Fig. 2.7(c)). Notice that the steady state delay difference between $\phi f b - \phi s$ and $\phi f d - \phi f w$ is one cycle period in the DTD topology (Fig. 2.1). Therefore, any jitter at the input $\phi s$ affects both phases, i.e., equations (2.1) and (2.3) become $\phi f = nT - \delta \pm \delta_j \pm lsb$ and $\phi f d = nT + \delta \pm \delta_j \pm 2 \cdot lsb$, respectively. This results in a peak residual skew of $\delta_j$ at the DTD output $\phi l$ (equation (2.6)).

From equations (2.28), (2.29), (2.34), (2.38) & (2.40), considering all additional error sources, the final residual skew at the MISC output becomes:

$$\phi fd = nT \pm 2 \cdot lsb \pm \frac{\delta_d}{2} \pm \frac{\delta_w}{2} \pm \frac{\delta_j}{2} \tag{2.42}$$

Even though MISC (like traditional synchronization schemes) suffers from additional skew due to buffer mismatch, phase detector dead zone and jitter, delay line mismatch could far outweigh these additional error sources. Only MISC is truly independent of the TSV delay and of the mismatch between its constituent delay lines, whereas traditional schemes rely on perfect matching between delay lines and/or TSVs, both of which are quite elusive in a 3D die setup.

## 2.5 Circuit Implementation

This section describes circuit details of the forward/reverse delay lines (DLF, DLR), the source/load time to digital converters (TDC, TDCL) and phase detectors (PDS, PDL), followed by an overview of the MISC circuit blocks.

#### 2.5.1 Delay lines

Delay lines DLF and DLR utilize 128 NAND based coarse-delay elements [22] and 42 fine-delay elements, as depicted in Fig. 2.8. Delay is controlled through the coarse 'cF[127:0]' and fine 'nF[41:0]' thermometer tuning codes, which together provide $128 \times 42 = 5376$ delay settings. A coarse-delay element 'CdEnt' consists of four NAND gates, including a dummy gate 'G'; the delay through each such element is twice the NAND gate delay. Increasing 'cF' adds additional 'CdEnt' elements in series, further incrementing the delay through DLF. A NAND gate acting as a digitally controlled varactor constitutes a fine-delay element 'FdEnt'. Incrementing 'nF' increases node capacitance, thus adding delay in the fine-delay chain.

Figure 2.8: Delay line 'DLF' using coarse and fine delay elements, DLR is identical to DLF

Figure 2.9: Control code dependent mismatch between identical delay lines DLF and DLR using Monte Carlo analysis in 65nm CMOS, with 32ps coarse code resolution (fine code set at zero). Mismatch at 1GHz (Code = 32, average delay = 1ns) is 160ps

In 65nm CMOS, average coarse and fine delay resolutions stand at 32ps and 4ps, respectively, giving a maximum delay of about 4ns through DLF or DLR. From equation (2.42), the resolution induced skew is limited to $2 \times lsb$. Therefore, the fine-delay range must be at least twice the coarse-delay resolution ($2 \times 32ps$). However, we cannot predict whether fine tuning delay should be added or subtracted. Hence, at reset the fine delay code is set at its mid-point, which increases the required fine-delay range to four times the coarse-delay resolution ($42 \times 4ps > 4 \times 32ps$).
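The sizing argument above can be checked numerically. The following is a minimal sketch that only restates the figures quoted in this subsection (128 coarse elements at 32ps, 42 fine elements at 4ps); the variable names are illustrative and are not taken from the design.

```python
# Delay-line sizing check using the average resolutions quoted in the text.
# All names are illustrative; only the numeric values come from the section.
N_COARSE = 128        # coarse-delay elements (cF[127:0])
N_FINE = 42           # fine-delay elements (nF[41:0])
COARSE_LSB_PS = 32    # average coarse resolution in 65nm CMOS
FINE_LSB_PS = 4       # average fine resolution in 65nm CMOS

delay_settings = N_COARSE * N_FINE              # coarse x fine combinations
max_coarse_delay_ps = N_COARSE * COARSE_LSB_PS  # about 4 ns
fine_range_ps = N_FINE * FINE_LSB_PS            # total fine tuning range
required_fine_range_ps = 4 * COARSE_LSB_PS      # needed with mid-point reset

fine_range_ok = fine_range_ps > required_fine_range_ps
print(delay_settings, max_coarse_delay_ps, fine_range_ps, fine_range_ok)
```

With these values the fine range (168ps) exceeds the 128ps required when the fine code starts at its mid-point.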



Figure 2.10: Source time to digital converter (TDC) in die-1 (a) block diagram (b) timing diagram, an identical TDC in die-2 quantifies the phase error between $\phi f d$ and $\phi r d$

Fig. 2.9 depicts the PVT induced worst case code dependent mismatch between identical delay lines DLF and DLR, from Monte Carlo analysis in 65nm CMOS. For the DTD topology [18], one half of the mismatch between its identical delay lines DLL1 and DLL2 will appear as additional skew at its output. For example, at 32ps coarse resolution, mismatch induced skew in the DTD topology could be as much as 80ps at 1GHz (*Code* = 32, mismatch $\approx 160ps$). In contrast, MISC is independent of this code dependent mismatch in DLF and DLR.

#### 2.5.2 Time to digital converter

MISC employs identical time to digital converters (TDC) in both dies to speed up the synchronization process. Fig. 2.10 shows a delay line based TDC which quantifies the phase error between $\phi s div$ and $\phi r div$ in state 2. Delay through cells 'D' is equivalent to the coarse-delay resolution in DLF (Fig. 2.8). From Fig. 2.10(b), the phase difference between $\phi s div$ and $\phi r div$ is represented by the pulse width 'Phs'. The signal 'Phs' is subsequently latched on the rising edges of its delayed versions P[0]–P[3]. The resulting output T[3:0] is later used by the accumulators ACMF and ACMR to adjust their delay counts in multiples of 'lsb'.

Figure 2.11: Load phase detector PDL (a) block diagram (b) timing diagram, PDS is similar to PDL

For example, in Fig. 2.10(b), the TDC output '0111' points to a phase difference of '16 × *lsb*' between $\phi sdiv$ and $\phi rdiv$, where delay through cell 'D' is *lsb*. Now, the accumulators ACMF and ACMR will adjust their delay count by 4 each (conversion gain = $\frac{4+4}{16} = 0.5$) in the next cycle, thereby reducing lock-in time. On the other hand, if T[3:0] = '0000', then the delay count in ACMF and ACMR increments by just 1. Note that the absolute accuracy of the propagation delay through cells 'D' is not important as long as the conversion gain is less than 1. Similar to phase detectors, the TDC outputs are also multiplexed.

#### 2.5.3 Phase detector

In Fig. 2.11, the output 'Pl' of the flip-flop type phase detector PDL detects the lead/lag relationship between the forward ($\phi f d$) and reverse ($\phi r d$) path clocks. This triggers delay adjustment in DLF and/or DLR until 'Pl' starts to dither (toggles between 1 and 0), meaning $\phi f d$ and $\phi r d$ are in phase. Initially in state 1, the controller must choose the path which leads among $\phi f d$ and $\phi r d$ based on the PDL output 'Pl'. However, 'Pl' might lead to a wrong decision if $\phi f d$ and $\phi r d$ lie within the metastability window $\delta_w$ of the flip-flop 'FF'. Therefore, PDL utilizes proximity detection, such that its output 'Meta' goes high if $\phi f d$ and $\phi r d$ are aligned to within $T_E$, where $T_E$, the delay through cells 'E', must be greater than $\delta_w$. In such a scenario, delay in DLF is incremented until 'Meta' goes low; thereafter the controller equalizes forward and reverse path delays. PDS is similar to PDL. The end of each state is detected by the controller when the respective phase detector output starts to dither.

Figure 2.12: Detailed circuit blocks for the source equalization phase in state 2

#### 2.5.4 MISC detailed circuit block

A detailed MISC block in the source equalization phase (state 2, dir = neg = '0') is shown in Fig. 2.12. The 'Ps' and 'T[3:0]' inputs determine the direction and amount of count increment in ACMF and ACMR, respectively. Initially, MISC activates only the coarse tuning codes (cF, cR) while the fine tuning codes are set at mid-point (nF = nR = 21). The accumulators then adjust delays in DLF and DLR until $\phi sdiv$ and $\phi rdiv$ are in phase. Notice that ACMF and ACMR are clocked at '$\div$ div' of the input frequency ($\phi ft$), where 'div' is six in state 2 and four in states 1 and 3. This is done to allow sufficient time for any delay change in DLF and DLR to reflect at the phase detector inputs (PDL or PDS) before the next change. The controller latches the PDL output at the beginning of the next state 3, and performs a single coarse delay change in each of DLF and DLR. If the PDL output changes from its latched value, then the equality at the beginning of state 3 is satisfied (Fig. 2.3), ending coarse tuning (C\_F is set low). Thereafter, this cycle is repeated for one iteration (state 2 $\rightarrow$ state 3) using only fine delay resolution, ending the synchronization cycle.

From Figs. 2.2 & 2.3, we can skip state 1 altogether and start directly from state 2 in MISC. However, state 2 can only resolve the sum of delays in the forward and reverse paths, and not their difference. Therefore, state 3 must now resolve both the control code dependent mismatch between DLF and DLR and the delay mismatch in TSVf and TSVr. In this case, the phase difference between the PDL inputs in state 3 could approach 'T', necessitating additional frequency dividers ($\div$2) at the PDL inputs to avoid inverse locking.

Figure 2.13: Post synthesis timing simulation of the MISC architecture at 1GHz in 65nm CMOS under only delay lines mismatch (a) complete synchronization cycle of the MISC (b) zoomed in views in different states

## 2.6 Results and Analysis

The first subsection gives an overview of a single MISC synchronization cycle under only delay lines mismatch. Next, we compare the MISC and DTD topologies across the additional error sources discussed in section 2.4. Finally, a comparative analysis of recent synchronization techniques is presented in the third subsection.

#### 2.6.1 MISC Post Synthesis Synchronization Cycle

MISC completely eliminates any skew arising from mismatch between DLF and DLR ($\delta_m$). To see this clearly in simulation, the additional error sources in section 2.4 are ignored while only $\delta_m$ is considered. Mismatch $\delta_m$ is introduced by constructing DLF and DLR from NAND cells with different driving strengths. Other error sources are ignored, i.e., buffers B2–B5 and dividers $s \div 2$ and $r \div 2$ are matched ($\delta_d = 0$), phase detectors PDS and PDL have zero dead zone width ($\delta_w = 0$), and the input $\phi s$ is jitter free ($\delta_j = 0$).

Figure 2.14: Complete MISC layout implemented on a single die (planar 2D design), measuring $140\mu m \times 58\mu m$. TSVf and TSVr are replaced with delay lines to cover PVT induced delay variations

Capacitance of a TSV is its dominant delay factor [23]; open defects also increase TSV propagation delay [2], [3]. So, TSVf and TSVr are modeled as tunable delay lines using NAND based coarse delay elements as in Fig. 2.8. The coarse/fine resolution of DLF and DLR is set at 24ps/6.1ps and 32ps/3.9ps, respectively. Also, TSVf and TSVr are tuned with a delay mismatch of about 120ps. Fig. 2.14 shows the MISC layout done in Cadence EDI using standard cells in 65nm CMOS. This MISC design consumes 2.1mW at 1GHz, while occupying an active area of $0.0081mm^2$ (Table 2.2). The post layout netlist and timing file are simulated in ModelSim, where Fig. 2.13(a) shows a typical MISC synchronization cycle at 1GHz, followed by the zoomed in state views in Fig. 2.13(b):

- A Initially, the reverse path clock ($\phi rd$) leads the forward path clock ($\phi fd$) due to a delay mismatch of 120ps between TSVf and TSVr.
- B Therefore, only ACMR is enabled (enR is high, enF is low) to increment delay in DLR until $\phi f d$ and $\phi r d$ are aligned. This compensates the delay mismatch between TSVf and TSVr, ending state 1.
- C In state 2, snc is set low for four cycles to eliminate inverse locking. Now, PDS equally increments delays in DLF and DLR until $\phi s div$ and $\phi r div$ are in phase. Thereafter, skew resulting from code dependent mismatch between DLF and DLR is differentially distributed across $\phi f d$ and $\phi r d$ with $\phi s$ as mean.
- D MISC jumps back and forth between states 2 and 3, until the conditional equality at the beginning of state 3 iteration 2 is satisfied (Fig. 2.3), ending coarse tuning (C\_F is set low). At this point $\phi f d$ and $\phi r d$ are aligned to $\phi s$ to within 24ps + 32ps = 56ps, regardless of code dependent mismatch between DLF and DLR or delay mismatch in TSVf and TSVr.
- E Fine tuning is activated (C\_F is set low), and the MISC repeats a single iteration of states 2 and 3. Thereafter, fine tuning is over in the third iteration of state 3 and the system enters tracking mode. The final residual skew between $\phi f d$ and $\phi s$ was 6ps, while it took 269 cycles to complete one MISC synchronization run.

#### 2.6.2 MISC vs DTD under worst case conditions

We now compare MISC and DTD by emulating 10571 unique simulation runs of each architecture, covering all error sources in section 2.4. A justification for the large number of such runs is given in the first subsection. The testbench setup is covered in the second subsection, followed by skew results in the third subsection.

#### 2.6.2.1 Why simulate a large number of fabrication runs?

Consider three uniformly distributed random variables, each taking 10 values bounded between (-1, 1). Fig. 2.15(a) shows a histogram of their permutation sum bounded between (-3, 3), yielding 1000 sums in total. Only the variable set {1,1,1} can give sum = 3, hence the probability of obtaining sum = 3 is 1/1000. The probability of obtaining sum = 0 is much greater, because sets such as {1,-1,0}, {0.2,0.4,-0.6} etc. all give sum = 0. Using the same analogy, for a single fabrication run of any architecture, the error sources $\delta_m$, $\delta_w$, $\delta_d$ and $\delta_j$ will most likely add destructively to deliver a small residual skew ($sum \approx 0$). This is evident from the measured skew of only 9.6ps (600MHz) for a single DTD fabrication run [18], even though the skew contribution from metastability alone can reach $\pm 19ps$ ($\delta_w/2$ in equation (2.6)), where $\delta_w = 38ps$ for a flip-flop based phase detector in 90nm CMOS. This necessitates a large number of fabrication runs to characterize peak residual skew in any architecture.

Figure 2.15: (a) Permutation sum of the three random variables, each taking on 10 random values bounded between (-1,1), yielding $10^3$ sums in total (b) test flow for the MISC and DTD architectures
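The permutation sum argument can be reproduced with a short enumeration. The sketch below is an illustration only: it assumes ten evenly spaced values in [-1, 1] for each variable (the text draws them at random), and the variable names are not taken from the thesis.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Ten evenly spaced values in [-1, 1] stand in for the "10 random values"
# of each variable (an assumption; exact fractions avoid rounding issues).
values = [Fraction(-1) + Fraction(2 * i, 9) for i in range(10)]

# Sum every ordered triple: 10^3 = 1000 sums bounded by [-3, 3].
sums = [a + b + c for a, b, c in product(values, repeat=3)]
histogram = Counter(sums)

total = len(sums)
count_peak = histogram[Fraction(3)]  # only the triple (1, 1, 1) reaches 3
count_near_zero = sum(1 for s in sums if abs(s) <= Fraction(1, 3))

print(f"P(sum = 3)      = {count_peak}/{total}")
print(f"P(|sum| <= 1/3) = {count_near_zero}/{total}")
```

Under this assumption a single triple produces the peak sum, while a far larger number of triples land close to zero, which mirrors why one fabrication run rarely exposes the peak residual skew.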

Alternatively, a testbench (Fig. 2.16) can simulate a large number of fabrication runs by sweeping each error source across its peak values, thus capturing all possible destructive or constructive interactions between such sources. Fig. 2.15(b) shows the test flow adopted in this thesis. The physical post layout gate level netlist and timing information (gate and wiring delay) of a single MISC and DTD design are extracted using Synopsys DC and Cadence EDI. This information is imported into ModelSim, where a testbench, as shown in Fig. 2.16, emulates 10571 unique simulation runs of each architecture (DTD not shown). The layout for each architecture is done on a single die (planar 2D design), replacing TSVf and TSVr with delay lines.

#### 2.6.2.2 Testbench setup

The testbench models TSVf and TSVr as tunable delay lines with control codes cntF[0:4] and cntR[0:4], respectively, each constructed using 31 identical coarse delay elements CdEnt (Fig. 2.8). The MISC timing file is edited to give TSVf and TSVr a delay range of $0 \leftrightarrow T$; this covers path mismatches due to process, temperature and TSV open defects. For example, at 1GHz input the Sim#1 set in Fig. 2.16 gives cntF=9 and cntR=17, representing TSVf and TSVr delays of $\frac{9 \times 1ns}{31} \approx 290ps$ and $\frac{17 \times 1ns}{31} \approx 548ps$, respectively. Design values for the various error sources are discussed next along with their integration in the testbench.

Figure 2.16: Testbench setup, emulating 10571 simulation runs of the MISC architecture. Testbench passes a non-repeating unique control code set {cntD, cntF, cntR} to MISC at the start of each synchronization run. This set controls delays through B5, TSVf and TSVr thus emulating $\delta_d$ and $\delta_m$. Whereas, jitter $\delta_j$ is emulated at every clock edge of $\phi s$ and metastability $\delta_w$ when PDS or PDL (not shown) inputs transition within 12ps of each other. The setup for DTD (not shown) is identical to that of MISC

• Buffers mismatch $\delta_d$

The average delay through buffers B2–B5 in Fig. 2.2 is 27ps. The worst case value of the buffers mismatch parameter $\delta_d$ is $\pm 22ps$ from Monte Carlo analysis in 65nm CMOS. Now, buffers B2–B4 and dividers $s \div 2$ and $r \div 2$ are constructed using identical standard cells, i.e., their delays match (fixed). So to emulate $\delta_d$ as in equation (2.30), the testbench sweeps the delay through B5 between $5ps \leftrightarrow 49ps$ using control code cntD[0:3] (Fig. 2.16), where B5 is replaced by a tunable delay line constructed using 11 fine delay elements (FdEnt in Fig. 2.8). The testbench varies 'cntD' between $0 \leftrightarrow 11$, thus emulating $-22ps < \delta_d < 22ps$. For example, the Sim#1 set with cntD=2 in Fig. 2.16 represents a delay of $5ps + \frac{2\times44ps}{11} = 13ps$ in B5, emulating $\delta_d = -14ps$, changing to $\delta_d = 10ps$ in Sim#2 (cntD=8). The MISC timing file is edited to strictly enforce $-22ps < \delta_d < 22ps$.
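The code-to-delay mappings used by the testbench for cntF, cntR and cntD can be restated as a short sketch. The function names and default arguments below are hypothetical and are not part of the actual testbench; the numeric values (31 elements spanning one period, a 5ps to 49ps sweep over 11 elements, a 27ps average buffer delay) are the ones quoted in the text.

```python
# Testbench code-to-delay mappings, restating the worked numbers in the text.
# Function names are hypothetical; they are not taken from the testbench.

def tsv_delay_ps(cnt, period_ps=1000.0, n_elements=31):
    """Delay of the TSV model for control code cnt (0..T over 31 elements)."""
    return cnt * period_ps / n_elements

def b5_delay_ps(cnt_d, min_ps=5.0, span_ps=44.0, n_elements=11):
    """Delay of the tunable B5 model, swept from 5 ps to 49 ps."""
    return min_ps + cnt_d * span_ps / n_elements

def buffer_mismatch_ps(cnt_d, average_ps=27.0):
    """delta_d emulated by cntD, relative to the 27 ps average buffer delay."""
    return b5_delay_ps(cnt_d) - average_ps

# Sim#1 TSV codes (cntF=9, cntR=17) at 1 GHz
print(round(tsv_delay_ps(9)), round(tsv_delay_ps(17)))
# Sim#1 (cntD=2) and Sim#2 (cntD=8) buffer mismatch
print(buffer_mismatch_ps(2), buffer_mismatch_ps(8))
```

Evaluating the end codes cntD = 0 and cntD = 11 with the same functions gives the stated bounds of -22ps and +22ps.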

| Error source | Peak value | DTD skew | MISC skew |
|--------------|------------|----------|-----------|
| $\delta_m$ | 160ps @ 1GHz | $\delta_m/2 = 80ps$ | 0ps |
| $\delta_j$ | 10ps | $\delta_j = 10ps$ | $\delta_j/2 = 5ps$ |
| $\delta_w$ | 24ps | $\delta_w/2 = 12ps$ | $\delta_w/2 = 12ps$ |
| $\delta_d$ | 22ps | $\delta_d/2 = 11ps$ | $\delta_d/2 = 11ps$ |
| $lsb_{fine}$ | 4ps | $1.5 \cdot lsb = 6ps$ | $2 \cdot lsb = 8ps$ |
| Peak residual skew @ all frequencies, for matched DLF, DLR | sum of error sources except $\delta_m$ | $\leq 39ps$ | $\leq 36ps$ |
| Peak residual skew @ 1GHz, with mismatch in DLF, DLR | sum of all error sources | $\leq 119ps$ | $\leq 36ps$ |

Table 2.1: Summary of error sources and their relative contribution to residual skew in MISC and DTD topologies
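The bottom two rows of Table 2.1 follow from summing the individual contributions. A minimal sketch of that arithmetic is given below; the dictionary layout and function names are illustrative only, while the peak values and weighting factors are those listed in the table.

```python
# Peak residual skew budget from Table 2.1 (all values in ps).
# The dictionary layout and function names are illustrative only.
PEAK = {"delta_m": 160, "delta_j": 10, "delta_w": 24, "delta_d": 22, "lsb": 4}

def misc_peak_skew(p):
    # Equation (2.42): MISC is independent of delta_m.
    return 2 * p["lsb"] + p["delta_d"] / 2 + p["delta_w"] / 2 + p["delta_j"] / 2

def dtd_peak_skew(p, include_mismatch=True):
    # DTD column of Table 2.1: full jitter, half of delta_d and delta_w.
    skew = 1.5 * p["lsb"] + p["delta_d"] / 2 + p["delta_w"] / 2 + p["delta_j"]
    if include_mismatch:
        skew += p["delta_m"] / 2  # half of the delay line mismatch at 1 GHz
    return skew

print(misc_peak_skew(PEAK))                         # matched or mismatched
print(dtd_peak_skew(PEAK, include_mismatch=False))  # matched delay lines
print(dtd_peak_skew(PEAK, include_mismatch=True))   # with delta_m at 1 GHz
```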

• Metastability $\delta_w$

Rising edge setup and hold times for a D-type flip-flop in 65nm CMOS are 12ps and 4ps, respectively. Considering the worst case, we use 12ps for both, hence the phase detector dead zone width becomes $\delta_w \approx 24ps$ (setup + hold time). PDS and PDL are implemented exactly as in Fig. 2.11. However, to emulate metastability ($\delta_w$), the testbench in Fig. 2.16 randomizes their outputs (PDL not shown) if their inputs transition within 12ps of each other. As a result, the effective $\delta_w$ can be relatively small for Sim#n (the input edges of PDS and PDL are close enough to trigger metastability, but the random decision happens to be correct), or it might peak at 24ps for Sim#(n + 1).

• Jitter  $\delta_j$ 

MISC and DTD architectures are tested with an input jitter of  $\pm 10ps$  i.e., the source clock edges at  $\phi s$  lie within 10ps of their ideal location  $\phi c$  in Fig. 2.16.

• Delay lines mismatch  $\delta_m$ 

DLF and DLR are implemented exactly as in Fig. 2.8. However, the MISC timing file is edited to introduce delay mismatch between their respective coarse delay elements, in line with $\delta_m$ vs control code as in Fig. 2.9. In steady state (equations (2.28) & (2.29)), the MISC forward and reverse path delays approach 'nT'. In other words, neglecting delays through buffers B1–B5 gives the equalities $t_{TSVf} + t_{DLF} \approx nT$ and $t_{TSVr} + t_{DLR} \approx nT$. So, reducing delays through TSVf and TSVr via testbench codes (cntF, cntR) will cause MISC to add delays in DLF and DLR, i.e., to maintain lock DLF and DLR will settle at the higher end of their control code ranges, and vice versa. Hence, at 1GHz input (T = 1ns) the testbench set {cntD, cntF, cntR} causes DLF and DLR to span across 0 to 32 of their respective control code ranges (delay range 0 to T), thus emulating $0 < \delta_m < 160ps$ as in Fig. 2.9.

The timing file is edited several times so as to emulate error sources $\delta_m$, $\delta_w$, $\delta_d$ and $\delta_j$ as in actual fabrication. This behavioral change does not affect the physical netlist or functionality of any block. A summary of the skew contribution from these error sources is given in Table 2.1. The testbench forwards the control set {cntD, cntF, cntR} to the MISC at the start of each simulation run. Given the code ranges of $0 \leq cntD \leq 11$, $0 \leq cntF \leq 31$ and $0 \leq cntR \leq 31$, the testbench permutation yields $11 \times 31 \times 31 = 10571$ unique simulation runs of the MISC. The skew from each such run is extracted between the rising edges of $\phi c$ and $\phi f d$. Increasing the number of simulation runs will bring the observed peak residual skew closer to its theoretical value, at the expense of longer simulation time and greater memory usage.
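The run count can be reproduced by enumerating the control sets. The sketch below assumes 11, 31 and 31 distinct code values for cntD, cntF and cntR, as implied by the product quoted in the text; which exact code values are swept is not specified there, so starting each range at 0 is an assumption made here for illustration.

```python
from itertools import product

# Number of distinct code values per control word, as implied by the
# product 11 x 31 x 31 in the text. The exact code values swept are not
# specified there, so starting each range at 0 is an assumption.
N_CNTD, N_CNTF, N_CNTR = 11, 31, 31

# Each tuple is one non-repeating control set {cntD, cntF, cntR}.
control_sets = list(product(range(N_CNTD), range(N_CNTF), range(N_CNTR)))

num_runs = len(control_sets)
all_unique = len(set(control_sets)) == num_runs
print(num_runs, all_unique)
```

Each tuple corresponds to one emulated fabrication run, so the list length equals the number of simulation runs per architecture.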

Note that $t_{TSVf}$ is actually the sum of delays through TSVf and the interconnects connecting TSVf to other buffers and delay lines (B1, DLF in Fig. 2.2); the same is true for $t_{TSVr}$. Hence, the testbench control codes 'cntF' and 'cntR' also simulate PVT induced delay mismatch in these interconnects. Similarly, the code 'cntD' also simulates mismatch in interconnects associated with buffers B2–B5, $s \div 2$ and $r \div 2$. Likewise, the testbench evaluates the DTD topology, where the set {cntD, cntF, cntR} controls delays through B2, WireF and WireR, respectively. Also, mismatch is introduced between DLL1 and DLL2 along with input jitter and phase detector metastability. A similar testing strategy is adopted in [24].

#### 2.6.2.3 Testbench skew results

DTD and MISC architectures are subjected to the same testbench control set {cntD, cntF, cntR} (Fig. 2.16), the resulting skew is discussed next.



Figure 2.17: Residual skew at 1GHz operating frequency considering matched delay lines DLF and DLR and all error sources except  $\delta_m$  (Table 2.1) in (a) DTD (b) MISC



Figure 2.18: Residual skew at 1GHz operating frequency considering mismatch between delay lines DLF and DLR and all error sources (Table 2.1) in (a) DTD (b) MISC

• MISC vs DTD at 1GHz with $\delta_m = 0$

For the first set of simulations, the testbench setup is as discussed in the last subsection with one change: DLF and DLR are matched (identical in netlist and timing), i.e., $\delta_m = 0ps$ in Fig. 2.9 for both architectures. Fig. 2.17 shows the residual skew from 10571 simulation runs of the DTD and MISC topologies at 1GHz, considering all error sources in Table 2.1 except $\delta_m$. Observe that the peak residual skew is about 34ps and 32ps in the DTD and MISC topologies, respectively. These results are close to their ideal values depicted in Table 2.1 (39ps and 36ps for DTD and MISC, respectively), established theoretically from equations (2.6) & (2.42).

• MISC vs DTD at 1GHz with  $\delta_m \neq 0$ 

Alternatively, a second set of 10571 simulations is done at 1GHz considering all error sources in Table 2.1 with exactly the same testbench setup as in the last subsection. This includes code dependent mismatch between delay lines, i.e., for the same control code = 32 (average delay $32 \times 32$ps $\approx 1$ns in Fig. 2.9), DLF accumulates an additional delay of 160ps when compared with DLR. From Fig. 2.18(a), the resulting peak residual skew in the DTD topology increases to about 116ps. On the other hand, MISC is independent of such mismatch, hence its peak residual skew remains unchanged at about 32ps (Fig. 2.18(b)). Again, these results agree well with the skew parameter values in Table 2.1 and equations (2.6), (2.42).

Figure 2.19: Peak residual skew considering all mismatch error sources (Table 2.1) in DTD [18] and MISC topologies

• MISC vs DTD for $250MHz \leftrightarrow 1GHz$ with $\delta_m \neq 0$

Reducing the input frequency increases the control code range of the delay lines, because DLF and DLR will accumulate more delay to accommodate the increased input period. Hence, mismatch in DLF and DLR also increases at higher input periods (Fig. 2.9). For example, the contribution from $\delta_m$ increases to 130ps at 500MHz input (Code=64, $\delta_m \approx 260ps$ in Fig. 2.9), compared to 80ps at 1GHz (Table 2.1).

Fig. 2.19 depicts the maximum residual skew upon synchronization considering all error sources for the two architectures as a function of the input frequency (cycle period). Each point on this graph represents the maximum residual skew obtained from 10571 simulation runs of the concerned architecture at a given frequency, similar to Fig. 2.18. It can be observed that the residual skew in the MISC architecture is limited to around 32ps regardless of the input cycle period. In comparison, the peak residual skew in the DTD architecture is 116ps at 1GHz, which increases to over seven times (234ps) the residual skew of the MISC architecture (32ps) for an input frequency of 250MHz. These results confirm that the MISC is truly independent of the delay mismatch through TSVf and TSVr and of the code dependent mismatch between DLF and DLR.

|                      | MISC                      | DTD [18]       | ADDLL [19]     | CDC [25]         |
|----------------------|---------------------------|----------------|----------------|------------------|
| Technology           | 65nm                      | 90nm           | 90nm           | 0.18$\mu$m       |
| Category             | Digital                   | Digital        | Digital        | Analog           |
| Supply               | 1V                        | 1V             | 1V             | 1.8V             |
| Power                | 2.1mW @ 1GHz              | 1.8mW @ 600MHz | 3.27mW @ 1GHz  | 56mW @ 1.5GHz    |
| Active Area ($mm^2$) | 0.0081                    | 0.0088         | 0.09           | NA               |
| Frequency            | 250MHz – 1GHz             | 50MHz – 600MHz | 300MHz – 1GHz  | 556MHz – 1.5GHz  |
| Lock Time (cycles)   | 269 (max) - 196 (typical) | 54             | 79             | NA               |
| Skew                 | 32ps @ 1GHz               | 116ps @ 1GHz   | 119ps @ 1GHz   | 2ps\* @ 1.5GHz   |

Table 2.2: Comparison with prior art

\* If delay lines match perfectly

#### 2.6.3 Comparative Analysis

A comprehensive analysis of the various synchronization topologies is done based on the error sources in Table 2.1, with the estimated skew reported in Table 2.2. The estimated skew for ADDLL [19] is slightly higher than DTD [18], mainly due to the greater number of directional buffers involved, which increases $\delta_d$. The analog clock-deskewing circuit (CDC) technique [25] reports a skew of just 2ps. However, it assumes perfectly matched delay lines and also suffers from excessive power consumption. The techniques mentioned in Table 2.2 are independent of TSV delay. Other recent clock synchronization techniques [16], [17] rely on equal forward and return TSV delays, in addition to perfectly matched delay lines. Now, defect induced TSV delay mismatch could reach hundreds of picoseconds [2]. Hence, under worst case conditions, residual skew in these topologies [16], [17] would be even higher compared to the architectures reported in Table 2.2.

The DTD [18] and ADDLL [19] techniques base their reported skew on a single fabrication/simulation run. In contrast, the values in Table 2.2 reflect the maximum skew in each topology extracted from 10571 unique simulation runs, which cover the full range of all error sources depicted in Table 2.1.

From Table 2.1, control code dependent mismatch between identical delay lines ($\delta_m$) is arguably the greatest source of residual skew in recent clock synchronization topologies. This error source '$\delta_m$' is even higher in a 3D die setup, given the worse thermal gradient of a 3D IC compared to its 2D counterpart [4]. Moreover, '$\delta_m$' is set to increase further due to worsening intra-die process variation in deep sub-micron technologies [9], [20]. Nonetheless, only MISC can effectively eliminate skew arising from '$\delta_m$'.

Yet mismatch compensation in MISC comes at the cost of increased lock-in time, requiring 269 cycles for initial synchronization under worst case delay lines mismatch (196 cycles in the typical case). However, this lock time averages 122 cycles for subsequent synchronization runs in tracking mode. Also, MISC requires only two delay lines compared to three for the DTD [18] and four for the ADDLL [19]. Hence, MISC features the lowest area footprint coupled with comparatively lower power consumption.

## 2.7 TSV defects and lock time

TSV delay is susceptible to process and temperature variations [1]. Also, open defects [3], [26] increase resistance in the TSV channel, leading to excessive propagation delay. Depending upon the defect severity, this increase in delay varies over a wide range, approaching 1000ps in moderate cases [2], [3]. The first subsection deals with defects in TSVf and TSVr. Defect induced delay increment in TSVp is analyzed in the second subsection. Catastrophic TSV failure is addressed next, followed by lock time implications in the fourth subsection.

Figure 2.20: PDS delay path in the MISC source equalization phase in state 2. The delay increment in DLF and DLR (not shown) is synchronized to every six clock cycles of $\phi ft$ (÷6)

#### 2.7.1 Defects in TSVf and TSVr

A flip-flop based phase detector (PDL) can only resolve an input phase difference of less than the input period 'T', i.e., it cannot differentiate between $0.1 \times T$ and $1.1 \times T$. Thus MISC features built in frequency dividers (÷2) in buffers B1 and B5 (Fig. 2.2), which are only active for a few cycles in state 1. These allow PDL in state 1 to compensate for a defect induced delay increment of less than '2 × T' in TSVf or TSVr.

#### 2.7.2 Defects in TSVp

From MISC state 2 in Fig. 2.20, delay increment in DLF and DLR is synchronized to every six clock cycles of  $\phi ft$ . Hence, any delay change in DLF and DLR must settle and circle the loop within a period of '6 × T' i.e.,

$$t_{DLF} + t_{B2} + t_{DLR} + t_{B3} + t_{TSVr} + t_{r \div 2} + t_{PDS} + t_{TSVp} + t_{MUX} + t_X + t_{ACMF} < 6 \cdot T \tag{2.43}$$

For the defect induced high delay case through TSVf and TSVr, MISC might lock with a delay of '4 $\cdot$ T' around the feedback loop in Fig. 2.2. In the worst case scenario, from Fig. 2.20, $t_{DLF} + t_{B2} + t_{DLR} + t_{B3} + t_{TSVr} = 4 \cdot T$; using this result in equation (2.43) gives $t_{r\div2} + t_{PDS} + t_{TSVp} + t_{MUX} + t_X + t_{ACMF} < 2 \cdot T$. Also, for the MISC design in 65nm CMOS, typical delays through $r \div 2$, PDS, MUX, X and ACMF in Fig. 2.20 are $t_{r\div 2} = 65ps$, $t_{PDS} = 65ps$, $t_{MUX} = 38ps$, $t_X = 40ps$ and $t_{ACMF} = 260ps$, respectively. Using these results yields:

$$t_{TSVp} < 2 \cdot T - 468ps \tag{2.44}$$

Similar analysis for the delay through any of the T[3:0] signals  $(t_T)$  from Die-1 to Die-2 in Fig. 2.12 gives the following condition (using TDC delay of  $t_{TDC} = 0.5 \times T$ ):

$$t_T < 1.5 \cdot T - 363ps \tag{2.45}$$

Using the smaller of the two delay conditions ($t_{TSVf}, t_{TSVr} < 2 \times T$ or $t_{TSVp} < 2 \times T - 468ps$), MISC operating at 1GHz (T = 1000ps) can compensate for low to moderate defect induced delay increases of less than 1532ps in TSVp, TSVf and TSVr and up to 1137ps in T[3:0].
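The limits in equations (2.44) and (2.45) can be evaluated for any input period with a few lines. The sketch below uses the typical 65nm block delays quoted above; the function and dictionary names are illustrative and do not come from the design files.

```python
# Typical 65nm block delays quoted in the text (ps).
BLOCK_DELAYS_PS = {"r_div2": 65, "PDS": 65, "MUX": 38, "X": 40, "ACMF": 260}

def tsvp_limit_ps(period_ps):
    """Upper bound on t_TSVp from equation (2.44)."""
    return 2 * period_ps - sum(BLOCK_DELAYS_PS.values())

def t_signal_limit_ps(period_ps):
    """Upper bound on the T[3:0] path delay from equation (2.45)."""
    return 1.5 * period_ps - 363

T_PS = 1000  # 1 GHz input
fixed_delay_ps = sum(BLOCK_DELAYS_PS.values())
print(fixed_delay_ps, tsvp_limit_ps(T_PS), t_signal_limit_ps(T_PS))
```

At T = 1000ps this reproduces the 468ps fixed delay and the 1532ps and 1137ps limits stated above.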

### 2.7.3 Process variation and catastrophic TSV failure

Delay through a TSV is closely tied to its capacitance and driver output resistance [23]. Recent advances have reduced the mean TSV capacitance to around 40fF [27], which gives a delay of less than 300ps using an  $8k\Omega$  driver (minimum sized standard buffer in 65nm CMOS). Hence, a suitably sized buffer can be chosen to always obey the above TSV delay limits for a given CMOS process.
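As a rough plausibility check of the quoted figure, the TSV and its driver can be treated as a single-pole RC stage. This lumped first-order model is an assumption made here for illustration only; it is not the delay model used in the cited works, and it ignores wire resistance and receiver loading.

```python
import math


def rc_delay_ps(r_ohm, c_farad):
    """50% step-response delay of a single-pole RC stage, ln(2)*R*C, in ps.

    Assumption: lumped first-order model of driver resistance and TSV capacitance.
    """
    return math.log(2) * r_ohm * c_farad * 1e12


# 8 kOhm driver and 40 fF TSV capacitance, as quoted in the text.
delay_ps = rc_delay_ps(8e3, 40e-15)
print(f"estimated TSV delay: {delay_ps:.0f} ps")
```

Under this assumption the estimate stays below the 300ps bound stated above.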

In the worst case, defects could induce catastrophic TSV failure [3]. In such cases, ring oscillator based detection techniques [23], [26] can be employed to judge defect severity. Following this, one of several sparse TSV allocation techniques [28] can be utilized to cover catastrophic TSV failure.

#### 2.7.4 Lock time

The MISC lock time is unaffected as long as the TSV delay limits mentioned in the previous subsection are obeyed. As explained earlier, high delay through TSVf and TSVr forces MISC to lock with a delay of  $4 \times T$  around the feedback path in state 2 (Fig. 2.2). This necessitates a  $\div 6$  frequency divider (Fig. 2.20), allowing for a delay room of  $6 \times T - 4 \times T = 2 \times T$  across  $r \div 2$ , PDS, TSVp, MUX, X and ACMF. However, for small delay through TSVf and TSVr, the MISC will lock with a delay of only  $2 \times T$  around the feedback path. In this case the frequency divider can be reduced from  $\div 6$  to  $\div 4$ , while maintaining the same delay room of  $4 \times T - 2 \times T = 2 \times T$  across  $r \div 2$ , PDS, TSVp, MUX, X and ACMF as in the high delay case. This can be achieved by appropriately selecting a  $\div 6$  or  $\div 4$  divider in state 2 based on the absolute delay in the feedback path. Similar optimization in the load equalization phase (states 1 and 3) reduces the mean lock time to 180 cycles. Alternatively, successive approximation locking techniques [29], [30] can be employed to further reduce lock time. However, they come with a limited input frequency range and increased circuit complexity.

#### 2.8 Conclusion

Traditional solutions to clock skew compensation in 3D ICs rely on perfect matching between their constituent delay lines and/or matched through silicon via delays in the forward and reverse paths. These assumptions are rather elusive given the increasing intra-die process variation in deep sub-micron technologies coupled with TSV defects and worse thermal gradients in a 3D IC compared to its 2D counterpart. MISC is the first to completely eliminate any skew contribution arising from control code dependent mismatch between delay lines or unequal TSV delays. A complete theoretical analysis of various skew sources was presented, agreeing well with analytical results. The proposed design is extensively tested between 250MHz and 1GHz, in the presence of delay line/buffer mismatch, clock jitter, TSV delay and phase detector dead zone, resulting in a peak residual skew of 32ps. Moreover, MISC is designed using only standard cells in a 65nm CMOS process, occupying an area of  $0.0081mm^2$  while consuming 2.1mW at 1GHz.

Chapter 3

Supply Compensated Digitally Controlled Delay Lines for 3D-IC Clock Synchronization Topologies

Tejinder Singh Sandhu and Kamal El-Sankary

Publication Pending:

#### Abstract

This work presents an on-chip auto-tuning algorithm to reduce the supply voltage sensitivity of digitally controlled delay lines constructed using identical digital buffers. The proposed algorithm tunes a compensator circuit embedded within each buffer to counterbalance the supply sensitivity of the overall delay line regardless of process or mismatch variations. A mismatch insensitive 3D-IC clock synchronization architecture (MISC) employing this delay line is fabricated in 65nm CMOS and demonstrates robust performance against the buffer supply noise at operating frequencies of 250MHz to 1GHz. The rms jitter of this supply compensated MISC design measures 3ps in the presence of a 25mV 1MHz supply noise at 1GHz operation, compared to 112.3ps for the conventional design. Whereas, the maximum residual skew between the Die-1 and Die-2 clocks measures under 30ps across the entire MISC frequency range. The complete design occupies an active area of  $0.016mm^2$  and dissipates 4.8mW at 1GHz from a 1V supply.

#### 3.1 Introduction

Digitally controlled delay lines are a key component in 3D-IC clock synchronization architectures [7], [18] which are essentially DLLs aimed at aligning clock signals across multiple dies. Such a 3D setup allows higher system integration by vertically stacking diverse digital cores on multiple dies. Yet, the propagation delay through these delay lines is dependent upon its supply voltage, which is susceptible to switching noise generated across the entire 3D tier. Moreover, nonuniform thermal and voltage drop characteristics in 3D-ICs can exacerbate supply noise when compared to their 2D counterparts [5], [6]. Hence, the jitter performance in the aforesaid 3D-IC clock architectures is often critically dependent upon the supply voltage sensitivity of their constituent delay lines.

Supply noise compensation can be divided into two broad categories. The first utilizes voltage regulators [10], [11] at the cost of reduced voltage headroom, which can be particularly disadvantageous in low voltage CMOS technologies. The second involves on-chip supply noise detection to reduce the voltage sensitivity of the main delay cells. However, the literature for this second category deals almost exclusively with GRO [12] or PLL [31], [32], [13] based oscillator techniques, which cannot be directly ported to single ended digitally controlled delay lines. Alternatively, clock buffer supply compensation in [33] relies on a fixed point calibration which cannot track any post fabrication process or mismatch variations.

Hence, this thesis presents an auto-tuning algorithm to reduce the supply voltage sensitivity of digitally controlled delay lines regardless of process or mismatch variations. The proposed algorithm tunes a compensator circuit with opposite voltage sensitivity to that of a standard clock buffer to deliver a supply insensitive buffer, which forms the building block of the compensated delay line.

The proposed supply compensated delay line is further integrated within the MISC 3D-IC clock synchronization architecture [7], which is chosen for two main reasons. First, the MISC offers two synchronized clock paths between Die-1 and Die-2, which allows the aforesaid supply auto-tuning algorithm to run in background (blocking the forward path) while it maintains a synchronized clock between the two dies via the reverse path. Second, MISC allows a fair comparison between the proposed supply compensated delay line in the forward path vs the conventional (uncompensated) delay line in the reverse path, all while eliminating any clock skew resulting from mismatch between these delay lines.

## 3.2 Proposed Supply Compensated Delay Line

Power supply compensation in the constituent clock buffers of digitally controlled delay lines is presented first, followed by the proposed supply auto-tuning algorithm.

## 3.2.1 Clock Buffer Compensation

The uncompensated clock buffer (BUFF) has a negative supply sensitivity in Fig. 3.1(a), where supply sensitivity is the ratio of the percentage change in delay to the percentage change in supply voltage. Hence, an ideal compensator circuit with equal and inverse supply sensitivity can be attached to BUFF to deliver a supply insensitive compensated buffer (CBUFF). The CBUFF circuit in Fig. 3.1(b) utilizes MOS capacitors  $M_{CA}$  and  $M_{CB}$  in series with tunable triode transistors  $M_{RA}$  and  $M_{RB}$  to present a proportional supply  $(V_{DD})$  dependent load at the output of inverters A and B, respectively. The supply sensitivity of this compensator circuit is inversely related to that of the uncompensated inverters A and B. For example, an increase in  $V_{DD}$  would tend to reduce the delay through A and B. Now, the compensator node  $V_B$  follows  $V_{DD}$  in phase to increment loading at the outputs of A and B by reducing the resistance offered by  $M_{RA}$  and  $M_{RB}$ , respectively, thus negating the initial delay reduction across A and B. The compensator circuit in Fig. 3.1(b) is adapted from [33].

Figure 3.1: (a) Ideal compensation of supply voltage dependent clock buffer delay, where  $V_2 > V_1$  (b) supply compensated clock buffer (CBUFF).

Figure 3.2: (a) Compensated buffer (CBUFF) delay vs  $V_B$  at  $V_{DD} = 1V$  (b) CBUFF delay as a function of  $V_{DD}$  at specific  $V_B$  bias voltages in (a).

The delay across an uncompensated buffer (BUFF) falls almost linearly with increase in  $V_{DD}$ . Yet, the compensated buffer (CBUFF) exhibits a non-linear  $V_B$  to delay relation in Fig. 3.2(a). Hence, the triode transistors  $M_{RA}$  and  $M_{RB}$  must be biased at the proper gate voltage  $V_B = V_{DD} - V_X$  to offer an equal and opposite supply sensitivity in the compensator circuit vs the uncompensated buffer (BUFF). Fig. 3.2(b) shows the CBUFF delay vs  $V_{DD}$  at three  $V_B$  bias levels A, B and C corresponding to  $V_X$  values of 0.25V, 0.44V and 0.55V, respectively. Observe that CBUFF is under-compensated for the  $V_B$  bias point 'A' because the magnitude of the supply sensitivity in BUFF exceeds that of the compensator circuit at 'A'. Alternatively, CBUFF is overcompensated for 'C', while optimal supply noise suppression is achieved at the bias point 'B' ( $V_X = 0.44V$ ).

Figure 3.3: Block diagram of the proposed auto-tuned supply compensated delay line.

The optimum bias voltage for  $V_B$  in Fig. 3.2 will inevitably change across post fabrication process or mismatch variations. Hence, an on-chip algorithm proposed in the next sub-section tunes  $I_S$  to find this optimal value of  $V_B$  across process or mismatch variations.

#### 3.2.2 Supply Auto-Tuning Algorithm

Supply sensitivity of the compensated buffer (CBUFF) in Fig. 3.2(b) switches polarity between the  $V_B$  bias points 'A' and 'C'. This fact is exploited to iteratively find the optimum  $V_B$  bias level by calibrating  $I_S$  in the proposed auto-tuned supply compensated delay line (DLF) in Fig. 3.3. Here, identical binary delay lines  $DLF_a$  and  $DLF_f$  are each constructed using 18 fine and 64 coarse CBUFF delay elements with a common  $V_B$  terminal. The delay through  $DLF_a$  and  $DLF_f$  can be adjusted via their respective coarse (c) and fine (n) input terminals (see Section 3.4). Matched transistor pairs  $\{M_a, M_f\}$  and  $\{M_{pa}, M_{pf}\}$  are used to isolate the power supplies for  $DLF_a$  and  $DLF_f$ , forcing  $V_{Xa} = V_{Xf}$ . Also,  $M_{va}$  operates in the triode region to bias the  $DLF_a$  supply  $V_{DDa}$  at  $\approx 0.92V$  ( $G_a = 0$ ) or 1V ( $G_a = 1$ ), while the  $DLF_f$  supply remains fixed at  $V_{DDf} = V_{DD} = 1V$ . Tri-state buffers 'B' allow the supply controller (SCNT) to take over the input and delay control in  $DLF_f$  (via  $c_f/n_f$ ) and  $DLF_a$  (via  $c_a/n_a$ ) by asserting Cal = 1 during calibration.

Figure 3.4: Flow chart of the proposed supply auto-tuning algorithm.

The flow chart of the proposed supply auto-tuning algorithm is shown in Fig. 3.4. Supply calibration involves forcing  $V_{DDa} = 0.92V$ ,  $V_{DDf} = 1V$  and tuning  $I_S$  via D[4:0] to have the same delay across  $DLF_f$  and  $DLF_a$  at their maximum coarse/fine control codes. Yet, post fabrication mismatch between  $DLF_a$  and  $DLF_f$  can lead to non-optimal compensation. Hence, calibration begins by eliminating any mismatch between  $DLF_a$  and  $DLF_f$  at  $V_{DDa} = V_{DDf} = 1V(G_a = 1)$  and D[4:0] = 00000. This is achieved by tuning  $c_f/n_f$  and  $c_a/n_a$  until the phase detector (PD) encounters the same rising edge  $(E_G)$  delay across the outputs of  $DLF_a$  ( $\phi ad$ ) and  $DLF_f$  ( $\phi fd$ ). For example, if  $\phi ad$  lags  $\phi fd$ , then SCNT will subtract delay from  $DLF_a$  by decreasing  $c_a/n_a$  from their maximum values until  $\phi ad$  aligns with  $\phi fd$ .

For supply compensation,  $DLF_a$  and  $DLF_f$  are powered at  $V_{DDa} = 0.92V$  and  $V_{DDf} = 1V$ , respectively, and SCNT uses a binary search algorithm to resolve all 5 bits in  $I_S$ , beginning with D[4] = 1, i.e. '10000'. During each iteration PD detects the polarity of the supply voltage sensitivity in  $DLF_a$  and  $DLF_f$  by measuring the phase relation between  $\phi ad$  and  $\phi fd$ . For example, since  $V_{DDa} < V_{DDf}$ , if  $\phi ad$  lags behind  $\phi fd$  ( $\phi ad > \phi fd$ ) then the supply sensitivity in  $DLF_a$  and  $DLF_f$  is negative, i.e. the  $V_{Ba}$  and  $V_{Bf}$  bias voltages lie above the optimum bias point 'B' in Fig. 3.2, so  $I_S$  must increase. Alternatively,  $\phi fd > \phi ad$  indicates overcompensation (positive supply sensitivity) in  $DLF_a$  and  $DLF_f$ , i.e.  $V_{Ba}$  and  $V_{Bf}$  lie below the optimum bias point 'B', hence  $I_S$  must decrease. Later, SCNT enforces  $V_{DDa} = V_{DDf} = 1V$  ( $G_a = 1$ ) after all  $I_S$  bits are resolved. Thus, at equilibrium the voltages  $V_{Ba}$  and  $V_{Bf}$  are biased at the optimum point 'B' (Fig. 3.2) to achieve a near zero supply voltage sensitivity in  $DLF_a$  and  $DLF_f$ . Calibration ends by asserting Cal = 0, which connects  $DLF_a$  and  $DLF_f$  in series and transfers their delay control to external inputs cF and nF to form a unified supply compensated delay line DLF.

Figure 3.5: Block diagram of the supply compensated MISC 3D-IC clock synchronization architecture, modified from [7].
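The bit-by-bit resolution of the  $I_S$  code described above can be sketched in software. This is a behavioural illustration only: `ad_lags_fd` is a hypothetical stand-in for the phase detector decision (true when  $\phi ad$  lags  $\phi fd$ , i.e. under-compensation), and the threshold used in the example is an arbitrary assumption rather than a measured value.

```python
def resolve_is_code(ad_lags_fd, bits=5):
    """Binary search for the I_S code, MSB first.

    ad_lags_fd(code) is a hypothetical phase detector model: it returns True
    when phi_ad lags phi_fd at the trial code (negative supply sensitivity,
    so I_S must increase) and False otherwise (overcompensation).
    """
    code = 0
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)   # tentatively set the current bit
        if ad_lags_fd(trial):       # still under-compensated: keep the bit
            code = trial
        # otherwise the bit is reset and the search moves to the next bit
    return code


# Assumed example: sensitivity changes sign between codes 13 and 14.
code = resolve_is_code(lambda c: c < 13.4)
print(format(code, "05b"))
```

Each iteration needs only a one-bit lead/lag decision, which is why a single flip-flop phase detector suffices, and the 5-bit code is resolved in five comparisons.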

# 3.3 Supply Auto-tuning integration in the MISC 3D-IC clock synchronization architecture

The supply compensated DLF delay line must be disconnected from its parent circuit during calibration in Fig. 3.3. The MISC 3D-IC clock architecture [7] in Fig. 3.5 offers two fully synchronized clock paths between Die-1 and Die-2. Hence, while the forward path ( $\phi s \rightarrow \text{TSVf} \rightarrow \phi fd$ ) is busy during DLF calibration, the Die-1 source clock ( $\phi s$ ) can still appear in-phase with the load clock ( $\phi rd$ ) in Die-2 via the reverse path ( $\phi s \rightarrow \text{TSVr} \rightarrow \phi rd$ ) through the conventional (uncompensated) DLR delay line. Here, TSVf, TSVr and TSVp are through silicon vias between the two dies. Interestingly, MISC can synchronize load clocks ( $\phi fd$ ,  $\phi rd$ ) in Die-2 in-phase with the source clock ( $\phi s$ ) in Die-1 regardless of delay mismatch between DLF and DLR or defect induced delay disparity between TSVf and TSVr [34], [35]. Therefore, MISC is an ideal test application for evaluating the proposed supply compensated DLF against the conventional DLR delay line (see Section 3.4). Additionally, the MISC utilizes five tri-state buffers (B1-B5), source (PDS) and load (PDL) phase detectors, frequency dividers ( $s \div 2$ ,  $r \div 2$ ) and a controller (CNT) to achieve synchronization. A brief summary of the MISC operation is presented.

Figure 3.6: Flow chart of the supply compensated MISC architecture, modified from [7].

A complete system flow chart of the supply auto-tuning algorithm integrated within the MISC clock synchronization architecture is shown in Fig. 3.6. The auto-tuning algorithm finishes supply compensation in DLF by asserting Cal = 0, which signals the MISC controller (CNT) to begin clock synchronization. The MISC operation is divided into states which either perform load equalization (align  $\phi fd$  with  $\phi rd$  when dir = 1) or source equalization (align  $\phi sdiv$  with  $\phi rdiv$  when dir = 0). The goal is to align  $\phi s$  with  $\phi fd$  and  $\phi rd$  regardless of delay mismatch between DLF and DLR or TSVf and TSVr. This is achieved by jumping back and forth between the source and load equalization states as described further.

#### 3.3.1 State 1: load equalization (dir=1, snc=1)

The PDL detects any mismatch between the forward  $(\phi f d)$  and reverse  $(\phi r d)$  paths resulting from delay disparity between TSVf and TSVr, and instructs CNT to add delay to the path that leads. Hence, if  $\phi f d$  leads  $\phi r d$  then delay is incremented in DLF until  $\phi f d$  aligns with  $\phi r d$ , and vice versa. At the end of state 1,  $\phi f d$  and  $\phi r d$  lie at a mean delay of ' $\delta$ ' from the source  $\phi s$  and are aligned to within the *lsb* delay resolution in DLF and DLR, i.e.

$$\phi f d_{s1} = (n-1)T + \delta \pm \frac{1}{2} \cdot lsb \tag{3.1}$$

$$\phi r d_{s1} = (n-1)T + \delta \mp \frac{1}{2} \cdot lsb \tag{3.2}$$

Where, 'T' is the input clock period and 'n' is an integer.

## 3.3.2 State 2: source equalization (dir=0, snc=1)

The source and feedback paths are enabled and the PDS instructs CNT to increment equal delays in DLF and DLR until  $\phi s div$  is aligned with  $\phi r div$  to within  $2 \cdot lsb$  delay resolution:

$$\begin{aligned} \phi r div - \phi s div &= 2nT \pm 2 \cdot lsb \\ &= \phi f d_{s1} + \phi r d_{s1} + D_f + D_r \end{aligned} \tag{3.3}$$

Where,  $D_f$  and  $D_r$  are the incremental delays added in DLF and DLR during state 2, respectively, and the 2nT factor results from the use of frequency dividers  $s \div 2$  and  $r \div 2$ . Mismatch between DLF and DLR will result in  $D_f \neq D_r$ . Hence, a mean delay ' $d_2$ ' added in state 2 with 'm' percent mismatch between DLF and DLR yields  $D_f = d_2 \cdot (1 \pm m/200)$  and  $D_r = d_2 \cdot (1 \mp m/200)$ , respectively. Using these results with (3.1), (3.2) and (3.3) gives:

$$d_2 = T - \delta \pm lsb \tag{3.4}$$

Now, the forward and reverse delays at the end of state 2 are given by  $\phi f d_{s2} = \phi f d_{s1} + D_f$ and  $\phi r d_{s2} = \phi r d_{s1} + D_r$ . Using these results with (3.1), (3.2) and (3.4) yields:

$$\phi f d_{s2} = nT \pm lsb + \alpha \tag{3.5}$$

$$\phi r d_{s2} = nT \pm lsb - \alpha \tag{3.6}$$

Where the additional skew  $\alpha = \pm lsb/2 \pm d_2 \cdot m/200$  results from the control code dependent mismatch between DLF and DLR. From Fig. 3.6, MISC jumps back and forth between the source and load equalization states, which further reduces the mismatch induced skew ( $\alpha$ ) between  $\phi f d$  and  $\phi s$  by 'm/200' at every iterative step. So, the  $k^{th}$  iteration of state 2 yields:

$$\phi f d_{s2_k} = nT \pm \left(\frac{m}{200}\right)^{2(k-1)} \cdot (\alpha \pm lsb) \tag{3.7}$$

$$\phi r d_{s2_k} = nT \mp \left(\frac{m}{200}\right)^{2(k-1)} \cdot (\alpha \pm lsb) \pm 2 \cdot lsb \tag{3.8}$$

To align  $\phi s$  in-phase with  $\phi f d$  and  $\phi r d$  the skew factor  $\left(\frac{m}{200}\right)^{2(k-1)} \cdot (\alpha \pm lsb)$  in (3.7) and (3.8) must be smaller than  $2 \cdot lsb$ . Using (3.4), the aforesaid condition is simplified to:

$$k > \frac{1}{2} + \frac{1}{2} \cdot \frac{\log\left(\frac{2 \cdot lsb}{T - \delta}\right)}{\log\left(\frac{m}{200}\right)} \tag{3.9}$$

Assuming a worst case mismatch of 50% between DLF and DLR (m = 50) and the lowest MISC operating frequency of 250MHz (T = 4000ps) with  $\delta = 0$ , (3.9) yields k > 2.7 for lsb = 5.4ps in DLF and DLR. Hence, k = 3 is chosen for this work where the MISC jumps back and forth three times to align  $\phi s$  in-phase with  $\phi fd$  and  $\phi rd$  to compensate for delay mismatch between DLF and DLR or between TSVf and TSVr.
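The right-hand side of (3.9) is easy to evaluate for other operating points. The sketch below is illustrative (the function name and argument names are not from the design); it returns both the bound and the smallest integer k that strictly exceeds it.

```python
import math


def min_iterations(m_percent, period_ps, delta_ps, lsb_ps):
    """Evaluate the right-hand side of (3.9) and the smallest integer k above it."""
    ratio = math.log(2 * lsb_ps / (period_ps - delta_ps)) / math.log(m_percent / 200)
    bound = 0.5 + 0.5 * ratio
    k = math.floor(bound) + 1   # smallest integer strictly greater than the bound
    return bound, k


# Worst case considered in the text: m = 50, T = 4000 ps, delta = 0, lsb = 5.4 ps.
bound, k = min_iterations(50, 4000, 0, 5.4)
print(f"bound = {bound:.2f}, k = {k}")
```

For the worst case above the bound falls between 2 and 3, which is consistent with the choice of k = 3.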

#### 3.4 Circuit Implementation

The phase detectors PD, PDS and PDL in Figs. 3.3 and 3.5 are all based on the same 'D' flip-flop implementation in [7]. Circuit details of the tunable delay lines DLF and DLR are discussed next, followed by timing overview of the complete supply compensated MISC architecture.

#### 3.4.1 Delay Lines DLF and DLR

The MISC DLF delay line is constructed from two supply compensated binary delay lines  $DLF_a$  and  $DLF_f$  in Fig. 3.3. Each of  $DLF_a$  and  $DLF_f$  further utilizes 64 coarse CBUFF (Fig. 3.1(b)) and 18 fine delay elements composed of a simple MOS capacitor paired with a switch in Fig. 3.7. The digital code 'c' controls the number of CBUFF delay elements in series between IN and OUT via the supply compensated multiplexer S-MUX (similar in principle to CBUFF). For example, 'c = 000000' gives a delay of  $6 \cdot T_{MUX}$ , while 'c = 100001' increases this IN to OUT propagation delay to  $6 \cdot T_{MUX} + 33 \cdot T_{CBUFF}$ , where  $T_{MUX}$  and  $T_{CBUFF}$  are the propagation delays through S-MUX and CBUFF, respectively. Similarly, the thermometer code 'n' determines the number of 'ON' or 'OFF' fine delay elements. Increasing 'n' increments loading at the outputs of S-MUX, which further increases the IN to OUT propagation delay, and vice versa. Together c[5:0] and n[17:0] provide  $64 \times 18 = 1152$  possible delay settings in each of  $DLF_a$  and  $DLF_f$ . MISC synchronization (Cal = 0) connects  $DLF_a$  in series with  $DLF_f$ , which increases these delay settings to  $128 \times 36 = 4608$  in the unified DLF.

Figure 3.7: Supply compensated binary delay line  $DLF_a$  consisting of coarse and fine delay elements,  $DLF_f$  is identical to  $DLF_a$ .

Figure 3.8: The conventional (uncompensated) binary delay line DLR.
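The coarse code to delay mapping described above can be written as a small model. This is a sketch under stated assumptions: the function names are illustrative, and the  $T_{MUX}$  value used in the example is hypothetical since the text does not quote it; 45ps for  $T_{CBUFF}$  is simply a mid-range value for illustration.

```python
def coarse_delay_ps(c, t_mux_ps, t_cbuff_ps):
    """IN-to-OUT delay for a 6-bit binary coarse code c.

    Six S-MUX stages are always in the path; c CBUFF elements are added in series.
    """
    if not 0 <= c < 64:
        raise ValueError("coarse code must fit in c[5:0]")
    return 6 * t_mux_ps + c * t_cbuff_ps


def delay_settings(coarse_elements, fine_elements):
    """Number of coarse/fine control code combinations, as counted in the text."""
    return coarse_elements * fine_elements


T_MUX_PS = 30     # hypothetical value, not given in the text
T_CBUFF_PS = 45   # illustrative mid-range value

print(coarse_delay_ps(0b000000, T_MUX_PS, T_CBUFF_PS))  # six S-MUX delays only
print(coarse_delay_ps(0b100001, T_MUX_PS, T_CBUFF_PS))  # six S-MUX + 33 CBUFF delays
print(delay_settings(64, 18), delay_settings(128, 36))
```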

The binary delay line DLR in Fig. 3.8 utilizes 128 uncompensated coarse (DEL) and 36 fine delay elements. Where, the binary coarse (cR[6:0]) and fine thermometer (nR[35:0]) digital control codes provide  $128 \times 36 = 4608$  possible delay settings in DLR similar to the unified DLF discussed earlier. Additionally, DLR forwards the clock period detector [18] output E[6:0] to the MISC controller (CNT) in Fig. 3.5 to narrow down the maximum delay range needed in DLF and DLR at a given operating frequency.

Simulations in 65nm CMOS give a delay of 40ps for the uncompensated DEL coarse delay element in DLR, whereas the delay through the supply compensated CBUFF coarse element in DLF can range between 40ps and 50ps across the full  $I_S$  control range in Fig. 3.3. Both DLF and DLR have the same fine delay resolution of lsb = 5.6ps. This gives a maximum delay of about  $128 \times 40ps = 5.12ns$  in DLR and a delay range of  $5.12ns \leftrightarrow 6.4ns$  in DLF. Observe that the maximum delay mismatch of 1.28ns between DLF and DLR is well within the 50% tolerance range of the MISC architecture as discussed in Section 3.3.

Figure 3.9: Timing diagram of the supply compensated MISC architecture.

### 3.4.2 Overview of the supply compensated MISC architecture

The timing diagram of the supply compensated MISC architecture in Fig. 3.9 consists of two broad sections, supply auto-tuning in the DLF delay line followed by the MISC synchronization cycle. Upon reset (rst), the supply controller (SCNT) in Fig. 3.3 asserts Cal = 1 indicating the start of the supply auto-tuning in DLF. This bifurcates DLF into  $DLF_f$  and  $DLF_a$  with  $V_{DDa} = V_{DDf} = 1V$  ( $G_a = 1$ ) and maximizes their control codes to  $c_f = c_a = 63$  and  $n_f = n_a = 18$ . Additionally, the  $I_S$  control code is set to D = 00000, thus inducing negative supply sensitivity in  $DLF_f$  and  $DLF_a$ . Next, supply compensation in the MISC topology is best described by following the timing instances 'A' to 'J' in Fig. 3.9.

- A Initially,  $\phi ad$  is out of phase with  $\phi fd$  due to the post-fabrication mismatch between  $DLF_f$  and  $DLF_a$ . In this case  $\phi ad$  lags behind  $\phi fd$ .
- B Hence, SCNT in Fig. 3.3 finishes mismatch compensation by subtracting delay from  $DLF_a$  ( $n_a$  goes from  $18 \rightarrow 13$ ) until  $\phi ad$  aligns with  $\phi fd$ .

- C Supply compensation begins by making  $G_a = 0$ , which forces  $V_{DDa}$  and  $V_{Ba}$  to drop by 80mV against  $V_{DDf}$  and  $V_{Bf}$ , respectively. Hence,  $\phi ad$  lags behind  $\phi fd$  because of the negative supply sensitivity in  $DLF_a$  and  $DLF_f$  at D = 00000 (i.e. under-compensation).
- D Next, SCNT sets the MSB  $I_S$  control bit to '1', i.e. D = 10000. This however induces positive supply sensitivity in  $DLF_a$  and  $DLF_f$  (i.e. overcompensation) because  $\phi f d$  lags  $\phi a d$  even though  $V_{DDf} > V_{DDa}$ . Hence, SCNT resets D[4] to '0' and continues this process to resolve the remaining  $I_S$  control bits.
- E Optimum compensation is achieved when the final  $I_S$  code D = 01101 gives the same delay through  $DLF_f$  and  $DLF_a$  (i.e.  $\phi fd$  aligns with  $\phi ad$ ) even though  $V_{DDf} > V_{DDa}$ . Finally, SCNT signals the end of the supply auto-tuning and the beginning of the MISC synchronization cycle by enforcing Cal = 0 and  $G_a = 1$ , which also unifies  $DLF_f$  and  $DLF_a$  into DLF.
- F Initially, the MISC reverse path clock  $(\phi rd)$  is out of phase with the forward path clock  $(\phi fd)$  due to delay mismatch between TSVf and TSVr in Fig. 3.5. In this case  $\phi rd$  leads  $\phi fd$ .
- G Therefore, the MISC controller (CNT) begins coarse tuning by incrementing delay in DLR (cR goes from  $0 \rightarrow 3$ ) until  $\phi f d$  aligns with  $\phi r d$ . Thus, state 1 ends by eliminating delay mismatch between TSVf and TSVr.
- H In state 2, PDS signals CNT to equally increment delays in DLF and DLR until  $\phi sdiv$  is aligned with  $\phi rdiv$ . The resulting control code dependent mismatch between DLF and DLR appears as a differential skew ' $\pm \alpha$ ' distributed across  $\phi fd$  and  $\phi rd$  with  $\phi s$  as mean.
- I MISC jumps back and forth twice between the source and load equalization states to reduce ' $\alpha$ ' to less than the sum of coarse delay resolutions in DLF ( $\approx 45ps$ ) and DLR (40ps), ending coarse tuning. At this point  $\phi fd$  and  $\phi rd$  are aligned to  $\phi s$  to within 40ps + 45ps = 85ps, regardless of code dependent mismatch between DLF and DLR or delay mismatch in TSVf and TSVr.
- J MISC activates fine tuning for the third iteration of the source  $(2_3)$  and load  $(3_3)$  equalization states. Finally, both the supply compensated forward path  $(\phi f d)$  and the uncompensated reverse path  $(\phi r d)$  in Die-2 are aligned to the source clock ( $\phi s$ ) in Die-1 to within the sum of the fine delay resolutions in DLF and DLR, i.e. 5.4ps + 5.4ps = 10.8ps.

Figure 3.10: Die photo of the supply compensated MISC architecture.

The original MISC topology [7] is enhanced in this work to include a binary search algorithm which reduces its lock time to 176 cycles at the maximum operating frequency of 1GHz. The supply controller (SCNT) in Fig. 3.3 has a lock time of 16 cycles and operates at a maximum frequency of  $1GHz \div 1000 = 1MHz$ . The supply auto-tuning and the MISC synchronization cycles form a loop, where the DLR control codes 'nR' and 'cR' remain latched during the supply auto-tuning cycle. Hence, after the first synchronization on power-up, any subsequent DLF supply auto-tuning iterations can occur in the background while the MISC source clock ( $\phi s$ ) in Die-1 remains synchronized to the load clock ( $\phi rd$ ) in Die-2 via the reverse path through DLR in Figs. 3.5 and 3.9.

## 3.5 Results

The die photo of the supply compensated MISC architecture fabricated in 65nm CMOS is shown in Fig. 3.10, with an active area of  $0.016mm^2$ . The complete design is fabricated on a single die, where the Die-1 and Die-2 interconnects TSVf and TSVr in Fig. 3.5 are implemented as tunable delay lines with a 100ps to 1.2ns delay range. Recent high performance through silicon vias with  $\approx 100fF$  capacitance exhibit data transfer delays on the order of 150ps [36]. This delay is expected to fall further by employing air-gap insulation TSVs with capacitance as low as 25fF [37]. As such, DLF and DLR accumulate significantly more delay than TSVf, TSVr or the tri-state buffers B1-B5 considering the peak synchronized MISC operation at 1GHz. Hence, the first subsection focuses primarily on the supply noise rejection performance of DLF and DLR (set at their highest control codes), while the delay through TSVf and TSVr is set to its minimum of 100ps. The next subsection quantifies the MISC performance in eliminating control code dependent mismatch between the supply compensated DLF and conventional DLR or delay mismatch between TSVf and TSVr.

Figure 3.11: Simulated DLF supply sensitivity against the code in  $I_S$  for (a) different process corners at 27°C (b) temperature variation in the TT corner. The marker on each curve indicates the  $I_S$  code resolved by the proposed supply auto-tuning algorithm.

#### 3.5.1 Supply noise compensation

Figs. 3.11(a) and (b) show the simulated DLF supply sensitivity (at maximum cF, nF) against the code in  $I_S$  for different process corners and temperatures, respectively. The marker on each curve indicates the  $I_S$  code resolved by the auto-tuning algorithm for the respective process corner and temperature. Note that in each case the proposed algorithm resulted in near zero supply sensitivity in DLF.

The MISC topology in Fig. 3.5 offers two synchronized clock paths between Die-1 and Die-2. For comparative analysis, the forward path ( $\phi f d$ ) is compensated for supply variations in DLF, whereas the reverse path ( $\phi r d$ ) represents conventional supply dependent delay through DLR. To evaluate noise performance, a 25mV noise tone is added to the supply voltages ( $V_{DD}$ ) of DLF and DLR at modulation frequencies between 1kHz and 100MHz, where both DLF and DLR share the same  $V_{DD}$ . Figs. 3.12 and 3.13 show the measured jitter histograms of the conventional reverse path  $(\phi rd)$  and the proposed supply compensated forward path  $(\phi fd)$ , respectively, against a 1MHz noise tone. Observe that the measured rms jitter is improved from 112.3ps to just 3.0ps for MISC operation at 1GHz.

Figure 3.12: Measured oscilloscope jitter histogram of the conventional (uncompensated) reverse path ( $\phi rd$ ) with 1MHz supply noise at 1GHz MISC operation.

The DLF  $I_S$  code D[4:0] in Fig. 3.3 can be tuned manually to evaluate the performance of the on-chip supply auto-tuning algorithm. The measured rms jitter of the supply compensated forward path ( $\phi f d$ ) against the  $I_S$  code at 1GHz MISC operation is shown in Fig. 3.14(a). The auto-tuning algorithm converges to '10111', which is just one *lsb* away from the optimum '11000' found using manual calibration at 1kHz and 1MHz noise tones. However, quiet supply operation reveals that the jitter performance worsens with increasing  $I_S$  codes. This mainly results from the inclusion of the compensator circuit in Fig. 3.1(b), which introduces additional jitter in CBUFF. Fig. 3.14(b) shows the measured static supply sensitivity in DLF against the code in  $I_S$  D[4:0]. Observe that the auto-tuning code '10111' resolved by the on-chip algorithm results in near zero supply sensitivity in DLF.

Figs. 3.15(a) and (b) show the rms jitter performance of the supply compensated forward path ( $\phi f d$ ) and the conventional reverse path ( $\phi r d$ ) at different supply noise modulation and MISC operating frequencies, respectively. The conventional ( $\phi r d$ ) rms jitter falls at noise modulation frequencies above 1MHz mainly due to the effect of parasitic decoupling capacitors between the PCB power and ground planes. Notice that the proposed compensated path ( $\phi f d$ ) consistently outperforms the conventional path ( $\phi r d$ ) in the presence of supply noise.

Figure 3.13: Measured oscilloscope jitter histogram of the supply compensated forward path ( $\phi f d$ ) with 1MHz supply noise at 1GHz MISC operation.

Figure 3.14: (a) Measured rms jitter of the supply compensated forward path ( $\phi f d$ ) at 1GHz MISC operation, (b) measured static supply sensitivity in DLF.

## 3.5.2 MISC clock synchronization

The supply compensation  $I_S$  code (D[4:0]) also changes the DLF propagation delay because the delay through its constituent CBUFF elements in Fig. 3.2(b) varies across  $I_S$  induced  $V_B$  bias levels 'A', 'B' and 'C'. The resulting measured delay mismatch between DLF and DLR (set at their maximum control codes) vs the  $I_S$  code D[4:0]



Figure 3.15: Measured rms jitter of the supply compensated forward  $(\phi f d)$  and the conventional reverse  $(\phi r d)$  paths vs (a) supply noise frequency at 1GHz MISC operation (b) MISC frequency with 1MHz supply noise.



Figure 3.16: Measured delay mismatch between DLF and DLR vs the code in  $I_S$  D[4:0].

is shown in Fig. 3.16, where the optimum code '11000' results in a 665ps delay mismatch. Uniquely, the MISC topology can eliminate this mismatch between its delay lines DLF and DLR, which otherwise causes additional clock skew in earlier 3D-IC clock synchronization architectures [18], [38].

Figs. 3.17(a) and (b) show the unsynchronized Die-1 source clock ($\phi s$) against the Die-2 forward ($\phi fd$) and reverse ($\phi rd$) clock waveforms at 1GHz, respectively, where the delay mismatch between TSVf and TSVr is set to about 300ps. Subsequent MISC synchronization (supply auto-tuning followed by skew compensation) aligns $\phi s$ with $\phi fd$ and $\phi rd$ at residual skews of only 5ps and 10.5ps in Figs. 3.18(a) and (b), respectively. Figs. 3.19(a) and (b) show the range of measured residual forward ($\phi fd$) and reverse ($\phi rd$) path skews against the source clock ($\phi s$), respectively, for a



Figure 3.17: Measured oscilloscope clock waveforms before MISC synchronization at 1GHz (a)  $\phi s$  in Die-1 vs  $\phi f d$  in Die-2 (b)  $\phi s$  in Die-1 vs  $\phi r d$  in Die-2.



Figure 3.18: Measured oscilloscope clock waveforms after MISC synchronization at 1GHz (a)  $\phi s$  in Die-1 vs  $\phi fd$  in Die-2 (b)  $\phi s$  in Die-1 vs  $\phi rd$  in Die-2.

0ns to 1.1ns delay mismatch between TSVf and TSVr at different MISC operating frequencies. Observe that the maximum forward  $(\phi f d)$  and reverse  $(\phi r d)$  path skews are limited to under 30ps across the entire MISC frequency range.

Supply compensation slightly worsens the rms forward path  $(\phi fd)$  jitter when compared to the uncompensated reverse path  $(\phi rd)$  especially at higher MISC frequencies under quiet supply operation in Fig. 3.20(a). Nonetheless, the forward path  $(\phi fd)$  jitter consistently outperforms the reverse path  $(\phi rd)$  in the presence of supply noise in Fig. 3.15. A comparison of the measured DLF and DLR power with the total power consumption of the supply compensated MISC architecture is shown in Fig. 3.20(b). Here, supply compensation in DLF incurs an area and power overhead of about 58% and 32%, respectively, over the conventional DLR delay line. A summary of these measurement results is included in Table 3.1.



Figure 3.19: Measured residual skew between (a) the source clock $\phi s$ in Die-1 and the forward clock $\phi fd$ in Die-2 (b) the source clock $\phi s$ in Die-1 and the reverse clock $\phi rd$ in Die-2.



Figure 3.20: (a) Measured rms jitter through the forward  $(\phi f d)$  and reverse  $(\phi r d)$  MISC clock paths with quiet supply (b) measured power consumption in the supply compensated DLF and conventional DLR delay lines against the total power at different MISC frequencies.

## 3.6 Conclusion

An on-chip auto-tuning algorithm is presented to achieve supply noise compensation in digitally controlled delay lines regardless of post-fabrication process or mismatch variations. The compensated delay line incurs an area and power overhead of 58% and 32%, respectively, and measures an rms jitter of 3ps compared to 112.3ps for the conventional (uncompensated) delay line at 1GHz operation with a 25mV 1MHz supply noise. These compensated and conventional delay lines are further incorporated inside the forward and reverse inter-die clock paths, respectively, of the MISC

| Process                                               | 65nm CMOS                                        |
|-------------------------------------------------------|--------------------------------------------------|
| Voltage                                               | 1V                                               |
| Active area                                           | $0.016mm^{2}$                                    |
| Total power                                           | 4.8mW @ 1GHz                                     |
| Frequency range                                       | 250MHz - 1GHz                                    |
| Supply auto-tuning lock time                          | 16 cycles @ 1MHz                                 |
| MISC lock time                                        | 176 cycles                                       |
| Maximum skew                                          | $30\mathrm{ps}$                                  |
| Forward path $(\phi f d)$ jitter at quiet supply      | 2.65 ps rms @ 1 GHz                              |
| Forward path $(\phi f d)$ jitter at 1kHz supply noise | $3.6 \mathrm{ps} \mathrm{~rms} @ 1 \mathrm{GHz}$ |
| Forward path $(\phi f d)$ jitter at 1MHz supply noise | $3.0 \mathrm{ps} \mathrm{rms} @ 1 \mathrm{GHz}$  |

Table 3.1: Summary of the MISC Measurement Results

3D-IC clock synchronization architecture. Uniquely, the MISC topology allows supply auto-tuning to run in background while maintaining an inter-die synchronized clock via the reverse path, and measures a maximum residual skew of 30ps across its entire 250MHz to 1GHz operating range.

## Chapter 4

# Beyond Rail-to-Rail Compliant Current Sources for Mismatch Insensitive Voltage to Time Conversion

Tejinder Singh Sandhu and Kamal El-Sankary

© 2018 IEEE Reprinted, with permission, from:

T. S. Sandhu and K. El-Sankary, "Beyond Rail-to-Rail Compliant Current Sources for Mismatch-Insensitive Voltage-to-Time Conversion," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems. doi: 10.1109/TVLSI.2018.2844728

#### Abstract

This work presents beyond rail-to-rail (BR2R) compliant cascode current sources to markedly improve the compliance voltage and output impedance of the extensively used wide-swing cascode current source. The proposed BR2R sources employ a bootstrapping technique to linearly charge a load capacitor to beyond the supply rails $V_{dd}$ or gnd while maintaining an improved output impedance over an equivalent wide-swing cascode source. A process and mismatch insensitive differential voltage to time converter (DVT) employing these BR2R sources is fabricated in 65nm CMOS and occupies $0.021mm^2$, while dissipating $47\mu W$ at 1V. The measured BR2R DVT SNDR is 50.2dB, compared to 38.7dB for the wide-swing cascode based DVT within a 2MHz input bandwidth. The DVT achieves a CMRR of 35.1dB for a 0.4V to 0.6V input common-mode range.

#### 4.1 Introduction

Switched current sources are widely used in applications ranging from ramp generators to PLL charge pumps. The wide-swing cascode source in Fig. 4.1(a) has a compliance voltage as high as $V_{dd} - 2V_{ov}$, or as low as $2V_{ov}$ for the cascode sink, where $V_{dd}$ and $V_{ov}$ are the supply and MOSFET overdrive voltages, respectively. The dynamic range of these applications often depends critically on the compliance voltage of their constituent current sources. For example, the ramp based voltage to time converter in Fig. 4.1(b) has an input dynamic range of $V_{dd} - 2V_{ov}$. Similarly, the charge pump output dynamic range is limited to $V_{dd} - 4V_{ov}$ in Fig. 4.1(c).

Although PVT compensated current sources [39], [40] have been reported earlier, their compliance voltages lie well within the supply rails. Hence, this chapter presents a beyond rail-to-rail (BR2R) current source that delivers a compliance voltage as high as $V_{dd} + V_{th}$ while maintaining a higher output impedance than an equivalent wide-swing cascode source, where $V_{th}$ is the MOSFET threshold voltage.

One of the BR2R applications is the voltage to time converter (VTC) in Fig. 4.1 (b). Prior wide-swing cascode based ramp generator [41], [42] or capacitive discharge [43], [44] VTC techniques are difficult to implement as a fully differential topology due to inevitable mismatch in the ramp slopes or capacitors of their respective differential



Figure 4.1: Cascode current source and its applications (a) wide-swing cascode source (b) ramp based voltage to time converter with  $V_{dd} - 2V_{ov}$  input dynamic range (c) PLL charge pump with  $V_{dd} - 4V_{ov}$  output dynamic range.

halves. Other VTC techniques based on current starved inverters [45], [46] exhibit a non-linear, inversely proportional input-to-delay conversion relation.

Hence, a process and mismatch insensitive differential voltage to time converter employing the proposed BR2R source/sink pair is presented. The proposed design incorporates three calibration loops to achieve process and mismatch immunity and can even tolerate mismatch between its input sampling capacitors without sacrificing common-mode rejection ratio or the conversion gain.

#### 4.2 Proposed BR2R Current Source

The proposed BR2R current source in Fig. 4.2(a) achieves beyond rail-to-rail compliance by isolating its cascode transistors $M_A$ and $M_B$ from the load capacitor $C_L$ using a slave transistor $M_C$ (matched to $M_A$) and a bootstrapping capacitor $C_{PL}$. For comparison, a wide-swing current source with transistors matched to the BR2R cascode pair $\{M_A, M_{AS}\}, \{M_B, M_{BS}\}$ is depicted in Fig. 4.2(b). A transient simulation illustrating the charging of matched load capacitors $C_L$ and $C_{LS}$ via the two current sources is shown in Fig. 4.2(c).

Initially, both $C_L$ and $C_{LS}$ are discharged, making $V_L = V_{LS} = 0V$. Also, $T_s = 1$ replenishes the charge across $C_{PL}$ and precharges the BR2R nodal voltages $V'_L$ and $V_m$ to gnd and $V_{dd} = 1V$, respectively. When $T_s$ goes low at $t_0$, the BR2R cascode $(M_A, M_B)$ forces its current $I_L$ into the slave transistor $M_C$. In doing so, the node $V_m$ in Figs. 4.2(a) & (c) jumps to the voltage necessary to equalize the source-to-gate voltages of $M_A$ and $M_C$, while $V'_L$ follows $V_m$ to maintain charge continuity across $C_{PL}$, i.e.

$$V_{m,t_0} = V_{dd} + V_C - V_A \tag{4.1}$$



Figure 4.2: (a) Proposed beyond rail-to-rail (BR2R) cascode current source (b) traditional wide-swing cascode current source (c) transient voltage and current waveforms illustrating the charging of  $C_L$  and  $C_{LS}$  in (a) and (b), respectively, for  $V_{dd} = 1V$ . Here,  $V_{LS,t_1}$  and  $V_{L,t_2}$  are the compliance voltages for the wide-swing cascode and the BR2R current sources, respectively.

$$V'_{L,t_0} = V_C - V_A \tag{4.2}$$

Where $V_{m,t_0}$ and $V'_{L,t_0}$ are the voltages at nodes $V_m$ and $V'_L$, respectively, at the time instant $t_0$. Now, all the BR2R transistors $(M_A, M_B, M_C)$ are biased in the saturation region. Hence, the nodes $V_L$ and $V'_L$ start charging, with their charging rates in the ratio $1 : C_L/C_{PL}$, respectively. Meanwhile, $V_m$ remains relatively constant, as in (4.1), so as to maintain the same current in $M_A$ and $M_C$.

The charging load nodes $V_L$ and $V_{LS}$ in the two matched topologies overlap until the wide-swing cascode enters the triode region at $t_1$. Hence, the nodal voltage $V_{LS}$ at $t_1$, $V_{LS,t_1} = V_B + V_{th}$, is the compliance voltage of the wide-swing cascode current source. In contrast, the BR2R source remains in the saturation region beyond $t_1$ because $V_C$ is biased much higher than $V_A$ or $V_B$. Now, for an appropriate $C_L/C_{PL}$ ratio, $V'_L$ charges at a slower rate than $V_L$, thus forcing $M_C$ to enter the triode region at $t_2$ before $M_B$, i.e.

$$V_{L,t_2} = V_C + V_{th} \tag{4.3}$$

$$V'_{L,t_2} = V'_{L,t_0} + V_{L,t_2} \cdot \frac{C_L}{C_{PL}} \tag{4.4}$$

Where $V'_{L,t_2}$ is the voltage at node $V'_L$ at the time instant $t_2$, and $V_{L,t_2}$, the nodal voltage $V_L$ at $t_2$, is the compliance voltage of the proposed BR2R cascode current source.

Looking back, the rising  $V_L$  and  $V'_L$  nodes risk pushing  $M_C$  and  $M_B$  into the triode region, respectively. However, bootstrapping can be achieved only if  $M_C$  enters the triode region before  $M_B$  i.e.  $V'_{L,t_2} < V_B + V_{th}$  must be ensured, which using (4.2), (4.3) and (4.4) gives the following condition.

$$\frac{C_{PL}}{C_L} > \frac{V_C + V_{th}}{V_A + V_B - V_C + V_{th}} \tag{4.5}$$

So, the BR2R compliance voltage can be set higher by increasing $V_C$, as in (4.3). This, however, requires a larger bootstrapping capacitor $C_{PL}$ to satisfy (4.5). Then again, a larger $C_{PL}$ boosts the BR2R's output impedance, as explained in Section 4.2.2.

#### 4.2.1 Compliance Voltage

The cascode simulations in Fig. 4.2 (c) are done with  $V_A = 0.625V$ ,  $V_B = 0.5V$ ,  $V_C = V_{dd} = 1V$  and  $V_{th} \approx 0.25V$ , which gives a compliance voltage of just  $V_{LS,t_1} = 0.75V$  for the wide-swing cascode current source. For the BR2R cascode, choosing  $C_{PL}/C_L = 3.5$  satisfies (4.5), thus allowing the load capacitor  $C_L$  to linearly charge above  $V_{dd}$  to  $V_{L,t_2} = V_C + V_{th} \approx 1.25V$ . Hence, the BR2R cascode can deliver a compliance voltage as high as  $V_{dd} + V_{th}$ .
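As a quick numerical check of (4.1) to (4.5) with the bias values quoted above, the following Python sketch evaluates the node voltages and the minimum bootstrapping ratio. The function and key names are illustrative and not part of the original design flow, and the sketch assumes the same idealized triode boundaries (gate voltage plus $V_{th}$) that the text uses.

```python
def br2r_operating_points(v_a, v_b, v_c, v_th, v_dd, cpl_over_cl):
    """Evaluate (4.1)-(4.5) for the BR2R source; all voltages in volts."""
    v_m_t0 = v_dd + v_c - v_a                  # (4.1): node V_m just after t0
    v_lp_t0 = v_c - v_a                        # (4.2): node V'_L just after t0
    v_l_t2 = v_c + v_th                        # (4.3): BR2R compliance voltage
    v_lp_t2 = v_lp_t0 + v_l_t2 / cpl_over_cl   # (4.4): node V'_L at t2
    v_ls_t1 = v_b + v_th                       # wide-swing compliance voltage
    # Right-hand side of (4.5): smallest C_PL/C_L that keeps M_B saturated.
    min_ratio = (v_c + v_th) / (v_a + v_b - v_c + v_th)
    return {
        "v_m_t0": v_m_t0,
        "v_lp_t0": v_lp_t0,
        "v_l_t2": v_l_t2,
        "v_lp_t2": v_lp_t2,
        "v_ls_t1": v_ls_t1,
        "min_cpl_over_cl": min_ratio,
        "bootstrap_ok": cpl_over_cl > min_ratio,
    }


# Bias point of Fig. 4.2(c): V_A = 0.625V, V_B = 0.5V, V_C = V_dd = 1V, V_th ~ 0.25V.
points = br2r_operating_points(0.625, 0.5, 1.0, 0.25, 1.0, 3.5)
for name, value in points.items():
    print(name, value)
```

For this bias point the minimum ratio evaluates to about 3.33, so $C_{PL}/C_L = 3.5$ leaves only a small margin, and $V'_{L,t_2}$ stays just below $V_B + V_{th} = 0.75V$.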

#### 4.2.2 Output Impedance

During  $t_0 \leftrightarrow t_2$  in Fig. 4.2(c), the BR2R drain terminal  $V'_L$  rises at a rate of only  $C_L/C_{PL} = 1/3.5$  times the rise in the load capacitor voltage  $V_L$ . Hence, the output impedance of the BR2R cascode increases by a factor of  $C_{PL}/C_L$  when compared to an equivalent wide-swing cascode source. This fact is evident from the load current waveforms where  $I_L$  appears much flatter than  $I_{LS}$  during  $t_0 \leftrightarrow t_2$ .



Figure 4.3: Proposed Differential Voltage-to-Time (DVT) architecture.

#### 4.3 BR2R integration in a Differential Voltage to Time Converter

Prior VTC topologies [41]-[44] suffer from inevitable mismatch between the current sources or input sampling capacitors of their respective differential halves. Also, these VTCs suffer from an input dependent gain distortion due to the restricted compliance voltages of their current sources and the dependence of their comparator propagation delay on the intersect slope of their input and ramp voltages [47].

The proposed DVT architecture in Fig. 4.3 mitigates these issues by employing three calibration loops and beyond rail-to-rail (BR2R) cascode sources $I_p$ and $I_n$. Here, $I_p$ acts as a voltage controlled current source with $V_a$ controlling the $M_A$ gate voltage in Fig. 4.2(a), while the current sink $I_n$ is built using NMOS transistors and is functionally similar to $I_p$. The DVT converts an input differential voltage $(V_{in}^+ - V_{in}^-)$ sampled onto $C_p$ and $C_n$ into a pulse width at the output $T_{out}$ of the continuous time comparator, such that the conversion gain is insensitive to process or mismatch variations. The DVT timing diagram in Fig. 4.4 consists of two distinct phases. In the calibration phase, the DVT achieves process and mismatch insensitivity via the combined action of three feedback loops and five calibration capacitors $C_a$, $C_o$, $C_{oo}$, $C_e$ and $C_{ee}$. The DVT output $T_{out}$ is then sampled in the subsequent voltage to time conversion phase, as described below.



Figure 4.4: Timing diagram showing a complete DVT conversion cycle.

#### 4.3.1 Calibration Phase

The three DVT calibration loops are activated in a time interleaved manner as shown in Fig. 4.4, and are described below.

#### 4.3.1.1 Offset Null $(T_s = T_{ref} = 1)$

The shorting of the comparator inputs to  $V_{ref}$  charges a small capacitor  $C_{oo}$  to the sign of its input offset when  $T_{ref}$  goes low. Subsequent charge sharing with a much larger capacitor  $C_o$  (at  $T_q = 1$ ) biases  $V_o$  to cancel the comparator offset.
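The charge-sharing step can be pictured as a bang-bang loop that nudges $V_o$ by a small fraction $C_{oo}/(C_o + C_{oo})$ every cycle. The Python sketch below is a behavioral illustration only. It assumes that the sign of the offset is stored on $C_{oo}$ as a rail voltage ($V_{dd}$ or gnd), and that some value `v_target` of $V_o$ (a hypothetical quantity introduced here) exactly cancels the comparator offset. The capacitor values are the ones quoted in Section 4.4, with $C_{oo}$ taken at its 0.8fF upper bound.

```python
def offset_null(v_target, v_o=0.5, c_o=3e-12, c_oo=0.8e-15, v_dd=1.0, cycles=1000):
    """Bang-bang offset nulling by repeated charge sharing (behavioral model).

    Each cycle the small capacitor c_oo is charged to a rail according to the
    sign of the residual offset, then shared with the much larger c_o.
    """
    history = []
    for _ in range(cycles):
        # Comparator decision: assumed to be stored on c_oo as a rail voltage.
        v_sign = v_dd if v_o < v_target else 0.0
        # Charge conservation when c_oo is connected across c_o.
        v_o = (c_o * v_o + c_oo * v_sign) / (c_o + c_oo)
        history.append(v_o)
    return v_o, history


v_final, trace = offset_null(v_target=0.52)
print(f"final V_o = {v_final:.5f} V after {len(trace)} cycles")
```

With these values each step moves $V_o$ by roughly 0.13mV, so the loop settles to within about one step of the target and then dithers around it, which is why a much larger $C_o$ gives a finer offset correction at the cost of a longer settling time.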

#### 4.3.1.2 Current Equalization $(T_{ce} = 1)$

This loop puts the current source/sink pair  $(I_p, I_n)$  in series while isolating them from the rest of the DVT circuit. Thus, any residual charge flows through  $C_a$ , changing the control voltage  $V_a$  of the current source  $I_p$  until the current pair equalizes, thus forcing  $I_p = I_n$  regardless of process or mismatch variations.

#### 4.3.1.3 Current Calibration $(T_{cal} = T_{c13} = 1)$

The DVT current calibration loop in Fig. 4.5(a) pegs process and mismatch sensitive quantities $I_p$, $I_n$, $C_p$ and $C_n$ to off-chip stable voltage and pulse width references $V_{ref}$ and $T_{cal}$, respectively. Initially, $V_{ref}$ is sampled onto the input capacitors $C_p$ and $C_n$, then $\overline{T_s} = T_q = 1$ enforces $V_p = 2 \cdot V_{ref}$ and $V_n = 0V$. Finally, this loop adjusts $I_n$ every cycle via $C_e$ and $C_{ee}$ until the capacitor nodes $V_n$ and $V_p$ cross over precisely



Figure 4.5: The DVT current calibration loop (a) circuit diagram (b) timing diagram at steady state.

within the pulse width of  $T_{cal}$  as depicted in Fig. 4.5(b). Therefore, these three feedback loops enforce the following condition.

$$I \cdot \left[\frac{1}{C_p} + \frac{1}{C_n}\right] = \frac{2 \cdot V_{ref}}{T_{cal}} \tag{4.6}$$
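To make (4.6) concrete, the short Python sketch below solves it for the current $I$ that the loops settle to and then confirms that the $V_p$, $V_n$ crossover lands exactly at $T_{cal}$, even when $C_p$ and $C_n$ differ. The function names are illustrative. The nominal values ($C_p = C_n = 0.3pF$, $V_{ref} = 0.5V$, $T_{cal} = 31ns$) are those reported in Section 4.4, while the mismatched capacitor pair is an arbitrary example.

```python
def calibrated_current(c_p, c_n, v_ref, t_cal):
    """Current I that satisfies (4.6): I * (1/C_p + 1/C_n) = 2 * V_ref / T_cal."""
    return (2.0 * v_ref / t_cal) / (1.0 / c_p + 1.0 / c_n)


def crossover_time(i, c_p, c_n, v_ref):
    """Time for V_p (falling from 2*V_ref) and V_n (rising from 0V) to meet."""
    closing_rate = i / c_p + i / c_n   # volts per second
    return 2.0 * v_ref / closing_rate


i_nom = calibrated_current(0.3e-12, 0.3e-12, 0.5, 31e-9)
i_mis = calibrated_current(0.33e-12, 0.27e-12, 0.5, 31e-9)
print(f"nominal I = {i_nom * 1e6:.3f} uA, mismatched I = {i_mis * 1e6:.3f} uA")
print(crossover_time(i_mis, 0.33e-12, 0.27e-12, 0.5))
```

For the nominal values the current comes out at roughly 4.84µA. The loop absorbs the capacitor mismatch by settling to a slightly different current, so the crossover time stays pinned to $T_{cal}$.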

#### 4.3.2 Time Conversion Phase

This phase is further divided into three sub-phases  $T_1$ ,  $T_2$  and  $T_3$  as depicted in Figs. 4.6 (a), (b) and (c), respectively. Here,  $T_1$  sets the output common-mode pulse width, while the DVT output  $T_{out}$  is taken differentially against two  $V_p$ ,  $V_n$  cross over events in  $T_2$  and  $T_3$ , which effectively cancels the comparator propagation delay at  $T_{out}$  in Fig. 4.6 (d).

• Sub-phase  $T_1$ :  $(T_{C13} = 1, T_2 = 0)$ 

The differential input is sampled onto $C_p$ and $C_n$ when $T_{in} = T_s = 1$ in Figs. 4.3 & 4.4, thereafter $\overline{T_s} = 1$ yields $V_n - V_p = V_{in}^+ - V_{in}^-$. The resulting $V_p$, $V_n$ charge/discharge curves for two cases corresponding to $V_p > V_n$ and $V_p < V_n$ are shown in Fig. 4.6(d), where $V_{cm}$ and $V_d$ are the input common-mode and peak single-ended input voltages, respectively. Now, the node $V_n$ is always charged in $T_1$ via $I_p$ while $I_n$ discharges $V_p$, giving us:

$$V_{p,T_1} = V_p - \frac{I \cdot T_1}{C_p} \quad \text{and} \quad V_{n,T_1} = V_n + \frac{I \cdot T_1}{C_n} \tag{4.7}$$

Where  $V_{p,T_1}$  and  $V_{n,T_1}$  are the nodal voltages  $V_p$  and  $V_n$  at the end of sub-phase  $T_1$ .



Figure 4.6: DVT in the time conversion phase (a) sub-phase  $T_1$  (b) sub-phase  $T_2$  (c) sub-phase  $T_3$  (d) DVT timing diagram in sub-phases  $T_1$ ,  $T_2$  and  $T_3$ .

• Sub-phase $T_2$: $(T_{C13} = 0, T_2 = 1)$

Making  $T_1$  wide enough ensures  $V_{n,T_1} > V_{p,T_1}$  for both cases in Fig. 4.6(d), yielding  $T_{out} = 0$  at the start of sub-phase  $T_2$ . Now,  $I_n$  always discharges  $V_n$  while  $V_p$  is charged by  $I_p$  during  $T_2$  in Fig. 4.6(b). Hence, the DVT output  $T_{out}$  goes high when  $V_p$  exceeds  $V_n$  after a delay of  $T'_2$  (as depicted in Case-A), in general this happens when  $V_{p,T_1} + I \cdot T'_2/C_p = V_{n,T_1} - I \cdot T'_2/C_n$ . Now, using (4.6), (4.7) and  $V_n - V_p = V_{in}^+ - V_{in}^-$  in the aforesaid relation for  $T'_2$  gives:

$$T_2'' = T_{pg} + T_1 + \left(V_{in}^+ - V_{in}^-\right) \cdot \frac{T_{cal}}{2 \cdot V_{ref}} \tag{4.8}$$

Where,  $T_2'' = T_2' + T_{pg}$  includes the effect of the comparator propagation delay  $T_{pg}$ . Again, the nodes  $V_p/V_n$  continue to charge/discharge until the end of  $T_2$ , giving us:

$$V_{p,T_2} = V_{p,T_1} + \frac{I \cdot T_2}{C_p} \quad \text{and} \quad V_{n,T_2} = V_{n,T_1} - \frac{I \cdot T_2}{C_n} \tag{4.9}$$

Where  $V_{p,T_2}$  and  $V_{n,T_2}$  are the nodal voltages  $V_p$  and  $V_n$  at the end of sub-phase  $T_2$ .



Figure 4.7: (a) Simulated V-I curves of the rail-to-rail BR2R cascode using $C_{PL}/C_L = 2$ vs an equivalent wide-swing cascode source/sink pair (b) simulated gain variation against the input signal for the DVT built using the wide-swing cascode vs the BR2R cascode sources via periodic steady-state analysis.

• Sub-phase $T_3$: $(T_{C13} = 1, T_2 = 0)$

Again making  $T_2$  wide enough ensures  $V_{p,T_2} > V_{n,T_2}$ , keeping  $T_{out} = 1$  at the start of sub-phase  $T_3$  in Figs. 4.6(c) & (d). Now,  $V_p$  is always discharged by  $I_n$  while  $I_p$  charges  $V_n$  during  $T_3$ . So, the DVT output  $T_{out}$  goes low when  $V_n$  intersects  $V_p$  after a delay of  $T'_3$  (as shown in case-A), in general this happens when  $V_{p,T_2} - I \cdot T'_3/C_p = V_{n,T_2} + I \cdot T'_3/C_n$ . Using (4.6), (4.7), (4.9) and  $V_n - V_p = V_{in}^+ - V_{in}^-$  in the aforesaid relation for  $T'_3$  gives:

$$T_3'' = T_{pg} + T_2 - T_1 - \left(V_{in}^+ - V_{in}^-\right) \cdot \frac{T_{cal}}{2 \cdot V_{ref}} \tag{4.10}$$

Where,  $T_3'' = T_3' + T_{pg}$ . Using (4.8) and (4.10) with the DVT timing output pulse width  $T_{out} = T_2 - T_2'' + T_3''$  yields:

$$T_{out} = T_{cm} + \left(V_{in}^{+} - V_{in}^{-}\right) \cdot G_{vt} \tag{4.11}$$

Where $G_{vt} = -T_{cal}/V_{ref}$ is the DVT conversion gain and $T_{cm} = 2 \cdot (T_2 - T_1)$ is the output common-mode delay. Now, precise timing $(T_{cal})$ and voltage $(V_{ref})$ references are readily available for any mixed signal system. So, $G_{vt}$ is insensitive to process and mismatch variations and can be easily tuned by varying $T_{cal}$. Notice that the input dependent comparator propagation delay $T_{pg}$ is completely canceled out in (4.11).

Interestingly, the preceding analysis leading up to (4.11) does not assume $C_p = C_n$. Therefore, the time locations of the $V_p$, $V_n$ intersection events in sub-phases $T_2$ and $T_3$, as defined by (4.8) and (4.10), respectively, are independent of the input common-mode voltage $V_{cm}$ or the mismatch between $C_p$ and $C_n$. Hence, the proposed DVT architecture can maintain a robust CMRR regardless of process or mismatch variations.
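The independence from $C_p$/$C_n$ mismatch and from the sampled common-mode level can be checked numerically. The Python sketch below is a minimal behavioral model, assuming ideal current sources with no compliance limit and a single fixed comparator delay `t_pg`. It locates the two crossing events directly from the ramps in (4.7) and (4.9), rather than from the closed form, and compares the result with (4.11). The names `v_mid` (the mid-level of the sampled node voltages) and `dvt_pulse_width` are illustrative, and the numerical values are example inputs.

```python
def dvt_pulse_width(v_diff, v_mid, c_p, c_n, v_ref, t_cal, t_1, t_2, t_pg=0.0):
    """Return T_out by locating the two V_p/V_n crossings from ideal ramps."""
    # Calibrated current from (4.6); k is the closing rate of the ramps in V/s.
    i = (2.0 * v_ref / t_cal) / (1.0 / c_p + 1.0 / c_n)
    k = i / c_p + i / c_n

    # Sampled node voltages, with V_n - V_p = V_in+ - V_in-.
    v_p = v_mid - v_diff / 2.0
    v_n = v_mid + v_diff / 2.0

    # Sub-phase T1, eq. (4.7): V_p discharges while V_n charges.
    v_p1 = v_p - i * t_1 / c_p
    v_n1 = v_n + i * t_1 / c_n

    # Sub-phase T2: V_p charges, V_n discharges; they cross after t2_cross.
    t2_cross = (v_n1 - v_p1) / k
    v_p2 = v_p1 + i * t_2 / c_p   # eq. (4.9)
    v_n2 = v_n1 - i * t_2 / c_n

    # Sub-phase T3: V_p discharges, V_n charges; they cross after t3_cross.
    t3_cross = (v_p2 - v_n2) / k

    # T_out is high from the T2 crossing to the T3 crossing; t_pg delays both edges.
    return t_2 - (t2_cross + t_pg) + (t3_cross + t_pg)


def dvt_closed_form(v_diff, v_ref, t_cal, t_1, t_2):
    """Closed-form T_out from (4.11)."""
    return 2.0 * (t_2 - t_1) - v_diff * t_cal / v_ref


nominal = dvt_pulse_width(0.2, 0.5, 0.3e-12, 0.3e-12, 0.5, 31e-9, 13e-9, 31e-9)
skewed = dvt_pulse_width(0.2, 0.4, 0.36e-12, 0.24e-12, 0.5, 31e-9, 13e-9, 31e-9, t_pg=1e-9)
print(nominal, skewed, dvt_closed_form(0.2, 0.5, 31e-9, 13e-9, 31e-9))
```

The three printed values agree to within floating-point rounding at about 23.6ns, even though the second call uses a 20% capacitor skew, a shifted mid-level and a non-zero comparator delay.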

#### 4.4 Results

Fig. 4.7(a) shows the simulated V-I curves of the wide-swing cascode current source/sink pair at $V_{dd} = 1V$ vs the BR2R pair designed for rail-to-rail operation using $V_A = 0.625V$, $V_B = 0.5V$, $V_C = 0.75V$, $C_L = 300 fF$, $C_{PL}/C_L = 2$, $M_A = M_C = M_{AS} = 3.75 \mu m/250 nm$ and $M_B = M_{BS} = 5 \mu m/250 nm$. The compliance voltage as well as the output impedance of the proposed BR2R pair is almost twice that of an equivalent wide-swing cascode pair. The DVT topology simulated using the wide-swing cascode pair with limited compliance voltage suffers from an input dependent conversion gain, compared to an almost constant gain for the BR2R based DVT as shown in Fig. 4.7(b).

A die photo of the DVT prototype fabricated in 65nm CMOS is shown in Fig. 4.8 with an active area of $0.021mm^2$. Fig. 4.9 shows the test setup, where the oscilloscope measures and stores the DVT output pulse width into an Excel file for SNDR estimation using MATLAB. The timing $(T_s, T_{cal}, T_2)$ and voltage $(V_{ref})$ references are generated off-chip, where the DVT operates with an 8MHz clock. The calibration and time-conversion phases alternate between consecutive clock periods, resulting in a decimated (by 2) input sampling frequency of 4MHz. The active digital (logic) and



Figure 4.8: Die photo of the DVT prototype fabricated in 65nm CMOS.



Figure 4.9: DVT test setup using a custom-built PCB showing the DVT output pulse width and its extracted timing trend for a 440kHz sinusoidal input.



Figure 4.10: Measured DVT power spectral density for a 600mV full scale differential input at 133kHz utilizing (a) BR2R cascode current sources (b) wide-swing cascode current sources.

analog (comparator and current sources) power measures  $16\mu W$  and  $31\mu W$ , respectively, for a total  $47\mu W$  consumed from a 1V supply. The DVT is fabricated with  $C_p = C_n = 0.3pF$ ,  $C_a = 0.6pF$ ,  $C_e = 4pF$ ,  $C_o = 3pF$  and  $C_{oo} = C_{ee} < 0.8fF$ , where  $I_p$  and  $I_n$  can switch between the wide-swing cascode or the BR2R cascode pair with  $C_{PL}/C_{p,n} = 4$ . The following DVT results are obtained using  $V_{dd} = 1V$ ,  $V_{ref} = 0.5V$ ,  $T_{cal} = T_2 = T_3 = 31ns$  and  $T_1 = 13ns$ . The DVT output  $T_{out}$  has a timing range of  $T_2 + T_3 = 62ns$  at the high end and 10ns at the lower end. Also, the BR2R source can operate at much higher frequencies (>> 8MHz) and is only limited by the time



Figure 4.11: Measured DVT characteristics utilizing the BR2R cascode pair (a) output SNDR and CMRR vs the input frequency (b) conversion gain  $|G_{vt}|$  vs the timing reference pulse width  $T_{cal}$  for  $V_{ref} = 0.5V$ 

needed to recharge its bootstrapping capacitor  $C_{PL}$  in Fig. 4.2(a).
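The operating point above can be cross-checked against (4.11) with a few lines of Python. This is only an arithmetic sketch: it uses the reference values stated in the text and assumes that the 600mV full scale differential input corresponds to a swing of ±300mV about zero. The variable names are illustrative.

```python
T_CAL = 31e-9   # timing reference pulse width (s)
V_REF = 0.5     # voltage reference (V)
T_1 = 13e-9     # sub-phase T1 width (s)
T_2 = 31e-9     # sub-phase T2 width (s)
F_CLK = 8e6     # DVT clock (Hz)

gain = -T_CAL / V_REF       # G_vt from (4.11), in s/V
t_cm = 2.0 * (T_2 - T_1)    # output common-mode pulse width, in s


def t_out(v_diff):
    """Ideal output pulse width for a differential input v_diff (V)."""
    return t_cm + v_diff * gain


t_min = t_out(+0.3)         # narrowest pulse at positive full scale
t_max = t_out(-0.3)         # widest pulse at negative full scale
f_sample = F_CLK / 2.0      # calibration and conversion alternate
f_nyquist = f_sample / 2.0

print(f"|G_vt| = {abs(gain) * 1e9:.1f} ns/V, T_cm = {t_cm * 1e9:.1f} ns")
print(f"T_out spans {t_min * 1e9:.1f} ns to {t_max * 1e9:.1f} ns")
print(f"f_sample = {f_sample / 1e6:.0f} MHz, Nyquist = {f_nyquist / 1e6:.0f} MHz")
```

Under these assumptions the ideal gain magnitude is 62ns/V and the ideal pulse width stays between 17.4ns and 54.6ns, inside the 10ns to 62ns output timing range quoted above. The 2MHz Nyquist limit matches the stated input bandwidth.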

The DVT output SNDR utilizing the BR2R cascode pair measures 50.2dB for a 133kHz, 600mV full scale differential input in Fig. 4.10(a), against an SNDR of only 38.7dB for the wide-swing cascode based DVT in Fig. 4.10(b). The analog signal bandwidth is 2MHz, where the BR2R based DVT SNDR rolls off to 47dB at 1.9MHz input in Fig. 4.11(a). The BR2R pair allows an input DVT common-mode range of 0.4V to 0.6V with a relatively constant 35.1dB CMRR for a 300mV common-mode test swing in Fig. 4.11(a). The static error between the measured and theoretical values of the DVT gain ($|T_{cal}/V_{ref}|$) in Fig. 4.11(b) is attributed to the chip pad parasitic capacitance, which distorts the pulse width of $T_{cal}$. Also, the measured maximum DVT gain variation against changes in temperature ($10^{\circ}C \rightarrow 90^{\circ}C$) and voltage ($0.9V \rightarrow 1.1V$) is limited to 4.1% and 1.3%, respectively. A summary of these results is included in Table 4.1.

The DVT immunity across mismatch and process variations is verified via the transient behavior of the DVT output and control voltages  $V_o$ ,  $V_e$  and  $V_a$  under the SS and FF corners in Fig. 4.12. For the SS-corner in Fig. 4.12(a),  $V_e$  rises while  $V_a$  falls to keep  $I_p = I_n$  until the equality in (4.6) is satisfied around cycle no. 263, while almost nil comparator offset forces  $V_o$  to remain close to  $V_{ref} = 500 mV$ . The FF-corner in Fig. 4.12(b) includes a mismatch of 20% between the input sampling capacitors  $C_p$  and  $C_n$  and the comparator input pair. Subsequently,  $V_o$  rises above 500mV to cancel comparator offset, also  $V_e$  falls while  $V_a$  rises until steady state is achieved around cycle



Figure 4.12: Simulated transient response of the DVT control voltages  $V_e$ ,  $V_o$  and  $V_a$  under different process corners (a) SS-corner (b) FF-corner with 20% mismatch between the input sampling capacitors  $C_p, C_n$  and the comparator input pair (c) Transient output pulse width  $T_{out}$  for a 500mV sinusoidal input, under the SS and FF corners in (a) and (b), respectively.

| Process                              | 65nm                            |
|--------------------------------------|---------------------------------|
| Voltage                              | 1V                              |
| Area                                 | $0.021 mm^2$                    |
| Input dynamic range (pk-pk)          | 600mV                           |
| Input common-mode range              | $0.4V \longleftrightarrow 0.6V$ |
| Operating Frequency                  | 8MHz                            |
| Input Bandwidth                      | 2MHz                            |
| SNDR                                 | 50.2dB                          |
| CMRR                                 | 35.1dB                          |
| Power (digital logic)                | $16\mu W$                       |
| Power (comparator + current sources) | $31\mu W$                       |

Table 4.1: Summary of the BR2R DVT Measurement Results

no. 123. At steady state, the DVT output $T_{out}$ becomes indistinguishable between the SS and FF corners in Fig. 4.12(c). Monte Carlo mismatch simulations confirm that the worst case DVT conversion gain error lies within $\pm 1\%$ ($3 \times \sigma = 3 \times 248 ps/V$) of its mean gain (62.1ns/V).

#### 4.5 Conclusion

Owing to their beyond rail-to-rail compliance voltages and improved output impedance, the proposed BR2R sources offer an enhanced alternative to traditional wide-swing cascode current sources. These BR2R sources can be useful in applications such as charge pumps for low voltage PLLs or differential voltage to time converters (DVT), as demonstrated in this chapter. Here, the BR2R based DVT's linearity is improved by almost 12dB when compared to a traditional wide-swing cascode implementation. Moreover, unlike prior techniques, the proposed DVT architecture is process insensitive and can even tolerate mismatch between its input sampling capacitors without affecting its timing output.

## Chapter 5

## Conclusion

This chapter summarizes the contributions in this thesis. Additionally, suggestions for future work are presented.

#### 5.1 Conclusions

Continuous device scaling allows an increasing number of transistors to be packed in the same die area. System integration can be improved further by building three dimensional integrated circuits (3D-IC), which consist of multiple dies connected vertically using through silicon vias (TSV). However, nano scale transistor circuits suffer from increased mismatch and reduced voltage dynamic range. These adverse effects are further accentuated in 3D-ICs due to their worse thermal gradients when compared with planar 2D designs. This thesis proposed techniques to mitigate circuit mismatch issues in two applications: voltage to time conversion and clock distribution topologies in 3D-ICs.

Chapter 2 proposed a mismatch insensitive skew compensation architecture (MISC) for 3D-ICs. Traditional solutions to clock skew compensation in 3D-ICs rely on perfect matching between their constituent delay lines and/or matched TSV delays. These assumptions are elusive given the increasing mismatch in deep submicron technologies coupled with worse thermal gradients in 3D-ICs when compared to their 2D counterparts. Unlike prior techniques, the proposed MISC architecture can eliminate any residual skew between the die-1 and die-2 clocks resulting from control code dependent mismatch in delay lines or unequal TSV delays. Under similar worst-case mismatch conditions, residual skew in the proposed MISC architecture was limited to 32ps at 1GHz, compared to 116ps for a recent DTD topology.

Chapter 3 proposed an auto-tuning algorithm to minimize the supply voltage sensitivity of the forward MISC delay line. This supply compensated MISC topology was fabricated in 65nm CMOS and demonstrated robust performance against supply voltage noise within the operating frequencies of 250MHz to 1GHz. The compensated MISC delay line measured an rms jitter of only 3.0ps compared to 112.3ps for the conventional (uncompensated) delay line at 1GHz operation with a 25mV 1MHz supply noise. Additionally, measurement results confirmed that the MISC can maintain a load clock in die-2 in phase with the source clock in die-1 to within 30ps across its entire frequency range, while tolerating up to 50% mismatch in delay lines or up to 1ns delay disparity in TSVs.

Finally, chapter 4 proposed beyond rail-to-rail compliant (BR2R) current sources to markedly improve the compliance voltage and output impedance of the extensively used wide-swing cascode source. The proposed BR2R sources were further integrated within a differential voltage-to-time converter (DVT). This DVT topology uses three feedback loops to achieve process and mismatch immunity, and can even tolerate mismatch between its input sampling capacitors without sacrificing the common-mode rejection ratio or the conversion gain. The BR2R-based DVT architecture was fabricated in 65 nm CMOS and tested using a custom-built PCB. Measurement results confirmed that the linearity of the BR2R-based DVT is improved by almost 12 dB when compared to a traditional wide-swing cascode implementation.

#### 5.2 Future Work

In light of the theoretical and measurement results of this study, the following future investigations are recommended to enhance the performance of the voltage-to-time conversion and clock distribution topologies presented in this thesis.

- The lock time of the proposed mismatch-insensitive three-dimensional clock distribution architecture (MISC) can be decreased to less than 100 cycles by employing adaptive binary search algorithms. Essentially, the long lock time results from mismatch between the MISC delay lines. Hence, measuring this mismatch and storing it on chip in a look-up table can drastically reduce the number of clock cycles needed to complete the binary search algorithm.
- The MISC architecture requires two delay lines to synchronize a source clock in die-1 to a load clock in die-2. However, only the delay accumulated in one of these delay lines is needed upon final synchronization, while the other delay line becomes redundant once the synchronization is complete. Hence, further investigation into removing this redundancy to improve power efficiency is recommended.

- Beyond rail-to-rail (BR2R) current sources are presented in this work to improve the dynamic range of differential voltage-to-time converters (DVT). Further work on adding a time-to-digital converter after the DVT is recommended to realize a complete analog-to-digital conversion solution.
- The BR2R current sources presented in this thesis can drastically improve the voltage compliance and output impedance of the conventional wide-swing cascode source. Hence, future investigation into replacing these wide-swing cascode sources with the proposed BR2R sources in applications such as phase-locked loops or ramp generators is recommended.
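
The binary search mentioned in the first recommendation can be illustrated with a generic successive-approximation search over a delay-line control code. This is only a minimal sketch and not the MISC lock algorithm itself: the names `make_delay_line` and `sar_lock`, the 10 ps nominal stage delay, and the assumption that delay grows monotonically with the control code are all hypothetical choices made for illustration.

```python
def make_delay_line(stage_delays_ps):
    """Return delay(code): the sum of the first `code` stage delays (hypothetical model)."""
    def delay(code):
        return sum(stage_delays_ps[:code])
    return delay


def sar_lock(delay_of_code, target_delay_ps, n_bits):
    """Binary (successive-approximation) search for the largest control code
    whose delay does not exceed the target. Assumes delay is monotonic in code."""
    code = 0
    steps = 0
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)          # tentatively set the next bit
        steps += 1                          # one comparison per bit
        if delay_of_code(trial) <= target_delay_ps:
            code = trial                    # keep the bit if we did not overshoot
    return code, steps


# Ideal line: 255 identical 10 ps stages (assumed values).
ideal = make_delay_line([10.0] * 255)
# Mismatched line: stages alternate between 15 ps and 5 ps (assumed values).
mismatched = make_delay_line([15.0 if i % 2 == 0 else 5.0 for i in range(255)])

if __name__ == "__main__":
    for name, line in (("ideal", ideal), ("mismatched", mismatched)):
        code, steps = sar_lock(line, 1000.0, 8)
        print(name, code, steps, 1000.0 - line(code))
```

In this simplified model an n-bit code is resolved in n comparisons regardless of stage mismatch, as long as the delay stays monotonic. It does not model the coupling between the two MISC delay lines, which is what the text identifies as the source of the long lock time.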

## Bibliography

- [1] Z. Liu, S. Swarup, S.-D. Tan, H.-B. Chen, and H. Wang, "Compact lateral thermal resistance model of tsvs for fast finite-difference based thermal analysis of 3-d stacked ics," *Computer-Aided Design of Integrated Circuits and Systems*, *IEEE Transactions on*, vol. 33, no. 10, pp. 1490–1502, Oct 2014.
- [2] F. Ye and K. Chakrabarty, "Tsv open defects in 3d integrated circuits: Characterization, test, and optimal spare allocation," in *Design Automation Conference* (DAC), 2012 49th ACM/EDAC/IEEE, June 2012, pp. 1024–1030.
- [3] Y.-H. Lin, S.-Y. Huang, K.-H. Tsai, W.-T. Cheng, and S. Sunter, "A unified method for parametric fault characterization of post-bond tsvs," in *Test Conference (ITC), 2012 IEEE International*, Nov 2012, pp. 1–10.
- [4] P. Marchal, G. Van der Plas, P. Limaye, A. Mercha, V. Cherman, H. O'Prins, R. Labie, B. Vandevelde, Y. Travaly, and E. Beyne, "Verifying thermal/thermomechanical behavior of a 3d stack - challenges and solutions," in VLSI Design Automation and Test (VLSI-DAT), 2010 International Symposium on, April 2010, pp. 15–16.
- [5] A. Todri-Sanial, S. Kundu, P. Girard, A. Bosio, L. Dilillo, and A. Virazel, "Globally constrained locally optimized 3-d power delivery networks," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 22, no. 10, pp. 2131–2144, Oct 2014.
- [6] A. Todri, S. Kundu, P. Girard, A. Bosio, L. Dilillo, and A. Virazel, "A study of tapered 3-d tsvs for power and thermal integrity," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 21, no. 2, pp. 306–319, Feb 2013.
- [7] T. S. Sandhu and K. El-Sankary, "A mismatch-insensitive skew compensation architecture for clock synchronization in 3-d ics," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 24, no. 6, pp. 2026–2039, June 2016.
- [8] T. S. Sandhu and K. El-Sankary, "Beyond rail-to-rail compliant current sources for mismatch-insensitive voltage-to-time conversion," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, pp. 1–5, 2018.
- [9] K. Bowman, S. Duvall, and J. Meindl, "Impact of die-to-die and within-die parameter fluctuations on the maximum clock frequency distribution for gigascale integration," *Solid-State Circuits, IEEE Journal of*, vol. 37, no. 2, pp. 183–190, Feb 2002.
- [10] F. Lavalle-Aviles, J. Torres, and E. Sanchez-Sinencio, "A high power supply rejection and fast settling time capacitor-less ldo," *IEEE Transactions on Power Electronics*, pp. 1–1, 2018.

- [11] Y. Lu, Y. Wang, Q. Pan, W. H. Ki, and C. P. Yue, "A fully-integrated low-dropout regulator with full-spectrum power supply rejection," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 62, no. 3, pp. 707–716, March 2015.
- [12] Y. Liu, Y. Han, W. Rhee, T. Y. Oh, and Z. Wang, "A psrr enhancing method for gro tdc based clock generation systems," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 61, no. 3, pp. 680–688, March 2014.
- [13] S. G. Kim, J. Rhim, D. H. Kwon, M. H. Kim, and W. Y. Choi, "A low-voltage pll with a supply-noise compensated feedforward ring vco," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 63, no. 6, pp. 548–552, June 2016.
- [14] S. Garg and D. Marculescu, "3d-gcp: An analytical model for the impact of process variations on the critical path delay distribution of 3d ics," in *Quality of Electronic Design*, 2009. ISQED 2009. Quality Electronic Design, March 2009, pp. 147–155.
- [15] S. Rusu and S. Tam, "Clock generation and distribution for the first ia-64 microprocessor," in Solid-State Circuits Conference, 2000. Digest of Technical Papers. ISSCC. 2000 IEEE International, Feb 2000, pp. 176–177.
- [16] X. Chen, T. Zhu, W. Davis, and P. Franzon, "Adaptive and reliable clock distribution design for 3-d integrated circuits," *Components, Packaging and Manufacturing Technology, IEEE Transactions on*, vol. 4, no. 11, pp. 1862–1870, Nov 2014.
- [17] A. Kapoor, N. Jayakumar, and S. Khatri, "A novel clock distribution and dynamic de-skewing methodology," in *Computer Aided Design*, 2004. ICCAD-2004. IEEE/ACM International Conference on, Nov 2004, pp. 626–631.
- [18] J.-W. Ke, S.-Y. Huang, C.-W. Tzeng, D.-M. Kwai, and Y.-F. Chou, "Die-to-die clock synchronization for 3-d ic using dual locking mechanism," *Circuits and Systems I: Regular Papers, IEEE Transactions on*, vol. 60, no. 4, pp. 908–917, April 2013.
- [19] C.-C. Chung and C.-Y. Hou, "All-digital delay-locked loop for 3d-ic die-to-die clock synchronization," in VLSI Design, Automation and Test (VLSI-DAT), 2014 International Symposium on, April 2014, pp. 1–4.
- [20] J. de Gyvez and H. Tuinhout, "Threshold voltage mismatch and intra-die leakage current in digital cmos circuits," *Solid-State Circuits, IEEE Journal of*, vol. 39, no. 1, pp. 157–168, Jan 2004.
- [21] M. Pelgrom, A. C. Duinmaijer, and A. Welbers, "Matching properties of mos transistors," *Solid-State Circuits, IEEE Journal of*, vol. 24, no. 5, pp. 1433–1439, Oct 1989.

- [22] R.-J. Yang and S.-I. Liu, "A 40–550 mhz harmonic-free all-digital delay-locked loop using a variable sar algorithm," *Solid-State Circuits, IEEE Journal of*, vol. 42, no. 2, pp. 361–373, Feb 2007.
- [23] J.-W. You, S.-Y. Huang, Y.-H. Lin, M.-H. Tsai, D.-M. Kwai, Y.-F. Chou, and C.-W. Wu, "In-situ method for tsv delay testing and characterization using input sensitivity analysis," *Very Large Scale Integration (VLSI) Systems, IEEE Transactions on*, vol. 21, no. 3, pp. 443–453, March 2013.
- [24] D. De Caro, "Glitch-free nand-based digitally controlled delay-lines," *Very Large Scale Integration (VLSI) Systems, IEEE Transactions on*, vol. 21, no. 1, pp. 55–66, Jan 2013.
- [25] A.-J. Chuang, Y. Lee, and C.-Y. Yang, "A chip-to-chip clock-deskewing circuit for 3-d ics," in *Circuits and Systems (ISCAS)*, 2012 IEEE International Symposium on, May 2012, pp. 1652–1655.
- [26] S. Deutsch and K. Chakrabarty, "Contactless pre-bond tsv test and diagnosis using ring oscillators and multiple voltage levels," *Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on*, vol. 33, no. 5, pp. 774–785, May 2014.
- [27] M. Aoki, F. Furuta, K. Hozawa, Y. Hanaoka, H. Kikuchi, A. Yanagisawa, T. Mitsuhashi, and K. Takeda, "Fabricating 3d integrated cmos devices by using wafer stacking and via-last tsv technologies," in *Electron Devices Meeting (IEDM)*, 2013 IEEE International, Dec 2013, pp. 29.5.1–29.5.4.
- [28] H. Park and T. Kim, "Synthesis of tsv fault-tolerant 3-d clock trees," Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 34, no. 2, pp. 266–279, Feb 2015.
- [29] L. Wang, L. Liu, and H. Chen, "An implementation of fast-locking and wide-range 11-bit reversible sar dll," *Circuits and Systems II: Express Briefs, IEEE Transactions on*, vol. 57, no. 6, pp. 421–425, June 2010.
- [30] H.-H. Chang and S.-I. Liu, "A wide-range and fast-locking all-digital cycle-controlled delay-locked loop," *Solid-State Circuits, IEEE Journal of*, vol. 40, no. 3, pp. 661–670, March 2005.
- [31] X. Gui, P. Gao, and Z. Chen, "A cml ring oscillator-based supply-insensitive pll with on-chip calibrations," *IEEE Transactions on Microwave Theory and Techniques*, vol. 63, no. 1, pp. 233–243, Jan 2015.
- [32] T. Wu, K. Mayaram, and U. K. Moon, "An on-chip calibration technique for reducing supply voltage sensitivity in ring oscillators," *IEEE Journal of Solid-State Circuits*, vol. 42, no. 4, pp. 775–783, April 2007.

- [33] M. Mansuri and C. K. K. Yang, "A low-power adaptive bandwidth pll and clock buffer with supply-noise compensation," *IEEE Journal of Solid-State Circuits*, vol. 38, no. 11, pp. 1804–1812, Nov 2003.
- [34] H. Sung, K. Cho, K. Yoon, and S. Kang, "A delay test architecture for tsv with resistive open defects in 3-d stacked memories," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 22, no. 11, pp. 2380–2387, Nov 2014.
- [35] Y. w. Lee, H. Lim, and S. Kang, "Grouping-based tsv test architecture for resistive open and bridge defects in 3-d-ics," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 36, no. 10, pp. 1759–1763, Oct 2017.
- [36] G. V. der Plas, P. Limaye, I. Loi, A. Mercha, H. Oprins, C. Torregiani, S. Thijs, D. Linten, M. Stucchi, G. Katti, D. Velenis, V. Cherman, B. Vandevelde, V. Simons, I. D. Wolf, R. Labie, D. Perry, S. Bronckers, N. Minas, M. Cupac, W. Ruythooren, J. V. Olmen, A. Phommahaxay, M. de Potter de ten Broeck, A. Opdebeeck, M. Rakowski, B. D. Wachter, M. Dehan, M. Nelis, R. Agarwal, A. Pullini, F. Angiolini, L. Benini, W. Dehaene, Y. Travaly, E. Beyne, and P. Marchal, "Design issues and considerations for low-cost 3-d tsv ic technology," *IEEE Journal of Solid-State Circuits*, vol. 46, no. 1, pp. 293–307, Jan 2011.
- [37] C. Huang, K. Wu, and Z. Wang, "Low-capacitance through-silicon-vias with combined air/sio2 liners," *IEEE Transactions on Electron Devices*, vol. 63, no. 2, pp. 739–745, Feb 2016.
- [38] X. Chen, T. Zhu, W. R. Davis, and P. D. Franzon, "Adaptive and reliable clock distribution design for 3-d integrated circuits," *IEEE Transactions on Components, Packaging and Manufacturing Technology*, vol. 4, no. 11, pp. 1862–1870, Nov 2014.
- [39] D. Wang, X. L. Tan, and P. K. Chan, "A 65-nm cmos constant current source with reduced pvt variation," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 25, no. 4, pp. 1373–1385, April 2017.
- [40] H. Kayahan, Ceylan, M. Yazici, S. Zihir, and Y. Gurbuz, "Wide range, process and temperature compensated voltage controlled current source," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 60, no. 5, pp. 1345–1353, May 2013.
- [41] S. Naraghi, M. Courcy, and M. Flynn, "A 9-bit, 14uw and 0.06mm2 pulse position modulation adc in 90 nm digital cmos," *Solid-State Circuits, IEEE Journal of*, vol. 45, no. 9, pp. 1870–1880, Sept 2010.

- [42] W. Jung, Y. Mortazavi, B. Evans, and A. Hassibi, "An all-digital pwm-based delta sigma adc with an inherently matched multi-bit quantizer," in *Custom Integrated Circuits Conference (CICC)*, 2014 IEEE Proceedings of the, Sept 2014, pp. 1–4.
- [43] T. Oh, H. Venkatram, and U. K. Moon, "A time-based pipelined adc using both voltage and time domain information," *IEEE Journal of Solid-State Circuits*, vol. 49, no. 4, pp. 961–971, April 2014.
- [44] L. J. Chen and S. I. Liu, "A 12-bit 3.4 ms/s two-step cyclic time-domain adc in 0.18-μm cmos," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 24, no. 4, pp. 1470–1483, April 2016.
- [45] C. Taillefer and G. Roberts, "Delta sigma a/d conversion via time-mode signal processing," *Circuits and Systems I: Regular Papers, IEEE Transactions on*, vol. 56, no. 9, pp. 1908–1920, Sept 2009.
- [46] T. Miki, N. Miura, K. Mizuta, S. Dosho, and M. Nagata, "A 500mhz-bw 52.5db-thd voltage-to-time converter utilizing a two-step transition inverter," in *ESSCIRC Conference 2016: 42nd European Solid-State Circuits Conference*, Sept 2016, pp. 141–144.
- [47] J. Z. Ru, C. Palattella, P. Geraedts, E. Klumperink, and B. Nauta, "A high-linearity digital-to-time converter technique: Constant-slope charging," *IEEE Journal of Solid-State Circuits*, vol. 50, no. 6, pp. 1412–1423, June 2015.

## **APPENDIX A : Copyright Permission**

Response By Email (Tiffany) (07/25/2018 03:50 PM)

Dear Tejinder,

Thank you for your inquiry. Please review our guidelines in the "Can I reuse my published article in my thesis?" section of the following page: https://ieeeauthorcenter.ieee.org/choose-a-publishing-agreement/avoid-infringement-upon-ieee-copyright/.

If you have any further questions after reviewing that page, please do let me know. Thank you for publishing with IEEE, and good luck on your thesis!

Best regards,
Tiffany

Tiffany McKerahan
Author Engagement and Support Manager
IEEE Publications

Customer By CSS Email (Tejinder Sandhu) (07/25/2018 11:07 AM)

Dear Sir/Madam

I am finalizing my PhD thesis for submission to the Faculty of Graduate Studies at Dalhousie University, Halifax, NS, Canada. I am the first author of the following two IEEE transaction journals.

T. S. Sandhu and K. El-Sankary, "A Mismatch-Insensitive Skew Compensation Architecture for Clock Synchronization in 3-D ICs," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 24, no. 6, pp. 2026-2039, June 2016. doi: 10.1109/TVLSI.2015.2496312

T. S. Sandhu and K. El-Sankary, "Beyond Rail-to-Rail Compliant Current Sources for Mismatch-Insensitive Voltage-to-Time Conversion," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems. doi: 10.1109/TVLSI.2018.2844728

I am seeking your permission to include a manuscript version of the above mentioned papers in my thesis. Full publication details and a copy of this permission letter will be included in the thesis.

Regards, Tejinder Singh Sandhu