

# picoTDC: Pico-second TDC for HEP

Moritz Horstmann, Jorgen Christiansen, Samuele Altruda, Gill Lumer-Klabbers, Jeffrey Prinzie (KU Leuven)

CERN/EP-ESE



#### PicoTDC: overview

- Front-end
  - FastIC
- PicoTDC design at a glance
  - Architecture
- Testing system
  - HW
  - FW
  - SW
- Tests
  - Code density test
  - Sweep test
  - Crosstalk test



#### FastIC: Highly configurable FE in 65 nm CMOS

Technical and financial collaboration between CERN and University of Barcelona (ICCUB)

Figure: signal chain from fast timing detectors and photomultipliers (SiPM, PMT, MCP) through the analog front-end (Amp/Discr) to the digital back-end (high resolution **TDC**).

- Technological advancement in detector technology and TDCs
  - Enormous progress in SiPMs: PDE increase from 20 % to 60 % !!
  - New TDC development @ CERN: picoTDC (~3 ps bin).
- FastIC: Bridge the gap between developments on fast timing detectors and new available TDCs
  - Multipurpose chip: SiPMs, PMTs, MCPs, i.e. wide range detector capacitances!
  - Target SPTR (MEASURED Sensor+FE+TDC) competitive with NINO, i.e. ~70 ps<sub>rms</sub> with 3x3 mm<sup>2</sup> SiPM (Hamamatsu S13360-3050CS, 50 um microcells, 3600 cells, 10 V over-voltage).



- Linearity in energy measurement (~2.5 % Lin. Error). Dynamic range from 5 uA to 20 mA.
- Configurable Input: Single ended (positive OR negative polarity) // Differential // Summation of 4 SE inputs (positive OR negative).
- Configurable Output. LVDS / SE CMOS (Linear / non-linear ToT) OR Analog output
- Power consumption < 10 mW/ch

#### Towards area segmentation with fast analog summation:

- Potential time-jitter improvement by the segmentation factor '4'.
- Trade-off w.r.t. more power-hungry solutions (i.e. digital SiPMs)





- Maximum rate ~ 2 MHz (Linear ToT readout), > 50 MHz (Non-linear ToT/Analog mode. Pulse-shape-dependent).
- First version of the chip: 8 SE channels / 4 differential channels / 2 SUM4 channels.
- I2C interface. Compatible with picoTDC.



#### Contact for information:

rafael.ballabriga@cern.ch dgascon@fga.ub.edu jose.fernandez@cern.ch

First version submission: Fall 2019



#### PicoTDC Architecture



64 channels, 3ps or 12ps time binning, 200us dynamic range



### Two Stage Time Interpolation



Known issue: Capture FF Mismatch → Solved in next version



Chip layout (figure labels): PLL, top hit receivers, DLL, bottom hit receivers, logic.

#### PicoTDC on Test Cards







13/09/2019 Samuele Altruda

# PicoTDC testing system



### Instrumentation for testing





#### PCB mezzanine test board



Clk C2M\_1 and Clk C2M\_2 are spare connections with no dedicated purpose. They could be used as differential or single-ended signals.



#### FPGA Board (VC707 Xilinx commercial board)





#### Software structure (SW & FW available on Gitlab)







#### PicoTDC: Code density test

The code density test (CDT) is performed to measure the effective size of each bin. The measurement does not include jitter or quantization error.



To perform the code density test we use an old RC-based generator, which is sufficiently "random" to provide randomly timed hits.

With a random source we expect the same number of hits in each bin if all bins have the same size.
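A minimal sketch of how per-bin DNL and INL could be derived from a code density histogram. The function names, the synthetic histogram, and the choice of INL as the running sum of DNL are illustrative assumptions, not the analysis code behind these slides.

```python
def rms(values):
    """Root mean square of a list of numbers."""
    return (sum(v * v for v in values) / len(values)) ** 0.5


def code_density(hist, nominal_bin_ps):
    """Return (dnl, inl, rms_dnl, rms_inl) in ps from a hit histogram."""
    mean_hits = sum(hist) / len(hist)  # expected count per bin for equal bins
    # A bin that collects more random hits than average is wider than nominal.
    dnl = [(h / mean_hits - 1.0) * nominal_bin_ps for h in hist]
    inl = []
    running = 0.0
    for d in dnl:  # INL taken here as the running sum of DNL (assumption)
        running += d
        inl.append(running)
    return dnl, inl, rms(dnl), rms(inl)


# Synthetic example (not measured data): 64 bins of 12 ps nominal size,
# even bins collecting 20 % more hits than average, odd bins 20 % fewer.
hist = [1200 if i % 2 == 0 else 800 for i in range(64)]
dnl, inl, rms_dnl, rms_inl = code_density(hist, nominal_bin_ps=12.0)
print(f"RMS DNL = {rms_dnl:.3f} ps, RMS INL = {rms_inl:.3f} ps")
```

In this synthetic case the even bins come out 2.4 ps wider and the odd bins 2.4 ps narrower than nominal, so the INL returns to zero after every pair of bins.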




Ch 31, not adjusted, **coarse mode**, bin 12ps, RMS\_dnl = 2.748ps, RMS\_inl = 2.971ps





Ch 31, adjusted, **coarse mode**, bin 12ps, RMS\_dnl = 0.270ps, RMS\_inl = 0.294ps

Adjustment for Ch 31 is performed for this channel alone





Ch 31, not adjusted, **fine mode**, bin 3ps, RMS\_dnl = 2.813ps, RMS\_inl = 3.685ps





Ch 31, adjusted, **fine mode**, bin 3ps, RMS\_dnl = 0.393ps, RMS\_inl = 0.351ps

Good result after a two-step adjustment.

Adjustment for Ch 31 is performed for this channel alone





CDT, **coarse time**, not adjusted: grouped vs. single channel, RMS\_dnl = 3.48ps






CDT, **coarse time**, adjusted: grouped vs. single channel, RMS\_dnl = 3.06ps








#### PicoTDC: Sweep test

The sweep test is performed to quantify the linearity of the picoTDC.





To perform the sweep test we use a dual-channel pattern generator to provide the clock signal and the hit signal; the hit signal is then delayed by the trombone.

The trombone is a very precise and repeatable programmable delay line.



Trombone specifications:

- Maximum delay: 100.0 ns
- Resolution: 0.50 ps per step
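A sketch of how a sweep could be reduced to a linearity figure: fit a straight line to measured time versus programmed delay and take the RMS of the residuals as the INL. The function name `sweep_inl`, the least-squares fit, and the synthetic, noise-free 12 ps quantizer are assumptions for illustration; the slides do not state the exact procedure.

```python
def sweep_inl(delays_ps, measured_ps):
    """Least-squares line through (delay, measurement).

    Returns (slope, offset, residuals, rms_residual).
    """
    n = len(delays_ps)
    mean_x = sum(delays_ps) / n
    mean_y = sum(measured_ps) / n
    sxx = sum((x - mean_x) ** 2 for x in delays_ps)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(delays_ps, measured_ps))
    slope = sxy / sxx
    offset = mean_y - slope * mean_x
    residuals = [y - (slope * x + offset)
                 for x, y in zip(delays_ps, measured_ps)]
    rms_residual = (sum(r * r for r in residuals) / n) ** 0.5
    return slope, offset, residuals, rms_residual


# Synthetic sweep (not measured data): the delay advances in 0.5 ps steps
# and an ideal TDC reports the center of the 12 ps bin the hit falls into.
delays = [0.5 * i for i in range(2000)]
measured = [12.0 * int(d // 12.0) + 6.0 for d in delays]
slope, offset, residuals, rms_inl = sweep_inl(delays, measured)
print(f"slope = {slope:.5f}, RMS residual = {rms_inl:.2f} ps")
```

Even this ideal quantizer leaves a residual of a few ps RMS, which illustrates why a sweep result includes quantization in addition to jitter and delay-line non-linearity.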




Ch 31, not adjusted, <u>coarse mode</u>, bin 12ps, RMS\_inl = 4.129ps

Total jitter = jitter from picoTDC + jitter from pulse generator













Ch 31, adjusted, <u>coarse mode</u>, bin 12ps, RMS\_inl = 3.585ps

Adjustment for Ch 31 is performed for this channel alone













Ch 31, not adjusted, **fine mode**, bin 3ps, RMS\_inl = 3.416ps











Ch 31, adjusted, <u>fine mode</u>, bin 3ps, RMS\_inl = 1.296ps

Adjustment for Ch 31 is performed for this channel alone













Ch 31, adjusted, <u>fine mode</u>, bin 3ps, RMS\_inl = 1.352ps, Averaged\_inl = 0.801ps

Adjustment for Ch 31 is performed for this channel alone






#### PicoTDC: Crosstalk test

The crosstalk test is performed to quantify the noise that the other channels induce on one channel. Worst case examined: one channel against all others.





- Common hit signal: delay swept from 0ps to 60000ps
- Ch 31 hit signal: fixed delay





Ch 31 vs All Chs, coarse mode, bin 12ps, LSB 12ps






# PicoTDC: performance summary

| Mode        | Ch | Adjusted | Code Density Test DNL (RMS) | Code Density Test INL (RMS) | Sweep Test INL (RMS) |
|-------------|----|----------|-----------------------------|-----------------------------|----------------------|
| Coarse time | 31 | X        | 2.748ps                     | 2.971ps                     | 4.129ps              |
| Coarse time | 31 | ✓        | 0.270ps                     | 0.294ps                     | 3.585ps              |
| Fine time   | 31 | X        | 2.813ps                     | 3.685ps                     | 3.416ps              |
| Fine time   | 31 | ✓        | 0.393ps                     | 0.351ps                     | 1.296ps              |

CDT results do not include jitter or quantization.

Sweep test INL includes jitter, trombone non-linearity and quantization.

| Test                    | Mode        | Result                                                     | Sensitivity |
|-------------------------|-------------|------------------------------------------------------------|-------------|
| Temperature performance | Coarse time | Variation limited to 1 LSB peak-to-peak (min-max)          | <1ps/°C     |
| Voltage performance     | Coarse time | Variation limited to 7 LSB peak-to-peak (min-max)          | <0.5ps/mV   |
| Crosstalk test          | Coarse time | Influence limited to 2 LSB; worst case one channel vs. all |             |









#### Low Jitter PLL

- Clock multiplication from 40MHz to 1.28 (2.56) GHz
  - Low jitter critical
  - Jitter filtering of 40MHz clock to the extent possible
    - 40MHz reference MUST be very clean
  - LC based oscillator
- Design: Jeffrey Prinzie, KU Leuven
- Prototyped & Tested
- Measurements very promising (340fs RMS jitter)





Figure: phase noise vs. frequency offset.







#### Hit Receivers

- Differential receivers optimized for ultra-low jitter, low power
- Full Range (common mode 0V .. VDD=1.2V), somewhat LVDS-compatible
- Highest speed @~800mV common mode
- Optimized for 200mV Peak-Peak amplitude
- Design: Bram Faes, KU Leuven
- Prototyped & tested









# 1<sup>st</sup> Stage: DLL

- 64 taps, 12.2ps delay
- Self-Calibrating
- Jitter not as critical, doesn't pile up
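The tap delay follows from dividing one period of the 1.28 GHz clock by the number of DLL taps. The short calculation below checks the quoted 12.2 ps and the roughly 3 ps fine bin; the factor of 4 for the second stage is an assumption taken from the 2-bit "Res int" field in the data format.

```python
F_CLOCK_HZ = 1.28e9   # PLL output frequency quoted on the slides
DLL_TAPS = 64         # first-stage taps
INTERP_STEPS = 4      # assumed: 2-bit resistive interpolation per tap

period_ps = 1e12 / F_CLOCK_HZ               # one clock period in ps
tap_delay_ps = period_ps / DLL_TAPS         # coarse bin
fine_bin_ps = tap_delay_ps / INTERP_STEPS   # fine bin

print(f"period      = {period_ps:.2f} ps")
print(f"DLL tap     = {tap_delay_ps:.2f} ps")
print(f"fine bin    = {fine_bin_ps:.2f} ps")
```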









#### 2<sup>nd</sup> Stage: Resistive Interpolation





### Finecode Drivers and Alignment

- Get down to 3ps bins
- Drivers: tapered buffers, each driving 32 FFs
- Phase alignment separate for each half









#### Capture Flip Flops

- Revisited design, timing vs. power very critical, 16k capture Flip Flops running @1.28GHz
- Highly optimized M/S Flip Flop followed by standard cell Flip Flop for metastability resolution
- Monte Carlo simulations show a mismatch of 800fs RMS, noise influence of 240fs RMS







# Full Timing Macro

- 64 channels, DLL and resistive interpolator in the center
- Hit signal input on the left, output on the right
- Hit decoding fully synchronous, custom layout with standard cells
  - Decoding of one hit per 0.8ns
- 1.6mm x 2.0mm







#### Sources of Measurement Deviation

- Bin size 3ps -> 880fs RMS
- PLL: 350fs RMS phase Jitter
- DLL: 400fs RMS phase Jitter, INL/DNL can be adjusted
- Clock Distribution: <500fs jitter
- Capture FFs: <1ps mismatch (DNL)
- Hit receivers: <1ps jitter
- ~1.75ps RMS total deviation
- External sources: input clock jitter, signal preprocessing
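The total can be cross-checked by adding the contributions in quadrature, which assumes they are statistically independent (an assumption, not stated on the slide). The quantization term is the bin size divided by the square root of 12, and the entries quoted with "<" are taken at their upper limits here, so the result is an upper estimate.

```python
import math

FINE_BIN_PS = 3.05  # fine bin size, approximately 781.25 ps / 256

# RMS contributions in ps; entries marked "upper bound" are "<" values
# on the slide and are used at their limit.
contributions_ps = {
    "quantization (bin / sqrt(12))": FINE_BIN_PS / math.sqrt(12),
    "PLL jitter": 0.35,
    "DLL jitter": 0.40,
    "clock distribution (upper bound)": 0.50,
    "capture FF mismatch (upper bound)": 1.00,
    "hit receivers (upper bound)": 1.00,
}

total_ps = math.sqrt(sum(v * v for v in contributions_ps.values()))

for name, value in contributions_ps.items():
    print(f"{name:36s} {value:5.2f} ps")
print(f"{'quadrature sum':36s} {total_ps:5.2f} ps")
```

The sum comes out slightly above the ~1.75ps quoted on the slide, as expected when the three "<" entries are taken at their limits.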













## Constraints on Hit Signals

- One edge per 1.28GHz-Cycle (~0.8ns)
- Internal analog glitch filter after hit receiver
  - Filter time can be programmed to ensure 0.8ns
  - Or up to 10ns for filtering e.g. oscillations
- Small derandomizer (4 hits) for each channel running @1.28GHz
- Sustainable rate to channel buffer 320MHz, trigger matching running @320MHz for each channel separately



# Logic Features

- Triggered with configurable latency and length, overlap possible, or untriggered
- Naturally overflowing counter used for calculating trigger matches, TOT etc.
- Counter with arbitrary overflow and reset for machine cycle, can be inserted in event headers when triggered
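A naturally overflowing counter lets time differences be computed with modular arithmetic, so no special handling is needed when the counter wraps between two timestamps. The sketch below illustrates the idea; the 13-bit width, the function names, and the window definition (start at trigger minus latency, fixed length) are assumptions for illustration, not the chip's implementation.

```python
COUNTER_BITS = 13            # assumed width, for illustration only
MODULO = 1 << COUNTER_BITS


def wrapped_diff(later, earlier, modulo=MODULO):
    """Difference later - earlier for a counter that wraps at `modulo`."""
    return (later - earlier) % modulo


def in_trigger_window(hit, trigger, latency, length, modulo=MODULO):
    """True if `hit` lies in [trigger - latency, trigger - latency + length)."""
    window_start = (trigger - latency) % modulo
    return wrapped_diff(hit, window_start, modulo) < length


# TOT across a wrap: leading edge just before overflow, trailing just after.
print(wrapped_diff(5, 8190))                                        # 7

# The window starts before the wrap and the trigger arrives after it.
print(in_trigger_window(hit=8190, trigger=10, latency=20, length=16))  # True
print(in_trigger_window(hit=8, trigger=10, latency=20, length=16))     # False
```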







#### Electrical Interfaces

- Hits: Differential (LVDS "compatible", common mode from 0.2V to 1.2V)
  - Highest speed (resolution) @ ~800mV common mode
- Time reference: 40MHz differential
  - Low jitter reference critical for high time resolution
- Trigger/Event-Rst/BX-Rst/Reset: Sync Yes/No
- Control/monitoring: I<sup>2</sup>C at CMOS 1.2V-levels
- Readout: 4 readout ports of 8 differential signals
  - Common mode 0.6V, programmable current 1-5mA
  - Compatible with LpGBT and FPGAs
- Packaging: 400 BGA (1mm pitch)





#### Config / Control / Status Interface

- I<sup>2</sup>C Interface, up to 1MBit/s
- 1.2V CMOS Levels
- 348 Bytes configuration / control
  - Additional 322 bytes delay adjust
- 300 Bytes status



#### Readout

- 1 or 4 differential readout ports with 8 bits
  - 40 to 320MHz
  - Bandwidth:
    - Min 320Mbits/s (~0.15 Mhits/s per channel)
    - Max 10Gbits/s (~4 Mhits/s per channel)
- Readout data: 32 bit words
  - TDC data, headers, trailers etc.
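The quoted bandwidth limits can be reproduced from the port count, the port width and the readout clock. The sketch assumes the minimum corresponds to one port at 40 MHz and the maximum to four ports at 320 MHz, with one 32-bit word per hit shared evenly over 64 channels; these pairings are inferred from the numbers, not stated on the slide.

```python
WORD_BITS = 32       # readout word size
CHANNELS = 64
BITS_PER_PORT = 8    # each readout port is 8 bits wide


def readout_capacity(ports, clock_hz):
    """Return (bits per second, 32-bit words per second per channel)."""
    bits_per_s = ports * BITS_PER_PORT * clock_hz
    words_per_s_per_channel = bits_per_s / WORD_BITS / CHANNELS
    return bits_per_s, words_per_s_per_channel


low = readout_capacity(ports=1, clock_hz=40e6)
high = readout_capacity(ports=4, clock_hz=320e6)

print(f"min: {low[0] / 1e6:.0f} Mbit/s, {low[1] / 1e6:.3f} Mwords/s per channel")
print(f"max: {high[0] / 1e9:.2f} Gbit/s, {high[1] / 1e6:.3f} Mwords/s per channel")
```

The raw maximum of 5 Mwords/s per channel is above the ~4 Mhits/s on the slide; a plausible reason is that headers, trailers and other non-hit words share the same bandwidth.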



#### 32 Bit Frames

#### **TDC** measurement

| Type (1)=0 | TDC data (31) |
|------------|---------------|

Event headers (up to two)

| Type (4)=100? | Field A (13) | Field B (13) | 00 |
|---------------|--------------|--------------|----|

Possible fields: Event ID, Bx ID, Natural ID

**Event trailers** 

| Type (4)=1010 | Event ID (13) | Hit Count (13) | 00 |
|---------------|---------------|----------------|----|

Channel group separator (for single readout port)

| Type (4)=1111 | Chan-Grp-ID (2) | 0x0000000 (26) |
|---------------|-----------------|----------------|

#### Idle frame

| Type (4)=1101 | 0x0D0D0D0 (28) |
|---------------|----------------|
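A sketch of how a receiver might classify 32-bit readout words by their type bits. It assumes the type field occupies the most significant bits and that fields are packed MSB-first in the order shown in the tables; the function names are hypothetical.

```python
def classify_frame(word):
    """Classify a 32-bit readout word by its type bits (assumed MSB-first)."""
    word &= 0xFFFFFFFF
    if word >> 31 == 0:          # 1-bit type 0: TDC measurement
        return "tdc_data"
    type4 = word >> 28           # all other frames carry a 4-bit type
    if type4 >> 1 == 0b100:      # 100?: event header, last bit left open
        return "event_header"
    return {
        0b1010: "event_trailer",
        0b1111: "group_separator",
        0b1101: "idle",
    }.get(type4, "unknown")


def parse_trailer(word):
    """Return (event_id, hit_count) from an event trailer word."""
    if (word >> 28) & 0xF != 0b1010:
        raise ValueError("not an event trailer")
    event_id = (word >> 15) & 0x1FFF   # 13 bits after the type
    hit_count = (word >> 2) & 0x1FFF   # 13 bits, followed by two zero bits
    return event_id, hit_count


# Type 1101 followed by the 28-bit pattern 0x0D0D0D0 gives 0xD0D0D0D0.
print(classify_frame(0xD0D0D0D0))                              # idle
print(parse_trailer((0b1010 << 28) | (5 << 15) | (7 << 2)))    # (5, 7)
```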



#### Absolute TDC data

Full TDC data, **default format**

| Type (1) | Channel (4) | Edge (1) | Coarse cnt (13) | Med. cnt (5) | DLL int (6) | Res int (2) |
|----------|-------------|----------|-----------------|--------------|-------------|-------------|

### Relative to Trigger

A: Triggered with relative time: same format as absolute

B: Triggered with relative leading and TOT: same as absolute leading + TOT

| Type (1) | Channel (4) | Leading (16) | TOT(11) |
|----------|-------------|--------------|---------|
| Type (1) | Channel (4) | Leading (19) | TOT(8)  |
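A sketch of unpacking the full-format TDC word into its fields. It assumes the fields are packed MSB-first in the order listed (type, channel, edge, coarse, medium, DLL, resistive interpolation) and that the 26 time bits form one binary code with an LSB of 781.25 ps / 256; both are inferences from the field widths, not a documented bit map.

```python
LSB_PS = 781.25 / 256   # assumed fine bin: 1.28 GHz period / (64 taps * 4)


def decode_tdc_word(word):
    """Split a 32-bit TDC data word into fields (assumed MSB-first layout)."""
    word &= 0xFFFFFFFF
    if (word >> 31) & 0x1:
        raise ValueError("not a TDC data word (type bit is 1)")
    fields = {
        "channel": (word >> 27) & 0xF,    # 4 bits
        "edge": (word >> 26) & 0x1,       # 1 bit
        "coarse": (word >> 13) & 0x1FFF,  # 13 bits
        "medium": (word >> 8) & 0x1F,     # 5 bits
        "dll": (word >> 2) & 0x3F,        # 6 bits
        "res": word & 0x3,                # 2 bits
    }
    time_code = word & 0x3FFFFFF          # the 26 time bits as one code
    fields["time_ps"] = time_code * LSB_PS
    return fields


example = (3 << 27) | (1 << 26) | (2 << 13) | (5 << 8) | (10 << 2) | 1
decoded = decode_tdc_word(example)
print(decoded)
print(f"full range: {(1 << 26) * LSB_PS / 1e6:.1f} us")
```

Under these assumptions the full range is 2^26 fine bins, about 204.8 us, which agrees with the 200us dynamic range quoted for the architecture.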



# Leading + TOT

- Packet type: 1 bit
- Channel ID: 4 bits; for single port readout, +2-bit group separator
- Leading: 16/19 bits
  - Large dynamic range
    - 16 bits at 3ps resolution: 200ns
    - 19 bits at 3ps resolution: 1600ns
  - Programmable part of full 25-bit leading TDC
  - (Must be relative to trigger to be usable)
- TOT (relative to leading): 11/8 bits
  - Short dynamic range
    - 8 bits at 3ps resolution: 780ps
    - 11 bits at 3ps resolution: 6.1ns
  - Programmable part of full 25-bit TOT difference
- TOT assumed to be used for offline time-walk correction of leading.
- Alternative: readout of individual leading and trailing edges with full range/resolution
  - 2x readout bandwidth

| Type (1) | Channel (4) | Leading (16) | TOT(11) |
|----------|-------------|--------------|---------|
| Type (1) | Channel (4) | Leading (19) | TOT(8)  |
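The dynamic range of each field is the number of codes times the bin size. The helper below (a hypothetical name) reproduces the figures on this slide to within rounding; note that the quoted values sit between the results for a nominal 3 ps bin and for the 781.25 ps / 256 bin, so both are shown.

```python
def dynamic_range_ps(bits, lsb_ps=3.0):
    """Range covered by an unsigned field of `bits` bits with bin `lsb_ps`."""
    return (1 << bits) * lsb_ps


EXACT_LSB_PS = 781.25 / 256   # assumed exact fine bin, about 3.05 ps

for name, bits in [("leading", 16), ("leading", 19), ("TOT", 8), ("TOT", 11)]:
    nominal = dynamic_range_ps(bits)
    exact = dynamic_range_ps(bits, EXACT_LSB_PS)
    print(f"{name:8s} {bits:2d} bits: {nominal:10.1f} ps (3 ps bin), "
          f"{exact:10.1f} ps (3.05 ps bin)")
```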



# **Estimated Power Consumption**

Highly dependent on hit rate; values based on 1 MHz per channel.

- High resolution, 64 channels: 1300mW
- High resolution, 32 channels: 900mW
- Low resolution, 64 channels: 850mW
- Low resolution, 32 channels: 550mW

