Computing and Informatics, Vol. 33, 2014, 1116-1138

# CURRENT SENSING COMPLETION DETECTION IN SINGLE-RAIL ASYNCHRONOUS SYSTEMS

Lukáš NAGY, Viera STOPJAKOVÁ, Juraj BRENKUŠ

Institute of Electronics and Photonics Slovak University of Technology Ilkovičova 3 812 19 Bratislava, Slovakia e-mail: {lukas.nagy, viera.stopjakova, juraj.brenkus}@stuba.sk

Abstract. In this article, an alternative approach to detecting the computation completion of combinatorial blocks in asynchronous digital systems is presented. The proposed methodology is based on a well-known phenomenon that occurs in digital systems fabricated in CMOS technology: such logic circuits exhibit significantly higher current consumption during signal transitions than in the idle state. The duration of these current peaks correlates very well with the actual computation time of the combinatorial block. Hence, this fact can be exploited to separate the computation activity from the static state. The paper presents the fundamental background of the addressed alternative completion detection and its implementation in *single-rail* encoded asynchronous systems, the proposed current sensing circuitry, the achieved simulation results, and a comparison to the state-of-the-art methods of completion detection. The presented method promises to enhance the performance of an asynchronous circuit, and under certain circumstances it also reduces the silicon area requirements of the completion detection block.

**Keywords:** Asynchronous systems, nano-scale CMOS technology, completion detection, current-consumption sensing

Mathematics Subject Classification 2010: 94-C05

### **1 INTRODUCTION**

The vast majority of digital systems are nowadays designed using the synchronous design methodology. However, despite its simplicity and straightforwardness, it faces a number of problems in terms of fabrication issues and methodology shortcomings. The most critical part of a synchronous system is its *clock tree*, used for distribution of the controlling clock signal across the whole chip area [1, 2]. With increasing scale of integration and system complexity, the clock tree becomes very complex as well. Furthermore, modern synchronous systems also feature methodologies for reducing power consumption (e.g. clock-gating), which also contribute to the clock tree related issues. In addition, emerging CMOS technologies are much more prone to process variations, which force design engineers to introduce a substantial timing margin into the developed systems [3, 4]. Synchronous circuits fabricated in advanced technologies also face several other problems; however, pointing out all those aspects is not the goal of this paper. Let us just mention the most common problems with clock trees, such as inflexibility in various environments, clock skew management, high-speed clock distribution issues, etc. [5]. There has also been a rising popularity of "hybrid" globally-asynchronous-locally-synchronous (GALS) systems [6]. This kind of system is meant to fill the gap in the transition from purely synchronous to fully asynchronous systems. In addition, such systems can be used in applications where a fully asynchronous circuit would become too complex. In that case, several parts of the overall system are designed as synchronous blocks inside the asynchronous environment [7].

Taking into account the mentioned drawbacks of the synchronous design methodology, one can state that the asynchronous approach represents a promising alternative to the currently dominating synchronous design techniques. Asynchronous systems exhibit numerous advantages over their synchronous counterparts. The best known ones are, for instance, higher performance, lower power consumption based on data-driven operation, lower EMI (Electromagnetic Interference) radiation, complete avoidance of the issues related to the clock tree and clock distribution, etc. [8]. On the other hand, the drawbacks include a larger silicon area overhead and therefore increased costs, widespread unfamiliarity, immature test methods and a lack of commercial EDA (Electronic Design Automation) tools [9] or academic kits [10, 11].

### **2 STATE-OF-THE-ART IN COMPLETION DETECTION**

In asynchronous circuits, *distributed control* is employed instead of global synchronization. The controlling part of the system follows a certain communication protocol called a handshake or handshaking protocol [12]. Several types of handshake protocols have been developed, each having particular advantages and drawbacks. In this paper, we focus on the most frequently applied one, namely the 4-phase single-rail protocol (sometimes called return-to-zero) [13]. This controlling protocol uses two main signals, REQ and ACK, denoting requested and acknowledged operations, respectively. Since there is no specifically defined timing window in which a data token has to be executed or kept still, generation of the handshake signals strongly depends on so-called *completion detection*, which indicates when the combinatorial part has finished executing the data token so that the result can be stored in a memory element.
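The order of events in one 4-phase (return-to-zero) transfer can be illustrated with a few lines of Python. This is only a behavioural sketch of the generic protocol; the function name `four_phase_cycle` and the trace format are assumptions introduced for illustration and do not come from the cited works.

```python
def four_phase_cycle():
    """Return the (REQ, ACK) states visited during one 4-phase transfer."""
    req, ack = 0, 0
    trace = [(req, ack)]      # idle: both signals low
    req = 1                   # 1. sender raises REQ: data is valid
    trace.append((req, ack))
    ack = 1                   # 2. receiver raises ACK: data has been captured
    trace.append((req, ack))
    req = 0                   # 3. sender returns REQ to zero
    trace.append((req, ack))
    ack = 0                   # 4. receiver returns ACK to zero: ready for the next token
    trace.append((req, ack))
    return trace


print(four_phase_cycle())
```

In a single-rail pipeline, the instant at which REQ may be raised for the next stage is exactly what the completion detection has to decide.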

Figure 1 depicts the state-of-the-art single-rail asynchronous pipeline along with the controlling part [12]. The blocks labelled C1 and C2 are purely combinatorial logic parts, and segments L1, L2 and L3 are the memory elements (latches). Finally, the gates labelled C are Muller C-elements [14]. The actual completion detection in single-rail asynchronous systems is represented by a delay element that postpones the REQ signal for the next peer by a specific time, which is related to the computation time of the combinatorial block. The value of the time delay is determined by a "worst-case" scenario approach. This means that the designer considers the worst fabrication process corner, the lowest power supply voltage and the highest ambient temperature (PVT corner), together with the data input vector that requires the longest computation time. Since such delay cells are commonly available in modern process design kits (PDKs), this approach is very simple to design and implement. On the other hand, it inserts a substantial timing margin into the system, which slows down the data execution and lowers the computation performance. This disadvantage can be mitigated by multiplexing several delay elements with various delay values in order to fine-tune the computation performance, or by introducing so-called "speculative" completion detection [12]. However, this step would significantly increase the area overhead and therefore also the unit cost. Several other completion detection methods have also been developed [15, 16]. Nevertheless, we would like to compare the proposed methodology to the most frequently employed completion detection method in static logic implementations.

The above-mentioned shortcomings of recent approaches have driven researchers to think of new possibilities for detection of completed computation activity. One of several alternative options is the so-called *current sensing completion detection* (CSCD), which represents a promising technique that eliminates some of the previously stated drawbacks of existing methods.



Figure 1. State-of-the-art completion detection in single-rail asynchronous pipeline system

### **3 CURRENT SENSING COMPLETION DETECTION**

The original article mentioning the idea of CSCD was published in 1990 [17]. It is based on a very well-known phenomenon that occurs in digital circuits fabricated in CMOS technologies. When the input data vector changes its logic values, the combinatorial block draws a significantly higher power supply current. The shape and duration of the transient current peaks correlate very well with the actual computation time of the combinatorial logic. Thus, with appropriately designed current sensing circuitry, one can distinguish between the idle state and the computing activity of the system.

Figure 2 depicts the general block diagram of the CSCD circuitry, whose operation is as follows. When the memory element is set to be transparent, the data is passed to the input of the combinatorial block. The combinatorial logic executes the data and thereby generates a power supply current peak, which is detected by the current sensor (CS). This block raises its output for the duration of the power supply current peak. The block named "minimum delay generator" (MDG) is responsible for firing a single pulse that compensates the input (sensing) delay of the current sensor and acts as the completion detection signal in case the current peak does not exceed the current threshold value (the sensor output is kept in the low logic state). The signals from the current sensor and the MDG block are synchronized by means of a NOR gate, and the control signal "Done" for the handshake block is generated by the controlling part of the asynchronous system.



Figure 2. General block diagram of CSCD implementation in single-rail pipeline system
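The interplay of the current sensor, the MDG pulse and the NOR gate can be sketched behaviourally. The following Python model is a simplified illustration under stated assumptions: the NOR output is treated directly as the completion indication, the sensor recovery delay is ignored, and the function name `completion_time` and all of its parameters are hypothetical.

```python
def completion_time(peak_current, peak_duration, threshold,
                    sensing_delay, mdg_width):
    """Time at which NOR(CS, MDG) goes high, measured from the start of computation.

    All arguments are hypothetical model parameters in consistent units.
    """
    mdg_end = mdg_width                      # MDG pulse is high on [0, mdg_width)
    if peak_current <= threshold:
        # Sensor output stays low, so the MDG pulse alone marks completion.
        return mdg_end
    cs_start = sensing_delay                 # sensor reacts after its sensing delay
    cs_end = sensing_delay + peak_duration   # and stays high while the peak lasts
    if cs_start > mdg_end:
        # Otherwise both NOR inputs would be low before the sensor has reacted.
        raise ValueError("MDG pulse must cover the sensing delay")
    return max(mdg_end, cs_end)


# Peak above threshold: completion follows the end of the current peak.
print(completion_time(2.0, 1000, 1.0, 200, 300))
# Peak below threshold: completion is given by the MDG pulse width.
print(completion_time(0.5, 1000, 1.0, 200, 300))
```

The `ValueError` branch reflects the requirement that the MDG pulse must be at least as long as the sensing delay, so that the NOR output cannot rise prematurely.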

The first practical concept of the current sensing circuitry was published a few years later in [18]. The article discusses the main features of the presented methodology employed in several data encodings as well as several current sensor circuit topologies. A significant portion of research has been done in [19, 20, 21, 22, 23, 24] and others. However, the sensing circuits published in the above-mentioned articles have been developed in rather old CMOS or BiCMOS technologies that work with a high supply voltage and/or employ bipolar transistors. The use of such devices represents an elegant solution for processing or amplifying the current peaks produced by the combinatorial block. However, these devices are no longer available in advanced CMOS technologies. Another important drawback of the published sensors is the use of resistors. Even though these devices are available in some nanoscale technologies, their implementation would require an unacceptable amount of silicon area, since the sheet resistance of the materials used is very low. The published sensors also do not reflect the low-power, low-voltage requirements of recent integrated digital systems. One of the main advantages of asynchronous circuits is their ability to adapt to voltage fluctuations. Therefore, the current sensor has to be able to withstand these fluctuations as well. However, the current sensor topologies published in the past do not allow operation at a low supply voltage (below $1.5\,\mathrm{V}$).

In [25], the authors present an interesting work regarding CSCD implementation in a low-power single-rail asynchronous system designed in 90 nm CMOS technology. However, the achieved results and the system performance were compared to a synchronous version of the same circuit; unfortunately, a comparison of the CSCD circuitry to the conventional completion detection has therefore not been discussed. Nevertheless, the design of the sensor in nanometer technology represents a comprehensive work. On the other hand, it is unnecessarily complex, and the sensing element (a MOS transistor) requires a large W/L (channel width/channel length) ratio in order to achieve a small voltage drop across it.

### 4 IMPLEMENTATION OF CSCD IN SINGLE-RAIL SYSTEMS

Figure 3 depicts the proposed implementation of the CSCD methodology in a single-rail asynchronous pipeline system. As one can observe, the fixed constant delay element has been replaced by circuitry generating the actual completion detection signal. The current sensor raises its output voltage to a high value while combinatorial block C1 is executing the input data. This signal is synchronized with the pulse issued by the MDG block by means of a NOR gate. The feedback taken from its output terminal, together with the controlling signal from the memory element, creates the enabling signal that turns the current sensor off when its operation is not required, and therefore minimizes the current consumption.

Figure 3. Implementation of CSCD methodology in single-rail pipeline

Figure 4 depicts the proposed design of the MDG block at the gate level. A chain of an even number of inverters delays the input signal on its way to the second input of the NOR gate, and thus the output of the NOR gate is flipped for a time specified by the propagation delay of the inverter chain. This propagation delay has to be set very carefully, since the input delay of the current sensor has to be properly compensated within the whole range of specifications. This means that the delay of the inverter chain has to cover the whole range of sensing delay values within the PVT variations in the design specifications.



Figure 4. Schematic diagram of MDG circuit

### **5 PROPOSED CURRENT SENSOR**

In this section, we would like to describe the most important part of the proposed methodology – the current sensing block. The first subsection deals with the design of sensing circuitry at the transistor level as well as the description of its operation. The second part of this section evaluates the main parameters of the current sensor, and proposes one of the possible layout representations of the sensor.

#### 5.1 Current Sensor Design

The crucial part of the whole methodology is the current sensor which is responsible for correct and reliable detection of power supply current peaks produced during computation activity. Previously mentioned current sensor topologies are not suitable and/or not applicable in deep sub-micron technologies for already explained reasons.

Figure 5 depicts the transistor-level schematic of the proposed sensor along with the dimensions of the active devices used. It has been designed in a 90 nm general-purpose CMOS technology. The most important part of the sensor is the so-called *bulk-driven* current mirror pair formed by devices M1 and M2 [26, 27]. The mirroring ratio has been set to 4:1, so that the layout of transistor M2 could be done with design-for-manufacturing techniques taken into account. The use of the bulk-driven current mirror brings the very important advantage of lowering the voltage drop across the sensing element. With the bulk terminal connected to the drain terminal, the inner diode is biased slightly forward, which significantly lowers the requirements on the input voltage. In our case, the input voltage is the voltage drop across the sensing element created by the current drawn by the combinatorial logic (CL).



Figure 5. Transistor level scheme of the proposed current sensor

Devices M2-M5 form the basic configuration of a current reference circuit. The static bias current flowing through its branches is modulated by the current drawn by the combinatorial logic, which causes the voltage levels to change over time as well. This transient voltage is then amplified by means of a push-pull inverter amplifier formed by devices M7 and M8. Two logic inverters formed by devices M9-M12 generate outputs that are inverted with respect to each other. Thus, both polarities of the logic signal are available, and each can be used where necessary in the global system. The output inverters also drive the load capacitance at a sufficient slew rate. Transistor M6 is employed as an ON-OFF switch controlled by a control logic signal. When the control signal is at logic zero, the current sensor works as described above. However, if the control signal is set to logic one, transistor M6 is switched off, and the current reference is therefore disabled. The output logic states are fixed by the pull-down transistor M3, and the quiescent current consumption is limited to the leakage level.

#### 5.2 Sensor Parameters

Since high-performance digital systems are designed in advanced CMOS technologies, we have developed the current sensor in 90 nm CMOS technology. Digital circuits are not as vulnerable to process variations as their analog counterparts. Since the current sensor is essentially an analog circuit, extra attention has to be paid to the stability of its parameters over process variations, ambient temperature conditions and the power supply voltage level. The most significant parameters of the sensor have been verified as follows. Figure 6 depicts the corner analysis of the proposed current sensor. The current peak produced by the combinatorial logic was modeled by a current pulse with a duration of 1 ns. The transient simulation has been performed for every available process corner as well as over the whole ambient temperature range.



Figure 6. Corner analysis of the proposed current sensor

As one can observe, the falling edge response is spread with the maximum span of 500 ps. The rising edge has spread as well within the maximum span of about 1 ns. These numbers also define the highest possible timing variations within the die area that should be taken into account in front-end routine in the defined design specifications.

After the overall stability evaluation, we have also performed Monte Carlo simulations in order to check the statistical distribution of the main timing parameters (recovery delay and sensing delay of the sensor). Figure 7 shows the results of the Monte Carlo analysis for 1000 simulations, performed taking into account the parasitic well-proximity effect and gate resistance as well. Both histograms exhibit a normal distribution, which can be considered a sign of the sensor functionality and stable parameters. The ratio of the standard deviation to the mean value for the recovery delay and the sensing delay is 6% and 14.9%, respectively. These values define the lowest possible postponement between two consecutive data token executions.
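The spread figures quoted above are ratios of the standard deviation to the mean (the coefficient of variation). A minimal Python sketch of this computation follows; the function name `relative_spread` and the sample values are made up for illustration and are not simulation data from this work.

```python
import statistics


def relative_spread(samples):
    """Ratio of the sample standard deviation to the mean, in percent."""
    mean = statistics.mean(samples)
    return 100.0 * statistics.stdev(samples) / mean


# Hypothetical delay samples in picoseconds (not taken from Figure 7).
delays_ps = [95.0, 100.0, 105.0, 98.0, 102.0]
print(round(relative_spread(delays_ps), 2))
```

In practice, the list would hold the delay values extracted from the individual Monte Carlo runs.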

Figure 7. Monte Carlo analysis of the timing parameters

Another important investigation of the current sensor was focused on its current threshold value. Thus, again 1000 Monte Carlo simulations have been performed considering parasitic effects, and the achieved results are depicted in Figure 8. The threshold current histogram exhibits a normal distribution and the standard deviation is less than 10%, which again confirms that the examined parameter would be stable enough within a production set of chips.

Figure 8. Monte Carlo analysis of the threshold value

As for low-voltage applications, the dependence of the current sensor performance on the supply voltage is an important feature to be evaluated. Since asynchronous systems exhibit excellent adaptability to supply voltage fluctuations, the current sensor has to be able to withstand the voltage level variations as well. Figure 9 shows the dependence of the most important parameters on the supply voltage value. One can observe that the threshold current value remains quite constant even for a power supply voltage of 600 mV. The input delay as well as the output delay of the sensor grow at an exponential rate as the supply voltage decreases. Let us mention that the sensor is capable of proper functioning even at a lower power supply voltage; however, the timing parameters would rise to unacceptable values. This drawback can be resolved by increasing the biasing current flowing through the current reference and the push-pull amplifier. The ability to work properly down to 50% of the nominal power supply voltage is a rather promising result.

An interesting fact has been discovered through analysis of timing parameters performed over the voltage drop across the sensing element. Figure 10 depicts results of this analysis, where a hyperbolic dependence of the sensing (input) delay can be observed. On the other hand, the curve displaying the recovery (output) delay shows a logarithmic shape, which was not expected at all. The point where both curves intersect indicates the best trade-off between timing parameters of the sensor and the voltage drop value, since its value is still acceptable and the delays are symmetrical.

#### 5.3 Layout of the Sensor

Figure 9. Low-power analysis of the proposed current sensor

Most digital systems are currently designed using the standard cell approach, where all logic gates composing the system have their layout representations available for layout designers in the respective PDK library. Bearing that in mind, we designed the current sensor layout as a standard cell of double height, due to the anti-latchup rules incorporated within the PDK. However, this step does not introduce any complications, since PDK environments offer special double-height cells for the purpose of switching the supply voltage level for a certain block. Figure 11 shows the layout implementation of the proposed current sensor. For the sake of clarity, the second and the third metalization levels are not shown. The layout has been created with design-for-manufacturing techniques taken into account. Those include interdigitating and placing dummy structures in the bulk-driven current mirror, putting all NMOS transistors in one line so that the well-proximity effect is equal for each instance, and others, since the intra-die and inter-die variations in deep sub-micron technologies can reach up to 20% [28].

Figure 10. Timing parameters over the voltage drop across the sensor

CAD (Computer Aided Design) tools used for the automated Place & Route routine require several library files available in the PDK, describing the properties of each cell. One of these library files is the so-called LEF file, which bears the physical information about the cell as well as the standard cell information. Because the current sensor is an analog circuit and routing placed close to the transistors could affect their properties, such a file was created describing the proposed current sensor standard cell along with the routing restrictions for the first four metal layers. Another important file is the LIB file, which contains the timing parameters, power consumption parameters, area requirements for the synthesis tool, etc. This file was created for every PVT corner within the PDK. The values of the required parameters were extracted from analog simulations performed during the design period. The layout design tools are capable of separating a certain (sub)block of the design and laying it out isolated, with its own power lines. This approach resembles the procedure used for multiple $V_{DD}$ levels in a design. The combinatorial part would be laid out this way, while the routing metalization levels of the power lines can be defined easily as well. The ground rails are common for the whole design and are connected on the first metal layer, the $V_{DD}$ for the current sensor is routed by means of the second metalization layer, and the virtual $V_{DD}$ of the combinatorial part (again on the first level of metalization) is isolated from the rest of the system.

### 6 ASYNCHRONOUS SYSTEM DESIGN USING CSCD

The implementation of the proposed CSCD methodology is meant to be simple and it should not affect the original top-down design flow. In the next subsection, we propose a design procedure including the automated insertion of the current sensor standard cell into the synthesized netlist. The second subsection proposes an approach to the test procedure for fabricated chips.



Figure 11. Layout of the current sensor

#### 6.1 Design Methodology with CSCD

Virtually every digital system today is designed using hardware description languages such as Verilog, VHDL or the currently very popular Balsa, which is directly oriented towards asynchronous designs [29]. This phase of the design process is closely tied to functional simulations, which are performed in order to verify the correct function and behavior of the system. Unfortunately, these simulations are not straightforwardly available when an analog circuit is embedded in the digital system. However, the designer should still be able to design an asynchronous system using the CSCD methodology. Since the computation time of the combinatorial logic block is determined by the synthesis tool, this information is available. Hence, the designer can prepare the testbench file with the signal from the current sensor modeled according to the information from the synthesis tools. This way, asynchronous circuits adopting the CSCD methodology can be successfully simulated in current digital simulators. After completion of all verifications, the synthesis for the specific technology is carried out. This step is maintained and does not require any modification to the conventional approach. However, the final netlist produced by a synthesis tool does not contain the definition of the proposed current sensor. In order to fix this problem, we have written a shell script that follows the flow chart depicted in Figure 12. As one can observe, it is rather simple to implement the script in any programming language, as it is essentially a text file (Verilog gate netlist) editing program.



Figure 12. Flowchart of script adopting CSCD

The user enters the name of the combinatorial block, then the script automatically renames the original module, creates a new module with the name of the original one, inserts the line with the current sensor cell, and maintains the internal connections. This step formally finishes the front-end part of the design, and designers can proceed to the back-end part.
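The netlist editing steps described above can be sketched in a few lines. The following is an illustrative Python version, not the authors' shell script. The sensor cell name `CSENSOR`, its pins `EN` and `OUT`, the added ports `cs_en` and `done`, and the `_core` suffix are all hypothetical, and a simple Verilog-1995 style module header is assumed.

```python
import re


def wrap_with_sensor(netlist, block, sensor_cell="CSENSOR"):
    """Rename module `block` to `<block>_core` and add a wrapper named `block`.

    Assumes a header of the form `module NAME (p1, p2, ...);` followed by
    `input`/`output` declarations, each terminated by a semicolon.
    """
    header = re.search(rf"module\s+{re.escape(block)}\s*\(([^)]*)\)\s*;", netlist)
    if header is None:
        raise ValueError(f"module {block!r} not found")
    ports = [p.strip() for p in header.group(1).split(",") if p.strip()]

    # Collect the port direction declarations of the original module.
    end = netlist.index("endmodule", header.end())
    body = netlist[header.end():end]
    decls = re.findall(r"^\s*(?:input|output|inout)\b[^;]*;", body, flags=re.M)

    # Step 1: rename the original module.
    core = block + "_core"
    renamed = (netlist[:header.start()]
               + f"module {core} ({', '.join(ports)});"
               + netlist[header.end():])

    # Step 2: create a new module with the original name that instantiates
    # the renamed block and the (hypothetical) current sensor cell.
    conns = ", ".join(f".{p}({p})" for p in ports)
    wrapper = "\n".join(
        [f"module {block} ({', '.join(ports)}, cs_en, done);"]
        + ["  " + d.strip() for d in decls]
        + ["  input cs_en;",
           "  output done;",
           f"  {core} u_core ({conns});",
           f"  {sensor_cell} u_sensor (.EN(cs_en), .OUT(done));",
           "endmodule"])
    return renamed + "\n" + wrapper + "\n"


src = ("module mult (a, b, y);\n  input a, b;\n  output y;\n"
       "  AND2 g0 (.A(a), .B(b), .Y(y));\nendmodule\n")
print(wrap_with_sensor(src, "mult"))
```

The supply connection between the sensor and the combinatorial block is a physical (power routing) matter and is therefore not represented in this logical netlist sketch.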

The back-end part is mostly represented by the Place & Route routine. The library files created for the CAD tools are imported into the environment, and the engineer can design the chip layout in the usual way. The only change to the conventional design procedure is that the power supply rail for each combinatorial block has to be isolated from the rest of the circuit, and the main $V_{DD}$ is connected by means of the second metal layer.

#### 6.2 Design-for-Test of the Current Sensor

The importance of test and testability of fabricated digital circuits and systems rises with their growing complexity, and methodologies for testing the asynchronous systems have been developed so far [30, 31, 32, 33]. In order to be able to switch an asynchronous system using CSCD circuitry into test mode, additional logic has to be added into the design. Figure 13 depicts the schematic diagram of one stage of the pipeline system incorporating a DfT technique proposed for CSCD-based systems. Multiplexer MX is controlled by a single control signal *Test* that switches its output to the normal function or a permanent logic zero. This keeps the current sensor enabled, and the combinatorial block can absorb the data. It is expected that this routine will occur during SCAN test. The output of the sensor is demultiplexed either to the system or to the "test result capturing" part, which could be read-out in order to evaluate the result of the current sensor test.



Figure 13. CSCD-based asynchronous system with proposed DfT technique

The proposed configuration introduces a slight increase of propagation delay due to the presence of demultiplexer DX at the current sensor output. On the other hand, it can reconstruct the deformed waveforms in case the current sensor generates such a signal.
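The roles of multiplexer MX and demultiplexer DX can be summarized as two small logic functions. The Python sketch below is a behavioural illustration only. It assumes an active-low sensor enable (as described for transistor M6) and that the unselected DX output rests at logic zero; the function names are hypothetical.

```python
def sensor_enable(test, normal_enable):
    """MX: pass the normal enable signal, or force logic zero in test mode."""
    return 0 if test else normal_enable


def route_sensor_output(test, cs_out):
    """DX: return the pair (to_system, to_test_capture)."""
    return (0, cs_out) if test else (cs_out, 0)


# In test mode the sensor is kept enabled (enable = 0) and its output is
# steered to the test result capturing part.
print(sensor_enable(1, 1), route_sensor_output(1, 1))
```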

### 7 ACHIEVED RESULTS AND CONTRIBUTIONS

The main motivation for research in this area of asynchronous circuit design was the replacement of the fixed worst-case delay by an actual completion detector in order to speed up the overall computation process. Figure 14 depicts the performance analysis of the proposed methodology. The very bottom line represents the duration of the pulse issued by the MDG block. The top line is the delay introduced by the worst-case delay element. The curves between these two lines depict the actual computation time detected by the proposed current sensor. The analysis has been carried out by means of analog transient simulation, and an 8-bit parallel multiplier was used as the combinatorial block in a two-stage pipeline. To bring the simulations closer to reality, the delay of the interconnections has been modeled by a simple RC element. The parameter values of the elements were determined by a random length of the interconnections, and the value of the sheet resistance was randomly chosen within the process corner specifications. This approach models the uneven distribution of the standard cells as well as the local fabrication process variations. The curves start roughly at 62% of all possible transitions of the input data signals (transition factor). Below this point, the actual completion detection is substituted by the pulse generated by the MDG circuit.



Figure 14. Performance analysis of the proposed methodology

As one can observe, the upper curve at its peak is still lower than the worst-case delay despite the slight slow-down due to the voltage drop across the sensor and the propagation delay of the additional logic gates. The actual computation time of the system is even slightly lower than the presented values. Nevertheless, the speed-up in the presented experiment is at least 14.52%, but in most cases it is even higher.
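The text does not state how the speed-up percentage is computed. A natural reading, assumed here, is the relative reduction of the detected completion time with respect to the worst-case delay, where $t_{\mathrm{wc}}$ and $t_{\mathrm{CSCD}}$ are symbols introduced only for this illustration:

$$
S = \frac{t_{\mathrm{wc}} - t_{\mathrm{CSCD}}}{t_{\mathrm{wc}}} \cdot 100\,\%
$$

Under this reading, the quoted minimum of 14.52% corresponds to the point where the upper curve in Figure 14 peaks, i.e. where $t_{\mathrm{CSCD}}$ comes closest to $t_{\mathrm{wc}}$.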

The theory of the addressed completion detection methodology promises an important attribute which actually increases the reliability of the controlling part of the system. The real combinatorial logic may contain design imperfections that can produce so-called *logic hazards*, which cause the logic states at the output of the combinatorial block to flip randomly during the data execution until the last one settles to the final state. During these random logic states the current draw contributes to the overall consumption. This phenomenon actually prevents the output of the current sensor from flipping its logic state. The paradox is that logic hazards actually prevent the detecting circuitry from a false completion detection. Figure 15 depicts results of analog transient simulation, where the curves show each bit signal at the output of the 8-bit parallel multiplier. One can observe the voltage drop footprint on logic one level, and the signals randomly change during the computation. The completion detection signal rises right after the last bit settles down, with a certain amount of propagation delay. The situation when the computation would consume a current not exceeding the threshold level, and at the same time, the logic is complex enough that CSCD is beneficial to be implemented, is rather unlikely.



Figure 15. Analysis of logic hazards

One of the most important parameters of integrated circuits is the chip die size. Therefore, the silicon area of hardware needed for completion detection is also very important. Figure 16 shows another contribution of the proposed CSCD method used in asynchronous systems. It shows the dependence of the silicon area of the completion detection block on the computation time. In conventional methods using the worst-case delay element, this dependence grows linearly (blue curve). The slope is determined by the silicon area occupied by a standard cell of the delay element. On the other hand, the constant line (red curve) displays the situation for CSCD methodology, where the amount of silicon area consumed by the current sensor, MDG block, AND and NOR gates is constant, regardless of the computation time or the system complexity. The point where these two curves intersect determines when it is beneficial to employ the proposed alternative approach to completion detection rather than the conventional one.



Figure 16. Silicon area vs computation time – comparison of methods

We also intend to write a program that would compare the silicon area required by the CSCD methodology and the actual worst-case delay element. Such information can be obtained from the area reports provided by synthesis tools. This way, the designer would be aware of the possibility of employing the alternative detection methodology and of the possible silicon area savings.
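A minimal sketch of such an area comparison is given below in Python. It only illustrates the break-even reasoning behind Figure 16 and is not the intended tool. The function names are hypothetical, and the model assumes that the worst-case delay is built from a chain of identical delay cells, so that its area grows in steps of one cell.

```python
import math


def delay_cells_needed(t_comp, cell_delay):
    """Number of identical delay cells needed to cover a computation time."""
    return math.ceil(t_comp / cell_delay)


def cscd_saves_area(t_comp, cell_delay, cell_area, cscd_area):
    """True when the constant CSCD area is below the delay-chain area."""
    return cscd_area < delay_cells_needed(t_comp, cell_delay) * cell_area


# Hypothetical numbers: 100 ps and 2 area units per delay cell,
# 15 area units for the sensor, MDG block and additional gates.
print(cscd_saves_area(1000, 100, 2.0, 15.0))  # longer computation time
print(cscd_saves_area(500, 100, 2.0, 15.0))   # shorter computation time
```

In a real flow, the cell delay and the area values would be read from the LIB files and from the area reports of the synthesis tool.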

### **8 FUTURE WORK**

The presented work is part of ongoing research. We intend to focus further work on implementing the developed CSCD circuitry and methodology in an experimental chip, and on verifying the simulation results by measurement of the prototyped chips. As mentioned in the previous section, the automated comparison of the area consumption for a system being designed will also be one of the future research goals. The most challenging task from the programming point of view is the development of an algorithm that would compute the mirroring ratio of the input bulk-driven current mirror, especially in the case of very complex combinatorial logic. In this case, some dimensions of the transistors will have to be adjusted in order to ensure the proper functionality of the sensor. Our idea is to use Sah's equations [34] and the process parameters of the chosen CMOS technology to calculate the device dimensions. On the other hand, it might be possible for several pre-designed cells with different threshold and maximum current peak levels to cover the general requirements of an arbitrary asynchronous design. Last but not least, future plans include the possibility of employing the current sensor for $I_{DDQ}$ testing purposes [35]. However, this would require additional analog circuitry to prevent the latchup effect in case the combinatorial block contains a catastrophic defect and the current draw grows to high values.

### **9 CONCLUSIONS**

It has been shown that the proposed completion detection methodology is fully implementable in asynchronous pipeline systems designed in very deep sub-micron CMOS technology while using standard CAD tools in the system design. The top-down design flow is preserved except for a single step, which has been fully automated by a shell script that inserts the respective analog standard cell into the Verilog gate-level netlist. One of the most important advantages of CSCD is its complete independence of the number of variables the system uses and of the complexity and computation time of the combinatorial block. The simulations have confirmed that it increases the performance of single-rail encoded asynchronous systems; however, the speed enhancement strongly depends on the topology of the combinatorial logic. Another important attribute is its insensitivity to logic hazards occurring during the computation activity of the combinatorial part. Under certain conditions, the addressed methodology also reduces the silicon area requirements of the completion detection circuitry. The decision whether these conditions are met will be fully automated as well. Analysis of the current sensor has shown that its topology functions properly under a lowered power supply voltage. This makes it suitable for low-power, low-voltage applications, which is also an important ability for asynchronous systems employing an alternative completion detection approach. A DfT strategy for post-fabrication testing of the current sensor circuitry has also been introduced.

We believe that this scientific work in the area of asynchronous systems represents a fair portion of research, which brought several valuable results and contributions.

#### Acknowledgement

This work was supported in part by the Ministry of Education, Science, Research and Sport of the Slovak Republic under grants VEGA 1/0823/13 and VEGA 1/1008/12, and by the EC under ENIAC-JU project E2SG (Agr. No. 296131).

#### REFERENCES

- [1] CHAPPELL, B.: The Fine Art of IC Design. IEEE Spectrum, Vol. 36, 1999, No. 7, pp. 30–34.
- [2] FRIEDMAN, E. G.: Clock Distribution Networks in Synchronous Digital Integrated Circuits. Proceedings of the IEEE, Vol. 89, 2001, No. 5, pp. 665–692.
- [3] SAPATNEKAR, S.: Overcoming Variations in Nanometer-Scale Technologies. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, Vol. 1, 2011, pp. 5–18.
- [4] PAPANIKOLAOU, A.—WANG, H.—MIRANDA, M.—CATTHOOR, F.: Reliability Issues in Deep Sub-Micron Technologies: Time-Dependent Variability and Its Impact on Embedded System Design. 13<sup>th</sup> IEEE International On-Line Testing Symposium, July 2007, pp. 121.
- [5] SINGH, M.—NOWICK, S. M.: The Design of High-Performance Dynamic Asynchronous Pipelines: Lookahead Style. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 15, 2007, No. 11, pp. 1256–1269.
- [6] CHONG, K. S.—CHANG, K. L.—GWEE, B. H.—CHANG, J. S.: Synchronous-Logic and Globally-Asynchronous-Locally-Synchronous (GALS) Acoustic Digital Signal Processors. IEEE Journal of Solid-State Circuits, 2012, pp. 769–780.
- [7] FAN, X.—KRSTIC, M.—GRASS, E.—SANDERS, B.—HEER, C.: Exploring Pausible Clocking Based GALS Design for 40-nm System Integration. Design, Automation Test in Europe Conference Exhibition DATE, 2012, pp. 1118–1121.
- [8] CORTADELLA, J.—KISHINEVSKY, M.—KONDRATYEV, A.—LAVAGNO, L.—YAKOVLEV, A.: Logic Synthesis of Asynchronous Controllers and Interfaces. Springer, 2002, ISBN 3-540-43152-7.
- [9] YAKOVLEV, A.—SPARSO, J.—THONNART, Y.—VIVET, P.: Asynchronous Logic and GALS Design: Principles and State-of-the-Art. Proceedings of the Conference on Design, Automation and Test in Europe (DATE '10), 2010.
- [10] MACKO, D.—JELEMENSKÁ, K.: VHDL Visualizer: HDL Model Visualization with Simulation-Based Verification. Proceedings of the IEEE International Symposium on Design and Diagnostics of Electronic Circuits and Systems, 2012, pp. 199–200.
- [11] JELEMENSKÁ, K.—ČIČÁK, P.—NOSÁĽ, M.: Visualization of Verilog Digital Systems Models. Emerging Trends in Computing, Informatics, Systems Sciences, and Engineering, Lecture Notes in Electrical Engineering, Vol. 151, 2013, pp. 805–818.
- [12] SPARSO, J.: Asynchronous Circuit Design. Technical University of Denmark, 2006.

- [13] BIRTWISTLE, G.—STEVENS, K. S.: The Family of 4-Phase Latch Protocols: Asynchronous Circuits and Systems. ASYNC '08, 14<sup>th</sup> IEEE International Symposium on Asynchronous Circuits and Systems, 2008, pp. 71–82.
- [14] RAJI, M.—GHAVAMI, B.—ZARANDI, H.-R.—PEDRAM, H.: Assessment of Nanoscale Muller C-Elements Under Variability Based on a New Fault Model. 16<sup>th</sup> CSI International Symposium on Computer Architecture and Digital Systems (CADS), 2012, pp. 121–126.
- [15] CONNELL, C. L.—BALSARA, P. T.: A Novel Single-Rail Variable Encoded Completion Detection Scheme for Self-Timed Circuit Design Using Ternary Multiple Valued Logic. Proceedings of the IEEE 2<sup>nd</sup> Low Power/Low Voltage Mixed-Signal Circuits and Systems, 2001, pp. 7–10.
- [16] BARTLETT, V. A.—GRASS, E.: Comparison of Completion-Detection Techniques for Asynchronous Circuits. University of Westminster, 1996.
- [17] IZOSIMOV, A.—TSYLYOV, V. V.: Physical Approach to CMOS Modules Self-Timing. Electronics Letters, 1990.
- [18] DEAN, M. E.—DILL, D. L.—HOROWITZ, M.: Self-Timed Logic Using Current-Sensing Completion Detection (CSCD). Journal of VLSI Signal Processing Systems, Vol. 7, 1994, No. 1-2, pp. 7–16.
- [19] LAMPINEN, H.—VAINIO, O.: Current-Sensing Completion Detection Method for Standard Cell Based Digital System Design. Proceedings of the IEEE International Symposium on Circuits and Systems, 2002, pp. 117–120.
- [20] LAMPINEN, H.—VAINIO, O.: Dynamically Biased Current Sensor for Current-Sensing Completion Detection. Electronics Letters, Vol. 37, 2001, No. 7, pp. 408–409.
- [21] LAMPINEN, H.—PERALA, P.—VAINIO, O.: Design of a Self-Timed Asynchronous Parallel FIR Filter Using CSCD. Proceedings of the 2003 International Symposium on Circuits and Systems, Vol. 5, May 2003, pp. 165–168.
- [22] LAMPINEN, H.—PERALA, P.—VAINIO, O.: Implementation of a Self-Timed Asynchronous Parallel FIR Filter Using CSCD. Norchip Conference, November 2004, pp. 203–206.
- [23] GRASS, E.—BARTLETT, V.—KALE, I.: Completion-Detection Techniques for Asynchronous Circuits. Transactions on Information and Systems, 1997, pp. 344–350.
- [24] GRASS, E.—JONES, S.: Improved Current-Sensing Completion Detection (CSCD) Circuits. Handouts of the ACiD-WG Workshop on Design for Testability, 1994.
- [25] KAWOKGY, M.—SALAMA, C.: A Low-Power CSCD Asynchronous Viterbi Decoder for Wireless Applications. Low Power Electronics and Design (ISLPED), August 2007, pp. 363–366.
- [26] BLALOCK, B.—ALLEN, P.: A Low-Voltage, Bulk-Driven MOSFET Current Mirror for CMOS Technology. Proceedings of IEEE International Symposium on Circuits and Systems, Vol. 3, 1995, pp. 1972–1975.
- [27] AHLAD, K.—SHARMA, G. K.: Bulk Driven Circuits for Low Voltage Applications. Journal of Active and Passive Electronic Devices, Vol. 8, 2009, pp. 237–245.
- [28] MASUDA, M.—OHKAWA, S.—KUROKAWA, A.—AOKI, M.: Challenge: Variability Characterization and Modeling for 65- to 90-nm Processes. Proceedings of IEEE Custom Integrated Circuits Conference (CICC), 2005, pp. 593–599.

- [29] Manchester University: The Balsa Asynchronous Synthesis System. 2010, http://apt.cs.manchester.ac.uk/projects/tools/balsa/.
- [30] PETLIN, O.—FURBER, S.: Built-in Self-Testing of Micropipelines. Advanced Research in Asynchronous Circuits and Systems, April 2007, pp. 22–29.
- [31] SHI, F.—MAKRIS, Y.—NOWICK, S.—SINGH, M.: Test Generation for Ultra-High-Speed Asynchronous Pipelines. International Test Conference, November 2005, pp. 10–18.
- [32] DOBAI, R.—GRAMATOVA, E.: Deductive Fault Simulation Technique for Asynchronous Circuits. Computing and Informatics, Vol. 29, 2010, No. 6, pp. 1025–1043.
- [33] DOBAI, R.—BALAZ, M.: Compressed Skewed-Load Delay Test Generation Based on Evolution and Deterministic Initialization of Populations. Computing and Informatics, Vol. 32, 2013, No. 2, pp. 251–272.
- [34] CAI, J.—SAH, C.-T.: Theory of Thermally Stimulated Charges in Metal-Oxide-Semiconductor Gate Oxide. Journal of Applied Physics, Vol. 83, 1998, pp. 851–857.
- [35] RAJSUMAN, R.: IDDQ Testing for CMOS VLSI. Artech House Publishers, October 1994, ISBN 0-89006-726-0.



Lukáš NAGY received his M.Sc. and Ph.D. degrees at Slovak University of Technology in 2008 and 2012, respectively. He currently works as a researcher at the Institute of Electronics and Photonics at the same university. His main research interests include device modeling, analog and mixed-signal IC design, SoC design and asynchronous systems design.



Viera STOPJAKOVÁ received the M. Sc. and the Ph. D. degrees in electronics from Slovak University of Technology in Bratislava, Slovakia, in 1992 and 1997, respectively. Currently, she is a full Professor at the Institute of Electronics and Photonics of the same university. She has been involved in several EU funded research projects under different funding schemes such as TEMPUS, ESPRIT, Copernicus, FPs, ENIAC-JU, and also in national research grants. She has published over 100 papers in various journals and conference proceedings, and she is a co-inventor of two US patents in the field of on-chip supply current testing. Her main research interests include IC design, VLSI & SoC testing, on-chip testing, design and test of mixed-signal circuits, smart sensors, on-chip energy harvesting, biomedical monitoring and neural network implementation.



Juraj BRENKUŠ received the M. Sc. degree in electronics from Slovak University of Technology in Bratislava, Slovakia, in 2000. Since March 2010 he has been a researcher at the Institute of Electronics and Photonics of Slovak University of Technology. He is the author or co-author of more than 20 papers presented at international conferences. His main research interests are IC design and test, mixed-signal systems design, SoC design, on-chip testing and test development automation.