Publications

The push to meet growing user requirements and manufacturing challenges at lower technology nodes has motivated chip designers to adopt non-traditional design techniques. 2.5D/3DIC stacking has gained popularity in recent years since it enables chip manufacturers to integrate complex IPs to meet user demands without incurring design penalties. However, the non-traditional nature of the supply chain also means that additional challenges exist for verification and testing of the manufactured design, making the trust assurance of these designs an extremely challenging proposition. While there have been works focussing on securing 3DIC designs, very few address a completely untrusted supply chain. A robust security countermeasure must address the diverse trust requirements of the IPs in the design and the distributed supply chain requirements while ensuring that the functionality and performance overheads of the IC are not violated. We present TREEHOUSE, a trust assurance solution to counter piracy, reverse-engineering, and counterfeiting attacks. TREEHOUSE uses scan authentication to detect piracy and counterfeiting, and scan- and functional-locking to prevent reverse-engineering. We evaluate the efficiency of our proposed scheme on an example 3DIC design. We show that TREEHOUSE incurs less than 1% area and power overheads while incurring less than 1% increase in overall gate count for each layer.

Hardware watermarking is one of the popular countermeasures to prevent hardware counterfeiting. A robust watermark has to be invisible to the attacker, yet allow the verifier to access it easily. It should also be resistant to design transformations and should allow the defender to deploy it at any stage in the design transformation process. To address this, we propose SIGNED, a lightweight watermarking technique that relies on challenge-response generation. We evaluate SIGNED on ISCAS85, ITC99, and MIT-CEP benchmarks to show that SIGNED incurs lower overheads and is resistant to removal and detection attacks.

The heterogeneous array of edge devices in an Internet of Things (IoT) infrastructure is increasingly vulnerable to physical in-field tampering attacks. These devices can significantly benefit from a difficult-to-clone and tamper-immune intrinsic identifier that can verify the authenticity or integrity of the physical components. In this article, we develop an intrinsic device identifier, RIHANN, that captures the state of the electronic hardware in an IoT device. This state can adequately reflect any physical tampering of the hardware components by transforming the intrinsic delay variations in the electronic components of an edge device into unique and robust signatures. Our proposed authentication approach utilizes the boundary scan architecture (BSA) in printed circuit boards (PCBs). BSA is a prevalent design for test (DFT) structure used in most PCBs in IoT edge devices. This technique supports an extensive array of heterogeneous devices and can seamlessly operate during the device’s runtime. We measure the boundary scan path delays using the parallel scan delay-measurement (PSDM) technique for commercially available ICs. We perform practical experiments on 20 devices, generate signatures, and evaluate their uniqueness, robustness, randomness, and resistance to aging. We also introduce a security protocol for the cloud server, owner/verifier, or other IoT devices connected to a network to verify their identity remotely. The policy prevents attacks from extracting the device’s secret keys using an efficient moving target defense mechanism that periodically updates and evolves the challenge–response database.

Cryptographic hardware is highly vulnerable to a class of side-channel attacks known as Differential Fault Analysis (DFA). These attacks exploit fault-induced errors to compromise secret keys from ciphers within a few seconds. A bias in the error probabilities strengthens the attack considerably. It aids in bypassing countermeasures and is also the basis of powerful attack variants like the Differential Fault Intensity Analysis (DFIA) and Statistical Ineffective Fault Analysis (SIFA). In this paper, we make two significant contributions. First, we identify the correlation between fault-induced errors and gate-level parameters like the threshold voltage, gate size, and $V_{\text{DD}}$. We show how these parameters can influence the bias in the error probabilities. Then, we propose an algorithm, called Avatar, that carefully tunes gate-level parameters to strengthen the redundancy countermeasures against DFA, DFIA, and SIFA attacks with no additional logic needed. The central idea of Avatar is to reconfigure gates in the redundant circuits so that each circuit has a unique behavior to faults, making fault detection much more efficient. In AES, for instance, fault attack resistance improves by 40% for DFA and DFIA, and 99% in the case of SIFA. Avatar incurs negligible area overheads and can be quickly adopted in any cipher design. It can be incorporated in commercial EDA flows and provides users with tunable knobs to trade off performance and power consumption for fault attack security.
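
One simplified way to see why giving each redundant circuit a distinct fault behavior helps is sketched below. This is an illustrative probabilistic model, not the Avatar algorithm: it assumes a comparison-based redundancy check that misses a fault only when both copies produce the same error pattern, and it assumes the two copies err independently. The function name and the example distributions are hypothetical.

```python
from typing import Sequence


def undetected_probability(p: Sequence[float], q: Sequence[float]) -> float:
    """Probability that two redundant copies produce the same error pattern.

    p[i] and q[i] are the assumed probabilities that a fault causes error
    pattern i in copy 1 and copy 2 respectively. Under the independence
    assumption, a compare-and-reject check misses the fault with
    probability sum_i p[i] * q[i].
    """
    if len(p) != len(q):
        raise ValueError("both copies must use the same set of error patterns")
    return sum(pi * qi for pi, qi in zip(p, q))


# Hypothetical error distributions over four error patterns.
uniform = [0.25, 0.25, 0.25, 0.25]           # no bias
biased = [0.70, 0.10, 0.10, 0.10]            # bias toward pattern 0
biased_elsewhere = [0.10, 0.70, 0.10, 0.10]  # bias toward pattern 1

identical_unbiased = undetected_probability(uniform, uniform)
identical_biased = undetected_probability(biased, biased)
diversified = undetected_probability(biased, biased_elsewhere)

print(identical_unbiased, identical_biased, diversified)
```

In this toy model, bias raises the chance of identical errors when both copies share the same distribution, while shifting the bias of one copy lowers it again, which mirrors the intuition stated in the abstract.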

HTTP/2 introduced multi-threaded server operation for performance improvement over HTTP/1.1. Recent works have discovered that multi-threaded operation results in multiplexed object transmission, which can also have an unanticipated positive effect on TLS/SSL privacy. In fact, these works go on to design privacy schemes that rely heavily on multiplexing to obfuscate the sizes of the objects based on which the attackers inferred sensitive information. Orthogonal to these works, we examine if the privacy offered by such schemes works in practice. In this work, we show that it is possible for a network adversary with modest capabilities to completely break the privacy offered by the schemes that leverage HTTP/2 multiplexing. Our adversary works based on the following intuition: restricting only one HTTP/2 object to be in the server queue at any point of time will eliminate multiplexing of that object and any privacy benefit thereof. In our scheme, we begin by studying if (1) packet delays, (2) network jitter, (3) bandwidth limitation, and (4) targeted packet drops have an impact on the number of HTTP/2 objects processed by the server at an instant of time. Based on these insights, we design our adversary that forces the server to serialize object transmissions, thereby completing the attack. Our adversary was able to break the privacy of a real-world HTTP/2 website 90% of the time, the code for which will be released. To the best of our knowledge, this is the first privacy attack on HTTP/2.

The challenges of custom integrated circuits (IC) design have made it prevalent to integrate commercial-off-the-shelf (COTS) components (micro-controllers, FPGAs, etc.) in today’s designs. While this approach eases the design challenges and improves productivity, it also gives rise to diverse security concerns. One such concern is the possibility of malicious hardware modifications, also called hardware Trojan attacks, by untrusted parties involved in the manufacturing or distribution of COTS devices. While hardware Trojan detection is an active research topic in the field of microelectronics security, most methods assume the availability of a golden design/chip, which is impractical in the case of a COTS device. In this paper, we discuss challenges with detecting Trojans in COTS components, and introduce a Trojan detection method that applies unsupervised learning. We utilize side-channel power signatures to cluster and isolate chips with Trojans. The proposed method is suitable for trust verification of COTS components by an original equipment manufacturer (OEM) before system integration. In our method, the design house creates a set of security validation test vectors available to the tester (e.g., OEM). The OEM can also generate the test vectors using the block-level diagrams provided by the design house. Power signatures are generated for all the chips under test using these test vectors. We use the generated power signatures to apply feature extraction followed by clustering to group the chips into bins. Through this process, we divide the chips into distinct bins and distinguish the Trojan-inserted chips from the Trojan-free ones. The bin with golden chips can be identified by extensive testing and reverse engineering of one chip sampled from each bin. We utilize two clustering techniques, K-Means and Expectation-Maximization (EM), to perform a comparative analysis. Additionally, we perform extensive experiments to assert our method’s effectiveness and obtain over 98% accuracy on the clustering of FPGA chips with both combinational and sequential Trojans.
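
The binning step described above can be sketched with a minimal K-Means on a single feature per chip. This is an illustration under stated assumptions, not the paper's implementation: the feature values are synthetic, the one-dimensional feature stands in for whatever the feature-extraction stage produces, and `kmeans_1d` is a hypothetical helper written for this example.

```python
from typing import List, Sequence, Tuple


def kmeans_1d(values: Sequence[float], k: int = 2,
              iters: int = 20) -> Tuple[List[int], List[float]]:
    """Cluster scalar features into k bins with plain K-Means (k >= 2).

    Centroids start evenly spaced between the minimum and maximum value,
    which keeps the sketch deterministic.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each chip goes to the nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c]))
                  for v in values]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels, centroids


# Synthetic per-chip power features (arbitrary units, made up for this sketch).
power_features = [10.1, 9.9, 10.0, 10.2, 12.4, 9.8, 12.6]
labels, centroids = kmeans_1d(power_features, k=2)
print(labels, centroids)
```

Clustering alone only separates the chips into bins; as the abstract notes, deciding which bin is Trojan-free still requires testing and reverse engineering one sampled chip per bin.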

Counterfeit integrated circuits (ICs) have become a significant security concern in the semiconductor industry as a result of the increasingly complex and distributed nature of the supply chain. These counterfeit chips may result in performance degradation, profit reduction, and reputation risk for the manufacturer. Therefore, developing effective countermeasures against such malpractices is becoming increasingly crucial. Physical unclonable function (PUF)-based authentication methods have the potential to mitigate these challenges. However, PUF-based solutions are restrained by several factors, such as additional design effort and significant area/power overhead, the difficulty of maintaining and updating the challenge-response pair (CRP) database, and vulnerability to machine learning (ML) attacks. In this article, we address these challenges by developing a novel database-free and enrolment-free hardware authentication approach, i.e., a digital watermark metric for ICs. To enable efficient database-free hardware integrity verification without enrolment, first, we transform the intrinsic variations in circuit parameters, e.g., boundary scan chain (BSC) path delays in the joint test action group (JTAG) chain, into robust digital signatures. Then, we perform statistical analysis on a small pilot unit of authentic chips to create a robust watermark for a complete batch of chips, which jointly captures the characteristics of the physical layout, the manufacturing process, and the foundry. The increasing complexity in the current state-of-the-art designs makes it extremely hard for an adversary to perfectly clone such statistical characterization of circuit parameters using counterfeit or compromised hardware. Besides, the proposed approach requires no additional design or hardware overhead in IC design since it utilizes an embedded structure, which inherently exists within the chips. It also relieves the design house of characterizing each manufactured chip instance, reducing overall testing cost. A path-delay measurement method at a high resolution based on clock phase sweep is introduced to measure the delay values effectively. The proposed intrinsic identifier-based authentication approach is validated by performing emulation on FPGAs and also by conducting physical measurements on custom-made printed circuit boards (PCBs). The reliability of the generated watermarks is evaluated with environmental temperature fluctuations and the aging effect.

The emergence of distributed manufacturing ecosystems for electronic hardware involving untrusted parties has given rise to diverse trust issues. In particular, IP piracy, overproduction, and hardware Trojan attacks pose significant threats to digital design manufacturers. Watermarking has been one of the solutions employed by the semiconductor industry to overcome many of the trust issues. However, current watermarking techniques have low coverage, incur hardware overheads, and are vulnerable to removal or tampering attacks. Additionally, these watermarks cannot detect Trojan implantation attacks where an adversary alters a design for malicious purposes. We address these issues in our framework called SIGNED: Secure Lightweight Watermarking Scheme for Digital Designs. SIGNED relies on a challenge-response protocol based interrogation scheme for generating the watermark. SIGNED identifies sensitive regions in the target netlist and samples them to form a compact signature that is representative of the functional and structural characteristics of a design. We show that this signature can be used to simultaneously verify, in a robust manner, the provenance of a design, as well as any malicious alterations to it at any stage during the design process.

Physical Unclonable Functions (PUFs) are used for securing electronic designs across the implementation spectrum ranging from lightweight FPGA to server-class ASIC designs. However, current PUF implementations are vulnerable to model-building attacks; they often incur significant design overheads and are challenging to configure based on application-specific requirements. These factors limit their application, primarily in the case of the system on chip (SoC) designs used in diverse applications. In this work, we propose MeL-PUF - Memory-in-Logic PUF, a low-overhead, distributed, and synthesizable PUF that takes advantage of existing logic gates in a design and transforms them to create cross-coupled inverters (i.e., memory cells) controlled by a PUF control signal. The power-up states of these memory cells are used as the source of entropy in the proposed PUF architecture. These on-demand memory cells can be distributed across the combinational logic of various intellectual property (IP) blocks in a system on chip (SoC) design. They can also be synthesized with a standard logic synthesis tool to meet the area, power, or performance constraints of a design. By aggregating the power-up states from multiple such memory cells, we can create a PUF signature or digital fingerprint of varying size. We evaluate the MeL-PUF signature quality with both circuit-level simulations as well as with measurements in FPGA devices. We show that MeL-PUF provides high-quality signatures in terms of uniqueness, randomness, and robustness, without incurring large overheads. We also suggest additional optimizations that can be leveraged to improve the performance of MeL-PUF.
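
Signature quality in terms of uniqueness, randomness, and robustness is commonly quantified with Hamming-distance statistics. The sketch below shows one conventional way to compute such metrics; it is not taken from the paper, the bit strings are made up, and the function names are hypothetical. Uniformity (the fraction of 1 bits) is used here as a simple proxy for randomness.

```python
from itertools import combinations
from typing import Sequence


def hamming(a: str, b: str) -> int:
    """Number of bit positions in which two equal-length signatures differ."""
    return sum(x != y for x, y in zip(a, b))


def uniqueness(signatures: Sequence[str]) -> float:
    """Mean pairwise inter-device Hamming distance as a fraction (ideal ~0.5)."""
    pairs = list(combinations(signatures, 2))
    n_bits = len(signatures[0])
    return sum(hamming(a, b) for a, b in pairs) / (len(pairs) * n_bits)


def robustness(reference: str, remeasured: Sequence[str]) -> float:
    """One minus the mean intra-device bit error rate (ideal 1.0)."""
    n_bits = len(reference)
    errors = sum(hamming(reference, r) for r in remeasured)
    return 1 - errors / (len(remeasured) * n_bits)


def uniformity(signature: str) -> float:
    """Fraction of 1 bits in a signature (ideal ~0.5)."""
    return signature.count("1") / len(signature)


# Made-up 8-bit signatures from three devices, plus two re-reads of device 0.
devices = ["10110010", "01100111", "11010001"]
rereads_of_device0 = ["10110010", "10110011"]

print(uniqueness(devices))
print(robustness(devices[0], rereads_of_device0))
print(uniformity(devices[0]))
```

Real evaluations use far longer signatures and many more devices and measurements; the short strings here only keep the arithmetic easy to follow.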

The semiconductor industry is constantly striving to improve the performance, reliability, and cost of electronic devices. The growing complexity in the design process of microelectronics coupled with the requirement of significant investment in research and development means that there is hardly any entity in the industry that is capable of acquiring the state-of-the-art technologies for all facets of the development process across myriad niche device technologies. Therefore, for economic and practical reasons, the modern electronic supply chain relies on several different vendors that specialize in a specific area of the design and fabrication process. From a security perspective, this distributed manufacturing process violates the trust of the underlying hardware as any entity in the supply chain could maliciously modify the design. This poses a significant concern, especially for government, military applications, and consumer electronic products handling private and critical data during the acquisition of untrusted microelectronic designs and components. Hence, trust has emerged as a crucial constraint that the various steps in the microelectronic manufacturing process should consider in order to ensure that no malicious functionality exists in the hardware. In the last decade, several works have proposed steps both to establish and verify trust in microelectronics. However, not all threat models are adequately covered, and the solutions are pertinent to a limited category of devices. In this article, we present the challenges in establishing trust in today’s distributed supply chain environment by discussing the attack models at each step of the manufacturing process. We also shed light on the existing solutions that try to address these threats and discuss their limitations. Finally, we elaborate on one of the existing supply chain standards where trust verification is still infeasible and identify avenues for future research.

Cache timing attacks are a serious threat to the security of computing systems. They permit sensitive information, such as cryptographic keys, to leak across virtual machines and even to remote servers. Encrypted Address Cache, proposed by CEASER - a best paper candidate at MICRO 2018 - is a promising countermeasure that stymies the timing channel by employing cryptography to randomize the cache address space. The author claims strong security guarantees by providing randomization both spatially (randomizing every address) and temporally (changing the encryption key periodically). In this letter, we point out a serious flaw in their encryption approach that undermines the proposed security guarantees. Specifically, we show that the proposed Low-Latency Block Cipher, used for encryption in CEASER, is composed of only linear functions, which neutralizes the spatial and temporal randomization. Thus, we show that the complexity of a cache timing attack remains unaltered even with the presence of CEASER. Further, we compare the encryption overheads if CEASER is implemented with a stronger encryption algorithm.
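
The weakness of a cipher built only from linear functions can be illustrated with a toy example. The cipher below is a hypothetical stand-in that uses only XOR and bit rotations on 16-bit words; it is not CEASER's Low-Latency Block Cipher, and the block width, round count, and key are assumptions chosen for illustration. Because every step is linear over GF(2), the ciphertext of `a ^ b ^ c` equals the XOR of the three individual ciphertexts, so it can be predicted without knowing the key.

```python
MASK = 0xFFFF  # toy 16-bit block width (an assumption for this sketch)


def rotl(x: int, r: int) -> int:
    """Rotate a 16-bit word left by r bits; rotation is linear over GF(2)."""
    return ((x << r) | (x >> (16 - r))) & MASK


def toy_linear_cipher(x: int, key: int, rounds: int = 4) -> int:
    """Hypothetical cipher with key XOR and linear diffusion only (no S-box)."""
    for i in range(rounds):
        x ^= rotl(key, i)                 # key mixing by XOR
        x = x ^ rotl(x, 3) ^ rotl(x, 7)   # linear diffusion layer
    return x


def predict_without_key(enc_a: int, enc_b: int, enc_c: int) -> int:
    """Predict E(a ^ b ^ c) from E(a), E(b), E(c) using affine structure."""
    return enc_a ^ enc_b ^ enc_c


secret_key = 0xBEEF  # known only to the "defender" in this toy setting
a, b, c = 0x1234, 0x0F0F, 0xA5A5

predicted = predict_without_key(
    toy_linear_cipher(a, secret_key),
    toy_linear_cipher(b, secret_key),
    toy_linear_cipher(c, secret_key),
)
actual = toy_linear_cipher(a ^ b ^ c, secret_key)
print(predicted == actual)
```

The relation holds for any key, since the key contributes only a constant XOR offset that cancels across an odd number of ciphertexts. A nonlinear component such as an S-box would break this relation.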

Fault attacks are potent physical attacks on crypto-devices. A single fault injected during encryption can reveal the cipher’s secret key. In a hardware realization of an encryption algorithm, only a tiny fraction of the gates is exploitable by such an attack. Finding these vulnerable gates has been a manual and tedious task requiring considerable expertise. In this paper, we propose SOLOMON, the first automatic fault attack vulnerability detection framework for hardware designs. Given a cipher implementation, either at RTL or gate-level, SOLOMON uses formal methods to map vulnerable regions in the cipher algorithm to specific locations in the hardware, thus enabling targeted countermeasures to be deployed with much lower overheads. We demonstrate the efficacy of the SOLOMON framework using three ciphers: AES, CLEFIA, and Simon.

Power side-channel attacks pose a serious threat to the security of embedded devices. Most available countermeasures have significant overheads resulting in the application not meeting its requirements of low-power, high-performance and small area. We propose an algorithm called Karna that can be incorporated in the Electronic Design Automation (EDA) flow, in order to significantly improve the side-channel security of the device, without impacting the other device characteristics. (Karna, much like Achilles from Greek mythology, was born with a shield that protected him from attacks. Similarly, our proposed scheme, Karna, protects the design from power side-channel attacks in the manufacturing phase; in other words, the chip is manufactured (born) with a shield.) Karna does not add additional logic but rather achieves this by first identifying vulnerable gates in the design and then reconfiguring these gates to increase side-channel resistance. Unlike contemporary works, Karna does not require any specialized gate library but uses the gates available in the standard cell library. We integrate Karna into the Synopsys Design Compiler and demonstrate its efficacy at reducing side-channel leakage in implementations of AES, PRESENT and Simon block ciphers, synthesized for a 28nm technology node. An interesting observation is that Karna only uses the available space around the gates to perform this optimization and does not incur any additional area overheads. We showcase the side-channel resistance of these optimized designs using a Differential Power Analysis attack. Our proposed approach is able to reduce the power side-channel of the designs while incurring no penalty in delay, power and gate-count.

Side-channel attacks pose a serious threat to the security of embedded devices. Most available countermeasures have significant overheads and very often do not meet an application’s requirement of low-power, high-performance, and small-area. In this paper we propose an algorithm called Karna that can be incorporated in the Electronic Design Automation (EDA) flow, in order to significantly improve the side-channel security of a device, without compromising on the other device characteristics of power, performance, and area. Karna achieves this by first identifying vulnerable gates in the design and then reconfiguring these gates to increase side-channel resistance. Unlike contemporary works, Karna does not require any specialized gate library but uses the gates available in the standard cell library. We integrate Karna into the Synopsys Design Compiler and demonstrate its efficacy at reducing side-channel leakage in implementations of AES, PRESENT and Simon block ciphers, synthesized for a 28nm technology node. We show that our proposed approach is able to reduce the power side-channel of the designs while incurring no penalty in delay, power and gate-count. Our proposed approach incurs a 20% penalty in the area utilization while the total area of the design remains constant.

Privacy leaks from Netflix videos/movies are well researched. Current state-of-the-art works have been able to obtain coarse-grained information such as the genre and the title of videos by passive observation of encrypted traffic. However, leakage of fine-grained information from encrypted traffic has not been studied so far. Such information can be used to build behavioural profiles of viewers. On 28th December 2018, Netflix released the first mainstream interactive movie called ‘Black Mirror: Bandersnatch’. In this work, we use this movie as a case-study to show for the first time that fine-grained information (i.e., choices made by users) can be revealed from encrypted traffic. We use the state information exchanged between the viewer’s browser and Netflix as the side-channel. To evaluate our proposed technique, we built the first interactive video traffic dataset of 100 viewers, which we will be releasing. Preliminary results indicate that the choices made by a user can be revealed 96% of the time in the worst case.

The timing-constrained discrete sizing technique (TC-DSP) is employed at all stages of the physical synthesis flow and has been studied extensively over the last 30 years. The ISPD gate sizing contests introduced industry-standard benchmarks and library, which motivated a lot of research in this area. However, most of the solutions employed were either sensitivity-driven or based on analytical methods that required incremental timing analysis after every iteration, with both consuming a significant amount of time to perform the optimization. The key observations reported in this paper are i) there exists a good correlation between the slack distribution among gates in a given iteration and the order of gate replacements in subsequent iterations; and, ii) across the benchmark circuits there exists significant overlap in the number of sub-circuits that have similar structures. This paper exploits the above observations to propose MLTimer, an iterative algorithm that uses adaptive lazy timing analysis in conjunction with a Support Vector Machine (SVM) engine for solving the TC-DSP quickly and efficiently. We observe that for large benchmark circuits (≥ 200,000) our proposed solution reduces the leakage power by 3% and the running time by over 50% when compared to the best reported heuristic in the literature. This significant decrease in running time is very useful to the industry for achieving timing and power closures of large designs within a given deadline.

Illegal memory accesses are a serious security vulnerability that has been exploited on numerous occasions. In this letter, we present Gandalf, a compiler-assisted hardware extension for the OpenRISC processor that thwarts all forms of memory-based attacks. We associate lightweight capabilities to all program variables, which are checked at run time by the hardware. Gandalf is transparent to the user and does not require significant OS modifications. Moreover, it achieves locality and incurs minimal overheads in the hardware. We demonstrate these features with a customized Linux kernel executing SPEC2006 benchmarks. To the best of our knowledge, this is the first work to demonstrate a complete solution for hardware-based memory protection schemes for embedded platforms.

The discrete Vt sizing technique is employed at all stages of the physical synthesis flow, because it does not impact the placement yet provides significant room for power/timing optimization. The timing-constrained discrete Vt sizing problem (TC-DVSP) is NP-complete, and earlier techniques reported for it employed iterative greedy or sensitivity-driven heuristics that required incremental timing analysis after every iteration. The key observation reported in this paper is that there exists a good correlation between the slack distribution among gates in a given iteration and the order of gate replacements in subsequent iterations.

This paper considers the timing-constrained discrete Vt replacement problem (DVRP), for leakage minimization in digital circuits. The problem is NP-complete. Earlier techniques reported for the DVRP employed iterative greedy or sensitivity-driven heuristics that required incremental timing analysis after every iteration. The key observation reported in this paper is a good correlation between the slack distribution among gates in a given iteration and the order of gate replacements in subsequent iterations. This paper exploits the above observation to propose FastReplace, an iterative algorithm that uses adaptive lazy timing analysis to solve the DVRP. The proposed FastReplace technique, when applied to ISCAS and ITC benchmark circuits, produced solutions 9.8× and 3.1× faster as compared to the greedy technique and a commercial multi-Vt synthesis tool respectively, without impacting the solution quality.

Shared data synchronization is at the heart of the multi-core revolution since it is essential for writing concurrent programs. Ideally, a synchronization technique should be able to fully exploit the available cores, leading to improved performance. However, with the growing demand for energy-efficient systems, it also needs to work within the energy and power budget of the system. In this paper, we perform a detailed study of the performance as well as energy efficiency of popular shared-data synchronization techniques on a commodity multi-core processor. We show that Software Transactional Memory (STM) systems can perform better than locks for workloads where a significant portion of the running time is spent in the critical sections. We also show how power-conserving techniques available on modern processors like C-states and clock frequency scaling impact energy consumption and performance. Finally, we compare the performance of STMs and locks under similar power budgets.