The time zone for all times mentioned at the DATE website is CEST – Central European Summer Time (UTC+2). AoE = Anywhere on Earth.
DATE 2023 Detailed Programme
The detailed programme of DATE 2023 will continuously be updated.
More information on ASD Initiative, Keynotes, Tutorials, Workshops, Young People Programme
Monday, 17 April 2023
OC Opening Ceremony
Date: Monday, 17 April 2023
Time: 08:30 CET - 09:00 CET
Location / Room: Queen Elisabeth Hall
Session chair:
Ian O’Connor, Ecole Centrale de Lyon, FR
Session co-chair:
Robert Wille, Technical University of Munich, DE, Jürgen Teich, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), DE
Time | Label | Presentation Title / Authors |
---|---|---|
08:30 CET | OC.1 | WELCOME ADDRESSES Speaker: Ian O'Connor and Robert Wille, DATE, BE Authors: Ian O'Connor1 and Robert Wille2 1Lyon Institute of Nanotechnology, FR; 2TU Munich, DE Abstract Welcome Addresses from DATE 2023 Chairs |
08:40 CET | OC.2 | PRESENTATION OF AWARDS Speaker: David Atienza, Georges Gielen and Yervant Zorian, DATE, BE Authors: David Atienza1, Georges Gielen2 and Yervant Zorian3 1EPFL, CH; 2KU Leuven, BE; 3Synopsys, US Abstract Presentation of Awards from Chairs |
OK1 Opening Keynote 1
Date: Monday, 17 April 2023
Time: 09:00 CET - 09:45 CET
Location / Room: Queen Elisabeth Hall
Time | Label | Presentation Title / Authors |
---|---|---|
09:00 CET | OK1.1 | BUILDING THE METAVERSE: AUGMENTED REALITY APPLICATIONS AND INTEGRATED CIRCUIT CHALLENGES Presenter: Edith Beigné, Meta Reality Labs, US Author: Edith Beigné, Meta Reality Labs, US Abstract Augmented reality is a set of technologies that will fundamentally change the way we interact with our environment. It represents a merging of the physical and the digital worlds into a rich, context-aware and accessible user interface delivered through a socially acceptable form factor such as eyeglasses. One of the biggest challenges in realizing a comprehensive AR experience is meeting the performance and form-factor requirements, which calls for new custom silicon. Innovations are mandatory to manage power consumption constraints and ensure both adequate battery life and a physically comfortable thermal envelope. This presentation reviews Augmented Reality and Virtual Reality applications and silicon challenges. |
OK2 Opening Keynote 2
Date: Monday, 17 April 2023
Time: 09:45 CET - 10:30 CET
Location / Room: Queen Elisabeth Hall
Time | Label | Presentation Title / Authors |
---|---|---|
09:45 CET | OK2.1 | THE CYBER-PHYSICAL METAVERSE – WHERE DIGITAL TWINS AND HUMANS COME TOGETHER Presenter: Dirk Elias, Robert Bosch GmbH, DE Author: Dirk Elias, Robert Bosch GmbH, DE Abstract The concept of Digital Twins (DTs) has been discussed intensively for the past couple of years. Today we have instances of digital twins that range from static descriptions of manufacturing data and material properties to live interfaces to operational data of cyber physical systems and the functions and services they provide. Currently, there are no standardized interfaces to aggregate atomic DTs (e.g., the twin of the lowest-level function of a machine) to higher-level DTs providing more complex services in the virtual world. Additionally, there is no existing infrastructure to reliably link the DTs in the virtual world to the integrated CPSs in the real world (like a car consisting of many ECUs with even more functions). This keynote will address how the Metaverse can become the virtual world where DTs of humans and machines live and how to reliably connect DTs to the physical world. Insights in current activities of Bosch Research and its academic partners to move towards this vision will be provided. |
ASD1 ASD technical session: Designing fault-tolerant and resilient autonomous systems
Date: Monday, 17 April 2023
Time: 11:00 CET - 12:30 CET
Location / Room: Gorilla Room 1.5.4/5
Session chair:
Selma Saidi, TU Dortmund, DE
Time | Label | Presentation Title / Authors |
---|---|---|
11:00 CET | ASD1.1 | MAVFI: AN END-TO-END FAULT ANALYSIS FRAMEWORK WITH ANOMALY DETECTION AND RECOVERY FOR MICRO AERIAL VEHICLES Speaker: Yu-Shun Hsiao, Harvard University, US Authors: Yu-Shun Hsiao1, Zishen Wan2, Tianyu Jia3, Radhika Ghosal1, Abdulrahman Mahmoud1, Arijit Raychowdhury2, David Brooks1, Gu-Yeon Wei4 and Vijay Janapa Reddi1 1Harvard University, US; 2Georgia Tech, US; 3Peking University, CN; 4Harvard University / Samsung, US Abstract Safety and resilience are critical for autonomous unmanned aerial vehicles (UAVs). We introduce MAVFI, the micro aerial vehicles (MAVs) resilience analysis methodology to assess the effect of silent data corruption (SDC) on UAVs' mission metrics, such as flight time and success rate, for accurately measuring system resilience. To enhance the safety and resilience of robot systems bound by size, weight, and power (SWaP), we offer two low-overhead anomaly-based SDC detection and recovery algorithms based on Gaussian statistical models and autoencoder neural networks. Our anomaly error protection techniques are validated in numerous simulated environments. We demonstrate that the autoencoder-based technique can recover up to all failure cases in our studied scenarios with a computational overhead of no more than 0.0062%. Our application-aware resilience analysis framework, MAVFI, can be utilized to comprehensively test the resilience of other Robot Operating System (ROS)-based applications and is publicly available at https://github.com/harvard-edge/MAVBench/tree/mavfi. |
11:22 CET | ASD1.2 | PHALANX: FAILURE-RESILIENT TRUCK PLATOONING SYSTEM Speaker: Taewook Ahn, Kookmin University, KR Authors: Changjin Koo1, Jaegeun Park2, Taewook Ahn1, Hongsuk Kim1, Jong-Chan Kim1 and Yongsoon Eun2 1Kookmin University, KR; 2DGIST, KR Abstract We introduce Phalanx, a failure-resilient truck platooning system, where trucks in a platoon protect each other from sensor failures despite the lack of redundant sensors. For that, we first emulate the failed sensors by collectively utilizing other sensors across the platoon. If the failed sensor cannot be emulated, the control system is instantaneously reconfigured to a cooperative protection mode using only the live sensors. We take a scenario-based approach considering six scenarios with single and dual failures of the essential sensors (i.e., lidar, encoder, and camera) for platooning control. For each scenario, we present a protection method that enables the safe maneuvering of platoons. For the evaluation, Phalanx is implemented using our scale truck testbed instrumented with fault injection modules, demonstrating safe platooning controls for the failure scenarios. |
11:45 CET | ASD1.3 | EFFICIENT SOFTWARE-IMPLEMENTED HW FAULT TOLERANCE FOR TINYML INFERENCE IN SAFETY-CRITICAL APPLICATIONS Speaker: Uzair Sharif, TU Munich, DE Authors: Uzair Sharif, Daniel Mueller-Gritschneder, Rafael Stahl and Ulf Schlichtmann, TU Munich, DE Abstract TinyML research has mainly focused on optimizing neural network inference in terms of latency, code-size and energy-use for efficient execution on low-power micro-controller units (MCUs). However, distinctive design challenges emerge in safety-critical applications, for example in small unmanned autonomous vehicles such as drones, due to the susceptibility of off-the-shelf MCU devices to soft-errors. We propose three new techniques to protect TinyML inference against random soft errors with the target to reduce run-time overhead: one for protecting fully-connected layers; one adaptation of existing algorithmic fault tolerance techniques to depth-wise convolutions; and an efficient technique to protect the so-called epilogues within TinyML layers. Integrating these layer-wise methods, we derive a full-inference hardening solution for TinyML that achieves run-time efficient soft-error resilience. We evaluate our proposed solution on MLPerf-Tiny benchmarks. Our experimental results show that competitive resilience can be achieved compared with currently available methods, while reducing run-time overheads by ~120% for one fully-connected neural network (NN); ~20% for the two CNNs with depth-wise convolutions; and ~2% for standard CNN. Additionally, we propose selective hardening which reduces the incurred run-time overhead further by ~2x for the studied CNNs by focusing exclusively on avoiding mispredictions. |
12:07 CET | ASD1.4 | FORMAL ANALYSIS OF TIMING DIVERSITY FOR AUTONOMOUS SYSTEMS Speaker: Anika Christmann, TU Braunschweig, DE Authors: Anika Christmann, Robin Hapka and Rolf Ernst, TU Braunschweig, DE Abstract The design of autonomous systems, such as for automated driving and avionics, is challenging due to high performance requirements combined with high criticality. Complex applications demand the full performance of commercial off-the-shelf (COTS) high-performance multi-core systems, with or without accelerators. While these systems are optimized for performance, hard real-time requirements and deterministic timing behavior are major constraints for safety-critical systems. Unfortunately, infrequent timing outliers caused by interleaved hardware-software effects of COTS systems complicate traditional worst-case design. This conflict often prohibits deploying COTS hardware and consequently prevents sophisticated applications, too. Recently, an approach called Timing Diversity was introduced, which proposes to exploit existing dual modular redundant hardware platforms to mask deadline violations. This paper puts Timing Diversity on a theoretical foundation and provides specifications for different implementations. It demonstrates that Timing Diversity needs fast recovery to be effective, proposes a recovery strategy and provides a mathematical model for the reliability of the resulting system. Using experimental data in a Linux-based system, it shows that fast recovery is useful, making Timing Diversity a realistic option for compute-demanding hard real-time applications. |
FS1 Focus session: Embracing uncertainty and exploring non-determinism for efficient implementations of Machine Learning models
Date: Monday, 17 April 2023
Time: 11:00 CET - 12:30 CET
Location / Room: Okapi Room 0.8.1
Session chair:
Lorena Anghel, SPINTEC, Grenoble INP – University Grenoble Alpes, FR
Time | Label | Presentation Title / Authors |
---|---|---|
11:00 CET | FS1.1 | BINARY RERAM-BASED BNN FIRST-LAYER IMPLEMENTATION Speaker: Mona Ezzadeen, CEA-Leti, FR Authors: Mona Ezzadeen1, Atreya Majumdar1, Sigrid Thomas1, Jean-Philippe Noel1, Bastien Giraud1, Marc Bocquet2, Francois Andrieu1, Damien Querlioz3 and Jean-Michel Portal4 1CEA, FR; 2IM2NP - Aix-Marseille University, FR; 3C2N - CNRS, FR; 4Aix-Marseille University, FR Abstract The deployment of Edge AI requires energy-efficient hardware with a minimal memory footprint to achieve optimal performance. One approach to meet this challenge is the use of Binary Neural Networks (BNNs) based on non-volatile in-memory computing (IMC). In recent years, elegant ReRAM-based IMC solutions for BNNs have been developed, but they do not extend to the first layer of a BNN, which typically requires non-binary activations. In this paper, we propose a modified first layer architecture for BNNs that uses k-bit input images broken down into k binary input images with associated fully binary convolution layers and an accumulation layer with fixed weights of {2^{-1}, ..., 2^{-k}}. To further increase energy efficiency, we also propose reducing the number of operations by truncating 8-bit RGB pixel code to the 4 most significant bits (MSB). Our proposed architecture only reduces network accuracy by 0.28% on the CIFAR-10 task compared to a BNN baseline. Additionally, we propose a cost-effective solution to implement the weighted accumulation using successive charge sharing operations on an existing ReRAM-based IMC solution. This solution is validated through functional electrical simulations. |
11:30 CET | FS1.2 | SCALABLE SPINTRONICS-BASED BAYESIAN NEURAL NETWORK FOR UNCERTAINTY ESTIMATION Speaker: Soyed Tuhin Ahmed, Karlsruhe Institute of Technology, DE Authors: Soyed Ahmed1, Kamal Danouchi2, Michael Hefenbrock3, Guillaume PRENAT4, Lorena Anghel5 and Mehdi Tahoori1 1Karlsruhe Institute of Technology, DE; 2University Grenoble Alpes, CEA, CNRS, Grenoble INP, IRIG-Spintec Laboratory, FR; 3RevoAI GMBH, DE; 4University Grenoble Alpes, CEA, CNRS, Grenoble INP, FR; 5Grenoble-Alpes University, Grenoble, France, FR Abstract Typical neural networks are incapable of effectively estimating prediction uncertainty, leading to overconfident predictions. Estimating uncertainty is crucial for safety-critical tasks such as autonomous vehicle driving and medical diagnosis and treatment. Bayesian Neural Networks (BayNNs), which combine the capabilities of neural networks and Bayesian inference, are an effective approach for uncertainty estimation. However, BayNNs are computationally demanding and necessitate substantial memory resources. Computation-in-memory (CiM) architectures utilizing emerging resistive non-volatile memories such as Spin-Orbit Torque (SOT) have been proposed to increase the resource efficiency of traditional neural networks. However, training scalable and efficient BayNNs and implementing them in the CiM architecture presents its own challenges. In this paper, we propose a scalable Bayesian NN framework via Subset-Parameter inference and its Spintronic-based CiM implementation. Our method is evaluated on large datasets and topologies to show that it can achieve comparable accuracy while still being able to estimate uncertainty efficiently at up to 70X lower power consumption and 158.7X lower storage memory requirements. |
12:00 CET | FS1.3 | COUNTERING UNCERTAINTIES IN IN-MEMORY-COMPUTING PLATFORMS WITH STATISTICAL TRAINING, ACCURACY COMPENSATION AND RECURSIVE TEST Speaker: Bing Li, TU Munich, DE Authors: Amro Eldebiky1, Grace Li Zhang2 and Bing Li1 1TU Munich, DE; 2TU Darmstadt, DE Abstract In-memory computing (IMC) has become an efficient solution for implementing neural networks on hardware. However, IMC platforms require that parameters such as weights in neural networks are programmed to exact values. This is a very demanding task due to programming complexity and variations. Accordingly, new methods should be introduced to counter such uncertainties. In this talk, we will first discuss a method to train neural networks statistically with variations modeled as correlated random variables. The statistical effect is incorporated into the cost function during training. Consequently, a neural network after statistical training becomes robust to uncertainties. To deal with variations and noise further, we also introduce a compensation method with weight constraints and extra layers for neural networks. These extra layers are trained after the weights in the original neural network are determined to enhance the inference accuracy. Finally, we discuss a method for testing the effect of process variations in an optical acceleration platform for neural networks. This optical platform uses Mach-Zehnder interferometers (MZIs) to implement the multiply–accumulate operations. However, trigonometric functions in the transformation matrix of an MZI make it very sensitive to variations. To address this problem, we apply a recursive test procedure to determine the properties of MZIs inside an optical acceleration module, so that process variations can be compensated accordingly to maintain the accuracy of neural networks. |
FS2 Focus session: Open-source hardware technologies
Date: Monday, 17 April 2023
Time: 11:00 CET - 12:30 CET
Location / Room: Gorilla Room 1.5.3
Session chair:
Giovanni De Micheli, EPFL, CH
The session will discuss perspectives on the future and transformative implications of open-source hardware technologies.
Time | Label | Presentation Title / Authors |
---|---|---|
11:00 CET | FS2.1 | DEMOCRACY OF SILICON AND INTELLIGENT EDGE Speaker and Author: Naveed Sherwani, RapidSilicon, US Abstract The open-source movement has already tremendously shaken our industry with broad initiatives such as the RISC-V ISA. One of the most remarkable effects of open source is the ability to collaborate at a broad, borderless scale and to foster innovation and education worldwide - a true democracy of silicon. New levels of innovation are expected to meet upcoming intelligent-edge requirements, where the data deluge will have to be handled locally at the sensors with minimal energy requirements. Participation of the largest nation is extremely important to this mission, but enabling larger engagement through education opportunities should be a top priority to create our industry's workforce of tomorrow. |
11:30 CET | FS2.2 | PULP: 10 YEARS OF OPEN SOURCE HARDWARE Speaker and Author: Frank Gürkaynak, ETH Zurich, CH Abstract The Parallel Ultra Low Power (PULP) Platform project kicked off in a small office almost exactly 10 years ago. We wanted to work on energy-efficient computer architectures and realized that we needed the help and cooperation of a larger community if we were to be successful as an academic institution. This is why we had open source as a cornerstone of our project. 10 years and more than 50 ASICs later, open-source hardware is no longer seen as an enthusiast's dream or an academic curiosity, but has established itself in the business plans of companies big and small, as well as receiving funding from governments. I have been lucky enough to have witnessed some of the key events of this development, and in this talk I want to share a bit of this history as seen from our side and provide some insights into the developments we can expect in the near future. |
12:00 CET | FS2.3 | OPENFPGA: BRINGING OPEN-SOURCE HARDWARE TO FPGAS Speaker and Author: Pierre-Emmanuel Gaillardon, University of Utah, US Abstract In this talk, we will introduce the OpenFPGA framework whose aim is to generate highly-customizable Field Programmable Gate Array (FPGA) fabrics and their supporting EDA flows. Following in the footsteps of the RISC-V initiative, OpenFPGA brings reconfigurable logic into the open-source community and closes the performance gap with commercial products. OpenFPGA strongly incorporates physical design automation at its core and enables the generation of FPGA fabrics with 100k+ look-up tables, from specification to layout, in less than 24 hours with a single engineer's effort. |
LBR1 Late Breaking Results: novel computing paradigms
Date: Monday, 17 April 2023
Time: 11:00 CET - 12:30 CET
Location / Room: Okapi Room 0.8.2
Session chair:
Nele Mentens, KU Leuven, BE
Time | Label | Presentation Title / Authors |
---|---|---|
11:00 CET | LBR1.1 | DIGITAL EMULATION OF OSCILLATOR ISING MACHINES Speaker: Jaijeet Roychowdhury, University of California at Berkeley, US Authors: Shreesha Sreedhara1, Jaijeet Roychowdhury1, Joachim Wabnig2 and Pavan Srinath3 1University of California at Berkeley, US; 2Nokia Bell Labs, GB; 3Nokia Bell Labs, FR Abstract The Ising problem is an NP-hard combinatorial optimization problem. Recently, networks of mutually coupled, nonlinear, self-sustaining oscillators known as Oscillator Ising Machines (OIMs) were shown to heuristically solve Ising problems. The phases of the oscillators in OIMs can be modeled as systems of Ordinary Differential Equations (ODEs) known as Generalized Kuramoto (Gen-K) models. In this paper, we solve Gen-K ODE systems efficiently using cleverly designed fixed-point operations. To demonstrate this idea, we fabricated a prototype chip containing 33 spins with programmable all-to-all connectivity. We test this design using Multi-Input Multi-Output decoding problems, and show that the OIM emulator achieves near-optimal Symbol Error Rates (SER). |
11:03 CET | LBR1.2 | ENERGY-EFFICIENT BAYESIAN INFERENCE USING NEAR-MEMORY COMPUTATION WITH MEMRISTORS Speaker: Clement Turck, Universite Paris-Saclay, FR Authors: Clément Turck1, Kamel-Eddine Harabi2, Tifenn Hirtzlin3, Elisa Vianello3, Raphaël Laurent4, Jacques Droulez5, Pierre Bessière6, Jean-Michel Portal7, Marc Bocquet7 and Damien Querlioz2 1Université Paris-Saclay, CNRS, FR; 2Universite Paris-Saclay, CNRS, Centre de Nanosciences et de Nanotechnologies, FR; 3Universite Grenoble-Alpes, CEA-Leti, FR; 4HawAI.tech, FR; 5HawAI.tech, Sorbonne Universite, CNRS, Institut des Systemes Intelligents et de Robotique, FR; 6Sorbonne Universite, CNRS, Institut des Systemes Intelligents et de Robotique, FR; 7Aix-Marseille Universite, CNRS, Institut Matériaux Micro électronique Nanosciences de Provence, FR Abstract Bayesian reasoning is a machine learning approach that provides explainable outputs and excels in small-data situations with high uncertainty. However, it requires intensive memory access and computation and is, therefore, too energy-intensive for extreme edge contexts. Near-memory computation with memristors (or RRAM) can greatly improve the energy efficiency of its computations. Here, we report two fabricated integrated circuits in a hybrid CMOS-memristor process, each featuring sixteen tiny memristor arrays and the associated near-memory logic for Bayesian inference. One circuit performs Bayesian inference using stochastic computing, and the other uses logarithmic computation; these two paradigms fit the area constraints of near-memory computing well. On-chip measurements show the viability of both approaches with respect to memristor imperfections. The two Bayesian machines also operated well at low supply voltages. We also designed scaled-up versions of the machines. Both scaled-up designs can perform a gesture recognition task using orders of magnitude less energy than a microcontroller unit. We also see that if an accuracy lower than 86% is sufficient for this sample task, stochastic computing consumes less energy than logarithmic computing; for higher accuracies, logarithmic computation is more energy-efficient. These results highlight the potential of memristor-based near-memory Bayesian computing, providing both accuracy and energy efficiency. |
11:06 CET | LBR1.3 | TOWARDS A ROBUST MULTIPLY-ACCUMULATE CELL IN PHOTONICS USING PHASE-CHANGE MATERIALS Speaker: Raphael Cardoso, Ecole Centrale de Lyon, FR Authors: Raphael Cardoso1, Clément Zrounba1, Mohab Abdalla1, Paul Jimenez1, Mauricio Gomes1, Benoît Charbonnier2, Fabio Pavanello1, Ian O'Connor3 and Sébastien Le Beux4 1Ecole Centrale de Lyon, FR; 2CEA-Leti, FR; 3Lyon Institute of Nanotechnology, FR; 4Concordia University, CA Abstract In this paper we propose a novel approach to multiply-accumulate (MAC) operations in photonics. This approach is based on stochastic computing and on the dynamic behavior of phase-change materials (PCMs), leading to the unique characteristic of automatically storing the result in non-volatile memory. We demonstrate that, even with perfect look-up tables, the standard approach to PCM scalar multiplication is highly susceptible to perturbations as small as 0.1% of the input power, causing repetitive peaks of 600% relative error. In the same operating conditions, the proposed method achieves an average of 7× improvement in precision. |
11:09 CET | LBR1.4 | LIGHTSPEED BINARY NEURAL NETWORKS USING OPTICAL PHASE-CHANGE MATERIALS Speaker: Taha Michael Shahroodi, TU Delft, NL Authors: Taha Shahroodi1, Raphael Cardoso2, Mahdi Zahedi1, Stephan Wong1, Alberto Bosio3, Ian O'Connor3 and Said Hamdioui1 1TU Delft, NL; 2Ecole Centrale de Lyon, FR; 3Lyon Institute of Nanotechnology, FR Abstract This paper investigates the potential of a compute-in-memory core based on optical Phase Change Materials (oPCMs) to speed up and reduce the energy consumption of the Matrix-Matrix-Multiplication operation. The paper also proposes a new data mapping for Binary Neural Networks (BNNs) tailored for our oPCM core. The preliminary results show a significant latency improvement irrespective of the evaluated network structure and size. The improvement varies from network to network and goes up to 1053X. |
11:12 CET | LBR1.5 | REAL-TIME FULLY UNSUPERVISED DOMAIN ADAPTATION FOR LANE DETECTION IN AUTONOMOUS DRIVING Speaker: Kshitij Bhardwaj, Lawrence Livermore National Lab, US Authors: Kshitij Bhardwaj1, Zishen Wan2, Arijit Raychowdhury2 and Ryan Goldhahn1 1Lawrence Livermore National Lab, US; 2Georgia Tech, US Abstract While deep neural networks are being utilized heavily for autonomous driving, they need to be adapted to new unseen environmental conditions for which they were not trained. We focus on a safety critical application of lane detection, and propose a lightweight, fully unsupervised, real-time adaptation approach that only adapts the batch-normalization parameters of the model. We demonstrate that our technique can perform inference, followed by on-device adaptation, under a tight constraint of 30 FPS on Nvidia Jetson Orin. It shows similar accuracy (avg. of 92.19%) as a state-of-the-art semi-supervised adaptation algorithm but which does not support real-time adaptation. |
11:15 CET | LBR1.6 | A LINEAR-TIME, OPTIMIZATION-FREE, AND EDGE DEVICE-COMPATIBLE HYPERVECTOR ENCODING Speaker: Sercan Aygun, University of Louisiana at Lafayette, US Authors: Sercan Aygun1, M. Hassan Najafi1 and Mohsen Imani2 1University of Louisiana at Lafayette, US; 2University of California, Irvine, US Abstract Hyperdimensional computing (HDC) offers a single-pass learning system by imitating the brain-like signal structure. HDC data structure is in random hypervector format for better orthogonality. Similarly, in bit-stream processing – aka stochastic computing– systems, low-discrepancy (LD) sequences are used for the efficient generation of uncorrelated bit-streams. However, LD-based hypervector generation has never been investigated before. This work studies the utilization of LD Sobol sequences as a promising alternative for encoding hypervectors. The new encoding technique achieves highly-accurate classification with a single-time training step without needing to iterate repeatedly over random rounds. The accuracy evaluations in an embedded environment exhibit a classification rate improvement of up to 9.79% compared to the conventional random hypervector encoding. |
11:18 CET | LBR1.7 | INTERACTIVE TECHNICAL PRESENTATIONS BY THE AUTHORS Speaker: Authors of the session, DATE, BE Author: Session Chairs, DATE, BE Abstract Participants can freely interact with authors during their interactive technical presentations. |
LKS1 Later … with the keynote speakers
Date: Monday, 17 April 2023
Time: 11:00 CET - 12:30 CET
Location / Room: Darwin Hall
Session chair:
Rolf Ernst, TU Braunschweig, DE, Selma Saidi, TU Dortmund, DE
Session co-chair:
Ian O’Connor, Ecole Centrale de Lyon, FR
M01 Modern High-Level Synthesis for Complex Data Science Applications
Date: Monday, 17 April 2023
Time: 11:00 CET - 12:30 CET
Location / Room: Marble Hall
Organisers:
Antonino Tumeo, Pacific Northwest National Laboratory, US
Fabrizio Ferrandi, Politecnico di Milano, IT
Nicolas Bohm Agostini, Pacific Northwest National Laboratory and Northeastern University, US
Serena Curzel, Pacific Northwest National Laboratory, US and Politecnico di Milano, IT
Michele Fiorito, Politecnico di Milano, IT
Presenters:
Antonino Tumeo, Pacific Northwest National Laboratory, US
Fabrizio Ferrandi, Politecnico di Milano, IT
Nicolas Bohm Agostini, Pacific Northwest National Laboratory and Northeastern University, US
Serena Curzel, Pacific Northwest National Laboratory, US and Politecnico di Milano, IT
Michele Fiorito, Politecnico di Milano, IT
Data Science applications (machine learning, graph analytics) are among the main drivers for the renewed interest in designing domain-specific accelerators, both for reconfigurable devices (Field Programmable Gate Arrays, FPGAs) and for Application-Specific Integrated Circuits (ASICs). Today, the availability of new high-level synthesis (HLS) tools to generate accelerators starting from high-level specifications provides easier access to FPGAs or ASICs and preserves programmer productivity. However, the conventional HLS flow typically starts from languages such as C, C++, or OpenCL, heavily annotated with information to guide the hardware generation, still leaving a significant gap with respect to the (Python-based) data science frameworks. This tutorial will discuss HLS to accelerate data science on FPGAs or ASICs, highlighting key methodologies, trends, advantages and benefits, but also the gaps that still need to be closed. The tutorial will provide hands-on experience with the SOftware Defined Accelerators (SODA) Synthesizer, a toolchain composed of SODA-OPT, an open-source front-end and optimizer that interfaces with productive Python-based data science frameworks, and Bambu, the most advanced open-source HLS tool available, able to generate optimized accelerators for data-intensive kernels. We will further show how SODA integrates with the OpenROAD flow, providing a truly automated end-to-end open-source compiler toolchain from high-level machine learning frameworks to silicon.
M01.1 Session 1: Modern High-Level Synthesis for Complex Data Science Applications
Date: Monday, 17 April 2023
Time: 11:00 CET - 12:30 CET
Location / Room: Marble Hall
Time | Label | Presentation Title / Authors |
---|---|---|
11:00 CET | M01.1.1 | AGILE HARDWARE DESIGN FOR COMPLEX DATA SCIENCE APPLICATIONS: OPPORTUNITIES AND CHALLENGES Speaker: Antonino Tumeo, Pacific Northwest National Laboratory, US Abstract Introductory material, context, state-of the art, and research opportunities |
11:20 CET | M01.1.2 | BAMBU: AN OPEN-SOURCE RESEARCH FRAMEWORK FOR THE HIGH-LEVEL SYNTHESIS OF COMPLEX APPLICATIONS. Speaker: Fabrizio Ferrandi, Politecnico di Milano, IT Abstract Advanced materials on High-Level Synthesis methods |
11:45 CET | M01.1.3 | END-TO-END DEMONSTRATION FROM HIGH-LEVEL FRAMEWORKS TO SILICON WITH SODA-OPT, BAMBU, AND OPENROAD Speakers: Nicolas Bohm Agostini1 and Serena Curzel2 1Pacific Northwest National Laboratory and Northeastern University, US; 2Pacific Northwest National Laboratory, US and Politecnico di Milano, IT Abstract Hands-on session on the end-to-end toolchain |
12:10 CET | M01.1.4 | ADVANCED HIGH-LEVEL SYNTHESIS WITH BAMBU Speakers: Serena Curzel1 and Michele Fiorito2 1Pacific Northwest National Laboratory, US and Politecnico di Milano, IT; 2Politecnico di Milano, IT Abstract Hands-on session on advanced High-Level Synthesis with Bambu |
M04 Remote Side-Channel and Fault Attacks in FPGAs
Date: Monday, 17 April 2023
Time: 11:00 CET - 12:30 CET
Location / Room: Okapi Room 0.8.3
Organisers:
Mehdi Tahoori, Karlsruhe Institute of Technology, DE
Jonas Krautter, Karlsruhe Institute of Technology, DE
Dennis Gnad, Karlsruhe Institute of Technology, DE
The shared FPGA platform in the cloud is based on the concept that the FPGA real estate can be shared among various users, possibly even at different privilege levels. Such multi-tenancy comes with new security challenges, in which one user, while being completely logically isolated from another, can cause security breaches to another user on the same FPGA. In addition, such hardware security vulnerabilities do not require physical access to the hardware to perform measurements or fault attacks; they can be exploited completely remotely. The main objective of this tutorial, which consists of three components (an in-depth lecture, a live demo and a hands-on experience), is to introduce the new challenges arising from sharing FPGAs both in the cloud and in state-of-the-art heterogeneous Systems on Chip (SoCs). It explores remote active and passive attacks at the electrical level on multi-tenant FPGAs in the cloud and in SoCs, and discusses possible countermeasures to deal with such security vulnerabilities.
The first part of this tutorial is an in-depth lecture covering the new trends in design of heterogeneous FPGA-SoCs as well as sharing the FPGAs in the clouds and the associated security vulnerabilities. The lecture part is given by Mehdi Tahoori. In this part, the traditional side channel and fault attacks are reviewed. We also show how the power delivery network (PDN) on the chip, board and system level can be utilized as a side channel medium and how the legitimate programmable logic constructs of the FPGA can be exploited for side channel voltage fluctuation measurements as well as injecting faults on the PDN for fault attacks and denial of service. Also, various countermeasures in terms of offline bitstream checking and online approaches based on fencing and sandboxing will be covered.
In the second part of the tutorial, we present live attacks on recent cloud FPGAs, such as the Intel Stratix 10 and the Xilinx Virtex UltraScale+. The respective attacks, namely Correlation Power Analysis as well as a Differential Fault Attack on AES, will be explained in detail to the attendees, who will learn how to derive secret AES keys from faulty ciphertexts and side-channel measurements in a real system. Moreover, we demonstrate how recent FPGAs can be crashed in a Denial-of-Service attack, making recovery without power cycling impossible. This part is administered by Dennis Gnad and Jonas Krautter.
Finally, the third part is a hands-on experience using low cost Lattice iCE40-HX8K breakout boards together with a comprehensive graphical interface, which can be used to control various parameters of the measurement or fault injection process on the FPGA. On this platform, participants of the tutorial are able to perform the demonstrated attacks themselves and learn about the importance of the respective parameters as well as the details of the attacked implementation.
MPP1 Multi-partner projects
Date: Monday, 17 April 2023
Time: 11:00 CET - 12:30 CET
Location / Room: Gorilla Room 1.5.1
Session chair:
Luca Sterpone, Politecnico di Torino, IT
Time | Label | Presentation Title / Authors |
---|---|---|
11:00 CET | MPP1.1 | NIMBLEAI: TOWARDS NEUROMORPHIC SENSING-PROCESSING 3D-INTEGRATED CHIPS Speaker: Xabier Iturbe, Ikerlan, ES Authors: Xabier Iturbe1, Nassim Abderrahmane2, Jaume Abella3, Sergi Alcaide3, Eric Beyne4, Henri-Pierre Charles5, Christelle Charpin-Nicolle6, Lars Chittka7, Angelica Davila1, Arne Erdmann8, Carles Estrada9, Ander Fernandez9, Anna Fontanelli10, Josè Flich11, Gianluca Furano12, Alejandro Hernan-Gloriani13, Erik Isusquiza14, Radu Grosu15, Carles Hernandez16, Daniele Ielmini17, David Jackson18, Maha Kooli19, Nicola Lepri20, Bernabe Linares-Barranco21, Jean-Loup Lachese2, Eric Laurent2, Menno Lindwer22, Frank Linsenmaier13, Mikel Lujan18, Karel Masarik23, Nele Mentens24, Orlando Moreira22, Chinmay Nawghane4, Luca Peres18, Jean-Philippe Noel5, Arash Pourtaherian22, Christoph Posch25, Peter Priller26, Zdenek Prikryl23, Felix Resch27, Oliver Rhodes18, Todor Stefanov28, Moritz Storring4, Michele Taliercio10, Rafael Tornero16, Marcel van de Burgwal4, Geert van der Plas4, Elisa Vianello6 and Pavel Zaykov23 1Ikerlan, ES; 2MENTA, FR; 3Barcelona Supercomputing Center (BSC-CNS), ES; 4IMEC, BE; 5CEA, FR; 6CEA-Leti, FR; 7Queen Mary University Of London, GB; 8RAYTRIX, DE; 9IKERLAN, ES; 10MZ TECHNOLOGIES, IT; 11Associate Professor, Universitat Politècnica de València, ES; 12ESA ESTEC, NL; 13VIEWPOINTSYSTEM, AT; 14ULMA MEDICAL TECHNOLOGIES, ES; 15TU Wien, AT; 16UNIVERSIDAD POLITECNICA DE VALENCIA, ES; 17Politecnico di Milano, IT; 18The University of Manchester, GB; 19CEA/LIST, FR; 20POLITECNICO DI MILANO, IT; 21CSIC, ES; 22GRAI MATTER LABS, NL; 23CODASIP, CZ; 24UNIVERSITY OF LEIDEN, NL; 25PROPHESEE, FR; 26AVL LIST, AT; 27TU WIEN, AT; 28Leiden University, NL Abstract The NimbleAI Horizon Europe project leverages key principles of energy-efficient visual sensing and processing in biological eyes and brains, and harnesses the latest advances in 3D stacked silicon integration, to create an integral sensing-processing neuromorphic architecture that efficiently and accurately runs computer vision algorithms in area-constrained endpoint chips. The rationale behind the NimbleAI architecture is: sense only data with high information value and discard data as soon as they are found not to be useful for the application (in a given context). The NimbleAI sensing-processing architecture is to be specialized after deployment by tuning system-level trade-offs for each particular computer vision algorithm and deployment environment. The objectives of NimbleAI are: (1) 100x performance per mW gains compared to state-of-the-practice solutions (i.e., CPU/GPUs processing frame-based video); (2) 50x processing latency reduction compared to CPU/GPUs; (3) energy consumption in the order of tens of mWs; and (4) silicon area of approx. 50 mm^2. |
11:03 CET | MPP1.2 | OPTIMIZING INDUSTRIAL APPLICATIONS FOR HETEROGENEOUS HPC SYSTEMS: THE OPTIMA PROJECT Speaker: Dimitris Theodoropoulos, Institute of Communication and Computation Systems, GR Authors: Dimitris Theodoropoulos1, Oliver Michel2, PAVLOS MALAKONAKIS3, Konstantinos Georgopoulos4, Giovanni Isotton5, Dionisios Pnevmatikatos6, Ioannis Papaefstathiou7, Gino Perna8, Panagiotis Miliadis9, Mariza Zanotti8, Chloe Alverti9, Aggelos Ioannou10, Max Engelen11, Valeria Bartsch12, Mathias Balzer12 and Iakovos Mavroidis13 1Institute of Communication and Computer Systems, GR; 2Cyberbotics, CH; 3TU Crete, GR; 4Telecommunication Systems Institute, TU Crete, GR; 5M3E, IT; 6National TU Athens & ICCS, GR; 7Aristotle University of Thessaloniki, GR; 8EnginSoft SpA, Trento, IT; 9National TU Athens, GR; 10School of Electrical & Computer Engineering, TU Crete, Chania, Greece, GR; 11Maxeler IoT Labs, Delft, Netherlands, NL; 12Fraunhofer ITWM Kaiserslautern, DE; 13Telecommunication Systems Institute, GR Abstract OPTIMA is an SME-driven project (intermediate stage) that aims to port and optimize industrial applications and a set of open-source libraries into two novel FPGA-populated HPC systems. Target applications are from the domain of robotics simulation, underground analysis and computational fluid dynamics (CFD), where data processing is based on differential equations, matrix-matrix and matrix-vector operations. Moreover, the OPTIMA OPen Source (OOPS) library will support basic linear algebraic operations, sparse matrix-vector arithmetic, as well as computer-aided engineering (CAE) solvers. The OPTIMA target platforms are JUMAX, an HPC system that couples an AMD Epyc Server with Maxeler FPGA-based Dataflow Engines (DFEs), and server-class machines with Alveo FPGA cards installed. Experimental results on applications up to now, show that performance on robotic simulation can be enhanced up to 1.2x, CFD calculations up to 4.7x, and BLAS routines up to 7x compared to optimized software implementations from OpenBLAS. |
11:06 CET | MPP1.3 | DESIGN ENABLEMENT FLOW FOR CIRCUITS WITH INHERENT OBFUSCATION BASED ON RECONFIGURABLE TRANSISTORS Speaker: Jens Trommer, NaMLab gGmbH, DE Authors: Jens Trommer1, Niladri Bhattacharjee1, Thomas Mikolajick2, Sebastian Huhn3, Marcel Merten3, Mohammed Djeridane3, Muhammad Hassan4, Rolf Drechsler5, Shubham Rai6, Nima Kavand6, Armin Darjani6, Akash Kumar6, Violetta Sessi7, Maximilian Drescher7, Sabine Kolodinski7 and Maciej Wiatr7 1Namlab gGmbH, DE; 2NaMLab Gmbh / TU Dresden, DE; 3University of Bremen, DE; 4University of Bremen/Cyber Physical Systems, DFKI, DE; 5University of Bremen | DFKI, DE; 6TU Dresden, DE; 7Globalfoundries Fab 1, DE Abstract Reconfigurable transistors are a new emerging type of device, which promise to improve the resistance of electronic components against know-how theft. In order to enable product development with such an emerging device, a cross-layer design enablement strategy is needed, as emerging technologies are not necessarily compatible with standard tools used in the industry. In 'CirroStrato', we aim at the development of such a complete flow enabling CMOS co-integration of reconfigurable transistors, ranging from process adjustments, device modeling, library characterization, and physical and logical synthesis up to sophisticated hardware security tests. In this multi-partner-project (MPP) paper, our aim is to elucidate the overall design enablement flow, as well as current research challenges at the individual stages. |
11:09 CET | MPP1.4 | SAFEXPLAIN: SAFE AND EXPLAINABLE CRITICAL EMBEDDED SYSTEMS BASED ON AI Speaker: Francisco J Cazorla, BSC, ES Authors: Jaume Abella1, Jon Perez2, Cristofer Englund3, Bahram Zonooz4, Gabriele Giordana5, Carlo Donzella6, Francisco J Cazorla7, Enrico Mezzetti7, Isabel Serra7, Axel Brando7, Irune Agirre2, Fernando Eizaguirre2, Thanh Bui3, Elahe Arani4, Fahad Sarfraz4, Ajay Balasubramaniam4, Ahmed Badar4, Ilaria Bloise5, Lorenzo Feruglio5, Ilaria Cinelli5, Davide Brighenti8 and Davide Cunial8 1Barcelona Supercomputing Center (BSC-CNS), ES; 2Ikerlan, ES; 3RISE, SE; 4Navinfo Europe, NL; 5AIKO s.r.l., IT; 6Exida Development, s.r.l., IT; 7BSC, ES; 8Exida Engineering, s.r.l., IT Abstract Deep Learning (DL) techniques are at the heart of most future advanced software functions in Critical Autonomous AI-based Systems (CAIS), where they also represent a major competitive factor. Hence, the economic success of CAIS industries (e.g., automotive, space, railway) depends on their ability to design, implement, qualify, and certify DL-based software products under bounded effort/cost. However, there is a fundamental gap between Functional Safety (FUSA) requirements on CAIS and the nature of DL solutions. This gap stems from the development process of DL libraries and affects high level concepts such as (1) explainability and traceability, (2) suitability for varying safety requirements, (3) FUSA-compliant implementations, and (4) real-time constraints. As a matter of fact, the data-dependent and stochastic nature of DL algorithms clash with current FUSA practice, which instead builds on deterministic, verifiable, and pass/fail test-based software. The SAFEXPLAIN project tackles these challenges by providing a novel and flexible approach to allow the certification – hence adoption – of DL-based solutions in CAIS building on (1) DL solutions that provide end-to-end traceability, with specific approaches to explain whether predictions can be trusted and strategies to reach (and prove) correct operation, in accordance to certification standards; (2) alternative and increasingly sophisticated design safety patterns for DL with varying requirements of criticality and fault tolerance; (3) DL library implementations that adhere to safety requirements; and (4) computing platform configurations, to regain determinism, and probabilistic timing analyses, to handle the remaining nondeterminism. |
11:12 CET | MPP1.5 | THE FORA EUROPEAN TRAINING NETWORK ON FOG COMPUTING FOR ROBOTICS AND INDUSTRIAL AUTOMATION Speaker: Paul Pop, TU Denmark, DK Authors: Mohammadreza Barzegaran and Paul Pop, TU Denmark, DK Abstract Fog Computing for Robotics and Industrial Automation, FORA, was a European Training Network which focused on future industrial automation architectures and applications based on an emerging technology, called Fog Computing. The research project focused on research related to Fog Computing with applicability to industrial automation and manufacturing. The main outcome of the FORA project was the development of a deterministic Fog Computing Platform (FCP) to be used for implementing industrial automation and robotics solutions for Industry 4.0. This paper reports on the scientific outcomes of the FORA project. FORA has proposed a reference system architecture for Fog Computing, which was published as an open Architecture Analysis Design Language (AADL) model. The technologies developed in FORA include fog nodes and hypervisors, resource management mechanisms and middleware for deploying scalable Fog Computing applications, while guaranteeing the non-functional properties of the virtualized industrial control applications, and methods and processes for assuring the safety and security of the FCP. Several industrial use cases were used to evaluate the suitability of the FORA FCP for the Industrial IoT area, and to demonstrate how the platform can be used to develop industrial control applications and data analytics applications. |
11:15 CET | MPP1.6 | PETAOPS/W EDGE-AI μPROCESSORS: MYTH OR REALITY? Speaker: Manil Dev Gomony, Eindhoven University of Technology, NL Authors: Manil Dev Gomony1, Floran de Putter2, Anteneh Gebregiorgis3, Gianna Paulin4, Linyan Mei5, Vikram Jain5, Said Hamdioui3, Victor Sanchez2, Tobias Grosser6, Marc Geilen2, Marian Verhelst5, Friedemann Zenke7, Frank Gurkaynak4, Barry de Bruin2, Sander Stuijk2, Simon Davidson8, Sayandip De2, Mounir Ghogho9, Alexandra Jimborean10, Sherif Eissa2, Luca Benini11, Dimitrios Soudris12, Rajendra Bishnoi3, Sam Ainsworth13, Federico Corradi2, Ouassim Karrakchou9, Tim Güneysu14 and Henk Corporaal2 1Eindhoven University of Technology, NL; 2Eindhoven University of Technology, NL; 3TU Delft, NL; 4ETH Zurich, CH; 5KU Leuven, BE; 6University of Edinburgh, GB; 7Friedrich Miescher Institute, CH; 8The University of Manchester, GB; 9Universite Internationale de Rabat, MA; 10University of Murcia, ES; 11ETH Zurich, CH | Università di Bologna, IT; 12National Technical University of Athens, GR; 13University of Edinburgh, GB; 14Ruhr-Universität Bochum & DFKI, DE Abstract With the rise of DL, our world braces for AI in every edge device, creating an urgent need for edge-AI SoCs. This SoC hardware needs to support high throughput, reliable and secure AI processing at ULP, with a very short time to market. With its strong legacy in edge solutions and open processing platforms, the EU is well-positioned to become a leader in this SoC market. However, this requires AI edge processing to become at least 100 times more energy-efficient, while offering sufficient flexibility and scalability to deal with AI as a fast-moving target. Since the design space of these complex SoCs is huge, advanced tooling is needed to make their design tractable. The CONVOLVE project (currently in its initial stage) addresses these roadblocks. It takes a holistic approach with innovations at all levels of the design hierarchy. Starting with an overview of SOTA DL processing support and our project methodology, this paper presents 8 important design choices largely impacting the energy efficiency and flexibility of DL hardware. Finding good solutions is key to making smart-edge computing a reality. |
11:18 CET | MPP1.7 | VE-FIDES: DESIGNING TRUSTWORTHY SUPPLY CHAINS USING INNOVATIVE FINGERPRINTING IMPLEMENTATIONS Speaker: Bernhard Lippmann, Infineon Technologies, DE Authors: Bernhard Lippmann1, Joel Hatsch1, Stefan Seidl1, Detlef Houdeau1, Niranjana Papagudi Subrahmanyam2, Daniel Schneider3, Malek Safieh3, Anne Passarelli3, Aliza Maftun3, Michaela Brunner4, Tim Music4, Michael Pehl4, Tauseef Siddiqui4, Ralf Brederlow5, Ulf Schlichtmann4, Bjoern Driemeyer6, Maurits Ortmanns7, Robert Hesselbarth8 and Matthias Hiller8 1Infineon Technologies, DE; 2Siemens AG, DE; 3Siemens, DE; 4TU Munich, DE; 5TUM School of EDA, DE; 6Uni Ulm, DE; 7University of Ulm, DE; 8Fraunhofer AISEC, DE Abstract The VE-FIDES project will contribute a solution based on an innovative multi-level fingerprinting approach to secure electronics supply chains against the threats of malicious modification, piracy, and counterfeiting. Hardware fingerprints are derived from minuscule, unavoidable process variations using the technology of Physical Unclonable Functions (PUFs). The derived fingerprints are processed into a system fingerprint enabling unique identification, not only of single components but also at the PCB level. With the proposed concept, we show how the system fingerprint can enhance the trustworthiness of the overall system. For this purpose, the complete system, including tiny sensors, a secure element and its interface to the application, is considered in VE-FIDES. New insights into methodologies to derive component and system fingerprints are gained. These techniques for the verification of system integrity are complemented by methods for preventing reverse engineering. Two application scenarios are in the focus of VE-FIDES: industrial control systems and an automotive use case are considered, giving insights into a wide spectrum of requirements for products built from components provided by international supply chains. |
11:21 CET | MPP1.8 | INTERACTIVE TECHNICAL PRESENTATIONS BY THE AUTHORS Speaker: Authors of the session, DATE, BE Author: Session Chairs, DATE, BE Abstract Participants can freely interact with authors during their interactive technical presentations. |
LK1 IEEE CEDA Distinguished Lecturer Lunchtime Keynote
Date: Monday, 17 April 2023
Time: 13:00 CET - 14:00 CET
Location / Room: Darwin Hall
Session chair:
Gi-Joon Nam, IBM, IEEE-CEDA President, US
Session co-chair:
Robert Wille, TU Munich, DE
hosted by IEEE CEDA
Time | Label | Presentation Title / Authors |
---|---|---|
13:00 CET | LK1.1 | RESTORING THE MAGIC IN DESIGN Presenter: Jan Rabaey, IMEC / UC Berkeley, US Author: Jan Rabaey, IMEC / UC Berkeley, US Abstract The emergence of "Very Large Scale Integration (VLSI)" in the late 1970s created a groundswell of feverish innovation. Inspired by the vision laid out in Mead and Conway's "Introduction to VLSI Design", numerous researchers embarked on ventures to unleash the capabilities offered by integrated circuit technology. The introduction of design rules, separating manufacturing from design, combined with an intermediate abstraction language (CIF) and a silicon brokerage service (MOSIS), gave access to silicon for a large population of eager designers. The magic, however, expanded well beyond these circuit enthusiasts and attracted a whole generation of software experts to help automate the design process, giving rise to concepts such as layout generation, logic synthesis, and silicon compilation. It is hard to overestimate the impact that this revolution has had on information technology and society at large. About fifty years later, Integrated Circuits are everywhere. Yet, the process of creating these amazing devices feels somewhat tired. CMOS scaling, the engine behind the evolution in complexity over all these decades, is slowing down and will most likely peter out in about a decade. So has innovation in design tools and methodologies. As a consequence, the lure of IC design and design tool development has faded, causing a talent shortage worldwide. Yet, at the same time, this moment of transition offers a world of opportunity and excitement. Novel technologies and devices, integrated in three-dimensional artifacts, are emerging and are opening the door for truly transformational applications such as brain-machine interfaces and swarms of nanobots. Machine learning, artificial intelligence, optical and quantum computing present novel models of computation surpassing the instruction-set processor paradigm. With this comes a need again to re-invent the design process, explicitly exploiting the capabilities offered by this next generation of computing systems. In summary, it is time to put the magic in design again. |
ASD2 ASD special session: Information Processing Factory, Take Two on Self-Aware Systems of MPSoCs
Date: Monday, 17 April 2023
Time: 14:00 CET - 15:30 CET
Location / Room: Gorilla Room 1.5.4/5
Session chair:
Bryan Donyanavard, San Diego State University, US
Session co-chair:
Smail Niar, UPHF, FR
The Information Processing Factory (IPF) project is a collaboration between research teams in the US (UC Irvine) and Germany (TU Munich and TU Braunschweig) looking into self-aware MPSoCs. IPF 1.0 was first introduced at ESWEEK 2016 as a paradigm to master complex dependable systems. The IPF paradigm applies principles inspired by factory management to the continuous operation and optimization of highly-integrated embedded systems. IPF 2.0 is an extension of IPF for recent data-centric approaches and decentralization methodologies. While an IPF 1.0 system can operate independently, IPF 2.0 has a system-of-systems structure in which several IPF 1.0 “factories” interact, thus providing an additional layer of abstraction aimed at this data-centric approach. It horizontally extends core concepts such as self-optimization, self-construction, and runtime verification, while maintaining the strengths of the existing IPF methodology. Four talks in this session highlight the various concepts in IPF 2.0, illustrated through a truck platooning exemplar.
The talks outline the challenges introduced when moving from self-organizing local systems in IPF 1.0 to autonomous systems collaboration in IPF 2.0, using commercial vehicle platooning as a use case. The first talk explains how the self-aware truck control systems collaborate towards a platoon-level runtime verification that continuously supervises the state of a platoon, even under a changing platoon formation and external disturbance, e.g., by intersecting traffic participants. The second talk outlines the challenges related to managing enormous amounts of dynamic data in the system, and discusses how self-aware caching can help in mastering the resulting communication and data management requirements. The third talk proposes approaches to mitigate the energy cost of data management across multiple systems. The fourth talk addresses lack of explainability in the underlying machine learning technology in collaborative autonomous systems.
Time | Label | Presentation Title / Authors |
---|---|---|
14:00 CET | ASD2.1 | TRUST, BUT VERIFY: TOWARDS SELF-AWARE, SAFE, AUTONOMOUS SELF-DRIVING SYSTEMS Presenter: Fadi Kurdahi, University of California, Irvine, US Author: Fadi Kurdahi, University of California, Irvine, US Abstract . |
14:22 CET | ASD2.2 | VEHICLE AS A CACHE – A DATA CENTRIC PLATFORM FOR THE IPF PARADIGM Presenter: Rolf Ernst, TU Braunschweig, DE Author: Rolf Ernst, TU Braunschweig, DE Abstract . |
14:45 CET | ASD2.3 | COMPUTATIONAL SELF-AWARENESS FOR ENERGY-EFFICIENT MEMORY SYSTEMS Presenter: Nikil Dutt, UC Irvine, US Author: Nikil Dutt, UC Irvine, US Abstract . |
15:07 CET | ASD2.4 | LEARNING CLASSIFIER TABLES - TURNING ML DECISION MAKING EXPLAINABLE Presenter: Andreas Herkersdorf, TU Munich, DE Author: Andreas Herkersdorf, TU Munich, DE Abstract . |
BPA6 Logic synthesis and verification
Date: Monday, 17 April 2023
Time: 14:00 CET - 16:00 CET
Location / Room: Okapi Room 0.8.2
Session chair:
Rolf Drechsler, Bremen University, DE
Time | Label | Presentation Title / Authors |
---|---|---|
14:00 CET | BPA6.1 | COMPUTING EFFECTIVE RESISTANCES ON LARGE GRAPHS BASED ON APPROXIMATE INVERSE OF CHOLESKY FACTOR Speaker: Zhiqiang Liu, Tsinghua University, CN Authors: Zhiqiang Liu and Wenjian Yu, Tsinghua University, CN Abstract Effective resistance, which originates from the field of circuits analysis, is an important graph distance in spectral graph theory. It has found numerous applications in various areas, such as graph data mining, spectral graph sparsification, circuits simulation, etc. However, computing effective resistances accurately can be intractable and we still lack efficient methods for estimating effective resistances on large graphs. In this work, we propose an efficient algorithm to compute effective resistances on general weighted graphs, based on a sparse approximate inverse technique. Compared with a recent competitor, the proposed algorithm shows several hundreds of speedups and also one to two orders of magnitude improvement in the accuracy of results. Incorporating the proposed algorithm with the graph sparsification based power grid (PG) reduction framework, we develop a fast PG reduction method, which achieves an average 6.4X speedup in the reduction time without loss of reduction accuracy. In the applications of power grid transient analysis and DC incremental analysis, the proposed method enables 1.7X and 2.5X speedup of overall time compared to using the PG reduction based on accurate effective resistances, without increase in the error of solution. |
14:25 CET | BPA6.2 | FANOUT-BOUNDED LOGIC SYNTHESIS FOR EMERGING TECHNOLOGIES - A TOP-DOWN APPROACH Speaker: Dewmini Sudara Marakkalage, EPFL, CH Authors: Dewmini Marakkalage and Giovanni De Micheli, EPFL, CH Abstract In logic circuits, the number of fanouts a gate can drive is limited, and such limits are tighter in emerging technologies such as superconducting electronic circuits. In this work, we study the problem of resynthesizing a logic network with bounded-fanout gates while minimizing area. We 1) formulate this problem for a fixed target logic depth as an integer linear program (ILP) and present exact solutions for small logic networks, and 2) propose a top-down approach to construct a feasible solution to the ILP which yields an efficient algorithm for fanout bounded synthesis. When using the minimum depth achievable with unbounded fanouts as the target logic depth, our top-down approach achieves 11.82% better area as compared to the state-of-the-art with matching or better delays. |
14:50 CET | BPA6.3 | SYNTHESIS WITH EXPLICIT DEPENDENCIES Speaker: Priyanka Golia, National University of Singapore and Indian Institute of Technology Kanpur, SG Authors: Priyanka Golia1, Subhajit Roy2 and Kuldeep S Meel3 1IIT Kanpur and NUS Singapore, SG; 2IIT Kanpur, IN; 3National University of Singapore, SG Abstract Quantified Boolean Formulas (QBF) extend propositional logic with universal (∀) and existential (∃) quantification over propositional variables. In QBF, an existentially quantified variable is allowed to depend on all universally quantified variables in its scope. Dependency Quantified Boolean Formulas (DQBF) restrict the dependencies of existentially quantified variables. In DQBF, existentially quantified variables have explicit dependencies on a subset of universally quantified variables, called Henkin dependencies. Given a Boolean specification between the set of inputs and outputs, the problem of Henkin synthesis is to synthesize each output variable as a function of its Henkin dependencies such that the specification is met. Henkin synthesis has wide-ranging applications, including verification of partial circuits, controller synthesis, and circuit realizability. In this work, we propose a data-driven approach for Henkin synthesis called Manthan3. On an extensive evaluation of over 563 instances arising from past DQBF solving competitions, we demonstrate that Manthan3 is competitive with state-of-the-art tools. Furthermore, Manthan3 solves 26 benchmarks that none of the current state-of-the-art techniques could solve. |
15:15 CET | BPA6.4 | INTERACTIVE TECHNICAL PRESENTATIONS BY THE AUTHORS Speaker: Authors of the session, DATE, BE Author: Session Chairs, DATE, BE Abstract Participants can freely interact with authors during their interactive technical presentations. |
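For readers less familiar with the quantity that BPA6.1 approximates at scale, the following is a minimal sketch of the textbook definition of effective resistance, computed directly from the pseudoinverse of the graph Laplacian. It is an illustration only, not the authors' sparse approximate-inverse algorithm, and the small example graph is hypothetical.

```python
# Minimal sketch: exact effective resistance between nodes u and v of a small
# weighted graph, from the Moore-Penrose pseudoinverse of the graph Laplacian.
# This is the textbook definition, not the approximate-Cholesky method of BPA6.1.
import numpy as np

def effective_resistance(adjacency: np.ndarray, u: int, v: int) -> float:
    """adjacency: symmetric matrix of non-negative edge weights (conductances)."""
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    l_pinv = np.linalg.pinv(laplacian)
    e = np.zeros(adjacency.shape[0])
    e[u], e[v] = 1.0, -1.0
    return float(e @ l_pinv @ e)  # R_eff(u, v) = (e_u - e_v)^T L^+ (e_u - e_v)

# Hypothetical 3-node chain with unit conductances: the end-to-end effective
# resistance is the series sum 1 + 1 = 2.
chain = np.array([[0.0, 1.0, 0.0],
                  [1.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0]])
print(effective_resistance(chain, 0, 2))  # ~ 2.0
```

The dense pseudoinverse makes this exact version cubic in the number of nodes, which is precisely the scalability limit that approximate methods such as the one in BPA6.1 address.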
BPA9 Memory-centric computing
Add this session to my calendar
Date: Monday, 17 April 2023
Time: 14:00 CET - 16:00 CET
Location / Room: Gorilla Room 1.5.1
Session chair:
Said Hamdioui, TU Delft, NL
Time | Label | Presentation Title Authors |
---|---|---|
14:00 CET | BPA9.1 | MINIMIZING COMMUNICATION CONFLICTS IN NETWORK-ON-CHIP BASED PROCESSING-IN-MEMORY ARCHITECTURE Speaker: Hanbo Sun, Tsinghua University, CN Authors: Hanbo Sun1, Tongxin Xie1, Zhenhua Zhu1, Guohao Dai2, Huazhong Yang1 and Yu Wang1 1Tsinghua University, CN; 2Shanghai Jiao Tong University, CN Abstract Deep Neural Networks (DNNs) have made significant breakthroughs in various fields. However, their enormous computation and parameter requirements seriously hinder their application. Emerging Processing-In-Memory (PIM) architectures provide extremely high energy efficiency for accelerating DNN computing, and Network-on-Chip (NoC) based PIM architectures significantly improve the scalability of PIM architectures. However, the contradiction between high communication volume and limited NoC bandwidth introduces severe communication conflicts, which existing work neglects. On the one hand, neglecting communication conflicts leads to imprecise performance estimations in the mapping process, making it hard to find optimal results. On the other hand, communication conflicts cause low NoC bandwidth utilization in the scheduling process, and they account for a latency gap of over 70% in existing work. This paper proposes communication-conflict-optimized mapping and scheduling strategies for NoC based PIM architectures. The proposed mapping strategy constructs communication conflict graphs to model communication conflicts; based on this graph, we adopt a Graph Neural Network (GNN) as a precise performance estimator. Our scheduling strategy predefines the communication priority and NoC communication behavior tables for target DNN workloads, which improves NoC bandwidth utilization effectively. Compared with existing work, for typical classification DNNs on the CIFAR and ImageNet datasets, the proposed strategies reduce latency by 78% and improve throughput by 3.33x on average with negligible deployment and hardware overhead. Experimental results also show that our strategies decrease the average gaps to the ideal cases without communication conflicts from 80.7% and 70% to 12.3% and 1.26% for latency and throughput, respectively. |
14:25 CET | BPA9.2 | HIERARCHICAL NON-STRUCTURED PRUNING FOR COMPUTING-IN-MEMORY ACCELERATORS WITH REDUCED ADC RESOLUTION REQUIREMENT Speaker: Wenlu Xue, Beihang University, CN Authors: Wenlu Xue1, Jinyu Bai2, Sifan Sun3 and Wang Kang2 1Beihang University, CN; 2Beihang University, CN; 3Beihang University, CN Abstract The crossbar architecture, which is composed of novel nano-devices, enables high-speed and energy-efficient computing-in-memory (CIM) for neural networks. However, the overhead from analog-to-digital converters (ADCs) substantially degrades the energy efficiency of CIM accelerators. In this paper, we introduce a hierarchical non-structured pruning strategy in which value-level and bit-level pruning are performed jointly on neural networks to reduce the required ADC resolution, using the well-known alternating direction method of multipliers (ADMM). To verify its effectiveness, we applied the proposed method to a variety of state-of-the-art convolutional neural networks on two image classification benchmark datasets: CIFAR10 and ImageNet. The results show that our pruning method can reduce the required ADC resolution to 2 or 3 bits with only a slight accuracy loss (∼0.25%), and can thus improve hardware efficiency by 180%. (A generic sketch of the ADMM pruning update follows this session's table.) |
14:50 CET | BPA9.3 | PIC-RAM: PROCESS-INVARIANT CAPACITIVE MULTIPLIER BASED ANALOG IN MEMORY COMPUTING IN 6T SRAM Speaker: Kailash Prasad, IIT Gandhinagar, IN Authors: Kailash Prasad, Aditya Biswas, Arpita Kabra and Joycee Mekie, IIT Gandhinagar, IN Abstract In-Memory Computing (IMC) is a promising approach to enabling energy-efficient Deep Neural Network-based applications on edge devices. However, analog-domain dot products and multiplications suffer accuracy loss due to process variations. Furthermore, wordline degradation limits the minimum wordline pulse width, creating additional non-linearity and limiting IMC's dynamic range and precision. This work presents a complete end-to-end process-invariant capacitive-multiplier-based IMC in 6T SRAM (PIC-RAM). The proposed architecture employs the novel idea of two-step multiplication in column-major IMC to support 4-bit multiplication. PIC-RAM uses an operational-amplifier-based capacitive multiplier to reduce bitline discharge, allowing a sufficiently wide wordline (WL) pulse. Further, it employs a process-tracking voltage reference and a fuse capacitor to tackle dynamic and post-fabrication process variations, respectively. Our design is free of compute disturbance and provides a high dynamic range. To the best of our knowledge, PIC-RAM is the first analog SRAM IMC approach to tackle process variation with a focus on practical implementation. PIC-RAM has a high energy efficiency of about 25.6 TOPS/W for 4-bit x 4-bit multiplication and incurs only 0.5% area overhead due to the capacitance multiplier. We obtain 409 bit-wise TOPS/W, which is about 2X better than the state of the art. For 4-bit x 4-bit multiplication, PIC-RAM achieves TOP-1 accuracies of 89.54% and 98.80% for ResNet-18 on CIFAR10 and MNIST, respectively. |
15:15 CET | BPA9.4 | INTERACTIVE TECHNICAL PRESENTATIONS BY THE AUTHORS Speaker: Authors of the session, DATE, BE Author: Session Chairs, DATE, BE Abstract Participants can freely interact with authors during their interactive technical presentations. |
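As background for BPA9.2, the following is a minimal sketch of one iteration of the classic ADMM weight-pruning update that the abstract refers to. It covers value-level pruning only; the paper's joint value/bit-level scheme and its coupling to ADC resolution are not reproduced, and the hyper-parameters (lr, rho, sparsity) are illustrative assumptions.

```python
# Minimal sketch of one ADMM pruning iteration (value-level only), assuming a
# loss-gradient callback. Not the joint value/bit-level method of BPA9.2.
import numpy as np

def project_to_sparse(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out a `sparsity` fraction of entries, keeping the largest magnitudes."""
    k = int(round(sparsity * w.size))
    if k == 0:
        return w.copy()
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    z = w.copy()
    z[np.abs(z) <= thresh] = 0.0
    return z

def admm_prune_step(w, z, u, grad_loss, lr=1e-2, rho=1e-3, sparsity=0.9):
    # 1) primal step on the augmented loss  f(W) + (rho/2)||W - Z + U||^2
    w = w - lr * (grad_loss(w) + rho * (w - z + u))
    # 2) projection step: Z is the closest sufficiently sparse tensor to W + U
    z = project_to_sparse(w + u, sparsity)
    # 3) dual update
    u = u + w - z
    return w, z, u
```

In a full training flow, the primal step would typically consist of several mini-batch SGD updates on the augmented loss before the projection and dual updates are applied.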
CF1.1 Careers Fair – Company Presentations
Add this session to my calendar
Date: Monday, 17 April 2023
Time: 14:00 CET - 14:45 CET
Location / Room: Marble Hall
Session chair:
Anton Klotz, Cadence Design Systems, DE
This is a Young People Programme event. During the Company Presentation Session, participating companies will introduce themselves and explain their business and working environment. Presenting companies include Cadence Design Systems, Synopsys, imec, X-FAB, Bosch, ICSense, Springer Nature, Siemens and RacyICs.
Time | Label | Presentation Title Authors |
---|---|---|
14:05 CET | CF1.1.1 | INTRODUCING CADENCE DESIGN SYSTEMS Presenter: Ben Woods, Cadence Design Systems, IE Author: Ben Woods, Cadence Design Systems, IE Abstract . |
14:12 CET | CF1.1.2 | INTRODUCING SYNOPSYS Presenter: Xander Bergen Henegouwen, Synopsys, NL Author: Xander Bergen Henegouwen, Synopsys, NL Abstract . |
14:19 CET | CF1.1.3 | INTRODUCING BOSCH Presenter: Matthias Kühnle, Bosch, DE Author: Matthias Kühnle, Bosch, DE Abstract . |
14:26 CET | CF1.1.4 | INTRODUCING XFAB Presenter: Rachid Hamani, X-FAB, FR Author: Rachid Hamani, X-FAB, FR Abstract . |
14:32 CET | CF1.1.5 | INTRODUCING RACYICS Presenter: Florian Bilstein, RacyICs, DE Author: Florian Bilstein, RacyICs, DE Abstract . |
14:38 CET | CF1.1.6 | INTRODUCING SIEMENS Presenter: Jaclyn Krieger, Siemens, DE Author: Jaclyn Krieger, Siemens, DE Abstract . |
FS3 Focus session: Integrated Photonics, a key technology for the future of semiconductor-based systems
Add this session to my calendar
Date: Monday, 17 April 2023
Time: 14:00 CET - 15:30 CET
Location / Room: Okapi Room 0.8.1
Session chair:
Twan Korthorst, Synopsys, NL
In this session, we will discuss silicon/integrated photonics technology and devices, their current use in pluggable optical transceivers, and the roadmap towards optical I/O for high-performance compute, programmable photonics, and optical accelerators for AI/ML. The basics of light, optics and integrated photonics, as well as design tool requirements, solutions and trends, will be discussed, both for photonic ICs and for 3DIC and 3DHI multi-die/multi-domain systems.
Time | Label | Presentation Title Authors |
---|---|---|
14:00 CET | FS3.1 | DESIGN SOLUTIONS FOR PHOTONIC ICS AND HETEROGENEOUS SYSTEMS Presenter: Twan Korthorst, Synopsys, NL Author: Twan Korthorst, Synopsys, NL Abstract This presentation will introduce integrated silicon photonics technology and applications, and dive into more detail on the required design tools and solutions, not only for photonic devices and circuits, but also in the context of a 3D heterogeneously integrated system with digital and analog electrical chips, an interposer and photonic ICs. |
14:30 CET | FS3.2 | PUTTING THE LASER IN SILICON: HETEROGENEOUS OR HYBRID INTEGRATION Presenter: Martijn Heck, Eindhoven University of Technology, NL Author: Martijn Heck, Eindhoven University of Technology, NL Abstract Silicon is an important material for photonic integration, due to its compatibility with mature manufacturing infrastructure. However, owing to its indirect bandgap, it has no native lasers and amplifiers available, which severely limits its applications and increases packaging costs and challenges. In this talk, I will outline the options for laser integration and the associated challenges with respect to the design of such hybrid or heterogeneous photonic integrated circuits. |
15:00 CET | FS3.3 | THE CRUCIAL ROLE OF INTEGRATED PHOTONICS IN THE EVOLUTION TOWARDS LOW-ENERGY OPEN AND PROGRAMMABLE OPTICAL NETWORKS Presenter: Vittorio Curri, Politecnico di Torino, IT Author: Vittorio Curri, Politecnico di Torino, IT Abstract Networking technologies are evolving fast to support the demand for ubiquitous Internet access, which is becoming a fundamental need of a modern and inclusive society. This evolution requires networks to develop into open, disaggregated and programmable systems according to the software-defined networking (SDN) paradigm. To enable it, infrastructure control must be separated from the data networking operations performed by the transceivers (TRXs) for optical circuit deployment and by the optical switches for transparent lightpath routing. Moreover, reducing space occupation and power consumption is a fundamental need. Integrated photonics is the crucial technology to enable this evolution. For TRXs, all commercial solutions have already adopted such technologies, enabling pluggable TRXs currently operating at data rates up to 1.2 Tbps/wavelength with substantial reductions in space occupation and power consumption, besides cost. For switching, solutions are largely still at the prototype level, although preliminary solutions are already available on the market. We will discuss the different integrated photonics solutions, focusing on their potentially revolutionary impact on optical networking. |
LKS2 Later … with the keynote speakers
Add this session to my calendar
Date: Monday, 17 April 2023
Time: 14:00 CET - 15:30 CET
Location / Room: Darwin Hall
Session chair:
Gi-Joon Nam, IBM, IEEE-CEDA President, US
Session co-chair:
Robert Wille, TU Munich, DE
W01 Eco-ES: Eco-design and circular economy of Electronic Systems
Add this session to my calendar
Date: Monday, 17 April 2023
Time: 14:00 CET - 18:00 CET
Location / Room: Nightingale Room 2.6.1/2
Organisers:
Chiara Sandionigi, CEA, FR
Jean-Christophe Crebier, CNRS/G-INP/UGA, FR
Jonas Gustafsson, RISE, SE
David Bol, UCLouvain, BE
The environmental impact of electronics is becoming an important issue, especially because the number of systems is growing exponentially. Eco-design and circular economy applied to Electronic Systems are thus becoming major challenges for our society in responding to the dangers for the environment: the exponential increase in electronic waste generation, the depletion of resources, the contribution to climate change and poor resilience to supply-chain issues. Electronic Systems designers willing to engage in eco-design face several difficulties, related in particular to limited knowledge of the environmental impact at the design phase and to the uncertain extension of the service lifetime of the system, or parts of it, owing to the variability in user behaviour and business models.
At DATE 2023, the workshop Eco-ES is devoted to Eco-design and circular economy of Electronic Systems. The objective of Eco-ES is to gather experts from both academia and industry, covering a wide scope in the environmental sustainability of Electronic Systems. Besides regular sessions with talks, a debate panel will offer a place for the audience to discuss and share ideas.
Workshop topics include:
- Specification and modelling of sustainable Electronic Systems
- Life Cycle Assessment tools and techniques
- Electronic Design Automation tools for eco-design
- Design Space Exploration including environmental aspects
- Eco-reliability techniques to design sustainable systems with extended lifetime
- Reparability methods
- Reuse strategies
- Recycling of Electronic Systems
- Refurbish for a second life of the products
- Sustainable cloud computing and datacenters
- Inter-disciplinary works linking the technology aspects of eco-design and circular economy to social and economic sciences
W01.1 Workshop introduction and Keynote
Add this session to my calendar
Date: Monday, 17 April 2023
Time: 14:00 CET - 14:40 CET
Location / Room: Nightingale Room 2.6.1/2
14:00 - 14:10: Workshop introduction
Chiara Sandionigi, CEA, France
14:10 - 14:40: Transitioning to a Circular Economy for Greener Electronic Systems
Manuel Rei, 3DS, France
W01.2 Circular economy for Electronic Systems
Add this session to my calendar
Date: Monday, 17 April 2023
Time: 14:40 CET - 15:40 CET
Location / Room: Nightingale Room 2.6.1/2
Session chair:
Jonas Gustafsson, RISE, SE
14:40 - 15:00: The environmental footprint of semiconductor manufacturing
Cédric Rolin, IMEC, Belgium
15:00 - 15:20: Ecodesign engineering sticking to actual end-of-life operations
Marc Heude, Thales, France
15:20 - 15:40: A circular economy approach for strategic metals in electronics
Serge Kimbel, Weeecycling, France
W01.3 Poster session & Coffee break
Add this session to my calendar
Date: Monday, 17 April 2023
Time: 15:40 CET - 16:15 CET
Location / Room: Nightingale Room 2.6.1/2
- Aniah: A chip design methodology for Eco-design (Aniah)
- Repair, refurbishment and recycling of electronic devices with Lithium-ion batteries (DTI)
- Energy-efficient hardware reuse for sustainable data centers (LIRMM)
- EECONE: European Ecosystem for green Electronics
- Eco-innovation for Digital Systems and Integrated Circuits (CEA)
W01.4 Open call talks
Add this session to my calendar
Date: Monday, 17 April 2023
Time: 16:15 CET - 17:15 CET
Location / Room: Nightingale Room 2.6.1/2
Session chair:
Jonas Gustafsson, RISE, SE
16:15 - 16:35: Sustainability analysis of indium phosphide technologies for RF applications
Benjamin Vanhouche, IMEC, Belgium
16:35 - 16:55: Eco-design and optimization of the edge cloud
Jonas Gustafsson, RISE, Sweden
16:55 - 17:15: Twinning digital ICT products: the digital product passport
Leandro Navarro, Universitat Politècnica de Catalunya, Spain
W01.5 Debate panel
Add this session to my calendar
Date: Monday, 17 April 2023
Time: 17:15 CET - 18:00 CET
Location / Room: Nightingale Room 2.6.1/2
Session chairs:
David Bol, UC Louvain, BE
Chiara Sandionigi, CEA, FR
This session provides the audience with a place to debate eco-design, circular economy and the end-of-life of electronic systems.
Invited speakers:
- Manuel Rei, 3DS, France
- Cédric Rolin, IMEC, Belgium
- Marc Heude, Thales, France
- Serge Kimbel, Weeecycling, France
W04 3rd Workshop Open-Source Design Automation (OSDA 2023)
Add this session to my calendar
Date: Monday, 17 April 2023
Time: 14:00 CET - 18:00 CET
Location / Room: Okapi Room 0.8.3
Organisers:
Christian Krieg, TU Wien, AT
Claire Xenia Wolf, YosysHQ, AT
Andrea Borga, oliscience, NL
OSDA intends to provide an avenue for industry, academics, and hobbyists to collaborate, network, and share their latest visions and open-source contributions, with a view to promoting reproducibility and re-usability in the design automation space. DATE provides the ideal venue to reach this audience since it is the flagship European conference in this field. This is particularly pertinent given the recent efforts across the European Union (and beyond) that mandate "open access" for publicly funded research, covering both published manuscripts and the software code necessary for reproducing their conclusions.
We invited authors of major tools and flows to talk about their recent activities to promote open-source hardware and open-source design automation. Below you will find the list of speakers who have already kindly accepted our invitation. The list is not yet complete, so hang on and watch out for updates!
The list is given in alphabetical order.
- Andrew Kahng (OpenROAD), University of California San Diego, USA
- Antonino Tumeo (SODA Synthesizer), Pacific Northwest National Laboratory (PNNL), USA
- Claire Xenia Wolf (Yosys), YosysHQ, Austria
- Frans Skarman (Spade), Linköping University, Sweden
- Jean-Paul Chaput (Coriolis2), Sorbonne Université, France
- Jim Lewis (OSVVM), SynthWorks, USA
- Larry Doolittle (vhd2vl), Lawrence Berkeley National Labs, USA
- Matthew Guthaus (OpenRAM), University of California Santa Cruz, USA
- Myrtle Shah (nextpnr, FABulous), Heidelberg University, Germany
- Rishiyur Nikhil (BSV and BH), Bluespec Inc., USA
- Tim Edwards (Caravel), Efabless, Inc., USA
- Tristan Gingold (GHDL), CERN, Switzerland
- Tsung-Wei Huang (TaskFlow), University of Utah, USA
A secondary objective of this workshop is to provide a peer-reviewed forum for researchers to publish “enabling” technology such as infrastructure or tooling as open-source contributions -- standalone technology that would not normally be regarded as novel by traditional conferences -- such that others inside and outside of academia may build upon it.
W04.1 Welcome Session
Add this session to my calendar
Date: Monday, 17 April 2023
Time: 14:00 CET - 14:15 CET
Location / Room: Okapi Room 0.8.3
Workshop opening and poster pitch
W04.2 Front-end and Applications
Add this session to my calendar
Date: Monday, 17 April 2023
Time: 14:15 CET - 16:00 CET
Location / Room: Okapi Room 0.8.3
Time | Label | Presentation Title Authors |
---|---|---|
14:15 CET | W04.2.1 | LARRY DOOLITTLE Abstract vhd2vl is a simple and open-source stand-alone program that converts synthesizable VHDL to Verilog. While it has plenty of limitations, it has proved useful to many developers since its start in 2004. This talk will cover its strengths, weaknesses, and alternatives. |
14:30 CET | W04.2.2 | ANTONINO TUMEO Abstract This talk presents the SODA (Software Defined Accelerators) framework, an open-source modular, multi-level, no-human-in-the-loop, hardware compiler that enables end-to-end generation of specialized accelerators from high-level data science frameworks. SODA is composed of SODA-Opt, a high-level frontend developed in MLIR that interfaces with domain-specific programming environments and allows performing system level design, and Bambu, a state-of-the-art high-level synthesis (HLS) engine that can target different device technologies. The framework implements design space exploration as compiler optimization passes. We show how the modular, yet tight, integration of the high-level optimizer and lower-level HLS tools enables the generation of accelerators optimized for the computational patterns of novel "converged" applications. We then discuss some of the research opportunities that such an open-source framework allows. |
14:45 CET | W04.2.3 | MATTHEW GUTHAUS Abstract In this talk, Prof. Guthaus presents the current status of the OpenRAM project including Skywater 130 tape-out results. In addition, Prof. Guthaus will discuss the future roadmap of the OpenRAM project features and support for newer technologies. |
15:00 CET | W04.2.4 | TSUNG-WEI HUANG Abstract Today's EDA algorithms demand large parallel and heterogeneous computing resources for performance. However, writing parallel EDA algorithms is extremely challenging due to highly complex and irregular patterns. This talk will present a novel programming system to help tackle the parallelization challenges of building high-performance EDA algorithms. |
15:15 CET | W04.2.5 | TIM EDWARDS Abstract This talk explores how hardware projects designed using an open source PDK rely too much on precise data which may not be available, and how problems can be avoided by certain design methodologies such as two-phase clocking, negative-edge clocking, margining, and monte carlo simulation. While open PDK data can be made more reliable by cross validation with multiple tools and, ultimately, measurement, good design practices can achieve working silicon without absolute certainty. |
15:30 CET | W04.2.6 | RISHIYUR NIKHIL Abstract BSV and BH, the Bluespec HLHDLs (High-Level Languages for Hardware Design), emerged from ideas in formal specification (Term Rewriting Systems), functional programming (Haskell), and automatic synthesis of RTL from specifications. BSV has been used in some major commercial ASIC designs and is used widely in FPGA projects. The BSV/BH compiler (written in Haskell) was open-sourced in 2020 (https://github.com/B-Lang-org/bsc) and today's projects are centered around RISC-V design and verification, and on accelerators. |
15:45 CET | W04.2.7 | FRANS SKARMAN Abstract Frans will present Spade, a new open source standalone hardware description language. He will show how Spade's abstractions and tooling, which is inspired by software languages, improves the productivity of an HDL without sacrificing low level control. |
W04.3 Poster Session (coffee break)
Add this session to my calendar
Date: Monday, 17 April 2023
Time: 16:00 CET - 16:30 CET
Location / Room: Okapi Room 0.8.3
- Davide Cieri, Nicolò Vladi Biesuz, Rimsky Alejandro Rojas Caballero, Francesco Gonnella, Nico Giangiacomi, Guillermo Loustau De Linares and Andrew Peck: Hog 2023.1: a collaborative management tool to handle Git-based HDL repository
- Lucas Klemmer and Daniel Grosse: Programming Language Assisted Waveform Analysis: A Case Study on the Instruction Performance of SERV
- Vamsi Vytla and Larry Doolittle: Newad: A register map automation tool for Verilog
- Stefan Riesenberger and Christian Krieg: Towards Power Characterization of FPGA Architectures To Enable Open-Source Power Estimation Using Micro-Benchmarks
W04.4 Back-End and Verification
Add this session to my calendar
Date: Monday, 17 April 2023
Time: 16:30 CET - 18:00 CET
Location / Room: Okapi Room 0.8.3
Time | Label | Presentation Title Authors |
---|---|---|
16:30 CET | W04.4.1 | ANDREW KAHNG Abstract OpenROAD (https://theopenroadproject.org) is an open-source RTL-to-GDS tool that generates manufacturable layout from a given hardware description – in 24 hours, at advanced foundry nodes. OpenROAD lowers the cost, expertise and schedule barriers to hardware design, thus providing a platform for research, education and system innovation. This talk will present current status of the OpenROAD project and the roadmap for OpenROAD as it seeks to enable VLSI/EDA education, early design space exploration for system designers, research on machine learning in EDA, and more. |
16:45 CET | W04.4.2 | JEAN-PAUL CHAPUT Abstract The talk will focus on two major points: why Open Hardware is as important as Open Source Software, and the major challenges in building FOSS EDA tools. |
17:00 CET | W04.4.3 | MYRTLE SHAH Abstract Myrtle will introduce some of the recent developments in nextpnr; including easier ways of prototyping new architectures as well as some core algorithm improvements. They will also introduce FABulous, a highly flexible open source eFPGA fabric generator, and its close integration with nextpnr. |
17:15 CET | W04.4.4 | TRISTAN GINGOLD Abstract GHDL is an open-source VHDL simulator and synthesis tool. This talk will present the recently added features and some ideas for future development (in particular, mixed simulation). |
17:30 CET | W04.4.5 | JIM LEWIS Abstract Open Source VHDL Verification Methodology (OSVVM) provides VHDL with buzz word verification capabilities including Transaction Level Modeling, Constrained Random, Functional Coverage, Scoreboards, FIFOs, Memory Models, Error and Message handling, and Test Reporting that are simple to use and feel like built-in language features. OSVVM has grown rapidly during the COVID years, giving us better capability, better test reporting (HTML and Junit), and scripting that is simple to use (and works with most VHDL simulators). This presentation shows how these advances fit into the overall OSVVM Methodology. |
17:45 CET | W04.4.6 | CLAIRE XENIA WOLF Abstract In her talk, Claire will discuss recent developments in open-source verification tools. Claire will briefly present equivalence checking with Yosys (EQY) and mutation cover with Yosys (MCY), and will highlight potential future directions. |
CF1.2 Careers Fair – Panel on Industry Career Perspectives
Add this session to my calendar
Date: Monday, 17 April 2023
Time: 14:45 CET - 15:30 CET
Location / Room: Marble Hall
Session chair:
Oliver Bringmann, University of Tübingen, DE
Panellists:
Heinz Riener, Cadence Design Systems, DE
Björn Hartmann, Synopsys, DE
Johannes Sanwald, Robert Bosch GmbH, DE
Alessandro Brunetti, iQrypto, BE
Presenter:
Heinz Riener, Cadence Design Systems, DE
This is a Young People Programme event. At the Panel on Industry Career Perspectives, Young Professionals from Companies and startups will talk about their experience changing from academia to industry or starting a startup.
CF2 Careers Fair – Speed Dating
Add this session to my calendar
Date: Monday, 17 April 2023
Time: 15:30 CET - 16:00 CET
Location / Room: Marble Hall
Session chair:
Anton Klotz, Cadence Design Systems, DE
This is a Young People Programme event. At the Speed Dating event, attendees of the Young People Programme can meet the recruiters and exchange business cards and CVs. Recruiters from Cadence Design Systems, X-FAB, Synopsys and Bosch will attend.
CF3 Careers Fair – Academia
Add this session to my calendar
Date: Monday, 17 April 2023
Time: 16:00 CET - 17:30 CET
Location / Room: Marble Hall
Session chair:
Nele Mentens, KU Leuven, BE
Careers Fair Academia brings together researchers from academia who have open positions and enthusiastic students who are looking for a position in academia. The academics can present their exciting research plans, and students can get in touch with them to learn more. In addition, you will get the chance to hear about different academic career paths across Europe: opportunities, challenges, similarities, and differences. Our panelists, Prof. Diana Goehringer (TU Dresden), Prof. Ahmed Hemani (KTH), Prof. Alberto Bosio (ECL), and Prof. Lukas Sekanina (VUTBR), will share their valuable experiences and discuss any questions you may have about your future career path in academia.
Time | Label | Presentation Title Authors |
---|---|---|
16:00 CET | CF3.1 | OPEN POSITIONS Presenter: Careers Fair – Academia Participants, DATE, BE Author: Careers Fair – Academia Participants, DATE, BE Abstract Fair participants advertise new and upcoming research initiatives with academic open positions. |
CF3.2 | PANEL DISCUSSION Presenter: Careers Fair – Academia Panelists, DATE, BE Author: Careers Fair – Academia Panelists, DATE, BE Abstract Panel discussion on academic career paths in different countries. |
ASD3 ASD technical session: Autonomy for systems perception, control and optimization
Add this session to my calendar
Date: Monday, 17 April 2023
Time: 16:30 CET - 18:00 CET
Location / Room: Gorilla Room 1.5.4/5
Session chair:
Rolf Ernst, TU Braunschweig, DE
Time | Label | Presentation Title Authors |
---|---|---|
16:30 CET | ASD3.1 | AUTONOMOUS HYPERLOOP CONTROL ARCHITECTURE DESIGN USING MAPE-K Speaker: Julian Demicoli, TU Munich, DE Authors: Julian Demicoli, Laurin Prenzel and Sebastian Steinhorst, TU Munich, DE Abstract In recent years, passenger transport has trended towards vehicle electrification to reduce greenhouse gas emissions. However, due to the low energy density of battery technology, electrification of airplanes is not possible with current technologies. Here, Hyperloop systems can offer a climate-friendly alternative to short-haul flights, but technical challenges remain to be resolved. In contrast to conventional rail systems, the Hyperloop concept uses magnetic propulsion and levitation and has no physical contact with the environment. Consequently, mechanical backup solutions do not suffice to avoid catastrophic events in case of failure. Software solutions must therefore ensure fail-operational behavior, which requires autonomous adaptability to uncertain states. The MAPE-K approach offers a solution to achieve such adaptability. In this paper, we present a hierarchical architecture that combines the MAPE-K concept with the Simplex concept to achieve self-adaptive behavior. We apply our autonomous architecture to the controller design for the levitation system of a Hyperloop pod and show that this controller, designed using our methodology, outperforms a conventional PID controller by up to 76%. (A generic MAPE-K loop skeleton is sketched after this session's table.) |
16:53 CET | ASD3.2 | REINFORCEMENT-LEARNING-BASED JOB-SHOP SCHEDULING FOR INTELLIGENT INTERSECTION MANAGEMENT Speaker: Shao-Ching Huang, National Taiwan University, TW Authors: Shao-Ching Huang1, Kai-En Lin1, Cheng-Yen Kuo1, Li-Heng Lin1, Muhammed Sayin2 and Chung-Wei Lin1 1National Taiwan University, TW; 2Bilkent University, TR Abstract The goal of intersection management is to organize vehicles so that they pass the intersection safely and efficiently. With the technical advances in connected and autonomous vehicles, intersection management is becoming more intelligent and potentially unsignalized. In this paper, we propose a reinforcement-learning-based methodology to train a centralized intersection manager. We define the intersection scheduling problem with a graph-based model and transform it into the job-shop scheduling problem (JSSP) with additional constraints. To utilize reinforcement learning, we model the scheduling procedure as a Markov decision process (MDP) and train the agent with proximal policy optimization (PPO). A grouping strategy is also developed to apply the trained model to streams of vehicles. Experimental results show that the learning-based intersection manager is especially effective at high traffic densities. This paper is the first work in the literature to apply reinforcement learning to the graph-based model. The proposed methodology can flexibly deal with any conflicting scenario, indicating the applicability of reinforcement learning to intelligent intersection management. |
17:15 CET | ASD3.3 | BIO-INSPIRED AUTONOMOUS EXPLORATION POLICIES WITH CNN-BASED OBJECT DETECTION ON NANO-DRONES Speaker: Lorenzo Lamberti, Università di Bologna, IT Authors: Lorenzo Lamberti1, Luca Bompani1, Victor Kartsch Morinigo1, Manuele Rusci2, Daniele Palossi3 and Luca Benini4 1Università di Bologna, IT; 2KU Leuven, BE; 3ETH Zurich, CH; 4University of Bologna, ETH Zurich, IT Abstract Nano-sized drones, with a palm-sized form factor, are gaining relevance in the Internet-of-Things ecosystem. Achieving a high degree of autonomy for complex multi-objective missions (e.g., safe flight, exploration, object detection) is extremely challenging for the onboard chipset due to tight size, payload (<10g), and power-envelope constraints, which strictly limit both memory and computation. Our work addresses this complex problem by combining bio-inspired navigation policies, which rely on time-of-flight distance sensor data, with a vision-based convolutional neural network (CNN) for object detection. Our field-proven nano-drone is equipped with two microcontroller units (MCUs), a single-core ARM Cortex-M4 (STM32) for safe navigation and exploration policies, and a parallel ultra-low-power octa-core RISC-V (GAP8) for onboard CNN inference, with a power envelope of just 134mW, including image sensors and external memories. The object detection task achieves a mean average precision of 50% (at 1.6 frames/s) on an in-field collected dataset. We compare four bio-inspired exploration policies and identify a pseudo-random policy as achieving the highest coverage area of 83% in a ~36 m^2 unknown room in a 3-minute flight. By combining the detection CNN and the exploration policy, we show an average detection rate of 90% on six target objects in a never-seen-before environment. |
17:38 CET | ASD3.4 | BUTTERFLY EFFECT ATTACK: TINY AND SEEMINGLY UNRELATED PERTURBATIONS FOR OBJECT DETECTION Speaker: Nguyen Anh Vu Doan, Fraunhofer IKS, DE Authors: Nguyen Anh Vu Doan, Arda Yueksel and Chih-Hong Cheng, Fraunhofer IKS, DE Abstract This work aims to explore and identify tiny and seemingly unrelated perturbations of images in object detection that lead to performance degradation. While tininess can naturally be defined using L_p norms, we characterize the degree of "unrelatedness" of an object by the pixel distance between the applied perturbation and the object. Triggering errors in prediction while satisfying the two objectives can be formulated as a multi-objective optimization problem, in which we utilize genetic algorithms to guide the search. The results successfully demonstrate that (invisible) perturbations on the right part of an image can drastically change the outcome of object detection on the left. An extensive evaluation reaffirms our conjecture that transformer-based object detection networks are more susceptible to butterfly effects than single-stage object detection networks such as YOLOv5. |
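As background for ASD3.1, the following is a bare sketch of the MAPE-K control loop (Monitor, Analyze, Plan, Execute over a shared Knowledge base) on which the presented hierarchical architecture builds. All names, set-points and the trivial proportional plan are hypothetical placeholders, not the authors' levitation controller.

```python
# Bare MAPE-K loop skeleton. Sensor/actuator interfaces, the set-point and the
# proportional "plan" are illustrative placeholders, not the ASD3.1 design.
from dataclasses import dataclass, field

@dataclass
class Knowledge:
    history: list = field(default_factory=list)  # shared state across phases
    nominal_gap_mm: float = 8.0                   # assumed levitation set-point

def monitor(sensors, k: Knowledge) -> dict:
    sample = {"gap_mm": sensors.read_gap()}       # collect raw observations
    k.history.append(sample)
    return sample

def analyze(sample: dict, k: Knowledge) -> float:
    return sample["gap_mm"] - k.nominal_gap_mm    # deviation from set-point

def plan(deviation: float, k: Knowledge) -> dict:
    # Trivial proportional plan; a real system would select or adapt controllers,
    # e.g. switching to a verified fallback as in the Simplex pattern.
    return {"coil_current_delta": -0.5 * deviation}

def execute(action: dict, actuators) -> None:
    actuators.adjust_current(action["coil_current_delta"])

def mape_k_step(sensors, actuators, k: Knowledge) -> None:
    execute(plan(analyze(monitor(sensors, k), k), k), actuators)
```

The point of the pattern is that each phase reads and updates the shared knowledge, so the loop can adapt its own plans as operating conditions drift.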
FS4 Focus session: The Past, Present and Future of Chiplets
Add this session to my calendar
Date: Monday, 17 April 2023
Time: 16:30 CET - 18:00 CET
Location / Room: Okapi Room 0.8.1
Session chair:
Krishnendu Chakrabarty, Arizona State University, US
Time | Label | Presentation Title Authors |
---|---|---|
16:30 CET | FS4.1 | THE NEXT ERA FOR CHIPLET INNOVATION Speaker: Gabriel Loh, Advanced Micro Devices, Inc., US Authors: Gabriel Loh and Raja Swaminathan, Advanced Micro Devices, Inc., US Abstract Moore's Law is slowing down and the associated costs are simultaneously increasing. These pressures have given rise to new approaches utilizing advanced packaging and integration such as chiplets, interposers, and 3D stacking. We first describe the key technology drivers and constraints that motivate chiplet-based architectures, exploring several product case studies to highlight how different chiplet strategies have been developed to address different design objectives. We detail multiple generations of chiplet-based CPU architectures as well as the recent addition of 3D stacking options to further enhance processor capabilities. Across the industry, we are still collectively in the relatively early days of advanced packaging and 3D integration. As silicon scaling only gets more challenging and expensive while demand for computation continues to soar, we anticipate the transition to a new generation of chiplet architectures that utilize increasing combinations of 2D, 2.5D, and 3D integration and packaging technologies to continue to deliver compelling SoC solutions. However, this next era for chiplet innovation will face a variety of challenges. We will explore many of these technical topics, which in turn provide rich research opportunities for the community to explore and innovate. |
17:00 CET | FS4.2 | ACHIEVING DATACENTER-SCALE PERFORMANCE THROUGH CHIPLET-BASED MANYCORE ARCHITECTURES Speaker: Partha Pande, Washington State University, US Authors: Harsh Sharma1, Sumit Mandal2, Jana Doppa1, Umit Ogras3 and Partha Pratim Pande1 1Washington State University, US; 2Indian Institute of Science, IN; 3University of Wisconsin - Madison, US Abstract Chiplet-based 2.5D systems that integrate multiple smaller chips on a single die are gaining popularity for executing both compute- and data-intensive applications. While smaller chips (chiplets) reduce fabrication costs, they also provide less functionality. Hence, manufacturing several smaller chiplets and combining them into a single system enables the functionality of a larger monolithic chip without prohibitive fabrication costs. The chiplets are connected through the network-on-interposer (NoP). Designing a high-performance and energy-efficient NoP architecture is essential as it enables large-scale chiplet integration. This paper highlights the challenges and existing solutions for designing suitable NoP architectures targeted for 2.5D systems catered to datacenter-scale applications. We also highlight the future research challenges stemming from the current state-of-the-art to make the NoP-based 2.5D systems widely applicable. |
17:30 CET | FS4.3 | MACHINE LEARNING ACCELERATORS IN 2.5D CHIPLET PLATFORMS WITH SILICON PHOTONICS Speaker: Sudeep Pasricha, Colorado State University, US Authors: Febin Sunny, Ebadollah Taheri, Mahdi Nikdast and Sudeep Pasricha, Colorado State University, US Abstract Domain-specific machine learning (ML) accelerators such as Google's TPU and Apple's Neural Engine now dominate CPUs and GPUs for energy-efficient ML processing. However, the evolution of electronic accelerators is facing fundamental limits due to the limited computation density of monolithic processing chips and the reliance on slow metallic interconnects. We present a vision of how optical computation and communication can be integrated into 2.5D chiplet platforms to drive an entirely new class of sustainable and scalable ML hardware accelerators. We describe how cross-layer design and fabrication of optical devices, circuits, and architectures, and hardware/software codesign can help design efficient photonics-based 2.5D chiplet platforms to accelerate emerging ML workloads. |
SD6 Reconfigurable architectures, machine learning and circuit design
Add this session to my calendar
Date: Monday, 17 April 2023
Time: 16:30 CET - 18:00 CET
Location / Room: Gorilla Room 1.5.3
Session chair:
Jan Moritz Joseph, RWTH Aachen University, DE
16:30 CET until 16:54 CET: Pitches of regular papers
16:54 CET until 18:00 CET: Interactive technical presentations by the authors of regular papers and extended abstracts
Regular Papers
Time | Label | Presentation Title Authors |
---|---|---|
16:30 CET | SD6.1 | TOWARDS EFFICIENT NEURAL NETWORK MODEL PARALLELISM ON MULTI-FPGA PLATFORMS Speaker: David Rodriguez, Universitat Politècnica de València, ES Authors: David Rodriguez Agut1, Rafael Tornero1 and Josè Flich2 1Universitat Politecnica de Valencia, ES; 2Universitat Politècnica de València, ES Abstract Nowadays, convolutional neural networks (CNNs) are common in a wide range of applications. Their high accuracy contrasts with their high computing requirements, motivating the search for efficient hardware platforms. FPGAs are suitable due to their flexibility, energy efficiency and low latency. However, the ever-increasing complexity of CNNs demands higher-capacity devices, forcing the need for multi-FPGA platforms. In this paper, we present a multi-FPGA platform with distributed shared-memory support for the inference of CNNs. In contrast with previous works, our solution enables combining different model parallelism strategies applied to CNNs, thanks to the distributed shared-memory support. For a four-FPGA setting, the platform reduces the execution time of 2D convolutions by a factor of 3.95 compared to a single FPGA. The inference of standard CNN models is improved by factors ranging from 3.63 to 3.87. |
16:33 CET | SD6.2 | HIGH-ACCURACY LOW-POWER RECONFIGURABLE ARCHITECTURES FOR DECOMPOSITION-BASED APPROXIMATE LOOKUP TABLE Speaker: Xingyue Qian, Shanghai Jiao Tong University, CN Authors: Xingyue Qian1, Chang Meng1, Xiaolong Shen2, Junfeng Zhao2, Leibin Ni2 and Weikang Qian1 1Shanghai Jiao Tong University, CN; 22012 Labs, Huawei Technologies Co., Ltd., CN Abstract Storing pre-computed results of frequently used functions in a lookup table (LUT) is a popular way to improve energy efficiency, but its advantage diminishes as the number of input bits increases. A recent work shows that by decomposing the target function approximately, the total number of LUT entries can be dramatically reduced, leading to significant energy savings. However, its heuristic approximate decomposition algorithm leads to sub-optimal approximation quality. Also, its rigid hardware architecture only supports disjoint decomposition and may sometimes incur unnecessary extra power consumption. To address these issues, we develop a novel approximate decomposition algorithm based on beam search and simulated annealing, which reduces the approximation error by 11.1%. We also propose a non-disjoint approximate decomposition method and two reconfigurable architectures. The first has 10.4% less error using 19.2% less energy, and the second has 23.0% less error with the same energy consumption, compared to the state-of-the-art design. (A generic simulated-annealing skeleton is sketched after the Regular Papers table of this session.) |
16:36 CET | SD6.3 | FPGA ACCELERATION OF GCN IN LIGHT OF THE SYMMETRY OF GRAPH ADJACENCY MATRIX Speaker: Gopikrishnan Raveendran Nair, Arizona State University, US Authors: Gopikrishnan Raveendran Nair1, Han-sok Suh1, Mahantesh Halappanavar2, Frank Liu3, Jae-sun Seo1 and Yu Cao1 1Arizona State University, US; 2Pacific Northwest National Laboratory, US; 3Oak Ridge National Lab, US Abstract Graph Convolutional Neural Networks (GCNs) are widely used to process large-scale graph data. Different from deep neural networks (DNNs), GCNs are sparse, irregular, and unstructured, posing unique challenges to hardware acceleration with regular processing elements (PEs). In particular, the adjacency matrix of a GCN is extremely sparse, leading to frequent but irregular memory access, low spatial/temporal data locality and poor data reuse. Furthermore, a realistic graph usually consists of unstructured data (e.g., unbalanced distributions), creating significantly different processing times and an imbalanced workload for each node in GCN acceleration. To overcome these challenges, we propose an end-to-end hardware-software co-design to accelerate GCNs on resource-constrained FPGAs with the following features: (1) A custom dataflow that leverages symmetry along the diagonal of the adjacency matrix to accelerate feature aggregation for undirected graphs. We utilize either the upper or the lower triangular matrix of the adjacency matrix to perform aggregation in GCN to improve data reuse. (2) Unified compute cores for both aggregation and transform phases, with full support for the symmetry-based dataflow. These cores can be dynamically reconfigured to the systolic mode for transformation or as individual accumulators for aggregation in GCN processing. (3) Preprocessing of the graph in software to rearrange the edges and features to match the custom dataflow. This step improves the regularity in memory access and data reuse in the aggregation phase. Moreover, we quantize the GCN precision from FP32 to INT8 to reduce the memory footprint without losing inference accuracy. We implement our accelerator design on an Intel Stratix 10 MX FPGA board with HBM2, and demonstrate a 1.3×-110.5× improvement in end-to-end GCN latency compared to the state-of-the-art FPGA implementations, on the graph datasets of Cora, Pubmed, Citeseer and Reddit. |
16:39 CET | SD6.4 | PR-ESP: AN OPEN-SOURCE PLATFORM FOR DESIGN AND PROGRAMMING OF PARTIALLY RECONFIGURABLE SOCS Speaker: Biruk Seyoum, Columbia University, US Authors: Biruk Seyoum, Davide Giri, Kuan-Lin Chiu, Bryce Natter and Luca Carloni, Columbia University, US Abstract Despite its presence for more than two decades and its proven benefits in expanding the space of system design, dynamic partial reconfiguration (DPR) is rarely integrated into frameworks and platforms that are used to design complex reconfigurable system-on-chip (SoC) architectures. This is due to the complexity of the DPR FPGA flow as well as the lack of architectural and software runtime support to enable and fully harness DPR. Moreover, as DPR designs involve additional design steps and constraints, they often have a higher FPGA compilation (RTL-to-bitstream) runtime compared to equivalent monolithic designs. In this work, we present PR-ESP, an open-source platform for a system-level design flow of partially reconfigurable FPGA-based SoC architectures targeting embedded applications that are deployed on resource-constrained FPGAs. Our approach is realized by combining SoC design methodologies and tools from the open-source ESP platform with a fully-automated DPR flow that features a novel size-driven technique for parallel FPGA compilation. We also developed a software runtime reconfiguration manager on top of Linux. Finally, we evaluated our proposed platform using the WAMI-App benchmark application on Xilinx VC707. |
16:42 CET | SD6.5 | ISOP: MACHINE LEARNING ASSISTED INVERSE STACK-UP OPTIMIZATION FOR ADVANCED PACKAGE DESIGN Speaker: Hyunsu Chae, University of Texas at Austin, US Authors: Hyunsu Chae1, Bhyrav Mutnury2, Keren Zhu1, Douglas Wallace2, Douglas Winterberg2, Daniel de Araujo3, Jay Reddy2, Adam Klivans1 and David Z. Pan1 1University of Texas at Austin, US; 2Dell Infrastructure Solutions Group, US; 3Siemens EDA, US Abstract Future computing calls for heterogeneous integration, e.g., the recent adoption of the chiplet methodology. However, high-speed cross-chip interconnects and packaging shall be critical for the overall system performance. As an example of advanced packaging, a high-density interconnect (HDI) printed circuit board (PCB) has been widely used in complex electronics from cell phones to computing servers. A modern HDI PCB may have over 20 layers, each with its unique material properties and geometrical dimensions, i.e., stack-up, to meet various design constraints and performance optimizations. However, stack-up design is usually done manually in the industry, where experienced designers may devote many hours to adjusting the physical dimensions and materials to meet the desired specifications. This process, however, is time-consuming, tedious, and sub-optimal, largely depending on the designer's expertise. In this paper, we propose to automate the stack-up design with a new framework, ISOP, using machine learning for inverse stack-up optimization for advanced package design. Given a target design specification, ISOP automatically searches for ideal stack-up design parameters while optimizing performance. We develop a novel machine learning-assisted hyper-parameter optimization method to make the search efficient and reliable. Experimental results demonstrate that ISOP is better in figure-of-merit (FoM) than conventional simulated annealing and Bayesian optimization algorithms, with all our design targets met with a shorter runtime. We also compare our fully-automated ISOP with expert designers in the industry and achieve very promising results, with orders of magnitude reduction of turn-around time. |
16:45 CET | SD6.6 | FAST AND ACCURATE WIRE TIMING ESTIMATION BASED ON GRAPH LEARNING Speaker: Yuyang Ye, Southeast University, CN Authors: Yuyang Ye1, Tinghuan Chen2, Yifei Gao1, Hao Yan1, Bei Yu2 and Longxing Shi1 1Southeast University, CN; 2The Chinese University of Hong Kong, HK Abstract Accurate wire timing estimation has become a bottleneck in timing optimization since it needs a long turn-around time using a sign-off timer. Gate timing can be calculated accurately using lookup tables in cell libraries. In comparison, the accuracy and efficiency of wire timing calculation for complex RC nets are extremely hard to trade off. The limited number of wire paths opens the door for graph learning methods in wire timing estimation. In this work, we present a fast and accurate wire timing estimator based on a novel graph learning architecture, namely GNNTrans. It generates wire path representations by aggregating local structure information and global relationships of whole RC nets, which cannot be collected efficiently with traditional graph learning approaches. Experimental results on both tree-like and non-tree nets demonstrate improved accuracy, with the maximum wire delay error below 5 ps. In addition, our estimator can predict the timing of over 200K nets in less than 100 seconds. This fast and accurate estimator can be integrated into incremental timing optimization for routed designs. |
16:48 CET | SD6.7 | DTOC: INTEGRATING DEEP-LEARNING DRIVEN TIMING OPTIMIZATION INTO STATE-OF-THE-ART COMMERCIAL EDA TOOL Speaker: Kyungjoon Chang, Seoul National University, KR Authors: Kyungjoon Chang1, Heechun Park2, Jaehoon Ahn1, Kyu-Myung Choi1 and Taewhan Kim1 1Seoul National University, KR; 2Kookmin University, KR Abstract Recently, deep-learning (DL) models have attracted considerable attention for timing prediction in the placement and routing (P&R) flow. So far, the DL-based prior works have been confined to timing prediction at the time-consuming global routing stage, and very few have addressed the timing prediction problem at the placement, i.e., pre-route, stage. This is because it is not easy to "accurately" predict various timing parameters at the pre-route stage. Moreover, no work has addressed a seamless link between timing prediction at the pre-route stage and the final timing optimization through the use of commercial P&R tools. In this work, we propose a framework called DTOC, to be used at the pre-route stage to this end. Precisely, the framework is composed of two models: (1) a DL-driven arc delay and arc output slew prediction model operating in two levels: (level-1) predicting net resistance (R), net capacitance (C), and arc length (Len), followed by (level-2) predicting arc delay and arc output slew from the R/C/Len predictions obtained in (level-1); (2) a timing optimization model, which uses the inference outcomes of our DL-driven prediction model to enable the commercial P&R tools to calculate full path delays and set updated timing margins on paths, so that the P&R tools use more accurate margins during timing optimization. Experimental results show that, by using our DTOC framework during timing optimization in P&R, we improve the pre-route prediction accuracy on arc delay and arc output slew by 20∼26% on average, and improve the WNS, TNS, and the number of timing violation paths by 50∼63% on average. |
16:51 CET | SD6.8 | RL-LEGALIZER: REINFORCEMENT LEARNING-BASED CELL PRIORITY OPTIMIZATION IN MIXED-HEIGHT STANDARD CELL LEGALIZATION Speaker: Sung-Yun Lee, Pohang University of Science and Technology, KR Authors: Sung-Yun Lee1, Seonghyeon Park2, Daeyeon Kim2, Minjae Kim2, Tuyen Le3 and Seokhyeong Kang2 1Pohang University of Science and Technology (POSTECH), KR; 2Pohang University of Science and Technology, KR; 3AgileSoDA, KR Abstract Cell legalization order has a substantial effect on the quality of modern VLSI designs, which use mixed-height standard cells. In this paper, we propose a deep reinforcement learning framework to optimize cell priority in the legalization phase of various designs. We extract the selected features of movable cells and their surroundings, then embed them into cell-wise deep neural networks. We then determine cell priority and legalize them in order using a pixel-wise search algorithm. The proposed framework uses a policy gradient algorithm and several training techniques, including grid-cell subepisode, data normalization, reduced-dimensional state, and network optimization. We aim to resolve the suboptimality of existing sequential legalization algorithms with respect to displacement and wirelength. On average, our proposed framework achieved 34% lower legalization costs in various benchmarks compared to that of the state-of-the-art legalization algorithm. |
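As background for SD6.2, the following is a generic simulated-annealing skeleton with the standard Metropolis acceptance rule, one of the two search strategies that abstract names. The cost function, neighbour move and cooling parameters are placeholders, not the paper's approximate-decomposition objective.

```python
# Generic simulated-annealing skeleton (Metropolis acceptance, geometric
# cooling). The toy cost and neighbour move are illustrative placeholders.
import math
import random

def simulated_annealing(initial, cost, neighbour,
                        t_start=1.0, t_end=1e-3, alpha=0.95, moves_per_temp=50):
    state, best = initial, initial
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            candidate = neighbour(state)
            delta = cost(candidate) - cost(state)
            # accept improvements always, worsenings with probability e^(-delta/t)
            if delta <= 0 or random.random() < math.exp(-delta / t):
                state = candidate
                if cost(state) < cost(best):
                    best = state
        t *= alpha  # geometric cooling schedule
    return best

# Toy usage: minimise (x - 3)^2 over the integers with +/-1 moves.
print(simulated_annealing(0, lambda x: (x - 3) ** 2,
                          lambda x: x + random.choice((-1, 1))))
```

Beam search, the other strategy named in SD6.2, would instead keep the best few partial decompositions at each step rather than a single annealed state.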
Extended Abstracts
Time | Label | Presentation Title Authors |
---|---|---|
16:54 CET | SD6.9 | NEURAL NETWORK ON THE EDGE: EFFICIENT AND LOW COST FPGA IMPLEMENTATION OF DIGITAL PREDISTORTION IN MIMO SYSTEMS Speaker: John Dooley, Maynooth University, IE Authors: Yiyue Jiang1, Andrius Vaicaitis2, John Dooley2 and Miriam Leeser1 1Department of Electrical and Computer Engineering at Northeastern University, US; 2Department of Electronic Engineering at Maynooth University, IE Abstract Base stations in cellular networks must operate linearly, power-efficiently, and with ever-increasing flexibility. Recent FPGA hardware advances have demonstrated linearization using neural networks; however, the latency introduced by these solutions is a concern. We present a novel hardware implementation of a low digital cost, high-throughput pipelined Real Valued Time Delay Neural Network (RVTDNN) structure with a hardware-efficient activation function. Network training times are reduced by minimizing the training signal samples used, based on a biased probability density function (pdf). The design has been experimentally validated using an AMD/Xilinx RFSoC ZCU216 board and surpasses the data throughput of conventional RVTDNN-based DPD while using a fraction of their hardware utilization. |
16:54 CET | SD6.10 | QUANTISED NEURAL NETWORK ACCELERATORS FOR LOW-POWER IDS IN AUTOMOTIVE NETWORKS Speaker: Shashwat Khandelwal, Ph.D. Student, Electronic and Electrical Engineering, Trinity College Dublin, IE Authors: Shashwat Khandelwal, Anneliese Walsh and Shreejith Shanker, Trinity College Dublin, IE Abstract In this paper, we explore low-power custom quantised Multi-Layer Perceptrons (MLPs) as an Intrusion Detection System (IDS) for the automotive controller area network (CAN). We utilise the FINN framework from AMD/Xilinx to quantise, train and generate hardware IP of our MLP to detect denial-of-service (DoS) and fuzzing attacks on a CAN network, using the ZCU104 (XCZU7EV) FPGA as our target ECU architecture with integrated IDS capabilities. Our approach achieves significant improvements in latency (0.12 ms per-message processing latency) and inference energy consumption (0.25 mJ per inference) while achieving similar classification performance to state-of-the-art approaches in the literature. |
SD7 Logical and physical analysis and design
Add this session to my calendar
Date: Monday, 17 April 2023
Time: 16:30 CET - 18:00 CET
Location / Room: Gorilla Room 1.5.1
Session chair:
Patrick Groeneveld, CEREBRAS & STANFORD, US
16:30 CET until 16:54 CET: Pitches of regular papers
16:54 CET until 18:00 CET: Interactive technical presentations by the authors of regular papers and extended abstracts
Regular Papers
Time | Label | Presentation Title Authors |
---|---|---|
16:30 CET | SD7.1 | SYNTHESIS AND UTILIZATION OF STANDARD CELLS AMENABLE TO GEAR RATIO OF GATE-METAL PITCHES FOR IMPROVING PIN ACCESSIBILITY Speaker: Jooyeon Jeong, Seoul National University, KR Authors: Jooyeon Jeong, Sehyeon Chung, Kyeongrok Jo and Taewhan Kim, Seoul National University, KR Abstract Traditionally, the synthesis of standard cells invariably assumes that the gear ratio (GR) between the gate poly pitch in the cells and the metal pitch of the first vertical metal layer (to be used for routing) over the gate poly is 1:1 for chip implementation. However, the scaling trend in sub-10nm node CMOS designs is that GR is changing from 1:1 to 3:2 or 4:3, which means the number and location of pin access points vary depending on the cell placement location, making pins hard to access if their access points are aligned with off-track routing patterns. This work overcomes the pin inaccessibility problem caused by non-1:1 GR in chip implementation. Precisely, we propose a non-1:1 GR aware DTCO (design and technology co-optimization) flow to generate cells with pin patterns that are best suited to the implementation of the target design. To this end, we propose two new tasks to be installed in our DTCO framework: (1) from the existing cells optimized for 1:1 GR, we relocate their pin patterns to be amenable to non-1:1 GR, so that maximal pin accessibility is achieved; (2) we incrementally update the pin patterns of the cell instances with routing failures due to pin inaccessibility in the course of the DTCO iterations, to produce cells with pin patterns best fitted to the implementation of the target design. Through experiments with benchmark circuits, it is shown that our DTCO methodology, optimizing pin patterns amenable to non-1:1 GR, is able to produce chip implementations with on average 5.88× fewer routing failures at no additional wirelength, timing, or power cost. |
16:33 CET | SD7.2 | CENTER-OF-DELAY: A NEW METRIC TO DRIVE TIMING MARGIN AGAINST SPATIAL VARIATION IN COMPLEX SOCS Speaker: Christian Lutkemeyer, Marvell Semiconductor, Inc., US Authors: Christian Lutkemeyer1 and Anton Belov2 1Marvell Semiconductor, Inc., US; 2Synopsys, IE Abstract Complex VLSI SOCs are manufactured on large 300mm wafers. Individual SOCs can show significant spatial performance gradients in the order of 10% per 10mm. The traditional approach to handling this variation in STA tools is a margin look-up table indexed by the diagonal of the bounding box around the gates in a timing path. In this paper we propose a new approach based on the concept of the Center-of-Delay of a timing path. We justify this new approach theoretically for linear performance gradients and present experimental data that shows that the new approach is both safe, and significantly less pessimistic than the existing method. |
16:36 CET | SD7.3 | A NOVEL DELAY CALIBRATION METHOD CONSIDERING INTERACTION BETWEEN CELLS AND WIRES Speaker: Leilei Jin, The National ASIC System Engineering Technology Research Center, Southeast University, CN Authors: Leilei Jin, Jia Xu, Wenjie Fu, Hao Yan, Xiao Shi, Ming Ling and Longxing Shi, Southeast University, CN Abstract In advanced technologies, the accuracy of cell and wire delay modeling is a key metric for timing analysis. However, when the supply voltage decreases to the near-threshold regime, the complicated process variation effect makes the cell delay and the wire delay hard to model. Most researchers study cell or wire delay separately, ignoring the interaction between them. In this paper, we propose an N-sigma delay model by characterizing different sigma levels (-3σ to +3σ) of the cell and wire delay distributions. The N-sigma cell delay model is represented by the first four moments and calibrated by the operating conditions (input slew, output load). Meanwhile, based on the Elmore model, the wire delay variability is calculated by considering the effect of the drive and load cells. The delay models are verified on the ISCAS85 benchmarks and the functional units of the PULPino processor with TSMC 28 nm technology. Compared to the SPICE results, the average errors for estimating the ±3σ cell delay are 2.1% and 2.7%, and those of the wire delay are 2.4% and 1.6%, respectively. The errors of path delay analysis stay below 6.6% and the speed is 103X over SPICE MC simulations. |
16:39 CET | SD7.4 | RETHINKING NPN CLASSIFICATION FROM FACE AND POINT CHARACTERISTICS OF BOOLEAN FUNCTIONS Speaker: Jiaxi Zhang, Peking University, CN Authors: Jiaxi Zhang1, Shenggen Zheng2, Liwei Ni3, Huawei Li3 and Guojie Luo1 1Peking University, CN; 2Peng Cheng Laboratory, CN; 3Chinese Academy of Sciences, CN Abstract NPN classification is an essential problem in the design and verification of digital circuits. Most existing works explored variable symmetries and cofactor signatures to develop their classification methods. However, cofactor signatures only consider the face characteristics of Boolean functions. In this paper, we propose a new NPN classifier using both face and point characteristics of Boolean functions, including cofactor, influence, and sensitivity. The new method brings a new perspective to the classification of Boolean functions. The classifier only needs to compute some signatures, and the equality of corresponding signatures is a prerequisite for NPN equivalence. Therefore, these signatures can be directly used for NPN classification, thus avoiding the exhaustive transformation enumeration. The experiments show that the proposed NPN classifier gains better NPN classification accuracy with comparable speed. |
16:42 CET | SD7.5 | EXACT SYNTHESIS BASED ON SEMI-TENSOR PRODUCT CIRCUIT SOLVER Speaker: Hongyang Pan, Ningbo University, CN Authors: Hongyang Pan1 and Zhufei Chu2 1Ningbo University, CN; 2Ningbo University, CN Abstract In logic synthesis, Boolean satisfiability (SAT) is widely used as a reasoning engine, especially for exact synthesis. By representing input formulas as logic circuits instead of conjunctive normal forms (CNFs), as in off-the-shelf CNF-based SAT solvers, circuit-based SAT solvers make decoding a solution easier. An exact synthesis method based on a semi-tensor product (STP) circuit solver is presented in this paper. As opposed to other SAT-based exact synthesis algorithms, in our method synthesized Boolean functions are encoded into STP canonical forms and can be solved by an STP-based circuit SAT solver. It can also obtain all optimal solutions in one pass. In particular, all solutions are expressed as 2-input lookup tables (LUTs), rather than homogeneous logic representations. Hence, different costs can be considered when selecting the optimal circuit. In experiments, we demonstrate that our method accelerates the runtime by up to 225.6X while reducing timeout instances by up to 88%. |
16:45 CET | SD7.6 | AN EFFECTIVE AND EFFICIENT HEURISTIC FOR RATIONAL-WEIGHT THRESHOLD LOGIC GATE IDENTIFICATION Speaker: Ting Yu Yeh, National Taiwan University of Science and Technology, TW Authors: Ting Yu Yeh, Yueh Cho and Yung Chih Chen, National Taiwan University of Science and Technology, TW Abstract In CMOS-based current-mode realization, the threshold logic gate (TLG) implementation with rational weights has been shown to be more cost-effective than the conventional TLG implementation without rational weights. The existing method for rational-weight TLG identification is an integer linear programming (ILP)-based method, which can suffer from inefficiency for a Boolean function with a large number of inputs. This paper presents a heuristic for rational-weight TLG identification. We observe that in the ILP formulation, many variables related to the rational weights are redundant according to the ILP solutions. Additionally, a rational-weight TLG can be transformed from a conventional TLG. Thus, the proposed method aims to identify the conventional TLG that can be transformed into a rational-weight TLG with lower implementation cost. We conducted experiments on a set of TLGs with 4 to 15 inputs. The results show that the proposed method offers competitive quality and is much more efficient compared to the ILP-based method. |
16:48 CET | SD7.7 | FAST STA GRAPH PARTITIONING FRAMEWORK FOR MULTI-GPU ACCELERATION Speaker: Tsung-Wei Huang, University of Utah, US Authors: Guannan Guo1, Tsung-Wei Huang2 and Martin Wong3 1University of Illinois at Urbana-Champaign, US; 2University of Utah, US; 3The Chinese University of Hong Kong, HK Abstract Path-based Analysis (PBA) is a key process in Static Timing Analysis (STA) to reduce excessive slack pessimism. However, PBA can easily become the major performance bottleneck due to its extremely long execution time. To overcome this bottleneck, recent STA research has proposed to accelerate PBA algorithms with many-core CPU and GPU parallelism. However, GPU memory is rather limited when we compute PBA on large industrial designs with millions of gates. In this work, we introduce a new endpoint-oriented partitioning framework that can separate STA graphs and dispatch the PBA workload onto multiple GPUs. Our framework can quickly identify logic overlaps among endpoints and group endpoints based on the size of shared logic. We then recover graph partitions from the endpoint groups and offload independent PBA workloads to multiple GPUs. Experiments show that our framework can greatly accelerate the PBA process on designs with over 10M gates. |
16:51 CET | SD7.8 | TOFU: A TWO-STEP FLOORPLAN REFINEMENT FRAMEWORK FOR WHITESPACE REDUCTION Speaker: Shixiong Kai, Huawei Noah's Ark Lab, CN Authors: Shixiong Kai1, Chak-Wa Pui2, Fangzhou Wang3, Jiang Shougao4, Bin Wang1, Yu Huang5 and Jianye Hao6 1Huawei Noah's Ark Lab, CN; 2UniVista, CN; 3The Chinese University of Hong Kong, HK; 4HiSilicon, CN; 5HiSilicon, CN; 6Tianjin University, CN Abstract Floorplanning, as an early step in physical design, greatly affects the PPA of the later stages. To achieve better performance while maintaining relatively the same chip size, the utilization of the generated floorplan needs to be high, and constraints related to design rules, routability, and power should be honored. In this paper, we propose a two-step framework, called TOFU, for floorplan whitespace reduction with fixed-outline and soft/pre-placed/hard modules modeled. Whitespace is first reduced by iteratively refining the locations of modules. Then the modules near whitespace are changed into rectilinear shapes to further improve the utilization. To ensure the legality and quality of the intermediate floorplan during the refinement process, a constraint graph-based legalizer with a novel constraint graph construction method is proposed. Experimental results show that the whitespace of the initial floorplans generated by Corblivar can be reduced by about 70% on average and by up to 90% in several cases. Moreover, the resulting wirelength is also 3% shorter due to higher utilization. |
Extended Abstracts
Time | Label | Presentation Title Authors |
---|---|---|
16:54 CET | SD7.10 | ROUTABILITY PREDICTION USING DEEP HIERARCHICAL CLASSIFICATION AND REGRESSION Speaker: Daeyeon Kim, Pohang University of Science and Technology, KR Authors: Daeyeon Kim, Jakang Lee and Seokhyeong Kang, Pohang University of Science and Technology, KR Abstract Routability prediction can forecast the locations where design rule violations occur without routing and thus can speed up the design iterations by skipping the time-consuming routing tasks. This paper investigates (i) how to predict routability as a continuous value and (ii) how to improve the prediction accuracy for the minority samples. We propose a deep hierarchical classification and regression (HCR) model that can detect hotspots with the number of violations. The hierarchical inference flow can prevent the model from overfitting to the majority samples in imbalanced data. In addition, we introduce a training method for the proposed HCR model that uses Bayesian optimization to find the ideal modeling parameters quickly and incorporates transfer learning for the regression model. We achieved an R2 score of 0.71 for the regression and increased the F1 score in the binary classification by 94% compared to previous work. |
16:54 CET | SD7.11 | EFFICIENT DESIGN RULE CHECKING WITH GPU ACCELERATION Speaker: Zhenhua Feng, Dalian University of Technology, CN Authors: Wei Zhong1, Zhenhua Feng1, Zhuolun He2, Weimin Wang1, Yuzhe Ma3 and Bei Yu2 1Dalian University of Technology, CN; 2The Chinese University of Hong Kong, HK; 3Hong Kong University of Science and Technology, CN Abstract Design Rule Checking (DRC) is an essential part of the chip design flow, which ensures that manufacturing requirements are met to avoid chip failure. With the rapid increase of design scales, DRC has been suffering from runtime overhead. To overcome this challenge, we propose to accelerate DRC algorithms by harnessing the power of graphics processing units (GPUs). Specifically, we first explore an efficient data transfer approach for the geometry information of a layout. Then we investigate GPU-based scanline algorithms to accommodate both intra-polygon checking and inter-polygon checking based on the characteristics of the design rules. Experimental results show that the proposed GPU-accelerated method can substantially outperform a multi-threaded DRC algorithm using CPUs. Compared with the baseline with 24 threads, we achieve an average speedup of 36 times and 201 times for spacing rule checks and enclosing rule checks on a metal layer, respectively. |
16:54 CET | SD7.12 | MITIGATING LAYOUT DEPENDENT EFFECT-INDUCED TIMING RISK IN MULTI-ROW-HEIGHT DETAILED PLACEMENT Speaker: Li-Chen Wang, National Taiwan University of Science and Technology, TW Authors: Li-Chen Wang and Shao-Yun Fang, National Taiwan University of Science and Technology, TW Abstract With the development of advanced process technology, the electrical characteristic variation of MOSFET transistors has been seriously influenced by layout dependent effects (LDEs). Due to these LDEs, two cells of specific cell types may suffer from timing degradation when they are placed adjacently and closely with specific orientations. To mitigate the timing risk of critical paths and thus optimize the performance of a target design, this work proposes a dynamic programming (DP)-based method for multi-row-height detailed placement with cell flipping and cell shifting. Experimental results show the efficiency and effectiveness of the proposed DP-based approach. |
16:54 CET | SD7.13 | TWO-STAGE PCB ROUTING USING POLYGON-BASED DYNAMIC PARTITIONING AND MCTS Speaker: Youbiao He, Iowa State University, US Authors: Youbiao He1, Hebi Li2, Ge Luo2 and Forrest Sheng Bao3 1Iowa State University, US; 2Iowa State University, US; 3Iowa State University, US Abstract We propose a pad-focused, net-by-net, two-stage printed circuit board (PCB) routing approach comprising global routing using Monte Carlo tree search (MCTS) and detailed routing using A*. Compared with conventional PCB routing algorithms, our approach can route PCB components in both BGA and non-BGA packages. To minimize the gap between the global and detailed routing stages, a polygon-based dynamic routable region partitioning mechanism is introduced. Experimental results show that our approach outperforms state-of-the-art routers such as DeepPCB and FreeRouting in terms of success rate or wirelength. |
16:54 CET | SD7.14 | DEEPTH: CHIP PLACEMENT WITH DEEP REINFORCEMENT LEARNING USING A THREE-HEAD POLICY NETWORK Speaker: Dengwei Zhao, Shanghai Jiao Tong University, CN Authors: Dengwei Zhao, Shuai Yuan, Yanan Sun, Shikui Tu and Lei Xu, Shanghai Jiao Tong University, CN Abstract Modern very-large-scale integrated (VLSI) circuit placement with its huge state space is a critical task for achieving layouts with high performance. Recently, reinforcement learning (RL) algorithms have made a promising breakthrough, dramatically saving design time compared to human effort. However, previous RL-based works either require a large dataset of chip placements for pre-training or produce illegal final placement solutions. In this paper, DeepTH, a three-head policy gradient placer, is proposed to learn from scratch without the need for pre-training and to generate superior chip floorplans. A graph neural network is first adopted to extract the features from nodes and nets of chips for estimating the policy and value. To efficiently improve the quality of floorplans, a reconstruction head is employed in the RL network to recover the visual representation of the current placement, enriching the extracted features of the placement embedding. Besides, the reconstruction error is used as a bonus during training to encourage exploration while alleviating the sparse reward problem. Furthermore, expert knowledge of floorplanning preferences is embedded into the decision process to narrow down the potential action space. Experiment results on the ISPD2005 benchmark show that our method achieves a 19.02% HPWL improvement over the analytic placer DREAMPlace and at least a 19.89% improvement over state-of-the-art RL algorithms. |
SS1 Security of emerging technologies and machine learning
Add this session to my calendar
Date: Monday, 17 April 2023
Time: 16:30 CET - 18:00 CET
Location / Room: Okapi Room 0.8.2
Session chair:
Giorgio Di Natale, TIMA, FR
16:30 CET until 16:57 CET: Pitches of regular papers
16:57 CET until 18:00 CET: Interactive technical presentations by the authors of regular papers and extended abstracts
Regular Papers
Time | Label | Presentation Title Authors |
---|---|---|
16:30 CET | SS1.1 | PRIVACY-PRESERVING NEURAL REPRESENTATION FOR BRAIN-INSPIRED LEARNING Speaker: Mohsen Imani, University of California, Irvine, US Authors: Javier Roberto Rubalcava-Cortes1, Alejandro Hernandez Cano1, Alejandra Citlalli Pacheco Tovarm1, Farhad Imani2, Rosario Cammarota3 and Mohsen Imani4 1Universidad Nacional Autonoma de Mexico, MX; 2University of Connecticut, US; 3Intel Labs, US; 4University of California, Irvine, US Abstract In this paper, we propose BIPOD, a brain-inspired privacy-oriented machine learning approach. Our method rethinks privacy-preserving mechanisms by looking at how the human brain provides effective privacy with minimal cost. BIPOD exploits hyperdimensional computing (HDC) as a neurally-inspired computational model. HDC is motivated by the observation that the human brain operates on high-dimensional data representations. In HDC, objects are encoded with high-dimensional vectors, called hypervectors, which have thousands of elements. BIPOD exploits this encoding as a holographic projection with both cryptographic and randomization-based features. BIPOD encoding is performed using a set of brain keys that are generated randomly. Therefore, attackers cannot obtain encoded data without accessing the encoding keys. In addition, revealing the encoding keys does not directly translate to information loss. We enhance the BIPOD encoding method to mathematically create perturbation on encoded neural patterns to ensure that only a limited amount of information can be extracted from the encoded data. Since BIPOD encoding is part of the learning process, it can be optimized jointly to provide the best trade-off between accuracy, privacy, and efficiency. Our evaluation on a wide range of applications shows that BIPOD privacy-preserving techniques result in 11.3× higher information privacy with no loss in classification accuracy. In addition, at the same quality of learning, BIPOD provides significantly higher information privacy compared to state-of-the-art privacy-preserving techniques. |
16:33 CET | SS1.2 | EXPLOITING SHORT APPLICATION LIFETIMES FOR LOW COST HARDWARE ENCRYPTION IN FLEXIBLE ELECTRONICS Speaker: Nathaniel Bleier, University of Illinois at Urbana-Champaign, US Authors: Nathaniel Bleier1, Muhammad Mubarik1, Suman Balaji2, Francisco Rodriguez2, Antony Sou2, Scott White2 and Rakesh Kumar1 1University of Illinois at Urbana-Champaign, US; 2PragmatIC Semiconductor, GB Abstract Many emerging flexible electronics applications require hardware-based encryption, but it is unclear if practical hardware-based encryption is possible for flexible applications due to the stringent power requirements of these applications and the higher area and power overheads of flexible technologies. In this work, we observe that the lifetime of many flexible applications is so short that often one key suffices for the entire lifetime. This means that, instead of generating keys and round keys in hardware, we can generate the round keys offline and store these round keys directly in the engine. This eliminates the need for hardware for dynamic generation of round keys, which significantly reduces encryption overhead. This significant reduction in encryption overhead allows us to demonstrate the first practical flexible encryption engines. To prevent an adversary from reading out the stored round keys, we scramble the round keys before storing them in the ROM; camouflage cells are used to unscramble the keys before feeding them to logic. In spite of the unscrambling overhead, our encryption engines consume 27.4% lower power than the already heavily area- and power-optimized baselines, while being 21.9% smaller on average. |
16:36 CET | SS1.3 | ATTACKING RERAM-BASED ARCHITECTURES USING REPEATED WRITES Speaker: Biresh Kumar Joardar, University of Houston, US Authors: Biresh Kumar Joardar1 and Krishnendu Chakrabarty2 1University of Houston, US; 2Duke University, US Abstract Resistive random-access memory (ReRAM) is a promising technology both for memory and for in-memory computing. However, these devices have security vulnerabilities that are yet to be adequately investigated. In this work, we identify one such vulnerability that exploits the write mechanism in ReRAMs. Whenever a cell/row is written, a constant bias is automatically applied to the remaining cells/rows to reduce sneak current. We develop a new attack (referred to as WriteHammer) that exploits this process. By repeatedly exposing a subset of cells to this bias, WriteHammer can cause noticeable resistance drift in the victim ReRAM cells. Experimental results indicate that WriteHammer can cause up to a 3.5X change in cell resistance by simply writing to the ReRAM cells. |
16:39 CET | SS1.4 | SECURITY EVALUATION OF A HYBRID CMOS/MRAM ASCON HARDWARE IMPLEMENTATION Speaker: Nathan Roussel, Mines Saint-Etienne, FR Authors: Nathan Roussel, Olivier Potin, Jean-Max Dutertre and Jean-Baptiste Rigaud, Mines Saint-Etienne, FR Abstract As the number of IoT objects grows fast, power consumption and security become a major concern in the design of integrated circuits. Lightweight Cryptography (LWC) algorithms aim to secure the communications of these connected objects at the lowest energy impact. To reduce the energy footprint of cryptographic primitives, several LWC hardware implementations embedding hybrid CMOS/MRAM-based cells have been investigated. These architectures use the non-volatile characteristic of MRAM to store data manipulated in the algorithm computation. We provide in this work a security evaluation of a hybrid CMOS/MRAM hardware implementation of the ASCON cipher, a finalist of the National Institute of Standards and Technology LWC contest. We focus on a simulation flow using current EDA tools capable of carrying out power analysis for side-channel attacks, for the purpose of assessing potential weaknesses of MRAM hybridization. Differential Power Analysis (DPA) and Correlation Power Analysis (CPA) are conducted on the post-route, parasitic-annotated netlist of the design. The results show that the hybrid implementation does not significantly lower the security compared to a reference CMOS implementation. |
16:42 CET | SS1.5 | MANTIS: MACHINE LEARNING-BASED APPROXIMATE MODELING OF REDACTED INTEGRATED CIRCUITS Speaker: Benjamin Carrion Schaefer, University of Texas at Dallas, US Authors: Chaitali Sathe, Yiorgos Makris and Benjamin Carrion Schaefer, University of Texas at Dallas, US Abstract With most VLSI design companies now being fabless, it is imperative to develop methods to protect their Intellectual Property (IP). One approach that has become very popular due to its relative simplicity and practicality is logic locking. One of the problems with traditional locking mechanisms is that the locking circuitry is built into the netlist that the VLSI design company delivers to the foundry, which then has access to the entire design including the locking mechanism. This implies that they could potentially tamper with this circuitry or reverse engineer it to obtain the locking key. One relatively new approach, coined hardware redaction, is to map a portion of the design to an embedded FPGA (eFPGA). The bitstream of the eFPGA now acts as the locking key. The fab receives the design without the bitstream and hence cannot reverse engineer the functionality of the design. The obvious drawbacks are the increase in design complexity and the area and performance overheads associated with the eFPGA. In this work we propose, to the best of our knowledge, the first attack on this type of new locking mechanism by substituting the exact logic mapped onto the eFPGA with a synthesizable predictive model that replicates the behavior of the exact logic. We show that this approach is especially applicable in the context of approximate computing, where hardware accelerators tolerate a certain degree of error at their outputs. Some examples include Digital Signal Processing (DSP) or image processing applications. Experimental results show that our proposed approach is very effective in finding suitable predictive models. |
16:45 CET | SS1.6 | LONG RANGE DETECTION OF EMANATION FROM HDMI CABLES USING CNN AND TRANSFER LEARNING Speaker: Shreyas Sen, Purdue University, US Authors: Md Faizul Bari, Meghna Roy Chowdhury and Shreyas Sen, Purdue University, US Abstract The transition of data and clock signals between high and low states in electronic devices creates electromagnetic radiation according to Maxwell's equations. These unintentional emissions, called emanation, may have a significant correlation with the original information-carrying signal and form an information leakage source, bypassing secure cryptographic methods at both hardware and software levels. Information extraction exploiting compromising emanations poses a major threat to information security. Shielding the devices and cables along with setting a control perimeter for a sensitive facility are the most commonly used preventive measures. These countermeasures raise the research need for the longest detection range of exploitable emanation and the efficacy of commercial shielding. In this work, using data collected from 3 types of commercial HDMI cables (unshielded, single-shielded, and double-shielded) in an office environment, we have shown that the CNN-based detection method outperforms the traditional threshold-based detection method and improves the detection range from 4 m to 22.5 m for an iso-accuracy of ~95%. Also, for an iso-distance of 16 m, the CNN-based method provides ~100% accuracy, compared to ~88.5% using the threshold-based method. The significant performance boost is achieved by treating the FFT plots as images and training a residual neural network (ResNet) with the data so that it learns to identify the impulse-like emanation peaks even in the presence of other interfering signals. A comparison has been made among the emanation power from the 3 types of HDMI cables to judge the efficacy of multi-layer shielding. Finally, a distinction has been made between monitor contents, i.e., still image vs video, with an accuracy of 91.7% at a distance of 16 m. This distinction bridges the gap between emanation-based image and video reconstruction algorithms. |
16:48 CET | SS1.7 | ADVERSARIAL ATTACK ON HYPERDIMENSIONAL COMPUTING-BASED NLP APPLICATIONS Speaker: Sizhe Zhang, Villanova University, US Authors: Sizhe Zhang1, Zhao Wang2 and Xun Jiao1 1Villanova University, US; 2University of Chicago, US Abstract The security and robustness of machine learning algorithms have become increasingly important as they are used in critical applications such as natural language processing (NLP), e.g., text-based spam detection. Recently, the emerging brain-inspired hyperdimensional computing (HDC) has, compared to deep learning methods, shown advantages such as compact model size, energy efficiency, and capability of few-shot learning in various NLP applications. While HDC has been demonstrated to be vulnerable to adversarial attacks on image and audio input, there is currently very limited study of its adversarial security for NLP tasks, which is arguably one of the most suitable applications for HDC. In this paper, we present a novel study on the adversarial attack of HDC-based NLP applications. By leveraging a unique property of HDC, similarity-based inference, we propose similarity-guided approaches to automatically generate adversarial text samples for HDC. Our approach is able to achieve up to an 89% attack success rate. More importantly, compared with an unguided brute-force approach, the similarity-guided attack achieves a speedup of 2.4X in generating adversarial samples. Our work opens up new directions and challenges for future adversarially-robust HDC model design and optimization. |
16:51 CET | SS1.8 | A PRACTICAL REMOTE POWER ATTACK ON MACHINE LEARNING ACCELERATORS IN CLOUD FPGAS Speaker: Russell Tessier, University of Massachusetts, Amherst, US Authors: Shanquan Tian1, Shayan Moini2, Daniel Holcomb2, Russell Tessier2 and Jakub Szefer1 1Yale University, US; 2University of Massachusetts Amherst, US Abstract The security and performance of FPGA-based accelerators play vital roles in today's cloud services. In addition to supporting convenient access to high-end FPGAs, cloud vendors and third-party developers now provide numerous FPGA accelerators for machine learning models. However, the security of accelerators developed for state-of-the-art Cloud FPGA environments has not been fully explored, since most remote accelerator attacks have been prototyped on local FPGA boards in lab settings, rather than in Cloud FPGA environments. To address existing research gaps, this work analyzes three existing machine learning accelerators developed in Xilinx Vitis to assess the potential threats of power attacks on accelerators in Amazon Web Services (AWS) F1 Cloud FPGA platforms, in a multi-tenant setting. The experiments show that malicious co-tenants in a multi-tenant environment can instantiate voltage sensing circuits as register-transfer level (RTL) kernels within the Vitis design environment to spy on co-tenant modules. A methodology for launching a practical remote power attack on Cloud FPGAs is also presented, which uses an enhanced time-to-digital (TDC) based voltage sensor and auto-triggered mechanism. The TDC is used to capture power signatures, which are then used to identify power consumption spikes and observe activity patterns involving the FPGA shell, DRAM on the FPGA board, or the other co-tenant victim's accelerators. Voltage change patterns related to shell use and accelerators are then used to create an auto-triggered attack that can automatically detect when to capture voltage traces without the need for a hard-wired synchronization signal between victim and attacker. To address the novel threats presented in this work, this paper also discusses defenses that could be leveraged to secure multi-tenant Cloud FPGAs from power-based attacks. |
16:54 CET | SS1.9 | SCALABLE SCAN-CHAIN-BASED EXTRACTION OF NEURAL NETWORK MODELS Speaker: Shui Jiang, The Chinese University of Hong Kong, HK Authors: Shui Jiang1, Seetal Potluri2 and Tsung-Yi Ho1 1The Chinese University of Hong Kong, HK; 2North Carolina State University, US Abstract Scan chains have greatly improved hardware testability while introducing security breaches for confidential data. Scan-chain attacks have extended their scope from cryptoprocessors to AI edge devices. The recently proposed scan-chain-based neural network (NN) model extraction attack (ICCAD 2021) made it possible to achieve fine-grained extraction and is multiple orders of magnitude more efficient in both queries and accuracy than its coarse-grained mathematical counterparts. However, both the query formulation complexity and constraint solver failures increase drastically with network depth/size. We demonstrate a more powerful adversary, who is capable of improving scalability while maintaining accuracy, by relaxing high-fidelity constraints to formulate an approximate-fidelity-based layer-constrained least-squares extraction using random queries. We conduct our extraction attack on neural network inference topologies of different depths and sizes, targeting the MNIST digit recognition task. The results show that our method outperforms the scan-chain attack proposed at ICCAD 2021 by an average increase in the extracted neural network's functional accuracy of ≈ 32% and a 2–3 order-of-magnitude reduction in queries. Furthermore, we demonstrate that our attack is highly effective even in the presence of countermeasures against adversarial samples. |
Extended Abstracts
Time | Label | Presentation Title Authors |
---|---|---|
16:57 CET | SS1.10 | COMPREHENSIVE ANALYSIS OF HYPERDIMENSIONAL COMPUTING AGAINST GRADIENT BASED ATTACKS Speaker: Hamza Errahmouni Barkam, University of California, Irvine, US Authors: Hamza Errahmouni Barkam1, SungHeon Jeong2, Calvin Yeung1, Zhuowen Zou1, Xun Jiao3 and Mohsen Imani1 1University of California, Irvine, US; 2University of California, Irvine, US; 3Villanova University, US Abstract Brain-inspired hyperdimensional computing (HDC) has recently shown promise as a lightweight machine learning approach. HDC models could become the solution to the security aspect of critical applications, such as self-driving cars. Despite its success, there are limited studies on the robustness of HDC models to adversarial attacks. In this paper, we introduce the first comprehensive study that compares the robustness of HDC to malicious attacks with that of deep neural network (DNN) models. We develop a framework that enables HDC models to generate gradient-based adversarial examples using state-of-the-art techniques applied to DNNs. We explore different hyperparameters and HDC architectures and design mechanisms to protect HDC models against malicious attacks; these mechanisms include data pre-processing and adversarial training. Our evaluation shows that HDC with a proper neural encoding module provides significantly higher robustness to adversarial attacks than existing DNNs. In addition, HDC models have high robustness to adversarial samples generated for DNNs. Our study also indicates that the proposed defense mechanisms can further protect HDC models and potentially increase this technology's viability in safety-critical applications. Our evaluation shows that our HDC model provides, on average, 19.9% higher robustness than DNNs to adversarial samples. |
YPPK Young People Programme – Keynote on career opportunities
Add this session to my calendar
Date: Monday, 17 April 2023
Time: 17:30 CET - 18:00 CET
Location / Room: Marble Hall
Session chair:
Anton Klotz, Cadence Design Systems, DE
This is a Young People Programme event
Time | Label | Presentation Title Authors |
---|---|---|
17:30 CET | YPPK.1 | 3D INTEGRATION: OPPORTUNITIES & CHALLENGES FOR SYSTEM ARCHITECTURE TECHNOLOGY CO-OPTIMIZATION Presenter: Dragomir Milojevic, IMEC, BE Author: Dragomir Milojevic, IMEC, BE Abstract Today there is a consensus that the future of IC design and manufacturing will combine CMOS scaling, to eventually reach 1nm and beyond, with advanced 3D integration packaging. Individual dies will be manufactured using new device architectures, together with so-called performance boosters, to still enable node-to-node gains even with more modest gate/metal pitch scaling factors. Future ICs will integrate in the same IC package multiple dies manufactured using different processes (heterogeneous integration), each optimized for a given functionality (e.g., analogue, lower cache levels, high-capacity memories, high-performance logic, etc.). To enable die-to-die connectivity, different 3D integration technologies will be required (TSVs, front-side and back-side bumps, optical interconnects, etc.) with properties optimized to match the performance, energy, and bandwidth requirements of the die-to-die interconnect. But the future will probably not limit itself to technology improvements only. The holy grail of next-generation ICs will most likely be that the above-mentioned technology ingredients can be used to re-design the system architecture from scratch, allowing unprecedented gains in power, performance, and area. Thus, the parameter space of traditional SoC design will increase further, making exploration and design choices much harder to make (number and type of cores, memory hierarchy configuration, interconnect design and configuration, etc.). To enable such system design, novel methods will be required to allow so-called System Technology Co-Optimization (STCO), a paradigm in which the good old "divide and conquer" approach should be abandoned in favour of a more holistic system architecture-design-technology interaction. In this talk we will provide an overview of next-generation challenges for system architecture design and practical implementation through EDA and process technology. Ultimately, the goal of the presentation will be to point out the incredible opportunities offered by this paradigm change for future research & development in the field. |
ASD4 ASD Panel session: Autonomous Systems Design as a Driver of Innovation?
Add this session to my calendar
Date: Monday, 17 April 2023
Time: 18:30 CET - 20:00 CET
Location / Room: Gorilla Room 1.5.4/5
Session chair:
Rasmus Adler, Fraunhofer IESE, DE
Panellists:
Karl-Erik Arzen, Lund University, SE
Martin Fränzle, Carl von Ossietzky Universität, DE
Arne Hamann, Robert Bosch GmbH, DE
Davy Pissoort, KU Leuven, BE
Claus Bahlmann, Siemens, DE
Christoph Schulze, The Autonomous, AT
Presenter:
Karl-Erik Arzen, Lund University, SE
Autonomous systems have high potential in many application domains. However, most discussions seem to take place with respect to autonomous road vehicles. The automotive industry promised substantial progress in this field, but many predictions have not come true, and companies stepped back and corrected their predictions. Does this mean that systems autonomy is not ready to drive innovation? Autonomous behavior is obviously not limited to road vehicles: various kinds of systems can benefit from it in domains such as health and pharmaceutics, energy, manufacturing, farming, mining, and so on. In this session, we will thus take a broader perspective on autonomous systems design as a driver of innovation and discuss benefits, challenges, and risks in various application domains.
PhDF PhD Forum
Add this session to my calendar
Date: Monday, 17 April 2023
Time: 18:30 CET - 20:00 CET
Location / Room: Atrium
The PhD Forum is a great opportunity for PhD students to present their work to a broad audience in the system design and design automation community from both industry and academia, and to establish contacts for entering the job market. Representatives from industry and academia get a glance of the state of the art in system design and design automation. The PhD Forum is hosted by EDAA, ACM SIGDA and IEEE CEDA.
Time | Label | Presentation Title Authors |
---|---|---|
18:30 CET | PhDF.1 | ROBUST AND EFFICIENT MACHINE LEARNING FOR EMERGING RESOURCE-CONSTRAINED EMBEDDED SYSTEMS Speaker: Mikail Yayla, TU Dortmund, DE Authors: Mikail Yayla and Jian-Jia Chen, TU Dortmund, DE Abstract This thesis proposes a vision for highly resource-efficient future intelligent systems that are comprised of robust Binary Neural Networks (BNNs) operating with approximate memory and approximate computing units, while being able to be trained on the edge. The studies conducted within the scope of the thesis are summarized in three sections. In the first section, we present how BNNs can be optimized for robustness by margin maximization. In the second section, we present three studies on HW/SW codesign methods that exploit the error tolerance of BNNs for efficient inference. In the third section, we summarize our method for enabling the memory-efficient training of BNNs. |
18:30 CET | PhDF.2 | FORMAL AND PRACTICAL TECHNIQUES FOR THE VIRTUAL PROTOTYPE DRIVEN SYSTEM DESIGN PROCESS Presenter: Pascal Pieper, DFKI, DE Author: Pascal Pieper, DFKI, DE Abstract Modern SoC designs are produced in increasingly faster cycle times, while their complexity rises and costs must continuously decrease. To cope with this high demand and the pressure on a manufacturer's ability to maintain a reliable and secure end-product, a Virtual Prototype (VP) based design process is widely used in industry. A VP creates the possibility to design, evaluate and verify an executable prototype of the system in an early design stage by modelling the future hardware on a behavioral or structural level. In contrast to more traditional design flows like hardware-then-software, this enables both the iterative design evaluation and a parallel development of the (actual) hardware and software very early in the product conception phase. Additionally, after development of the lower-level hardware stages (e.g., register-transfer level, gate level, or physical hardware), VPs can be used as golden reference models with test and verification methods for comparison between the system-level behaviour and the actual hardware. For this to work, however, the VP and its components need to be verified in the first place. In this thesis, several techniques are proposed to improve and strengthen the VP-based design process, covering modeling and verification of security properties and hardware behaviour, as well as novel debugging, analysis and educational tools. The main goal of this thesis is to both improve existing processes and state-of-the-art tools, and to showcase new approaches to handle and verify complex systems at the hardware, software and intermediate levels. |
18:30 CET | PhDF.3 | ON THE ROLE OF RECONFIGURABLE SYSTEMS IN DOMAIN-SPECIFIC COMPUTING Speaker and Author: Davide Conficconi, Politecnico di Milano, IT Abstract Introduction The computer architecture field faces technological and architectural obstacles that limit general-purpose processor scaling in delivered performance at a reasonable energy cost. Therefore, computer architects have to follow novel paths to harvest more energy-efficient computation from the currently available technology, for instance, by employing domain-specialized solutions for a given scenario. The domain specialization path builds on a comprehensive environment where hardware and software are both specialized towards a particular application domain rather than being general purpose. Domain-Specific Architectures (DSAs) are generally the prominent exponent of hardware-centric domain specialization. DSAs leverage an abstraction layer such as an Instruction Set Architecture (ISA) and employ the easiest yet advanced computer architecture techniques to build a fixed datapath with the simplest data type and size. Generally, DSAs are thought to be efficiently implemented as Application-Specific Integrated Circuits (ASICs) or as part of a System on Chip (SoC). However, developing custom silicon devices is a time-consuming and costly process that is not always compatible with the time-to-market and fast evolution of the applications, which may require additional datapath customization. Thus, adaptable computing platforms represent the most viable alternative for these scenarios. Field-Programmable Gate Arrays (FPGAs) are the candidate platforms thanks to their on-field reconfigurable heterogeneous fabric. On top of the reconfigurability, FPGAs can implement large spatial computing designs and are publicly available on cloud computing platforms. Domain-Specific Reconfigurable Architectures FPGAs (and all reconfigurable systems) deserve a deeper analysis of their role in the domain specialization path, despite being the commercial platform closest to the ideal adaptable computing paradigm. Indeed, they can implement domain-specialized architectures that can be updated after field deployment, delivering variable datapaths which are adaptable almost an infinite number of times. Here, I call them Domain-Specific Reconfigurable Architectures (DSRAs). Employing Reconfigurable Computing (RC) systems (such as FPGAs) opens a wide variety of architectural organizations different from traditional CPUs with their fixed datapath. This thesis classifies them along two orthogonal characteristics: level of software programmability and datapath configurability. The most traditional is the DSA based on a "fixed" datapath with a dedicated ISA that communicates with instruction and data memories (simply called DSA from now on). Then, streaming architectures have fixed datapaths for each class of problems, generally devised from a high-level tool that automates the whole process. Finally, the third architecture organization combines a semi-fixed datapath with a streaming architecture, creating a Coarse-Grained Reconfigurable Architecture (CGRA). Figure 1 represents the three main DSRA classes. Although there are exciting research efforts on CGRAs, they are still immature; hence, I will focus mainly on traditional and streaming DSRAs.
This thesis defines and analyzes specialized computer architecture organizations based on reconfigurable platforms, called DSRAs, and addresses three main topics for each specific domain: design methodologies, automation, and usability. The first one (i.e., the design methodologies) is crucial for designing highly energy-efficient architectures; automation is essential for fast iterative development of new solutions and for reproducibility of the achieved results; the last one (i.e., usability) encompasses software programmability in a complete view that spans from hardware-software interfacing to ways of programming the architecture. My dissertation builds on systematic reviews of the latest system-level trends in reconfigurable systems [3] and the latest ways of designing digital systems for FPGAs [5]. Then, I explore two domains that mirror the corresponding DSRA class characteristics: one context-specific streaming architecture that could benefit from an automation toolchain, and another that presents different execution models suited to various application features. In the most synthetic view, my dissertation contributions are: 1) An analysis of the latest reconfigurable system-level trends with a taxonomy of domain-specific reconfigurable computer organizations [3]; 2) A survey with taxonomies and timelines of the most prominent digital design abstractions for FPGAs [5]; 3) An open-source design automation framework for highly customizable streaming-dataflow domain-specialized accelerators, proven on the Image Registration domain [1]; 4) An exploration of different computational models and forms of parallelism for the Regular Expressions (or equivalently Finite State Machines) domain for traditional DSAs (depth-first [2] and breadth-first [4]). Open Source Design Automation Framework For Streaming DSRAs Image Registration (IRG) is an essential pre-processing step of several image processing pipelines. However, it is often neglected for its context-specific nature, which would require a different architecture for different contexts. Therefore, this thesis presents a comprehensive framework based on the streaming architectural pattern with a dataflow MapReduce approach, shown in Figure 2. To complete the DSRAs, a design automation toolchain lowers the effort of adapting the architecture to unexpected contexts or new devices, and a software abstraction layer hides the low-level hardware interfacing mechanisms to expose simpler software APIs. All these components achieve significantly optimized IRG procedures at a lower energy profile [1]. Different Computational Models of Loopback-based DSRAs Particular domains may present more than a single computational pattern that fits the design process of a DSRA and different applications. For instance, the Regular Expressions (REs) domain presents intrinsically sequential computations that can leverage either a depth-first or a breadth-first execution model. Within this context, this thesis presents two different architectures, shown in Figure 3, that explore these different computational patterns and their respective programming abstractions. They exploit the idea of using REs as the programming language of a DSA and share the automation methodology built out of the IRG domain. The two DSAs achieve impressive performance and energy efficiency results, although their improvements are application sensitive [2,4]. |
18:30 CET | PhDF.4 | PHD FORUM ABSTRACT: CO-OPTIMIZATION OF NEURAL NETWORKS AND HARDWARE ARCHITECTURES FOR THEIR EFFICIENT EXECUTION Speaker and Author: Cecilia Latotzke, RWTH Aachen University, DE Abstract Convolutional Neural Networks (CNNs) are ubiquitously used on edge devices, because of their high classification accuracy. However, CNNs with high classification accuracy usually have a high memory footprint. This memory footprint causes high energy costs, which is a challenge for edge devices. Reducing the memory footprint by means of pruning or quantization can reduce accuracy. Meanwhile, most tasks do not accept a degradation in classification accuracy. This dissertation investigates the research question of how to enable the inference of CNNs efficiently and with high accuracy. |
18:30 CET | PhDF.5 | ACCELERATING MEMORY INTENSIVE ALGORITHMS AND APPLICATIONS USING IN-MEMORY COMPUTING Presenter: Ann Franchesca Laguna, De La Salle University, PH Author: Ann Franchesca Laguna, De La Salle University, PH Abstract Data-intensive applications do not fully utilize the computing capabilities of Von Neumann architectures because of the memory bandwidth bottleneck. These memory-bandwidth-limited applications can be accelerated by minimizing the data movement between the memory and the compute units through in-memory computing (IMC). Using IMC, this work accelerated four different types of applications and algorithms. |
18:30 CET | PhDF.6 | LOGIC SYNTHESIS FOR ADIABATIC QUANTUM-FLUX PARAMETRON CIRCUITS CONSIDERING TECHNOLOGY-SPECIFIC COSTS Speaker: Siang-Yun Lee, EPFL, CH Authors: Siang-Yun Lee and Giovanni De Micheli, EPFL, CH Abstract Adiabatic quantum-flux parametron (AQFP) is a next-generation superconducting electronic technology featuring ultra-low energy consumption. While the computation paradigm remains the same as classical digital logic families, the AQFP technology has unconventional properties to be considered in design automation. This thesis is divided into two parts. First, on a technology-independent level, a scalable logic synthesis framework is presented along with a specialized resynthesis algorithm targeting majority-based circuits. Whereas the former is general purpose, the latter is especially important for AQFP circuit optimization because the basic computing unit in AQFP is the majority gate. Second, two design constraints imposed by AQFP, namely, path balancing and fanout branching, are tackled. Additional buffers need to be inserted on shorter paths and splitters have to be inserted at the output of multi-fanout gates to fulfill these constraints, which occupy large area in AQFP circuits. We study the optimality of the buffer and splitter insertion problem and propose both exact and heuristic methods to minimize this additional cost. |
18:30 CET | PhDF.7 | MODERN HIGH-LEVEL SYNTHESIS: IMPROVING PRODUCTIVITY WITH A MULTI-LEVEL APPROACH Speaker and Author: Serena Curzel, Politecnico di Milano, IT Abstract High-Level Synthesis (HLS) tools simplify the design of hardware accelerators by automatically generating Verilog/VHDL code starting from a general purpose software programming language, usually C/C++. They include a wide range of optimization techniques in the process, most of them performed on a low-level intermediate representation (IR) of the code. Because of the mismatch between the requirements of hardware descriptions and the characteristics of input languages, HLS tools often rely on users to add specific directives (pragmas) that augment the input specification to guide the generation of optimized hardware. A good result thus still requires hardware design knowledge and non-trivial design space exploration, which might be an obstacle for domain scientists seeking to accelerate applications written, for example, in Python-based programming frameworks. This thesis proposes a modern approach based on multi-level compiler technologies to bridge the gap between HLS and high-level frameworks, and to use domain-specific abstractions to solve domain-specific problems. The key enabling technology is the Multi-Level Intermediate Representation (MLIR), a framework that supports building reusable compiler infrastructure inspired by (and part of) the LLVM project. The proposed approach uses MLIR to introduce new optimizations at appropriate levels of abstraction outside the HLS tool while still relying on years of HLS research in the low-level hardware generation steps; users and developers of HLS tools can thus increase their productivity, obtain accelerators with higher performance, and not be limited by the features of a specific (possibly closed-source) backend. The presented tools and techniques were designed, implemented, and tested to synthesize machine learning algorithms, but they are broadly applicable to any input specification written in a language that has a translation to MLIR. Generated accelerators can be deployed on Field Programmable Gate Arrays or Application-Specific Integrated Circuits, and they can reach ~10-100 GFLOPS/W efficiency without any manual optimization of the code. |
18:30 CET | PhDF.8 | FAST BAYESIAN ALGORITHMS FOR FPGA PLATFORMS Speaker and Author: Raissa Likhonina, Academy of Sciences, Institute of Information Theory and Automation, CZ Abstract The PhD thesis was devoted to fast Bayesian algorithms, more precisely to the QRD RLS Lattice algorithm combined with hypothesis testing and applied to hand detection problem solution based on ultrasound technology. Due to the proposed structure of regression models and the offered approach to hypothesis testing in the work, the algorithm under consideration is able to solve the problem of noise cancellation and additionally to compute the distance between the hand and the device; thus, potentially enabling to identify simple gestures. Further, the algorithm was implemented in parallel on the HW platform of Xilinx Zynq Ultrascale+ device with a quad-core ARM Cortex A53 processor and FPGA programmable logic and proved to function reliably and accurately in real time using real data from an ultrasound microphone. The work contains an investigation of the state of the art in the corresponding field and gives the theoretical background necessary for the development and modification of the algorithm to fulfill the goals of the thesis. The thesis also includes thorough description of experiments and an analysis of the results including those from simulation and from computation using real ultrasound data both in the MATLAB R2019b environment and on the HW platform of Xilinx Zynq Ultrascale+. |
18:30 CET | PhDF.9 | VIRTUAL PROTOTYPE CENTRIC VERIFICATION FOR EMBEDDED SYSTEM DEVELOPMENT Speaker and Author: Niklas Bruns, University of Bremen, DE Abstract Nowadays, a world without embedded systems cannot be imagined. Embedded systems are widespread in consumer electronics as well as in the automotive sector. The high diversity of products leads to various requirements for the underlying embedded systems. For embedded system development, it is crucial to have a short time-to-market (TTM) to persist in modern markets. In order to reduce the development time, the Virtual Prototype (VP) based design flow was established. The VP-based design flow enables parallelizing the hardware (HW) and software (SW) development. Nevertheless, parallelized development alone is not enough to guarantee a short TTM; efficient verification methodologies are also required. In this work, several novel approaches are proposed to improve the verification of embedded systems that are developed using a VP-based design flow. These approaches concentrate on the transitions between the development steps of the VP-based design flow. Just as in the VP-based development flow, the vital link between specification, HW, and SW is the VP. |
18:30 CET | PhDF.10 | OLYMPUS: DESIGN METHODS FOR SIMPLIFYING THE CREATION OF DOMAIN-SPECIFIC MEMORY ARCHITECTURES Speaker and Author: Stephanie Soldavini, Politecnico di Milano, IT Abstract Recently, hardware accelerators are becoming increasingly important, and the specialization of these accelerators means they can achieve high performance and energy efficiency. This specialization, however, means their design is complex and time-consuming, even more so in the case of modern big data and machine learning applications, where a huge amount of data needs to be processed. This complexity means the designer not only has to optimize the accelerator computation logic, but also has to carefully craft efficient memory architectures, which is not the case in traditional software design. The goal of this work is to address these challenges by reducing the manual steps a designer must perform to accelerate data-intensive applications by means of FPGAs. We aim to create a multi-level compilation flow that specializes a domain-specific memory template to match data, application, and technology requirements in order to simplify the hardware accelerator development process. In this thesis, I am developing Olympus, a set of methods for simplifying the creation of domain-specific memory architectures. With the currently implemented optimizations, Olympus is able to achieve a performance of up to 43 GFLOPS and an efficiency of 1.2 GFLOPS/W while using double-precision data, and up to 103 GOPS and 3.9 GOPS/W when using 32-bit fixed-point data. |
18:30 CET | PhDF.11 | EFFICIENT NEURAL ARCHITECTURES FOR EDGE DEVICES Speaker and Author: Dolly Sapra, University of Amsterdam, NL Abstract The rise of IoT networks, with numerous interconnected edge devices, has led to an increase in demand for intelligent data processing closer to the data source. Deployment of neural networks at the edge is desirable, though challenging, since edge devices have limited resources. The focus of this thesis is on neural architectures for Convolutional Neural Networks (CNNs) that execute on the edge. The thesis presents Evolutionary Piecemeal Training (EPT), an algorithm for an efficient Neural Architecture Search (NAS). This flexible algorithm treats NAS as an optimization problem with a variable number of possible objectives. To highlight the versatility of EPT, three different sets of experiments are shown in the thesis, with one, two and four objectives respectively. The multi-objective algorithm typically involves hardware-specific objectives in addition to the accuracy of the CNN to produce a Pareto-optimal set of neural architectures. Further, the thesis examines the adaptivity of CNN-based applications running at the edge. The first work is the Scenario Based Run-time Switching (SBRS) framework, where every scenario represents an operation mode and has an associated CNN. An application may switch between scenarios to allow synchronous adaptation to environmental changes. Additionally, a framework is presented to efficiently share and reuse CNNs in distributed IoT networks. This framework supports maintenance and adaptation of existing and deployed CNNs at the edge. To conclude, this thesis demonstrates various methodologies to improve the performance of a CNN deployed on a resource-constrained edge device. The key ideas include searching for an efficient neural architecture, adaptive applications with run-time CNN switching, and CNNs as dynamic entities in a distributed IoT network. The thesis is published at https://dare.uva.nl/search?identifier=03eff2c1-b5ab-4fc8-bfe6-046c0a929… |
18:30 CET | PhDF.12 | DESIGN AND IMPLEMENTATION OF PARALLEL AND APPROXIMATE MICROARCHITECTURES FOR TIGHTLY COUPLED PROCESSOR ARRAYS Speaker and Author: Marcel Brand, Friedrich-Alexander-Universität Erlangen-Nürnberg, DE Abstract With the decline of Moore's law, processor architecture design increasingly compensates for stagnating compute power with the parallelism of many- and multi-core systems. It therefore becomes increasingly important to have processing elements that are small yet powerful and that promote efficient coding and memory usage. Our work on Orthogonal Instruction Processing (OIP) and Anytime Instruction Processing (AIP) tackles this problem from several angles. With OIP, in contrast to well-known Very Long Instruction Word (VLIW) processor architectures, we can shrink software-pipelined application code down to 4.6% of the size of software-pipelined VLIW code and thus save memory that would be expensive in both area and power. AIP gives a programmer or compiler control over the accuracy of floating-point (FP) operations: the desired accuracy is encoded at bit granularity into the instruction, so that the executed operation computes only the requested number of most significant bits (MSBs) and may terminate earlier than a full-accuracy computation. The concept exploits the fact that many algorithms do not need to compute every instruction at full accuracy and trades accuracy off against execution time and power consumption. Anytime instructions prove especially useful for iterative algorithms such as square root or Jacobi, but also show benefits in other domains; for example, compared to regular FP operations, they can reduce the energy consumption of Convolutional Neural Network (CNN) inference by up to 62% without increasing the classification error rate. |
18:30 CET | PhDF.13 | OSCILLATORY NEURAL NETWORKS IMPLEMENTED ON FPGA FOR EDGE COMPUTING APPLICATIONS Speaker: Madeleine Abernot, LIRMM - University of Montpellier - CNRS, FR Authors: Madeleine Abernot and Aida Todri-Sanial, LIRMM, University of Montpellier, CNRS, FR Abstract This PhD work focuses on the Oscillatory Neural Network (ONN) computing paradigm for edge artificial intelligence applications. In particular, it uses a digital ONN design implemented on FPGA to explore novel ONN architectures, learning algorithms, and applications. First, a fully connected ONN architecture is used for pattern recognition, applied in this work to various edge applications such as image processing and robotics. Then, the work introduces layered ONN architectures for classification tasks, applied to image edge detection. |
18:30 CET | PhDF.14 | NOVEL CIRCUIT ARCHITECTURES FOR SCALABLE AND ADAPTIVE SENSOR READOUT Speaker and Author: Jonah Van Assche, KU Leuven, BE Abstract This PhD research investigates the design and modelling of new circuit architectures for sensing devices at the extreme edge, with a focus on biomedical sensing (e.g., ECG, EEG). Such edge devices require very long battery life on a limited energy budget provided by a small battery, while long monitoring periods are needed and the devices should be wirelessly connected to a base station or the cloud for further data processing. This research explores sensor systems that compress the signal directly while sampling it, in the mixed-signal domain, in order to lower the data rate and hence the system-level power consumption of edge devices. Two techniques in particular are studied: compressive sensing and event-based sampling. The research has three main objectives. The first is to develop high-level models that estimate power consumption directly while modelling the sensor readout circuits, without the need for circuit-level simulations; these modelling techniques were applied to a compressive sensing system and an event-based level-crossing ADC, showing that both techniques can yield large system-level power savings compared to a traditional sensor system. The second objective was to improve the circuit-level performance of level-crossing ADCs, which resulted in a prototype IC achieving state-of-the-art power efficiency and accuracy. The third objective is to validate this design in an application with a spiking neural network, showing that event-based sampling can lower not only the data rate but also the on-chip processing power. |
18:30 CET | PhDF.15 | A COMPLETE ASSERTION-BASED VERIFICATION FRAMEWORK FROM THE EDGE TO THE CLOUD Speaker and Author: Samuele Germiniani, Università di Verona, IT Abstract Since modern cyber-physical systems (CPSs) are increasingly complex and distributed, it is no longer appropriate to focus the verification process only on single components; instead, it is necessary to embrace holistic approaches that look at the entire system. To this end, it is crucial to consider an ecosystem of integrated tools interconnected in a complete supply chain: from the formalisation of specifications up to their run-time verification. Even though several tools have been proposed in the last few decades, no single framework can be considered an integrated ecosystem, which leads to a number of inefficiencies and holes in the verification process. Assertion-based verification (ABV) is a well-known approach for checking the functional correctness of a system. In ABV, the specifications of the system under verification (SUV) are formalised through assertions, which are logic properties that must hold during the system's execution. Due to the complexity and dynamic nature of the SUV, ABV cannot be applied only in an offline fashion before the deployment of the system. Therefore, it is necessary to extend the verification process to the post-deployment phase, that is, to run checkers during the execution of the system. However, this collides with the issues of dealing with a distributed system affected by unpredictable latency. In this context, the SUV is usually made of several components with limited available resources, and to make things even more challenging, these resources are usually already saturated by the functional tasks. To fill this gap, I propose a complete framework to verify complex distributed systems, from the formalisation of specifications to runtime execution. The proposed framework aims at covering several holes in the verification process of systems executing in an edge-to-cloud computing environment. |
18:30 CET | PhDF.16 | A RESOURCE EFFICIENT ACCELERATION OF NEURAL NETWORKS ON LOW-END FPGAS THROUGH MEMORY SHARING Speaker: Argyris Kokkinis, Aristotle University of Thessaloniki, GR Authors: Argyris Kokkinis1 and Kostas Siozios2 1Aristotle University of Thessaloniki, GR; 2Department of Physics, Aristotle University of Thessaloniki, GR Abstract Hardware acceleration at the deep edge comes with strict constraints on low power and high throughput. In low-end FPGAs, frequent communication with the off-chip memory decreases both the design's performance and its energy efficiency. This research presents a design methodology for accelerating Neural Networks (NNs) on low-end FPGAs by sharing on-chip memory among the implemented accelerators. Experimental analysis indicates that this methodology can increase the size of the on-chip NNs by up to 3.09× without the overhead of continuous off-chip communication. |
18:30 CET | PhDF.17 | PHASE-BASED OSCILLATORY NEURAL NETWORK FOR ENERGY EFFICIENT NEUROMORPHIC COMPUTING Speaker: Corentin Delacour, LIRMM, University of Montpellier, CNRS, FR Authors: Corentin Delacour and Aida Todri-Sanial, LIRMM, University of Montpellier, CNRS, FR Abstract Oscillatory Neural Networks (ONNs) are novel neuromorphic architectures where information is encoded in phases among coupled oscillators. This work introduces the concept of analog ONNs based on beyond-CMOS devices to perform AI tasks with a low energy footprint. Using circuit and TCAD simulations, we investigate the design of compact oscillating neurons made of vanadium dioxide (VO2) and coupled by passive synaptic elements. The ONN energy scaling at the device and architecture level is presented. Finally, we showcase a VO2-ONN for solving NP-hard optimization problems such as finding the maximum cut of a graph. |
18:30 CET | PhDF.18 | CO-DESIGN OF LIGHTWEIGHT EHEALTH APPLICATIONS ON AN IOT EDGE PROCESSOR Speaker: Mingyu Yang, Tokyo Institute of Technology, JP Authors: Mingyu Yang and Yuko Hara-Azumi, Tokyo Institute of Technology, JP Abstract With the development of the Internet of Things (IoT), eHealth applications implemented on embedded systems are offering an easy-to-use IoT eHealth ecosystem. For such applications, a power/energy-efficient computing platform and lightweight algorithms that do not require powerful resources or a large memory footprint are both essential. This work targets a lightweight implementation of a low-power eHealth device using both hardware and software approaches. A memory-conscious dynamic time warping (DTW) algorithm used in various lightweight eHealth applications is deployed on a small, low-power embedded processor. Prototypes of the processor were fabricated in a 65nm low-power process. |
18:30 CET | PhDF.19 | QUALITY-OF-SERVICE AWARE DESIGN AND MANAGEMENT OF EMBEDDED MIXED-CRITICALITY SYSTEMS Speaker and Author: Behnaz Ranjbar, TU Dresden, DE Abstract A wide range of embedded systems found in the automotive and avionics industries are evolving into Mixed-Criticality (MC) systems to meet cost, space, timing, and power consumption requirements. MC applications are real-time, and to ensure their correctness, it is essential to meet strict timing requirements as well as functional specifications. The correct design of such MC systems requires a thorough understanding of the system's functions and their importance to the system. We address the challenges associated with efficient MC system design. We first focus on MC application analysis through Worst-Case Execution Time (WCET) analysis and task scheduling analysis in order to execute more low-criticality tasks in the system, i.e., improving the Quality-of-Service (QoS), while guaranteeing the correct execution of high-criticality tasks. The thesis then addresses the challenge of enhancing QoS by exploiting parallelism on multi-processor hardware platforms. In addition, we study the power and thermal management of multi-core MC systems while guaranteeing their real-time behaviour under all circumstances. |
18:30 CET | PhDF.20 | FUNCTIONAL SYNTHESIS VIA MACHINE LEARNING AND AUTOMATED REASONING Speaker and Author: Priyanka Golia, IIT Kanpur and NUS Singapore, SG Abstract Automated functional synthesis deals with synthesizing programs, functions, and circuits that satisfy the user's requirements. Given a relational specification R(X, Y) over input X and output Y, the task is to synthesize the output Y in terms of X, that is, Y := F(X), such that the given specification is met (a minimal toy illustration of this formulation is sketched after the forum listing below). Given the fundamental importance of synthesis in computer science, recent developments in this area have led to advances in program synthesis, synthesis of safety controllers, circuit design and repair, and cryptanalysis. We propose a novel data-driven approach for functional synthesis that takes advantage of advances in machine learning, constrained sampling, and automated reasoning. The proposed approach is very generic and can be lifted to diverse settings. We further analyze its impact on program synthesis and synthesis with explicit dependencies. The submission summarizes the different works done as part of my thesis. Joint work with Kuldeep S. Meel, Subhajit Roy, and Friedrich Slivovsky. |
18:30 CET | PhDF.21 | APPLICATION REFINEMENT AND MEMORY MANAGEMENT OVER HETEROGENEOUS DRAM/NVM SYSTEMS Speaker: Manolis Katsaragakis, National TU Athens, GR Authors: Manolis Katsaragakis1, Francky Catthoor2 and Dimitrios Soudris3 1National TU Athens, GR; 2IMEC, BE; 3National Technical University of Athens, GR Abstract This PhD focuses on the development of a systematic methodology for source code organization, data structure refinement, exploration, and placement over emerging memory technologies. The goal is to extract alternative solutions that provide multi-criteria trade-offs across different optimization aspects, such as memory footprint, memory accesses, performance, and energy consumption. |
18:30 CET | PhDF.22 | MAXIMIZING THE POTENTIAL OF RISC-V VECTOR EXTENSIONS FOR SPEEDING UP CRYPTOGRAPHY ALGORITHMS Speaker and Author: Huimin Li, TU Delft, NL Abstract RISC-V is an open and freely accessible Instruction Set Architecture (ISA) based on reduced instruction set computer (RISC) principles. It is suitable for direct native hardware implementation, with small base instruction sets (ISA bases) for simple general-purpose computers and rich optional instruction extensions for more comprehensive applications. These optional extensions are designed to work with all ISA bases without conflicts. Additionally, RISC-V allows users to add custom instructions to accelerate specific applications. The RISC-V vector extensions (RISC-V vector ISA) are designed for vector operations: a single instruction applies the same operation to multiple data elements in parallel, improving whole-system performance. This work explores the full potential of the RISC-V vector extensions for cryptography algorithms. |
18:30 CET | PhDF.23 | OPTIMIZING AI AT THE EDGE: FROM NETWORK TOPOLOGY DESIGN TO MCU DEPLOYMENT Speaker and Author: Alessio Burrello, Politecnico di Torino and Università di Bologna, IT Abstract Optimizing and deploying artificial intelligence on edge devices, removing the need for cloud computing systems and for sending data over networks, is vital for reducing energy consumption and improving privacy. This thesis describes two essential knobs for optimizing so-called EdgeAI. The first topic analyzed in the thesis is Neural Architecture Search (NAS), which is quickly becoming the go-to approach to optimize the structure of Deep Learning (DL) models. I focus on two tools that I developed: one to optimize the architecture of Temporal Convolutional Networks (TCNs), a recently emerged convolutional model for time-series processing, and one to optimize the data precision of tensors inside CNNs. The first NAS explicitly targets the most peculiar architectural parameters of TCNs, namely dilation, receptive field, and the number of features in each layer; it is the first NAS that explicitly targets these networks. The second NAS instead focuses on finding the most efficient data format for a target CNN, at the granularity of individual layer filters. Applying these two NASes in sequence allows an "application designer" to minimize the structure of the neural network employed, reducing the number of operations or the memory usage of the network. The second chapter describes the optimization of neural network deployment on edge devices, where exploiting the scarce resources of edge platforms is critical for efficient NN execution on MCUs. To this end, I introduce DORY (Deployment Oriented to memoRY), an automatic tool to deploy CNNs on low-cost MCUs. DORY automatically manages the different levels of memory inside the MCU, offloads the computation workload (i.e., the different layers of a neural network) to dedicated hardware accelerators, and generates ANSI C code that orchestrates off- and on-chip transfers together with the computation phases. On top of this, I introduce two optimized computation libraries that DORY can exploit to deploy TCNs and Transformers efficiently at the edge. In the last chapter of the thesis, I describe two bio-signal analysis applications, namely heart rate tracking and sEMG-based gesture recognition, in which the previously described techniques serve as fundamental blocks for optimizing execution at the edge. |
18:30 CET | PhDF.24 | EFFICIENT AND RELIABLE EDGE VISION ACCELERATOR WITH COMPUTE-IN-MEMORY Speaker: Wantong Li, Georgia Tech, US Authors: Wantong Li and Shimeng Yu, Georgia Tech, US Abstract Compute-in-memory (CIM) has been widely investigated as an attractive candidate to accelerate the extensive multiply-and-accumulate (MAC) workloads in deep learning inference. Analog CIM with non-volatile memories such as resistive random-access memory (RRAM) benefits from low leakage, high capacity, and suppression of data movement, but inference accuracy can deteriorate from nonidealities. This work proposes techniques including on-chip write-verify, in-situ error correction, and temperature-tracking ADC references to combat the process, voltage, and temperature (PVT) variations in RRAM-CIM. A prototype chip with these features has been fabricated and validated in TSMC 40nm technology. The macro achieves competitive compute density of 97.8 GOPS/mm2 and energy efficiency of 44.5 TOPS/W, while guaranteeing high accuracy under low VDD and high temperature. On the application side, vision transformer has become the state-of-the-art for many computer vision tasks, and a digital reconfigurable accelerator (RAWAtten) for its complex window attention is designed. RAWAtten achieves 2.4× speedup over the baseline GPU while consuming only a fraction of GPU power. Having improved the reliability of analog CIM, a hybrid RAWAtten employing analog CIM for its linear layers and digital compute for its intermediate matrix multiplications is under development to combine advantages of both compute schemes. Monolithic 3D integration will be used to further reduce cost of data movements and allow stacking of heterogeneous layers in different technology nodes. |
18:30 CET | PhDF.25 | UNFORGETTABLE!: DESIGNING A NON-VOLATILE PROCESSOR FOR INTERMITTENTLY POWERED EMBEDDED DEVICES Speaker: Satya Jaswanth Badri, Indian Institute of Technology Ropar, IN Authors: Satya Jaswanth Badri1, Mukesh Saini1 and Neeraj Goel2 1Indian Institute of Technology, Ropar, IN; 2IIT Ropar, IN Abstract Battery-less technology has evolved to replace battery usage in space, deep mines, and other environments in order to reduce cost and pollution. A promising alternative to battery-operated devices is energy harvesting, which collects energy from the environment to power IoT devices. The collected energy is stored in a capacitor and used for computation, so power failures may occur frequently in these IoT systems; we refer to this as intermittent computing. Data loss is the major challenge in such intermittently powered IoT devices. Non-volatile memory (NVM) based processors have been explored for saving the system state during a power failure, and a Non-Volatile Processor (NVP) is needed for these devices. We propose three architectures that together form a suitable and efficient NVP for intermittent computing: in the first work, we deploy NVM at the L1 cache; in the second, at the last-level cache (LLC); and in the third, we propose a memory mapping technique for a modern NVP, the MSP430FR6989. |
18:30 CET | PhDF.26 | LANGUAGE SUPPORT AND OPTIMIZATION FOR ENERGY-EFFICIENT AND ADAPTABLE EXECUTION OF MULTIPLE DATAFLOW APPLICATIONS ON EMBEDDED SYSTEMS Speaker and Author: Robert Khasanov, TU Dresden, DE Abstract Many modern computing systems in embedded end-user devices consist of many cores, and the number of cores continues to grow. Embedded devices often process varying workloads, where different kernels may be requested to execute at any time. The system needs to ensure that a required Quality of Service is delivered and the overall energy consumption is minimized. This thesis combines several works which aim at energy-efficient and adaptable execution on soft/firm real-time systems and researches adaptivity both at the application and system levels. More concretely, first, it presents a novel extension to Kahn Process Networks (KPN), which introduces implicit parallelism and a relaxed execution strategy. Despite this relaxation, the introduced extension to the application model still keeps deterministic KPN semantics. Second, the thesis presents a novel energy-efficient runtime resource management algorithm for multi-application mapping. The presented methodology lets runtime applications adapt to available resources by using mapping segments, which allows the manager to consider the upcoming changes in the workload, thereby enlarging the scope of analysis. As a result, the manager better adapts applications to the available resources and produces energy-efficient schedules. Due to low overhead, the approach could also be applied to a use-case of baseband processing, where the incoming requests are processed at the millisecond granularity. The final part of the prospective thesis presents a complete tool flow from adaptive dataflow application code down to the final execution on the embedded system. This tool flow exploits the available adaptivity knobs at both application and system levels in a joint way, thereby better adapting the system to the varied dynamic workload. |
18:30 CET | PhDF.27 | AN INTEGRATED ENVIRONMENT FOR MODELING AND DEPLOYING DIGITAL TWINS Speaker and Author: Charles Steinmetz, Hochschule Hamm-Lippstadt - Campus Lippstadt, DE Abstract The Digital Twin (DT) has been the focus of researchers from academia and industry in the last few years. It is one of the key enablers of the current and next industrial revolutions, such as Industry 4.0, Industry 5.0, and the Metaverse. However, representing real-world systems can be complex, since assets might be represented in several ways and several stakeholders with different backgrounds might be involved. In this context, this work proposes an environment that integrates all these perspectives in a common language that different stakeholders can use, covering all system levels from the device level up to the process and workflow levels. A methodology and elements for creating semantic DT models are provided. Furthermore, a four-layer architecture is presented to help designers identify the responsibilities of each part of the system. |
18:30 CET | PhDF.28 | HARDWARE AND SOFTWARE ARCHITECTURES FOR ENERGY-EFFICIENT SMART HEALTHCARE SYSTEMS Speaker: Bharath Srinivas Prabakaran, TU Wien, AT Authors: Bharath Srinivas Prabakaran1 and Muhammad Shafique2 1TU Wien, AT; 2New York University Abu Dhabi, AE Abstract Wearables are proving to be increasingly influential, reaching most smartphone owners and improving their user experience. They have drastically improved users' quality of life thanks to their ease of use and broad-spectrum functionality, including sensors capable of monitoring various biosignals to estimate the user's health. The collected data is transmitted to the user's device and/or their physician, depending on the requirements, for further processing and information extraction to detect anomalies. This work investigates the research challenges associated with such smart-healthcare systems at both the hardware and software layers and proposes relevant techniques that can ease future deployment and adoption. |
18:30 CET | PhDF.29 | POWER SIDE CHANNELS IN REMOTE FPGAS Speaker and Author: Ognjen Glamocanin, EPFL, CH Abstract The pervasive adoption of field-programmable gate arrays (FPGAs) in both cyber-physical systems and the cloud has raised many security issues. Being integrated circuits, FPGAs are susceptible to fault and power side-channel attacks, which require physical access to the victim device. Recent work demonstrated that physical proximity is no longer required for these attacks, as FPGA logic can be misused to create on-chip voltage sensors or power-wasting circuits. This work explores the impact of on-chip voltage sensors on FPGA security and shows that they can be used to both enhance and compromise the security of FPGA-based systems. In the case of deployed, no longer accessible cyber-physical devices vulnerable to tampering attacks, we show that on-chip sensors allow designers to re-evaluate the power side-channel leakage after deployment, ensuring that security has not been compromised. In the case of shared FPGAs in the cloud, we demonstrate that new security vulnerabilities arise with the use of on-chip sensors: a remote attacker can mount both statistical (correlation power analysis) and machine learning (ML) based attacks with the on-chip sensors, emphasizing the need to deploy countermeasures in multi-tenant FPGAs. Our work also demonstrates new, routing-based sensor architectures that outperform the state of the art. Finally, we evaluate the temperature impact on the on-chip sensors and demonstrate that it can significantly affect the attack effort. |
18:30 CET | PhDF.30 | NOVEL TECHNIQUES FOR TIMING ANALYSIS OF VLSI CIRCUITS Speaker and Author: Dimitrios Garyfallou, University of Thessaly, GR Abstract Timing analysis is an essential and demanding verification method used during the design and optimization of a Very Large Scale Integrated (VLSI) circuit, while it also constitutes the cornerstone of the final signoff that dictates whether the chip can be released to the semiconductor foundry for fabrication. Throughout the last few decades, the relentless push for high-performance and energy-efficient circuits has been met by aggressive technology scaling, which enabled the integration of a vast number of devices into the same die but introduced new challenges to timing analysis. In nanometer technologies, highly resistive interconnects have an ever-increasing impact on timing, while nonlinear transistor and Miller capacitances imply that signals no longer resemble smooth saturated ramps. At the same time, manufacturing process variations have become significantly more pronounced, which calls for sophisticated timing analysis techniques to reduce uncertainty in timing estimation. From another perspective, the timing guardbands enforced to protect circuits from variation-induced timing errors are overly pessimistic since they are estimated using static timing analysis under rare worst-case conditions, leaving extensive dynamic timing margins unexploited. To this end, this research presents several new techniques for accurate and efficient timing analysis of VLSI circuits in advanced technologies, which address different aspects of the problem, including gate and interconnect timing estimation, timing analysis under process variation, and dynamic timing analysis. |
18:30 CET | PhDF.31 | SHARED RESOURCE CONTENTION AWARE SCHEDULABILITY ANALYSIS FOR MULTIPROCESSOR REAL-TIME SYSTEMS Speaker: Jatin Arora, CISTER Research Centre, ISEP, IPP, PT Authors: Jatin Arora, Eduardo Tovar and Claudio Maia, Polytechnic Institute of Porto, PT Abstract Commercial-off-the-shelf (COTS) multicore platforms have the potential to provide raw computing power while being energy-efficient and cost-effective. However, the adoption of multicore platforms in hard real-time systems is still under scrutiny. The main challenge that hinders their use in hard real-time systems is their unpredictability, which originates from the sharing of hardware resources. A task executing on one core of a multicore platform has to compete with co-running tasks (tasks running on other cores) to access shared hardware resources such as the last-level cache (LLC), the interconnect (e.g., the memory bus), and the main memory. This competition is problematic as it can negatively influence the temporal behavior of tasks in a non-deterministic manner, a phenomenon known as shared resource contention. To circumvent this problem, the 3-phase task execution model was proposed, which divides task execution into distinct computation and memory phases. In such a model, the shared resources, i.e., the memory bus and the main memory, are only accessed during the memory phases, and no memory access is allowed during the computation phases. Leveraging such a model, tasks can be scheduled so that while one task executes its memory phase, another task on a different core can concurrently execute its computation phase without suffering shared resource contention. However, if tasks running on multiple cores execute their memory phases at the same time, shared resource contention can occur. To address this issue, this PhD dissertation focuses on analyzing the shared resource contention suffered by 3-phase tasks due to the sharing of the memory bus and main memory. Having analyzed the shared resource contention, a Worst-Case Response Time (WCRT) based schedulability analysis is performed by integrating the maximum shared resource contention suffered by 3-phase tasks. |
18:30 CET | PhDF.32 | SCALABLE HARDWARE-AWARE NEURO-EVOLUTIONARY ALGORITHMS Speaker and Author: Michal Pinos, Faculty of Information Technology, Brno University of Technology, CZ Abstract Recently, there has been growing interest in the use of DNNs in low-power devices with limited resources, such as Internet of Things (IoT) devices, embedded devices, and other battery-powered smart gadgets. The deployment of DNNs in these devices is associated with many restrictions, such as limited power consumption, low memory, or insufficient computing power, which, for example, limits their usage to on-device inference only. In order to deploy modern DNNs on resource-constrained devices, many methods of hardware-aware DNN design have been researched. One of the most frequently used approaches is the manual or semi-automatic optimization of existing DNNs for deployment on the given hardware. Such optimizations usually consist of procedures such as model quantization to reduce the bit width of numeric data types, replacing expensive floating-point operations with fixed-point arithmetic, or model compression using pruning and fine-tuning techniques. Another recently very popular and successful technique is the deployment of approximate computing at different levels of the DNN computing stack. In this research, I focus on the utilization of approximate multipliers in certain layers of DNN models. In particular, excellent trade-offs between energy consumption and accuracy can be achieved by approximating multiplications in the convolutional layers of convolutional neural networks (CNNs). To overcome some of the problems associated with the tedious manual design of DNN architectures, such as time complexity and error-proneness, a technique for the automated design of neural network architectures, called Neural Architecture Search (NAS), is deployed. |
18:30 CET | PhDF.33 | DESIGN AND CODE OPTIMIZATION FOR SYSTEMS WITH NEXT-GENERATION RACETRACK MEMORIES Speaker and Author: Asif Ali Khan, TU Dresden, DE Abstract With the rise of computationally expensive application domains such as machine learning, genomics, and fluid simulation, the quest for performance and energy-efficient computing has gained unprecedented momentum. The significant increase in computing and memory devices in modern systems has resulted in an unsustainable surge in energy consumption, a substantial portion of which is attributed to the memory system. The scaling of conventional memory technologies and their suitability for next-generation systems are also questionable. This has led to the emergence and rise of non-volatile memory (NVM) technologies. Today, several NVM technologies, in different development stages, are competing for rapid access to the market. Racetrack memory (RTM) is one such non-volatile memory technology that promises SRAM-comparable latency, reduced energy consumption, and unprecedented density compared to other technologies. However, RTM is sequential in nature: data in an RTM cell needs to be shifted to an access port before it can be accessed, and these shift operations incur performance and energy penalties. This thesis presents a set of techniques, including optimal, near-optimal, and evolutionary algorithms, for efficient scalar, instruction, and array placement in RTMs. We present an automatic compilation framework that analyzes static control flow programs and transforms the loop traversal order and memory layout to maximize accesses to consecutive RTM locations and minimize shifts. We also develop a simulation framework called RTSim that models various RTM parameters and enables accurate architectural-level simulation. |
18:30 CET | PhDF.34 | BITSTREAM PROCESSING SYSTEMS WITH NEW PERSPECTIVES TOWARD SIMULATION AND LIGHTWEIGHT NEURAL NETWORKS Speaker: Sercan Aygun, University of Louisiana at Lafayette, US Authors: Sercan Aygun1 and Ece Gunes2 1University of Louisiana at Lafayette, US; 2Istanbul TU, TR Abstract Sercan Aygun obtained his Ph.D. degree in Electronics Engineering from Istanbul Technical University, Turkey, in November 2022. He is currently a postdoctoral researcher at the University of Louisiana at Lafayette, USA. The goal of the dissertation is to propose software simulations of stochastic computing (SC) systems with an emphasis on vision and learning machines. A new simulation approach based on the contingency table (CT) construct is proposed, reducing the simulation burden of memory- and runtime-bounded SC: by utilizing only a correlation-aware CT, digital circuits are simulated as if actual bitstreams were used, while only scalar processing is performed (a short sketch of the underlying bitstream arithmetic is given after the forum listing below). In addition, the dissertation proposes a new bitstream-processing neural network architecture based on binarized weights and activations. The bitstream-processing binarized neural network (BSBNN) is presented considering its efficient hardware structure and its robustness to non-idealities such as bit-flip errors. The dissertation was carried out in collaboration with the Université Catholique de Louvain, Belgium (Supervisor: Prof. Christophe De Vleeschouwer, 2018-2019) and the University of Louisiana at Lafayette, USA (Supervisor: Asst. Prof. M. Hassan Najafi, 2021-2022). |
18:30 CET | PhDF.35 | A CROSS-LAYER FRAMEWORK FOR ADAPTIVE PROCESSOR-BASED SYSTEMS REGARDING ERROR RESILIENCE AND POWER EFFICIENCY Speaker and Author: Mitko Veleski, Brandenburg University of Technology, DE Abstract This thesis presents a novel, first-of-its-kind framework for the synergistic optimization of two fundamental but non-complementary requirements in modern computing: error resilience and power consumption. The framework is built on a high degree of configurability and simple integration into a typical processor-based system. Such a framework makes the host system easily adaptable to variations and capable of operating optimally in all conditions. This is achieved by intelligently interchanging techniques for resilient and low-power computing during runtime. As the efficient and timely flow of relevant information is crucial for dynamic system adjustment, the framework building blocks are distributed across several abstraction layers. Moreover, the framework allows a system to preserve its performance at negligible area overhead. |
18:30 CET | PhDF.36 | SECURITY AND INTERPRETABILITY IN AUTOMOTIVE SYSTEMS Speaker and Author: Shailja Thakur, New York University, US Abstract The lack of a sender authentication mechanism in the Controller Area Network (CAN) makes it vulnerable to security threats, such as an attacker impersonating an Electronic Control Unit (ECU) and sending spoofed messages. To address this issue, this thesis proposes a sender authentication technique that utilizes power consumption measurements and a classification model to determine transmitting states. By analyzing the power consumption of each ECU, the technique can identify the actual sender and detect spoofed messages. The method shows good accuracy in real-world settings, making it a promising solution to the problem of CAN security. However, while machine learning-based security controls have shown great potential in improving automotive security, false positives pose a significant challenge. False positive alerts can cause alarm fatigue in operators, leading to incorrect reactions and, ultimately, rendering the system less effective. To address this challenge, the thesis explores explanation techniques for image and time-series inputs. These techniques assign weights to sensitive inputs and quantify variations in explanations. Overall, the thesis proposes methods for addressing security and interpretability in automotive systems. These methods have potential applications in other settings where transparent and reliable decision-making is crucial. |
18:30 CET | PhDF.37 | RESOURCE-AWARE OPTIMIZATION TECHNIQUES FOR MACHINE LEARNING INFERENCE ON HETEROGENEOUS EMBEDDED SYSTEMS Speaker and Author: Ourania Spantidi, Southern Illinois University Carbondale, US Abstract Deep neural networks (DNNs) are being heavily utilized in modern applications, putting energy-constrained devices to the test. To mitigate high energy consumption, approximate computing has been employed in DNN accelerators to balance the accuracy-energy trade-off. However, the approximation-induced accuracy loss can be very high and drastically degrade the performance of the DNN. Therefore, there is a need for a fine-grain mechanism that assigns specific DNN operations to approximation while maintaining acceptable DNN accuracy and achieving low energy consumption. This PhD thesis presents two methods for weight-to-approximation mapping in approximate DNN accelerators. |
18:30 CET | PhDF.38 | A CAD FRAMEWORK FOR AUTOMATED LEARNABILITY ASSESSMENT OF PHYSICALLY UNCLONABLE FUNCTIONS Speaker: Durba Chatterjee, IIT Kharagpur, IN Authors: Durba Chatterjee, Debdeep Mukhopadhyay and Aritra Hazra, IIT Kharagpur, IN Abstract Ever since the emergence of the Physically Unclonable Function (PUF), the hardware primitive has been subjected to various machine learning (ML) attacks. While several design strategies have been proposed to mitigate state-of-the-art attacks, they are subsequently broken by novel attack techniques. One of the reasons is that most designs are adapted to mitigate former attacks and do not consider design strengthening from an architectural perspective. This necessitates the development of a formal methodology to design strong ML-resilient PUF constructions. In this work, we present a CAD framework, PUF-G, to formally represent and evaluate the Probably Approximately Correct (PAC) learnability of silicon PUFs and their compositions. To represent a PUF design, we propose a formal representation language capable of representing any PUF construction or composition upfront. The PUF-G tool parses the design description, translates the design into an interim model, and outputs the PAC-learnability bounds. This tool will help a designer explore various compositional PUF architectures and their resilience to ML attacks automatically before converging on a strong design. |
18:30 CET | PhDF.39 | RELIABILITY MODELING AND MITIGATION IN ADVANCED MEMORY TECHNOLOGIES AND PARADIGMS Speaker and Author: Mahta Mayahinia, Karlsruhe Institute of Technology, DE Abstract Scaling VLSI technology toward more advanced, smaller nodes on the one hand, and emerging new devices such as non-volatile resistive memories on the other, open up new horizons for designing high-performance and energy-efficient computational and memory platforms. However, both the long-term and short-term reliability of these structures is of paramount importance. Moreover, due to the smaller technology nodes, the use of emerging devices, and non-conventional processing units such as computation-in-memory, previous models of both functionality and reliability are no longer sufficiently accurate, and new models need to be developed that consider these new challenges. In this work, we investigate the reliability issues of advanced and emerging memory and processing elements and address them at different levels of abstraction, from low-level circuit-based to higher-level application-oriented solutions. |
18:30 CET | PhDF.40 | MACHINE LEARNING FOR RESOURCE-CONSTRAINED COMPUTING SYSTEMS Speaker and Author: Martin Rapp, Karlsruhe Institute of Technology, DE Abstract Optimizing the management of the limited resources of computing systems such as processors is of paramount importance to achieve goals like maximum performance. In particular, system-level resource management has a major impact on the performance, power, and temperature during application execution by utilizing application mapping, application migration, and dynamic voltage and frequency scaling (DVFS). This work presents novel machine learning (ML)-based resource management techniques. ML-based solutions tackle the involved challenges by predicting the impact of potential resource management actions, by estimating hidden properties of applications (i.e., properties unobservable at run time), or by directly learning a resource management policy. Finally, since ML also needs to run with limited resources, this work presents resource-aware distributed on-device learning. Ultimately, this work shows that ML is a key technology for optimizing system-level resource management by tackling the involved challenges and enabling technical innovations to further exploit the full potential of computing systems. |
18:30 CET | PhDF.41 | COUNTERMEASURES AGAINST FPGA-BASED NON-INVASIVE ATTACKS Speaker: Ali Asghar, Technische Universität Ilmenau, DE Authors: Ali Asghar and Daniel Ziener, TU-Ilmenau, DE Abstract Non-invasive attacks have been known for decades; however, the security community's interest in these attacks has not diminished, which shows their continued relevance to hardware security. In this work, we propose countermeasures for two different types of FPGA-based non-invasive attacks. Our major contribution is a countermeasure against a class of non-invasive physical attacks known as Side Channel Analysis (SCA), for which we have developed and evaluated a dynamically reconfigurable system. The proposed system allows exchanging different realizations of a cryptographic algorithm during run-time. This dynamic behavior renders the static principles of SCA ineffective and consequently increases the overall system security. The second contribution of this work deals with Intellectual Property (IP) piracy, a non-invasive logical attack. We extend an existing idea that establishes the ownership of an IP core using look-up table (LUT) contents as signatures. Our contributions scale the approach to much larger designs, 6-LUT FPGAs, and the associated CAD tools. The results show a 100% core identification rate with no false positives or false negatives. |
18:30 CET | PhDF.42 | ENERGY-EFFICIENT LOCALIZATION ON AUTONOMOUS NANO-UAVS WITH NOVEL MULTIZONE DEPTH SENSORS AND PARALLEL RISC-V PROCESSORS Speaker: Hanna Müller, ETH Zürich, CH Author: Hanna Mueller, ETH Zurich, CH Abstract Unmanned aerial vehicles (UAVs) are nowadays used in many fields, such as monitoring, inspection, surveillance, transportation, and communication. In many of those scenarios, a small form factor brings advantages: smaller drones are more agile, can fly through narrow passages, and allow safe operation close to humans. Miniaturized UAVs in particular (i.e., nano-UAVs that weigh a few tens of grams) often rely on offboard computation in the form of a powerful computer, as onboard computation is strongly limited by power and size constraints. Relying only on onboard sensing and computation, however, has many advantages, such as higher reliability, since the mission does not critically depend on a reliable communication link with a central computer or a pilot, and increased reach, as the drones no longer have to stay close to a base station. To navigate autonomously, a nano-UAV must perform several compute-intensive tasks, such as localization, mapping, and planning, while avoiding obstacles. I identified three main challenges in fully autonomous nano-UAVs: (i) miniaturization of the UAVs, (ii) obstacle avoidance, and (iii) localization. This work addresses these challenges by exploiting, for the first time, novel depth-map sensors from STMicroelectronics (VL53L5CX) and novel processing units that consume only tens of milliwatts while providing tens of GOPS, such as parallel ultra-low power (PULP) Systems-on-Chip (SoCs), as well as optimized algorithms fitted for execution on microcontrollers. |
18:30 CET | PhDF.43 | ENERGY EFFICIENT DOMAIN-SPECIFIC HARDWARE DESIGN Speaker: Kailash Prasad, IIT Gandhinagar, IN Authors: Kailash Prasad and Joycee Mekie, IIT Gandhinagar, IN Abstract The advent of Deep Neural Networks (DNNs) has ushered in a new era of breakthroughs in a wide variety of tasks, including image classification and language translation. However, the complexity of these workloads has led to an enormous increase in computational demands. In recent years, novel paradigms have been proposed for energy-efficient circuits, one of which is approximate computing. This approach aims to exploit the inherent ability of many applications to produce acceptable results, even when there are some errors in their computations. Previous studies on DNN accelerators have shown that on-chip and off-chip memory accounts for a significant portion of the system energy consumption, with data movement being the dominant energy-consuming factor. To overcome this challenge, In-Memory Computing (IMC) has emerged as a promising approach that enables computation within on-chip memory cells, offering numerous benefits in computation time and energy efficiency. In this Ph.D. thesis, we propose approximate circuits, architecture, and evaluation tools to examine their impact on various applications. Additionally, we propose IMC architectures and their evaluation framework to overcome the data movement bottleneck. Our research offers valuable insights into the potential of approximate computing and IMC to improve energy efficiency and performance in a wide range of applications. |
18:30 CET | PhDF.44 | TOWARDS ENERGY-EFFICIENT IN-MEMORY COMPUTING Speaker and Author: Muhammad Rashedul Haq Rashed, University of Central Florida, US Abstract The rapid growth of sensor devices in the Internet of Things (IoT) has caused the amount of available digital data to increase exponentially. This has powered the emergence of data-driven applications such as computer vision, natural language processing, and search. These new applications have endless computing demands that cannot be met by today's high-performance computing systems. Unfortunately, these demands are not expected to be solved by further scaling silicon technology due to the slowdown of Moore's law, the end of Dennard scaling, and the von Neumann bottleneck. Solving this grand computing-efficiency challenge has been the focus of several federal funding agencies, with multiple billion-dollar investments in programs such as the Exascale Computing Project (ECP), the BRAIN Initiative, and the Joint University Microelectronics Program (JUMP). My research is aligned with these efforts and aims at developing future computing systems based on emerging hardware. These computing systems promise substantial (orders of magnitude) improvements in throughput and energy efficiency. The high-level idea of this research direction is to leverage emerging non-volatile memories (NVMs) and perform energy-efficient processing in memory (PIM). This strategy allows otherwise expensive operations such as matrix-vector multiplication to be performed efficiently in the analog domain. Moreover, processing in memory eliminates the expensive data movement between the processor and the memory. Within this research direction, I have made several key contributions towards the robustness, scalability, and energy efficiency of such systems. My five main research contributions are outlined in the attached extended abstract. |
18:30 CET | PhDF.45 | DATE PHD FORUM: MEMRISTOR BASED ARTIFICIAL INTELLIGENCE ACCELERATORS USING IN/NEAR MEMORY PARADIGM Speaker: Kamel-Eddine Harabi, Université Paris-Saclay, FR Authors: Kamel-Eddine Harabi1 and Damien Querlioz2 1C2N, Université Paris Saclay, CNRS, FR; 2Université Paris-Sud, FR Abstract Memristors are a new type of memory technology fully embeddable in CMOS, providing compact, nonvolatile, and fast memory. These devices provide fantastic opportunities to integrate logic and memory tightly and allow low-power computing. It is therefore essential to prototype computing concepts involving memristors experimentally. However, appropriate platforms are extremely complex to fabricate due to the need to co-integrate commercial CMOS and memristor devices on the same die. My PhD thesis concerns the design and development of energy-efficient AI systems using memristors. In our projects, we rely on an in/near-memory computing approach, where memory and computation are co-located. During my PhD, I worked mainly on three projects, two of which were published in Nature Electronics and one presented at ASP-DAC 2023. |
18:30 CET | PhDF.46 | HARDWARE SECURITY ASSURANCE VIA OBFUSCATION AND AUTHENTICATION Speaker and Author: Mohammad Rahman, University of Florida, US Abstract Due to the globalization of IC manufacturing, there have been increased security concerns, notably IP theft. One promising countermeasure is logic locking, which includes programmable elements in a design to obscure the true functionality during manufacturing. In general, logic locking techniques are meant to provide IP security without incurring large overheads. This dissertation contributes in several ways to this goal. We perform an exhaustive security analysis of the existing logic locking techniques, revealing several vulnerabilities. One such vulnerability comes from the satisfiability-based SAT attack, where the circuit under attack (CUA) is represented in a propositional logic form and the response from an unlocked chip is utilized to quickly prune out incorrect keys. Criteria for successful SAT attacks on locked circuits include: (i) the circuit under attack is fully combinational, or (ii) the attacker has scan chain access. These vulnerabilities inform the development of a novel dynamically obfuscated scan chain (DOSC) architecture, whose resilience against SAT attacks is shown both mathematically and experimentally when it is inserted into the scan chain of an obfuscated design. Scan obfuscation methods such as DOSC require that the functional IP core is locked by a functional logic locking method; however, none of the existing logic locking methods is resilient against emerging attacks on logic locking. To strengthen the protection of the underlying functional IP core against these emerging attacks, O'Clock, a clock-gating-based logic locking method, has been proposed that "locks the clock" to protect IP cores in a complex SoC environment. O'Clock obstructs data/control flows and makes the underlying logic dysfunctional for incorrect keys by manipulating the activity factor of the clock tree, with minimal power, performance, and area (PPA) overhead and maximum resiliency against emerging attacks. |
18:30 CET | PhDF.47 | RELIABLE MEMRISTIVE NEUROMORPHIC IN-MEMORY COMPUTING: AN ALGORITHM-HARDWARE CO-DESIGN APPROACH Speaker: Soyed Tuhin Ahmed, KIT - Karlsruhe Institute of Technology, DE Authors: Soyed Tuhin Ahmed1 and Mehdi Tahoori2 1KIT - Karlsruhe Institute of Technology, Karlsruhe, Germany, DE; 2Karlsruhe Institute of Technology, DE Abstract The capability of neural networks (NNs) to tackle difficult cognitive tasks, such as sensor data processing, image recognition, and language modeling, has made them appealing for hardware realization. To obtain high inference accuracy, most NN models increase their depth and breadth, and they require numerous matrix-vector multiplications, which are expensive. NN applications can be efficiently accelerated in neuromorphic compute-in-memory (CiM) architectures based on emerging resistive non-volatile memories (NVMs) such as Spin Transfer Torque Magnetic RAM (STT-MRAM). NVMs offer many benefits, such as fast switching, high endurance, and low power consumption. However, the manufacturing process for NVM memories has not yet matured; as a result, they suffer from various non-ideal behaviors, such as device-to-device process variation, runtime temperature variations, defective devices, and retention problems. Consequently, the reliability of CiM-implemented NNs, both post-manufacturing and post-deployment, is essential and challenging for the proper operation of the NN in safety-critical applications such as medical imaging and autonomous driving. Hardware-only solutions may not be optimal because they may increase hardware overhead. Therefore, in this PhD research, hardware-algorithm co-design-based solutions are explored to address the reliability of NNs implemented in CiM architectures. We also intend to take advantage of the statistical nature of NVM devices and propose statistical NN inference, such as Bayesian inference, that not only provides inherent robustness to variations but also quantifies model uncertainty. |
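The following is a minimal, hypothetical sketch (not the synthesis algorithm or benchmarks of PhDF.20) illustrating the functional-synthesis formulation referenced there: given a relational specification R(X, Y), find a function F such that Y := F(X) satisfies R(X, F(X)) for every input X. The toy relation, the candidate F, and the exhaustive check are all illustrative assumptions.

```python
# Toy illustration of functional synthesis: R(X, Y) is a relational specification,
# and a synthesized Skolem function F must satisfy R(X, F(X)) for all inputs X.
from itertools import product

def R(x1: bool, x2: bool, y: bool) -> bool:
    """Hypothetical toy specification: the output y must equal x1 XOR x2."""
    return y == (x1 ^ x2)

def F(x1: bool, x2: bool) -> bool:
    """Candidate synthesized function for the toy specification."""
    return x1 ^ x2

# A data-driven synthesizer would learn a candidate F from samples of R and then
# verify (or repair) it; here only the verification step is shown, by enumeration.
assert all(R(x1, x2, F(x1, x2)) for x1, x2 in product([False, True], repeat=2))
print("F(X) realizes the specification R(X, Y) on all inputs")
```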
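As background for the bitstream processing discussed in PhDF.34, the sketch below shows standard unipolar stochastic-computing behaviour (it is not the contingency-table simulator proposed in that dissertation): a value in [0, 1] is encoded as the fraction of 1s in a bitstream, a bitwise AND of two uncorrelated streams multiplies the encoded values, and correlation between streams distorts the result, which is why correlation-aware modelling matters.

```python
# Unipolar stochastic computing: encode p in [0, 1] as the probability of 1s in a
# bitstream; AND of two independent streams approximates the product of their values.
import random

def encode(p: float, n: int, rng: random.Random) -> list:
    """Encode probability p as an n-bit unipolar stochastic bitstream."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def decode(bits: list) -> float:
    """Decode a bitstream back to the value it represents (fraction of 1s)."""
    return sum(bits) / len(bits)

rng = random.Random(0)
n = 4096
a, b = encode(0.5, n, rng), encode(0.25, n, rng)        # independent bitstreams
print(round(decode([x & y for x, y in zip(a, b)]), 3))   # close to 0.125 = 0.5 * 0.25

# Fully correlated streams break the multiplication: AND then yields min(a, b),
# 0.25 here instead of 0.125, which is what correlation-aware models must capture.
ca = [1] * (n // 2) + [0] * (n // 2)                     # 0.5, all 1s at the front
cb = [1] * (n // 4) + [0] * (3 * n // 4)                 # 0.25, all 1s at the front
print(decode([x & y for x, y in zip(ca, cb)]))           # 0.25
```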
REC Welcome Reception
Add this session to my calendar
Date: Monday, 17 April 2023
Time: 18:30 CET - 20:00 CET
Location / Room: Atrium
Tuesday, 18 April 2023
ASD5 ASD focus session 1: Autonomy-driven Emerging Directions in Software-defined Vehicles
Add this session to my calendar
Date: Tuesday, 18 April 2023
Time: 08:30 CET - 10:00 CET
Location / Room: Gorilla Room 1.5.4/5
Session chair:
Enrico Fraccaroli, University of North Carolina, US
Over the past two decades, the volume of electronics and software in cars has grown tremendously. There is now widespread consensus that more than 90% of the innovation in modern vehicles is driven by them. But this growth has also resulted in hardware and software architectures that are proving to be a bottleneck for further innovation and efficient design flows, especially when implementing the compute-intensive functions necessary for autonomous features. Another emerging trend in the domain of automotive software is the need for continuous improvement and continuous deployment (CI/CD) of functionality, which is enabled by Over-The-Air (OTA) capability. The goal of this special session is to discuss these new trends and the resulting challenges, and to explore emerging solutions and directions in the broad area of design, development, and verification of software-defined vehicles. The three talks will highlight different aspects of software-defined vehicle design, the research challenges they pose, and how they will impact the future automotive design ecosystem.
Time | Label | Presentation Title Authors |
---|---|---|
08:30 CET | ASD5.1 | IMPACTS OF SERVICE ORIENTED COMMUNICATION ON SDV ARCHITECTURES Presenter: Prachi Joshi, General Motors, R&D, US Author: Prachi Joshi, General Motors, R&D, US Abstract . |
09:00 CET | ASD5.2 | "SHIFT-LEFT" DEVELOPMENT AND VALIDATION OF SOFTWARE DEFINED VEHICLES WITH A VIRTUAL PLATFORM Presenter: Unmesh D. Bordoloi, Siemens, US Author: Unmesh D. Bordoloi, Siemens, US Abstract . |
09:30 CET | ASD5.3 | DESIGN TOOLS FOR ASSURED AUTONOMY Presenter: Samarjit Chakraborty, UNC Chapel Hill, US Author: Samarjit Chakraborty, UNC Chapel Hill, US Abstract . |
BPA1 Testing
Add this session to my calendar
Date: Tuesday, 18 April 2023
Time: 08:30 CET - 10:30 CET
Location / Room: Gorilla Room 1.5.1
Session chair:
Alberto Bosio, Ecole Centrale de Lyon, FR
Time | Label | Presentation Title Authors |
---|---|---|
08:30 CET | BPA1.1 | DEVICE-AWARE TEST FOR BACK-HOPPING DEFECTS IN STT-MRAMS Speaker: Sicong Yuan, TU Delft, NL Authors: Sicong Yuan1, Mottaqiallah Taouil1, Moritz Fieback1, Hanzhi Xun1, Erik Marinissen2, Gouri Kar2, Siddharth Rao2, Sebastien Couet2 and Said Hamdioui1 1TU Delft, NL; 2IMEC, BE Abstract The development of Spin-transfer torque magnetic RAM (STT-MRAM) mass production requires high-quality dedicated test solutions, for which understanding and modeling of manufacturing defects of the magnetic tunnel junction (MTJ) is crucial. This paper introduces and characterizes a new defect called Back-Hopping (BH) and provides its fault models and test solutions. The BH defect causes the MTJ state to oscillate during write operations, leading to write failures. The characterization of the defect is carried out on manufactured MTJ devices. Due to the observed non-linear characteristics, the BH defect cannot be modeled with a linear resistance. Hence, device-aware defect modeling is applied by considering the intrinsic physical mechanisms; the model is then calibrated with measurement data. Thereafter, fault modeling and analysis are performed based on circuit-level simulations, and new fault primitives/models are derived that accurately describe how the STT-MRAM behaves in the presence of the BH defect. Finally, dedicated march test and Design-for-Test solutions are proposed. |
08:55 CET | BPA1.2 | CORRECTNET: ROBUSTNESS ENHANCEMENT OF ANALOG IN-MEMORY COMPUTING FOR NEURAL NETWORKS BY ERROR SUPPRESSION AND COMPENSATION Speaker: Amro Eldebiky, TU Munich, DE Authors: Amro Eldebiky1, Grace Li Zhang2, Georg Bocherer3, Bing Li1 and Ulf Schlichtmann1 1TU Munich, DE; 2TU Darmstadt, DE; 3Huawei Munich Research Center, DE Abstract The last decade has witnessed the breakthrough of deep neural networks (DNNs) in many fields. With the increasing depth of DNNs, hundreds of millions of multiply-and-accumulate (MAC) operations need to be executed. To accelerate such operations efficiently, analog in-memory computing platforms based on emerging devices, e.g., resistive RAM (RRAM), have been introduced. These acceleration platforms rely on analog properties of the devices and thus suffer from process variations and noise. Consequently, weights in neural networks configured into these platforms can deviate from the expected values, which may lead to feature errors and a significant degradation of inference accuracy. To address this issue, in this paper, we propose a framework to enhance the robustness of neural networks under variations and noise. First, a modified Lipschitz constant regularization is proposed during neural network training to suppress the amplification of errors propagated through network layers. Afterwards, error compensation is introduced at necessary locations determined by reinforcement learning to rescue the feature maps with remaining errors. Experimental results demonstrate that inference accuracy of neural networks can be recovered from as low as 1.69% under variations and noise back to more than 95% of their original accuracy, while the training and hardware cost are negligible. |
09:20 CET | BPA1.3 | ASSESSING CONVOLUTIONAL NEURAL NETWORKS RELIABILITY THROUGH STATISTICAL FAULT INJECTIONS Speaker: Annachiara Ruospo, Politecnico di Torino, IT Authors: Annachiara Ruospo1, Gabriele Gavarini1, Corrado De Sio1, Juan Guerrero Balaguera1, Luca Sterpone1, Matteo Sonza Reorda1, Ernesto Sanchez1, Riccardo Mariani2, Joseph Aribido3 and Jyotika Athavale3 1Politecnico di Torino, IT; 2NVIDIA, IT; 3NVIDIA, US Abstract Assessing the reliability of modern devices running CNN algorithms is a very difficult task. In fact, the complexity of state-of-the-art devices makes exhaustive Fault Injection (FI) campaigns impractical and typically beyond available computational capabilities. A possible solution consists of resorting to statistical FI campaigns, which reduce the number of needed experiments by injecting only a carefully selected small subset of the possible faults. Under specific hypotheses, statistical FIs guarantee an accurate picture of the problem, albeit with a reduced sample size. The main problems today are related to the choice of the sample size, the location of the faults, and the correct understanding of the statistical assumptions. The intent of this paper is twofold: first, we describe how to correctly specify statistical FIs for Convolutional Neural Networks; second, we propose a data analysis on the CNN parameters that drastically reduces the number of FIs needed to achieve statistically significant results without compromising the validity of the proposed method. The methodology is experimentally validated on two CNNs, ResNet-20 and MobileNetV2, and the results show that a statistical FI campaign on about 1.21% and 0.55% of the possible faults provides very precise information on the CNN reliability. The statistical results have been confirmed by exhaustive FI campaigns on the same case studies. |
09:45 CET | BPA1.4 | INTERACTIVE TECHNICAL PRESENTATIONS BY THE AUTHORS Speaker: Authors of the session, DATE, BE Author: Session Chairs, DATE, BE Abstract Participants can freely interact with authors during their interactive technical presentations. |
BPA2 From synthesis to application
Add this session to my calendar
Date: Tuesday, 18 April 2023
Time: 08:30 CET - 10:30 CET
Location / Room: Okapi Room 0.8.3
Session chair:
Mirjana Stojilovic, EPFL, CH
Time | Label | Presentation Title Authors |
---|---|---|
08:30 CET | BPA2.1 | EFFICIENT PARALLELIZATION OF 5G-PUSCH ON A SCALABLE RISC-V MANY-CORE PROCESSOR Speaker: Marco Bertuletti, ETH Zurich, IT Authors: Marco Bertuletti1, Yichao Zhang1, Alessandro Vanelli-Coralli2 and Luca Benini2 1ETH Zurich, CH; 2ETH Zurich, CH | Università di Bologna, IT Abstract 5G Radio access network disaggregation and softwarization pose challenges in terms of computational performance to the processing units. At the physical layer level, the baseband processing computational effort is typically offloaded to specialized hardware accelerators. However, the trend toward software-defined radio-access networks demands flexible, programmable architectures. In this paper, we explore the software design, parallelization and optimization of the key kernels of the lower physical layer (PHY) for physical uplink shared channel (PUSCH) reception on MemPool and TeraPool, two manycore systems with 256 and 1024 small and efficient RISC-V cores, respectively, and a large shared L1 data memory. PUSCH processing is demanding and strictly time-constrained; it represents a challenge for baseband processors and is common to most of the uplink channels. Our analysis thus generalizes to the entire lower PHY of the uplink receiver at gNodeB (gNB). Based on the evaluation of the computational effort (in multiply-accumulate operations) required by the PUSCH algorithmic stages, we focus on the parallel implementation of the dominant kernels, namely fast Fourier transform, matrix-matrix multiplication, and matrix decomposition kernels for the solution of linear systems. Our optimized parallel kernels achieve speedups of 211, 225, and 158 on MemPool and of 762, 880, and 722 on TeraPool over single-core serial execution, at high utilization (0.81, 0.89, 0.71 and 0.74, 0.88, 0.71), moving a step closer toward a full-software PUSCH implementation. |
08:55 CET | BPA2.2 | NARROWING THE SYNTHESIS GAP: ACADEMIC FPGA SYNTHESIS IS CATCHING UP WITH THE INDUSTRY Speaker: Benjamin Barzen, University of California, Berkeley, DE Authors: Benjamin Barzen1, Arya Reais-Parsi1, Eddie Hung2, Minwoo Kang1, Alan Mishchenko1, Jonathan Greene1 and John Wawrzynek1 1University of California, Berkeley, US; 2FPG-eh Research and University of British Columbia, CA Abstract Historically, open-source FPGA synthesis and technology mapping tools have been considered far inferior to industry-standard tools. We show that this is no longer true. Improvements in recent years to Yosys (Verilog elaborator) and ABC (technology mapper) have resulted in substantially better performance, evident in both the reduction of area utilization and the increase in the maximum achievable clock frequency. More specifically, we describe how ABC9, a set of feature additions to ABC, was integrated into Yosys upstream and is available in the latest version. Technology mapping now has a complete view of the circuit, including support for hard blocks (e.g., carry chains) and multiple clock domains for timing-aware mapping. We demonstrate how these improvements culminate in dramatically better synthesis results, with Yosys-ABC9 reducing the delay gap from 30% to 0% on a commercial FPGA target for the commonly used VTR benchmark, thus matching Vivado's performance in terms of maximum clock frequency. We also measured the performance on a selection of circuits from OpenCores as well as the literature, comparing the results produced by Vivado, Yosys-ABC1 (existing work), and the proposed Yosys-ABC9 integration. |
09:20 CET | BPA2.3 | SAGEROUTE: SYNERGISTIC ANALOG ROUTING CONSIDERING GEOMETRIC AND ELECTRICAL CONSTRAINTS WITH MANUAL DESIGN COMPATIBILITY Speaker: Haoyi Zhang, Peking University, CN Authors: Haoyi Zhang, Xiaohan Gao, Haoyang Luo, Jiahao Song, Xiyuan Tang, Junhua Liu, Yibo Lin, Runsheng Wang and Ru Huang, Peking University, CN Abstract Routing is critical to the post-layout performance of analog circuits. As modern analog layouts need to consider both geometric constraints (e.g., design rules and low bending constraints) and electrical constraints (e.g., electromigration (EM), IR drop, symmetry, etc.), exploring the complicated design space of analog routing becomes increasingly challenging. Most previous work has focused only on geometric constraints or basic electrical constraints, lacking holistic and systematic investigation. Such an approach is far from typical manual design practice and cannot guarantee post-layout performance on real-world designs. In this work, we propose SAGERoute, a synergistic routing framework taking both geometric and electrical constraints into consideration. Through Steiner tree based wire sizing and guided detailed routing, the framework can generate high-quality routing solutions efficiently under versatile constraints on real-world analog designs. |
09:45 CET | BPA2.4 | INTERACTIVE TECHNICAL PRESENTATIONS BY THE AUTHORS Speaker: Authors of the session, DATE, BE Author: Session Chairs, DATE, BE Abstract Participants can freely interact with authors during their interactive technical presentations. |
BPA5 Benchmarking and verification
Add this session to my calendar
Date: Tuesday, 18 April 2023
Time: 08:30 CET - 10:30 CET
Location / Room: Okapi Room 0.8.2
Session chair:
Daniel Große, Johannes Kepler University Linz, AT
Time | Label | Presentation Title Authors |
---|---|---|
08:30 CET | BPA5.1 | BENCHMARKING LARGE LANGUAGE MODELS FOR AUTOMATED VERILOG RTL CODE GENERATION Speaker: Shailja Thakur, New York University, US Authors: Shailja Thakur1, Baleegh Ahmad1, Zhenxing Fan1, Hammond Pearce1, Benjamin Tan2, Ramesh Karri1, Brendan Dolan-Gavitt1 and Siddharth Garg1 1New York University, US; 2University of Calgary, CA Abstract Automating hardware design could eliminate a significant amount of human error from the engineering process, leading to fewer design errors. Verilog is a popular hardware description language for modeling and designing digital systems; thus, generating Verilog code is a critical first step. Emerging large language models (LLMs) are able to write high-quality code in other programming languages. In this paper, we characterize the ability of LLMs to generate useful Verilog. For this, we fine-tune pre-trained LLMs on Verilog datasets collected from GitHub and Verilog textbooks. We construct an evaluation framework comprising test-benches for functional analysis and a flow to test the syntax of Verilog code generated in response to problems of varying difficulty. Our findings show that across our problem scenarios, the fine-tuning results in LLMs more capable of producing syntactically correct code (25.9% overall). Further, when analyzing functional correctness, a fine-tuned open-source CodeGen LLM can outperform the state-of-the-art commercial Codex LLM (6.5% overall). Training/evaluation scripts and LLM checkpoints are available as open source contributions. |
08:55 CET | BPA5.2 | PROCESSOR VERIFICATION USING SYMBOLIC EXECUTION: A RISC-V CASE-STUDY Speaker: Niklas Bruns, Group of Computer Architecture of Universität Bremen, DE Authors: Niklas Bruns1, Vladimir Herdt2 and Rolf Drechsler3 1University of Bremen, DE; 2DFKI, DE; 3University of Bremen | DFKI, DE Abstract We propose to leverage state-of-the-art symbolic execution techniques from the Software (SW) domain for processor verification at the Register-Transfer Level (RTL). In particular, we utilize an Instruction Set Simulator (ISS) as a reference model and integrate it with the RTL processor under test in a co-simulation setting. We then leverage the symbolic execution engine KLEE to perform a symbolic exploration that searches for functional mismatches between the ISS and RTL processor. To ensure a comprehensive verification process, symbolic values are used to represent the instructions and also to initialize the register values of the ISS and processor. As a case study, we present results on the verification of the open source RISC-V based MicroRV32 processor, using the ISS of the open source RISC-V VP as a reference model. Our results demonstrate that modern symbolic execution techniques are applicable to a full scale processor co-simulation in the embedded domain and are very effective in finding bugs in the RTL core. |
09:20 CET | BPA5.3 | PERSPECTOR: BENCHMARKING BENCHMARK SUITES Speaker: Sandeep Kumar, IIT Delhi, IN Authors: Sandeep Kumar1, Abhisek Panda2 and Smruti R. Sarangi1 1IIT Delhi, IN; 2Indian Institute of Technology, IN Abstract Estimating the quality of a benchmark suite is a non-trivial task. A poorly selected or improperly configured benchmark suite can present a distorted picture of the performance of the evaluated framework. With computing venturing into new domains, the total number of benchmark suites available is increasing by the day. Researchers must evaluate these suites quickly and decisively for their effectiveness. We present Perspector, a novel tool to quantify the performance of a benchmark suite. Perspector comprises novel metrics to characterize the quality of a benchmark suite. It provides a mathematical framework for capturing some qualitative suggestions and observations made in prior work. The metrics are generic and domain-agnostic. Furthermore, our tool can be used to compare the efficacy of one suite vis-à-vis other benchmark suites, systematically and rigorously create a suite of workloads, and appropriately tune them for a target system. |
09:45 CET | BPA5.4 | INTERACTIVE TECHNICAL PRESENTATIONS BY THE AUTHORS Speaker: Authors of the session, DATE, BE Author: Session Chairs, DATE, BE Abstract Participants can freely interact with authors during their interactive technical presentations. |
BPA8 Machine Learning techniques for embedded systems
Add this session to my calendar
Date: Tuesday, 18 April 2023
Time: 08:30 CET - 10:30 CET
Location / Room: Marble Hall
Session chair:
Bing Li, TU Munich, DE
Time | Label | Presentation Title Authors |
---|---|---|
08:30 CET | BPA8.1 | PRADA: POINT CLOUD RECOGNITION ACCELERATION VIA DYNAMIC APPROXIMATION Speaker: Zhuoran Song, Shanghai Jiao Tong University, CN Authors: Zhuoran Song, Heng Lu, Gang Li, Li Jiang, Naifeng Jing and Xiaoyao Liang, Shanghai Jiao Tong University, CN Abstract Recent point cloud recognition (PCR) tasks tend to utilize deep neural networks (DNNs) for better accuracy. Still, the computational intensity of DNNs keeps them far from real-time processing, given the fast-increasing number of points that need to be processed. Because a point cloud represents 3D-shaped discrete objects in the physical world using a mass of points, the points tend to be unevenly distributed in the view space, which exposes strong clustering potential and similarities between local pairs. Based on this observation, this paper proposes PRADA, an algorithm-architecture co-design that can accelerate PCR while preserving its accuracy. We propose dynamic approximation, which can approximate and eliminate the similar local pairs' computations and recover their results by copying key local pairs' features for PCR speedup without losing accuracy. To preserve accuracy, we further propose an advanced re-clustering technique to maximize the similarity between local pairs. To improve performance, we then propose a PRADA architecture that can be built on any conventional DNN accelerator to dynamically approximate the similarity and skip the redundant DNN computation with memory accesses at the same time. Our experiments on a wide variety of datasets show that PRADA achieves average speedups of 4.2x, 4.9x, 7.1x, and 12.2x over Mesorasi, a V100 GPU, a 1080TI GPU, and a Xeon CPU with negligible accuracy loss. |
08:55 CET | BPA8.2 | FEDERATED LEARNING WITH HETEROGENEOUS MODELS FOR ON-DEVICE MALWARE DETECTION IN IOT NETWORKS Speaker: Sanket Shukla, George Mason University, IN Authors: Sanket Shukla1, Setareh Rafatirad2, Houman Homayoun3 and Sai Manoj Pudukotai Dinakarrao4 1George Mason University, US; 2University of California, Davis, US; 3University of California Davis, US; 4George Mason University, US Abstract IoT devices have been widely deployed in a vast number of applications to facilitate smart technology, increased portability, and seamless connectivity. Despite being widely adopted, security in IoT devices is often considered an afterthought due to resource and cost constraints. Among multiple security threats, malware attacks are observed to be a pivotal threat to IoT devices. Considering the spread of IoT devices and the threats they experience over time, deploying a static malware detector that is trained offline seems to be an ineffective solution. On the other hand, on-device learning is an expensive or infeasible option due to the limited available resources on IoT devices. To overcome these challenges, this work employs 'Federated Learning' (FL), which enables timely updates to the malware detection models for increased security while mitigating the high communication or data storage overhead of centralized cloud approaches. Federated learning allows training machine learning models with decentralized data while preserving its privacy by design. However, one of the challenges with FL is that the on-device models are required to be homogeneous, which may not be true in the case of networked IoT systems. As a panacea, we introduce a methodology to unify the models in the cloud with minimal overheads and minimal impact on on-device malware detection. We evaluate the proposed technique against homogeneous models in networked IoT systems encompassing Raspberry Pi devices. The experimental results and system efficiency analysis indicate that end-to-end training time is just 1.12× higher than traditional FL, testing latency is 1.63× faster, and malware detection performance is improved by 7% to 13% for resource-constrained IoT devices. |
09:20 CET | BPA8.3 | GENETIC ALGORITHM-BASED FRAMEWORK FOR LAYER-FUSED SCHEDULING OF MULTIPLE DNNS ON MULTI-CORE SYSTEMS Speaker: Sebastian Karl, TU Munich, DE Authors: Sebastian Karl1, Arne Symons2, Nael Fasfous3 and Marian Verhelst2 1TU Munich, DE; 2KU Leuven, BE; 3BMW AG, DE Abstract Heterogeneous multi-core architectures are becoming a popular design choice to accelerate the inference of modern deep neural networks (DNNs). This trend allows for more flexible mappings onto the cores, but shifts the challenge to keeping all cores busy due to limited network parallelism. To this extent, layer-fused processing, where several layers are mapped simultaneously to an architecture and executed in a depth-first fashion, has shown promising opportunities to maximize core utilization. However, SotA mapping frameworks fail to efficiently map layer-fused DNNs onto heterogeneous multi-core architectures due to ignoring 1.) on-chip weight traffic and 2.) inter-core communication congestion. This work tackles these shortcomings by introducing a weight memory manager (WMM), which manages the weights present in a core and models the cost of re-fetching weights. Secondly, the inter-core communication (ICC) of feature data is modeled through a limited-bandwidth bus, and optimized through a contention-aware scheduler (CAS). Relying on these models, a genetic algorithm is developed to optimally schedule different DNN layers across the different cores. The impact of our enhanced modeling, core allocation and scheduling capabilities is shown in several experiments and demonstrates a decrease of 52% in latency and 38% in energy when mapping a multi-DNN inference, consisting of ResNet-18, MobileNet-V2 and Tiny YOLO V2, on a heterogeneous multi-core platform compared to iso-area homogeneous architectures. |
09:45 CET | BPA8.4 | INTERACTIVE TECHNICAL PRESENTATIONS BY THE AUTHORS Speaker: Authors of the session, DATE, BE Author: Session Chairs, DATE, BE Abstract Participants can freely interact with authors during their interactive technical presentations. |
MPP2 Multi-partner projects
Add this session to my calendar
Date: Tuesday, 18 April 2023
Time: 08:30 CET - 10:00 CET
Location / Room: Gorilla Room 1.5.3
Session chair:
Paul Pop, TU Denmark, DK
Time | Label | Presentation Title Authors |
---|---|---|
08:30 CET | MPP2.1 | SECURING A RISC-V ARCHITECTURE: A DYNAMIC APPROACH Speaker: Sebastien Pillement, IETR - Nantes University, FR Authors: Sebastien Pillement1, Maria Mendez Real1, Juliette Pottier1, Thomas Nieddu2, Bertrand Le Gall2, Sébastien Faucou3, Jean-Luc Béchennec4, Mikaël Briday5, Sylvain Girbal6, Jimmy Le Rhun6, Olivier Gilles6, Daniel Gracia Pérez7, Andre Sintzoff8 and Jean-Roch Coulon8 1École Polytechnique de l'Université de Nantes, FR; 2IMS, FR; 3Université de Nantes, FR; 4LS2N/CNRS, FR; 5École Centrale de Nantes - LS2N, FR; 6THALES TRT, FR; 7Thales, FR; 8THALES DIS, FR Abstract For decades, the evolution of processors has focused on improving their performance. In recent years, attacks directly exploiting optimization mechanisms have appeared. Exploiting, for example, caches, performance counters or speculation units, they jeopardize the safety and security of processors and the industrial systems that operate them. We can cite SPECTRE and Meltdown as flagship examples. The open-HW approaches, and in particular the RISC-V initiative, are now both an economic reality and an innovation opportunity for European players in the field of processor architecture. The use of this open-source approach requires the design of secure processor cores, and therefore makes it possible to move towards greater independence in the field of cyber-security. The SECURE-V project offers an innovative open-source, secure and high-performance processor core based on the RISC-V ISA. The originality of the approach lies in the integration of a dynamic code transformation unit covering 4 of the 5 NIST functions of cybersecurity, in particular via monitoring (identify, detect), obfuscation (protect), and dynamic adaptation (react). This dynamic management paves the way for online optimizations that improve the security and safety of the micro-architecture without overhauling the software or the architecture of the chip. |
08:33 CET | MPP2.2 | THE ZUSE-KI-MOBIL AI ACCELERATOR SOC: OVERVIEW AND A FUNCTIONAL SAFETY PERSPECTIVE Speaker: Fabian Kempf, Karlsruhe Institute of Technology, DE Authors: Fabian Kempf1, Julian Hoefer1, Tanja Harbaum1, Juergen Becker1, Nael Fasfous2, Alexander Frickenstein3, Hans-Jörg Vögel3, Simon Friedrich4, Robert Wittig4, Emil Matus4, Gerhard Fettweis4, Matthias Lueders5, Holger Blume6, Karl-Heinz Eickel7, Darius Grantz7, Jens Benndorf7, Martin Zeller7 and Dietmar Engelke7 1Karlsruhe Institute of Technology, DE; 2BMW AG, DE; 3BMW Group, DE; 4TU Dresden, DE; 5Leibniz University Hannover, DE; 6Leibniz Universität Hannover, DE; 7Dream Chip Technologies GmbH, DE Abstract The goal of the ZuKIMo project is to develop a new System-on-Chip (SoC) platform and corresponding ecosystem to enable efficient Artificial Intelligence (AI) applications with specific requirements. With ZuKIMo, we specifically target applications from the mobility domain, i.e. autonomous vehicles and drones. The initial ecosystem is built by a consortium consisting of seven partners from German academia and industry. We develop the SoC platform and its ecosystem around a novel AI Accelerator design. The customizable accelerator is conceived from scratch to fulfill the functional and non-functional requirements derived from the ambitious use cases. A tape-out in 22 nm FDX-technology is planned in 2023. Apart from the System-on-Chip hardware design itself, the ZuKIMo ecosystem has the objective of providing software tooling for easy deployment of new use cases and hardware-CNN co-design. Furthermore, AI accelerators in safety-critical applications like our mobility use cases, necessitate the fulfillment of safety requirements. Therefore, we investigate new design methodologies for fault analysis of Deep Neural Networks (DNNs) and introduce our new redundancy mechanism for AI accelerators. |
08:36 CET | MPP2.3 | ZUSE-KI-AVF: APPLICATION-SPECIFIC AI PROCESSOR FOR INTELLIGENT SENSOR SIGNAL PROCESSING IN AUTONOMOUS DRIVING Speaker: Sven Gesper, TU Braunschweig, DE Authors: Gia Bao Thieu1, Sven Gesper2, Guillermo Payá Vayá1, Christoph Riggers3, Oliver Renke3, Till Fiedler3, Jakob Marten3, Tobias Stuckenberg4, Holger Blume3, Christian Weis5, Lukas Steiner5, Chirag Sudarshan5, Norbert Wehn5, Lennart Reimann6, Rainer Leupers6, Michael Beyer7, Daniel Köhler7, Alisa Jauch7, Jan Micha Bormann7, Setareh Jaberansari7, Tim Berthold8, Meinolf Blawat8, Markus Kock8, Gregor Schewior8, Jens Benndorf8, Frederik Kautz9, Hans-Martin Bluethgen10 and Christian Sauer9 1TU Braunschweig, DE; 2TU Braunschweig, Chair of Chip Design for Embedded Computing, DE; 3Leibniz Universität Hannover, DE; 4Leibniz Universität Hannover, Institute of Microelectronic Systems, DE; 5TU Kaiserslautern, DE; 6RWTH Aachen University, DE; 7Robert Bosch GmbH, DE; 8Dream Chip Technologies GmbH, DE; 9Cadence Design Systems, DE; 10Cadence Design System GmbH, DE Abstract Modern and future AI-based automotive applications, such as autonomous driving, require the efficient real-time processing of huge amounts of data from different sensors, like camera, radar, and LiDAR. In the ZuSE-KI-AVF project, multiple university and industry partners collaborate to develop a novel massively parallel processor architecture, based on a customized RISC-V host processor and an efficient high-performance vertical vector coprocessor. In addition, a software development framework is also provided to efficiently program AI-based sensor processing applications. The proposed processor system was verified and evaluated on a state-of-the-art UltraScale+ FPGA board, reaching a processing performance of up to 126.9 FPS, while executing the YOLO-LITE CNN on 224x224 input images. Further optimizations of the FPGA design and the realization of the processor system on a 22nm FDSOI CMOS technology are planned. |
08:39 CET | MPP2.4 | EUFRATE: EUROPEAN FPGA RADIATION-HARDENED ARCHITECTURE FOR TELECOMMUNICATIONS Speaker: Luca Sterpone, Politecnico di Torino - Department of Control and Computer Engineering (DAUIN), IT Authors: Ludovica Bozzoli1, Antonino Catanese1, Emilio Fazzoletto1, Eugenio Scarpa1, Diana Goehringer2, Sergio Pertuz2, Lester Kalms2, Cornelia Wulf2, Najdet Charaf3, Luca Sterpone4, Sarah Azimi4, Daniele Rizzieri4, Salvatore Gabriele La Greca4, David Merodio Codinachs5 and Stephen King5 1Argotec, IT; 2TU Dresden, DE; 3TU Dresden, Faculty of Computer Science, Chair of Adaptive Dynamic Systems, DE; 4Politecnico di Torino, IT; 5European Space Agency, NL Abstract The EuFRATE project aims to research, develop and test radiation-hardening methods for telecommunication payloads deployed for Geostationary-Earth Orbit (GEO) using Commercial-Off-The-Shelf Field Programmable Gate Arrays (FPGAs). This project is conducted by Argotec Group (Italy) with the collaboration of two partners: Politecnico di Torino (Italy) and Technische Universität Dresden (Germany). The idea of the project focuses on high-performance telecommunication algorithms and the design and implementation strategies for connecting an FPGA device into a robust and efficient cluster of multi-FPGA systems. The radiation-hardening techniques currently under development are addressing both device and cluster levels, with redundant datapaths on multiple devices, comparing the results and isolating fatal errors. This paper introduces the current state of the project's hardware design description, the composition of the FPGA cluster node, the proposed cluster topology, and the radiation hardening techniques. Intermediate stage experimental results of the FPGA communication layer performance and fault detection techniques are presented. Finally, a wide summary of the project's impact on the scientific community is provided. |
08:42 CET | MPP2.5 | THE SERRANO PLATFORM: STEPPING TOWARDS SEAMLESS APPLICATION DEVELOPMENT & DEPLOYMENT IN THE HETEROGENEOUS EDGE-CLOUD CONTINUUM Speaker: Argyrios Kokkinis, Aristotle University of Thessaloniki, GR Authors: Aggelos Ferikoglou1, Argyris Kokkinis1, Dimitrios Danopoulos1, Ioannis Oroutzoglou1, Anastasios Nanos2, Stathis Karanastasis3, Marton Sipos4, Javad Ghotbi5, Juan Jose Olmos6, Dimosthenis Masouros1 and Kostas Siozios7 1Aristotle University of Thessaloniki, GR; 2Nubificus LTD, GB; 3INNOV, GR; 4Chocolate Cloud, DK; 5HLRS, DE; 6NVIDIA, DK; 7Department of Physics, Aristotle University of Thessaloniki, GR Abstract The need for real-time analytics and faster decision-making mechanisms has led to the adoption of hardware accelerators such as GPUs and FPGAs within the edge cloud computing continuum. However, their programmability and lack of orchestration mechanisms for seamless deployment make them difficult to use efficiently. We address these challenges by presenting SERRANO, a project for transparent application deployment in a secure, accelerated, and cognitive cloud continuum. In this work, we introduce the SERRANO platform and its software, orchestration, and deployment services, focusing on its methods for automated GPU/FPGA acceleration and efficient, isolated, and secure deployments. By evaluating these services against representative use cases, we highlight SERRANO's ability to simplify the development and deployment process without sacrificing performance. |
08:45 CET | MPP2.6 | EVALUATION OF HETEROGENEOUS AIOT ACCELERATORS WITHIN VEDLIOT Speaker: Rene Griessl, Bielefeld University, DE Authors: Rene Griessl1, Florian Porrmann1, Nils Kucza1, Kevin Mika1, Jens Hagemeyer1, Martin Kaiser1, Mario Porrmann2, Marco Tassemeier2, Marcel Flottmann2, Fareed Qararyah3, Muhammad Azhar3, Pedro Trancoso3, Daniel Odman4, Karol Gugala5 and Grzegorz Latosinksi5 1Bielefeld University, DE; 2Osnabrueck University, DE; 3Chalmers, SE; 4EmbeDL AB, SE; 5Antmicro Ltd, PL Abstract Within VEDLIoT, a project targeting the development of energy-efficient Deep Learning for distributed AIoT applications, several accelerator platforms based on technologies like CPUs, embedded GPUs, FPGAs, or specialized ASICs are evaluated. The VEDLIoT approach is based on modular and scalable cognitive IoT hardware platforms. Modular microserver technology enables the integration of different, heterogeneous accelerators into one platform. Benchmarking of the different accelerators takes into account performance, energy efficiency and accuracy. The results in this paper provide a solid overview of available accelerator solutions and offer guidance on hardware selection for AIoT applications from far edge to cloud. VEDLIoT is an H2020 EU project which started in November 2020. It is currently in an intermediate stage. The focus here is on the performance and energy efficiency of hardware accelerators. Apart from the hardware and accelerator focus presented in this paper, the project also covers toolchain, security and safety aspects. The resulting technology is tested on a wide range of AIoT applications. |
08:48 CET | MPP2.7 | SPHERE-DNA: PRIVACY-PRESERVING FEDERATED LEARNING FOR EHEALTH Speaker: Jari Nurmi, Tampere University, FI Authors: Jari Nurmi1, Yinda Xu1, Jani Boutellier2 and Bo Tan1 1Tampere University, FI; 2University of Vaasa, FI Abstract The rapid growth of chronic diseases and medical conditions (e.g. obesity, depression, diabetes, respiratory and musculoskeletal diseases) in many OECD countries has become one of the most significant wellbeing problems, which also puts pressure on the sustainability of healthcare and economies. Thus, it is important to promote early diagnosis, intervention, and healthier lifestyles. One partial solution to the problem is extending long-term health monitoring from hospitals to natural living environments. It has been shown in laboratory settings and practical trials that sensor data, such as camera images, radio samples, acoustic signals, infrared etc., can be used for accurately modelling activity patterns that are related to different medical conditions. However, due to the rising concern related to private data leaks and, consequently, stricter personal data regulations, the growth of pervasive residential sensing for healthcare applications has been slow. To mitigate public concern and meet the regulatory requirements, our national multi-partner project aims to combine pervasive sensing technology with secure and privacy-preserving distributed frameworks for healthcare applications. The project leverages local differential privacy federated learning (LDP-FL) to achieve resilience against active and passive attacks, as well as edge computing to avoid transmitting sensitive data over networks. Combinations of sensor data modalities and security architectures are explored by a machine learning architecture for finding the most viable technology combinations, relying on metrics that allow balancing between computational cost and accuracy for a desired level of privacy. We also consider realistic edge computing platforms and develop hardware acceleration and approximate computing techniques to facilitate the adoption of LDP-FL and privacy-preserving signal processing on lightweight edge processors. A proof-of-concept (PoC) multimodal sensing system will be developed and a novel multimodal dataset will be collected during the project to verify the concept. |
08:51 CET | MPP2.8 | INTERACTIVE TECHNICAL PRESENTATIONS BY THE AUTHORS Speaker: Authors of the session, DATE, BE Author: Session Chairs, DATE, BE Abstract Participants can freely interact with authors during their interactive technical presentations. |
SpD1 Special Day on Human AI-Interaction: Introduction, innovations and technologies
Add this session to my calendar
Date: Tuesday, 18 April 2023
Time: 08:30 CET - 10:00 CET
Location / Room: Darwin Hall
Computing systems are increasingly entangled with the physical world, such that keyboards and screens are no longer the only way to communicate between humans and computers. More “natural” ways to communicate, such as voice commands, analysis of the environment, and imaging, are increasingly widespread thanks to progress in Artificial Intelligence. To further enhance communication and understanding between humans and machines, the next step for computing systems will be to enable more precise evaluation of all implicit communications, including emotions. In return, they should provide more natural, human-like responses in a trustworthy way. The goal of this special day on Human AI-interaction is to show the latest developments in this field, including “emotional systems”, but also to present the corresponding ethical aspects.
Time | Label | Presentation Title Authors |
---|---|---|
08:30 CET | SpD1.1 | INTRODUCTION AND A QUICK STATE-OF-THE-ART ON HUMAN AI-INTERACTION. Speaker: Marina Zapater, University of Applied Sciences Western Switzerland (HES-SO), CH and Marc Duranton, CEA, FR Authors: Marina Zapater1 and Marc Duranton2 1University of Applied Sciences Western Switzerland (HES-SO), CH; 2CEA, FR Abstract We will give a short introduction to the topic of Human AI-Interaction, which involves the study of how humans and machines can communicate and interact with each other in a more natural and intuitive way, and show existing realizations exhibiting a few aspects of this field. |
09:00 CET | SpD1.2 | THE FUTURE OF BRAIN-MACHINE INTERFACES: AI-DRIVEN INNOVATIONS Presenter: Shoaran Mahsa, EPFL, CH Author: Shoaran Mahsa, EPFL, CH Abstract Implantable neural devices and Brain-Machine Interfaces (BMIs) hold the promise to offer new therapies for brain disorders when symptoms no longer improve with medications and other treatments. Despite significant advances in neural interface microsystems over the past decade, the limited embedded processing and small number of channels in the existing technologies remain a barrier to their therapeutic potential. In this talk, I will provide an overview of the state-of-the-art research on BMIs and our recent efforts to integrate modern machine learning techniques on neural microchips for various neurological and psychiatric disorders. I will also discuss how AI can improve next-generation BMIs to restore movement and communication for paralyzed patients. |
09:30 CET | SpD1.3 | PRIVACY-PRESERVING EDGE FEDERATED LEARNING. Presenter: Amir Aminifar, Lund University, SE Author: Amir Aminifar, Lund University, SE Abstract We are now entering the era of intelligent Internet of Things (IoT) systems. The bar is set high. Despite the inherently complex nature of human interactions, we would like these systems to react to our inputs, and perhaps even to our emotions, in real time. We also expect such systems to be self-adaptive, i.e., continuously learn and evolve over time in interaction with humans. At the same time, we would like these systems to be trustworthy, e.g., to ensure privacy with respect to our personal data. In this talk, we discuss how edge federated learning could address such challenges and pave the way for the development of intelligent IoT systems. |
W03 Workshop on Nano Security: From Nano-Electronics to Secure Systems
Add this session to my calendar
Date: Tuesday, 18 April 2023
Time: 08:30 CET - 12:30 CET
Location / Room: Nightingale Room 2.6.1/2
Organisers:
Ilia Polian, University of Stuttgart, DE
Nan Du, Friedrich Schiller University Jena, Germany, DE
Shahar Kvatinsky, Technion – Israel Institute of Technology, IL
Ingrid Verbauwhede, KU Leuven, BE
Today’s societies critically depend on electronic systems. The security of such systems is facing completely new challenges due to the ongoing transition to radically new types of nano-electronic devices, such as memristors, spintronics, or carbon nanotubes. The use of such emerging nano-technologies is inevitable in order to address essential needs related to energy efficiency, computing power and performance. Therefore, the entire industry is switching to emerging nano-electronics alongside scaled CMOS technologies in heterogeneous integrated systems. These technologies come with new properties and also facilitate the development of radically different computer architectures.
The proposed workshop will bring together researchers from hardware-oriented security and from emerging hardware technology. It will explore the potential of new technologies and architectures to provide new opportunities for achieving security targets, but it will also raise questions about their vulnerabilities to new types of hardware-oriented attacks. The workshop is based on a Priority Program https://spp-nanosecurity.uni-stuttgart.de/ funded since 2019 by the German DFG, and will be open to members and non-members of that Priority Program alike.
W03.1 Keynote
Add this session to my calendar
Date: Tuesday, 18 April 2023
Time: 08:30 CET - 09:15 CET
Location / Room: Nightingale Room 2.6.1/2
Session chair:
Ilia Polian, University of Stuttgart, DE
Securing the Internet of Bodies using Human Body as a ‘Wire’
Shreyas Sen, Purdue University
Abstract: Radiative communication using electromagnetic (EM) fields is the state of the art for connecting wearable and implantable devices, enabling prime applications in the fields of connected healthcare, electroceuticals, neuroscience, augmented and virtual reality (AR/VR) and human-computer interaction (HCI), forming a subset of the Internet of Things called the Internet of Bodies (IoB). However, owing to the radiative nature of traditional wireless communication, EM signals propagate in all directions, inadvertently allowing an eavesdropper to intercept the information. Moreover, since only a fraction of the energy is picked up by the intended device, and a high carrier frequency is needed relative to the information content, wireless communication tends to suffer from poor energy efficiency (>nJ/bit). Noting that all IoB devices share a common medium, i.e., the human body, using the conductivity of the human body allows low-loss transmission, termed human body communication (HBC), and improves energy efficiency. Conventional HBC implementations still suffer from significant radiation, compromising physical security and efficiency. Our recent work has developed Electro-Quasistatic Human Body Communication (EQS-HBC), a method for localizing signals within the body using low-frequency transmission, thereby making it extremely difficult for a nearby eavesdropper to intercept critical personal data, thus producing a covert communication channel, i.e., the human body as a ‘wire’, while also reducing interference.
In this talk, I will explore the fundamentals of radio communication around the human body that led to the evolution of EQS-HBC and show recent advancements in the field, which holds strong promise to become the future of Body Area Networks (BAN). I will show the theoretical development of the first Bio-Physical Model of EQS-HBC and how it was leveraged to develop the world’s lowest-energy (<10pJ/b) and world’s first sub-uW Physically and Mathematically Secure (AES 256) IoB Communication SoC, with >100x improvement in energy efficiency over Bluetooth. I’ll also highlight how recent developments in mixed-signal circuit techniques allow orders-of-magnitude improvement in the side-channel attack resistance of the encryption engines in such SoCs. Finally, I will highlight the possibilities and applications in the fields of HCI, Medical Device Communication, and Neuroscience, including a video demonstration, and show how such low-power secure communication in combination with in-sensor intelligence is paving the way forward for Secure and Efficient IoB Sensor Nodes.
Bio: Shreyas Sen is an Elmore Associate Professor of ECE & BME at Purdue University and the Founder and CTO of Ixana; he received his Ph.D. degree in ECE from Georgia Tech. His current research interests span mixed-signal circuits/systems and electromagnetics for the Internet of Things (IoT), Biomedical, and Security. He has authored/co-authored 3 book chapters, over 175 journal and conference papers and has 25 patents granted/pending. Dr. Sen serves as the founding Director of the Center for Internet of Bodies (C-IoB) at Purdue. Dr. Sen is the inventor of the Electro-Quasistatic Human Body Communication (EQS-HBC), or Body as a Wire technology, for which he is the recipient of the MIT Technology Review top-10 Indian Inventor Worldwide under 35 (MIT TR35 India) Award in 2018 and the Georgia Tech 40 Under 40 Award in 2022. His work has been covered by 250+ news releases worldwide, an IEEE Spectrum feature, an invited appearance on TEDx Indianapolis, the Indian National Television CNBC TV18 Young Turks Program, NPR subsidiary Lakeshore Public Radio and the CyberWire podcast. Dr. Sen is a recipient of the NSF CAREER Award 2020, AFOSR Young Investigator Award 2016, NSF CISE CRII Award 2017, Intel Outstanding Researcher Award 2020, Google Faculty Research Award 2017, Purdue CoE Early Career Research Award 2021, Intel Labs Quality Award 2012 for industry-wide impact on USB-C type, Intel Ph.D. Fellowship 2010, IEEE Microwave Fellowship 2008, GSRC Margarida Jacome Best Research Award 2007, and nine best paper awards including IEEE CICC 2019, 2021 and IEEE HOST 2017-2020, for four consecutive years. Dr. Sen’s work was chosen as one of the top-10 papers in the Hardware Security field (TopPicks 2019). He serves/has served as an Associate Editor for IEEE Solid-State Circuits Letters (SSC-L), Nature Scientific Reports, Frontiers in Electronics, and IEEE Design & Test, as an Executive Committee member of the IEEE Central Indiana Section, and as a Technical Program Committee member of DAC, CICC, IMS, CCS, DATE, ISLPED, ICCAD, ITC, VLSI Design, among others. Dr. Sen is a Senior Member of IEEE.
W03.2 Session 1: PUFs and RNGs
Add this session to my calendar
Date: Tuesday, 18 April 2023
Time: 09:15 CET - 09:45 CET
Location / Room: Nightingale Room 2.6.1/2
Session chair:
Nan Du, Friedrich Schiller University Jena, Germany, DE
Carbon-Nanotube-Based Physical Unclonable Functions and True Random Number Generators
Nikolaos Athanasios Anagnostopoulos1, Tolga Arul1,2, Simon Böttger3, Florian Frank1, Ali Mohamed3, Martin Hartmann3, Sascha Hermann3,4 and Stefan Katzenbeisser1,
1University of Passau, 2TU Darmstadt, 3TU Chemnitz, 4Fraunhofer ENAS, Chemnitz
Towards a PVT-Variation Resistant Resistor-Based PUF
Carl Riehm1, Christoph Frisch1, Florin Burcea1, Matthias Hiller2, Michael Pehl1 and Ralf Brederlow1,
1TU Munich 2Fraunhofer AISEC, Garching
W03.3 Session 2: Side-channel Attacks
Add this session to my calendar
Date: Tuesday, 18 April 2023
Time: 09:45 CET - 10:15 CET
Location / Room: Nightingale Room 2.6.1/2
Session chair:
Ingrid Verbauwhede, KU Leuven, BE
Practical Considerations for Optical Side-Channel Analysis: A Case Study on Reconfigurable FETs
Thilo Krachenfels1, Giulio Galderisi2, Thomas Mikolajick2,3, Jens Trommer2 and Jean-Pierre Seifert1,4,
1TU Berlin, 2NaMLab gGmbH, Dresden, 3TU Dresden, 4Fraunhofer SIT, Darmstadt
Side-Channel Leakage Evaluation of Multi-Chip Cryptographic Modules
Kazuki Monta1, Takumi Matsumaru1, Takaaki Okidono2, Takuji Miki1 and Makoto Nagata1,
1Kobe U 2SCU Co. Ltd, Tokyo
W03.4 Poster session: Projects of Priority Program Nano Security
Add this session to my calendar
Date: Tuesday, 18 April 2023
Time: 10:15 CET - 11:00 CET
Location / Room: Nightingale Room 2.6.1/2
Session chair:
Shahar Kvatinsky, Technion – Israel Institute of Technology, IL
PUFMem: Intrinsic Physical Unclonable Functions from Emerging Non-Volatile Memories
Tolga Arul, Stefan Katzenbeisser, Florian Frank, University of Passau
nanoEBeam: E Beam Probing for backside attacks against nanoscale ICs
Frank Altmann, Jörg Jatzkowski, FhG IMWS Halle, Elham Amini, Jean-Pierre Seifert, Christian Boit, Thilo Krachenfels, TU Berlin
STAMPS: From Strain to Trust: tAMper aware silicon PufS
Ralf Brederlow, TU Munich, Matthias Hiller, FhG AISEC
RAINCOAT: Randomization in Secure Nano-Scale Microarchitectures
Christian Niesler1, Jan Thoma2, Lucas Davi1, Tim Güneysu2
1University of Duisburg-Essen, 2Ruhr University Bochum
OptiSecure: Securing Nano-Circuits against Optical Probing
Sajjad Parvin1, Thilo Krachenfels2, Frank Sill Torres3, Jean-Pierre Seifert2,4, Rolf Drechsler1,5
1University of Bremen, 2TU Berlin, 3DLR, Bremerhaven, 4Fraunhofer SIT, Darmstadt, 5DFKI, Bremen
MemCrypto: Towards Secure Electroforming-free Memristive Cryptographic Implementations
Nan Du (University of Jena and Leibniz IPHT), Ilia Polian (University of Stuttgart)
HaSPro: Verifiable Hardware Security for Out-of-Order Processors
Thomas Eisenbarth, University of Lübeck, Wolfgang Kunz, Tobias Jauch, TU Kaiserslautern
NANOSEC: Tamper-Evident PUFs based on Nanostructures for Secure and Robust Hardware Security Primitives
Sascha Hermann, TU Chemnitz, Stefan Katzenbeisser, Nikolaos Athanasios Anagnostopoulos, University of Passau
SecuReFET: Secure Circuits through inherent Reconfigurable FET
Shubham Rai, Akash Kumar, TU Dresden
Giulio Galderisi, Thomas Mikolajick, Jens Trommer, NaMLab gGmbH, Dresden
BioNanoLock: Bio-Nanoelectronic based Logic Locking for Secure Systems
Farhad Amirali Merchant, Vivek Pachauri, Rainer Leupers, Elmira Moussavi, RWTH Aachen
RRAMPUFTRNG: CMOS-compatible RRAM-based structures for the implementation of Physical Unclonable Functions (PUF) and True Random Number Generators (TRNG)
Sahitya Yarragolla, Torben Hemke, Thomas Mussenbrock, Ruhr University Bochum
W03.5 Session 3: Trustworthy Electronics
Add this session to my calendar
Date: Tuesday, 18 April 2023
Time: 11:00 CET - 11:45 CET
Location / Room: Nightingale Room 2.6.1/2
Session chair:
Jean-Pierre Seifert, TU Berlin, DE
Quantifying Trust in Hardware through Physical Inspection
Bernhard Lippmann1, Matthias Ludwig1 and Horst Gieser2,
1Infineon Technologies AG, Munich 2Fraunhofer EMFT, Munich
(Un)Attractiveness for State Machine Obfuscation
Michaela Brunner1, Hye Hyun Lee1, Alexander Hepp1, Johanna Baehr1 and Georg Sigl1,2,
1TU Munich 2Fraunhofer AISEC, Garching
Thwarting Structural Attacks on Logic Locking with Reconfigurable Nanotechnologies
Armin Darjani, Nima Kavand and Akash Kumar,
TU Dresden
W03.6 Panel
Add this session to my calendar
Date: Tuesday, 18 April 2023
Time: 11:45 CET - 12:30 CET
Location / Room: Nightingale Room 2.6.1/2
Session Chair:
Ilia Polian, University of Stuttgart, DE
Security Issues in Heterogeneous Systems
Panelists:
Farimah Farahmandi, University of Florida
Sandip Kundu, University of Massachusetts, Amherst
Shahar Kvatinsky, Technion
Johanna Sepulveda, Airbus Defence and Space
ASD6 ASD focus session 2: SelPhys: Self-awareness in Cyber-physical Systems
Add this session to my calendar
Date: Tuesday, 18 April 2023
Time: 11:00 CET - 12:30 CET
Location / Room: Gorilla Room 1.5.4/5
Session chair:
Lukas Esterle, Aarhus University, DK
Session co-chair:
Axel Jantsch, TU Wien, AT
Computational self-awareness enables autonomous systems to operate in rapidly unfolding situations and conditions that have not been considered during development. Cyber-physical systems, constantly interacting with the physical world, have to deal with an even wider spectrum of potentially unknown situations introduced in their environment, including other (autonomous) systems and humans. Their ability to respond appropriately is vital for these systems not only to achieve their goals but also to ensure the safety of other machines and humans in the process. In this special session, we will have various invited talks on different aspects of computational self-awareness and its contribution to autonomous systems design. Specifically, we aim to have talks ranging from fundamental theory on computational self-awareness, through signal processing and embedded and high-performance computing, to applications utilising self-aware properties for increased safety and performance. After the short presentations, the presenters will be invited to participate in a panel discussion together with the audience.
Time | Label | Presentation Title Authors |
---|---|---|
11:00 CET | ASD6.1 | SELF-AWARE MACHINE INTELLIGENCE Presenter: Peter Lewis, Ontario University of Technology, CA Author: Peter Lewis, Ontario University of Technology, CA Abstract . |
11:22 CET | ASD6.2 | INCREMENTAL SELF-AWARENESS BASED ON FREE ENERGY MINIMIZATION FOR AUTONOMOUS AGENTS Presenter: Carlo Regazzoni, University of Genova, IT Authors: Carlo Regazzoni and Lucio Marcenaro, University of Genova, IT Abstract . |
11:45 CET | ASD6.3 | ADAPTIVE, RESILIENT COMPUTING PLATFORMS THROUGH SELF-AWARENESS Presenter: Nikil Dutt, UC Irvine, US Author: Nikil Dutt, UC Irvine, US Abstract . |
12:07 CET | ASD6.4 | COGNITIVE ENERGY SYSTEMS Presenter: Christian Gruhl, University of Kassel, DE Author: Christian Gruhl, University of Kassel, DE Abstract . |
SA3 Applications of emerging technologies and computing paradigms
Add this session to my calendar
Date: Tuesday, 18 April 2023
Time: 11:00 CET - 12:30 CET
Location / Room: Gorilla Room 1.5.3
Session chair:
Ioana Vatajelu, TIMA - CNRS / Université Grenoble Alpes, FR
11:00 CET until 11:24 CET: Pitches of regular papers
11:24 CET until 12:30 CET: Interactive technical presentations by the authors of regular papers and extended abstracts
Regular Papers
Time | Label | Presentation Title Authors |
---|---|---|
11:00 CET | SA3.1 | HDGIM: HYPERDIMENSIONAL GENOME SEQUENCE MATCHING ON UNRELIABLE HIGHLY SCALED FEFET Speaker: Hamza Errahmouni Barkam, University of California, Irvine, US Authors: Hamza Errahmouni Barkam1, Sanggeon Yun2, Paul Genssler3, Zhuowen Zou1, Che-Kai Liu4, Hussam Amrouch5 and Mohsen Imani1 1University of California, Irvine, US; 2Kookmin University, KR; 3University of Stuttgart, DE; 4Zhejiang University, CN; 5TU Munich, DE Abstract This is the first work to i) theoretically define the memorization capacity of Hyperdimensional Computing (HDC) hyperparameters and ii) present a reliable application for highly scaled (down to merely 3nm), multi-bit Ferroelectric FET (FeFET) technology. FeFET is one of the up-and-coming emerging technologies that is not only fully compatible with existing CMOS but also holds the promise of realizing ultra-efficient and compact Compute-in-Memory (CiM) architectures. Nevertheless, FeFETs struggle with the 10nm thickness of the Ferroelectric (FE) layer. This makes scaling profoundly challenging if not impossible, because thinner FE significantly shrinks the memory window, leading to large error probabilities that cannot be tolerated. To overcome these challenges, we propose HDGIM, a hyperdimensional computing framework catered to FeFET in the context of genome sequence matching. Genome Sequence Matching is known to have high computational costs, primarily due to huge data movement that substantially overwhelms von Neumann architectures. On the one hand, our cross-layer FeFET reliability modeling (starting from device physics to circuits) accurately captures the impact of FE scaling on errors induced by process variation and inherent stochasticity in multi-bit FeFETs. On the other hand, our HDC learning framework iteratively adapts by using two models, a full-precision, ideal model for training and a quantized, noisy version for validation and inference. Our results demonstrate that highly scaled FeFETs realizing 3-bit and even 4-bit storage can withstand any noise given high dimensionality during inference. If we consider the noise during model adjustment, we can improve the inherent robustness compared to adding noise during the matching process. |
11:03 CET | SA3.2 | QUANTUM MEASUREMENT DISCRIMINATION USING CUMULATIVE DISTRIBUTION FUNCTIONS Speaker: Prabhat Mishra, University of Florida, US Authors: Zachery Utt, Daniel Volya and Prabhat Mishra, University of Florida, US Abstract Quantum measurement is one of the critical steps in quantum computing that determines the probabilities associated with qubit states after conducting several circuit executions and measurements. As a mesoscopic quantum system, real quantum computers are prone to noise. Therefore, a major challenge in quantum measurement is how to correctly interpret the noisy results of a quantum computer. While there are promising classification based solutions, they either produce incorrect results (misclassify) or require many measurements (expensive). In this paper, we present an efficient technique to estimate a qubit's state through analysis of probability distributions of post-measurement data. Specifically, it estimates the state of a qubit using cumulative distribution functions to compare the measured distribution of a sample with the distributions of basis states. Our experimental results demonstrate a drastic reduction (78%) in single qubit readout error. It also provides significant reduction (12%) when used to boost existing multi-qubit discriminator models. |
11:06 CET | SA3.3 | EXTENDING THE DESIGN SPACE OF DYNAMIC QUANTUM CIRCUITS FOR TOFFOLI BASED NETWORK Speaker: Abhoy Kole, DFKI, IN Authors: Abhoy Kole1, Arighna Deb2, Kamalika Datta1 and Rolf Drechsler3 1German Research Centre for Artificial Intelligence (DFKI), DE; 2School of Electronics Engineering KIIT DU, IN; 3University of Bremen | DFKI, DE Abstract Recent advances in fault tolerant quantum systems allow to perform non-unitary operations like mid-circuit measurement, active reset and classically controlled gate operations in addition to the existing unitary gate operations. Real quantum devices that support these non-unitary operations enable us to execute a new class of quantum circuits, known as Dynamic Quantum Circuits (DQC). This helps to enhance the scalability, thereby allowing execution of quantum circuits comprising of many qubits by using at least two qubits. Recently DQC realizations of multi-qubit Quantum Phase Estimation (QPE) and Bernstein–Vazirani (BV) algorithms have been demonstrated in two separate experiments. However the dynamic transformation of complex quantum circuits consisting of Toffoli gate operations have not been explored yet. This motivates us to: (a) explore the dynamic realization of Toffoli gates by extending the design space of DQC for Toffoli networks, and (b) propose a general dynamic transformation algorithm for the first time to the best of our knowledge. More precisely, we introduce two dynamic transformation schemes (dynamic-1 and dynamic-2) for Toffoli gates, that differ with respect to the required number of classically controlled gate operations. For evaluation, we consider the Deutsch–Jozsa (DJ) algorithm composed of one or more Toffoli gates. Experimental results demonstrate that dynamic DJ circuits based on dynamic-2 Toffoli realization scheme provides better computational accuracy over the dynamic-1 scheme. Further, the proposed dynamic transformation scheme is generic and can also be applied to non-Toffoli quantum circuits, e.g. BV algorithm. |
11:09 CET | SA3.4 | AI-BASED DETECTION OF DROPLETS AND BUBBLES IN DIGITAL MICROFLUIDIC BIOCHIPS Speaker: Luca Pezzarossa, TU Denmark, DK Authors: Jianan Xu, Wenjie Fan, Georgi Plamenov Tanev, Jan Madsen and Luca Pezzarossa, TU Denmark, DK Abstract Digital microfluidic biochips exploit the electrowetting-on-dielectric effect to move and manipulate microliter-sized liquid droplets on a planar surface. This technology has the potential to automate and miniaturize biochemical processes, but reliability is often an issue. The droplets may get temporarily stuck or gas bubbles may impede their movement, leading to a disruption of the process being executed. However, if the position and size of the droplets and bubbles are known at run-time, these undesired effects can be easily mitigated by the biochip control system. This paper presents an AI-based computer vision solution for real-time detection of droplets and bubbles in DMF biochips and its implementation that supports cloud-based deployment. The detection is based on the YOLOv5 framework in combination with custom pre- and post-processing techniques. The YOLOv5 neural network is trained using our own data set consisting of 5115 images. The solution is able to detect droplets and bubbles with real-time speed and high accuracy and to differentiate between them even in the extreme case where bubbles coexist with transparent droplets. |
11:12 CET | SA3.5 | SPLIT ADDITIVE MANUFACTURING FOR PRINTED NEUROMORPHIC CIRCUITS Speaker: Haibin Zhao, Karlsruhe Institute of Technology, CN Authors: Haibin Zhao1, Michael Hefenbrock2, Michael Beigl1 and Mehdi Tahoori1 1Karlsruhe Institute of Technology, DE; 2RevoAI, DE Abstract Printed and flexible electronics promises smart devices for application domains such as smart fast-moving consumer goods and medical wearables, which are generally untouchable by conventional rigid silicon technologies. This is due to their remarkable properties, such as flexibility, non-toxic materials, and low cost per area. Combined with neuromorphic computing, printed neuromorphic circuits pose an attractive solution for these application domains. Particularly, additive printing technologies can greatly reduce fabrication complexity and cost. On the one hand, high-throughput additive printing processes, such as roll-to-roll printing, can reduce the per-device fabrication time and cost. On the other hand, jet-printing can provide point-of-use customization at the expense of lower fabrication throughput. In this work, we propose a machine learning-based design framework that respects the objective and physical constraints of split additive manufacturing for printed neuromorphic circuits. With the proposed framework, multiple printed neural networks are trained jointly with the aim of sensibly combining multiple fabrication techniques (e.g., roll-to-roll and jet-printing). This should lead to cost-effective fabrication of multiple different printed neuromorphic circuits and achieve high fabrication throughput, lower cost, and point-of-use customization. |
11:15 CET | SA3.6 | PIMPR: PIM-BASED PERSONALIZED RECOMMENDATION WITH HETEROGENEOUS MEMORY HIERARCHY Speaker: Tao Yang, Shanghai Jiao Tong University, Shanghai, China, CN Authors: Tao Yang1, Hui Ma1, Yilong Zhao1, Fangxin Liu1, Zhezhi He1, Xiaoli Sun2 and Li Jiang1 1Shanghai Jiao Tong University, CN; 2Institute of Scientific and Technical Information of Zhejiang Province, CN Abstract Deep learning-based personalized recommendation models (DLRMs) are dominating AI tasks in data centers. The performance bottleneck of typical DLRMs mainly lies in the memory-bounded embedding layers. Resistive Random Access Memory (ReRAM)-based Processing-in-memory (PIM) architecture is a natural fit for DLRMs thanks to its in-situ computation and high computational density. However, two challenges remain before DLRMs can fully embrace PIM architectures: 1) The size of a DLRM's embedding tables can reach tens of GBs, far beyond the memory capacity of typical ReRAM chips. 2) The irregular sparsity conveyed in the embedding layers is difficult to exploit in a PIM architecture. In this paper, we present the first PIM-based DLRM accelerator named PIMPR. PIMPR has a heterogeneous memory hierarchy—ReRAM crossbar-based PIM modules serve as the computing caches with high computing parallelism, while DIMM modules are able to hold the entire embedding table—leveraging the data locality of DLRM's embedding layers. Moreover, we propose a runtime strategy to skip the useless calculation induced by the sparsity and an offline strategy to balance the workload of each ReRAM crossbar. Compared to the state-of-the-art DLRM accelerators SPACE and TRiM, PIMPR achieves on average 2.02× and 1.79× speedup and 5.6× and 5.1× energy reduction, respectively. |
11:18 CET | SA3.7 | FSL-HD: ACCELERATING FEW-SHOT LEARNING ON RERAM USING HYPERDIMENSIONAL COMPUTING Speaker: Weihong Xu, University of California, San Diego, CN Authors: Weihong Xu, Jaeyoung Kang and Tajana Rosing, University of California, San Diego, US Abstract Few-shot learning (FSL) is a promising meta-learning paradigm that trains classification models on the fly with a few training samples. However, existing FSL classifiers are either computationally expensive, or are not accurate enough. In this work, we propose an efficient in-memory FSL classifier, FSL-HD, based on hyperdimensional computing (HDC) that achieves state-of-the-art FSL accuracy and efficiency. We devise an HDC-based FSL framework with efficient HDC encoding and search to reduce high complexity caused by the large HDC dimensionality. Also, we design a scalable in-memory architecture to accelerate FSL-HD on ReRAM with distributed dataflow and organization that maximizes the data parallelism and hardware utilization. The evaluation shows that FSL-HD achieves 4.2% higher accuracy compared to other FSL classifiers. FSL-HD achieves 100−1000× better energy efficiency and 9−66× speedup over the CPU and GPU baselines. Moreover, FSL-HD is more accurate, scalable and 2.5× faster than the state-of-the-art ReRAM-based FSL design, SAPIENS, while requiring 85% less area. |
11:21 CET | SA3.8 | HD-I-IOT: HYPERDIMENSIONAL COMPUTING FOR RESILIENT INDUSTRIAL INTERNET OF THINGS ANALYTICS Speaker: Baris Aksanli, San Diego State University, TR Authors: Onat Gungor1, Tajana Rosing2 and Baris Aksanli3 1UCSD & SDSU, US; 2University of California, San Diego, US; 3San Diego State University, US Abstract Industrial Internet of Things (I-IoT) enables fully automated production systems by continuously monitoring devices and analyzing collected data. Machine learning (ML) methods are commonly utilized for data analytics in such systems. Cyberattacks are a grave threat to I-IoT as they can manipulate legitimate inputs, corrupting ML predictions and causing disruptions in the production systems. Hyperdimensional (HD) computing is a brain-inspired ML method that has been shown to be sufficiently accurate while being extremely robust, fast, and energy-efficient. In this work, we use non-linear encoding-based HD for intelligent fault diagnosis against different adversarial attacks. Our black-box adversarial attacks first train a substitute model and create perturbed test instances using this trained model. These examples are then transferred to the target models. The change in the classification accuracy is measured as the difference before and after the attacks. This change measures the resiliency of a learning method. Our experiments show that HD leads to a more resilient and lightweight learning solution than the state-of-the-art deep learning methods. HD has up to 67.5% higher resiliency compared to the state-of-the-art methods while being up to 25.1× faster to train. |
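For readers who want a concrete picture of the cumulative-distribution idea described in the SA3.2 entry above, the following minimal Python sketch compares the empirical CDF of a batch of readout samples against reference CDFs recorded for the two basis states and labels the batch with the closer state. The Gaussian readout model, sample sizes, and function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def empirical_cdf(samples, grid):
    """Fraction of samples at or below each grid point."""
    return np.searchsorted(np.sort(samples), grid, side="right") / len(samples)

def discriminate(measured, ref0, ref1):
    """Return 0 or 1 depending on which basis-state reference CDF is
    closer (in the max-distance sense) to the measured sample's CDF."""
    grid = np.linspace(min(measured.min(), ref0.min(), ref1.min()),
                       max(measured.max(), ref0.max(), ref1.max()), 200)
    cdf_m = empirical_cdf(measured, grid)
    d0 = np.max(np.abs(cdf_m - empirical_cdf(ref0, grid)))
    d1 = np.max(np.abs(cdf_m - empirical_cdf(ref1, grid)))
    return 0 if d0 <= d1 else 1

# Toy readout signals: Gaussian clouds for |0> and |1> (assumed model).
rng = np.random.default_rng(0)
ref0 = rng.normal(0.0, 1.0, 5000)   # calibration shots for |0>
ref1 = rng.normal(2.0, 1.0, 5000)   # calibration shots for |1>
sample = rng.normal(1.9, 1.0, 200)  # noisy shots from an unknown state
print(discriminate(sample, ref0, ref1))  # expected: 1
```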
Extended Abstracts
Time | Label | Presentation Title Authors |
---|---|---|
11:24 CET | SA3.9 | STAR: AN EFFICIENT SOFTMAX ENGINE FOR ATTENTION MODEL WITH RRAM CROSSBAR Speaker: Yifeng Zhai, Capital Normal University, CN Authors: Yifeng Zhai1, Bing Li1 and Bonan Yan2 1Capital Normal University, CN; 2Peking University, CN Abstract RRAM crossbars have been studied to construct in-memory accelerators for neural network applications due to their in-situ computing capability. However, prior RRAM-based accelerators show efficiency degradation when executing the popular attention models. We observed that the frequent softmax operations arise as the efficiency bottleneck and also are insensitive to computing precision. Thus, we propose STAR, which boosts the computing efficiency with an efficient RRAM-based softmax engine and a fine-grained global pipeline for the attention models. Specifically, STAR exploits the versatility and flexibility of RRAM crossbars to trade off the model accuracy and hardware efficiency. The experimental results evaluated on several datasets show that STAR achieves up to 30.63× and 1.31× computing efficiency improvements over the GPU and the state-of-the-art RRAM-based attention accelerators, respectively. |
11:24 CET | SA3.10 | VALUE-BASED REINFORCEMENT LEARNING USING EFFICIENT HYPERDIMENSIONAL COMPUTING Speaker: Yang Ni, University of California, Irvine, US Authors: Yang Ni1, Danny Abraham1, Mariam Issa1, Yeseong Kim2, Pietro Mercati3 and Mohsen Imani1 1University of California, Irvine, US; 2DGIST, KR; 3Intel Labs, US Abstract Reinforcement Learning (RL) has opened up new opportunities to solve a wide range of complex decision-making tasks. However, modern RL algorithms, e.g., Deep Q-Learning, are based on deep neural networks, resulting in high computational costs. In this paper, we propose QHD, an off-policy value-based Hyperdimensional RL, that mimics brain properties toward robust and real-time learning. QHD relies on a lightweight brain-inspired model to learn an optimal policy in an unknown environment. We first develop a novel mathematical foundation and encoding module that maps state-action space into high-dimensional space. We accordingly develop a hyperdimensional regression model to approximate the Q-value function. QHD-powered agent makes decisions by comparing Q-values of each possible action. QHD provides 34.6× speedup and significantly better quality of learning than deep RL algorithms. |
11:24 CET | SA3.11 | DROPDIM: INCORPORATING EFFICIENT UNCERTAINTY ESTIMATION INTO HYPERDIMENSIONAL COMPUTING Speaker: Yang Ni, University of California, Irvine, US Authors: Yang Ni1, Hanning Chen1, Prathyush Poduval2, Pietro Mercati3 and Mohsen Imani1 1University of California, Irvine, US; 2University of Maryland Baltimore County, US; 3Intel Labs, US Abstract Research in the field of brain-inspired HyperDimensional Computing (HDC) brings orders of magnitude speedup to both Machine Learning (ML) training and inference compared to deep learning counterparts. However, current HDC algorithms generally lack uncertainty estimation. On the other hand, existing solutions such as Bayesian Neural Networks are generally slow and lead to high energy consumption. This paper proposes a hyperdimensional Bayesian framework called DropDim, which enables uncertainty estimation for the HDC-based regression algorithm. The core of our framework is a specially designed HDC encoder that maps input features to the high-dimensional space with an extra layer of randomness, i.e., a small number of dimensions are randomly dropped for each input. Our key insight is that by using this encoder, DropDim implements Bayesian inference while maintaining the efficiency advantage of HDC. (A simplified sketch of the dimension-dropping idea follows this table.) |
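As referenced in the SA3.11 entry above, the dimension-dropping mechanism can be imitated in a few lines: encode an input into a hypervector, zero out a small random subset of dimensions on every pass, and read the spread of the resulting predictions as an uncertainty estimate. The random-projection encoder, the linear readout, and the variance-based uncertainty below are simplified assumptions chosen for illustration, not the DropDim framework itself.

```python
import numpy as np

D = 2000          # hyperdimensional size (assumed)
rng = np.random.default_rng(0)
proj = rng.choice([-1.0, 1.0], size=(D, 8))   # random projection encoder
readout = rng.normal(0.0, 0.05, size=D)       # stand-in regression model

def encode(x, drop=0.02):
    """Map features to HD space, randomly dropping a small fraction of dims."""
    hv = np.sign(proj @ x)
    mask = rng.random(D) >= drop
    return hv * mask

def predict_with_uncertainty(x, passes=30):
    """Repeat encoding with fresh random drops; spread of outputs ~ uncertainty."""
    preds = np.array([readout @ encode(x) for _ in range(passes)])
    return preds.mean(), preds.std()

x = rng.normal(size=8)
mean, std = predict_with_uncertainty(x)
print(f"prediction {mean:.3f} +/- {std:.3f}")
```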
SD3 Hardware accelerators and memory subsystems
Add this session to my calendar
Date: Tuesday, 18 April 2023
Time: 11:00 CET - 12:30 CET
Location / Room: Okapi Room 0.8.2
Session chair:
Valeria Bertacco, University of Michigan, US
11:00 CET until 11:24 CET: Pitches of regular papers
11:24 CET until 12:30 CET: Interactive technical presentations by the authors of regular papers and extended abstracts
Regular Papers
Time | Label | Presentation Title Authors |
---|---|---|
11:00 CET | SD3.1 | UVMMU: HARDWARE-OFFLOADED PAGE MIGRATION FOR HETEROGENEOUS COMPUTING Speaker: Jungrae Kim, Sungkyunkwan University, KR Authors: Jihun Park1, Donghun Jeong2 and Jungrae Kim2 1dept. of Artificial Intelligence, Sungkyunkwan University, KR; 2Sungkyunkwan University, KR Abstract In a heterogeneous computing system with multiple memories, placing data near its current processing unit and migrating data over time can significantly improve performance. GPU vendors have introduced Unified Memory (UM) to automate data migrations between CPU and GPU memories and support memory over-subscription. Although UM improves software programmability, it can incur high costs due to its software-based migration. We propose a novel architecture to offload the migration to hardware and minimize UM overheads. Unified Virtual Memory Management Unit (UVMMU) detects access to remote memories and migrates pages without software intervention. By replacing page faults and software handling with hardware offloading, UVMMU can reduce the page migration latency to a few μs. Our evaluation shows that UVMMU can achieve 1.59× and 2.40× speed-ups over the state-of-the-art UM solutions for no over-subscription and 150% over-subscription, respectively. |
11:03 CET | SD3.2 | ARRAYFLEX: A SYSTOLIC ARRAY ARCHITECTURE WITH CONFIGURABLE TRANSPARENT PIPELINING Speaker: Dionysios Filippas, Democritus University of Thrace, GR Authors: Christodoulos Peltekis1, Dionysios Filippas1, Giorgos Dimitrakopoulos1, Chrysostomos Nicopoulos2 and Dionisios Pnevmatikatos3 1Democritus University of Thrace, GR; 2University of Cyprus, CY; 3National TU Athens & ICCS, GR Abstract Convolutional Neural Networks (CNNs) are the state-of-the-art solution for many deep learning applications. For maximum scalability, their computation should combine high performance and energy efficiency. In practice, the convolutions of each CNN layer are mapped to a matrix multiplication that includes all input features and kernels of each layer and is computed using a systolic array. In this work, we focus on the design of a systolic array with a configurable pipeline, with the goal of selecting an optimal pipeline configuration for each CNN layer. The proposed systolic array, called ArrayFlex, can operate in normal or in shallow pipeline mode, thus balancing the execution time in cycles and the operating clock frequency. By selecting the appropriate pipeline configuration per CNN layer, ArrayFlex reduces the inference latency of state-of-the-art CNNs by 11%, on average, as compared to a traditional fixed-pipeline systolic array. Most importantly, this result is achieved while using 13%-23% less power for the same applications, thus offering a combined energy-delay-product efficiency between 1.4x and 1.8x. |
11:06 CET | SD3.3 | FASTRW: A DATAFLOW-EFFICIENT AND MEMORY-AWARE ACCELERATOR FOR GRAPH RANDOM WALK ON FPGAS Speaker: Fan Wu, KU Leuven, BE Authors: Yingxue Gao, Teng Wang, Lei Gong, Chao Wang, Xi Li and Xuehai Zhou, University of Science and Technology of China, CN Abstract Graph random walk (GRW) sampling is becoming increasingly important with the widespread popularity of graph applications. It involves some walkers that wander through the graph to capture the desirable properties and reduce the size of the original graph. However, previous research suffers from long sampling latency and severe memory access bottlenecks due to intrinsic data dependency and irregular vertex distribution. This paper proposes FastRW, a dedicated accelerator to release GRW acceleration on FPGAs. FastRW first schedules walkers' execution to address data dependency and mask long sampling latency. Then, FastRW leverages pipeline specialization and bit-level optimization to customize a processing engine with five modules and achieve a pipelining dataflow. Finally, to alleviate the differential accesses caused by irregular vertex distribution, FastRW implements a hybrid memory architecture to provide parallel access ports according to the vertex's degree. We evaluate FastRW with two classic GRW algorithms on a wide range of real-world graph datasets. The experimental results show that FastRW achieves a speedup of 14.13x on average over the system running on two 8-core Intel CPUs. FastRW also achieves 3.28x~198.24x energy efficiency over the architecture implemented on a V100 GPU. |
11:09 CET | SD3.4 | TWIN ECC: A DATA DUPLICATION BASED ECC FOR STRONG DRAM ERROR RESILIENCE Speaker: Hyeong Kon Bae, Korea University, KR Authors: Hyeong Kon Bae1, Myung Jae Chung1, Young-Ho Gong2 and Sung Woo Chung1 1Korea University, KR; 2Kwangwoon University, KR Abstract With the continuous scaling of process technology, DRAM reliability has become a critical challenge in modern memory systems. Currently, DRAM memory systems for servers employ ECC DIMMs with a single error correction and double error detection (SECDED) code. However, the SECDED code is insufficient to ensure DRAM reliability since memory systems become more susceptible to errors. Though various studies have proposed multi-bit correctable ECC schemes, such ECC schemes cause performance and/or storage overhead. To minimize performance degradation while providing strong error resilience, in this paper, we propose Twin ECC, a low-cost memory protection scheme based on data duplication. In a 512-bit data block, Twin ECC duplicates meaningful data into meaningless zeros. Since the ‘1'→‘0' error pattern is dominant in DRAM cells, Twin ECC provides strong error resilience by performing bitwise OR operations between the original meaningful data and the duplicated data. After the bitwise OR operations, Twin ECC adopts the SECDED code to further enhance data protection. Our evaluations show that Twin ECC reduces the system failure probability by 64.8%, 56.9%, and 49.5% on average when the portion of ‘1'→‘0' errors is 100%, 90%, and 80%, respectively, while causing only 0.7% performance overhead and no storage overhead compared to the baseline ECC DIMM with SECDED code. (An illustrative sketch of the duplication-and-OR idea follows this table.) |
11:12 CET | SD3.5 | AIDING TO MULTIMEDIA ACCELERATORS: A HARDWARE DESIGN FOR EFFICIENT ROUNDING OF BINARY FLOATING POINT NUMBERS Speaker: Urbi Chatterjee, IIT Kanpur, IN Authors: Mahendra Rathor, Vishesh Mishra and Urbi Chatterjee, IIT Kanpur, IN Abstract Hardware accelerators for multimedia applications such as JPEG image compression and video compression are quite popular due to their capability of enhancing overall performance and system throughput. The core of essentially all lossy compression techniques is the quantization process. In the quantization process, rounding is performed to obtain integer values for the compressed images and video frames. Recent studies in photo forensic research have revealed that direct rounding, e.g., rounding up or rounding down of floating point numbers, results in compression artifacts such as 'JPEG dimples'. Therefore, in the compression process, rounding to the nearest integer value is important, especially for High Dynamic Range (HDR) photography and videography. Since rounding to the nearest integer is a data-intensive process, its realization as dedicated hardware is imperative to enhance overall performance. This paper presents a novel high-performance hardware architecture for rounding binary floating point numbers to the nearest integer. Additionally, an optimized version of the basic hardware design is also proposed. The proposed optimized version provides a 6.7% reduction in area and a 7.4% reduction in power consumption in comparison to the proposed basic architecture. Furthermore, the integration of the proposed floating point rounding hardware with the design flow of the computing kernel of the compression processor is also discussed in the paper. The proposed rounding hardware architecture and the integrated design with the computing kernel of the compression process have been implemented on an Intel FPGA. The average resource overhead due to this integration is reported to be less than 1%. |
11:15 CET | SD3.6 | CRSPU: EXPLOIT COMMONALITY OF REGULAR SPARSITY TO SUPPORT VARIOUS CONVOLUTIONS ON SYSTOLIC ARRAYS Speaker: Jianchao Yang, College of Computer, National University of Defense Technology, CN Authors: Jianchao Yang, Mei Wen, Junzhong Shen, Yasong Cao, Minjin Tang, Renyu Yang, Xin Ju and Chunyuan Zhang, College of Computer, National University of Defense Technology, CN Abstract Dilated convolution (DCONV) and transposed convolution (TCONV) are involved in the training of GANs and CNNs and introduce numerous regular zero-spaces into the feature maps or kernels. Existing accelerators typically pre-reorganize the zero-spaces and then perform sparse computation to accelerate them, resulting in huge hardware resource overhead and control complexity. While the systolic array has proven advantages when it comes to accelerating convolutions, countermeasures for deploying DCONV and TCONV on systolic arrays are rarely proposed. Therefore, we opt to improve the traditional im2col algorithm to make full use of the regular sparsity and avoid data reorganization, thereby facilitating the use of systolic arrays in this context. Public Dimension Compression and Similar Sparsity Merging mechanisms are also designed to implement sparse computing, eliminating unnecessary computation caused by zero-spaces. We propose a systolic array-based processing unit, named CRSPU. Experiments show that CRSPU exhibits more competitive performance than the state-of-the-art baseline accelerator GANPU. Furthermore, CRSPU's ability to avoid zero-space data reorganization represents a huge advantage for bandwidth-unfriendly accelerators. |
11:18 CET | SD3.7 | CLAP: LOCALITY AWARE AND PARALLEL TRIANGLE COUNTING WITH CONTENT ADDRESSABLE MEMORY Speaker: Tianyu Fu, Tsinghua University, CN Authors: Tianyu Fu1, Chiyue Wei1, Zhenhua Zhu1, Shang Yang1, Zhongming Yu2, Guohao Dai3, Huazhong Yang1 and Yu Wang1 1Tsinghua University, CN; 2University of California, San Diego, US; 3Shanghai Jiao Tong University, CN Abstract Triangle counting (TC) is one of the most fundamental graph analysis tools with a wide range of applications. Modern triangle counting algorithms traverse the graph and perform set intersections of neighbor sets to find triangles. However, existing triangle counting approaches suffer from heavy off-chip memory access and set intersection overhead. Thus, we propose CLAP, the first content addressable memory (CAM) based triangle counting architecture with software and hardware co-optimizations. To reduce off-chip memory access and the number of set intersections, we propose the first force-based node index reorder method. It simultaneously optimizes both data locality and the computation amount. Compared with random node indices, the reorder method reduces the off-chip memory access and the set intersections by 61% and 64%, respectively, while providing 2.19× end-to-end speedup. To improve the set intersection parallelism, we propose the first CAM-based triangle counting architecture under chip area constraints. We enable highly parallel set intersection by translating it into a content search on CAM with full parallelism. Thus, the time complexity of the set intersection reduces from O(m+n) or O(n log m) to O(n). Extensive experiments on real-world graphs show that CLAP achieves 39×, 27×, and 78× speedup over state-of-the-art CPU, GPU, and processing-in-memory baselines, respectively. The software code is available at: https://github.com/thu-nics/CLAP-triangle-counting. |
11:21 CET | SD3.8 | ATOMIC BUT LAZY UPDATING WITH MEMORY-MAPPED FILES FOR PERSISTENT MEMORY Speaker: Qisheng Jiang, ShanghaiTech University, CN Authors: Qisheng Jiang, Lei Jia and Chundong Wang, ShanghaiTech University, CN Abstract Applications memory-map file data stored in the persistent memory and expect both high performance and failure atomicity. State-of-the-art NOVA and Libnvmmio guarantee failure atomicity but yield inferior performance. They enforce data staying fresh and intact at the mapped addresses by continually updating the data there, thereby incurring severe write amplifications. They also lack the adaptability to dynamic workloads and entail housekeeping overheads with complex designs. We hence propose Acumen with a group of reflection pages managed for a mapped file. Using a simplistic bitmap to track fine-grained data slices, Acumen makes a reflection page and a mapped file page pair to alternately carry updates to achieve failure atomicity. Only on receiving a read request will it deploy valid data from reflection pages into target mapped file pages. The cost of deployment is amortized over subsequent read requests. Experiments show that Acumen significantly outperforms NOVA and Libnvmmio with consistently higher performance in serving a variety of workloads. |
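As noted in the SD3.4 entry above, the duplication-and-OR idea can be made concrete with a short Python sketch. The word width, the error-injection model, and the helper names are assumptions chosen for illustration; the actual scheme operates on 512-bit DRAM data blocks and is followed by a SECDED code.

```python
def write_twin(word: int) -> tuple[int, int]:
    """Store the meaningful word and a duplicate in the otherwise-zero half."""
    return word, word  # original copy, duplicated copy

def inject_1_to_0_errors(word: int, flip_mask: int) -> int:
    """Model the dominant DRAM failure mode: set bits dropping to zero."""
    return word & ~flip_mask

def read_twin(copy_a: int, copy_b: int) -> int:
    """Bitwise OR recovers any bit that survived in at least one copy,
    because only 1->0 flips are assumed."""
    return copy_a | copy_b

data = 0b1011_0110
a, b = write_twin(data)
a = inject_1_to_0_errors(a, 0b0010_0000)  # different bits fail
b = inject_1_to_0_errors(b, 0b0000_0100)  # in each copy
assert read_twin(a, b) == data
print(bin(read_twin(a, b)))
```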
Extended Abstracts
Time | Label | Presentation Title Authors |
---|---|---|
11:24 CET | SD3.9 | OUT-OF-STEP PIPELINE FOR GATHER/SCATTER INSTRUCTIONS Speaker: Yi Ge, Fujitsu Limited, JP Authors: Yi Ge1, Katsuhiro Yoda1, Makiko Ito1, Toshiyuki Ichiba1, Takahide Yoshikawa1, Ryota Shioya2 and Masahiro Goshima3 1Fujitsu Limited, JP; 2University of Tokyo, JP; 3National Institute of Informatics, JP Abstract Wider SIMD units suffer from low scalability of gather/scatter instructions that appear in sparse matrix calculations. We address this problem with an out-of-step pipeline which tolerates bank conflicts of a multibank L1D by allowing element operations of SIMD instructions to proceed out of step with each other. We evaluated it with a sparse matrix-vector product kernel for matrices from HPCG and SuiteSparse Matrix Collection. The results show that, for the SIMD width of 1024 bit, it achieves 1.91 times improvement over a model of a conventional pipeline. |
11:24 CET | SD3.10 | MEMPOOL MEETS SYSTOLIC: FLEXIBLE SYSTOLIC COMPUTATION IN A LARGE SHARED-MEMORY PROCESSOR CLUSTER Speaker: Samuel Riedel, ETH Zurich, CH Authors: Samuel Riedel1, Gua Hao Khov1, Sergio Mazzola2, Matheus Cavalcante1, Renzo Andri3 and Luca Benini4 1ETH Zurich, CH; 2ETH Zürich, CH; 3Huawei Zurich Research Center, CH; 4ETH Zurich, CH | Università di Bologna, IT Abstract Systolic arrays and shared-memory manycore clusters are two widely used architectural templates that offer vastly different trade-offs. Systolic arrays achieve exceptional performance for workloads with regular dataflow at the cost of a rigid architecture and programming model. Shared-memory manycore systems are more flexible and easy to program, but data must be moved explicitly to/from cores. This work combines the best of both worlds by adding a systolic overlay to a general-purpose shared-memory manycore cluster allowing for efficient systolic execution while maintaining flexibility. We propose and implement two instruction set architecture extensions enabling native and automatic communication between cores through shared memory. Our hybrid approach allows configuring different systolic topologies at execution time and running hybrid systolic-shared-memory computations. The hybrid architecture's convolution kernel outperforms the optimized shared-memory one by 18%. |
11:24 CET | SD3.11 | NOVEL EFFICIENT SYNONYM HANDLING MECHANISM FOR VIRTUAL-REAL CACHE HIERARCHY Speaker: Varun Venkitaraman, IIT Bombay, IN Authors: Varun Venkitaraman, Ashok Sathyan, Shrihari Deshmukh and Virendra Singh, IIT Bombay, IN Abstract Optimizing L1 caches for latency is critical to improving the system's performance. Generally, virtually indexed physically tagged (VIPT) caches are the preferred L1 cache configuration because address translation and set indexing can be performed in parallel, resulting in reduced L1 cache access latency. However, an address translation is essential for every L1 cache access, and address translation contributes significantly to the system's total power consumption. To reduce the power consumed by address translation, virtually indexed virtually tagged (VIVT) caches appear to be an attractive alternative. However, VIVT caches are plagued with the issue of synonyms. Prior works introduce new hardware structures in the cache hierarchy to detect and resolve synonyms. Rather than adding extra hardware structures to the cache hierarchy, we propose a new cache hierarchy design that modifies the last-level cache's tag array to detect and resolve synonyms. Our proposed scheme enhances the system's performance by 22% on average and also reduces the dynamic energy consumption of the cache hierarchy by as much as 89%. |
11:24 CET | SD3.12 | TURBULENCE: COMPLEXITY-EFFECTIVE OUT-OF-ORDER EXECUTION ON GPU WITH DISTANCE-BASED ISA Speaker: Reoma Matsuo, University of Tokyo, JP Authors: Reoma Matsuo, Toru Koizumi, Hidetsugu Irie, Shuichi Sakai and Ryota Shioya, University of Tokyo, JP Abstract A graphic processing unit (GPU) is a processor that achieves high throughput by exploiting data parallelism. We found that many GPU workloads also contain instruction-level parallelism, which can be extracted through out-of-order execution to provide additional performance improvement opportunities. We propose the TURBULENCE architecture for very low-cost out-of-order execution on GPUs. TURBULENCE consists of 1) a novel ISA that introduces the concept of referencing operands by inter-instruction distance instead of register numbers and 2) a novel microarchitecture that executes the novel ISA. Our proposed ISA and microarchitecture enable cost-effective out-of-order execution on GPUs without introducing expensive hardware. |
SD4 Resource-aware computing
Add this session to my calendar
Date: Tuesday, 18 April 2023
Time: 11:00 CET - 12:30 CET
Location / Room: Okapi Room 0.8.3
Session chair:
William Fornaciari, Politecnico di Milano, IT
11:00 CET until 11:24 CET: Pitches of regular papers
11:24 CET until 12:30 CET: Interactive technical presentations by the authors of regular papers and extended abstracts
Regular Papers
Time | Label | Presentation Title Authors |
---|---|---|
11:00 CET | SD4.1 | EFFICIENT HYPERDIMENSIONAL LEARNING WITH TRAINABLE, QUANTIZABLE, AND HOLISTIC DATA REPRESENTATION Speaker: Jiseung Kim, DGIST, KR Authors: Jiseung Kim1, Hyunsei Lee1, Mohsen Imani2 and Yeseong Kim1 1DGIST, KR; 2University of California, Irvine, US Abstract Hyperdimensional computing (HDC) is a computing paradigm that draws inspiration from a human memory model. It represents data in the form of high-dimensional vectors. Recently, many works in the literature have tried to use HDC as a learning model due to its simple arithmetic and high efficiency. However, learning frameworks in HDC use encoders that are randomly generated and static, resulting in many parameters and low accuracy. In this paper, we propose TrainableHD, a framework for HDC that utilizes a dynamic encoder with effective quantization for higher efficiency. Our model considers errors gained from the HD model and dynamically updates the encoder during training. Our evaluations show that TrainableHD improves the accuracy of HDC by up to 22.26% (on average 3.62%) without any extra computation costs, achieving a level comparable to state-of-the-art deep learning. Also, the proposed solution is 56.4× faster and 73× more energy efficient than deep learning on an NVIDIA Jetson Xavier, a low-power GPU platform. |
11:03 CET | SD4.2 | SMART KNOWLEDGE TRANSFER-BASED RUNTIME POWER MANAGEMENT Speaker: Lin Chen, Hong Kong University of Science and Technology, HK Authors: Lin Chen1, Xiao Li1, Fan Jiang1, Chengeng Li1 and Jiang Xu2 1Hong Kong University of Science and Technology, HK; 2Hong Kong University of Science and Technology, CN Abstract As Moore's law slows down, computing systems must pivot towards higher energy efficiency to continue scaling performance. Reinforcement learning (RL) performs more adaptively than conventional methods in runtime power management under varied hardware configurations and varying software workloads. However, prior works on either model-free or model-based RL approaches face a non-negligible challenge: relearning the policies to adapt to the new environment is unacceptably time-consuming, especially when encountering significant variances in workloads or hardware configurations. Moreover, existing research on accelerating learning has focused on the speedup while largely ignoring the efficiency degradation of the results. In this paper, we present a smart transfer-enabled Q-learning (STQL) approach to boost the learning process and guarantee the learning efficiency through a contradiction checking mechanism, which evicts inappropriate transferred knowledge. Experiments on realistic applications show that the proposed method can speed up the learning process up to 2.3x and achieve a 6.2% energy-delay product (EDP) reduction compared to the state-of-the-art design. |
11:06 CET | SD4.4 | REDRAW: FAST AND EFFICIENT HARDWARE ACCELERATOR WITH REDUCED READS AND WRITES FOR 3D UNET Speaker: Tom Glint, IIT Gandhinagar, IN Authors: Tom Glint1, Manu Awasthi2 and Joycee Mekie1 1IIT Gandhinagar, IN; 2Ashoka University, IN Abstract Hardware accelerators (HAs) proposed so far have been designed with a focus on 2D convolution neural networks (CNNs) and 3D CNNs using temporal data. To the best of our knowledge, there is no existing HA for 3D CNNs using spatial data. 3D UNet is a 3D CNN with significant applications in the medical domain. However, the total on-chip buffer size (>20 MB) required for the complete stationary approach of processing 3D UNet is cost prohibitive. In this work, we analyze the 3D UNet workload and propose an HA with an optimized memory hierarchy with a total on-chip buffer of less than 4 MB, while incurring close to the theoretical minimum number of memory accesses required for processing 3D UNet. We demonstrate the efficiency of the proposed HA by comparing it with the SOTA Simba architecture with the same number of MAC units and show a 1.3x increase in TOPS/watt for an iso-area design. Further, we revise the proposed architecture to increase the ratio of compute operations to memory operations and to meet the latency requirement of 3D UNet-based embedded applications. The revised architecture, compared against a dual instance of Simba, has similar latency. Against the dual instance of Simba, the proposed architecture achieves a 1.8x increase in TOPS/watt in a similar area. |
11:09 CET | SD4.5 | TEMPERATURE-AWARE SIZING OF MULTI-CHIP MODULE ACCELERATORS FOR MULTI-DNN WORKLOADS Speaker: Prachi Shukla, Advanced Micro Devices, US Authors: Prachi Shukla1, Derrick Aguren2, Tom Burd3, Ayse Coskun1 and John Kalamatianos3 1Boston University, US; 2Advanced Micro Devices, US; 3AMD, US Abstract This paper demonstrates the need for temperature awareness in sizing accelerators to target multi-DNN workloads. To that end, we build TESA, a TEmperature-aware methodology that Sizes and places Accelerators to balance both the cost and power of a multi-chip module (MCM) including DRAM power for multi-deep neural network workloads. TESA tunes the accelerator chiplet size and inter-chiplet spacing to generate a temperature-aware MCM layout, subject to user-defined latency, area, power, and thermal constraints. Using TESA for both 2D and 3D systolic array-based chiplets, we demonstrate up to 44% MCM cost savings and 63% DRAM power savings, respectively, over a temperature-unaware baseline at iso-frequency and iso-interposer area. We also demonstrate a need for TESA to obtain feasible MCM configurations for multi-DNN workloads such as augmented/virtual reality (AR/VR). |
11:12 CET | SD4.6 | JUMPING SHIFT: A LOGARITHMIC QUANTIZATION METHOD FOR LOW-POWER CNN ACCELERATION Speaker: David Aledo, TU Delft, ES Authors: Longxing Jiang, David Aledo and Rene Leuken, TU Delft, NL Abstract Logarithmic quantization for Convolutional Neural Networks (CNNs): a) fits typical weight and activation distributions well, and b) allows the replacement of the multiplication operation by a shift operation that can be implemented with fewer hardware resources. We propose a new quantization method named Jumping Log Quantization (JLQ). The key idea of JLQ is to extend the quantization range by adding a coefficient parameter "s" in the power-of-two exponents, 2^(sx+i). This quantization strategy skips some values of the standard logarithmic quantization. In addition, we also develop a small hardware-friendly optimization called weight de-zeroing. Zero-valued weights, which cannot be performed by a single shift operation, are all replaced with logarithmic weights to further reduce hardware resources with almost no accuracy loss. To implement the Multiply-And-Accumulate (MAC) operation (needed to compute convolutions) when the weights are JLQ-ed and de-zeroed, a new Processing Element (PE) has been developed. This new PE uses a modified barrel shifter that can efficiently avoid the skipped values. Resource utilization, area, and power consumption of the new PE standing alone are reported. We have found that JLQ performs better than other state-of-the-art logarithmic quantization methods when the bit width of the operands becomes very small. (A small numerical sketch of jumping-log quantization follows this table.) |
11:15 CET | SD4.7 | THERMAL MANAGEMENT FOR S-NUCA MANY-CORES VIA SYNCHRONOUS THREAD ROTATIONS Speaker: Yixian Shen, University of Amsterdam, NL Authors: Yixian Shen, Sobhan Niknam, Anuj Pathania and Andy Pimentel, University of Amsterdam, NL Abstract On-chip thermal management is quintessential to a thermally safe operation of a many-core processor. The presence of a physically-distributed logically-shared Last-Level Cache (LLC) significantly reduces the performance penalty of migrating threads within the cores of an S-NUCA many-core. This cost reduction allows novel thermal management of these many-cores via synchronous thread migration. Synchronous thread migration provides a viable alternative to Dynamic Voltage and Frequency Scaling (DVFS) and asynchronous thread migration used traditionally to manage thermals of S-NUCA many-cores. We present a theoretical method to compute the peak temperature in many-cores with synchronous thread migrations. We use the method to create a thermal management heuristic called HotPotato that maximizes the performance of S-NUCA many-cores under a peak temperature constraint. We implement HotPotato within the state-of-the-art HotSniper simulator. Detailed interval thermal simulations with HotSniper show an average 10.72% improvement in response time of S-NUCA many-cores when scheduling with HotPotato compared to a state-of-the-art thermal-aware S-NUCA scheduler. |
11:18 CET | SD4.8 | PROTEUS: HLS-BASED NOC GENERATOR AND SIMULATOR Speaker: Abhimanyu Rajeshkumar Bambhaniya, Georgia Tech, US Authors: Abhimanyu Rajeshkumar Bambhaniya, Yangyu Chen, FNU Anshuman, Rohan Banerjee and Tushar Krishna, Georgia Tech, US Abstract Networks-on-chip (NoCs) form the backbone fabric for connecting multi-core SoCs containing several processor cores and memories. Design-space exploration (DSE) of NoCs is a crucial part of the SoC design process to ensure that the NoC does not become a bottleneck. DSE today is often hindered by the inherent trade-off between software simulation and hardware emulation/evaluation. Software simulators are easily extendable and allow for evaluation of new ideas but are not able to capture the hardware complexity. Meanwhile, RTL development is known to be time-consuming. This has forced DSE to use simulators followed by RTL development, evaluation and feedback, which slows down the overall design process. In an effort to tackle this problem, we present Proteus, a configurable and modular NoC simulator and RTL generator. Proteus is the first framework of its kind to use an HLS compiler to develop NoCs from a C++ description of the NoC circuit. These generated NoCs can be simulated in software and tested on FPGAs. This allows users to do rapid DSE by providing the opportunity to tweak and test NoC architectures in real time. We also compare Proteus-generated RTL with Chisel-generated and hand-written RTL in terms of area, timing and productivity. The ability to synthesize the NoC design on FPGAs can benefit large designs, as the custom hardware results in faster run-time than cycle-accurate software simulators. Proteus is modeled similarly to existing state-of-the-art simulators and offers users modifiable parameters to generate custom topologies, routing algorithms, and router microarchitectures. |
11:21 CET | SD4.9 | MOELA: A MULTI-OBJECTIVE EVOLUTIONARY/LEARNING DESIGN SPACE EXPLORATION FRAMEWORK FOR 3D HETEROGENEOUS MANYCORE PLATFORMS Speaker: Sudeep Pasricha, CSU, US Authors: Sirui Qi1, Yingheng Li2, Sudeep Pasricha1 and Ryan Kim1 1Colorado State University, US; 2University of Pittsburgh, US Abstract To enable emerging applications such as deep machine learning and graph processing, 3D network-on-chip (NoC) enabled heterogeneous manycore platforms that can integrate many processing elements (PEs) are needed. However, designing such complex systems with multiple objectives can be challenging due to the huge associated design space and long evaluation times. To optimize such systems, we propose a new multi-objective design space exploration framework called MOELA that combines the benefits of evolutionary-based search with a learning-based local search to quickly determine PE and communication link placement to optimize multiple objectives (e.g., latency, throughput, and energy) in 3D NoC enabled heterogeneous manycore systems. Compared to state-of-the-art approaches, MOELA increases the speed of finding solutions by up to 128x, leads to a better Pareto Hypervolume (PHV) by up to 12.14x and improves energy-delay-product (EDP) by up to 7.7% in a 5-objective scenario. |
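As referenced in the SD4.6 entry above, jumping-log quantization can be sketched numerically: with a stride parameter s (and offset i), only magnitudes of the form 2^(sx+i) are representable, so a weight is snapped to the nearest such power of two and each multiplication becomes a single shift. The level range, rounding rule, and parameter values below are illustrative assumptions rather than the paper's exact scheme.

```python
import numpy as np

def jlq_levels(s: int, i: int, num_levels: int):
    """Representable magnitudes 2^(s*x + i) for x = 0..num_levels-1 (assumed form)."""
    return np.array([2.0 ** (s * x + i) for x in range(num_levels)])

def jlq_quantize(w: float, levels: np.ndarray) -> float:
    """Snap a weight to the nearest jumping-log level, keeping its sign."""
    if w == 0.0:
        return 0.0
    mag = abs(w)
    return np.sign(w) * levels[np.argmin(np.abs(levels - mag))]

levels = jlq_levels(s=2, i=-6, num_levels=4)   # 2^-6, 2^-4, 2^-2, 2^0
for w in [0.3, -0.02, 0.9]:
    q = jlq_quantize(w, levels)
    print(f"{w:+.3f} -> {q:+.5f}")  # e.g. +0.300 -> +0.25000 (a single shift)
```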
Extended Abstracts
Time | Label | Presentation Title Authors |
---|---|---|
11:24 CET | SD4.10 | DEVELOPING AN ULTRA-LOW POWER RISC-V PROCESSOR FOR ANOMALY DETECTION Speaker: Jina Park, Chung-Ang University, KR Authors: Jina Park1, Eunjin Choi1, Kyungwon Lee1, Jae-Jin Lee2, Kyuseung Han2 and Woojoo Lee1 1Chung-Ang University, KR; 2ETRI, KR Abstract With a focus on anomaly detection, a representative application in healthcare, this paper develops an ultra-low power processor for wearable devices. First, this paper proposes a processor architecture that divides the design into a part for general applications running on wearable devices (day part) and a part that performs anomaly detection by analyzing sensor data (night part), where each part operates completely independently. This day-night architecture allows the day part, which contains the power-hungry main CPU and system interconnect, to be turned off most of the time except for intermittent work, while the night part, which consists only of the sub-CPU and minimal IPs, can run all the time with low power. Next, this paper designs an ultra-lightweight all-night core based on a subset of RV32I optimized for anomaly detection applications, and completes the development of an ultra-low power processor by introducing it to the sub-CPU of the proposed architecture. Finally, by prototyping the proposed processor and developing an anomaly detection application that runs on the processor prototype, this paper demonstrates the power savings of the proposed processor technology along with its design validation. |
11:24 CET | SD4.11 | EXTENDED ABSTRACT: MONITORING-BASED THERMAL MANAGEMENT FOR MIXED-CRITICALITY SYSTEMS Speaker: Marcel Mettler, TU Munich, DE Authors: Marcel Mettler1, Martin Rapp2, Heba Khdr2, Daniel Mueller-Gritschneder1, Joerg Henkel2 and Ulf Schlichtmann1 1TU Munich, DE; 2Karlsruhe Institute of Technology, DE Abstract With a rapidly growing number of functions in embedded real-time systems, it becomes inevitable to integrate tasks of different safety integrity levels (SILs) into one mixed-criticality system. Here, it is important to not only isolate shared architectural resources, as tasks executing on different cores may also interfere via the processor's thermal manager. In order to prevent a scenario where best-effort tasks cause deadline violations for critical tasks, we propose a thermal management strategy that guarantees a sufficient thermal isolation between tasks of different SILs, and simultaneously reduces the run-time of best-effort tasks by up to 45% compared to the state of the art without incurring any real-time violations for critical tasks. |
11:24 CET | SD4.12 | A LIGHTWEIGHT CONGESTION CONTROL TECHNIQUE FOR NOCS WITH DEFLECTION ROUTING Speaker: Shruti Yadav Narayana, University of Wisconsin-Madison, US Authors: Shruti Yadav Narayana1, Sumit Mandal2, Raid Ayoub3, Michael Kishinevsky4 and Umit Ogras5 1University of Wisconsin-Madison, US; 2Indian Institute of Science, IN; 3Intel Corporation, US; 4Intel Corporation, US; 5University of Wisconsin - Madison, US Abstract Network-on-Chip (NoC) congestion builds up during heavy traffic load and cripples the system performance by stalling the cores. Moreover, congestion leads to wasted link bandwidth due to blocked buffers and bouncing packets. Existing approaches throttle the cores after congestion is detected, leading to a highly congested NoC and stalled cores. In contrast, we propose a lightweight machine learning-based technique that helps predict congestion in the network. Specifically, our proposed technique collects features related to traffic at each sink. Then, it labels the features using a novel time-reversal approach. The labeled data is used to design a low-overhead and explainable decision tree model used for runtime congestion control. Experimental evaluations with synthetic and real traffic on an industrial 6×6 NoC show that the proposed approach increases fairness and memory read bandwidth by up to 59% with respect to a state-of-the-art congestion control technique. |
SE2 Modelling, verification and timing analysis of cyber-physical systems
Add this session to my calendar
Date: Tuesday, 18 April 2023
Time: 11:00 CET - 12:30 CET
Location / Room: Gorilla Room 1.5.1
Session chair:
Martin Horauer, University of Applied Sciences Technikum Wien, AT
11:00 CET until 11:24 CET: Pitches of regular papers
11:24 CET until 12:30 CET: Interactive technical presentations by the authors of regular papers and extended abstracts
Regular Papers
Time | Label | Presentation Title Authors |
---|---|---|
11:00 CET | SE2.1 | IMPACTTRACER: ROOT CAUSE LOCALIZATION IN MICROSERVICES BASED ON FAULT PROPAGATION MODELING Speaker: Jiazhi Jiang, Sun Yat-sen University, CN Authors: Ru Xie1, Jing Yang2, Jingying Li2 and Liming Wang2 1Institute of Information Engineering, CAS; University of Chinese Academy of Sciences, CN; 2Institute of Information Engineering, CAS, CN Abstract Microservice architecture is embraced by a growing number of enterprises due to the benefits of modularity and flexibility. However, being composed of numerous interdependent microservices, it is prone to cascading failures and afflicted by the arising problem of troubleshooting, which entails arduous efforts to identify the root cause node and ensure service availability. Previous works use the call graph to characterize causality relationships of microservices, but not completely or comprehensively, leading to an insufficient search of potential root cause nodes and consequently poor accuracy in culprit localization. In this paper, we propose ImpactTracer to address the above problems. ImpactTracer builds an impact graph to provide a complete view of fault propagation in microservices and uses a novel backward tracing algorithm that exhaustively traverses the impact graph to identify the root cause node accurately. Extensive experiments on a real-world dataset demonstrate that ImpactTracer is effective in identifying the root cause node and outperforms the state-of-the-art methods by at least 72%, significantly facilitating troubleshooting in microservices. |
11:03 CET | SE2.2 | PUMPCHANNEL: AN EFFICIENT AND SECURE COMMUNICATION CHANNEL FOR TRUSTED EXECUTION ENVIRONMENT ON ARM-FPGA EMBEDDED SOC Speaker: Jingquan Ge, Nanyang Technological University, CN Authors: Jingquan Ge, Yuekang Li, Yang Liu, Yaowen Zheng, Yi Liu and Lida Zhao, Nanyang Technological University, SG Abstract ARM TrustZone separates the system into the rich execution environment (REE) and the trusted execution environment (TEE). Data can be exchanged between REE and TEE through the communication channel, which is based on shared memory and can be accessed by both REE and TEE. Therefore, when the REE OS kernel is untrusted, the security of the communication channel cannot be guaranteed. The proposed schemes to protect the communication channel have high performance overhead and are not secure enough. In this paper, we propose PumpChannel, an efficient and secure communication channel implemented on ARM-FPGA embedded SoC. PumpChannel avoids the use of secret keys, but utilizes a hardware and software collaborative pump to enhance the security and performance of the communication channel. Besides, PumpChannel implements a hardware-based hook integrity monitor to ensure the integrity of all hook codes. Security and performance evaluation results show that PumpChannel is more secure than the encrypted channel countermeasures and has better performance than all other schemes. |
11:06 CET | SE2.3 | ON THE DEGREE OF PARALLELISM IN REAL-TIME SCHEDULING OF DAG TASKS Speaker: Qingqiang He, The Hong Kong Polytechnic University, CN Authors: Qingqiang He1, Nan Guan2, Mingsong Lv1 and Zonghua Gu3 1The Hong Kong Polytechnic University, HK; 2City University of Hong Kong, HK; 3Umeå University, SE Abstract Real-time scheduling and analysis of parallel tasks modeled as directed acyclic graphs (DAG) have been intensively studied in recent years. The degree of parallelism of DAG tasks is an important characterization in scheduling. This paper revisits the definition and the computing algorithms for the degree of parallelism of DAG tasks, and clarifies some misunderstandings regarding the degree of parallelism which exist in the real-time literature. Based on the degree of parallelism, we propose a real-time scheduling approach for DAG tasks, which is quite simple but rather effective and outperforms the state of the art by a considerable margin. |
11:09 CET | SE2.4 | TIMING PREDICTABILITY FOR SOME/IP-BASED SERVICE-ORIENTED AUTOMOTIVE IN-VEHICLE NETWORKS Speaker: Enrico Fraccaroli, University of North Carolina at Chapel Hill, US Authors: Enrico Fraccaroli1, Prachi Joshi2, Shengjie Xu1, Khaja Shazzad2, Markus Jochim2 and Samarjit Chakraborty3 1University of North Carolina at Chapel Hill, US; 2General Motors, R&D, US; 3UNC Chapel Hill, US Abstract In-vehicle network architectures are evolving from a typical signal-based client-server paradigm to a service-oriented one, introducing flexibility for software updates and upgrades. While signal-based networks are static by nature, service-oriented ones can more easily evolve during and after the design phase. As a result, service-oriented protocols are spreading through several layers of in-vehicle networks. While applications like infotainment are less sensitive to delays, others like sensing and control have more stringent timing and reliability requirements. Hence, wider adoption of service-oriented protocols requires timing analyzability and predictability problems to be addressed, which are more challenging than in their signal-oriented counterparts. In service-oriented architectures, the discovery phase defines how clients find their required services. The time required to complete the discovery phase is an important parameter since it determines the readiness of a sub-system or even the vehicle. In this paper, we develop a formal timing analysis of the discovery phase of SOME/IP, an emerging service-oriented protocol considered for adoption by automotive OEMs and suppliers. |
11:12 CET | SE2.5 | ANALYSIS AND OPTIMIZATION OF WORST-CASE TIME DISPARITY IN CAUSE-EFFECT CHAINS Speaker: Xiantong Luo, Northeastern University, CN Authors: Xu Jiang1, Xiantong Luo1, Nan Guan2, Zheng Dong3, Shao-Shan Liu4 and Wang Yi5 1Northeastern University, CN; 2City University of Hong Kong, HK; 3Wayne State University, US; 4BeyonCa, CN; 5Uppsala University, SE Abstract In automotive systems, an important timing requirement is that the time disparity (the maximum difference among the timestamps of all raw data produced by sensors that an output originates from) must be bounded in a certain range, so that information from different sensors can be correctly synchronized and fused. In this paper, we study the problem of analyzing the worst-case time disparity in cause-effect chains. In particular, we present two bounds, where the first one assumes all chains are independent from each other and the second one takes the fork-join structures into consideration to perform a more precise analysis. Moreover, we propose a solution to cut down the worst-case time disparity for a task by designing buffers with proper sizes. Experiments are conducted to show the correctness and effectiveness of both our analysis and optimization methods. (A small sketch illustrating the time disparity computation follows this table.) |
11:15 CET | SE2.6 | DATA FRESHNESS OPTIMIZATION ON NETWORKED INTERMITTENT SYSTEMS Speaker: Wen Sheng Lim, Institute of Information Science, Academia Sinica, MY Authors: Hao-Jan Huang1, Wen Sheng Lim2, Chia-Heng Tu1, Chun-Feng Wu3 and Yuan-Hao Chang4 1National Cheng Kung University, TW; 2National Taiwan University (NTU), TW; 3National Yang Ming Chiao Tung University, TW; 4Academia Sinica, TW Abstract A networked intermittent system (NIS) is often deployed in the field for environmental monitoring, where sink nodes are responsible for relaying the data captured by sensors to a central system. To evaluate the quality of the captured monitoring data, Age of Information (AoI) is adopted to quantify the freshness of the data received by the central server. As the sink nodes are powered by ambient energy sources (e.g., solar and wind), an energy-efficient design of the sink nodes is crucial in order to improve the system-wide AoI. This work proposes an energy-efficient sink node design to save energy and extend system uptime. We devise an AoI-aware data forwarding algorithm based on the branch-and-bound (B&B) paradigm for deriving the optimal solution offline. In addition, an AoI-aware data forwarding algorithm is developed to approximate the optimal solution during runtime. The experimental results show that our solution can greatly improve the average data freshness by 148% against existing well-known strategies and achieves 91% of the performance of the optimal solution. Compared with the state-of-the-art algorithm, our energy-efficient design can deliver better A^3oI results by up to 9.6%. |
11:18 CET | SE2.7 | A SAFETY-GUARANTEED FRAMEWORK FOR NEURAL-NETWORK-BASED PLANNERS IN CONNECTED VEHICLES UNDER COMMUNICATION DISTURBANCE Speaker: Kevin Kai-Chun Chang, National Taiwan University, TW Authors: Kevin Kai-Chun Chang1, Xiangguo Liu2, Chung-Wei Lin1, Chao Huang3 and Qi Zhu2 1National Taiwan University, TW; 2Northwestern University, US; 3University of Liverpool, GB Abstract Neural-network-based (NN-based) planners have been increasingly used to enhance the performance of planning for autonomous vehicles. However, it is often difficult for NN-based planners to balance efficiency and safety in complicated scenarios, especially under real-world communication disturbance. To tackle this challenge, we present a safety-guaranteed framework for NN-based planners in connected vehicle environments with communication disturbance. Given any NN-based planner with no safety-guarantee, the framework generates a robust compound planner embedding the NN-based planner to ensure overall system safety. Moreover, with the aid of an information filter for imperfect communication and an aggressive approach for the estimation of the unsafe set, the compound planner could achieve similar or better efficiency than the given NN-based planner. A comprehensive case study of unprotected left turn and extensive simulations demonstrate the effectiveness of our framework. |
11:21 CET | SE2.8 | CO-DESIGN OF TOPOLOGY, SCHEDULING, AND PATH PLANNING IN AUTOMATED WAREHOUSES Speaker: Michele Lora, Università di Verona, IT Authors: Christopher Leet1, Chanwook Oh1, Michele Lora2, Sven Koenig1 and Pierluigi Nuzzo1 1University of Southern California, US; 2Università di Verona, IT Abstract We address the warehouse servicing problem (WSP) in automated warehouses, which use teams of mobile agents to bring products from shelves to packing stations. Given a list of products, the WSP amounts to finding a plan for a team of agents which brings every product on the list to a station within a given timeframe. The WSP consists of four subproblems, concerning what tasks to perform (task formulation), who will perform them (task allocation), and when (scheduling) and how (path planning) to perform them. These subproblems are NP-hard individually and are made more challenging by their interdependence. The difficulty of the WSP is compounded by the scale of automated warehouses, which frequently use teams of hundreds of agents. In this paper, we present a methodology that can solve the WSP at such scales. We introduce a novel, contract-based design framework which decomposes an automated warehouse into traffic system components. By assigning each of these components a contract describing the traffic flows it can support, we can synthesize a traffic flow satisfying a given WSP instance. Componentwise search-based path planning is then used to transform this traffic flow into a plan for discrete agents in a modular way. Evaluation shows that this methodology can solve WSP instances on real automated warehouses. |
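As referenced in the SE2.5 entry above, time disparity has a simple operational reading: for one fused output, it is the spread of the timestamps of the raw sensor samples that the output originates from, and the worst case is the maximum of that spread over all outputs. The toy trace and function names below are assumptions used only to make the definition concrete; the paper's contribution is bounding this quantity analytically, not measuring it.

```python
def time_disparity(origin_timestamps):
    """Max difference among the timestamps of all raw sensor samples
    that a single output originates from."""
    return max(origin_timestamps) - min(origin_timestamps)

def worst_case_over_trace(outputs):
    """Worst observed disparity over a trace of fused outputs."""
    return max(time_disparity(ts) for ts in outputs)

# One entry per fused output: timestamps (ms) of the camera/lidar/radar
# samples it was derived from (illustrative numbers).
trace = [
    [100.0, 96.5, 98.2],
    [120.0, 118.7, 111.4],
    [140.0, 139.1, 138.0],
]
print(worst_case_over_trace(trace))  # 8.6 ms, from the second output
```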
Extended Abstracts
Time | Label | Presentation Title Authors |
---|---|---|
11:24 CET | SE2.9 | POLYGLOT MODAL MODELS THROUGH LINGUA FRANCA Speaker: Alexander Schulz-Rosengarten, Kiel University, DE Authors: Alexander Schulz-Rosengarten1, Reinhard von Hanxleden1, Marten Lohstroh2, Soroush Bateni3 and Edward Lee2 1Dept. of Computer Science, Kiel University, DE; 2University of California, Berkeley, US; 3University of Texas at Dallas, US Abstract Complex software systems often feature distinct modes of operation, each designed to handle a particular scenario that may require the system to respond in a certain way. Breaking down system behavior into mutually exclusive modes and discrete transitions between modes is a commonly used strategy to reduce implementation complexity and promote code readability. The work in this paper aims to bring the advantages of working with modal models to mainstream programming languages, by following the polyglot coordination approach of Lingua Franca (LF), in which verbatim target code (e. g., C, C++, Python, Typescript, or Rust) is encapsulated in composable reactive components called reactors. Reactors can form a dataflow network, are triggered by timed as well as sporadic events, execute concurrently, and can be distributed across nodes on a network. With modal models in LF, we introduce a lean extension to the concept of reactors that enables the coordination of reactive tasks based on modes of operation. |
11:24 CET | SE2.10 | DEL: DYNAMIC SYMBOLIC EXECUTION-BASED LIFTER FOR ENHANCED LOW-LEVEL INTERMEDIATE REPRESENTATION Speaker: Hany Abdelmaksoud, German Aerospace Center (DLR), DE Authors: Hany Abdelmaksoud1, Zain A. H. Hammadeh1, Goerschwin Fey2 and Daniel Luedtke1 1German Aerospace Center (DLR), DE; 2TU Hamburg, DE Abstract This work develops an approach that lifts binaries into an enhanced LLVM Intermediate Representation (IR) including indirect jumps. The proposed lifter combines both static and dynamic methods and strives to fully recover the Control-Flow Graph (CFG) of a program. Using Satisfiability Modulo Theories (SMT) supported by memory and register models, our lifter dynamically symbolically executes IR instructions after translating them into SMT expressions. |
11:24 CET | SE2.11 | WCET ANALYSIS OF SHARED CACHES IN MULTI-CORE ARCHITECTURES USING EVENT-ARRIVAL CURVES Speaker: Thilo L. Fischer, TU Hamburg, DE Authors: Thilo Fischer and Heiko Falk, TU Hamburg, DE Abstract We propose a novel analysis approach for shared LRU caches to classify accesses as definitive cache hits or potential misses. In this approach inter-core cache interference is modelled as an event stream. Thus, by analyzing the timing between subsequent accesses to a particular cache block, it is possible to bound the inter-core interference. This perspective allows us to classify accesses as cache hits or potential misses using a data-flow analysis. We compare the performance of the presented approach to a partitioning of the shared cache. |
11:24 CET | SE2.12 | RESOURCE OPTIMIZATION WITH 5G CONFIGURED GRANT SCHEDULING FOR REAL-TIME APPLICATIONS Speaker: Yungang Pan, Linköping University, SE Authors: Yungang Pan1, Rouhollah Mahfouzi1, Soheil Samii2, Petru Eles2 and Zebo Peng2 1Linköping University, SE; 2Linköping University, SE Abstract 5G is expected to support ultra-reliable low latency communication to enable real-time applications such as industrial automation and control. 5G configured grant (CG) scheduling features a pre-allocated periodicity-based scheduling approach which reduces control signaling time and guarantees service quality. Although this enables 5G to support hard real-time periodic traffic, efficiently synthesizing the schedule and achieving high resource efficiency while serving multiple traffic flows is still an open problem. To address this problem, we first formulate it using satisfiability modulo theories (SMT) so that an SMT-solver can be used to generate optimal solutions. For enhancing scalability, two efficient heuristic approaches are proposed. The experiments demonstrate the effectiveness and scalability of the proposed technique. |
11:24 CET | SE2.13 | MOTIVATING AGENT-BASED LEARNING FOR BOUNDING TIME IN MIXED-CRITICALITY SYSTEMS Speaker: Behnaz Ranjbar, TU Dresden, DE Authors: Behnaz Ranjbar, Ali Hosseinghorban and Akash Kumar, TU Dresden, DE Abstract In Mixed-Criticality (MC) systems, the high Worst-Case Execution Time (WCET) of a task is a pessimistic bound, the maximum execution time of the task under all circumstances, while the low WCET should be close to the actual execution time of most instances of the task to improve utilization and Quality-of-Service (QoS). Most MC systems consider a static low WCET for each task, which cannot adapt to dynamism at run-time. In this regard, we consider the run-time behavior of tasks and propose a learning-based approach that dynamically monitors the tasks' execution times and adapts the low WCETs to determine the ideal trade-off between mode-switches, utilization, and QoS. Based on our observations on running embedded real-time benchmarks on a real platform, the proposed scheme reduces the utilization waste by 47.2%, on average, compared to state-of-the-art works. |
SpD2 Special Day on Human AI-Interaction: AI – potential, limitations and ethical aspects
Add this session to my calendar
Date: Tuesday, 18 April 2023
Time: 11:00 CET - 12:30 CET
Location / Room: Darwin Hall
Time | Label | Presentation Title Authors |
---|---|---|
11:00 CET | SpD2.1 | THE EMERGENCE OF HUMAN-LIKE AI Presenter: Dave Raggett, ERCIM, GB Author: Dave Raggett, ERCIM, GB Abstract Human-like AI mimics human perception, reasoning, learning and action, combining advances in the cognitive sciences and rapid progress with artificial neural networks. Neurosymbolic approaches seek to combine the respective strengths of neural networks and symbolic AI in support of human-machine collaboration, using argumentation in place of formal logic and proof. This talk will explore the challenges for evolving today's large language models into practical cognitive agents, along with complementary opportunities for cognitive databases that embrace the realisation that most knowledge is uncertain, imprecise, incomplete and inconsistent. Recent advances with large language models have been pretty amazing, but have some significant drawbacks including their huge size, their carbon footprint, a tendency to stray from the facts, a lack of provenance and no support for continual learning, as well as issues around bias. What is needed to enable practical cognitive agents that can be trained and executed on modest hardware? What is the future for symbolic approaches given just how far neural approaches have improved? How will this affect the ways we develop software systems? |
11:30 CET | SpD2.2 | AI ETHICS: FROM ENGINEERING TO REGULATION Speaker and Author: Laurynas Adomaitis, CEA-Saclay, FR Abstract We'll review the field of AI ethics from the engineering point of view focusing on five core sources of ethical tension and using recent insights from the Horizon-Europe TechEthos project. We'll then discuss some of the salient points in the forthcoming European regulation of AI systems ("AI Act"), including a recently added article on generative AI and ChatGPT. |
12:00 CET | SpD2.3 | WHAT HUMAN-AI INTERACTION WILL BRING IN THE FIELDS COVERED BY DATE? Speaker: animated by Prof. Marina Zapater and Dr. Marc Duranton, HES-SO - CEA, BE Author: All participants, DATE, BE Abstract Panel session/round table: "What Human-AI interaction will bring in the fields covered by DATE?" |
ST2 Test methods and dependability
Add this session to my calendar
Date: Tuesday, 18 April 2023
Time: 11:00 CET - 12:30 CET
Location / Room: Marble Hall
Session chair:
Görschwin Fey, TU Hamburg, DE
11:00 CET until 11:21 CET: Pitches of regular papers
11:21 CET until 12:30 CET: Interactive technical presentations by the authors of regular papers and extended abstracts
Regular Papers
Time | Label | Presentation Title Authors |
---|---|---|
11:00 CET | ST2.1 | IMPROVING RELIABILITY OF SPIKING NEURAL NETWORKS THROUGH FAULT AWARE THRESHOLD VOLTAGE OPTIMIZATION Speaker: Ayesha Siddique, University of Missouri-Columbia, US Authors: Ayesha Siddique and Khaza Anuarul Hoque, University of Missouri, US Abstract Spiking neural networks have made breakthroughs in computer vision by lending themselves to neuromorphic hardware. However, the neuromorphic hardware lacks parallelism and hence, limits the throughput and hardware acceleration of SNNs on edge devices. To address this problem, many systolic-array SNN accelerators (systolicSNNs) have been proposed recently, but their reliability is still a major concern. In this paper, we first extensively analyze the impact of permanent faults on the SystolicSNNs. Then, we present a novel fault mitigation method, i.e., fault-aware threshold voltage optimization in retraining (FalVolt). FalVolt optimizes the threshold voltage for each layer in retraining to achieve the classification accuracy close to the baseline in the presence of faults. To demonstrate the effectiveness of our proposed mitigation, we classify both static (i.e., MNIST) and neuromorphic datasets (i.e., N-MNIST and DVS Gesture) on a 256x256 systolicSNN with stuck-at faults. We empirically show that the classification accuracy of a systolicSNN drops significantly even at extremely low fault rates (as low as 0.012%). Our proposed FalVolt mitigation method improves the performance of systolicSNNs by enabling them to operate at fault rates of up to 60%, with a negligible drop in classification accuracy (as low as 0.1%). Our results show that FalVolt is 2x faster compared to other state-of-the-art techniques common in artificial neural networks (ANNs), such as fault-aware pruning and retraining without threshold voltage optimization. |
11:03 CET | ST2.2 | AUTOMATED AND AGILE DESIGN OF LAYOUT HOTSPOT DETECTOR VIA NEURAL ARCHITECTURE SEARCH Speaker: Zihao Chen, Fudan University, CN Authors: Zihao Chen1, Fan Yang1, Li Shang2 and Xuan Zeng1 1Fudan University, CN; 2Fudan University, CN Abstract This paper presents a neural architecture search scheme for chip layout hotspot detection. In this work, hotspot detectors, in the form of neural networks, are modeled as weighted directed acyclic graphs. A variational autoencoder maps the discrete graph topological space into a continuous embedding space. Bayesian Optimization performs neural architecture search in this embedding space, where an architecture performance predictor is employed to accelerate the search process. Experimental studies on ICCAD 2012 and ICCAD 2019 Contest benchmarks demonstrate that the proposed scheme significantly improves the agility of previous neural architecture search schemes, and generates hotspot detectors with competitive detection accuracy, false alarm rate, and inference time. |
11:06 CET | ST2.3 | UPHEAVING SELF-HEATING EFFECTS FROM TRANSISTOR TO CIRCUIT LEVEL USING CONVENTIONAL EDA TOOL FLOWS Speaker: Florian Klemme, University of Stuttgart, DE Authors: Florian Klemme1, Sami Salamin2 and Hussam Amrouch3 1University of Stuttgart, DE; 2Hyperstone, DE; 3TU Munich, DE Abstract In this work, we are the first to demonstrate how well-established EDA tool flows can be employed to upheave Self-Heating Effects (SHE) from individual devices at the transistor level all the way up to complete large circuits at the final layout (i.e., GDS-II) level. Transistor SHE imposes an ever-growing reliability challenge due to the continuous shrinking of geometries alongside the non-ideal voltage scaling in advanced technology nodes. The challenge is largely exacerbated when more confined 3D structures are adopted to build transistors such as upcoming Nanosheet FETs and Ribbon FETs. By employing increasingly-confined structures and materials of poorer thermal conductance, heat arising within the transistor's channel is trapped inside and cannot escape. This leads to accelerated defect generation and, if not considered carefully, a profound risk to IC reliability. Due to the lack of EDA tool flows that can consider SHE, circuit designers are forced to take pessimistic worst-case assumptions (obtained at the transistor level) to ensure reliability of the complete chip for the entire projected lifetime - at the cost of sub-optimal circuit designs and considerable efficiency losses. Our work paves the way for designers to estimate less pessimistic (i.e., small yet sufficient) safety margins for their circuits leading to higher efficiency without compromising reliability. Further, it provides new perspectives and opens new doors to estimate and optimize reliability correctly in the presence of emerging SHE challenge through identifying early the weak spots and failure sources across the design. |
11:09 CET | ST2.4 | BUILT-IN SELF-TEST AND BUILT-IN SELF-REPAIR STRATEGIES WITHOUT GOLDEN SIGNATURE FOR COMPUTING IN MEMORY Speaker: Yu-Chih Tsai, Author, TW Authors: Yu-Chih Tsai, Wen-chien Ting, Chia-Chun Wang, Chia-Cheng Chang and Ren-Shuo Liu, National Tsing Hua University, TW Abstract This paper proposes built-in self-test (BIST) and built-in self-repair (BISR) strategies for computing in memory (CIM), including a novel testing method and two repair schemes which are CIM output range adjusting and CIM bitline reordering. They all focus on mitigating the impacts of inherent and inevitable CIM inaccuracy on convolution neural networks (CNNs). Regarding the proposed BIST strategy, it exploits the distributive law to achieve at-speed CIM tests without storing testing vectors or golden results. Besides, it can assess the severity of the inherent inaccuracies among CIM bitlines instead of only offering a pass/fail outcome. In addition to BIST, we propose two BISR strategies. First, we propose to slightly offset the dynamic range of CIM outputs toward the negative side to create a margin for negative noises. By not cutting CIM outputs off at zero, negative noises are preserved to cancel out positive noises statistically, and accuracy impacts are mitigated. Second, we propose to remap the bitlines of CIM according to our BIST outcomes. Briefly speaking, we propose to map the least noisy bitlines to be the MSBs. This remapping can be done in the digital domain without touching the CIM internals. Experiments show that our proposed BIST and BISR strategies can restore CIM to less than 1% Top-1 accuracy loss with slight hardware overhead. |
11:12 CET | ST2.5 | SECURITY-AWARE APPROXIMATE SPIKING NEURAL NETWORK Speaker: Ayesha Siddique, University of Missouri-Columbia, US Authors: Syed Tihaam Ahmad, Ayesha Siddique and Khaza Anuarul Hoque, University of Missouri, US Abstract Deep Neural Networks (DNNs) and Spiking Neural Networks (SNNs) are both known for their susceptibility to adversarial attacks. Therefore, researchers in the recent past have extensively studied the robustness and defense of DNNs and SNNs under adversarial attacks. Compared to accurate SNNs (AccSNN), approximate SNNs (AxSNNs) are known to be up to 4X more energy-efficient for ultra-low power applications. Unfortunately, the robustness of AxSNNs under adversarial attacks is yet unexplored. In this paper, we first extensively analyze the robustness of AxSNNs under different structural parameters and approximation levels against two gradient-based and two neuromorphic attacks. Our study revealed that AxSNNs are more prone to adversarial attacks than AccSNNs. Then we propose a novel design approach for designing robust AxSNNs using two novel defense methods: precision scaling and approximation- and quantization-aware filtering (AQF). The effectiveness of these two defense methods was evaluated using one static and one neuromorphic dataset. Our results demonstrate that precision scaling and AQF can significantly improve the robustness of AxSNNs. For instance, a PGD attack on AxSNN results in 72% accuracy loss, whereas the same attack on the precision-scaled AxSNN leads to only 17% accuracy loss in the static MNIST dataset (4X robustness improvement). Similarly, for the neuromorphic DVS128 Gesture dataset, we observe that Sparse Attack on AxSNN leads to 77% accuracy loss compared to AccSNN without any attack. However, with AQF, the accuracy loss is only 2% (38X robustness improvement). |
11:15 CET | ST2.6 | BAFFI: A BIT-ACCURATE FAULT INJECTOR FOR IMPROVED DEPENDABILITY ASSESSMENT OF FPGA PROTOTYPES Speaker: Ilya Tuzov, Universitat Politecnica de Valencia, ES Authors: Ilya Tuzov1, David de Andres2, Juan-Carlos Ruiz2 and Carles Hernandez2 1Universidad Politécnica de Valencia, ES; 2Universidad Politécnica de Valencia, ES Abstract FPGA-based fault injection (FFI) is an indispensable technique for verification and dependability assessment of FPGA designs and prototypes. Existing FFI tools make use of Xilinx essential bits technology to locate the relevant fault targets in FPGA configuration memory (CM). Most FFI tools treat essential bits as a black box, while few of them are able to filter essential bits on an area basis in order to selectively target design components contained within predefined Pblocks. This approach, however, remains insufficiently precise since the granularity of Pblocks in practice does not reach the smallest design components. This paper proposes an open-source FFI tool that enables much more fine-grained FFI experiments for Xilinx 7-series and Ultrascale+ FPGAs. By mapping the essential bits onto the hierarchical netlist, it allows precise targeting of any component in the design tree, down to an individual LUT or register, without the need for defining Pblocks (floorplanning). With minimal experimental effort it estimates the contribution of each DUT component to the resulting dependability features, and discovers weak points of the DUT. Through case studies we show how the proposed tool can be applied to different kinds of DUTs: from small-footprint microcontrollers up to a multicore RISC-V SoC. The correctness of FFI results is validated by means of RT-level and gate-level simulation-based fault injection. |
11:18 CET | ST2.7 | A NOVEL FAULT-TOLERANT ARCHITECTURE FOR TILED MATRIX MULTIPLICATION Speaker: Sandip Kundu, University of Massachusetts Amherst, US Authors: Sandeep Bal1, Chandra Sekhar Mummidi1, Victor da Cruz Ferreira2, Sudarshan Srinivasan3 and Sandip Kundu4 1University of Massachusetts, Amherst, US; 2Federal University of Rio de Janeiro, BR; 3Intel Labs, IN; 4University of Massachusetts Amherst, US Abstract General matrix multiplication (GEMM) is common to many scientific and machine-learning applications. Convolution, the dominant computation in Convolutional Neural Networks (CNNs), can be formulated as a GEMM problem. Due to its widespread use, a new generation of processors features GEMM acceleration in hardware. Intel recently announced the Advanced Matrix Extensions (AMX®) instruction set for GEMM, which is supported by 1kB AMX registers and a Tile Multiplication unit (TMUL) for multiplying tiles (sub-matrices) in hardware. Silent Data Corruption (SDC) is a well-known problem that occurs when hardware generates corrupt output. Google and Meta recently reported findings of SDC in GEMM in their data centers. Algorithm-Based Fault Tolerance (ABFT) is an efficient mechanism for detecting and correcting errors in GEMM, but classic ABFT solutions are not optimized for hardware acceleration. In this paper, we present a novel ABFT implementation directly on hardware. Though the exact implementation of the Intel TMUL is not known, we propose two different TMUL architectures representing two design points in the area-power-performance spectrum and illustrate how ABFT can be directly incorporated into the TMUL hardware. This approach has two advantages: (i) an error can be concurrently detected at the tile level, which is an improvement over finding such errors only after performing the full matrix multiplication; and (ii) we further demonstrate that performing ABFT at the hardware level has no performance impact and only a small area, latency, and power overhead. |
Extended Abstracts
Time | Label | Presentation Title Authors |
---|---|---|
11:21 CET | ST2.8 | REDUCE: A FRAMEWORK FOR REDUCING THE OVERHEADS OF FAULT-AWARE RETRAINING Speaker: Muhammad Abdullah Hanif, New York University Abu Dhabi, AE Authors: Muhammad Abdullah Hanif and Muhammad Shafique, New York University Abu Dhabi, AE Abstract Fault-aware retraining has emerged as a prominent technique for mitigating permanent faults in Deep Neural Network (DNN) hardware accelerators. However, retraining leads to huge overheads, specifically when used for fine-tuning large DNNs designed for solving complex problems. Moreover, as each fabricated chip can have a distinct fault pattern, fault-aware retraining is required to be performed for each chip individually considering its unique fault map, which further aggravates the problem. To reduce the overall retraining cost, in this work, we introduce the concept of resilience-driven retraining amount selection. To realize this concept, we propose a novel framework, Reduce, that, at first, computes the resilience of the given DNN to faults at different fault rates and with different amounts of retraining. Then, based on the resilience, it computes the amount of retraining required for each chip considering its unique fault map. We demonstrate the effectiveness of our methodology for a systolic array-based DNN accelerator experiencing permanent faults in the computational array. |
11:21 CET | ST2.9 | BITSTREAM-LEVEL INTERCONNECT FAULT CHARACTERIZATION FOR SRAM-BASED FPGAS Speaker: Christian Fibich, University of Applied Sciences Technikum Wien, AT Authors: Christian Fibich1, Martin Horauer1 and Roman Obermaisser2 1University of Applied Sciences Technikum Wien, AT; 2University of Siegen, DE Abstract A significant portion of the configuration memory of modern SRAM-based FPGAs is dedicated to configuring the interconnect. Understanding the effects of interconnect-related Single-Event Upsets (SEUs) on the circuit's behavior is critical for developing accurate reliability prediction and efficient fault mitigation approaches. This work describes an approach to classify the effects of single-bit interconnect faults into well-known fault models, and to characterize the electrical effects of these modeled faults. An experimental fault characterization for two families of Xilinx and Lattice FPGAs shows that different types of single-bit interconnect faults exhibit significantly different criticality. This may serve as a partial explanation for the large discrepancies reported in literature between faults predicted to be critical by state-of-the-art methods ("essential bits") compared to the numbers of actually critical bits determined experimentally and may be used to improve prediction accuracy or reliability-aware routing approaches. |
11:21 CET | ST2.10 | COMPACT TEST PATTERN GENERATION FOR MULTIPLE FAULTS IN DEEP NEURAL NETWORKS Speaker: Dina Moussa, Karlsruhe Institute of Technology (KIT) - CDNC, EG Authors: Dina Moussa1, Michael Hefenbrock2 and Mehdi Tahoori1 1Karlsruhe Institute of Technology, DE; 2RevoAI, DE Abstract Deep neural networks (DNNs) have achieved record-breaking performance in various applications. However, this often comes at significant computational costs. To reduce the energy footprint and increase performance, DNNs are often implemented on specific hardware accelerators, such as Tensor Processing Units (TPU) or emerging Memristive technologies. Unfortunately, the presence of various hardware faults can threaten these accelerators' performance and degrade the inference accuracy. This necessitates the development of efficient testing methodologies to unveil hardware faults in DNN accelerators. In this work, we propose a test pattern generation approach to detect fault patterns in DNNs for a common type of hardware fault, namely, faulty (weight) value representation on the bit level. In contrast to most related works, which reveal faults via output deviations, our test patterns are constructed to reveal faults via misclassification, which is more realistic for black-box testing. The experimental results show that the generated test patterns provide 100% fault coverage for targeted fault patterns. In addition, a high compaction ratio was achieved over different datasets and model architectures (up to 50x), and high fault coverage (up to 99.9%) for unseen fault patterns during the test generation phase. |
11:21 CET | ST2.11 | READ: RELIABILITY-ENHANCED ACCELERATOR DATAFLOW OPTIMIZATION USING CRITICAL INPUT PATTERN REDUCTION Speaker: Zuodong Zhang, Peking University, CN Authors: Zuodong Zhang1, Meng Li2, Yibo Lin3, Runsheng Wang3 and Ru Huang3 1School of Integrated Circuits, Peking University, CN; 2Institute for Artificial Intelligence and School of Integrated Circuits, Peking University, CN; 3Peking University, CN Abstract With the rapid advancements of deep learning in recent years, hardware accelerators are continuously deployed in more and more safety-critical applications such as autonomous driving and robotics. The accelerators are usually fabricated with advanced technology nodes for higher performance and energy efficiency, which makes them more prone to timing errors under process, voltage, temperature, and aging (PVTA) variations. By revisiting the physical sources of timing errors, we show that most of the timing errors in the accelerator are caused by several specific input patterns, defined as critical input patterns. To improve the robustness of the accelerator, in this paper, we propose READ, a reliability-enhanced accelerator dataflow optimization method that can effectively reduce timing errors. READ reduces the critical input patterns by exploring the optimal computing sequence when mapping a trained deep neural network to the accelerator. READ only changes the order of MAC operations in a convolution, and it does not introduce any additional hardware overhead to the computing array. The experimental results on VGG-16 and ResNet-18 demonstrate on average 6.3× timing error reduction and up to 24.25× timing error reduction for certain layers. The results also show that READ enables the accelerator to maintain accuracy over a wide range of PVTA variations, making it a promising approach for robust deep learning design. |
11:21 CET | ST2.12 | ROBUST RESISTIVE OPEN DEFECT IDENTIFICATION USING MACHINE LEARNING WITH EFFICIENT FEATURE SELECTION Speaker: Zahra Paria Najafi-Haghi, University of Stuttgart, DE Authors: Zahra Paria Najafi-Haghi1, Florian Klemme1, Hanieh Jafarzadeh1, Hussam Amrouch2 and Hans-Joachim Wunderlich1 1University of Stuttgart, DE; 2TU Munich, DE Abstract Resistive open defects in FinFET circuits are reliability threats and should be ruled out before deployment. The performance variations due to these defects are similar to the effects of process variations, which are mostly benign. In order not to sacrifice yield for reliability, the effect of defects should be distinguished from that of process variations. It has been shown that machine learning (ML) schemes are able to classify defective circuits with high accuracy based on the maximum frequencies Fmax obtained under multiple supply voltages Vdd ∈ Vop. The paper at hand presents a method to minimize the number of required measurements. Each supply voltage Vdd defines a feature Fmax(Vdd). A feature selection technique is presented, which also uses the already available Fmax measurements. It is shown that ML-based techniques can work efficiently and accurately with this reduced number of Fmax(Vdd) measurements. |
LK2 Special Day Lunchtime Keynote
Add this session to my calendar
Date: Tuesday, 18 April 2023
Time: 13:00 CET - 14:00 CET
Location / Room: Darwin Hall
Session chair:
Marina Zapater, University of Applied Sciences Western Switzerland, CH
Session co-chair:
Marc Duranton, CEA, FR
Time | Label | Presentation Title Authors |
---|---|---|
13:00 CET | LK2.1 | INTERACTING WITH SOCIALLY INTERACTIVE AGENT Presenter: Catherine Pelachaud, CNRS-ISIR, Sorbonne Université, FR Author: Catherine Pelachaud, CNRS-ISIR, Sorbonne Université, FR Abstract Our research work focuses on modeling Socially Interactive Agents (SIAs), i.e. agents capable of interacting socially with human partners, of communicating verbally and non-verbally, of showing emotions, but also of adapting their behaviors to favor the engagement of their partners during the interaction. As a partner in an interaction, a SIA should be able to adapt its multimodal behaviors and conversational strategies to optimize the engagement of its human interlocutors. We have developed models to equip these agents with these communicative and social abilities. In this talk, I will present the work we have been conducting. |
BPA11 Supply chain attacks
Add this session to my calendar
Date: Tuesday, 18 April 2023
Time: 14:00 CET - 16:00 CET
Location / Room: Okapi Room 0.8.3
Session chair:
Francesco Regazzoni, University of Amsterdam and ALaRI - USI, CH
Time | Label | Presentation Title Authors |
---|---|---|
14:00 CET | BPA11.1 | HARDWARE TROJANS IN ENVM NEUROMORPHIC DEVICES Speaker: Mircea R. Stan, University of Virginia, US Authors: Lingxi Wu, Rahul Sreekumar, Rasool Sharifi, Kevin Skadron, Mircea Stan and Ashish Venkat, University of Virginia, US Abstract Fast and energy-efficient execution of a DNN on traditional CPU- and GPU-based architectures is challenging due to excessive data movement and inefficient computation. Emerging non-volatile memory (eNVM)-based accelerators that mimic biological neuron computations in the analog domain have shown significant performance improvements. However, the potential security threats in the supply chain of such systems have been largely understudied. This work describes a hardware supply chain attack against analog eNVM neural accelerators by identifying potential Trojan insertion points and proposes a hardware Trojan design that stealthily leaks model parameters while evading detection. Our evaluation shows that such a hardware Trojan can recover over 90% of the synaptic weights. |
14:25 CET | BPA11.2 | EVOLUTE: EVALUATION OF LOOK-UP-TABLE-BASED FINE-GRAINED IP REDACTION Speaker: Farimah Farahmandi, University of Florida, US Authors: Rui Guo1, Mohammad Rahman1, Hadi Mardani Kamali1, Fahim Rahman1, Farimah Farahmandi1 and Mark Tehranipoor2 1University of Florida, US; 2Intel Charles E. Young Preeminence Endowed Chair Professor in Cybersecurity, Associate Chair for Research and Strategic Initiatives, ECE Department, University of Florida, US Abstract Recent studies on intellectual property (IP) protection techniques demonstrate that engaging embedded reconfigurable components (e.g., eFPGA redaction) would be a promising approach to concealing the functional and structural information of the security-critical design. However, detailed investigation reveals that such techniques suffer from almost prohibitive overhead in terms of area, power, delay, and testability. In this paper, we introduce "EvoLUTe", a distinct and significantly more fine-grained redaction methodology using smaller reconfigurable components (such as look-up-tables (LUTs)). In "EvoLUTe", we examine both eFPGA-based and LUT-based design spaces, demonstrating that a novel cone-based and fine-grained universal function modeling approach using LUTs is capable of providing the same degree of resiliency at much lower area/power/delay and testability costs. |
14:50 CET | BPA11.3 | RTLOCK: IP PROTECTION USING SCAN-AWARE LOGIC LOCKING AT RTL Speaker: Farimah Farahmandi, University of Florida, US Authors: Md Rafid Muttaki1, Shuvagata Saha1, Hadi Mardani Kamali1, Fahim Rahman1, Mark Tehranipoor2 and Farimah Farahmandi1 1University of Florida, US; 2Intel Charles E. Young Preeminence Endowed Chair Professor in Cybersecurity, Associate Chair for Research and Strategic Initiatives, ECE Department, University of Florida, US Abstract Conventional logic locking techniques mainly focus on gate-level netlists to combat IP piracy and IC overproduction. However, this is generally not sufficient for protecting the semantics and behaviors of the design. Further, these techniques are even more questionable when the IC supply chain is at risk of insider threats. This paper proposes RTLock, a robust logic locking framework at the RTL abstraction. RTLock provides a detailed formal analysis of the design specs at the RTL that determines the locking candidate points w.r.t. attack resiliency (SAT/BMC), locking key size, and overhead. RTLock incorporates (partial) DFT infrastructure (scan chain) at the RTL, enabled with a scan locking mechanism. It allows us to push all the necessary security-driven actions to the highest abstraction level, thus making the flow EDA-tool agnostic. Additionally, RTLock demonstrates why RTL-based locking must be coupled with encryption and management protocols (e.g., IEEE P1735) to be effective against insider threats. Our experimental results show that, compared with other techniques, RTLock protects the design against broader threats at low overhead and without compromising testability. |
15:15 CET | BPA11.4 | INTERACTIVE TECHNICAL PRESENTATIONS BY THE AUTHORS Speaker: Authors of the session, DATE, BE Author: Session Chairs, DATE, BE Abstract Participants can freely interact with authors during their interactive technical presentations. |
BPA7 Improving Heterogenous hardware utilization
Add this session to my calendar
Date: Tuesday, 18 April 2023
Time: 14:00 CET - 16:00 CET
Location / Room: Okapi Room 0.8.2
Session chair:
Bastien Deveautour, CPE-INL, FR
Time | Label | Presentation Title Authors |
---|---|---|
14:00 CET | BPA7.1 | DITTY: DIRECTORY-BASED CACHE COHERENCE FOR MULTICORE SAFETY-CRITICAL SYSTEMS Speaker: Zhuanhao Wu, University of Waterloo, CA Authors: Zhuanhao Wu, Marat Bekmyrza, Nachiket Kapre and Hiren Patel, University of Waterloo, CA Abstract Ditty is a predictable directory-based cache coherence mechanism for multicore safety-critical systems that guarantees a worst-case latency (WCL) on data accesses. Prior approaches for predictable cache coherence use a shared snooping bus to interconnect cores. This restricts the number of cores in the multicore to typically four or eight due to scalability concerns. Ditty takes a first step towards a scalable cache coherence mechanism that is predictable and one that can support a larger number of cores. In designing Ditty, we propose a coherence protocol and micro-architecture additions to deliver a WCL bound that is lower than a naive approach. Our WCL analysis reveals that the resulting bounds are comparable to state-of-the-art bus-based predictable coherence approaches. We prototype Ditty in hardware and empirically evaluate it on an FPGA. Our evaluation shows the observed WCL is within the computed WCL bound for both the synthetic and SPLASH-3 benchmarks. We release our implementation to the public domain. |
14:25 CET | BPA7.2 | LIGHT FLASH WRITE FOR EFFICIENT FIRMWARE UPDATE ON ENERGY-HARVESTING IOT DEVICES Speaker: Qingqiang He, The Hong Kong Polytechnic University, HK Authors: Songran Liu1, Mingsong Lv2, Wei Zhang3, Xu Jiang1, Chuancai Gu4, Tao Yang4, Wang Yi5 and Nan Guan6 1Northeastern University, CN; 2The Hong Kong Polytechnic University, HK; 3School of Cyber Science and Technology, Shandong University, CN; 4Huawei Technologies Company, CN; 5Uppsala University, SE; 6City University of Hong Kong, HK Abstract Firmware update is an essential service on Internet-of-Things (IoT) devices to fix vulnerabilities and add new functionalities. Firmware update is energy-consuming since it involves intensive flash erase/write operations. Nowadays, IoT devices are increasingly powered by energy harvesting. As the energy output of the harvesters on IoT devices is typically tiny and unstable, a firmware update will likely experience power failures during its progress and fail to complete. This paper presents an approach to increase the success rate of firmware update on energy-harvesting IoT devices. The main idea is to first conduct a lightweight flash write with reduced erase/write time (and thus less energy consumed) to quickly save the new firmware image to flash memory before a power failure occurs. To ensure a long data retention time, a reinforcement step follows to re-write the new firmware image on the flash with default erase/write configuration when the system is not busy and has free energy. Experiments conducted with different energy scenarios show that our approach can significantly increase the success rate and the efficiency of firmware update on energy-harvesting IoT devices. |
14:50 CET | BPA7.3 | HADAS: HARDWARE-AWARE DYNAMIC NEURAL ARCHITECTURE SEARCH FOR EDGE PERFORMANCE SCALING Speaker: Halima Bouzidi, LAMIH/UMR CNRS, Université Polytechnique Hauts-de-France, FR Authors: Halima Bouzidi1, Mohanad Odema2, Hamza Ouarnoughi3, Mohammad Al Faruque2 and Smail Niar4 1University Polytechnique Hauts-de-France, LAMIH, CNRS, UMR 8201, F-59313 Valenciennes, France, FR; 2University of California, Irvine, US; 3INSA Hauts-de-France, FR; 4INSA Hauts-de-France and CNRS, FR Abstract Dynamic neural networks (DyNNs) have become viable techniques to enable intelligence on resource-constrained edge devices while maintaining computational efficiency. In many cases, the implementation of DyNNs can be sub-optimal due to its underlying backbone architecture being developed at the design stage independent of both: (i) potential support for dynamic computing, e.g. early exiting, and (ii) resource efficiency features of the underlying hardware, e.g., dynamic voltage and frequency scaling (DVFS). Addressing this, we present HADAS, a novel Hardware- Aware Dynamic Neural Architecture Search framework that realizes DyNN architectures whose backbone, early exiting features, and DVFS settings have been jointly optimized to maximize performance and resource efficiency. Our experiments using the CIFAR-100 dataset and a diverse set of edge computing platforms have shown that HADAS can elevate dynamic models' energy efficiency by up to 57% for the same level of accuracy scores. |
15:15 CET | BPA7.4 | INTERACTIVE TECHNICAL PRESENTATIONS BY THE AUTHORS Speaker: Authors of the session, DATE, BE Author: Session Chairs, DATE, BE Abstract Participants can freely interact with authors during their interactive technical presentations. |
FS-Ex Focus Session: Smart Additive Manufacturing: Fabrication and Design (Automation)
Add this session to my calendar
Date: Tuesday, 18 April 2023
Time: 14:00 CET - 15:30 CET
Location / Room: Okapi Room 0.8.1
Flexible electronics is an emerging and fast-growing field with many demanding application domains such as wearables, smart sensors, and the Internet of Things (IoT). Several technologies, processes and paradigms can be used to design and fabricate flexible circuits. Unlike the traditional computing and electronics domain, which is mostly driven by performance characteristics, flexible electronics is mainly associated with low fabrication costs (as such circuits are used even in the consumer market) and low energy consumption (as they may be used in energy-harvested systems). While the main advances in this field have focused on fabrication and process aspects, the design flow, and in particular the design automation flow and the required computing paradigms, have had limited exposure. The purpose of this special session is to bring to the attention of the design automation community some of the key advances in the field of flexible electronics and additive manufacturing, as well as some of the design (automation) aspects, which will hopefully inspire further attention from the design automation community to this fast-growing field.
Time | Label | Presentation Title Authors |
---|---|---|
14:00 CET | FS-Ex.1 | LOWERING THE BARRIER FOR ENTRY FOR FLEXIBLE FOUNDRY TECHNOLOGY Presenter: David Verity, PragmatIC, GB Author: David Verity, PragmatIC, GB Abstract It is important for providers of novel technology to offer compatibility with current EDA tools and flows. We will discuss some of the ways we are lowering the barrier for entry for using our novel technology, allowing partners and customers to take advantage of dedicated tapeouts and rapid prototyping services. We will introduce some of the features of our core technology along with some of the additional services enabling easy adoption of flexible technology. We will also discuss our future plans to enhance our IP offering and more fully integrate with commercial and open-source tool chains. |
14:30 CET | FS-Ex.2 | THE FOUNDRY MODEL FOR THIN-FILM TRANSISTOR TECHNOLOGIES Presenter: Kris Myny, KU Leuven, BE Author: Kris Myny, KU Leuven, BE Abstract The foundry model in the semiconductor industry has proven successful for Si CMOS technologies, whereby fabless design houses focus on design activities, while the manufacturing of the chips is outsourced to foundries. Thin-film transistor (TFT) technologies today do not structurally offer such a foundry model that would allow external design houses or universities to exploit the technology. The main reason is that the current product portfolio of TFTs focuses on displays, whereby the foundry can also support the design needs. However, as the internet-of-things era is growing, with upcoming needs for ubiquitous sensors and actuators, the complexity of the designs increases from single-pixel electronics to full systems comprising analog and digital circuits based on TFTs or even hybrid with Si CMOS. As such, a foundry model for TFT technologies would be beneficial, whereby fabless design houses and universities could focus on the development of novel circuit designs for applications such as microfluidics, the internet-of-things, etc. In this presentation, I will dive into the high potential of the fabless design model for TFTs. I will present our first results based on the fabless model for indium-gallium-zinc-oxide (IGZO) and low-temperature poly-crystalline silicon (LTPS) based technologies, discussing their benefits for various applications. |
15:00 CET | FS-Ex.3 | HIGHLY-BESPOKE ROBUST PRINTED NEUROMORPHIC CIRCUITS Speaker: Mehdi Tahoori, Karlsruhe Institute of Technology, DE Authors: Haibin Zhao1, Brojogopal Sapui1, Michael Hefenbrock2, Zhidong Yang1, Michael Beigl1 and Mehdi Tahoori1 1Karlsruhe Institute of Technology, DE; 2RevoAI GmbH, DE Abstract With the rapid growth of the Internet of Things, smart fast-moving consumer products, and wearable devices, requirements such as flexibility, non-toxicity, and low cost are desperately required. However, these requirements are usually beyond the reach of conventional rigid silicon technologies. In this regard, printed electronics offers a promising alternative. Combined with neuromorphic computing, printed neuromorphic circuits offer not only the aforementioned properties, but also compensate for some of the weaknesses of printed electronics, such as manufacturing variations, low device count, and high latency. Generally, (printed) neuromorphic circuits express their functionality through printed resistor crossbars to emulate matrix multiplication, and nonlinear circuitry to express activation functions. The values of the former are usually learned, while the latter is designed beforehand and considered fixed in training for all tasks. The additive manufacturing feature of printed electronics allows the design of highly-bespoke designs. In the case of printed neuromorphic circuits, the circuit is optimized to a particular dataset. Moreover, we explore an approach to learn not only the values of the crossbar resistances, but also the parameterization of the nonlinear components for a bespoke implementation. While providing additional flexibility of the functionality to be expressed, this will also allow an increased robustness against printing variation. The experiments show that the accuracy and robustness of printed neuromorphic circuits can be improved by 26% and 75% respectively under 10% variation of circuit components. |
LKS3 Later … with the keynote speakers
Add this session to my calendar
Date: Tuesday, 18 April 2023
Time: 14:00 CET - 15:30 CET
Location / Room: Darwin Hall
Session chair:
Marina Zapater, University of Applied Sciences Western Switzerland, CH
Session co-chair:
Marc Duranton, CEA, FR
SA8 Industrial Experiences Brief Papers
Add this session to my calendar
Date: Tuesday, 18 April 2023
Time: 14:00 CET - 16:00 CET
Location / Room: Gorilla Room 1.5.3
Session chair:
Paolo Bernardi, Politecnico di Torino, IT
Time | Label | Presentation Title Authors |
---|---|---|
14:00 CET | SA8.1 | MULTIPHYSICS DESIGN AND SIMULATION METHODOLOGY FOR DENSE WDM SILICON PHOTONICS Speaker: Luca Ramini, Hewlett Packard Enterprise, IT Authors: Jinsung Youn1, Luca Ramini1, Zeqin Lu2, Ahsan Alam2, James Pond2, Marco Fiorentino1 and Raymond Beausoleil1 1Hewlett Packard Enterprise, US; 2Ansys, CA Abstract We present a novel design methodology covering multiphysics simulation workflows for microring-based dense wavelength division multiplexing (DWDM) Silicon Photonics (SiPh) circuits used for high-performance computing systems and data centers. The main workflow is an electronics-photonics co-simulation comprising various optical devices from a SiPh process design kit (PDK), electronic circuits designed with a commercial CMOS foundry's PDK, and channel S-parameter models, such as interposers and packages, generated by using a full-wave electromagnetic (EM) solver. With the co-simulation, electrical and optical as well as electro-optical behaviors can be analyzed at the same time because best-in-class electronics and photonic integrated circuit simulators interact with each other. As a result, not only optical spectrum and eye diagrams but also electrical eye diagrams can be evaluated on the same simulation platform. In addition, the proposed methodology includes a statistical- and thermal-aware photonic circuit simulation workflow to evaluate process and temperature variations as well as estimate the required thermal tuning power as those non-idealities can lead to microring's resonance wavelengths shifting. For this, thermal simulation is conducted with a 3D EM model which is also used for such signal and power integrity analysis as a channel link simulation and IR drop. Also, photonic circuit simulations are performed where a design exploration and optimization of such microring's design parameters as Q-factor, and bias voltages are required to select the most promising designs, for example, to satisfy a specific bit-error rate. With the proposed design methodology having those multiphysics simulation workflows, DWDM SiPh can be fully optimized to have reliable system performance. |
14:25 CET | SA8.2 | TWO-STREAM NEURAL NETWORK FOR POST-LAYOUT WAVEFORM PREDICTION Speaker: Sanghwi Kim, SK Hynix, KR Authors: Sanghwi Kim, Hyejin Shin and Hyunkyu Kim, SK Hynix, KR Abstract The gap between pre- and post-simulation, as well as the considerable layout time, increases the significance of the post-layout waveform prediction in dynamic random access memory (DRAM) design. This study develops a post-layout prediction model using the following two-stream neural network: (1) a multi-layer perceptron neural network to calculate the coupling noise by using the physical properties of global interconnects, and (2) a convolutional neural network to compute the time series trends of the waveforms by referencing adjacent signals. The proposed model trains two types of heterogeneous data such that accuracy of 95.5% is achieved on the 1b DRAM process 16Gb DDR5 composed of hundreds of millions of transistors. The model significantly improves the design completeness by pre-detecting the deterioration in the signal quality via post-layout waveform prediction. Generally, although a few weeks are required to obtain post-layout waveforms after the circuit design process, waveforms can be instantly predicted using our proposed model. |
14:50 CET | SA8.3 | QUANTIZATION-AWARE NEURAL ARCHITECTURE SEARCH WITH HYPERPARAMETER OPTIMIZATION FOR INDUSTRIAL PREDICTIVE MAINTENANCE APPLICATIONS Speaker: Nick van de Waterlaat, NXP Semiconductors, NL Authors: Nick van de Waterlaat, Sebastian Vogel, Hiram Rayo Torres Rodriguez, Willem Sanberg and Gerardo Daalderop, NXP Semiconductors, NL Abstract Optimizing the efficiency of neural networks is crucial for ubiquitous machine learning on the edge. However, it requires specialized expertise to account for the wide variety of applications, edge devices, and deployment scenarios. An attractive approach to mitigate this bottleneck is Neural Architecture Search (NAS), as it allows for optimizing networks for both efficiency and task performance. This work shows that including hyperparameter optimization for training-related parameters alongside NAS enables substantial improvements in efficiency and task performance on a predictive maintenance task. Furthermore, this work extends the combination of NAS and hyperparameter optimization with INT8 quantization since efficiency is of utmost importance for resource-constrained devices in industrial applications. Our combined approach, which we refer to as Quantization-Aware NAS (QA-NAS), allows for further improvements in efficiency on the predictive maintenance task. Consequently, our work shows that QA-NAS is a promising research direction for optimizing neural networks for deployment on resource-constrained edge devices in industrial applications. |
15:16 CET | SA8.4 | INTERACTIVE TECHNICAL PRESENTATIONS BY THE AUTHORS Speaker: Authors of the session, DATE, BE Author: Session Chairs, DATE, BE Abstract Participants can freely interact with authors during their interactive technical presentations. |
STF Student Teams Fair
Add this session to my calendar
Date: Tuesday, 18 April 2023
Time: 14:00 CET - 15:00 CET
Location / Room: Marble Hall
Session chair:
Dirk Stroobandt, Ghent University, BE
This is a Young People Programme event. The Student Teams Fair brings together university student teams participating at international competitions with EDA and microelectronic companies and DATE attendees. Student teams will have the opportunity to present their activities, success stories and challenges, and to get support from companies and DATE researchers for future activities.
Selected Teams:
KITcar e.V. - cognitive autonomous racing, https://www.intl.kit.edu/english/19321.php
Delft Hyperloop, https://www.delfthyperloop.nl/
UGent Racing, https://www.ugentracing.be/
HYPED Hyperloop Edinburgh, https://www.hyp-ed.com/
NeuroTech Leuven, https://www.ntxl.org/
US1 Unplugged session
Add this session to my calendar
Date: Tuesday, 18 April 2023
Time: 14:00 CET - 15:30 CET
Location / Room: Nightingale Room 2.6.1/2
Come join us for stimulating brainstorm discussions in small groups about the future of digital engineering. Our focus will be on the digital twinning paradigm where virtual instances are created of a system as it is operated, maintained, and repaired (e.g., each individual car of a certain model). We investigate how to take advantage of this paradigm in engineering systems and what new system engineering approaches and architectures (hardware/software) and design workflows are needed and become possible.
W06 Can Autonomy be Safe?
Add this session to my calendar
Date: Tuesday, 18 April 2023
Time: 14:00 CET - 18:00 CET
Location / Room: Gorilla Room 1.5.4/5
Organisers:
Selma Saidi, TU Dortmund, DE
Rolf Ernst, TU Braunschweig, DE
Sebastian Steinhorst, TU Munich, DE
ASD Workshop (Tuesday April 18: 14h - 18h00): Can Autonomous Systems Be Safe?
Despite the advancement of machine learning and artificial intelligence, safety still constitutes a main hurdle for supporting high levels of autonomy in domains such as self-driving cars, where thousands of car accidents involving autonomous functionalities are reported every year. There are many more examples where autonomous systems reliability and safety are core requirements, from robotics, trains or UAVs all the way to large systems-of-systems, such as the smart grid. The design of safety-critical and high-reliability systems is governed by strict regulations covering the whole product life cycle, from conception to production to deployment and maintenance. The design process according to safety standards typically assumes a correct and complete system specification. For autonomous systems, it is often impossible to show that the specification is complete, due to the underspecified environment and evolving, and often emerging, behaviour. Verification and test of autonomous systems, as well as monitoring safety goals in operation, are huge system design challenges. The setbacks in ambitious autonomous driving goals raise the question of whether systems autonomy is an appropriate concept for safety-critical systems at all. On the other hand, systems autonomy with advanced capabilities, such as self-protection or self-awareness in decision making, might help to control risk under uncertainty and change, and might become an asset and even an enabler for critical complex systems design. Guaranteeing safety thus emerges as a challenging but central topic in the design of autonomous systems.
This year, the workshop offers a unique opportunity for participants to contribute to the discussion and be part of a community working on the design of autonomous systems.
The workshop will start with introductory talks by experts from academia and industry that will highlight the main challenges for safe systems autonomy and applications. After that, there is an opportunity for a limited number of short pitches (ca. 3 min) in which workshop participants can give a statement on the central question "Can Autonomous Systems Be Safe?"; an abstract of the topic is available below. Statements can be on your research topics, practical issues, limitations, visions, or design ideas and suggestions. The short talks will be arranged in thematic blocks, each followed by a discussion.
The last part will be an open discussion with all presenters of the workshop and the audience. In the end, the results will be summarized in a report that will be made available to the workshop participants.
14h00 - 15h30:
- 14h00: Opening and Welcoming
- 14h15: Collective Reasoning for Safe Autonomous Systems Design, Selma Saidi, Professor of Embedded Systems, TU Dortmund University, Germany
- Abstract: Collaboration in multi-agent autonomous systems (AS) is critical to increase performance while ensuring safety. However, due to differences in, e.g., perception quality, some AS should be considered more trustworthy than others when collaboratively building a common environmental model, especially during disagreement. In this talk, we discuss how to increase the reliability of autonomous systems by relying on collective knowledge. We borrow concepts from social epistemology to exploit individual characteristics of autonomous systems, and define and formalize rules for collective reasoning to collaboratively achieve increased safety, trustworthiness and good decision-making under uncertainty.
- 14h30: Limitation-aware designs – a road towards safer systems in complex environments, Peter Schneider, Safety Expert @Bosch Research, Robert Bosch GmbH
- Abstract: The development of safe autonomous driving systems (ADS) has revealed many interdependent design challenges that often cannot simply be solved one-by-one (or measure-by-measure) but need more holistic solution approaches. Traditionally, automotive safety engineering relies heavily on composing safe systems from well-defined and intrinsically safe components. For systems that operate in constantly changing environments, finding a practical and safe 'one-size-fits-all' solution via static designs and the traditional safety engineering toolbox becomes increasingly hard. Hence, instead of further and further tweaking single components to potentially reach 'safety-grade' reliability at some point (or risking getting lost in the long-tail problem), we propose to set a stronger research focus on safety engineering tools and technologies that support the creation of limitation-aware and adaptive system designs which are able to dynamically handle component limitations without compromising on the system application's safety goals. In order to illustrate some of the aforementioned challenges in a practical example, this talk will discuss a few of the interdisciplinary design challenges in the development of a safe ADS environment sensing system. Furthermore, different possible solution strategies are discussed on how to potentially enhance the system's 'safety-by-design' via limitation modelling, design automation and safety-oriented compensation of limitations through interactions with other systems.
- 14h45: Safety Cases for Autonomous Systems, Richard Hawkins, Senior Research Fellow, Assuring Autonomy International Programme (AAIP), Department of Computer Science, University of York, UK
- Abstract: Demonstrating sufficient safety is challenging for all systems, but is even more so for autonomous systems (AS). Autonomy increases uncertainty in the safe operation of autonomous systems, particularly when operating in complex, dynamic and open environments; the pace of technological change in AS also tends to be greatly increased; in addition there is little established best practice to guide safety assurance activities. In this talk I will discuss how safety cases provide a means to address these uncertainties and provide confidence in the safety of an AS by providing explicit safety arguments supported by evidence. I will discuss guidance we have developed at the University of York on the assurance activities to be undertaken and the evidence required to be generated to create a compelling safety case for an AS.
- 15h00: First Round of Statements
- Opening Statement: Digital Twins Enabling Safe Autonomy, Unmesh Bordoloi, Siemens Mentor
- 15h30: Coffee Break
- 16h00: Second Round of Statements
- 16h30: Panel Discussion
- 17h30: Summary of the Workshop and Closing
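As an editor's illustration of the collective-reasoning idea in the opening talk above (not the speaker's actual formalism), the following Python sketch aggregates conflicting object classifications from several autonomous agents by weighting each vote with a trust score; the agent names, trust values and labels are hypothetical.

```python
from collections import defaultdict

def weighted_consensus(observations):
    """Aggregate per-agent labels into one decision, weighting by trust.

    observations: list of (agent_id, label, trust) tuples, trust in [0, 1].
    Returns the label with the highest accumulated trust.
    """
    scores = defaultdict(float)
    for agent_id, label, trust in observations:
        scores[label] += trust
    return max(scores, key=scores.get)

# Hypothetical example: three vehicles disagree about an object ahead.
obs = [("car_A", "pedestrian", 0.9),   # high-quality perception stack
       ("car_B", "cyclist",    0.4),
       ("car_C", "pedestrian", 0.6)]
print(weighted_consensus(obs))  # -> "pedestrian"
```

The point of the sketch is only that disagreement can be resolved by weighting contributions rather than by simple majority; real formalizations would also model uncertainty and how trust itself is updated.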
UF University Fair presentations
Add this session to my calendar
Date: Tuesday, 18 April 2023
Time: 15:30 CET - 16:30 CET
Location / Room: Marble Hall
Session chair:
Nima TaheriNejad, Heidelberg University, DE
At the University Fair, academics from across Europe showcase their top-notch pre-commercial research results and prototypes. You get to interact with them, ask questions, get inspired, or join them on their journey beyond their current prototypes.
Time | Label | Presentation Title Authors |
---|---|---|
15:30 CET | UF.1 | HLS-ING UP RISC-V: STREAMLINING DESIGN AND OPTIMIZATION Presenter: Deepak Ravibabu, DFKI, DE Authors: Deepak Ravibabu1, Muhammad Hassan2 and Rolf Drechsler3 1DFKI, DE; 2University of Bremen/Cyber Physical Systems, DFKI, DE; 3University of Bremen | DFKI, DE Abstract Traditional design and verification methods using Register Transfer Level (RTL) languages for Field Programmable Gate Array (FPGA) design result in high development costs and long time-to-market. To overcome these disadvantages, High-Level Synthesis (HLS) is used, which employs high-level languages such as SystemC and C++ for FPGA design. In comparison to traditional RTL languages like VHDL and Verilog, the object-oriented nature of C++ substantially enhances code understandability. Additionally, it reduces design and verification effort. In this work, we design a 32-bit synthesizable processor core that implements the RISC-V Instruction Set Architecture (ISA) using an HLS tool. We implement the processor core in SystemC and use Xilinx Vivado as the HLS tool. The main challenge is working with the tool-specific parameters that must be used in the code to synthesize the required hardware components on the FPGA. Our experiments on the Basys 3 Artix-7 FPGA trainer board demonstrate the enormous potential for saving design time without sacrificing performance and cost when employing HLS for RISC-V. |
UF.2 | WAL: A LANGUAGE FOR AUTOMATED AND PROGRAMMABLE ANALYSIS OF WAVEFORMS Presenter: Lucas Klemmer, Johannes Kepler University Linz, AT Authors: Lucas Klemmer and Daniel Grosse, Johannes Kepler University Linz, AT Abstract Waveforms are generated by EDA tools at virtually every step of the design process. However, waveform viewing is still a highly manual and tedious process, and unfortunately, there has been no progress in automating the analysis of waveforms. Therefore, we present the open-source Waveform Analysis Language (WAL). WAL is a Domain Specific Language (DSL) for hardware analysis. With WAL, analysis problems can be written in a natural and generic style, which we demonstrate in several case studies. These case studies include the performance analysis of several open-source and proprietary RISC-V processors. |
UF.3 | MACHINE LEARNING-BASED PERFORMANCE ANALYTICS IN COMPUTER SYSTEMS Presenter: Efe Sencan, Boston University, US Authors: Efe Sencan, Burak Aksar and Ayse Coskun, Boston University, US Abstract As data centers that serve many essential societal applications grow larger and become more complex, they suffer more from performance variations due to software bugs, shared resource contention (memory, network, CPU, etc.), and hardware-related problems. These variations have become more prominent due to limitations on power budget as we move towards exascale systems. Unpredictable performance degrades the energy and power efficiency of computer systems, resulting in lower quality-of-service for users, power waste, and higher operational costs. Machine learning (ML) has been gaining popularity as a promising method to detect and diagnose anomalies in computer systems. However, many of the proposed ML solutions are not publicly available to the research community. This demo aims to demonstrate how our ML-based performance anomaly detection and diagnosis frameworks operate and how they can be integrated into a web application for wider dissemination and easier use by the community. |
UF.4 | POWER CONVERTER LARGE SIGNAL SIMULATION BASED ON MACHINE LEARNING - NEURAL NETWORK MODELS TARGETING ENERGY HARVESTING APPLICATIONS Presenter: Christos Sad, Department of Physics, Aristotle University of Thessaloniki, GR Authors: Christos Sad1, Vasso Gogolou2, Thomas Noulis1, Kostas Siozios2 and Stylianos Siskos2 1Department of Physics, Aristotle University of Thessaloniki, GR; 2Department of Physics, Aristotle University of Thessaloniki, GR Abstract A Machine Learning (ML) approach for simulating the behaviour of DC-DC power converter topologies is proposed. A great variety of industrial applications use DC-DC power converter topologies; uninterruptible power supplies (UPS), electric and hybrid vehicles, and medium-voltage DC (MVDC) and high-voltage DC (HVDC) power systems are some characteristic use cases. To this end, an ML model is proposed to simulate the dynamic nonlinear behaviour of the DC-DC power converter, aiming at design-cycle speed-up by minimizing simulation iterations while providing accurate simulation results. |
UF.5 | SEAMLESS ENERGY-AWARE WORKLOAD OPTIMIZATION FOR THE HETEROGENEOUS EDGE-CLOUD CONTINUUM Presenter: Aggelos Ferikoglou, Aristotle University of Thessaloniki, GR Authors: Aggelos Ferikoglou1, Argyris Kokkinis1, Dimitrios Danopoulos2, Dimosthenis Masouros3 and Kostas Siozios4 1Aristotle University of Thessaloniki, GR; 2National Technical University of Athens, GR; 3National TU Athens, GR; 4Department of Physics, Aristotle University of Thessaloniki, GR Abstract Meeting the performance objectives and requirements of state-of-the-art edge-cloud infrastructures and users is crucial nowadays. Efficient resource management in scenarios with increased computational demand, especially in modern applications, is not a trivial task. Cloud providers often employ hardware accelerators to handle the high computational requests, but the diversity of requirements makes efficient and secure deployment a major challenge to overcome. This work presents a novel framework for deploying highly demanding, dynamic and security-critical applications for a variety of domains. Workloads are processed in a holistic and automated manner, overcoming the existing platform barriers stemming from the heterogeneity of computing units. The application development and deployment within this framework focuses on methodologies for automatic GPU and FPGA acceleration as well as efficient, isolated, and secure deployments in the edge-cloud and HPC computing continuum. |
UF.6 | A LOW-COST IOT SYSTEM FOR INDOOR POSITIONING TARGETING ASSISTIVE ENVIRONMENTS Presenter: Vasileios Serasidis, Aristotle University of Thessaloniki, GR Authors: Vasileios Serasidis1, Ioannis Sofianidis1, Argyris Kokkinis1, Vasileios Konstantakos1 and Kostas Siozios2 1Aristotle University of Thessaloniki, GR; 2Department of Physics, Aristotle University of Thessaloniki, GR Abstract The elderly population is increasing, imposing, among other things, a continuous demand for customized health-care solutions that rely on ambient assisted living (AAL) technologies. The majority of these systems are triggered by people's movement and/or their location within homes; thus, efficient technologies that enable accurate indoor positioning are of utmost importance. Existing solutions for this purpose mainly rely on fingerprinting-based and proximity technologies, such as BLE and WiFi beacons. These solutions support indoor positioning at room-scale granularity, e.g., to activate a device when somebody enters or leaves a room. However, their limited estimation accuracy cannot support more advanced services such as the positioning or navigation of elderly people within a room. To overcome this drawback, algorithms that improve accuracy were also explored. (An illustrative sketch of RSSI-based positioning follows this table.) |
UF.7 | NEW HARDWARE TROJAN THREATS IN ENVM-BASED NEUROMORPHIC COMPUTING SYSTEMS Presenter: Lingxi Wu, University of Virginia, US Authors: Lingxi Wu, Rahul Sreekumar, Rasool Sharifi, Kevin Skadron, Stan Mircea and Ashish Venkat, University of Virginia, US Abstract Fast and energy-efficient execution of a DNN on traditional CPU- and GPU-based architectures is challenging due to excessive data movement and inefficient computation. Emerging non-volatile memory (eNVM)-based accelerators that mimic biological neuron computations in the analog domain have shown significant performance improvements. However, the potential security threats in the supply chain of such systems have been largely understudied. This work describes a hardware supply chain attack against analog eNVM neural accelerators by identifying potential Trojan insertion points and proposes a hardware Trojan design that stealthily leaks model parameters while evading detection. Our evaluation shows that such a hardware Trojan can recover over 90% of the synaptic weights, allowing for the reconstruction of the original model. |
UF.8 | DISCUSSION WITH THE AUTHORS Presenter: University Fair Participants, DATE, BE Author: University Fair Participants, DATE, BE Abstract Discussion with the authors |
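To make the BLE-beacon positioning idea in UF.6 above more concrete (an editor's sketch under assumed values, not the authors' implementation), the snippet below converts beacon RSSI readings to rough distances with a log-distance path-loss model and estimates a position by weighted centroid; the beacon coordinates, reference power and path-loss exponent are hypothetical.

```python
def rssi_to_distance(rssi_dbm, tx_power_dbm=-59.0, path_loss_exp=2.0):
    """Log-distance path-loss model: distance in metres from an RSSI sample."""
    return 10 ** ((tx_power_dbm - rssi_dbm) / (10 * path_loss_exp))

def weighted_centroid(beacons):
    """beacons: list of ((x, y), rssi_dbm). Nearer beacons get larger weights."""
    weights = [1.0 / max(rssi_to_distance(rssi), 0.1) for _, rssi in beacons]
    total = sum(weights)
    x = sum(w * pos[0] for (pos, _), w in zip(beacons, weights)) / total
    y = sum(w * pos[1] for (pos, _), w in zip(beacons, weights)) / total
    return x, y

# Hypothetical room with three fixed BLE beacons at known coordinates (metres).
readings = [((0.0, 0.0), -65), ((4.0, 0.0), -72), ((2.0, 3.0), -60)]
print(weighted_centroid(readings))  # rough (x, y) estimate inside the room
```

Room-scale proximity only needs the strongest beacon; finer in-room positioning, as the abstract notes, needs additional algorithms on top of such raw distance estimates.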
WST Workshop for Student Teams
Add this session to my calendar
Date: Tuesday, 18 April 2023
Time: 16:00 CET - 18:00 CET
Location / Room: Okapi Room 0.8.3
Session chair:
Anton Klotz, Cadence Design Systems, DE
Presenter: Graeme Ritchie, Cadence Design Systems
Abstract: This is a Young People Programme event: an introduction to the Microwave Office (MWO) design platform. Topics covered include a general overview of MWO, how to set up a new process technology for use in MWO, EM extraction and simulation, synthesis capabilities in MWO, and thermal simulation using Celsius from within MWO.
FS5 Focus session: Cross Layer Design for the Predictive Assessment of Technology-Enabled Architectures
Add this session to my calendar
Date: Tuesday, 18 April 2023
Time: 16:30 CET - 18:00 CET
Location / Room: Gorilla Room 1.5.3
Session chair:
Michael Niemier, University of Notre Dame, US
Time | Label | Presentation Title Authors |
---|---|---|
16:30 CET | FS5.1 | "END TO END" FEW SHOT LEARNING WITH MEMRISTIVE CROSSBAR ARRAYS AND MEMORY AUGMENTED NEURAL NETWORKS Presenter: Can Li, University of Hong Kong, HK Author: Can Li, University of Hong Kong, HK Abstract This presentation considers a "case study" where technology-enabled architectures based on RRAM crossbar arrays are evaluated in the context of few-shot learning models/memory augmented neural networks (MANNs). RRAM-based crossbar arrays are employed to universally realize matrix-vector multiplication for the CNNs, hashing, and AM functions that form the computational workload of a MANN. This case study will illustrate: (1) how device models may be employed; (2) the need to assess different aspects of the design stack -- e.g., as degradations from iso-accuracy may stem not just from device variation, but from architectural-level design constraints as well; and (3) the need for comprehensive benchmarking when different algorithmic models (e.g., MLP versus CNN versus HDC) as well as heterogeneous architectural solutions for said models (e.g., TPU-GPU hybrids) that may ultimately represent the ideal baseline/software-based solution are employed. This talk serves not only to illustrate what analysis is needed to determine whether industrial investment in a technology-enabled architecture is justifiable, but also as motivation for both (a) architectural modeling efforts and (b) analytical modeling tools to triage a large design space and identify the most meaningful/plausible points of comparison for a technology-driven architectural solution. (An illustrative sketch of variation-aware matrix-vector multiplication follows this table.) |
16:53 CET | FS5.2 | EXPLORING ACCELERATOR-CENTRIC EDGE ARCHITECTURES: FROM OPEN PLATFORMS TO SYSTEM SIMULATION Presenter: David Atienza, EPFL, CH Author: David Atienza, EPFL, CH Abstract Two complementary approaches have emerged as main avenues for evaluating novel technology-enabled accelerators for edge computing. The first one is based on their integration in systems comprising validated open hardware components (processors, memories, and peripherals) to derive prototype systems-on-chip. The second is to model the accelerator characteristics as a module in an entire system simulator infrastructure. In this talk, I will cover the pros and cons of these two approaches, focusing on recent works at the Embedded Systems Laboratory (ESL) of EPFL on the X-HEEP open hardware architectural template and the gem5-X system simulator. I will describe how we employed these frameworks to explore emerging computation (e.g., coarse-grained reconfiguration) and communication (e.g., in-package wireless) paradigms and report on their respective energy and performance benefits. |
17:15 CET | FS5.3 | ANALYTICAL MODELING TOOLS FOR RAPID AND ACCURATE DESIGN SPACE EXPLORATIONS OF TECHNOLOGY DRIVEN ARCHITECTURES Presenter: X. Sharon Hu, University of Notre Dame, US Author: X. Sharon Hu, University of Notre Dame, US Abstract This talk considers the design, validation, and use of analytical modeling tools to support cross-layer design exploration efforts, i.e., to evaluate the impact of technology-driven architectures at scales offering meaningful benefits on applications that are of relevance to the industry. Work will be framed in the context of tools used to evaluate in-memory computing (IMC) architectures, where the development of modeling and prediction tools that are indispensable for benchmarking IMC solutions in a cross-layer fashion will be discussed. The talk will also highlight unique challenges and needs when considering both the design of IMC circuits and architectures, as well as the infrastructure needed to rapidly and accurately evaluate them. |
17:38 CET | FS5.4 | NEURO-VECTOR-SYMBOLIC ARCHITECTURES: AN EFFICIENT ENGINE FOR PERCEPTION, REASONING, AND COMPUTATIONALLY-HARD PROBLEMS Presenter: Abbas Rahimi, IBM Research, CH Author: Abbas Rahimi, IBM Research, CH Abstract Neither deep neural nets nor symbolic AI alone has approached the kind of intelligence expressed in humans. This is mainly because neural nets are not able to decompose joint representations to obtain distinct objects (the so-called binding problem), while symbolic AI suffers from exhaustive rule searches, among other problems. These two problems are still pronounced in neuro-symbolic AI, which aims to combine the best of the two paradigms. The two problems can be addressed with our proposed neuro-vector-symbolic architecture (NVSA). In this talk, we show how the realization of NVSA can be informed and benefitted by the physical properties of in-memory computing (IMC) hardware. Particularly, we demonstrate how NVSA exploits O(1) MVM, in-situ progressive crystallization, and intrinsic stochasticity of IMC based on phase-change memory devices to enable on-device few-shot continual learning and to solve computationally hard problems such as factorization of holographic perceptual representations and visual abstract reasoning. |
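As an editor's illustration of the device-aware evaluation discussed in FS5.1 above (not the speaker's model), the sketch below performs a matrix-vector multiplication in which each crossbar conductance is perturbed by Gaussian variation, so the deviation from an ideal digital baseline can be measured; the weight matrix, input vector and variation level are hypothetical.

```python
import numpy as np

def crossbar_mvm(weights, x, sigma=0.05, rng=None):
    """Ideal vs. variation-affected matrix-vector product on a crossbar.

    weights : (rows, cols) array mapped to device conductances.
    sigma   : relative std-dev of per-device conductance variation.
    """
    rng = np.random.default_rng(rng)
    noisy = weights * (1.0 + sigma * rng.standard_normal(weights.shape))
    return weights @ x, noisy @ x   # (ideal result, result with device variation)

# Hypothetical layer: 4 outputs, 8 inputs.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))
x = rng.standard_normal(8)
ideal, noisy = crossbar_mvm(W, x, sigma=0.05, rng=1)
print(np.abs(ideal - noisy))  # per-output error caused by device variation
```

Sweeping `sigma` and re-running the end application is the simplest way to see whether accuracy loss comes from the devices themselves or from architectural constraints, which is the kind of cross-layer question the session addresses.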
FS6 Focus session: New perspectives for neuromorphic cameras: algorithms, architectures and circuits for event-based CMOS sensors
Add this session to my calendar
Date: Tuesday, 18 April 2023
Time: 16:30 CET - 18:00 CET
Location / Room: Okapi Room 0.8.1
Session chair:
Pascal VIVET, CEA-List, FR
Session co-chair:
Christoph Posch, PROPHESEE, FR
Time | Label | Presentation Title Authors |
---|---|---|
16:30 CET | FS6.1 | THE CNN VS. SNN EVENT-CAMERA DICHOTOMY AND PERSPECTIVES FOR EVENT-GRAPH NEURAL NETWORKS Speaker: Thomas DALGATY, CEA-LIST, FR Authors: Thomas DALGATY1, Thomas Mesquida2, Damien JOUBERT3, Amos SIRONI3, Pascal Vivet4 and Christoph POSCH3 1CEA-List, FR; 2Université Grenoble Alpes, CEA, LETI, MINATEC Campus, FR; 3Prophesee, FR; 4CEA-Leti, FR Abstract Since neuromorphic event-based pixels and cameras were first proposed, the technology has greatly advanced, such that there now exist several industrial sensors, processors and toolchains. This has also paved the way for a blossoming new branch of AI dedicated to processing the event-based data these sensors generate. However, there is still much debate about which of these approaches can best harness the inherent sparsity, low latency and fine spatiotemporal structure of event data to obtain better performance, and do so using the least time and energy. The latter is of particular importance since these algorithms will typically be employed near or inside the sensor at the edge, where the power supply may be heavily constrained. The two predominant methods to process visual events - convolutional and spiking neural networks - are fundamentally opposed in principle. The former converts events into static 2D frames such that they are compatible with 2D convolutions, while the latter computes in an event-driven fashion naturally compatible with the raw data. We review this dichotomy by studying recent algorithmic and hardware advances of both approaches. We conclude with a perspective on an emerging alternative approach whereby events are transformed into a graph data structure and thereafter processed using techniques from the domain of graph neural networks. Despite promising early results, algorithmic and hardware innovations are required before this approach can be applied close to or within the event-based sensor. |
16:53 CET | FS6.2 | BRAIN-INSPIRED SPATIOTEMPORAL PROCESSING ALGORITHMS FOR EFFICIENT EVENT-BASED PERCEPTION Speaker: Saibal Mukhopadhyay, Georgia Tech, US Authors: Biswadeep Chakraborty, Uday Kamal, Xueyuan She, Saurabh Dash and Saibal Mukhopadhyay, Georgia Tech, US Abstract Neuromorphic event-based cameras can unlock the true potential of bio-plausible sensing systems that mimic our human perception. However, efficient spatiotemporal processing algorithms must enable their low-power, low-latency, real-world application. In this talk, we highlight our recent efforts in this direction. Specifically, we talk about how brain-inspired algorithms such as spiking neural networks (SNNs) can approximate spatiotemporal sequences efficiently without requiring complex recurrent structures. Next, we discuss their event-driven formulation for training and inference that can achieve real-time throughput on existing commercial hardware. We also show how a brain-inspired recurrent SNN can be modeled to perform on event-camera data. Finally, we will talk about the potential application of associative memory structures to efficiently build representation for event-based perception. |
17:15 CET | FS6.3 | LOW-THROUGHPUT EVENT-BASED IMAGE SENSORS AND PROCESSING Speaker: Laurent FESQUET, Grenoble INP / TIMA, FR Authors: Laurent Fesquet1, Rosalie TRAN2, Xavier LESAGE2, Mohamed Akrarai3 and Gilles Sicard4 1TIMA - Grenoble Institute of Technology, FR; 2University Grenoble Alpes, FR; 3University of Grenoble (UGA), FR; 4CEA-Leti, FR Abstract This paper presents new kinds of image sensors based on TFS (Time to First Spike) pixels and DVS (Dynamic Vision Sensor) pixels, which take advantage of non-uniform sampling and redundancy suppression to reduce the data throughput. The DVS pixels only detect a luminance variation, while TFS pixels quantify luminance by measuring the time required to cross a threshold. Such image sensors output requests through an Address Event Representation (AER), which helps to reduce the throughput. The resulting event bitstream is composed of time, position, polarity, and magnitude information. Such a bitstream offers new possibilities for image processing, such as event-by-event object tracking. In particular, we propose processing steps to cluster events, filter noise and extract other useful features, such as velocity estimation. (An illustrative sketch of event-by-event noise filtering follows this table.) |
17:38 CET | FS6.4 | HARDWARE ARCHITECTURES FOR PROCESSING AND LEARNING WITH EVENT-BASED DATA Presenter: Charlotte Frenkel, TU Delft, NL Author: Charlotte Frenkel, TU Delft, NL Abstract By encoding visual information as a temporally and spatially sparse event stream that preserves microsecond-scale dynamics, neuromorphic cameras are a key enabler for low-power low-latency vision applications. However, as event-driven computation implies less structure in memory access patterns, it is still an open challenge to design hardware architectures that can efficiently exploit the event-based nature of neuromorphic cameras. I will survey emerging hardware-algorithm co-design techniques for processing and learning with event-based data, highlighting the current solutions and the next steps toward adaptive neuromorphic smart sensors. |
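As an editor's illustration of event-by-event processing of an AER stream, mentioned in FS6.3 above (not the authors' pipeline), the sketch below represents events as (timestamp, x, y, polarity) tuples and drops events that have no recent spatial neighbour, a common background-activity noise filter; the sensor size, time window and sample events are hypothetical.

```python
def filter_isolated_events(events, window_us=10_000, width=32, height=32):
    """Keep an event only if a neighbouring pixel fired within `window_us`.

    events: iterable of (t_us, x, y, polarity) tuples, assumed sorted by time.
    """
    last_seen = [[-10**12] * width for _ in range(height)]  # last event time per pixel
    kept = []
    for t, x, y, pol in events:
        neighbours = [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                      if (dx, dy) != (0, 0)]
        if any(0 <= nx < width and 0 <= ny < height and
               t - last_seen[ny][nx] <= window_us for nx, ny in neighbours):
            kept.append((t, x, y, pol))
        last_seen[y][x] = t
    return kept

# Hypothetical stream: two spatially correlated events and one isolated event.
stream = [(1000, 5, 5, 1), (1500, 6, 5, 1), (2000, 20, 20, 0)]
print(filter_isolated_events(stream))  # only the event at (6, 5) survives
```

Because each event is touched once and only a small neighbourhood is inspected, such filters fit the low-throughput, event-driven processing that the session advocates.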
SE4 Hardware accelerators serving efficient machine learning software architectures
Add this session to my calendar
Date: Tuesday, 18 April 2023
Time: 16:30 CET - 18:00 CET
Location / Room: Okapi Room 0.8.2
Session chair:
Smail Niar, INSA Hauts-de-France and CNRS, FR
16:30 CET until 16:54 CET: Pitches of regular papers
16:54 CET until 18:00 CET: Interactive technical presentations by the authors of regular papers and extended abstracts
Regular Papers
Time | Label | Presentation Title Authors |
---|---|---|
16:30 CET | SE4.1 | PIPE-BD: PIPELINED PARALLEL BLOCKWISE DISTILLATION Speaker: Hongsun Jang, Seoul National University, KR Authors: Hongsun Jang1, Jaewon Jung2, Jaeyong Song3, Joonsang Yu4, Youngsok Kim3 and Jinho Lee1 1Seoul National University, KR; 2Yonsei University, KR; 3Yonsei University, KR; 4NAVER CLOVA, KR Abstract Training large deep neural network models is highly challenging due to their tremendous computational and memory requirements. Blockwise distillation provides one promising method towards faster convergence by splitting a large model into multiple smaller models. In state-of-the-art blockwise distillation methods, training is performed block-by-block in a data-parallel manner using multiple GPUs. To produce inputs for the student blocks, the teacher model is executed from the beginning until the current block under training. However, this results in a high overhead of redundant teacher execution, low GPU utilization, and extra data loading. To address these problems, we propose Pipe-BD, a novel parallelization method for blockwise distillation. Pipe-BD aggressively utilizes pipeline parallelism for blockwise distillation, eliminating redundant teacher block execution and increasing per-device batch size for better resource utilization. We also extend Pipe-BD to hybrid parallelism for efficient workload balancing. As a result, Pipe-BD achieves significant acceleration without modifying the mathematical formulation of blockwise distillation. We implement Pipe-BD on PyTorch, and experiments reveal that Pipe-BD is effective on multiple scenarios, models, and datasets. |
16:33 CET | SE4.2 | LAYER-PUZZLE: ALLOCATING AND SCHEDULING MULTI-TASK ON MULTI-CORE NPUS BY USING LAYER HETEROGENEITY Speaker: Chengsi Gao, Chinese Academy of Sciences, CN Authors: Chengsi Gao, Ying Wang, Cheng Liu, Mengdi Wang, Weiwei Chen, Yinhe Han and Lei Zhang, Chinese Academy of Sciences, CN Abstract In this work, we propose Layer-Puzzle, a multi-task allocation and scheduling framework for multi-core NPUs. Based on the proposed latency-prediction model and dynamic parallelization scheme, Layer-Puzzle can generate near-optimal results for each layer under given hardware resources and traffic congestion levels. As an online scheduler, Layer-Puzzle performs a QoS-aware and dynamic scheduling method that picks the superior version from the previously compiled results and co-runs the selected tasks to improve system performance. Our experiments on MLPerf show that Layer-Puzzle can achieve up to 1.61X, 1.53X, and 1.95X improvement in ANTT, STP, and PE utilization, respectively. |
16:36 CET | SE4.3 | DYNAMIC TASK REMAPPING FOR RELIABLE CNN TRAINING ON RERAM CROSSBARS Speaker: Chung-Hsuan Tung, Duke University, TW Authors: Chung-Hsuan Tung1, Biresh Kumar Joardar2, Partha Pratim Pande3, Jana Doppa3, Hai (Helen) Li1 and Krishnendu Chakrabarty1 1Duke University, US; 2University of Houston, US; 3Washington State University, US Abstract A ReRAM crossbar-based computing system (RCS) can accelerate CNN training. However, hardware faults due to manufacturing defects and limited endurance impede the widespread adoption of RCS. We propose a dynamic task remapping-based technique for reliable CNN training on faulty RCS. Experimental results demonstrate that the proposed low-overhead method incurs only 0.85% accuracy loss on average while training popular CNNs such as VGGs, ResNets, and SqueezeNet with the CIFAR-10, CIFAR-100, and SVHN datasets in the presence of faults. |
16:39 CET | SE4.4 | MOBILE ACCELERATOR EXPLOITING SPARSITY OF MULTI-HEADS, LINES AND BLOCKS IN TRANSFORMERS IN COMPUTER VISION Speaker: Eunji Kwon, Pohang University of Science and Technology, KR Authors: Eunji Kwon, Haena Song, Jihye Park and Seokhyeong Kang, Pohang University of Science and Technology, KR Abstract It is difficult to employ transformer models for computer vision in mobile devices due to their memory- and computation-intensive properties. Accordingly, there is ongoing research on various methods for compressing transformer models, such as pruning. However, general computing platforms such as central processing units (CPUs) and graphics processing units (GPUs) are not energy-efficient to accelerate the pruned model due to their structured sparsity. This paper proposes a low-power accelerator for transformers in computer vision with various sizes of structured sparsity induced by pruning with different granularity. In this study, we can accelerate a transformer that has been pruned in a head-wise, line-wise, or block-wise manner. We developed a head scheduling algorithm to support head-wise skip operations and resolve the processing engine (PE) load imbalance problem caused by different amounts of computations in one head. Moreover, we implemented a sparse general matrix-to-matrix multiplication (sparse GEMM) that supports line-wise and block-wise skipping. As a result, when compared with a mobile GPU and mobile CPU respectively, our proposed accelerator achieved 6.1x and 13.6x improvements in energy efficiency for the detection transformer (DETR) model and achieved approximately 2.6x and 7.9x improvements in the energy efficiency on average for the vision transformer (ViT) models. |
16:42 CET | SE4.5 | RAWATTEN: RECONFIGURABLE ACCELERATOR FOR WINDOW ATTENTION IN HIERARCHICAL VISION TRANSFORMERS Speaker: Wantong Li, Georgia Tech, US Authors: Wantong Li, Yandong Luo and Shimeng Yu, Georgia Tech, US Abstract After the success of the transformer networks on natural language processing (NLP), the application of transformers to computer vision has followed suit to deliver unprecedented performance gains on vision tasks including image recognition and object detection. The multi-head self-attention (MSA) is the key component in transformers, allowing the models to learn the amount of attention paid to each input position. In particular, hierarchical vision transformers (HVTs) utilize window-based MSA to capture the benefits of the attention mechanism at various scales for further accuracy enhancements. Despite its strong modeling capability, MSA involves complex operations that make transformers prohibitively costly for hardware deployment. Existing hardware accelerators have mainly focused on the MSA workloads in NLP applications, but HVTs involve different parameter dimensions, input sizes, and data reuse opportunities. Therefore, we design the RAWAtten architecture to target the window-based MSA workloads in HVT models. Each w-core in RAWAtten contains near-memory compute engines for linear layers, MAC arrays for intermediate matrix multiplications, and a lightweight reconfigurable softmax. The w-cores can be combined at runtime to perform hierarchical processing to accommodate varying model parameters. Compared to the baseline GPU, RAWAtten at 40nm provides 2.4× average speedup for running the window-MSA workloads in Swin transformer models while consuming only a fraction of GPU power. In addition, RAWAtten achieves 2× area efficiency compared to prior ASIC accelerator for window-MSA. |
16:45 CET | SE4.6 | M5: MULTI-MODAL MULTI-TASK MODEL MAPPING ON MULTI-FPGA WITH ACCELERATOR CONFIGURATION SEARCH Speaker: Akshay Karkal Kamath, Georgia Tech, US Authors: Akshay Kamath, Stefan Abi-Karam, Ashwin Bhat and Cong "Callie" Hao, Georgia Tech, US Abstract Recent machine learning (ML) models have advanced from single-modality single-task to multi-modality multi-task (MMMT). MMMT models typically have multiple backbones of different sizes along with complicated connections, exposing great challenges for hardware deployment. For scalable and energy-efficient implementations, multi-FPGA systems are emerging as the ideal design choices. However, finding the optimal solutions for mapping MMMT models onto multiple FPGAs is non-trivial. Existing mapping algorithms focus on either streamlined linear deep neural network architectures or only the critical path of simple heterogeneous models. Direct extensions of these algorithms for MMMT models lead to sub-optimal solutions. To address these shortcomings, we propose M5, a novel MMMT Model Mapping framework for Multi- FPGA platforms. In addition to handling multiple modalities present in the models, M5 can flexibly explore accelerator configurations and possible resource sharing opportunities to significantly improve the system performance. For various computation-heavy MMMT models, experiment results demonstrate that M5 can remarkably outperform existing mapping methods and lead to an average reduction of 35%, 62%, and 70% in the number of low-end, mid-end, and high-end FPGAs required to achieve the same throughput, respectively. Code is available publicly. |
16:48 CET | SE4.7 | STEPPINGNET: A STEPPING NEURAL NETWORK WITH INCREMENTAL ACCURACY ENHANCEMENT Speaker: Wenhao Sun, TU Munich, DE Authors: Wenhao Sun1, Grace Li Zhang2, Xunzhao Yin3, Cheng Zhuo3, Huaxi Gu4, Bing Li1 and Ulf Schlichtmann1 1TU Munich, DE; 2TU Darmstadt, DE; 3Zhejiang University, CN; 4Xidian University, CN Abstract Deep neural networks (DNNs) have successfully been applied in many fields in the past decades. However, the increasing number of multiply-and-accumulate (MAC) operations in DNNs prevents their application in resource-constrained and resource-varying platforms, e.g., mobile phones and autonomous vehicles. In such platforms, neural networks need to provide acceptable results quickly and the accuracy of the results should be able to be enhanced dynamically according to the computational resources available in the computing system. To address these challenges, we propose a design framework called SteppingNet. SteppingNet constructs a series of subnets whose accuracy is incrementally enhanced with more MAC operations. Therefore, this design allows a trade-off between accuracy and latency. In addition, the larger subnets in SteppingNet are built upon smaller subnets, so that the results of the latter can directly be reused in the former without recomputation. This property allows SteppingNet to decide on-the-fly whether to enhance the inference accuracy by executing further MAC operations. Experimental results demonstrate that SteppingNet provides an effective incremental accuracy improvement and its inference accuracy consistently outperforms the state-of-the-art work under the same limit of computational resources. |
16:51 CET | SE4.8 | AIRCHITECT: AUTOMATING HARDWARE ARCHITECTURE AND MAPPING OPTIMIZATION Speaker: Ananda Samajdar, Georgia Tech / IBM Research, US Authors: Ananda Samajdar1, Jan Moritz Joseph2 and Tushar Krishna1 1Georgia Tech, US; 2RWTH Aachen University, DE Abstract Design space exploration and optimization is an essential but iterative step in custom accelerator design, involving costly search-based methods to extract maximum performance and energy efficiency. State-of-the-art methods employ data-centric approaches to reduce the cost of each iteration but still rely on search algorithms to obtain the optima. This work proposes a learned, constant-time optimizer that uses a custom recommendation network called AIRCHITECT, which is capable of learning the architecture design and mapping space with a 94.3% test accuracy, and of predicting optimal configurations that achieve on average (GeoMean) 99.9% of the best possible performance on a test dataset with 10^5 GEMM (GEneral Matrix-matrix Multiplication) workloads. |
Extended Abstracts
Time | Label | Presentation Title Authors |
---|---|---|
16:54 CET | SE4.9 | ACCELERATING INFERENCE OF 3D-CNN ON ARM MANY-CORE CPU VIA HIERARCHICAL MODEL PARTITION Speaker: Jiazhi Jiang, Sun Yat-sen University, CN Authors: Jiazhi Jiang, ZiJian Huang, Dan Huang, Jiangsu Du and Yutong Lu, Sun Yat-sen University, CN Abstract Many applications such as biomedical analysis and scientific data analysis involve analyzing volumetric data. This spawns huge demand for 3D CNNs. Although accelerators such as GPUs may provide higher throughput on deep learning applications, they may not be available in all scenarios. The CPU, especially the many-core CPU, remains an attractive choice for deep learning in many scenarios. In this paper, we propose an inference solution that targets the emerging ARM many-core CPU platform. A hierarchical partition approach is proposed to accelerate 3D-CNN inference by exploiting the characteristics of memory and cache on ARM many-core CPUs. |
16:54 CET | SE4.10 | CEST: COMPUTATION-EFFICIENT N:M SPARSE TRAINING FOR DEEP NEURAL NETWORKS Speaker: Wei Sun, Eindhoven University of Technology, NL Authors: Chao Fang1, Wei Sun2, Aojun Zhou3 and Zhongfeng Wang1 1Nanjing University, CN; 2Eindhoven University of Technology, NL; 3The Chinese University of Hong Kong, HK Abstract N:M fine-grained structured sparsity has attracted attention due to its practical sparsity ratio and hardware-friendly pattern. However, the potential to accelerate N:M sparse deep neural network (DNN) training has not been fully exploited, and there is a lack of efficient hardware supporting N:M sparse training. To tackle these challenges, this paper presents a computation-efficient scheme for N:M sparse DNN training, called CEST. A bidirectional weight pruning method, dubbed BDWP, is first proposed to significantly reduce the computational cost while maintaining model accuracy. A sparse accelerator, namely SAT, is further developed to neatly support both regular dense operations and N:M sparse operations. Experimental results show that CEST significantly improves the training throughput by 1.89−12.49× and the energy efficiency by 1.86−2.76×. (An illustrative sketch of the N:M sparsity pattern follows this table.) |
16:54 CET | SE4.11 | BOMP-NAS: BAYESIAN OPTIMIZATION MIXED PRECISION NAS Speaker: Floran de Putter, Eindhoven University of Technology, NL Authors: David van Son1, Floran de Putter1, Sebastian Vogel2 and Henk Corporaal1 1Eindhoven University of Technology, NL; 2NXP Semiconductors, NL Abstract Bayesian Optimization Mixed-Precision Neural Architecture Search (BOMP-NAS) is an approach to quantization-aware neural architecture search that leverages both Bayesian optimization and mixed-precision quantization to efficiently search for compact, high-performance deep neural networks. It is able to find neural networks that achieve state-of-the-art accuracy with less search time; compared to the closest related work, BOMP-NAS finds these neural networks in 6x less search time. |
16:54 CET | SE4.12 | A MACHINE-LEARNING-GUIDED FRAMEWORK FOR FAULT-TOLERANT DNNS Speaker: Marcello Traiola, Inria Rennes / IRISA Lab, FR Authors: Marcello Traiola1, Angeliki Kritikakou2 and Olivier Sentieys3 1Inria / IRISA, FR; 2Université de Rennes | Inria | CNRS | IRISA, FR; 3INRIA, FR Abstract Deep Neural Networks (DNNs) show promising performance in several application domains. Nevertheless, DNN results may be incorrect, not only because of the network's intrinsic inaccuracy, but also due to faults affecting the hardware. Ensuring the fault tolerance of DNNs is crucial, but common fault tolerance approaches are not cost-effective due to the prohibitive overheads for large DNNs. This work proposes a comprehensive framework to assess the fault tolerance of DNN parameters and cost-effectively protect them. As a first step, the proposed framework performs a statistical fault injection. The results are used in the second step with classification-based machine learning methods to obtain a bit-accurate prediction of the criticality of all network parameters. Last, Error Correction Codes (ECCs) are selectively inserted to protect only the critical parameters, hence entailing low cost. Thanks to the proposed framework, we explored and protected two Convolutional Neural Networks (CNNs), each with four different data encodings. The results show that it is possible to protect the critical network parameters with selective ECCs while saving up to 79% memory w.r.t. conventional ECC approaches. |
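To illustrate the N:M fine-grained structured sparsity that SE4.10 above builds on (an editor's sketch of the general pattern, not the paper's BDWP method), the snippet below enforces a 2:4 pattern on a weight matrix: in every group of four consecutive weights along a row, only the two largest-magnitude weights are kept; the example matrix is hypothetical.

```python
import numpy as np

def prune_n_of_m(weights, n=2, m=4):
    """Zero all but the n largest-magnitude weights in every group of m.

    weights: 2-D array whose second dimension is a multiple of m.
    """
    rows, cols = weights.shape
    groups = weights.reshape(rows, cols // m, m)
    # indices of the (m - n) smallest-magnitude entries in each group
    drop = np.argsort(np.abs(groups), axis=-1)[..., : m - n]
    pruned = groups.copy()
    np.put_along_axis(pruned, drop, 0.0, axis=-1)
    return pruned.reshape(rows, cols)

W = np.array([[0.9, -0.1, 0.05, -1.2,  0.3, 0.2, -0.7, 0.01]])
print(prune_n_of_m(W))  # each group of 4 keeps only its 2 largest-magnitude weights
```

The regularity of the pattern (a fixed n out of every m) is what makes such sparsity hardware-friendly compared to unstructured pruning.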
SS3 Secure circuits and architectures
Add this session to my calendar
Date: Tuesday, 18 April 2023
Time: 16:30 CET - 18:00 CET
Location / Room: Gorilla Room 1.5.1
Session chair:
Jo Vliegen, KU Leuven, BE
16:30 CET until 16:57 CET: Pitches of regular papers
16:57 CET until 18:00 CET: Interactive technical presentations by the authors of regular papers and extended abstracts
Regular Papers
Time | Label | Presentation Title Authors |
---|---|---|
16:30 CET | SS3.1 | ESTABLISHING DYNAMIC SECURE SESSIONS FOR ECQV IMPLICIT CERTIFICATES IN EMBEDDED SYSTEMS Speaker: Fikret Basic, TU Graz, AT Authors: Fikret Basic1, Christian Steger1 and Robert Kofler2 1TU Graz, AT; 2NXP Semiconductors Austria GmbH Co & KG, AT Abstract Implicit certificates are gaining ever more prominence in constrained embedded devices, in both the internet of things (IoT) and automotive domains. They present a resource-efficient security solution against common threat concerns. The computational requirements are not the main issue anymore, with the focus now shifting to determining a good balance between the provided security level and the derived threat model. A security aspect that often gets overlooked is the establishment of secure communication sessions, as most design solutions are based only on the use of static key derivation, and therefore lack the perfect forward secrecy. This leaves the transmitted data open for potential future exposures as keys are tied to the certificates rather than the communication sessions. We aim to close this gap and present a design that utilizes the Station to Station (STS) protocol with implicit certificates. In addition, we propose potential protocol optimization implementation steps and run a comprehensive study on the performance and security level between the proposed design and the state-of-the-art key derivation protocols. In our comparative study, we show that we are able to mitigate many session-related security vulnerabilities that would otherwise remain open with only a slight computational increase of 20% compared to a static elliptic curve digital signature algorithm (ECDSA) key derivation. |
16:33 CET | SS3.2 | CACHE SIDE-CHANNEL ATTACKS AND DEFENSES OF THE SLIDING WINDOW ALGORITHM IN TEES Speaker: Zili KOU, Hong Kong University of Science and Technology, CN Authors: Zili KOU1, Sharad Sinha2, Wenjian HE1 and Wei ZHANG1 1Hong Kong University of Science and Technology, HK; 2Indian Institute of Technology Goa, IN Abstract Trusted execution environments (TEEs) such as SGX on x86 and TrustZone on ARM are announced to protect trusted programs against even a malicious operating system (OS); however, they are still vulnerable to cache side-channel attacks. In the new threat model of TEEs, kernel-privileged attackers are more capable, thus the effectiveness of previous defenses needs to be carefully reevaluated. Aimed at the sliding window algorithm of RSA, this work analyzes the latest defenses from the TEE attacker's point of view and pinpoints their attack surfaces and vulnerabilities. The mainstream cryptography libraries are scrutinized, within which we attack and evaluate the implementations of Libgcrypt and Mbed TLS on a real-world ARM processor with TrustZone. Our attack successfully recovers the RSA key in the latest Mbed TLS design when it adopts a small window size, despite Mbed TLS taking a significant role in the ecosystem of ARM TrustZone. The possible countermeasures are finally presented together with the corresponding costs. |
16:36 CET | SS3.3 | THE FIRST CONCEPT AND REAL-WORLD DEPLOYMENT OF A GPU-BASED THERMAL COVERT CHANNEL: ATTACK AND COUNTERMEASURES Speaker: Jeferson Gonzalez, Karlsruhe Institute of Technology, CR Authors: Jeferson Gonzalez-Gomez1, Kevin Cordero-Zuniga2, Lars Bauer1 and Joerg Henkel1 1Karlsruhe Institute of Technology, DE; 2ITCR, CR Abstract Thermal covert channel (TCC) attacks have been studied as a threat to CPU-based systems over recent years. In this paper, we propose a new type of TCC attack that for the first time leverages the Graphics Processing Unit (GPU) of a system to create a stealthy communication channel between two malicious applications. We evaluate our new attack on two different real-world platforms: a GPU-dedicated general computing platform and a GPU-integrated embedded platform. Our results are the first to show that a GPU-based thermal covert channel attack is possible. From our experiments, we obtain a transmission rate of up to 8.75 bps with a very low error rate of less than 2% for a 12-bit packet size, which is comparable to CPU-based TCCs in the state of the art. Moreover, we show how existing state-of-the-art countermeasures for TCCs need to be extended to tackle the new GPU-based attack, at the cost of added overhead. To reduce this overhead, we propose our own DVFS-based countermeasure which mitigates the attack, while causing 2x less performance loss than the state-of-the-art countermeasure on a set of compute-intensive GPU benchmark applications. |
16:39 CET | SS3.4 | SIGFUZZ: A FRAMEWORK FOR DISCOVERING MICROARCHITECTURAL TIMING SIDE CHANNELS Speaker: Chathura Rajapaksha, Boston University, US Authors: Chathura Rajapaksha, Leila Delshadtehrani, Manuel Egele and Ajay Joshi, Boston University, US Abstract Timing side channels can be inadvertently introduced into processor microarchitecture during the design process, mainly due to optimizations carried out to improve processor performance. These timing side channels have been used in various attacks, including transient execution attacks on recent commodity processors. Hence, we need a tool to detect timing side channels during the design process. This paper presents SIGFuzz, a fuzzing-based framework for detecting microarchitectural timing side channels. A designer can use SIGFuzz to detect side channels early in the design flow and mitigate potential vulnerabilities associated with them. SIGFuzz generates a cycle-accurate microarchitectural trace for a program that executes on the target processor; it then uses two trace properties to identify side channels that would have been formed by the program. These two trace properties evaluate the effect of each instruction in the program on the timing of its prior and later instructions, respectively. SIGFuzz also uses a statistical distribution of execution delays of instructions with the same mnemonic to flag potential side channels that manifest with different operands of an instruction. Furthermore, SIGFuzz automatically groups the detected side channels based on the microarchitectural activity trace (i.e., signature) of the instruction that triggered them. We evaluated SIGFuzz on two real-world open-source processor designs, Rocket and BOOM, and found three new side channels and two known side channels. We present a novel Spectre-style attack on BOOM based on one of the newly detected side channels. |
16:42 CET | SS3.5 | RUN-TIME INTEGRITY MONITORING OF UNTRUSTWORTHY ANALOG FRONT-ENDS Speaker: Heba Salem, University of Edinburgh, GB Authors: Heba Salem and Nigel Topham, University of Edinburgh, GB Abstract Recent advances in hardware attacks, such as cross-talk and covert-channel based attacks, expose the structural and operational vulnerability of analog and mixed-signal circuit elements to the introduction of malicious and untrustworthy behaviour at run-time, potentially leading to adverse physical, personal, and environmental consequences. One untrustworthy behaviour of concern is the introduction of abnormal/unexpected frequencies to the signals at the analog/digital interface of a SoC, realised through intermittent bit-flipping or stuck-at-faults in the middle and lower bits of these signals. In this paper, we study the impact of these actions and propose integrity monitoring of signals of concern based on analysing the temporal and arithmetic relations between their samples. This paper presents a hybrid software/hardware machine-learning based framework that consists of two phases: a run-time monitoring phase and a trustworthiness assessment phase. The framework is evaluated with three different applications and its effectiveness in detecting the untrustworthy behaviour of concern is verified. This framework is device, application, and architecture agnostic, and relies only on analysing the output of the analog front-end, allowing its implementation in SoCs with on-chip and custom analog front-ends as well as those with outsourced and commercial off-the-shelf (COTS) analog front-ends. |
16:45 CET | SS3.6 | SPOILER-ALERT: DETECTING SPOILER ATTACKS USING A CUCKOO FILTER Speaker: Jinhua Cui, Hunan University, CN Authors: Jinhua Cui, Yiyun Yin, Congcong Chen and Jiliang Zhang, Hunan University, CN Abstract Spoiler attacks leak physical address information, which is exploited to accelerate reverse engineering of virtual-to-physical address mapping, thus greatly boosting Rowhammer and cache attacks. However, existing approaches that detect data-leakage attacks no longer suit the requirements of identifying Spoiler. This paper proposes SPOILER-ALERT, the first hardware-level mechanism to detect the address-leakage Spoiler attacks in real time. It leverages a cuckoo filter module embedded into the Memory Order Buffer component to screen buffer addresses on-the-fly. We further optimise the filtering algorithm to reduce false positives. We assess the effectiveness and performance based on prototype implementations, which achieve a detection rate of 99.99% and negligible performance loss. Finally, we discuss potential reactions of our detection mechanism after a Spoiler attack is discovered. (An illustrative sketch of a cuckoo filter follows this table.) |
16:48 CET | SS3.7 | HUNTER: HARDWARE UNDERNEATH TRIGGER FOR EXPLOITING SOC-LEVEL VULNERABILITIES Speaker: Farimah Farahmandi, University of Florida, US Authors: Sree Ranjani Rajendran1, Shams Tarek1, Benjamin Myers Hicks1, Hadi Mardani Kamali1, Farimah Farahmandi1 and Mark Tehranipoor2 1University of Florida, US; 2Intel Charles E. Young Preeminence Endowed Chair Professor in Cybersecurity, Associate Chair for Research and Strategic Initiatives, ECE Department, University of Florida, US Abstract Systems-on-chip (SoCs) have become increasingly large and complex, resulting in new threats and vulnerabilities, mainly related to system-level flaws. However, the system-level verification process, whose violation may lead to exploiting a hardware vulnerability, is not studied comprehensively due to the lack of decisive (security) requirements and properties from the SoC designer's perspective. To enable a more comprehensive verification for system-level properties, this paper presents HUnTer (Hardware Underneath Trigger), a framework for identifying sets (sequences) of instructions at the processor unit (PU) that unveils the underneath hardware vulnerabilities. The HUnTer framework automates (i) threat modeling, (ii) threat-based formal verification, (iii) generation of counterexamples, and (iv) generation of snippet code for exploiting the vulnerability. The HUnTer framework also defines a security coverage metric (HUnT_Coverage) to measure the performance and efficacy of the proposed approach. Using the HUnTer framework on a RISC-V-based open-source SoC architecture, we conduct a wide variety of case studies of Trust-HUB vulnerabilities to demonstrate the high effectiveness of the proposed framework. |
16:51 CET | SS3.8 | MAXIMIZING THE POTENTIAL OF CUSTOM RISC-V VECTOR EXTENSIONS FOR SPEEDING UP SHA-3 HASH FUNCTIONS Speaker: Huimin Li, TU Delft, NL Authors: Huimin Li1, Nele Mentens2 and Stjepan Picek3 1TU Delft, NL; 2KU Leuven, BE; 3Radboud University, NL Abstract SHA-3 is considered to be one of the most secure standardized hash functions. It relies on the Keccak-f[1,600] permutation, which operates on an internal state of 1,600 bits, mostly represented as a 5×5×64-bit matrix. While existing implementations process the state sequentially in chunks of typically 32 or 64 bits, the Keccak-f[1,600] permutation can benefit a lot from speedup through parallelization. This paper is the first to explore the full potential of parallelization of Keccak-f[1,600] in RISC-V based processors through custom vector extensions on 32-bit and 64-bit architectures. We analyze the Keccak-f[1,600] permutation, composed of five different step mappings, and propose ten custom vector instructions to speed up the computation. We realize these extensions in a SIMD processor described in SystemVerilog. We compare the performance of our designs to existing architectures based on vectorized application-specific instruction set processors (ASIP). We show that our designs outperform all related work in throughput due to our carefully selected custom vector instructions. |
16:54 CET | SS3.9 | PRIVACY-BY-SENSING WITH TIME-DOMAIN DIFFERENTIALLY-PRIVATE COMPRESSED SENSING Speaker: Steven Davis, University of Notre Dame, US Authors: Jianbo Liu, Boyang Cheng, Pengyu Zeng, Steven Davis, Muya Chang and Ningyuan Cao, University of Notre Dame, US Abstract With the ubiquitous IoT sensors and enormous real-time data generation, data privacy is becoming a critical societal concern. State-of-the-art privacy protection methods all demand significant hardware overhead due to computation-insensitive algorithms and a divided sensor/security architecture. In this paper, we propose a generic time-domain circuit architecture that protects raw data by enabling a differentially-private compressed sensing (DP-CS) algorithm secured by physical unclonable functions (PUFs). To address privacy concerns and hardware overhead at the same time, a robust unified PUF and time-domain mixed-signal (TD-MS) module is designed, where the PUF enables private and secure entropy generation. To evaluate the proposed design against a digital baseline, we performed experiments based on synthesized circuits and SPICE simulation, and measured a 2.9x area reduction and 3.2x energy gains. We also measured high-quality PUF generation with the TD-MS circuit, with an inter-die Hamming distance of 52% and a low intra-die Hamming distance of 2.8%. Furthermore, we performed attack and algorithm performance measurements demonstrating that the proposed design preserves data privacy even under attack and that the machine learning performance has minimal degradation (within 2%) compared to the digital baseline. |
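As an editor's illustration of the cuckoo filter data structure mentioned in SS3.6 above (a generic membership filter, not the SPOILER-ALERT hardware module), the sketch below stores small fingerprints in one of two candidate buckets and relocates entries on collisions; bucket sizes, fingerprint width and the example addresses are hypothetical.

```python
import hashlib
import random

class CuckooFilter:
    """Minimal cuckoo filter: approximate set membership with 8-bit fingerprints."""

    def __init__(self, num_buckets=1024, bucket_size=4, max_kicks=500):
        self.buckets = [[] for _ in range(num_buckets)]  # num_buckets must be a power of two
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks

    def _hash(self, data):
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _fingerprint(self, item):
        return self._hash(item) & 0xFF or 1          # never use fingerprint 0

    def _indices(self, item, fp):
        i1 = self._hash(item) % len(self.buckets)
        i2 = (i1 ^ self._hash(bytes([fp]))) % len(self.buckets)
        return i1, i2

    def insert(self, item):
        fp = self._fingerprint(item)
        i1, i2 = self._indices(item, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        i = random.choice((i1, i2))
        for _ in range(self.max_kicks):               # evict and relocate on collision
            j = random.randrange(len(self.buckets[i]))
            fp, self.buckets[i][j] = self.buckets[i][j], fp
            i = (i ^ self._hash(bytes([fp]))) % len(self.buckets)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        return False                                  # filter considered full

    def contains(self, item):
        fp = self._fingerprint(item)
        i1, i2 = self._indices(item, fp)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

f = CuckooFilter()
f.insert(b"addr:0x7ffd1234")                          # hypothetical observed address
print(f.contains(b"addr:0x7ffd1234"), f.contains(b"addr:0xdeadbeef"))  # True, (almost surely) False
```

A filter like this answers membership queries in constant time with a small, bounded false-positive rate, which is why it suits on-the-fly screening of addresses in hardware.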
Extended Abstracts
Time | Label | Presentation Title Authors |
---|---|---|
16:57 CET | SS3.11 | ENERGY-EFFICIENT NTT DESIGN WITH ONE-BANK SRAM AND 2-D PE ARRAY Speaker: Jianan Mu, ICT, CAS, CN Authors: Jianan Mu1, HuaJie Tan2, Jiawen Wu2, Haotian Lu2, Chip-Hong Chang3, Shuai Chen4, Shengwen Liang1, Jing Ye1, Huawei Li1 and Xiaowei Li1 1ICT, CAS, CN; 2School of Microelectronics, Tianjin University, China, CN; 3School of Electrical and Electronic Engineering (EEE) of NTU, SG; 4Rock-solid Security Lab. of Binary Semiconductor Co., Ltd., CN Abstract In the Number Theoretic Transform (NTT) operation, more than half of the active energy consumption stems from memory accesses. Here, we propose a generalized design method to improve the energy efficiency of the NTT operation by considering the effect of processing element (PE) geometry and memory organization on the data flow between PEs and memory. To decrease the number of data bits that must be accessed from memory, a two-dimensional (2-D) PE array architecture is used. A pair of ping-pong buffers is proposed to transpose and swap the coefficients, enabling a single bank of memory to be used with the 2-D PE array and reducing the average memory bit-access energy without compromising the throughput. Our experimental results show that this design method can produce NTT accelerators with up to 69.8% savings in average energy consumption compared with existing designs based on multi-bank SRAM and one-bank SRAM with a one-dimensional PE array with the same number of PEs and total memory size. |
16:57 CET | SS3.12 | COFHEE: A CO-PROCESSOR FOR FULLY HOMOMORPHIC ENCRYPTION EXECUTION Speaker: Homer Gamil, New York University Abu Dhabi, GR Authors: Mohammed Nabeel Thari Moopan1, Deepraj Soni2, Mohammed Ashraf3, Mizan Gebremichael4, Homer Gamil3, Eduardo Chielle3, Ramesh Karri5, Mihai Sanduleanu4 and Michail Maniatakos3 1New York University, AE; 2New York University Tandon School of Engineering, US; 3New York University Abu Dhabi, AE; 4Khalifa University, AE; 5New York University, US Abstract In this paper, we present the blueprint of a specialized co-processor for Fully Homomorphic Encryption, dubbed CoFHEE. With a small design area of 12mm^2, CoFHEE incorporates ASIC implementations of fundamental polynomial operations, such as polynomial addition and subtraction, the Hadamard product, and the Number Theoretic Transform, which underlie all higher-level FHE primitives. CoFHEE natively supports polynomial degrees of up to n = 2^14 with a coefficient size of 128 bits. We evaluate our chip with performance and power experiments and compare it against state-of-the-art software implementations and other ASIC designs. A more elaborate description of the CoFHEE design can be found in [1]. |
16:57 CET | SS3.13 | A RAPID RESET 8-TRANSISTOR PHYSICALLY UNCLONABLE FUNCTION UTILISING POWER GATING Speaker: Yujin Zheng, Newcastle University, GB Authors: Yujin Zheng1, Alex Bystrov2 and Alex Yakovlev2 1Newcastle University, GB; 2Newcastle University, GB Abstract Physically Unclonable Functions (PUFs) need error correction when regenerating secret keys for cryptography. The proposed 8-Transistor (8T) PUF, which coordinates with the power-gating technique, makes a single evaluation cycle 1000 times faster than that of a 6T-SRAM PUF, at the cost of a 12.8% area increase. This design enables multiple evaluations even during in-field key regeneration, hence greatly reducing the number of errors and the hardware penalty for error correction. The 8T PUF derives from the 6T SRAM cell. It is built to eliminate data retention swiftly and maximise physical mismatches. A two-phase power-gating module is designed to provide rapid, controllable power-on/off cycles for the chosen PUF clusters in order to facilitate statistical measurements and curb the in-rush current, thereby enhancing PUF entropy and security. An architecture of the power-gated PUF is developed to accommodate fast multiple evaluations. Post-layout Monte Carlo simulations were performed with Cadence, and the extracted PUF responses were processed with Matlab to evaluate the 8T PUF performance and statistical metrics for subsequent inclusion into PUF responses. (An illustrative sketch of the inter-/intra-die Hamming-distance metrics follows this table.) |
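As an editor's illustration of the inter-die and intra-die Hamming-distance figures quoted in SS3.9 and relevant to the PUF evaluation in SS3.13 above (not the authors' evaluation scripts), the sketch below computes both metrics from PUF response bit strings; the response data are hypothetical.

```python
def hamming_pct(a, b):
    """Fractional Hamming distance between two equal-length bit strings, in %."""
    assert len(a) == len(b)
    return 100.0 * sum(x != y for x, y in zip(a, b)) / len(a)

def intra_die(responses):
    """Average distance between repeated readouts of the same PUF (reliability)."""
    ref = responses[0]
    return sum(hamming_pct(ref, r) for r in responses[1:]) / (len(responses) - 1)

def inter_die(golden_responses):
    """Average pairwise distance between different PUF instances (uniqueness)."""
    pairs = [(a, b) for i, a in enumerate(golden_responses)
             for b in golden_responses[i + 1:]]
    return sum(hamming_pct(a, b) for a, b in pairs) / len(pairs)

# Hypothetical 8-bit responses: three noisy readouts of chip 0, plus chip 1.
chip0_reads = ["10110010", "10110011", "00110010"]
chips = ["10110010", "01010110"]
print(intra_die(chip0_reads), inter_die(chips))  # 12.5 (low is good), 50.0 (ideal uniqueness)
```

Ideally intra-die distance approaches 0% (stable regeneration) while inter-die distance approaches 50% (unique per device), which is the benchmark against which figures such as 2.8% and 52% are judged.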
US2 Unplugged session
Add this session to my calendar
Date: Tuesday, 18 April 2023
Time: 16:30 CET - 18:00 CET
Location / Room: Nightingale Room 2.6.1/2
Come join us for stimulating brainstorm discussions in small groups about the future of digital engineering. Our focus will be on the digital twinning paradigm where virtual instances are created of a system as it is operated, maintained, and repaired (e.g., each individual car of a certain model). We investigate how to take advantage of this paradigm in engineering systems and what new system engineering approaches and architectures (hardware/software) and design workflows are needed and become possible.
PARTY DATE Party
Add this session to my calendar
Date: Tuesday, 18 April 2023
Time: 19:30 CET - 23:00 CET
Location / Room: Horta
Time | Label | Presentation Title Authors |
---|---|---|
19:30 CET | PARTY.1 | ARRIVAL AND WELCOME Presenter: Ian O'Connor, Lyon Institute of Nanotechnology, FR Authors: Ian O'Connor1 and Robert Wille2 1Lyon Institute of Nanotechnology, FR; 2TU Munich, DE Abstract Arrival and Welcome at DATE Party |
20:00 CET | PARTY.2 | PRESENTATION OF AWARDS Speaker: Jürgen Teich, Friedrich-Alexander-Universität Erlangen-Nürnberg, DE Authors: Jan Madsen1, David Atienza2, Ian O'Connor3, Robert Wille4 and Jürgen Teich5 1TU Denmark, DK; 2EPFL, CH; 3Lyon Institute of Nanotechnology, FR; 4TU Munich, DE; 5Friedrich-Alexander-Universität Erlangen-Nürnberg, DE Abstract Presentation of Awards |
20:15 CET | PARTY.3 | BEER AND CHOCOLATE: A MATCH MADE IN BELGIUM! Presenter: Werner Callebaut, Bierolade, beer sommelier and chocolate expert, BE Author: Werner Callebaut, Bierolade, beer sommelier and chocolate expert, BE Abstract It's hard to meet a truer Belgian: Werner Callebaut from Bierolade. He has two passions: chocolate and beer. With his years of experience as a beer sommelier and chocolate expert, he will explain to you this evening how to experience Belgian beer & chocolate. A story full of lovely anecdotes, tips & tricks and of course delicious beers & chocolates. |
20:30 CET | PARTY.4 | DATE PARTY WITH DRINKS & FOOD BARS Presenter: All Participants, DATE, BE Author: All Participants, DATE, BE Abstract Drinks and food bars at the DATE Party |
Wednesday, 19 April 2023
BPA10 Hardware Security
Add this session to my calendar
Date: Wednesday, 19 April 2023
Time: 08:30 CET - 10:30 CET
Location / Room: Okapi Room 0.8.3
Session chair:
Johanna Sepulveda, Airbus, DE
Time | Label | Presentation Title Authors |
---|---|---|
08:30 CET | BPA10.1 | SOCFUZZER: SOC VULNERABILITY DETECTION USING COST FUNCTION ENABLED FUZZ TESTING Speaker: Mark Tehranipoor, University of Florida, US Authors: Muhammad Monir Hossain1, Arash Vafaei1, Kimia Zamiri Azar1, Fahim Rahman1, Farimah Farahmandi1 and Mark Tehranipoor2 1University of Florida, US; 2Intel Charles E. Young Preeminence Endowed Chair Professor in Cybersecurity, Associate Chair for Research and Strategic Initiatives, ECE Department, University of Florida, US Abstract Modern System-on-Chips (SoCs), which integrate numerous complex and heterogeneous intellectual properties (IPs) and contain highly sensitive assets, have become the target of malicious attacks. However, security verification of these SoCs lags behind the advances in functional verification, mostly because it is difficult to formally define accurate threat model(s). A few recent studies have investigated the possibility of using fuzz testing for hardware-oriented vulnerability detection. However, they suffer from several limitations, namely a lack of cross-layer co-verification, the need for expert knowledge, and the inability to capture detailed hardware interactions. In this paper, we propose SoCFuzzer, an automated SoC verification approach assisted by fuzz testing for detecting SoC security vulnerabilities. Unlike previous hardware-oriented fuzz testing studies, which mostly rely on traditional (code) coverage-based metrics, in SoCFuzzer we develop (i) generic evaluation metrics for fuzzing the hardware domain, and (ii) a security-oriented cost function. This relieves designers of making correlations between coverage metrics, test data, and possible vulnerabilities. The SoCFuzzer cost functions are defined at a high level, allowing us to follow the gray-box model, which requires less detailed and interactive information from the design-under-test. Our experiments on an open-source RISC-V based SoC show the efficiency of these metrics and cost functions in generating cornerstone inputs that trigger the vulnerability conditions with faster convergence. |
08:55 CET | BPA10.2 | NON-PROFILED SIDE-CHANNEL ASSISTED FAULT ATTACK: A CASE STUDY ON DOMREP Speaker: Shivam Bhasin, Nanyang Technological University Singapore, SG Authors: Sayandeep Saha, Prasanna Ravi, Dirmanto Jap and Shivam Bhasin, Nanyang Technological University, SG Abstract Recent work has shown that Side-Channel Attacks (SCA) and Fault Attacks (FA) can be combined, forming an extremely powerful adversarial model which can bypass even some of the strongest protections against both FA and SCA. However, this strongest form of combined attack comes with some practical challenges -- 1) a profiled setting with multiple fault locations is needed; 2) fault models are restricted to single-bit set-reset/flips; 3) the input needs to be repeated several times. In this paper, we propose a new combined attack strategy called SCA-NFA that works in a non-profiled setting. Assuming knowledge of plaintexts/ciphertexts and exploiting bitsliced implementations of modern ciphers, we further relax the assumptions on the fault model and the number of fault locations -- a random multi-bit fault at a single fault location is sufficient for recovering several secret bits. Furthermore, the inputs are allowed to vary, which is required in several practical use cases. The attack is validated on a recently proposed countermeasure called DOMREP, which individually provides SCA and FA protection of arbitrary order. Practical validation on an open-source masked implementation of GIMLI with the DOMREP extension on an STM32F407G, using electromagnetic fault injection and electromagnetic SCA, shows that SCA-NFA succeeds in around 10000 measurements. |
09:20 CET | BPA10.3 | EFFICIENT SOFTWARE MASKING OF AES THROUGH INSTRUCTION SET EXTENSIONS Speaker: Songqiao Cui, KU Leuven, BE Authors: Songqiao Cui and Josep Balasch, KU Leuven, BE Abstract Masking is a well-studied countermeasure to protect software implementations against side-channel attacks. For the case of AES, incorporating masking often requires implementing internal transformations using finite field arithmetic. This results in significant performance overheads, mostly due to finite field multiplications, which worsen even further when no lookup tables are used. In this work, we extend a RISC-V core with custom instructions to accelerate AES finite field arithmetic. With a 3.3% area increase, we measure 7.2x and 5.4x speedups over software-only implementations of first-order Boolean Masking and Inner Product Masking, respectively. We also investigate vectorized instructions capable of exploiting the intra-block and inter-block parallelism in the implementation. Our implementations avoid the use of lookup tables, run in constant time, and show no evidence of first-order leakage when evaluated on an FPGA. |
09:45 CET | BPA10.4 | INTERACTIVE TECHNICAL PRESENTATIONS BY THE AUTHORS Speaker: Authors of the session, DATE, BE Author: Session Chairs, DATE, BE Abstract Participants can freely interact with authors during their interactive technical presentations. |
BPA3 Efficient processing for NNs
Add this session to my calendar
Date: Wednesday, 19 April 2023
Time: 08:30 CET - 10:30 CET
Location / Room: Gorilla Room 1.5.1
Session chair:
David Novo, LIRMM, University of Montpellier, CNRS, FR, FR
Time | Label | Presentation Title Authors |
---|---|---|
08:30 CET | BPA3.1 | AUTOMATED ENERGY-EFFICIENT DNN COMPRESSION UNDER FINE-GRAIN ACCURACY CONSTRAINTS Speaker: Ourania Spantidi, Southern Illinois University, US Authors: Ourania Spantidi and Iraklis Anagnostopoulos, Southern Illinois University Carbondale, US Abstract Deep Neural Networks (DNNs) are utilized in a variety of domains, and their computational intensity is stressing embedded devices with limited power budgets. DNN compression has been employed to achieve gains in energy consumption on embedded devices at the cost of accuracy loss. Compression-induced accuracy degradation is addressed through fine-tuning or retraining, which is not always feasible. Additionally, state-of-the-art approaches compress DNNs with respect to the average accuracy achieved during inference, which can be a misleading evaluation metric. In this work, we explore more fine-grain properties of DNN inference accuracy and generate energy-efficient DNNs using signal temporal logic and falsification, applied jointly through pruning and quantization. We offer the ability to control the quality of the DNN inference at run time, and propose an automated framework that can generate compressed DNNs satisfying tight fine-grain accuracy requirements. The conducted evaluation on the ImageNet dataset has shown energy consumption gains of over 30% when compared to baseline DNNs. |
08:55 CET | BPA3.2 | A SPEED- AND ENERGY-DRIVEN HOLISTIC TRAINING FRAMEWORK FOR SPARSE CNN ACCELERATORS Speaker: Yuanchen Qu, Shanghaitech University, CN Authors: Yuanchen Qu, Yu Ma and Pingqiang Zhou, Shanghaitech University, CN Abstract Sparse convolutional neural network (CNN) accelerators have been shown to achieve high processing speed and low energy consumption by leveraging zero weights or activations, which can be further optimized by finely tuning the sparse activation maps during training. In this paper, we propose a CNN training framework aimed at reducing the energy consumption and processing cycles of sparse CNN accelerators. We first model the accelerator's energy consumption and processing cycles as functions of layer-wise activation map sparsity. We then leverage this model and propose a hybrid regularization approximation method to further sparsify activation maps in the training process. The results show that our proposed framework can reduce the energy consumption of Eyeriss by 31.33%, 20.6% and 26.6% on MobileNet-V2, SqueezeNet and Inception-V3, respectively. In addition, the processing speed can be increased by 1.96x, 1.4x and 1.65x, respectively. |
09:20 CET | BPA3.3 | HARDWARE EFFICIENT WEIGHT-BINARIZED SPIKING NEURAL NETWORKS Speaker: Chengcheng Tang, University of Alberta, CA Authors: Chengcheng Tang and Jie Han, University of Alberta, CA Abstract The advancement of spiking neural networks (SNNs) provides a promising alternative approach to conventional artificial neural networks (ANNs) with higher energy efficiency. However, their significant memory requirements present a performance bottleneck on resource-constrained devices. Inspired by the notion of binarized neural networks (BNNs), we incorporate the design principles of BNNs into those of SNNs to reduce the stringent resource requirements. Specifically, the weights are binarized to 1 and −1 to implement the functions of excitatory and inhibitory synapses. Hence, the proposed design is referred to as a weight-binarized spiking neural network (WB-SNN). In the WB-SNN, only one bit is used for a weight or a spike; for the latter, 1 and 0 indicate a spike and no spike, respectively. A priority encoder is used to identify the index of an active neuron as a basic unit to construct the WB-SNN. We further design a fully connected neural network that consists of an input layer, an output layer, and fully connected layers of different sizes. A counter is utilized in each neuron to complete the accumulation of weights. The WB-SNN design is validated by using a multi-layer perceptron on the MNIST dataset. Hardware implementations on FPGAs show that the WB-SNN attains a significant saving of memory with only a limited accuracy loss compared with its SNN and BNN counterparts. |
09:45 CET | BPA3.4 | INTERACTIVE TECHNICAL PRESENTATIONS BY THE AUTHORS Speaker: Authors of the session, DATE, BE Author: Session Chairs, DATE, BE Abstract Participants can freely interact with authors during their interactive technical presentations. |
BPA4 Hardware accelerators
Add this session to my calendar
Date: Wednesday, 19 April 2023
Time: 08:30 CET - 10:30 CET
Location / Room: Okapi Room 0.8.2
Session chair:
Nima TaheriNejad, Heidelberg University, DE
Time | Label | Presentation Title Authors |
---|---|---|
08:30 CET | BPA4.1 | ACCELERATING GUSTAVSON-BASED SPMM ON EMBEDDED FPGAS WITH ELEMENT-WISE PARALLELISM AND ACCESS PATTERN-AWARE CACHES Speaker: Shiqing Li, Nanyang Technological University, SG Authors: Shiqing Li and Weichen Liu, Nanyang Technological University, SG Abstract Gustavson's algorithm (i.e., the row-wise product algorithm) shows its potential as the backbone algorithm for sparse matrix-matrix multiplication (SpMM) on hardware accelerators. However, it still suffers from irregular memory accesses, and thus its performance is bounded by the off-chip memory traffic. Previous works mainly focus on high-bandwidth-memory-based architectures and are not suitable for embedded FPGAs with traditional DDR. In this work, we propose an efficient Gustavson-based SpMM accelerator on embedded FPGAs with element-wise parallelism and access pattern-aware caches. First, we analyze the parallelism of Gustavson's algorithm and propose to perform the algorithm with element-wise parallelism, which reduces the idle time of processing elements caused by synchronization. Further, we show through a counter-intuitive example that a traditional cache can lead to worse performance. We then propose a novel access pattern-aware cache scheme called SpCache, which provides quick responses to reduce bank conflicts caused by irregular memory accesses and combines streaming and caching to handle requests that access ordered elements of unpredictable length. Finally, we conduct experiments on the Xilinx Zynq-UltraScale ZCU106 platform with a set of benchmarks from the SuiteSparse matrix collection. The experimental results show that the proposed design achieves an average 1.62x performance speedup compared to the baseline. |
08:55 CET | BPA4.2 | GRAPHITE: ACCELERATING ITERATIVE GRAPH ALGORITHMS ON RERAM ARCHITECTURES VIA APPROXIMATE COMPUTING Speaker: Dwaipayan Choudhury, Washington State University, US Authors: Dwaipayan Choudhury, Ananth Kalyanaraman and Partha Pratim Pande, Washington State University, US Abstract ReRAM-based Processing-in-Memory (PIM) offers a promising paradigm for computing near data, making it an attractive platform of choice for graph applications that suffer from sparsity and irregular memory access. However, the performance of ReRAM-based graph accelerators is limited by two key challenges – significant storage requirements (particularly due to wasted storage of the zero cells of a graph's adjacency matrix), and a significant amount of on-chip traffic between ReRAM-based processing elements. In this paper we present GraphIte, an approximate computing-based framework for accelerating iterative graph applications on ReRAM-based architectures. GraphIte uses sparsification and approximate updates to achieve significant reductions in ReRAM storage and data movement. Our experiments on PageRank and community detection show that our proposed architecture outperforms a state-of-the-art ReRAM-based graph accelerator with up to 83.4% reduction in execution time while consuming up to 87.9% less energy for a range of graph inputs and workloads. |
09:20 CET | BPA4.3 | PEDAL: A POWER EFFICIENT GCN ACCELERATOR WITH MULTIPLE DATAFLOWS Speaker: Nishil Talati, University of Michigan, US Authors: Yuhan Chen, Alireza Khadem, Xin He, Nishil Talati, Tanvir Ahmed Khan and Trevor Mudge, University of Michigan, US Abstract Graphs are ubiquitous in many application domains due to their ability to describe structural relations. Graph Convolutional Networks (GCNs) have emerged in recent years and are rapidly being adopted due to their capability to perform Machine Learning (ML) tasks on graph-structured data. GCN exhibits irregular memory accesses due to the lack of locality when accessing graph-structured data. This makes it hard for general-purpose architectures like CPUs and GPUs to fully utilize their computing resources. In this paper, we propose PEDAL, a power-efficient accelerator for GCN inference supporting multiple dataflows. PEDAL chooses the best-fit dataflow and phase ordering based on input graph characteristics and GCN algorithm, achieving both efficiency and flexibility. To achieve both high power efficiency and performance, PEDAL features a light-weight processing element design. PEDAL achieves 144.5, 9.4, and 2.6 times speedup compared to CPU, GPU, and HyGCN, respectively, and 8856, 1606, 8.4, and 1.8 times better power efficiency compared to CPU, GPU, HyGCN, and EnGN, respectively. |
09:45 CET | BPA4.4 | INTERACTIVE TECHNICAL PRESENTATIONS BY THE AUTHORS Speaker: Authors of the session, DATE, BE Author: Session Chairs, DATE, BE Abstract Participants can freely interact with authors during their interactive technical presentations. |
FS7 Focus session: Sustainable chip production: can we align its carbon footprint on Paris Agreement 1.5°C pathways?
Add this session to my calendar
Date: Wednesday, 19 April 2023
Time: 08:30 CET - 10:00 CET
Location / Room: Okapi Room 0.8.1
If we are serious about limiting climate change to a global warming of 1.5°C by the end of the century, every economic sector needs to reduce its carbon footprint in order to align global greenhouse gas (GHG) emissions with Paris Agreement (PA) pathways. Awareness of this challenge has grown significantly over the last few years, both in the chip-design and chip-fabrication communities. However, it is so far not clear how to reach a sustained reduction of the carbon footprint of chip manufacturing at the rate of 8%/year that would be aligned with the 1.5°C PA pathway. In this session, we will rely on the IPAT/Kaya decomposition to discuss the feasibility of the various levers we have in our hands: offsetting of the carbon footprint by enabling reductions of GHG emissions in other economic sectors (kgCO2e), decarbonization of the energy used for chip production (kgCO2e/MJ), reduction of the energy intensity of chip production (MJ/cm²) and innovation slowdown with planned degrowth of chip production volumes (cm²). We will see that it is very unlikely that carbon footprint reduction can be fast enough without limiting the global production volume as measured in total wafer area.
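As a rough guide to the decomposition behind this session (an illustrative sketch only; the symbols used here are our own shorthand, not necessarily the speakers' notation), the manufacturing footprint can be written as a product of the levers listed above:

```latex
% Kaya-style decomposition of the chip-manufacturing carbon footprint
% (illustrative; the symbols V, E_i, C_i and O are assumptions for this sketch)
\mathrm{CF}_{\mathrm{fab}}\,[\mathrm{kgCO_2e}]
  \;=\;
  \underbrace{V\,[\mathrm{cm^2}]}_{\text{production volume}}
  \times
  \underbrace{E_i\,[\mathrm{MJ/cm^2}]}_{\text{energy intensity}}
  \times
  \underbrace{C_i\,[\mathrm{kgCO_2e/MJ}]}_{\text{carbon intensity of energy}}
  \;-\;
  \underbrace{O\,[\mathrm{kgCO_2e}]}_{\text{offsets / enabled reductions elsewhere}}
```

At the 8%/year pace mentioned above, the product of these levers would have to shrink by a factor of 0.92 every year, i.e. to roughly 0.92^10 ≈ 0.43 of its current value within a decade, which illustrates why the session questions whether efficiency gains and decarbonization alone can be fast enough without acting on the volume term.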
Time | Label | Presentation Title Authors |
---|---|---|
08:30 CET | FS7.1 | HOW REALISTIC ARE CLAIMS ABOUT THE BENEFITS OF USING DIGITAL TECHNOLOGIES FOR GHG EMISSIONS MITIGATION? Presenter: Sophie Quinton, Inria, FR Author: Sophie Quinton, Inria, FR Abstract While the direct environmental impacts of digital technologies are now well documented, it is often said that they could also help reduce greenhouse gas (GHG) emissions significantly in many domains such as transportation, building, manufacturing, agriculture, and energy. Assessing such claims is essential to avoid delaying alternative action or research. This also applies to related claims about how much GHG emissions existing digital technologies are already avoiding. In this talk, we point out critical issues related to these topics in the state of the art and propose a set of guidelines that all studies on digital solutions for mitigating GHG emissions should satisfy. |
08:55 CET | FS7.2 | FROM SILICON SHIELD TO CARBON LOCK-IN? THE ENVIRONMENTAL FOOTPRINT OF ELECTRONIC COMPONENTS MANUFACTURING IN TAIWAN Presenter: Gauthier Roussilhe, Royal Melbourne Institute of Technology, AU Author: Gauthier Roussilhe, Royal Melbourne Institute of Technology, AU Abstract Taiwan plans to rapidly increase its industrial production capacity of electronic components while concurrently setting policies for its ecological transition. Given that the island is responsible for the manufacturing of a significant part of worldwide electronics components, the sustainability of the Taiwanese electronics industry is therefore of critical interest. Beyond relative efficiency gains this talk will present an assessment of the absolute environmental footprint of electronic components manufacturers, and its trend, at the national scale. Putting this assessment in perspective with geopolitical, energy and economic factors, this talk analyses what it means for Taiwan and for other countries pursuing development of a sub-10nm CMOS industrial landscape. |
09:20 CET | FS7.3 | ASSESSING THE ENVIRONMENTAL IMPACTS OF IC DEVICES THROUGH KAYA DECOMPOSITION: TRENDS AND IMPLICATIONS Presenter: Lieven Eeckhout, UGent, BE Author: Lieven Eeckhout, UGent, BE Abstract This talk will reformulate the well-known Kaya identity to understand the environmental and carbon footprint of manufacturing and using integrated circuits. By making a distinction between embodied and operational carbon emissions, we are able to understand (1) how the global carbon footprint of computing is likely to scale in the future, and (2) what we, as computer engineers, can do to reduce the environmental impact of computing. We conclude that computer engineers should first and foremost design smaller chips; reducing lifetime energy consumption is of secondary importance, yet still significant. |
09:45 CET | FS7.4 | WRAP-UP AND PERSPECTIVES FOR THE EUROPEAN CHIPS ACT Presenter: David Bol, Université catholique de Louvain, BE Author: David Bol, Université catholique de Louvain, BE Abstract Building on the scientific evidence from the three previous talks, we will discuss the context of planetary boundaries and GHG reduction pathways for the European Chips Act, highlighting the tension between efficiency, resiliency and sobriety. We will clarify that if the EU wants to reduce the carbon footprint of its electronic component sector, the additional semiconductor manufacturing capacity fostered by the European Chips Act should be deployed as a substitution (replacement) for some capacity in the rest of the world and not as an addition to it. |
FS8 Focus session: Supporting Design in the EU: the “Chips for Europe” Initiative
Add this session to my calendar
Date: Wednesday, 19 April 2023
Time: 08:30 CET - 10:00 CET
Location / Room: Gorilla Room 1.5.3
Time | Label | Presentation Title Authors |
---|---|---|
08:30 CET | FS8.1 | INTERACTIVE SESSION Presenter: Marco CECCARELLI, European Commission, BE Authors: Marco CECCARELLI1 and Romano Hoofman2 1European Commission, BE; 2IMEC IC-link, BE Abstract The European Chips Act foresees over EUR 11 billion of public support for the "Chips for Europe" Initiative. Chip design is one of its key priorities. The initiative encompasses 5 lines of action: a cloud-based virtual design platform; pilot lines for prototyping and validation; tools and infrastructures for quantum chips; skills development and competence centres; and a Chips Fund offering loans and equity investment solutions. This open, interactive session will focus particularly on the development of the envisaged virtual design platform, which will offer easy cloud-based access to tools, libraries and support services to accelerate development and reduce time-to-market. The new platform will build upon the successful experience of EUROPRACTICE, offering access to IC services, prototyping and fabrication. Further, it aims at enhancing collaboration among stakeholders for the development of European technology, IP and tools, including open-source. Interventions from the audience are encouraged, to exchange views on how the proposed platform can lower entry barriers, stimulate IP creation and exchange, and accelerate innovation. We encourage all interested parties to contribute to the discussion, thereby helping to shape this initiative to the benefit of the European design ecosystem. |
M05 NVMExplorer: A Framework for Cross-Stack Comparisons of Embedded, Non-Volatile Memory Solutions
Add this session to my calendar
Date: Wednesday, 19 April 2023
Time: 08:30 CET - 10:00 CET
Location / Room: Toucan Room 2.7.1/2
Organisers:
Lillian Pentecost, Amherst College, US
Alexander Hankin, Intel Labs, US
Marco Donato, Tufts University, US
Mark Hempstead, Tufts University, US
David Brooks, Harvard University, US
Gu-Yeon Wei, Harvard University, US
The wide adoption of data-intensive algorithms to tackle today’s computational problems introduces new challenges in designing efficient computing systems to support these applications. In critical domains such as machine learning and graph processing, data movement remains a major performance and energy bottleneck. As repeated memory accesses to off-chip DRAM impose an overwhelming energy cost, we need to rethink the way embedded (i.e., on-chip) memory systems are built in order to increase storage density and energy efficiency beyond what is currently possible with SRAM. To address these challenges and empower future memory system design, we developed NVMExplorer: a design space exploration framework that addresses key cross-computing-stack design questions and reveals opportunities and optimizations for embedded NVMs under realistic system-level constraints, while providing a flexible interface and modular evaluation to empower further investigations.
This tutorial will describe and walk through hands-on design studies using our open-source code base (NVMExplorer, http://nvmexplorer.seas.harvard.edu/), highlighting the most up-to-date features of our suite of tools, including integration with additional memory characterization tools and system simulator results. We will also guide attendees to configure their own design studies based on their research interests.
At the end of this tutorial, attendees will be able to use NVMExplorer to evaluate and compare the application-level power and performance impact of a variety of eNVM solutions, including different technology configurations, varying system settings and optimization targets, and a range of application memory traffic patterns.
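To give a flavour of the kind of cross-stack trade-off the tutorial explores, the minimal sketch below sweeps a few hypothetical eNVM technology operating points against application traffic patterns. All numbers, names and the simplified power model are illustrative assumptions; this is not the NVMExplorer API or its characterization data, for which the tutorial and the documentation linked above are the reference.

```python
# Toy cross-stack sweep: candidate eNVM technologies vs. application traffic.
# NOTE: hypothetical values and class names; NOT the actual NVMExplorer interface.
from dataclasses import dataclass

@dataclass
class MemTech:
    name: str
    read_pj_per_bit: float   # read energy (pJ/bit), assumed value
    write_pj_per_bit: float  # write energy (pJ/bit), assumed value
    leakage_mw: float        # standby leakage (mW), assumed value

@dataclass
class Workload:
    name: str
    reads_per_s: float       # bits read per second
    writes_per_s: float      # bits written per second

def power_mw(tech: MemTech, wl: Workload) -> float:
    """Total memory power (mW) = dynamic read/write power + leakage."""
    dyn_mw = (wl.reads_per_s * tech.read_pj_per_bit +
              wl.writes_per_s * tech.write_pj_per_bit) * 1e-9  # pJ/s -> mW
    return dyn_mw + tech.leakage_mw

techs = [
    MemTech("SRAM-like", 0.2, 0.2, 5.0),
    MemTech("STT-RAM-like", 0.3, 1.5, 0.5),
    MemTech("RRAM-like", 0.4, 5.0, 0.1),
]
workloads = [
    Workload("read-heavy DNN inference", 5e9, 1e8),
    Workload("write-heavy graph update", 1e9, 1e9),
]

for wl in workloads:
    best = min(techs, key=lambda t: power_mw(t, wl))
    print(f"{wl.name}: best candidate = {best.name} "
          f"({power_mw(best, wl):.2f} mW)")
```

The hands-on studies in the session replace this toy model with the framework's full characterization data and system simulator results, as described above.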
SpD3 Special day on Personalised Medicine: Biological Computing
Add this session to my calendar
Date: Wednesday, 19 April 2023
Time: 08:30 CET - 10:00 CET
Location / Room: Darwin Hall
Session chair:
Jan Madsen, TU Denmark, DK
Time | Label | Presentation Title Authors |
---|---|---|
08:30 CET | SpD3.1 | DESIGN OF A MICROFLUIDIC-BASED COMPUTING CHIP USING BACTERIA Presenter: Daniel Martins, Walton Institute for Information and Communication Systems Science, IE Author: Daniel Martins, Walton Institute for Information and Communication Systems Science, IE Abstract Biocomputing systems are being designed, using nanoscale and biologically based materials, to execute computational functions required by novel medical and environmental diagnostic tools. The complexity of such applications has required the use of biomolecular systems, e.g. bacteria, to provide more information-processing capabilities to these biocomputing systems. Therefore, we propose a bacteria-based system that can be flexibly integrated into a microfluidic chip to compute with biomolecules. This design combines electrochemical sensors, microfluidics and molecular communications to compute with molecules emitted by bacterial populations. We assess the performance of the proposed biocomputing chip in terms of reliable logic computing, based on the modelling of the communication processes occurring in this system and the detection thresholds of the electrochemical sensors. Our results show the impact of two bottlenecks of the proposed biocomputing system and lay the foundation for future bacteria-based diagnostic devices. |
09:00 CET | SpD3.2 | EFFICIENT AND LOW-COST METHODS TO DESIGN, TEST, AND OPERATE MICROFLUIDIC SYSTEMS Speaker: Tsun-Ming Tseng, TU Munich, DE Author: Ulf Schlichtmann, TU Munich, DE Abstract Microfluidics is widely considered to be a powerful lab-on-a-chip platform, the applications of which nowadays range from genome sequencing to wearable devices. Point-of-care diagnostic approaches can benefit significantly from microfluidics technology. The design of microfluidic systems, nevertheless, is still mainly carried out manually, based on the designer's experience and knowledge. As microfluidic designs become more complex and intricate, design automation of microfluidics clearly has the potential to improve both design productivity and quality of results. This talk features our research at TUM on how to design, test, and operate microfluidic systems efficiently and at low cost. |
09:30 CET | SpD3.3 | DEVELOPING BIOLOGICAL AI THROUGH GENE REGULATORY NEURAL NETWORK MODEL Presenter: Sasitharan Balasubramaniam, University of Nebraska-Lincoln, US Author: Sasitharan Balasubramaniam, University of Nebraska-Lincoln, US Abstract Artificial Intelligence (AI) and Machine Learning (ML) are weaving their way into the fabric of society, where they are playing a crucial role in numerous facets of our lives. As we witness the increased deployment of AI and ML in various types of devices, we benefit from their use in learning and interpreting information and providing key decision making. This widespread deployment has led to the question of whether AI algorithms can be deployed into non-silicon devices and materials. Recent research has seen the emergence of Biological AI, where perceptron and neural network properties are formed from biological cells, ranging from the engineering of genetic circuits for creating single perceptrons to population-based communication of cells that leads to neural network behavior. This talk will start with a brief introduction to the current state of the art in Biological AI based on engineered systems. We will then investigate whether non-engineered bacterial cells can also be exploited through the natural neural network structures found in their gene regulation networks. Through the controlled application of chemical agents that control the operation of the GRN, we may be able to exploit a neural network function. We will also briefly touch on other forms of ion-based molecular communication that could possibly be used to develop perceptron models. The talk will provide preliminary results on this research and discuss possible healthcare applications for the future. |
W05 Hyperdimensional Computing and Vector Symbolic Architectures for Automation and Design in Technology and Systems
Add this session to my calendar
Date: Wednesday, 19 April 2023
Time: 08:30 CET - 12:30 CET
Location / Room: Nightingale Room 2.6.1/2
Organiser:
Antonello Rosato, Sapienza University of Rome, IT
We are pleased to announce the workshop titled “Hyperdimensional Computing and Vector Symbolic Architectures for Automation and Design in Technology and Systems”, to be held on Wednesday, 19 April 2023 at the DATE 2023 conference in Antwerp, Belgium. This workshop will explore the interplay between different technologies and architectures that enable the automation of complex systems and the design of efficient and effective solutions for real-world problems.
The objective of this workshop is to bring together leaders in the fields of hyperdimensional computing, vector symbolic architectures, and automation and design in technology and systems to discuss the latest developments and challenges in these areas, and to identify synergies between them.
We invite contributions from researchers, practitioners, industry, and government representatives who are working in the areas of hyperdimensional computing, vector symbolic architectures, and automation and design in technology and systems.
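For readers unfamiliar with the paradigm, the following minimal sketch illustrates the core HDC/VSA operations (binding, bundling and similarity search) using random bipolar hypervectors; this is one common VSA model among several, chosen purely for illustration, and is not tied to any particular workshop contribution.

```python
# Minimal sketch of core HDC/VSA operations with random bipolar hypervectors.
import numpy as np

rng = np.random.default_rng(0)
D = 10_000  # hypervector dimensionality

def hv():
    """Random bipolar hypervector in {-1, +1}^D."""
    return rng.choice([-1, 1], size=D)

def bind(a, b):
    """Binding (element-wise multiplication): associates two concepts."""
    return a * b

def bundle(*vs):
    """Bundling (element-wise majority of the sum): superposes several vectors."""
    return np.sign(np.sum(vs, axis=0))

def sim(a, b):
    """Normalized similarity (cosine-like measure for bipolar vectors)."""
    return float(a @ b) / D

# Encode the record {colour: red, shape: square} as a single hypervector.
colour, red, shape, square = hv(), hv(), hv(), hv()
record = bundle(bind(colour, red), bind(shape, square))

# Unbinding with the 'colour' key recovers something close to 'red'
# (binding is its own inverse for bipolar vectors).
query = bind(record, colour)
print("sim(query, red)    =", round(sim(query, red), 2))     # high (~0.5)
print("sim(query, square) =", round(sim(query, square), 2))  # near 0
```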
We are issuing a call for poster presentations to be presented at the workshop. The topics of interest include, but are not limited to:
- Automation and Design of technology and systems based on HDC/VSA
- New implementation of HDC and VSA concepts
- Advances in HDC and VSA use in practical applications
- Use of HDC and VSA in the field of ML and AI
Submissions are invited in the form of (extended) abstracts not exceeding two pages and must be submitted via EasyChair following this link:
https://easychair.org/my/conference?conf=w05hdc
Submissions should present innovative research and development results related to the topics listed above. Poster presentations should be self-contained, and include a brief abstract, key references, and contact information. The submission deadline is February 3rd, 2023. All accepted posters will be presented at the workshop.
Key Dates:
Submission deadline: February 10th EXTENDED
Notification of Acceptance: February 13th
Posters ready: March 27th
Workshop: April 19th 8:30-12:30
Technical Program
8:30-10 Invited Speakers
10-10:30 Coffee break
10:30 -12:30 Poster Session and Open Discussion
We look forward to your contributions and can’t wait to meet you for discussion!
If you have any questions, please email antonello[dot]rosato[at]uniroma1[dot]it.
W05.1 Invited Talks
Add this session to my calendar
Date: Wednesday, 19 April 2023
Time: 08:30 CET - 10:00 CET
Location / Room: Nightingale Room 2.6.1/2
Chair:
Antonello Rosato, Sapienza University of Rome, IT
Speakers:
Abbas Rahimi, IBM Research-Zurich, CH
Denis Kleyko, RISE Research Institutes of Sweden, SE
W05.2 Invited Talk
Add this session to my calendar
Date: Wednesday, 19 April 2023
Time: 10:30 CET - 11:00 CET
Location / Room: Nightingale Room 2.6.1/2
Chair:
Antonello Rosato, Sapienza University of Rome, IT
Speaker:
Peer Neubert, University of Koblenz, DE
W05.3 Poster Session
Add this session to my calendar
Date: Wednesday, 19 April 2023
Time: 11:00 CET - 12:30 CET
Location / Room: Nightingale Room 2.6.1/2
Chair:
Antonello Rosato, Sapienza University of Rome, IT
M06 Design, Programming, and Partial Reconfiguration of Heterogeneous SoCs with ESP
Add this session to my calendar
Date: Wednesday, 19 April 2023
Time: 11:00 CET - 12:30 CET
Location / Room: Toucan Room 2.7.1/2
Organisers:
Luca Carloni, Columbia University, US
Joseph H. Zuckerman, Columbia University in the City of New York, US
Energy-efficient, high-performance computing requires the integration of specialized accelerators with general-purpose processors. Designing such systems, however, imposes a difficult set of challenges: integrating many components of different natures into a single SoC; designing new components targeting a particular application domain with a limited team size; dealing with ever-changing software; accelerating multiple applications with a fixed area and power budget. In this tutorial, we present ESP, an open-source platform to support research on the design and programming of heterogeneous SoC architectures. By combining a scalable, modular tile-based architecture with a flexible system-level design methodology, ESP simplifies the design of individual accelerators and automates their hardware/software integration into complete SoCs. In particular, we demonstrate several capabilities of ESP to meet the challenges described above. First, we show how to use the commercial Catapult HLS tool and the open-source Matchlib library to design an accelerator in SystemC; this is a new example of one of the design flows supported by the ESP methodology that simultaneously raise the level of abstraction in the design process and allow designers to conduct a broader design-space exploration. Next, we demonstrate how ESP simplifies the integration of the accelerator into a complete SoC and enables its functional and performance evaluation through rapid FPGA-based prototyping. Finally, we show how recent advances in ESP make it possible to reduce the amount of dark silicon in SOC architectures through fine-grained partial reconfiguration of accelerator tiles.
For more information please see:
- the ESP release on GitHub: https://github.com/sld-columbia/esp
- the ESP documentation: https://www.esp.cs.columbia.edu/docs/
- the ESP publications: https://www.esp.cs.columbia.edu/pubs/
- the ESP tutorials: https://www.esp.cs.columbia.edu/tutorials/
MPP3 Multi-partner projects
Add this session to my calendar
Date: Wednesday, 19 April 2023
Time: 11:00 CET - 12:30 CET
Location / Room: Gorilla Room 1.5.3
Session chair:
Ernesto Sanchez, Politecnico di Torino, IT
Time | Label | Presentation Title Authors |
---|---|---|
11:00 CET | MPP3.1 | THE TEAMPLAY PROJECT: ANALYSING AND OPTIMISING TIME, ENERGY, AND SECURITY FOR CYBER-PHYSICAL SYSTEMS Speaker: Benjamin Rouxel, Unimore, IT Authors: Benjamin Rouxel1, Christopher Brown2, Emad Ebeid3, Kerstin Eder4, Heiko Falk5, Clemens Grelck6, Jesper Holst7, Shashank Jadhav5, Yoann Marquer8, Marcos Martinez De Alejandro9, Kris Nikov4, Ali Sahafi3, Ulrik Schultz3, Adam Seewald10, Vangelis Vassalos11, Simon Wegener12 and Olivier Zendra13 1Unimore, IT; 2University of St.Andrews, GB; 3University of Southern Denmark, DK; 4University of Bristol, GB; 5Hamburg University of Technology (TUHH), DE; 6University of Amsterdam, NL; 7SkyWatch A/S, DK; 8University of Luxembourg, LU; 9Thales Alenia Space, ES; 10Yale University, US; 11Irida Labs AE, GR; 12AbsInt Angewandte Informatik GmbH, DE; 13INRIA, University of Rennes, CNRS, IRISA, FR Abstract Non-functional properties such as energy, time, and security (ETS) are becoming increasingly important in Cyber-Physical Systems (CPS) programming. This article describes TeamPlay, a research project funded under the EU Horizon 2020 programme between January 2018 and June 2021. TeamPlay aimed to provide the system designer with a toolchain for developing embedded applications where ETS properties are first-class citizens, allowing the developer to reflect directly on energy, time and security properties at the source code level. In this paper we give an overview of the TeamPlay methodology, introduce the challenges and solutions of our approach and summarise the results achieved. Overall, applying our TeamPlay methodology led to improvements of up to 18% in performance and 52% in energy usage over traditional approaches. |
11:03 CET | MPP3.2 | HARDWARE AND SOFTWARE SUPPORT FOR MIXED PRECISION COMPUTING: A ROADMAP FOR EMBEDDED AND HPC SYSTEMS Speaker: William Fornaciari, Politecnico di Milano, IT Authors: William Fornaciari, Giovanni Agosta, Davide Zoni, Andrea Galimberti, Gabriele Magnani, Lev Denisov and Daniele Cattaneo, Politecnico di Milano, IT Abstract Mixed precision is an approximate computing technique that can be used to trade off computation accuracy for performance and/or energy. It can be applied to many error-tolerant applications, but manual precision tuning is both tedious and error-prone. Furthermore, the effectiveness of the technique heavily depends on hardware characteristics. Therefore, a hardware/software co-design approach is necessary for an effective exploitation of the precision tuning opportunities offered by the applications. In this paper, we propose, based on the state of the art of precision tuning software and mixed precision hardware, a roadmap for the evolution of hardware designs and compiler-based precision tuning support, which is ongoing in the context of the European projects TEXTAROSSA (EuroHPC) and APROPOS (ITN). |
11:06 CET | MPP3.3 | REAL TIME ACOUSTIC PERCEPTION FOR AUTOMOTIVE APPLICATIONS Speaker: Jun Yin, KU Leuven, BE Authors: Jun Yin1, Stefano Damiano1, Marian Verhelst1, Toon Waterschoot1 and Andre Guntoro2 1KU Leuven, BE; 2Robert Bosch GmbH, DE Abstract In recent years the automotive industry has been strongly promoting the development of smart cars equipped with multi-modal sensors to gather information about the surroundings, in order to aid human drivers or make autonomous decisions. While the focus has mostly been on visual sensors, acoustic events are also crucial to detect situations that require a change in the driving behavior, such as a car honking or the sirens of approaching emergency vehicles. In this paper, we summarize the results achieved so far in the Marie Skłodowska-Curie Actions (MSCA) European Industrial Doctorates (EID) project "Intelligent Ultra Low-Power Signal Processing for Automotive (I-SPOT)". On the algorithmic side, the I-SPOT project aims to enable detecting, localizing and tracking environmental audio signals by jointly developing microphone array processing and deep learning techniques that specifically target automotive applications. Data generation software has been developed to cover the I-SPOT target scenarios and research challenges. This tool is currently being used to develop low-complexity deep learning techniques for emergency sound detection. On the hardware side, the goal is to develop workflows for hardware-algorithm co-design that ease the generation of architectures that are sufficiently flexible towards algorithmic evolutions without giving up efficiency, and that enable rapid feedback on the hardware implications of algorithmic decisions. This is pursued through a hierarchical workflow that breaks the hardware-algorithm design space into reasonable subsets, which has been tested for operator-level optimizations on state-of-the-art robust sound source localization for edge devices. Further, several open challenges towards an end-to-end system are clarified for the next stage of I-SPOT. |
11:09 CET | MPP3.4 | HERMES: QUALIFICATION OF HIGH PERFORMANCE PROGRAMMABLE MICROPROCESSOR AND DEVELOPMENT OF SOFTWARE ECOSYSTEM Speaker: Fabrizio Ferrandi, Politecnico di Milano, IT Authors: Nadia Ibellaatti1, Edouard LEPAPE1, Alp Kilic1, Kaya AKYEL1, Kassem CHOUAYAKH1, Fabrizio Ferrandi2, Claudio Barone2, Serena Curzel2, Michele Fiorito2, Giovanni Gozzi2, Miguel MASMANO3, Ana Risquez Navarro3, Manuel Munoz3, Vicente Nicolau Gallego3, Patricia LOPEZ CUEVA4, Jean-noel LETRILLARD5 and Franck WARTEL6 1NanoXplore, FR; 2Politecnico di Milano, IT; 3FENT INNOVATIVE SOFTWARE SOLUTIONSSL - FENTISS, ES; 4THALES ALENIA SPACE FRANCE SAS, FR; 5STMICROELECTRONICS GRENOBLE 2 SAS - STGNB 2 SAS, FR; 6AIRBUS DEFENCE AND SPACE SAS, FR Abstract European efforts to boost competitiveness in the sector of space services promote the research and development of advanced software and hardware solutions. The EU-funded HERMES project contributes to the effort by qualifying radiation-hardened, high-performance programmable microprocessors, and by developing a software ecosystem that facilitates the deployment of complex applications on such platforms. The main objectives of the project include reaching a technology readiness level of 6 (i.e., validated and demonstrated in relevant environment) for the rad-hard NG-ULTRA FPGA with its ceramic hermetic package CGA 1752, developed within projects of the European Space Agency, French National Centre for Space Studies and the European Union. An equally important share of the project is dedicated to the development and validation of tools that support multicore software programming and FPGA acceleration, including Bambu for High-Level Synthesis and the XtratuM hypervisor with a level one boot loader for virtualization. |
11:12 CET | MPP3.5 | A STEP TOWARD SAFE UNATTENDED TRAIN OPERATIONS: A PIONEER VITAL CONTROL MODULE Speaker: Grazia Mascellaro, Politecnico di Bari, IT Authors: Giovanni Mezzina1, Arturo Amendola2, Mario Barbareschi3, Salvatore De Simone2, Grazia Mascellaro1, Alberto Moriconi2, Cataldo Luciano Saragaglia1, Diana Serra2 and Daniela De Venuto1 1Politecnico di Bari, IT; 2Rete Ferroviaria Italiana S.p.A., IT; 3Università Degli Studi di Napoli Federico II, IT Abstract Although Automatic Train Operation (ATO) is well established in urban railways, its use on mainlines is still unexplored. Currently, the first prototypes of trains with ATO capable of running on mainlines equipped with specific control systems (e.g., ETCS/ERTMS in Europe) have been realized. However, they require the active presence of staff on board. Recent research into innovative solutions for railway efficiency has opened the possibility of extending the ATO concept to Unattended Train Operation (UTO), i.e., the full automation of infrastructures and vehicles. In this context, a project based on a synergistic collaboration between academia and the national railway industry has led to the definition of a new Vital Control module (VC). The VC includes a PCB managed by a reliable and safe hard Real-Time Operating System (RTOS). The hardware consists of a Eurocard-sized PCB that houses an Ultrazed-EG System on Module as its computing core and embeds several communication interfaces to facilitate integration into existing apparatus. The VC RTOS runs an application logic that acts as a real-time control core for assessing the operational status of the on-cabin equipment. The VC is also responsible for detecting UTO-related hazardous situations and intervening with emergency braking. Both the VC hardware and software are developed to be compliant with the related safety standards. The proposed VC has been included in an automatic testbed to recreate real-time hazardous scenarios. In this context, the VC system has proven able to mitigate these scenarios ~2 times faster than the current ATO protection system. |
11:15 CET | MPP3.6 | THE POST-PANDEMIC EFFECTS ON IOT FOR SAFETY: THE SAFE PLACE PROJECT Speaker: Luigi Capogrosso, Università di Verona, IT Authors: Federico Cunico1, Luigi Capogrosso1, Alberto Castellini2, Francesco Setti1, Patrik Pluchino3, Filippo Zordan3, Valeria Santus3, Anna Spagnolli3, Stefano Cordibella4, Giambattista Gennari5, Alberto Sozza6, Stefano Troiano1, Roberto Flor1, Andrea Zanella3, Alessandro Farinelli1, Luciano Gamberini3 and Marco Cristani1 1Università di Verona, IT; 2Verona University, IT; 3University of Padua, IT; 4EDALAB s.r.l., IT; 5Motorola Solutions, IT; 6Rete di Impresa Luce in Veneto, IT Abstract COVID-19 had substantial effects on the part of the IoT community that designs systems for safety: the need to detect face masks worn by everyone, the analysis of crowds to avoid the spread of the disease, and the sanitization of public environments have led to exceptional research acceleration and fast engineering of the related solutions. Now that the pandemic is waning, some applications are becoming less important, while others are proving to be useful regardless of the criticality of COVID-19. The SAFE PLACE project is a prime example of this situation (DATE23 MPP category: final stage). SAFE PLACE is an Italian 3M euro regional industrial/academic project, financed by European funds, created to ensure a concerted multidisciplinary reaction to COVID-19 in critical environments such as rest homes and public places. The SAFE PLACE consortium was able to understand what is no longer useful in this post-pandemic period, and what instead is potentially attractive for the market. For example, the detection of face masks now has little importance, while sanitization remains highly relevant. This paper shares this analysis, which emerged through the co-design process of three public SAFE PLACE project demonstrators, involving heterogeneous stakeholders ranging from scientists to lawyers. |
11:18 CET | MPP3.7 | INTERACTIVE TECHNICAL PRESENTATIONS BY THE AUTHORS Speaker: Authors of the session, DATE, BE Author: Session Chairs, DATE, BE Abstract Participants can freely interact with authors during their interactive technical presentations. |
SA2 Application specific circuits and systems
Add this session to my calendar
Date: Wednesday, 19 April 2023
Time: 11:00 CET - 12:30 CET
Location / Room: Marble Hall
Session chair:
Akash Kumar, TU Dresden, DE
11:00 CET until 11:21 CET: Pitches of regular papers
11:21 CET until 12:30 CET: Interactive technical presentations by the authors of regular papers and extended abstracts
Regular Papers
Time | Label | Presentation Title Authors |
---|---|---|
11:00 CET | SA2.1 | A DECENTRALIZED FRONTIER QUEUE FOR IMPROVING SCALABILITY OF BREADTH-FIRST-SEARCH ON GPUS Speaker: Chou-Ying Hsieh, National Taiwan University, TW Authors: Chou-Ying Hsieh, Po-Hsiu Cheng, Chia-Ming Chang and Sy-Yen Kuo, National Taiwan University, TW Abstract The breadth-first-search (BFS) algorithm is a fundamental building block of a broad range of applications, from the electronic design automation (EDA) field to social network analysis. With target data set sizes growing considerably, researchers have turned to developing parallel BFS (PBFS) algorithms and accelerating them with graphics processing units (GPUs). The frontier queue, the core idea of state-of-the-art PBFS designs, opens the door to neighbor-visiting parallelism. However, the traditional centralized frontier queue in PBFS suffers from dramatic collisions when many threads simultaneously operate on it. Furthermore, the growing size of the graph puts considerable pressure on memory space. Therefore, we first identify the challenges of current frontier queue implementations. To solve these challenges, we propose the decentralized frontier queue (DFQ), which separates a centralized queue into multiple tiny sub-queues to scatter the atomic operation collisions across these queues. We also develop novel overflow-free enqueue and asynchronous sub-queue drain methods to avoid the overflow issue of the naive sub-queue design. With these two optimizations, the memory consumption of the frontier queue can be constant rather than growing exponentially with the number of vertices in the graph. In our experiments, we show that our design achieves better scalability and gains an average 1.04x execution speedup on the selected benchmark suite, with considerable memory space efficiency. |
11:03 CET | SA2.2 | TIMELY FUSION OF SURROUND RADAR/LIDAR FOR OBJECT DETECTION IN AUTONOMOUS DRIVING SYSTEMS Speaker: Wenjing Xie, City University of Hong Kong, CN Authors: Wenjing Xie1, Tao Hu1, Neiwen Ling2, Guoliang Xing2, Shao-Shan Liu3 and Nan Guan1 1City University of Hong Kong, HK; 2The Chinese University of Hong Kong, HK; 3BeyonCa, CN Abstract Fusion of multiple sensor modalities, such as camera, Lidar and Radar, is commonly used in autonomous driving systems to fully utilize the complementary advantages of different sensors. Surround Radar/Lidar can provide 360-degree view sampling at minimal cost, making them promising sensing hardware solutions for autonomous driving systems. However, due to intrinsic physical constraints, the rotating speed (i.e., the frequency at which data frames are generated) of surround Radar is much lower than that of surround Lidar, and existing Radar/Lidar fusion methods have to work at the low frequency of surround Radar, which cannot meet the high responsiveness requirement of autonomous driving systems. This paper develops techniques to fuse surround Radar/Lidar with a working frequency limited only by the faster surround Lidar instead of the slower surround Radar, based on the state-of-the-art Radar/Lidar DNN model MVDNet. The basic idea of our approach is simple: we let MVDNet work with temporally unaligned data from Radar/Lidar, so that fusion can take place whenever a new Lidar data frame arrives, instead of waiting for the slow Radar data frame. However, directly applying MVDNet to temporally unaligned Radar/Lidar data greatly degrades its object detection accuracy. The key insight revealed in this paper is that we can achieve a high output frequency with little accuracy loss by enhancing the training procedure to exploit the temporal redundancy in the fusion procedure of MVDNet so that it can tolerate the temporal misalignment of the input data. We explore several different ways of training enhancement and compare them quantitatively with experiments. |
11:06 CET | SA2.3 | A LIGHTWEIGHT AND ADAPTIVE CACHE ALLOCATION SCHEME FOR CONTENT DELIVERY NETWORKS Speaker: Ke Liu, Wuhan National Laboratory for Optoelectronics, CN | Huazhong University of Science & Technology, CN Authors: Ke Liu1, Hua Wang2, Ke Zhou1 and Cong Li3 1Wuhan National Laboratory for Optoelectronics (WNLO) of Huazhong University of Science and Technology (HUST), CN; 2Huazhong University of Science & Technology, CN; 3Tencent, CN Abstract Content delivery network (CDN) caching systems use multi-tenant shared caching due to its operational simplicity. However, this approach often results in significant space waste and requires timely space allocation. On the one hand, the accuracy and reliability of existing static allocation schemes are not high. On the other hand, due to the large number of tenants in CDNs, dynamic allocation schemes that rely on miss ratio curves (MRCs) for cache space allocation cause high computational overheads and performance fluctuations. As a result, none of these existing solutions can be used directly in a CDN caching system. In this paper, we propose a lightweight and adaptive cache allocation scheme for CDNs (LACA). Rather than computing near-optimal configurations for each tenant, LACA detects in real time whether any tenants are using cache space inefficiently, and then constructs local MRCs for those tenants. Finally, the space to be adjusted is calculated from the local MRCs. We have deployed LACA in Company-T's CDN system, where it reduces the miss ratio by 27.1% and the average user access latency by 28.5 ms. LACA is then compared with several state-of-the-art schemes in terms of the accuracy of the constructed local MRCs. Experimental results demonstrate that LACA constructs higher-accuracy local MRCs with little overhead. In addition, LACA can adjust the space as frequently as once a minute. |
11:09 CET | SA2.4 | TBERT: DYNAMIC BERT INFERENCE WITH TOP-K BASED PREDICTORS Speaker: Zejian Liu, Chinese Academy of Sciences, CN Authors: Zejian Liu1, Kun Zhao2 and Jian Cheng2 1Chinese Academy of Sciences, CN; 2Institute of Automation, CN Abstract Dynamic inference is a compression method that adaptively prunes unimportant components according to the input at the inference stage, which can achieve a better trade-off between computational complexity and model accuracy than static compression methods. However, there are two limitations in previous works. The first is that they usually need to search for a threshold on the evaluation dataset to achieve the target compression ratio, and this search process is non-trivial. The second is that these methods are unstable: their performance degrades significantly on some datasets, especially when the compression ratio is high. In this paper, we propose TBERT, a simple yet stable dynamic inference method. TBERT utilizes a top-k-based pruning strategy, which allows accurate control of the compression ratio. To enable stable end-to-end training of the model, we carefully design the structure of the predictor. Moreover, we propose adding auxiliary classifiers to aid the model's training. Experimental results on the GLUE benchmark demonstrate that our method achieves higher performance than previous state-of-the-art methods. |
11:12 CET | SA2.5 | TOKEN ADAPTIVE VISION TRANSFORMER WITH EFFICIENT DEPLOYMENT FOR FINE-GRAINED IMAGE RECOGNITION Speaker: Chonghan Lee, Pennsylvania State University, US Authors: Chonghan Lee1, Rita Brufau2, Ke Ding2 and Vijaykrishnan Narayanan1 1Pennsylvania State University, US; 2Intel Labs, US Abstract Fine-grained Visual Classification (FGVC) aims to distinguish object classes belonging to the same category, e.g., different bird species or models of vehicles. The task is more challenging than ordinary image classification due to the subtle inter-class differences. Recent works proposed deep learning models based on the vision transformer (ViT) architecture with its self-attention mechanism to locate important regions of the objects and derive global information. However, deploying them on resource-restricted internet of things (IoT) devices is challenging due to their intensive computational cost and memory footprint. Energy and power consumption varies in different IoT devices. To improve their inference efficiency, previous approaches require manually designing the model architecture and training a separate model for each computational budget. In this work, we propose Token Adaptive Vision Transformer (TAVT) that dynamically drops out tokens and can be used for various inference scenarios across many IoT devices after training the model once. Our adaptive model can switch among different token drop configurations at run time, providing instant accuracy-efficiency trade-offs. We train a vision transformer with a progressive token pruning scheme, eliminating a large number of redundant tokens in the later layers. We then conduct a multi-objective evolutionary search with the overall number of floating point operations (FLOPs) as its efficiency constraint that could be translated to energy consumption and power to find the token pruning schemes that maximize accuracy and efficiency under various computational budgets. Empirical results show that our proposed TAVT dramatically speeds up the inference latency by up to 10x and reduces memory requirements and FLOPs by up to 5.5 x and 13x respectively while achieving competitive accuracy compared to prior ViT-based state-of-the-art approaches. |
11:15 CET | SA2.6 | END-TO-END OPTIMIZATION OF HIGH-DENSITY E-SKIN DESIGN: FROM SPIKING TAXEL READOUT TO TEXTURE CLASSIFICATION Speaker: Jiaqi Wang, KU Leuven, CN Authors: Jiaqi Wang, Mark Daniel Alea, Jonah Van Assche and Georges Gielen, KU Leuven, BE Abstract Spiking readout architectures are a promising low-power solution for high-density e-skins. This paper proposes the end-to-end model-based optimization of a high-density neuromorphic e-skin solution, from the taxel readout to the texture classification. Architectural explorations include the spike coding and preprocessing, and the neural network used for classification. Simple rate-coding preprocessing of spiking outputs from a modeled low-resolution on-chip spike encoder is demonstrated to achieve a comparable texture classification accuracy of 90% at lower power consumption compared to the state of the art. The modeling has also been extended from single-channel sensor recording to time-shifted multi-taxel readout. Applying this optimization to an actual tactile sensor array, the classification accuracy is boosted by 63% for a low-cost FFNN using multi-taxel data. The proposed Spike-based SNR (SSNR) and Spike Time Error (STE) metrics for the taxel readout circuitry are shown to be good predictors of the accuracy. |
11:18 CET | SA2.7 | TOWARDS DEEP LEARNING-BASED OCCUPANCY DETECTION VIA WIFI SENSING IN UNCONSTRAINED ENVIRONMENTS Speaker: Cristian Turetta, Università di Verona, IT Authors: Cristian Turetta1, Geri Skenderi1, Luigi Capogrosso1, Florenc Demrozi2, Philipp H. Kindt3, Alejandro Masrur4, Franco Fummi1, Marco Cristani1 and Graziano Pravadelli1 1Università di Verona, IT; 2Department of Electrical Engineering and Computer Science, University of Stavanger, NO; 3Lehrstuhl für Realzeit-Computersysteme (RCS), TU München (TUM), DE; 4TU Chemnitz, DE Abstract In the context of smart buildings and smart cities, the design of low-cost and privacy-aware solutions for recognizing the presence of humans and their activities is becoming of great interest. Existing solutions exploiting wearables and video-based systems have several drawbacks, such as high cost, low usability, poor portability, and privacy-related issues. Consequently, more ubiquitous and accessible solutions became the focus of attention, such as WiFi sensing. However, at the current state of the art, WiFi sensing is subject to low accuracy and poor generalization, primarily affected by environmental factors, such as humidity and temperature variations and furniture position changes. Such issues are partially solved at the cost of complex data preprocessing pipelines. In this paper, we present a highly accurate, resource-efficient occupancy detection solution based on deep learning, which is resilient to variations in humidity and temperature. The approach is tested on an extensive benchmark, where people are free to move and the furniture layout does change. In addition, based on a consolidated algorithm of explainable AI, we quantify the importance of the WiFi signal w.r.t. humidity and temperature for the proposed approach. Notably, humidity and temperature can indeed be predicted based on WiFi signals; this promotes the expressivity of the WiFi signal, and at the same time the need for a non-linear model to properly deal with it. |
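The top-k based pruning described in SA2.4 above boils down to keeping only the highest-scoring token positions, which fixes the compression ratio by construction. A minimal sketch of that selection step is given below; the predictor scores, tensor shapes and keep count are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def topk_prune(hidden, scores, k):
    """Keep only the k highest-scoring token positions (illustrative).

    hidden : (seq_len, dim) token representations
    scores : (seq_len,) importance scores from a (hypothetical) predictor
    k      : number of tokens to keep; directly controls the compression ratio
    """
    keep = np.argsort(scores)[-k:]   # indices of the k highest-scoring tokens
    keep.sort()                      # preserve the original token order
    return hidden[keep], keep

# Toy example: 8 tokens with 4-dimensional features.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(8, 4))
scores = rng.uniform(size=8)                     # stand-in for a learned predictor
pruned, kept = topk_prune(hidden, scores, k=4)   # 50% token compression by construction
print(kept, pruned.shape)
```

Because k is chosen explicitly, no threshold search over the evaluation set is needed, which is the property the abstract emphasises.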
Extended Abstracts
Time | Label | Presentation Title Authors |
---|---|---|
11:21 CET | SA2.8 | CONTENT- AND LIGHTING-AWARE ADAPTIVE BRIGHTNESS SCALING FOR IMPROVED MOBILE USER EXPERIENCE Speaker: Samuel Isuwa, University of Southampton, GB Authors: Samuel Isuwa1, David Amos2, Amit Kumar Singh3, Bashir Al-Hashimi1 and Geoff Merrett1 1University of Southampton, GB; 2University of Maiduguri, NG; 3University of Essex, GB Abstract For an improved user experience, the display subsystem is expected to provide superior resolution and optimal brightness despite its impact on battery life. Existing brightness scaling approaches set the display brightness statically or adaptively in response to predefined events such as low battery or the ambient light of the environment, which are independent of the displayed content. Approaches that consider the displayed content are either limited to video content or do not account for the user's expected battery life, thereby failing to maximise the user experience. This paper proposes content- and ambient lighting-aware adaptive brightness scaling in mobile devices that maximises user experience while meeting battery life expectations. The approach employs a content- and ambient lighting-aware profiler that learns and classifies each sample into predefined clusters at runtime by leveraging insights on user perceptions of content and ambient luminance variations. We maximise user experience through adaptive scaling of the display's brightness using an energy prediction model that determines appropriate brightness levels while meeting expected battery life. Evaluation of the proposed approach on a commercial smartphone shows a Quality of Experience (QoE) improvement of up to 24.5% compared to the state of the art. |
11:21 CET | SA2.9 | TOWARDS SMART CATTLE FARMS: AUTOMATED INSPECTION OF CATTLE HEALTH WITH REAL-LIFE DATA Speaker: Yigit Tuncel, University of Wisconsin-Madison, US Authors: Yigit Tuncel1, Toygun Basaklar1, Mackenzie Smithyman2, Vinicius Nunes de Gouvea3, Joao Dorea1, Younghyun Kim4 and Umit Ogras1 1University of Wisconsin - Madison, US; 2New Mexico State University, US; 3Texas A&M University, US; 4University of Wisconsin-Madison, US Abstract Cattle health problems, such as Bovine Respiratory Disease (BRD), constitute a significant source of economic loss for the agriculture industry. The current management practice for diagnosing and selecting cattle for treatment is a widespread clinical scoring system called DART (Depression, Appetite, Respiration, and Temperature). DART requires significant manual human labor since each animal is evaluated individually. We propose a novel wearable accelerometer-based IoT system that predicts the DART scores to provide real-time animal health monitoring and hence reduce the labor and costs associated with manual animal inspection and intervention. The proposed system first processes the accelerometer data to construct features that encode the cattle's daily behavior. Then, it uses a lightweight decision-tree classifier to predict the DART score. We evaluate our approach on a dataset that consists of accelerometer data and veterinarian-approved DART scores for 54 animals. According to the results, the proposed system can classify healthy and sick animals with 78% accuracy. Furthermore, our approach outperforms 13 commonly used state-of-the-art time-series classifiers in terms of both accuracy and computational complexity. With 1 KB SRAM usage and less than 29 µJ energy consumption per day, it enables an easily deployable IoT solution for smart farms. (An illustrative decision-tree sketch follows this table.) |
11:21 CET | SA2.10 | TIME SERIES-BASED DRIVING EVENT RECOGNITION FOR TWO WHEELERS Speaker: Sai Usha Goparaju, International Institute of Information Technology, IN Authors: Sai Usha Goparaju1, Lakshmanan L2, Abhinav Navnit2, Rahul Biju1, Lovish Bajaj3, Deepak Gangadharan4 and Aftab Hussain4 1International Institute of Information and Technology, IN; 2International Institute of Information Technology, IN; 3Manipal Academy of Higher Education, IN; 4IIIT Hyderabad, IN Abstract Classification of a motorcycle's driving events can provide deep insights to detect issues related to driver safety. Safety of two-wheelers is a less studied problem, and we attempt to address this gap by providing a learning-based solution to classify driving events. Firstly, we developed a hardware system with 3-D accelerometer/gyroscope sensors that can be deployed on a motorcycle. The data obtained from these sensors is used to identify various driving events. We have investigated several machine learning (ML) models to classify driving events. However, in this process, we identified that though the overall accuracy of these traditional ML models is decent enough, their class-wise accuracy is poor. Hence, we have developed time-series-based classification algorithms using LSTM and Bi-LSTM to classify various driving events. We have also embedded an attention mechanism in the architecture of these models for enhanced feature learning, thus improving the accuracy of event recognition. The experiments conducted have demonstrated that the proposed models surpass the state-of-the-art models in the context of driving event recognition with reasonable class-wise accuracies. We have also deployed these models on edge devices such as the Raspberry Pi and ESP32 and successfully reproduced the prediction accuracies on these devices. The experiments demonstrated that the proposed Bi-LSTM model showed a minimum of 87% accuracy and a maximum of 99% accuracy in class-wise prediction on a two-wheeler driving dataset. |
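The SA2.9 pipeline pairs daily-behaviour features derived from accelerometer data with a lightweight decision-tree classifier. The sketch below illustrates that final classification step only; the feature names, synthetic data and tree depth are invented for illustration and are not the paper's feature set or model.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)

# Illustrative daily-behaviour features per animal (hypothetical, not the paper's):
# [activity_level, feeding_time_h, rumination_time_h]
healthy = rng.normal(loc=[1.0, 4.0, 7.0], scale=0.3, size=(40, 3))
sick    = rng.normal(loc=[0.6, 2.5, 5.0], scale=0.3, size=(40, 3))
X = np.vstack([healthy, sick])
y = np.array([0] * 40 + [1] * 40)   # 0 = healthy, 1 = sick

# A shallow tree keeps the model cheap enough for a microcontroller-class device.
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print("training accuracy:", clf.score(X, y))
print("prediction for one new animal:", clf.predict([[0.7, 2.8, 5.2]]))
```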
SD1 System modelling, simulation, and validation
Add this session to my calendar
Date: Wednesday, 19 April 2023
Time: 11:00 CET - 12:30 CET
Location / Room: Gorilla Room 1.5.1
Session chair:
Christian Pilato, Politecnico di Milano, IT
11:00 CET until 11:24 CET: Pitches of regular papers
11:24 CET until 12:30 CET: Interactive technical presentations by the authors of regular papers and extended abstracts
Regular Papers
Time | Label | Presentation Title Authors |
---|---|---|
11:00 CET | SD1.1 | SPATIO-TEMPORAL MODELING FOR FLASH MEMORY CHANNELS USING CONDITIONAL GENERATIVE NETS Speaker: Paul Siegel, University of California, San Diego, US Authors: Simeng Zheng, Chih-Hui Ho, Wenyu Peng and Paul Siegel, University of California, San Diego, US Abstract Modeling spatio-temporal read voltages with complex distortions arising from the write and read mechanisms in flash memory devices is essential for the design of signal processing and coding algorithms. In this work, we propose a data-driven approach to modeling NAND flash memory read voltages in both space and time using conditional generative networks. This generative flash modeling (GFM) method reconstructs read voltages from an individual memory cell based on the program levels of the cell and its surrounding cells, as well as the time stamp. We evaluate the model over a range of time stamps using the cell read voltage distributions, the cell level error rates, and the relative frequency of errors for patterns most susceptible to inter-cell interference (ICI) effects. Experimental results demonstrate that the model accurately captures the complex spatial and temporal features of the flash memory channel. |
11:03 CET | SD1.2 | EFFICIENT APPROXIMATION OF PERFORMANCE SPACES FOR ANALOG CIRCUITS VIA MULTI-OBJECTIVE OPTIMIZATION Speaker: Benedikt Ohse, Ernst-Abbe-Hochschule Jena, DE Authors: Benedikt Ohse1, David Schreiber2, Juergen Kampe1 and Christopher Schneider1 1Ernst-Abbe-Hochschule Jena, DE; 2University of Applied Sciences Jena, DE Abstract This paper presents an adaptation of the well-known normal boundary intersection (NBI) method for approximating complete feasible performance spaces of analog integrated circuits. Those spaces provide accurate information about all feasible combinations of competing performance parameters in a circuit. While the NBI method was originally designed only for computing the so-called Pareto front of a multi-objective optimization problem, it can be adapted with some modifications to approximate the complete performance space. A scalarization into single-objective optimization problems is performed within our developed tool, which can be connected to any SPICE-based simulator. Besides presenting the algorithm and its adaptations, the focus lies on investigating parallelization techniques and their effect on decreasing the computational time. Numerical experiments show the computed approximations of two- and three-dimensional performance spaces of several OTAs and compare the efficiency of different parallelization schemes. (An illustrative scalarization sketch follows this table.) |
11:06 CET | SD1.3 | MULTIDIMENSIONAL FEATURES HELPING PREDICT FAILURES IN PRODUCTION SSD-BASED CONSUMER STORAGE SYSTEMS Speaker: Xinyan Zhang, Huazhong University of Science & Technology, CN Authors: Xinyan Zhang1, Zhipeng Tan1, Dan Feng1, Qiang He1, Ju Wan1, Hao Jiang2, Ji Zhang2, Lihua Yang1 and Wenjie Qi1 1Wuhan National Laboratory for Optoelectronics, CN | Huazhong University of Science & Technology, CN; 2Huawei Technologies, CN Abstract As SSD failures seriously lead to data loss and service interruption, proactive failure prediction is often used to improve system availability. However, the unidimensional SMART-based prediction models hardly predict all drive failures. Some other features applied in data centers and enterprise storage systems are not readily available in consumer storage systems (CSS). To further analyze related failures in production SSD-based CSS, we study nearly 2.3 million SSDs from 12 drive models based on a dataset of SMART logs, trouble tickets, and error logs. We discover that SMART, FirmwareVersion, WindowsEvent, and BlueScreenofDeath (SFWB) are closely related to SSD failures. We further propose a multidimensional-based failure prediction approach (MFPA), which is portable in algorithms, SSD vendors, and PC manufacturers. Experiments on the datasets show that MFPA achieves a high true positive rate (98.18%) and low false positive rate (0.56%), which is 4% higher and 86% lower than the SMART-based model. It is robust and can continuously predict for 2-3 months without iteration, substantially improving the system availability. |
11:09 CET | SD1.4 | PAR-GEM5: PARALLELIZING GEM5'S ATOMIC MODE Speaker: Niko Zurstraßen, RWTH Aachen University, DE Authors: Niko Zurstraßen1, Jose Cubero-Cascante2, Jan Moritz Joseph2, Rainer Leupers2, Xie Xinghua3 and Li Yichao3 1RWTH Aachen Institute for Communication Technologies and Embedded Systems, DE; 2RWTH Aachen University, DE; 3Huawei Technologies, CN Abstract While the complexity of MPSoCs continues to grow exponentially, their often sequential simulations could only benefit from a linear performance gain since the end of Dennard scaling. As a result, each new generation of MPSoCs requires ever longer simulation times. In this paper, we propose a solution to this problem: par-gem5 - the first universally parallelized version of the Full-System Simulator (FSS) gem5. It exploits the host system's multi-threading capabilities using a modified conservative, quantum-based Parallel Discrete Event Simulation. Compared to other parallel approaches, par-gem5 uses relaxed causality constraints, allowing temporal errors to occur. Yet, we show that the system's functionality is retained, and the inaccuracy of simulation statistics, such as simulation time or cache miss rate, can be kept within a single-digit percentage. Furthermore, we extend par-gem5 by a temporal error estimation that assesses the accuracy of a simulation without a sequential reference simulation. Our experiments reached speedups of 24.7x when simulating a 128-core ARM-based MPSoC on a 128-core host system. |
11:12 CET | SD1.5 | FAST BEHAVIOURAL RTL SIMULATION OF 10B TRANSISTOR SOC DESIGNS WITH METRO-MPI Speaker: Guillem López Paradís, BSC & UPC, ES Authors: Guillem López-Paradís1, Brian Li2, Adrià Armejach3, Stefan Wallentowitz4, Miquel Moreto1 and Jonathan Balkind2 1BSC, ES; 2University of California, Santa Barbara, US; 3BSC & UPC, ES; 4Munich University of Applied Sciences, DE Abstract Chips with tens of billions of transistors have become today's norm. These designs are straining our electronic design automation tools throughout the design process, requiring ever more computational resources. In many tools, parallelisation has improved both latency and throughput for the designer's benefit. However, tools largely remain restricted to a single machine and in the case of RTL simulation, we believe that this leaves much potential performance on the table. We introduce Metro-MPI to improve RTL simulation for modern 10 billion transistor-scale chips. Metro-MPI exploits the natural boundaries present in chip designs to partition RTL simulations and leverage High Performance Computing (HPC) techniques to extract parallelism. For chip designs that scale in size by exploiting latency-insensitive interfaces like networks-on-chip and AXI, Metro-MPI offers a new paradigm for RTL simulation scalability. Our implementation of Metro-MPI in OpenPiton+Ariane delivers 2.7 MIPS of RTL simulation throughput for the first time on a design with more than 10 billion transistors and 1,024 Linux-capable cores, opening new avenues for distributed RTL simulation of emerging system-on-chip designs. Compared to sequential and multithreaded RTL simulations of smaller designs, Metro-MPI achieves up to 135.98× and 9.29× speedups. Similarly, for a representative regression run, Metro-MPI reduces energy consumption by up to 2.53× and 2.91×. |
11:15 CET | SD1.6 | DYNAMIC REFINEMENT OF HARDWARE ASSERTION CHECKERS Speaker: Hasini Dilanka Witharana, University of Florida, US Authors: Hasini Witharana, Sahan Sanjaya and Prabhat Mishra, University of Florida, US Abstract Post-silicon validation is a vital step in System-on-Chip (SoC) design cycle. A major challenge in post-silicon validation is the limited observability of internal signal states using trace buffers. Hardware assertions are promising to improve the observability during post-silicon debug. Unfortunately, we cannot synthesize thousands (or millions) of pre-silicon assertions as hardware checkers (coverage monitors) due to hardware overhead constraints. Prior efforts considered synthesis of a small set of checkers based on design constraints. However, these design constraints can change dynamically during the device lifetime due to changes in use-case scenarios as well as input variations. In this paper, we explore dynamic refinement of hardware checkers based on changing design constraints. Specifically, we propose a cost-based assertion selection framework that utilizes non-linear optimization as well as machine learning. Experimental results demonstrate that our machine learning model can accurately predict the area (less than 5% error) and power consumption (less than 3% error) of hardware checkers at runtime. This accurate prediction enables close-to-optimal dynamic refinement of checkers based on design constraints. |
11:18 CET | SD1.7 | STSEARCH: STATE TRACING-BASED SEARCH HEURISTICS FOR RTL VALIDATION Speaker: Ziyue Zheng, Hong Kong University of Science and Technology, CN Authors: Ziyue Zheng and Yangdi Lyu, Hong Kong University of Science and Technology, CN Abstract Branch coverage is important in the functional validation of Register-Transfer-Level (RTL) models. While random tests can cover the majority of easy-to-reach branches, there are still many hard-to-activate branches in today's industrial designs. These remaining corner branches are typically the source of bugs and hardware Trojans. Directed test generation approaches using formal methods effectively activate a specific branch but are limited by the state explosion problem. Semi-formal methods, such as concolic testing, improve the scalability by exploring one path at a time. This paper presents a novel concolic testing framework to exercise the corner branches through state tracing-based search heuristics (STSearch). The proposed approach heuristically generates and evaluates input sequences based on a novel heuristic indicator that evaluates the distance between the current state and the target branch condition. The heuristic indicator is designed to utilize both the static structural property of the design and the state from dynamic simulation. Compared to the existing concolic testing approaches, where a full new path is generated in each round by solving path constraints, the cycle-based heuristic search in the proposed approach is more effective and efficient. Experimental results show that our approach significantly outperforms the state-of-the-art approaches in both running time and memory usage. |
11:21 CET | SD1.8 | EF2LOWSIM: SYSTEM-LEVEL SIMULATOR OF EFLASH-BASED COMPUTE-IN-MEMORY ACCELERATORS FOR CONVOLUTIONAL NEURAL NETWORKS Speaker: Jooho Wang, Department of Electrical and Electronics Engineering, Konkuk University, Memory Business, Samsung Electronics, Inc., KR Authors: Jooho Wang, Sunwoo Kim, Junsu Heo and Chester Park, Konkuk University, KR Abstract A new system-level simulator, eF2lowSim, is proposed to estimate the bit-accurate and cycle-accurate performance of eFlash compute-in-memory (CIM) accelerators for convolutional neural networks. The eF2lowSim can predict the inference accuracy by considering the impact of circuit nonideality such as program disturbance. Moreover, the eF2lowSim can also evaluate the system-level performance of dataflow strategies that have a significant impact on hardware area and performance of eFlash CIM accelerators. The simulator helps to find the optimal dataflow strategy of an eFlash CIM accelerator for each convolutional layer. It is shown that the improvement of area efficiency amounts to 26.8%, 21.2% and 17.9% in the case of LeNet-5, VGG-9 and ResNet-18, respectively. |
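SD1.2 above adapts the normal boundary intersection method, which repeatedly turns the multi-objective problem into single-objective sub-problems whose solutions trace out the boundary of the feasible performance space. The sketch below illustrates only the general scalarization idea on a closed-form toy "circuit", using a simple weighted-sum sweep rather than true NBI sub-problems and with no SPICE simulator in the loop; the performance models and all numbers are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

# Toy "circuit": one design variable x[0] (e.g., a bias setting in [0, 1]).
# Two competing performances: power grows with x, noise shrinks with x.
power = lambda x: 1.0 + 4.0 * x[0] ** 2
noise = lambda x: 1.0 / (0.2 + x[0])

front = []
for w in np.linspace(0.05, 0.95, 10):        # sweep of scalarization weights
    # Weighted-sum scalarization into a single-objective problem.
    obj = lambda x, w=w: w * power(x) + (1 - w) * noise(x)
    res = minimize(obj, x0=[0.5], bounds=[(0.0, 1.0)])
    front.append((power(res.x), noise(res.x)))

for p, n in front:
    print(f"power = {p:.3f}  noise = {n:.3f}")
```

Each weight yields one boundary point, and the independent single-objective solves are what make the parallelization studied in the paper natural.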
Extended Abstracts
Time | Label | Presentation Title Authors |
---|---|---|
11:24 CET | SD1.9 | STRUCTURAL GENERATION OF VIRTUAL PROTOTYPES FOR SMART SENSOR DEVELOPMENT IN SYSTEMC-AMS FROM SIMULINK MODELS Speaker: Alexandra Küster, Bosch Sensortec GmbH, DE Authors: Alexandra Kuester1, Rainer Dorsch1 and Christian Haubelt2 1Bosch Sensortec GmbH, DE; 2University of Rostock, DE Abstract We present a flow to reuse system-level analog/mixed-signal (AMS) models developed in MATLAB/Simulink for the extension of virtual prototypes in SystemC. To prevent time-consuming co-simulation, our flow translates the Simulink model into an equivalent SystemC-AMS model. Translation is supported either by wrapping code generated by MATLAB's Embedded Coder or by instantiating previously generated models. Thus, a one-to-one mapping of the model's hierarchy is possible which allows deep insights into the architecture and good traceability. The conducted case study on an accelerometer model shows the applicability of our approach. The generated hierarchical model is half as fast as a monolithic version but allows better observability and traceability of the system. It is tens of times faster than simulation in Simulink, thus especially faster than co-simulation. The extended virtual prototype aims to support software engineers during development and validation of firmware in smart sensors. |
11:24 CET | SD1.10 | A HARDWARE-SOFTWARE COOPERATIVE INTERVAL-REPLAYING FOR FPGA-BASED ARCHITECTURE EVALUATION Speaker: Hongwei Cui, School of Computer Science, Peking University, Beijing, CN Authors: Hongwei Cui, Shuhao Liang, Yujie Cui, Weiqi Zhang, Honglan Zhan, Chun Yang, Xianhua Liu and Xu Cheng, The School of Computer Science, Peking University, CN Abstract Open-source processors and FPGAs provide more realistic and accurate results for new microarchitecture designs, but the long execution time of large benchmarks on FPGA boards still hinders researchers. This paper proposes a hardware-software cooperative interval-replaying approach. It uses simulators to create checkpoints for arbitrary program intervals and provides an extensible and portable checkpoint loader to re-execute selected intervals. In addition, this paper extends the RISC-V ISA and proposes an event-based sampling design to find hot program intervals with more representative microarchitecture characteristics. By using checkpoints in hot regions, researchers can quickly verify the effectiveness of microarchitecture designs on FPGA and alleviate the speed bottleneck of FPGA. The correctness and effectiveness of the checkpoint scheme and the event-based sampling design are evaluated on FPGA. The experimental results show that the solution is effective. (A toy checkpoint-and-replay sketch follows this table.) |
11:24 CET | SD1.11 | FELOPI: A FRAMEWORK FOR SIMULATION AND EVALUATION OF POST-LAYOUT FILE AGAINST OPTICAL PROBING Speaker: Sajjad Parvin, University of Bremen, DE Authors: Sajjad Parvin1, Mehran Goli1, Frank Sill Torres2 and Rolf Drechsler3 1University of Bremen, DE; 2German Aerospace Center, DE; 3University of Bremen | DFKI, DE Abstract Optical Probing (OP) has been shown to be capable of retrieving intellectual property of the chips. However, to design a robust circuit against OP, the chip must be designed, fabricated, and optically probed in an experimental setup to determine the OP robustness of the design which is time consuming. To mitigate the aforementioned problems, we propose a simulation framework, namely FELOPi, which takes the layout file format of a design as an input and then performs OP on it. FELOPi can help designers to design robust circuits toward OP attacks before fabricating the chip. Hence, utilizing FELOPi results in tremendous time and cost reduction. |
11:24 CET | SD1.12 | QUO VADIS SIGNAL? AUTOMATED DIRECTIONALITY EXTRACTION FOR POST-PROGRAMMING VERIFICATION OF A TRANSISTOR-LEVEL PROGRAMMABLE FABRIC Speaker: Apurva Jain, University of Texas at Dallas, US Authors: Apurva Jain, Thomas Broadfoot, Yiorgos Makris and Carl Sechen, University of Texas at Dallas, US Abstract We discuss the challenges related to developing a post-programming verification solution for a TRAnsistor-level Programmable fabric (TRAP). Toward achieving high density, the TRAP architecture employs bidirectionally-operated pass transistors in the implementation of its logic and interconnect networks. While it is possible to model such transistors through appropriate primitives of hardware description languages (HDL) to enable simulation-based validation, Logic Equivalence Checking (LEC) methods and tools do not support such primitives. As a result, formally verifying the functionality programmed by a given bit-stream on TRAP is not innately possible. To address this limitation, we introduce a method for automatically determining the signal flow direction through bidirectional pass transistors for a given bit-stream and subsequently converting the HDL describing the programmed fabric to consist only of unidirectional transistors. Thereby, commercial EDA tools can be used to check logic equivalence between the transistor-level HDL describing the programmed fabric and the post-synthesis gate-level netlist. |
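The interval-replaying idea in SD1.10 above rests on saving architectural state at interval boundaries in a simulator and later re-executing only selected (hot) intervals from those checkpoints. The toy sketch below shows the checkpoint-and-replay mechanics on a trivial accumulator machine; the "ISA", interval length and program are purely illustrative and unrelated to the paper's RISC-V implementation.

```python
import copy

class ToyCore:
    """Minimal accumulator machine used only to illustrate checkpointing."""
    def __init__(self):
        self.state = {"pc": 0, "acc": 0}

    def step(self, program):
        op, arg = program[self.state["pc"]]
        if op == "add":
            self.state["acc"] += arg
        elif op == "mul":
            self.state["acc"] *= arg
        self.state["pc"] += 1

def run(core, program, n_steps, interval, checkpoints):
    for i in range(n_steps):
        if i % interval == 0:                       # interval boundary
            checkpoints[i] = copy.deepcopy(core.state)
        core.step(program)

program = [("add", 1), ("mul", 2), ("add", 3), ("mul", 2)] * 4
core, ckpts = ToyCore(), {}
run(core, program, n_steps=16, interval=4, checkpoints=ckpts)

# Replay only the "hot" interval starting at step 8 from its checkpoint,
# without re-executing the first 8 steps.
replay = ToyCore()
replay.state = copy.deepcopy(ckpts[8])
for _ in range(4):
    replay.step(program)
print("full-run acc:", core.state["acc"], "| end of replayed interval acc:", replay.state["acc"])
```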
SD8 Future memories
Add this session to my calendar
Date: Wednesday, 19 April 2023
Time: 11:00 CET - 12:30 CET
Location / Room: Okapi Room 0.8.3
Session chair:
Li Zhang, TU Darmstadt, DE
11:00 CET until 11:24 CET: Pitches of regular papers
11:24 CET until 12:30 CET: Interactive technical presentations by the authors of regular papers and extended abstracts
Regular Papers
Time | Label | Presentation Title Authors |
---|---|---|
11:00 CET | SD8.1 | OVERLAPIM: OVERLAP OPTIMIZATION FOR PROCESSING IN-MEMORY NEURAL NETWORK ACCELERATION Speaker: Xuan Wang, University of California, San Diego, US Authors: Minxuan Zhou, Xuan Wang and Tajana Rosing, University of California, San Diego, US Abstract Processing in-memory (PIM) can accelerate neural networks (NNs) thanks to its extensive parallelism and data-movement minimization. The performance of NN acceleration on PIM heavily depends on software-to-hardware mapping, which indicates the order and distribution of operations across the hardware resources. Previous works optimize the mapping problem by exploring the design space of per-layer and cross-layer data layout, achieving speedup over manually designed mappings. However, previous works do not consider computation overlapping across consecutive layers. By overlapping computation, we can process a layer before its preceding layer fully completes, decreasing the execution latency of the whole network. Mapping optimization without overlap analysis can result in sub-optimal performance. In this work, we propose OverlaPIM, a new framework that integrates overlap analysis with DNN mapping optimization on PIM architectures. OverlaPIM adopts several techniques to enable efficient overlap analysis and optimization for whole-network mapping on PIM architectures. We test OverlaPIM on popular DNN networks and compare the results to non-overlap optimization. Our experiments show that OverlaPIM can efficiently produce mappings that are 2.10× to 4.11× faster than the state-of-the-art mapping optimization framework. |
11:03 CET | SD8.2 | TAM: A COMPUTING IN MEMORY BASED ON TANDEM ARRAY WITHIN STT-MRAM FOR ENERGY-EFFICIENT ANALOG MAC OPERATION Speaker: Jinkai Wang, Beihang University, CN Authors: Jinkai Wang, Zhengkun Gu, Hongyu Wang, Zuolei Hao, Bojun Zhang, Weisheng Zhao and Yue Zhang, Beihang University, CN Abstract Computing in memory (CIM) has been demonstrated to be promising for energy-efficient computing. However, the dramatic growth of the data scale in neural network processors has created a demand for CIM architectures with higher bit density, for which the spin transfer torque magnetic RAM (STT-MRAM) with high bit density and performance arises as an up-and-coming candidate solution. In this work, we propose an analog CIM scheme based on a tandem array within STT-MRAM (TAM) to further improve energy efficiency while achieving high bit density. First, the resistance-summation-based analog MAC operation minimizes the effect of low tunnel magnetoresistance (TMR) through the serial magnetic tunnel junction (MTJ) structure in the proposed tandem array, with smaller area overhead. Moreover, a resistive-to-binary read scheme is designed to obtain the MAC results accurately and reliably. Besides, the data-dependent error caused by MTJs in series has been eliminated with a proposed dynamic selection circuit. Simulation results of a 2Kb TAM architecture show 113.2 TOPS/W and 63.7 TOPS/W for 4-bit and 8-bit input/weight precision, respectively, and a 39.3% reduction in bit-cell area compared with existing arrays of MTJs in series. |
11:06 CET | SD8.3 | OUT-OF-CHANNEL DATA PLACEMENT FOR BALANCING WEAR-OUT AND IO WORKLOADS IN RAID-ENABLED SSDS Speaker: Zhouxuan Peng, Wuhan National Laboratory for Optoelectronics, Key Laboratory of Information Storage System, MOE, Huazhong University of Science and Technology, CN Authors: Fan Yang, Chenqi Xiao, Jun Li, Zhibing Sha, Zhigang Cai and Jianwei Liao, Southwest University, CN Abstract SSDs with channel-level RAID implementations can tolerate channel failures inside SSDs, but suffer greatly from imbalanced wear-out (i.e., erase counts) and I/O workloads across the SSD channels, due to the in-channel updates of data/parity chunks of data stripes. This paper proposes exchanging the channel locations of data/parity chunks belonging to the same stripe when satisfying update (write) requests, termed out-of-channel data placement. Consequently, it can smooth wear-out and I/O workloads across SSD channels, thus reducing I/O response time. Through a series of emulation experiments on several realistic disk traces, we show that our proposal can greatly improve I/O performance, as well as noticeably balance the wear-out and I/O workloads, in contrast to related methods. (A toy wear-balancing sketch follows this table.) |
11:09 CET | SD8.4 | AGDM:AN ADAPTIVE GRANULARITY DATA MIGRATION STRATEGY FOR HYBRID MEMORY SYSTEMS Speaker: Zhouxuan Peng, Wuhan National Laboratory for Optoelectronics, Key Laboratory of Information Storage System of MoE, Huazhong University of Science and Technology., CN Authors: Zhouxuan Peng, Dan Feng, Jianxi Chen, Jing Hu and Chuang Huang, Wuhan National Laboratory for Optoelectronics, Key Laboratory of Information Storage System, MOE, Huazhong University of Science and Technology, Hubei, China, CN Abstract Hybrid memory systems show strong potential to satisfy the growing memory demands of modern applications by combining different memory technologies. Due to the different performance characteristics of hybrid memories, a data migration strategy that migrates hot data to a faster memory is critical to the overall performance. Prior works have focused on identifying hot data and migration decisions. However, we find that the fixed-size global migration granularity in existing data migration schemes results in suboptimal performance on most workloads. The key observation is that the optimal migration granularity varies with access patterns. This paper proposes AGDM, an access-pattern-aware Adaptive Granularity Data Migration strategy for hybrid memory systems. AGDM tracks memory access patterns at runtime and accordingly adopts the most appropriate migration mode and granularity. The novel remapping-migration decoupled metadata organization enables AGDM to set locally optimal granularities for memory regions with different access patterns. Our evaluation shows that, compared to the state-of-the-art scheme, AGDM gets an average performance improvement of 20.06% with 29.98% energy savings. |
11:12 CET | SD8.5 | P-PIM: A PARALLEL PROCESSING-IN-DRAM FRAMEWORK ENABLING ROWHAMMER PROTECTION Speaker: Shaahin Angizi, New Jersey Institute of Technology, US Authors: Ranyang Zhou1, Sepehr Tabrizchi2, Mehrdad Morsali1, Arman Roohi3 and Shaahin Angizi1 1New Jersey Institute of Technology, US; 2University of Nebraska–Lincoln, US; 3University of Nebraska - Lincoln, US Abstract In this work, we propose a Parallel Processing-In-DRAM architecture named P-PIM leveraging the high density of DRAM to enable fast and flexible computation. P-PIM enables bulk bit-wise in-DRAM logic between operands in the same bit-line by elevating the analog operation of the memory sub-array based on a novel dual-row activation mechanism. With this, P-PIM can opportunistically perform a complete and inexpensive in-DRAM RowHammer (RH) self-tracking and mitigation technique to protect the memory unit against such a challenging security vulnerability. Our results show that P-PIM achieves ~72% higher energy efficiency than the fastest charge-sharing-based designs. As for the RH protection, with a worst-case slowdown of ~0.8%, P-PIM achieves up to 71% energy savings over the SRAM/CAM-based frameworks and about 90% savings over DRAM-based frameworks. |
11:15 CET | SD8.6 | PRIVE: EFFICIENT RRAM PROGRAMMING WITH CHIP VERIFICATION FOR RRAM-BASED IN-MEMORY COMPUTING ACCELERATION Speaker: Jae-sun Seo, Arizona State University, US Authors: Wangxin He1, Jian Meng1, Sujan Gonugondla2, Shimeng Yu3, Naresh Shanbhag4 and Jae-sun Seo1 1Arizona State University, US; 2Amazon, US; 3Georgia Tech, US; 4University of Illinois at Urbana-Champaign, US Abstract As deep neural networks (DNNs) have been successfully developed in many applications with continuously increasing complexity, the number of weights in DNNs surges, leading to consistent demands for denser memories than SRAMs. RRAM-based in-memory computing (IMC) achieves high density and energy efficiency for DNN inference, but RRAM programming remains a bottleneck due to high write latency and energy consumption. In this work, we present the Progressive-wRite In-memory program-VErify (PRIVE) scheme, which we verify with an RRAM testchip for IMC-based hardware acceleration of DNNs. We optimize the progressive write operations on different bit positions of RRAM weights to enable error compensation and reduce programming latency/energy, while achieving high DNN accuracy. For 5-bit precision DNNs, PRIVE reduces the RRAM programming energy by 1.82X, while maintaining high accuracy of 91.91% (VGG-7) and 71.47% (ResNet-18) on the CIFAR-10 and CIFAR-100 datasets, respectively. |
11:18 CET | SD8.7 | END-TO-END DNN INFERENCE ON A MASSIVELY PARALLEL IN-MEMORY COMPUTING ARCHITECTURE Speaker: Nazareno Bruschi, Università di Bologna, IT Authors: Nazareno Bruschi1, Giuseppe Tagliavini1, Angelo Garofalo1, Francesco Conti1, Irem Boybat2, Luca Benini3 and Davide Rossi1 1Università di Bologna, IT; 2IBM Research Europe - Zurich, CH; 3ETH Zurich, CH | Università di Bologna, IT Abstract The demand for computation resources and energy efficiency of Convolutional Neural Networks (CNN) applications requires a new paradigm to overcome the "Memory Wall". Analog In-Memory Computing (AIMC) is a promising paradigm since it performs matrix-vector multiplications, the critical kernel of many ML applications, in-place in the analog domain within memory arrays structured as crossbars of memory cells. However, several factors limit the full exploitation of this technology, including the physical fabrication of the crossbar devices, which constrain the memory capacity of a single array. Multi-AIMC architectures have been proposed to overcome this limitation, but they have been demonstrated only for tiny and custom CNNs or performing some layers off-chip. In this work, we present the full inference of an end-to-end ResNet-18 DNN on a 512-cluster heterogeneous architecture coupling a mix of AIMC cores and digital RISC-V cores, achieving up to 20.2 TOPS. Moreover, we analyze the mapping of the network on the available non-volatile cells, compare it with state-of-the-art models, and derive guidelines for next-generation many-core architectures based on AIMC devices. |
11:21 CET | SD8.8 | UHS: AN ULTRA-FAST HYBRID STORAGE CONSOLIDATING NVM AND SSD IN PARALLEL Speaker: QingSong Zhu, Huazhong University of Science & Technology, CN Authors: Qingsong Zhu, Qiang Cao and Jie Yao, Huazhong University of Science & Technology, CN Abstract Non-Volatile Memory (NVM) with persistent and near-DRAM performance has been commonly used as first-level fast storage atop Solid-State Drives (SSDs) and Hard Disk Drives (HDDs), constituting a classic hierarchical architecture with high cost-performance. However, the NVM/SSD tiered storage overuses primary NVM with limited actual performance and under-utilizes secondary SSD with increasing bandwidth. Besides, NVM and SSD exhibit distinct I/O characteristics but are complementary in I/O pattern, which motivates us to design a superior hybrid storage that fully exploits NVM and SSD simultaneously. In this paper, we propose UHS, an Ultra-fast Hybrid Storage consolidating NVM and SSD to reap their respective merits with several key enabling techniques. First, UHS builds a uniform yet heterogeneous block-level storage view for the upper applications, e.g., file systems or key-value stores. UHS provides static address-mapping to explicitly partition the global block-space into coarse-grain NVM-zones and SSD-zones, which mainly serve the metadata and file data respectively. Second, UHS proposes a fine-grain request-level NVM buffer to dynamically absorb small file-writes at runtime and then migrates them to the SSDs in the background. Third, UHS designs I/O-affinity write allocation and hash-based buffer indexing to trade off the write gain and read cost of the NVM buffer. Finally, UHS designs a multi-thread I/O model to take full advantage of parallelism in both NVM and SSD. We implement UHS and evaluate it under a variety of workloads. The experiments show that UHS outperforms SSD, NVM, Bcache-writeback (a representative hierarchical storage), and Device-Mapper (a state-of-the-art hybrid storage) by up to 8X, 1.5X, 3.5X, and 6X respectively. |
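The out-of-channel placement proposed in SD8.3 above redirects an updated data/parity chunk to a different, less-worn channel instead of rewriting it in its original channel, so that erase wear and write traffic even out across channels. The sketch below is a toy model of that policy only; the stripe geometry, wear counter and workload are illustrative assumptions, not the paper's design.

```python
import random

N_CHANNELS = 4                      # SSD channels
CHUNKS_PER_STRIPE = 3               # e.g., 2 data + 1 parity (illustrative)
erase_count = [0] * N_CHANNELS      # per-channel wear proxy
# stripe id -> channel index holding each of its chunks
stripe_map = {s: list(range(CHUNKS_PER_STRIPE)) for s in range(8)}

def update_chunk(stripe, chunk, out_of_channel=True):
    """Rewrite one chunk of a stripe; optionally relocate it to the least-worn free channel."""
    if out_of_channel:
        used_by_others = set(stripe_map[stripe]) - {stripe_map[stripe][chunk]}
        candidates = [c for c in range(N_CHANNELS) if c not in used_by_others]
        stripe_map[stripe][chunk] = min(candidates, key=lambda c: erase_count[c])
    erase_count[stripe_map[stripe][chunk]] += 1     # the rewrite wears its channel

random.seed(0)
for _ in range(300):                # skewed update workload (parity chunk updated often)
    update_chunk(stripe=random.randrange(8), chunk=random.choice([0, 0, 2]))
print("per-channel erase counts:", erase_count)
```

With `out_of_channel=False` the parity-heavy channels accumulate most of the erases; with the relocation enabled the counts stay close to each other, which is the balancing effect the abstract describes.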
Extended Abstracts
Time | Label | Presentation Title Authors |
---|---|---|
11:24 CET | SD8.10 | OPTIMIZING DATA MIGRATION FOR GARBAGE COLLECTION IN ZNS SSDS Speaker: Zhenhua Tan, College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, CN Authors: Zhenhua Tan1, Linbo Long2, Renping Liu3, Congming Gao4, Yi Jiang3 and Yan Liu3 1College of Computer Science and Technology of Chongqing University of Posts and Telecommunications, CN; 2College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, CN; 3Chongqing University of Posts and Telecommunications, CN; 4Xiamen University, CN Abstract ZNS SSDs shift the responsibility of garbage collection (GC) to the host. However, data migration in GC needs to move data to the host's buffer first and write back to the new location, resulting in an unnecessary end-to-end transfer overhead. Moreover, due to the pre-configured mapping between zones and blocks, GC needs to perform a large number of unnecessary block-to-block data migrations between zones. To address these issues, this paper proposes a simple and efficient data migration method, called IS-AR, with in-storage data migration and address remapping. Based on a full-stack SSD emulator, our evaluation shows that IS-AR reduces GC latency by 6.78× and improves SSD lifetime by 1.17× on average. |
11:24 CET | SD8.11 | ENASA: TOWARDS EDGE NEURAL ARCHITECTURE SEARCH BASED ON CIM ACCELERATION Speaker: Shixin Zhao, Chinese Academy of Sciences, CN Authors: Shixin Zhao, Songyun Qu, Ying Wang and Yinhe Han, Chinese Academy of Sciences, CN Abstract This work proposes a ReRAM-based Computing-in-Memory (CIM) architecture for Neural Architecture Search (NAS) acceleration, ENASA, so that the compute-intensive NAS technology can be applied to various edge devices to customize the most suitable individual solution for their use cases. In the popular one-shot NAS process, the system must repetitively evaluate the sampled sub-network within a large-scale supernet before converging to the best sub-network architecture. Thereby, how these iterative network inference tasks are mapped onto the CIM arrays makes a big difference in system performance. To realize efficient in-memory supernet sampling and evaluation, we design a novel mapping method that tactically executes a group of sub-nets in the CIM arrays, not only to boost the sub-net concurrency but also to eliminate the repetitive operations shared by these subnets. Meanwhile, to further enhance the subnet-level operation concurrency and sharing in the CIM arrays, we also tailor a novel CIM-friendly one-shot NAS algorithm that purposely samples those operation-sharing subnets in each iteration while still maintaining the convergence performance of NAS. According to the experimental results, our CIM NAS accelerator achieves an improvement of 196.6× and 1200× in performance speedup and energy saving respectively compared to the CPU+GPU baseline. |
SpD4 Special day on Personalised Medicine: Intelligent and Autonomous Insideable Devices
Add this session to my calendar
Date: Wednesday, 19 April 2023
Time: 11:00 CET - 12:30 CET
Location / Room: Darwin Hall
Session chair:
Oliver Bringmann, University of Tübingen, DE
Insideable devices aim to improve diagnosis and tailor treatment for individual preventive health decisions and improve quality of life. This special session highlights three different approaches to insideable devices for the human body. The first talk will focus on the design of soft-bodied millirobots with adaptive locomotion in complex environments towards minimally invasive medical applications. The second talk will present AI-equipped ultra-low power diagnostic capsules for autonomous endoscopy. The last talk will discuss techniques for neuro-implants for seizure detection and epilepsy management.
Time | Label | Presentation Title Authors |
---|---|---|
11:00 CET | SpD4.1 | DESIGNING MINIATURE ROBOTS TOWARDS MEDICAL APPLICATIONS Presenter: Alp Karacakol, Max Planck Institute for Intelligent Systems, DE Author: Alp Karacakol, Max Planck Institute for Intelligent Systems, DE Abstract Miniature robots have the unique ability to navigate, manipulate and remain in risky and currently difficult-to-access small confined spaces within the human body, making them ideal candidates for the next generation of biomedical applications for drug delivery, embolization, clot lysis, and hyperthermia. This talk will first present a plethora of intelligently designed micro- to millimeter-scale robots, ranging from micro-rollers to soft robots, with medical application potential, followed by the challenges faced in fabricating, actuating, controlling, and designing these miniature robots to operate in realistic environments. The talk will continue with how these challenges are being addressed to bridge the gap between the presented state-of-the-art miniature robots and their intended real-world applications. The talk will conclude with a vision of how automation of design and control can facilitate the holistic approach to make the inaccessible accessible through miniature robots. |
11:30 CET | SpD4.2 | SMART DIAGNOSTIC CAPSULES FOR AUTONOMOUS ENDOSCOPY Presenter: Sebastian Schostek, Ovesco Endoscopy AG, DE Author: Sebastian Schostek, Ovesco Endoscopy AG, DE Abstract After more than 20 years of availability in the market, capsule endoscopy has become an indispensable part of clinical diagnostics in the gastrointestinal tract. It has also emerged as a very vibrant and innovative field for academic and industrial research. The pipeline of the market competitors and new entrants into this market is full of innovative, more powerful and better-performing solutions that utilize the technology advancements made during the last two decades. Adding the increasing availability and acceptance of AI and IoT solutions in clinical use to the mix, the field of capsule endoscopy is at a point at which technology leaps and disruptive innovations of both devices and clinical workflows are to be expected in the near future. This presentation illustrates the evolution of capsule endoscopy, covering its past, present and future. |
12:00 CET | SpD4.3 | WEARABLES AND IMPLANTABLES FOR INTELLIGENT AMBULATORY EPILEPSY MANAGEMENT Presenter: Christian Meisel, Charité Berlin, Department of Neurology with Experimental Neurology, DE Author: Christian Meisel, Charité Berlin, Department of Neurology with Experimental Neurology, DE Abstract Ambulatory monitoring in diseases like epilepsy is challenging as essential diagnostic tools, including methods for reliable seizure detection and seizure risk evaluation to monitor treatment efficacy, are still missing. This lack of objective diagnostics constitutes a significant barrier to better treatments, raises methodological concerns about the antiseizure medication evaluation process and, to patients, is a main issue contributing to the disease burden. Recent years have seen rapid progress towards novel implantable and wearable sensor systems that meet these needs of epilepsy patients and clinicians. These novel sensors, however, require intelligent and robust analytics applicable to multimodal, long-term data. In this talk I will discuss some of the recent developments in the field of ambulatory epilepsy monitoring with focus on implantable and wearable sensors systems and their related artificial intelligence methods. |
ST1 Design and Test of Mixed-Signal Circuits and Memories
Add this session to my calendar
Date: Wednesday, 19 April 2023
Time: 11:00 CET - 12:30 CET
Location / Room: Okapi Room 0.8.2
Session chair:
Haralampos Stratigopoulos, Sorbonne, FR
11:00 CET until 11:24 CET: Pitches of regular papers
11:24 CET until 12:30 CET: Interactive technical presentations by the authors of regular papers and extended abstracts
Regular Papers
Time | Label | Presentation Title Authors |
---|---|---|
11:00 CET | ST1.1 | POST-SILICON OPTIMIZATION OF A HIGHLY PROGRAMMABLE 64-MHZ PLL ACHIEVING 2.7-5.7µW Speaker: Marco Gonzalez, ICTEAM Institute, UCLouvain, BE Authors: Marco Gonzalez and David Bol, UCLouvain, BE Abstract Hierarchical optimization methods used in the design of complex mixed-signal systems require accurate behavioral models to avoid the long simulation times of transistor-level SPICE simulations of the whole system. However, robust behavioral models that accurately model circuit non-idealities and their complex interactions must be very complex themselves and are hardly achievable. Post-silicon tuning, which is already widely used for the calibration of analog building blocks, is an interesting alternative to speed up the optimization of these complex systems. However, post-silicon tuning usually focuses on single-objective problems in blocks with a limited number of degrees of freedom. In this paper, we propose a post-silicon "hardware-in-the-loop” optimization method to solve multi-objective problems in mixed-signal systems with numerous degrees of freedom. We use this method to optimize the noise-power trade-off of a 64-MHz phase-locked loop (PLL) based on a back-bias-controlled ring oscillator. A genetic algorithm was run based on measurements of the 22-nm fully-depleted silicon-on-insulator prototype to find the Pareto-optimal configurations in terms of power and long-term jitter. The obtained Pareto front gives a range of power consumption between 2.7 and 5.7 μW, corresponding to an RMS long-term jitter between 88 and 45 ns. Whereas the simulation-based optimization would require more than a year using the genetic algorithm based on SPICE simulations, we conducted the post-silicon optimization in only 17 h. |
11:03 CET | ST1.2 | ANALOG COVERAGE-DRIVEN SELECTION OF SIMULATION CORNERS FOR AMS INTEGRATED CIRCUITS Speaker: Pallab Dasgupta, Indian Institute of Technology Kharagpur, IN Authors: Sayandeep Sanyal1, Aritra Hazra1, Pallab Dasgupta1, Scott Morrison2, Sudhakar S3, Lakshmanan Balasubramanian4 and Moshiur Rahman2 1IIT Kharagpur, IN; 2Texas Instruments, US; 3Texas Instruments, IN; 4Texas Instruments (India) Pvt. Ltd., IN Abstract Integrated circuit designs are evaluated at various corners defined by choices of the design and process parameters. Considering the large number of corners and the simulation cost of covering all the corners of a large design, it is desirable to identify a subset of the corners that can potentially expose corner-case bugs. In an integrated analog coverage management framework, this choice may be influenced by those corners that take one or more component analog IPs close to their individual specification boundaries. Since the admissible state space of an analog IP is multi-dimensional, the same corner may not reach the extreme behaviors for each attribute of the specification, and one needs to identify a subset that covers the extremity. This paper shows that the underlying problem is NP-hard and presents an automated methodology for selecting the corners. A formal analog coverage specification is leveraged by our algorithm, which uses a Satisfiability Modulo Theory (SMT) solver to identify the appropriate corners from the output of multiple Monte Carlo (MC) simulations. The efficacy of the proposed approach is demonstrated on industrial test cases. (A greedy-covering sketch of the underlying selection problem follows this table.) |
11:06 CET | ST1.3 | FAST PERFORMANCE EVALUATION METHODOLOGY FOR HIGH-SPEED MEMORY INTERFACES Speaker: Taehoon Kim, Seoul National University, KR Authors: Taehoon Kim, Yoona Lee and Woo-Seok Choi, Seoul National University, KR Abstract An increase in the data rate of memory interfaces causes higher inter-symbol interference (ISI). To mitigate ISI, recent high-speed memory interfaces have started employing complex datapath, utilizing equalization techniques such as continuous-time linear equalizer and decision-feedback equalizer. This incurs huge overhead for design verification with conventional methods using transient simulation. This paper proposes a fast and accurate verification methodology to evaluate the voltage and timing margin of the interface, based on the impulse sensitivity function. To take nonlinear circuit behavior into account, the small- and large-signal responses were separately calculated to improve accuracy, using the data obtained from the periodic AC and periodic steady-state analyses. This approach achieves high accuracy, with shmoo similarity rates of over 95%, while also significantly reducing verification time, up to 23x faster. Moreover, two different methods are proposed for evaluating the multi-stage Rx performance, providing a trade-off between accuracy and efficiency that can be tailored to the specific purpose, e.g., the verification or design process. |
11:09 CET | ST1.4 | EQUIVALENCE CHECKING OF SYSTEM-LEVEL AND SPICE-LEVEL MODELS OF STATIC NONLINEAR CIRCUITS Speaker: Kemal Çağlar Coşkun, Institute of Computer Science, University of Bremen, DE Authors: Kemal Çağlar Coşkun1, Muhammad Hassan2 and Rolf Drechsler1 1Institute of Computer Science, University of Bremen, DE; 2DFKI, DE Abstract Recently, Signal Flow Graphs (SFGs) have been successfully leveraged to show equivalence for linear analog circuits at system-level and SPICE-level. However, this is clearly not sufficient as the true complexity stems from nonlinear analog circuits. In this paper, we go beyond linear analog circuits, i.e., we extend the SFGs and develop the Modified Signal-Flow Graph (MSFG), to show equivalence between system-level and SPICE-level representations of static nonlinear analog circuits. First, we map the nonlinear circuits to MSFGs. Afterwards, graph simplification and functional approximation (in particular Legendre polynomials) techniques are used to create minimal MSFG and canonical MSFG. This enables us to compare the MSFGs even if they have vastly different structures. Finally, we propose a similarity metric that calculates the similarity between SPICE-level and system-level models. By successfully applying the proposed equivalence checking technique to benchmark circuits, we demonstrate its applicability. |
11:12 CET | ST1.5 | ELECTROMIGRATION-AWARE DESIGN TECHNOLOGY CO-OPTIMIZATION FOR SRAM IN ADVANCED TECHNOLOGY NODES Speaker: Mahta Mayahinia, Karlsruhe Institute of Technology (KIT), DE Authors: Mahta Mayahinia1, Hsiao-Hsuan Liu2, Subrat Mishra2, Zsolt Tokei2, Francky Catthoor2 and Mehdi Tahoori1 1Karlsruhe Institute of Technology, DE; 2IMEC, BE Abstract Static RAM (SRAM) is one of the critical components in advanced VLSI systems, whose performance, capacity, and reliability have a decisive impact on the entire system. It offers the fastest memory in the storage hierarchy of modern computer systems. With the move toward smaller CMOS technology nodes, back-end-of-line (BEoL) interconnects are also fabricated at tighter pitches. Hence, besides the power lines, SRAM word- and bit-lines (WL and BL) are also susceptible to electromigration (EM). Therefore, the EM reliability of SRAM's WL and BL needs to be analyzed during the design technology co-optimization (DTCO) cycle. In this work, we investigate the impact of technology scaling on SRAM designs and perform a detailed analysis of the trend of their EM reliability and energy consumption. Our analysis shows that although scaling down the CMOS technology can result in a 2.68x improvement in the energy efficiency of the SRAM module, it increases the EM-induced hydrostatic stress by 2.53x. |
11:15 CET | ST1.6 | SMART HAMMERING: A PRACTICAL METHOD OF PINHOLE DETECTION IN MRAM MEMORIES Speaker: Sina Bakhtavari Mamaghani, Karlsruhe Institute of Technology, DE Authors: Sina Bakhtavari Mamaghani1, Christopher Muench1, Jongsin Yun2, Martin Keim3 and Mehdi Baradaran Tahoori1 1Karlsruhe Institute of Technology, DE; 2Siemens, US; 3Siemens, US Abstract As we move toward the commercialization of Spin-Transfer Torque Magnetic Random Access Memories (STT-MRAM), cost-effective testing and in-field reliability have become more prominent. Among STT-MRAM manufacturing defects, pinholes are among the most important. Pinholes are defects on the surface of the oxide layer which degrade the resistive values and, in some cases, cause an oxide breakdown. Some moderate levels of pinhole defects can remain undetected during standard functional tests and may cause a field failure. A stress test of the whole memory, including multiple cycles of long writes, has been suggested to detect candidate pinhole defects. However, this test not only causes extra costs but also degrades the reliability of the MRAM for the entire array. In this paper, we have statistically studied the behavior of pinholes and propose a cost-effective testing scheme to capture pinhole defects and increase the reliability of the end product. Our method limits the number of test candidate cells that need to be hammered, reducing test time by up to 96.42% in our case studies compared to existing methods, while fully preserving the advantages of standard tests. The proposed approach is compatible with memory built-in self-test (MBIST) schemes. |
11:18 CET | ST1.7 | MA-OPT: REINFORCEMENT LEARNING-BASED ANALOG CIRCUIT OPTIMIZATION USING MULTI-ACTORS Speaker: Youngchang Choi, Pohang University of Science and Technology, KR Authors: Youngchang Choi1, Minjeong Choi2, Kyongsu Lee1 and Seokhyeong Kang1 1Pohang University of Science and Technology, KR; 2Samsung Advanced Institute of Technology (SAIT), KR Abstract Analog circuit design requires significant human effort and expertise; therefore, electronic design automation (EDA) tools for analog design are needed. This study presents MA-Opt, an analog circuit optimizer based on a reinforcement learning (RL)-inspired framework. MA-Opt uses multiple actors to provide various predictions of optimized circuit designs in parallel. Sharing a specific memory that affects the loss function of network training is proposed to exploit multiple actors effectively, accelerating circuit optimization. Moreover, we devise a novel method to tune the best design found in previous simulations into an even more optimized design. To demonstrate the efficiency of the proposed framework, MA-Opt was simulated for three analog circuits and the results were compared with those of other methods. The experimental results indicate the strength of using multiple actors with a shared elite solution set and the near-sampling method. Within the same number of simulations, while satisfying all given constraints, MA-Opt obtained minimum target metrics up to 24% better than DNN-Opt. Furthermore, MA-Opt obtained better Figures of Merit (FoMs) than DNN-Opt at the same runtime. |
11:21 CET | ST1.8 | AUXCELLGEN: A FRAMEWORK FOR AUTONOMOUS GENERATION OF ANALOG AND MEMORY UNIT CELLS Speaker: Sachin Sapatnekar, University of Minnesota, US Authors: Sumanth Kamineni1, Arvind Sharma2, Ramesh Harjani2, Sachin S. Sapatnekar2 and Benton H. Calhoun1 1University of Virginia, US; 2University of Minnesota, US Abstract Recent advances in auto-generating analog and mixed-signal (AMS) circuits use standard digital tool flows to compose AMS circuits from a combination of digital standard cells and a set of auxiliary cells (auxcells). Until now, generating auxcell layouts for each new PDK was the last manual step in the flow for auto-generating AMS components, which limited the available auxcells and reduced the optimality of the auto-generated AMS designs. To solve this, we propose AuxcellGen, a framework to auto-generate auxcell layouts and performance models. AuxcellGen generates a parasitic-aware auxcell performance model using a neural network (NN), auto-sizes and optimizes auxcell schematics for a given design target, and auto-generates auxcell layouts. The framework is demonstrated by auto-generating tri-state buffer auxcells for PLLs and sense-amplifier auxcells for SRAM across a range of user specifications that are compatible with standard cell and memory bitcell pitch. |
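ST1.2 above formulates corner selection as choosing a small subset of corners that together drive every specification attribute of the component analog IPs to an extreme, and notes that this covering problem is NP-hard; the paper solves it with an SMT solver over Monte Carlo results. The sketch below uses a plain greedy set-cover heuristic on a made-up coverage table, purely to illustrate the covering structure; it is not the authors' SMT-based flow.

```python
# Each simulation corner "covers" the specification attributes it drives to an extreme.
# The corner names and coverage sets below are made up purely for illustration.
corner_coverage = {
    "ss_-40C_1.6V": {"gain_min", "slew_min"},
    "ff_125C_2.0V": {"power_max", "offset_max"},
    "sf_25C_1.8V":  {"offset_max"},
    "fs_125C_1.6V": {"gain_min", "power_max"},
    "tt_25C_1.8V":  set(),
}
targets = {"gain_min", "slew_min", "power_max", "offset_max"}

def greedy_corner_selection(coverage, targets):
    """Greedy set cover: repeatedly pick the corner covering the most uncovered attributes."""
    selected, uncovered = [], set(targets)
    while uncovered:
        best = max(coverage, key=lambda c: len(coverage[c] & uncovered))
        if not coverage[best] & uncovered:
            break                      # remaining attributes cannot be covered
        selected.append(best)
        uncovered -= coverage[best]
    return selected

print(greedy_corner_selection(corner_coverage, targets))
```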
Extended Abstracts
Time | Label | Presentation Title Authors |
---|---|---|
11:24 CET | ST1.9 | DEBUGGING LOW POWER ANALOG NEURAL NETWORKS FOR EDGE COMPUTING Speaker: Sascha Schmalhofer, Goethe University Frankfurt, DE Authors: Sascha Schmalhofer, Marwin Moeller, Nikoletta Katsaouni, Marcel Schulz and Lars Hedrich, Goethe University Frankfurt, DE Abstract In this paper we present a method to debug and analyze large synthesized ANNs, enabling a systematic comparison of the transistor netlist, the behavioral model and the implementation. With that, insight into the behavior of the analog netlist is easily gained, and errors during generation or badly designed cells are quickly uncovered. An overall judgement of the accuracy is also presented. We demonstrate the functionality on several examples, from small ANNs to ANNs consisting of more than 10,000 cells implementing a medical application. |
11:24 CET | ST1.10 | HIGH PERFORMANCE AND DNU-RECOVERY SPINTRONIC RETENTION LATCH FOR HYBRID MTJ/CMOS TECHNOLOGY Speaker: Zhen Zhou, Anhui University, CN Authors: Aibin Yan1, Zhen Zhou1, Liang Ding1, Jie Cui1, Zhengfeng Huang2, Xiaoqing Wen3 and Patrick Girard4 1Anhui University, CN; 2Hefei University of Technology, CN; 3Kyushu Institute of Technology, JP; 4LIRMM / CNRS, FR Abstract With the advancement of CMOS technologies, circuits have become more vulnerable to soft errors, such as single-node upsets (SNUs) and double-node upsets (DNUs). To effectively provide nonvolatility as well as tolerance against radiation-induced DNUs, this paper proposes a nonvolatile and DNU-resilient latch that mainly comprises two magnetic tunnel junctions (MTJs), two inverters and eight C-elements. Since two MTJs are used and all internal nodes are interlocked, the latch provides nonvolatility and recovery from all possible DNUs. Simulation results demonstrate the nonvolatility, DNU recovery and high performance of the proposed latch. |
11:24 CET | ST1.11 | MINIMUM UNIT CAPACITANCE CALCULATION FOR BINARY-WEIGHTED CAPACITOR ARRAYS Speaker: Nibedita Karmokar, University of Minnesota, US Authors: Nibedita Karmokar, Ramesh Harjani and Sachin S. Sapatnekar, University of Minnesota, US Abstract The layout area and power consumption of a binary-weighted capacitive digital-to-analog converter (DAC) increase exponentially with the number of bits. To meet linearity targets, unit capacitors should be large enough to limit errors caused by various sources of noise and those due to mismatch. This work proposes a systematic approach for minimizing the unit capacitance value that optimizes the linearity metrics of a DAC, accounting for multiple factors that contribute to mismatch, as well as the impact of flicker and thermal noise. |
LK3 Special Day Lunchtime Keynote
Add this session to my calendar
Date: Wednesday, 19 April 2023
Time: 13:00 CET - 14:00 CET
Location / Room: Darwin Hall
Session chair:
Jan Madsen, TU Denmark, DK
Session co-chair:
Oliver Bringmann, University of Tübingen, DE
Time | Label | Presentation Title Authors |
---|---|---|
13:00 CET | LK3.1 | ANALYZE THE PATIENT, ENGINEER THE THERAPY Presenter: Liesbet Lagae, IMEC, BE Author: Liesbet Lagae, IMEC, BE Abstract The complexity, cycle time and cost of new precision therapy workflows are major challenges to overcome in order to achieve clinical implementation of this revolutionary type of treatment. For example, CAR T-cell therapies use the patient's own immune (T) cells, adapted in a way to better fight cancer. Chip technology can help to make these therapies more efficient, precise, and cost-effective. Over the last few decades, the semiconductor industry has grown exponentially, poised to increase value to the end user while driving down costs by scaling. The result is the world's highest standard in precision and high-volume production of nanoelectronics chip-based sensor solutions. Imec has used its semiconductor process expertise and infrastructure to make significant innovations in single-use silicon biochip and microfluidic technology, creating toolboxes of on-chip functions spanning DNA sequencing, cell sorting, single-cell electroporation and integrated biosensor arrays. These solutions have until now mostly served the diagnostic market. Chip-based microfluidics is a toolbox that brings its own design challenges, especially in relation to not having to reinvent the wheel every time. Hence, we try to make maximal reuse of generic fluidic building blocks developed for the diagnostic market, and we will explain how these building blocks are equally suited to addressing the challenges in immune therapy. These existing on-chip demonstrations could provide smarter solutions for discrete unit operations and quality monitoring, and even complete workflow integration. Solving these challenges would enable more patients to access and benefit from the next, most anticipated class of life-changing therapies. |
ES Executive session
Add this session to my calendar
Date: Wednesday, 19 April 2023
Time: 14:00 CET - 15:30 CET
Location / Room: Okapi Room 0.8.1
Time | Label | Presentation Title Authors |
---|---|---|
14:00 CET | ES.1 | MAKECHIP: AN INNOVATIVE HOSTED DESIGN SERVICE PLATFORM FOR CUTTING-EDGE CHIP DESIGNS Presenter: Florian Bilstein, Racyics, DE Author: Florian Bilstein, Racyics, DE Abstract In the field of chip design, a reliable IT infrastructure is essential for realizing complex Systems on Chip. Setting up and maintaining EDA tools, technology data and design flows while ensuring compatibility and efficiency can be both time-consuming and challenging. With makeChip, an innovative Hosted Design Service Platform (HDSP), Racyics offers a central gateway for designing integrated circuits in advanced semiconductor technologies without upfront investment in a design environment and design methodology, targeted at start-ups, SMEs, research institutes and universities. The platform provides reliable IT infrastructure with a full set of EDA tool installations and technology data setup, i.e., PDKs, foundation IP and complex IP. All tools and design data are linked by Racyics' silicon-proven design flow and project management system. The turnkey environment enables any makeChip customer to realize complex Systems on Chip in the most advanced technology nodes. For rapid silicon prototyping of research circuits in GF® 22FDX®, Rapid Adaption Kits are available. Furthermore, Racyics supports makeChip customers with on-demand design services, such as digital layout generation, tape-out sign-off execution and many more. On top of that, access to Racyics' ABX IPs for GF 22FDX® is provided for reliable and predictable ultra-low-voltage operation down to 0.4V. For non-commercial academic projects, makeChip access includes a complete suite of advanced Cadence EDA tool licenses at no additional cost. In this presentation, the concept and structure of makeChip are outlined in greater detail and the unique benefits for academia, start-ups, and SMEs are explored. Furthermore, already realized projects, such as SpiNNaker 2, are presented, showing how makeChip helped to boost their development and tackle design challenges. Finally, an outlook on makeChip and ongoing discussions with the European Union on establishing an EU Design Platform is given. |
14:30 CET | ES.2 | HETEROGENEOUS INTEGRATION OF CHIPLETS BRINGS A NEW TWIST TO SIP Presenter: Heiko Dudek, Siemens Digital Industries Software - EDA, DE Author: Heiko Dudek, Siemens Digital Industries Software - EDA, DE Abstract The semiconductor industry is facing an inflection point as higher cost, lower yield, and reticle size limitations drive the need for viable alternatives to traditional monolithic solutions, which have hit the limits of physics. This is driving an emerging trend to disaggregate what would typically be implemented as an SoC into solid, fabricated IP blocks, otherwise known as chiplets. These chiplets typically include just a couple of functions implemented at the optimal process node. Combining them with other chiplets, memory and often a custom ASIC results in a multi-die, heterogeneously integrated implementation that typically utilizes a high-performance substrate, ushering in a new generation of system-in-package and, with it, a new set of design challenges that this session will explore. |
15:00 CET | ES.3 | ADDRESSING THE CHALLENGES OF ISO26262 FOR IP, USING LOGICBIST WITH OBSERVATION SCAN TECHNOLOGY Speaker: Nicolas Leblond, Siemens EDA, FR Authors: Lee Harrison1 and Prashant Kulkarni2 1Siemens EDA, GB; 2ARM Inc, GB Abstract With the increased use of complex IP within automotive IC applications, it is vitally important that commercial IP is delivered fit for purpose. This paper steps through the process with an Arm commercial IP to ensure that ISO 26262 certification can be achieved. A full reference architecture is created which implements periodic LogicBIST controlled by an on-chip safety manager. Full details of the solution are given, covering the infrastructure used to monitor and manage the BIST-based safety mechanisms in an in-life periodic configuration, including the complex scheduling required to meet the specified diagnostic test interval (DTI). We also look at how the reference flow can be enhanced to use Tessent LogicBIST with observation scan technology to significantly simplify the overall implementation while dramatically reducing the overall DTI. We also review the process to certify the complete IP to ISO 26262. |
LKS4 Later … with the keynote speakers
Add this session to my calendar
Date: Wednesday, 19 April 2023
Time: 14:00 CET - 15:30 CET
Location / Room: Darwin Hall
Session chair:
Jan Madsen, TU Denmark, DK
Session co-chair:
Oliver Bringmann, University of Tübingen, DE
SA1 Power-efficient and Smart Energy Systems
Add this session to my calendar
Date: Wednesday, 19 April 2023
Time: 14:00 CET - 15:30 CET
Location / Room: Gorilla Room 1.5.3
Session chair:
Dolly Sapra, UVA, NL
14:00 CET until 14:21 CET: Pitches of regular papers
14:21 CET until 15:30 CET: Interactive technical presentations by the authors of regular papers and extended abstracts
Regular Papers
Time | Label | Presentation Title Authors |
---|---|---|
14:00 CET | SA1.1 | SPARSEMEM: ENERGY-EFFICIENT DESIGN FOR IN-MEMORY SPARSE-BASED GRAPH PROCESSING Speaker: Mahdi Zahedi, Delft University of Technology (TU Delft), NL Authors: Mahdi Zahedi1, Geert Custers1, Taha Shahroodi1, Georgi Gaydadjiev2, Stephan Wong1 and Said Hamdioui1 1TU Delft, NL; 2Maxeler / Imperial College, GB Abstract Performing analysis on large graph datasets in an energy-efficient manner has posed a significant challenge, not only due to excessive data movement and poor locality, but also due to the non-optimal use of the high sparsity of such datasets. The latter leads to a waste of resources, as computation is also performed on zero operands that do not contribute to the final result. This paper designs a novel graph processing accelerator, SparseMEM, targeting sparse datasets by leveraging the computing-in-memory (CIM) concept; CIM is a promising solution to alleviate the overhead of data movement and the inherent poor locality of graph processing. The proposed solution stores the graph information in a compressed hierarchical format inside the memory and adjusts the workflow based on this new mapping. This vastly improves resource utilization, leading to higher energy efficiency and performance. The experimental results demonstrate that SparseMEM outperforms a GPU-based platform and two state-of-the-art in-memory accelerators on speedup and energy efficiency by one and three orders of magnitude, respectively. |
14:03 CET | SA1.2 | HULK-V: A HETEROGENEOUS ULTRA-LOW-POWER LINUX CAPABLE RISC-V SOC Speaker: Luca Valente, Università di Bologna, IT Authors: Luca Valente1, Yvan Tortorella1, Mattia Sinigaglia1, Giuseppe Tagliavini1, Alessandro Capotondi2, Luca Benini3 and Davide Rossi1 1Università di Bologna, IT; 2Università di Modena e Reggio Emilia, IT; 3ETH Zurich, CH Abstract IoT applications span a wide range in performance and memory footprint, under tight cost and power constraints. High-end applications rely on power-hungry Systems-on-Chip (SoCs) featuring powerful processors, large LPDDR/DDR3/4/5 memories, and supporting full-fledged Operating Systems (OS). On the contrary, low-end applications typically rely on Ultra-Low-Power microcontrollers with a "close to metal" software environment and simple micro-kernel-based runtimes. Emerging applications and trends of IoT require the "best of both worlds": cheap and low-power SoC systems with a well-known and agile software environment based on full-fledged OS (e.g., Linux), coupled with extreme energy efficiency and parallel digital signal processing capabilities. We present HULK-V: an open-source Heterogeneous Linux-capable RISC-V-based SoC coupling a 64-bit RISC-V processor with an 8-core Programmable Multi-Core Accelerator (PMCA), delivering up to 13.8 GOps, up to 157 GOps/W and accelerating the execution of complex DSP and ML tasks by up to 112x over the host processor. HULK-V leverages a lightweight, fully digital memory hierarchy based on HyperRAM IoT DRAM that exposes up to 512 MB of DRAM memory to the host CPU. Featuring HyperRAMs, HULK-V doubles the energy efficiency without significant performance loss compared to featuring power-hungry LPDDR memories, requiring expensive and large mixed-signal PHYs. HULK-V, implemented in Global Foundries 22nm FDX technology, is a fully digital ultra-low-cost SoC running a 64-bit Linux software stack with OpenMP host-to-PMCA offload within a power envelope of just 250 mW. |
14:06 CET | SA1.3 | HIGH-SPEED AND ENERGY-EFFICIENT SINGLE-PORT CONTENT ADDRESSABLE MEMORY TO ACHIEVE DUAL-PORT OPERATION Speaker: Honglan Zhan, School of Computer Science, Peking University, Beijing, China, CN Authors: Honglan Zhan, Chenxi Wang, Hongwei Cui, Xianhua Liu, Feng Liu and Xu Cheng, Department of Computer Science and Technology, Peking University, Beijing, China, CN Abstract High-speed and energy-efficient multi-port content addressable memory (CAM) is very important to modern superscalar processors. To overcome the disadvantages of multi-port CAM and improve the performance of the search stage, a high-speed and energy-efficient single-port (SP) CAM is introduced to achieve dual-port (DP) operation. For two different bit-cell topologies (the traditional 9T CAM cell and the 6T SRAM cell), two novel peripheral schemes, CShare and VClamp, are proposed. The proposed schemes are verified across all process corners, a wide range of temperatures and detailed Monte Carlo variation analysis. With a 65-nm process and 1.2 V supply, the search delay of CShare and VClamp is 0.55 ns and 0.6 ns, respectively, a reduction of approximately 87% compared to state-of-the-art works. In addition, compared with the recently proposed 10T BCAM, CShare and VClamp provide 84.9% and 85.1% energy reduction in the TT corner, respectively. Experimental results for an 8 Kb CAM at 1.2 V supply and across different corners show that the energy efficiency is improved by 45.56% (CShare) and 45.64% (VClamp) on average in comparison with DP CAM. |
14:09 CET | SA1.4 | ENERGY-EFFICIENT HARDWARE ACCELERATION OF SHALLOW MACHINE LEARNING APPLICATIONS Speaker: Ziqing Zeng, University of Minnesota, US Authors: Ziqing Zeng and Sachin S. Sapatnekar, University of Minnesota, US Abstract ML accelerators have largely focused on building general platforms for deep neural networks (DNNs), but less so on shallow machine learning (SML) algorithms. This paper proposes Axiline, a compact, configurable, template-based generator for SML hardware acceleration. Axiline identifies computational kernels as templates that are common to these algorithms and builds a pipelined accelerator for efficient execution. The dataflow graphs of individual ML instances, with different data dimensions, are mapped to the pipeline stages and then optimized by customized algorithms. The approach generates energy-efficient hardware for training and inference of various ML algorithms, as demonstrated with post-layout FPGA and ASIC results. |
14:12 CET | SA1.5 | STATEFUL ENERGY MANAGEMENT FOR MULTI-SOURCE ENERGY HARVESTING TRANSIENT COMPUTING SYSTEMS Speaker: Domenico Balsamo, Newcastle University, GB Authors: Sergey Mileiko1, Oktay Cetinkaya2, Rishad Shafik1 and Domenico Balsamo1 1Newcastle University, GB; 2Oxford e-Research Centre, GB Abstract The intermittent and varying nature of energy harvesting (EH) entails dedicated energy management with large energy storage, which is a limiting factor for low-power/low-cost systems with small form factors. Transient computing allows system operations to be performed in the presence of power outages by saving the system state into a non-volatile memory (NVM), thereby reducing the size of this storage. These systems are often designed with a task-based strategy, which requires the storage to be sized for the most energy-consuming task. That is, however, not ideal for most systems since their tasks/components have varying energy requirements, i.e., energy storage size and operating voltage. Hence, to overcome this issue, this paper proposes a novel energy management unit (EMU) tailored for multi-source EH transient systems that allows selecting the storage size and operating voltage for the next task at run-time, thereby optimizing task-specific energy needs and startup times based on application requirements. For the first time in the literature, we adopt a hybrid NVM+VM approach allowing our EMU to reliably and efficiently retain its internal state, i.e., a stateful EMU, under even the most severe EH conditions. Extensive empirical evaluations validated the operation of the proposed stateful EMU at a small overhead (0.07 mJ of energy to update the EMU state and ≃4 μA of static current consumption for the EMU). |
14:15 CET | SA1.6 | FULLY ON-BOARD LOW-POWER LOCALIZATION WITH MULTIZONE TIME-OF-FLIGHT SENSORS ON NANO-UAVS Speaker: Hanna Müller, ETH Zürich, CH Authors: Hanna Mueller1, Nicky Zimmerman2, Tommaso Polonelli3, Jens Behley2, Michele Magno1, Cyrill Stachniss2 and Luca Benini4 1ETH Zurich, CH; 2Uni Bonn, DE; 3Center for Project-Based Learning, ETH Zurich, CH; 4ETH Zurich, CH | Università di Bologna, IT Abstract Nano-size unmanned aerial vehicles (UAVs) hold enormous potential to perform autonomous operations in complex environments, such as inspection, monitoring, or data collection. Moreover, their small size allows safe operation close to humans and agile flight. An important part of autonomous flight is localization, which is a computationally intensive task, especially on a nano-UAV that usually has strong constraints in sensing, processing and memory. This work presents a real-time localization approach with low-element-count multizone range sensors for resource-constrained nano-UAVs. The proposed approach is based on a novel miniature 64-zone time-of-flight sensor from ST Microelectronics and a RISC-V-based parallel ultra-low-power processor to enable accurate and low latency Monte Carlo localization on-board. Experimental evaluation using a nano-UAV open platform demonstrated that the proposed solution is capable of localizing on a 31.2m^2 map with 0.15m accuracy and an above 95% success rate. The achieved accuracy is sufficient for localization in common indoor environments. We analyze tradeoffs in using full and half-precision floating point numbers as well as a quantized map and evaluate the accuracy and memory footprint across the design space. Experimental evaluation shows that parallelizing the execution for 8 RISC-V cores brings a 7x speedup and allows us to execute the algorithm on-board in real-time with a latency of 0.2-30ms (depending on the number of particles) while only increasing the overall drone power consumption by 3-7%. Finally, we provide an open-sourced implementation of our approach. |
14:18 CET | SA1.7 | ENERGY-EFFICIENT WEARABLE-TO-MOBILE OFFLOAD OF ML INFERENCE FOR PPG-BASED HEART-RATE ESTIMATION Speaker: Matteo Risso, Politecnico di Torino, IT Authors: Alessio Burrello1, Matteo Risso2, Noemi Tomasello2, Yukai Chen3, Luca Benini4, Enrico Macii2, Massimo Poncino2 and Daniele Jahier Pagliari2 1Politecnico di Torino and Università di Bologna, IT; 2Politecnico di Torino, IT; 3IMEC, BE; 4ETH Zurich, CH | Università di Bologna, IT Abstract Modern smartwatches often include photoplethysmographic (PPG) sensors to sense the contractions within the dense arteriovenous system. This information can be used to measure heartbeats or blood pressure through complex algorithms that fuse PPG data with other signals. However, these approaches are often too complex to be deployed on microcontroller units (MCUs) such as the ones embedded in a smartwatch. In this work, we propose a collaborative inference approach that uses both a smartwatch and a connected smartphone to maximize the performance of heart rate (HR) tracking while also maximizing the smartwatch's battery life. In particular, we first analyze the trade-offs between running on-device HR tracking or offloading the work to the smartphone. Then, thanks to an additional step to evaluate the difficulty of the upcoming HR prediction, we demonstrate that we can smartly dispatch the workload between smartwatch and smartphone, maintaining a low mean absolute error (MAE) while reducing energy consumption. To benchmark our approach, we employed a custom smartwatch prototype which includes the STM32WB55 MCU for processing and Bluetooth Low-Energy (BLE) communication and a Raspberry Pi3 as a proxy for the smartphone. With our Collaborative Heart Rate Inference System (CHRIS), we obtain a set of Pareto-optimal configurations demonstrating the same MAE as State-of-Art (SoA) algorithms while consuming less energy. For instance, we can achieve approximately the same MAE of TimePPG-Small (5.54 BPM MAE vs. 5.60 BPM MAE) while reducing the energy by 2.03x, with a configuration that offloads 80% of the predictions to the phone. Furthermore, accepting a performance degradation to 7.16 BPM of MAE, we can achieve an energy consumption of 179 uJ per prediction, 3.03x less than running TimePPG-Small on the smartwatch, and 1.82x less than streaming all the input data to the phone. |
Extended Abstracts
Time | Label | Presentation Title Authors |
---|---|---|
14:21 CET | SA1.8 | A COUPLED BATTERY STATE OF CHARGE AND VOLTAGE MODEL FOR OPTIMAL CONTROL APPLICATIONS Speaker: Sajad Shahsavari, Department of Computing, University of Turku, Turku, Finland, FI Authors: Masoomeh Karami1, Sajad Shahsavari1, Eero Immonen2, Hashem Haghbayan1 and Juha Plosila1 1University of Turku, FI; 2Turku University of Applied Sciences, FI Abstract Optimal control of electric vehicle (EV) batteries for maximal energy efficiency, safety and lifespan requires that the Battery Management System (BMS) has accurate real-time information on both the battery State-of-Charge (SoC) and its dynamics, i.e., long-term and short-term energy supply capacity, at all times. However, these quantities cannot be measured directly from the battery, and, in practice, only SoC estimation is typically carried out. In this article, we propose a novel parametric algebraic voltage model coupled to the well-known Manwell-McGowan dynamic Kinetic Battery Model (KiBaM), which is able to predict both battery SoC dynamics and its electrical response. Numerical simulations, based on laboratory measurements, are presented for prismatic Lithium-Titanate Oxide (LTO) battery cells. Such cells are prime candidates for modern heavy off-road EV applications. |
14:21 CET | SA1.9 | ADEE-LID: AUTOMATED DESIGN OF ENERGY-EFFICIENT HARDWARE ACCELERATORS FOR LEVODOPA-INDUCED DYSKINESIA CLASSIFIERS Speaker: Martin Hurta, Faculty of Information Technology, Brno University of Technology, CZ Authors: Martin Hurta, Vojtech Mrazek, Michaela Drahosova and Lukas Sekanina, Brno University of Technology, CZ Abstract Levodopa, a drug used to treat symptoms of Parkinson's disease, is connected to side effects known as Levodopa-induced dyskinesia (LID). LID is difficult to classify during a physician's visit. A wearable device allowing long-term and continuous classification would significantly help with dosage adjustments. This paper deals with an automated design of energy-efficient hardware accelerators for such LID classifiers. The proposed accelerator consists of a feature extractor and a classifier co-designed using genetic programming. Improvements are achieved by introducing a variable bit width for arithmetic operators, eliminating redundant registers, and using precise energy consumption estimation for Pareto front creation. Evolved solutions reduce energy consumption while maintaining classification accuracy comparable to the state of the art. |
SD5 Approximate computing
Add this session to my calendar
Date: Wednesday, 19 April 2023
Time: 14:00 CET - 15:30 CET
Location / Room: Okapi Room 0.8.2
Session chair:
Jie Han, University of Alberta, CA
14:00 CET until 14:24 CET: Pitches of regular papers
14:24 CET until 15:30 CET: Interactive technical presentations by the authors of regular papers and extended abstracts
Regular Papers
Time | Label | Presentation Title Authors |
---|---|---|
14:00 CET | SD5.1 | MAXIMIZING COMPUTING ACCURACY ON RESOURCE-CONSTRAINED ARCHITECTURES Speaker: Olivier Sentieys, Inria/Irisa, FR Authors: Van-Phu Ha and Olivier Sentieys, INRIA, FR Abstract With the growing complexity of applications, designers need to fit more and more computing kernels into a limited energy or area budget. Therefore, improving the quality of results of applications in electronic devices under a cost constraint is becoming a critical problem. Word Length Optimization (WLO) is the process of determining bit-widths for variables or operations represented using fixed-point arithmetic to trade off quality against cost. State-of-the-art approaches mainly solve WLO given a quality (accuracy) constraint. In this paper, we first show that existing WLO procedures are not suited to the problem of optimizing accuracy given a cost constraint. It is therefore interesting and challenging to propose new methods to solve this problem. We then propose a Bayesian-optimization-based algorithm to maximize the quality of computations under a cost constraint (i.e., energy in this paper). Experimental results indicate that our approach outperforms conventional WLO approaches, improving the quality of the solutions by more than 170%. |
14:03 CET | SD5.2 | MECALS: A MAXIMUM ERROR CHECKING TECHNIQUE FOR APPROXIMATE LOGIC SYNTHESIS Speaker: Chang Meng, Shanghai Jiao Tong University, CN Authors: Chang Meng, Jiajun Sun, Yuqi Mai and Weikang Qian, Shanghai Jiao Tong University, CN Abstract Approximate computing is an effective computing paradigm to improve energy efficiency for error-tolerant applications. Approximate logic synthesis (ALS) methods are designed to generate approximate circuits under certain error constraints. This paper focuses on ALS methods under the maximum error constraint and proposes MECALS, a maximum error checking technique for ALS. MECALS models maximum error using partial Boolean difference and performs fast error checking with SAT sweeping. Based on MECALS, we design an efficient ALS flow. Our experimental results show that compared to a state-of-the-art ALS method, our flow is 13× faster and improves area and delay reduction by 39.2% and 26.0%, respectively. |
14:06 CET | SD5.3 | COMPACT: CO-PROCESSOR FOR MULTI-MODE PRECISION-ADJUSTABLE NON-LINEAR ACTIVATION FUNCTIONS Speaker: Wenhui Ou, School of Mechanical Science and Engineering, Huazhong University of Science and Technology, CN Authors: Wenhui Ou, Zhuoyu Wu, Zheng Wang, Chao Chen and Yongkui Yang, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, CN Abstract Non-linear activation functions imitating neuron behaviors are ubiquitous in machine learning algorithms for time-series signals and also demonstrate significant precision gains for conventional vision-based deep learning networks. State-of-the-art implementations of such functions on GPU-like devices incur a large physical cost, whereas edge devices adopt either linear interpolation or simplified linear functions, leading to degraded precision. In this work, we design COMPACT, a co-processor with adjustable precision for multiple non-linear activation functions including but not limited to exponent, sigmoid, tangent, logarithm, and mish. Benchmarked against the state of the art, COMPACT achieves a 26% reduction in absolute error over a 1.6x wider approximation range by taking advantage of a triple decomposition technique inspired by Hajduk's formula for Padé approximation. A SIMD-ISA-based vector co-processor has been implemented on FPGA, which leads to a 30% reduction in execution latency while the area overhead remains nearly the same as related designs. Furthermore, COMPACT can trade accuracy for a 46% latency improvement when a maximum absolute error on the order of 1E-3 is tolerable. |
14:09 CET | SD5.4 | DEEPCAM: A FULLY CAM-BASED INFERENCE ACCELERATOR WITH VARIABLE HASH LENGTHS FOR ENERGY-EFFICIENT DEEP NEURAL NETWORKS Speaker: Priyadarshini Panda, Yale University, US Authors: Duy-Thanh Nguyen, Abhiroop Bhattacharjee, Abhishek Moitra and Priyadarshini Panda, Yale University, US Abstract With ever increasing depth and width in deep neural networks to achieve state-of-the-art performance, deep learning computation has significantly grown, and dot-products remain dominant in overall computation time. Most prior works are built on conventional dot-product where weighted input summation is used to represent the neuron operation. However, another implementation of dot-product based on the notion of angles and magnitudes in the Euclidean space has attracted limited attention. This paper proposes DeepCAM, an inference accelerator built on two critical innovations to alleviate the computation time bottleneck of convolutional neural networks. The first innovation is an approximate dot-product built on computations in the Euclidean space that can replace addition and multiplication with simple bit-wise operations. The second innovation is a dynamic size content addressable memory-based (CAM-based) accelerator to perform bit-wise operations and accelerate the CNNs with a lower computation time. Our experiments on benchmark image recognition datasets demonstrate that DeepCAM is up to 523x and 3498x faster than Eyeriss and traditional CPUs like Intel Skylake, respectively. Furthermore, the energy consumed by our DeepCAM approach is 2.16x to 109x less compared to Eyeriss. |
14:12 CET | SD5.5 | DESIGN OF LARGE-SCALE STOCHASTIC COMPUTING ADDERS AND THEIR ANOMALOUS BEHAVIOR Speaker: Timothy Baker, University of Michigan, US Authors: Timothy Baker and John Hayes, University of Michigan, US Abstract Stochastic computing (SC) uses streams of pseudo-random bits to perform low-cost and error-tolerant numerical processing for applications like neural networks and digital filtering. A key operation in these domains is the summation of many hundreds of bit-streams, but existing SC adders are inflexible and unpredictable. Basic mux adders have low area but poor accuracy while other adders like accumulative parallel counters (APCs) have good accuracy but high area. This work introduces parallel sampling adders (PSAs), a novel weighted adder family that offers a favorable area-accuracy trade-off and provides great flexibility to large-scale SC adder design. Our experiments show that PSAs can sometimes achieve the same high accuracy as APCs, but at half the area cost. We also examine the behavior of large-scale SC adders in depth and uncover some surprising results. First, APC accuracy is shown to be sensitive to input correlation despite the common belief that APCs are correlation insensitive. Then, we show that mux-based adders are sometimes more accurate than APCs, which contradicts most prior studies. Explanations for these anomalies are given and a decorrelation scheme is proposed to improve APC accuracy by 4x for a digital filtering application. |
14:15 CET | SD5.6 | ACCURATE YET EFFICIENT STOCHASTIC COMPUTING NEURAL ACCELERATION WITH HIGH PRECISION RESIDUAL FUSION Speaker: Yixuan Hu, Institute of Microelectronics, Peking University, CN Authors: Yixuan Hu1, Tengyu Zhang1, Renjie Wei1, Meng Li2, Runsheng Wang1, Yuan Wang1 and Ru Huang1 1School of Integrated Circuits, Peking University, CN; 2Institute for Artificial Intelligence and School of Integrated Circuits, Peking University, CN Abstract Stochastic computing (SC) emerges as a fault-tolerant and area-efficient computing paradigm for neural acceleration. However, existing SC accelerators suffer from an intrinsic trade-off between inference accuracy and efficiency: accurate SC requires high precision computation but suffers from an exponential increase of bit stream length and inference latency. In this paper, we discover the high precision residual as a key remedy and propose to combine a low precision datapath with a high precision residual to improve inference accuracy with minimum efficiency overhead. We also propose to fuse batch normalization with the activation function to further improve the inference efficiency. The effectiveness of our proposed method is verified on a recently proposed SC accelerator. With extensive results, we show that our proposed SC-friendly network achieves 9.43% accuracy improvements compared to the baseline low precision networks with only 1.3% area-delay product (ADP) increase. We further show 3.01x ADP reduction compared to the baseline SC accelerator with almost iso-accuracy. |
14:18 CET | SD5.7 | PECAN: A PRODUCT-QUANTIZED CONTENT ADDRESSABLE MEMORY NETWORK Speaker: Jie Ran, University of Hong Kong, CN Authors: Jie Ran1, Rui Lin2, Jason Li1, JiaJun Zhou1 and Ngai Wong1 1University of Hong Kong, HK; 2University of Hong Kong, HK Abstract A novel deep neural network (DNN) architecture is proposed wherein the filtering and linear transform are realized solely with product quantization (PQ). This results in a natural implementation via content addressable memory (CAM), which transcends regular DNN layer operations and requires only simple table lookup. Two schemes are developed for end-to-end PQ prototype training, namely, through angle- and distance-based similarities, which differ in their multiplicative and additive natures with different complexity-accuracy tradeoffs. Even more, the distance-based scheme constitutes a truly multiplier-free DNN solution. Experiments confirm the feasibility of such a Product-Quantized Content Addressable Memory Network (PECAN), which has strong implications for hardware-efficient deployments, especially for in-memory computing. |
14:21 CET | SD5.8 | XRING: A CROSSTALK-AWARE SYNTHESIS METHOD FOR WAVELENGTH-ROUTED OPTICAL RING ROUTERS Speaker: Zhidan Zheng, TU Munich, DE Authors: Zhidan Zheng, Mengchu Li, Tsun-Ming Tseng and Ulf Schlichtmann, TU Munich, DE Abstract Wavelength-routed optical networks-on-chip (WRONoCs) are well-known for supporting high-bandwidth communications with low power and latency. Among all WRONoC routers, optical ring routers have attracted great research interest thanks to their simple structure, which looks like concentric circles formed by waveguides. Current ring routers are designed manually. When the number of network nodes increases or the position of network nodes changes, it can be difficult to manually determine the optimal design options. Besides, current ring routers face two problems. First, some signal paths in the routers can be very long and suffer high insertion loss; second, to connect the network nodes to off-chip lasers, waveguides in the power distribution network (PDN) have to intersect with the ring waveguides, which causes additional insertion loss and crosstalk noise. In this work, we propose XRing, the first design automation method to automatically synthesize optical ring routers based on the number and position of network nodes. In particular, XRing optimizes the waveguide connections between the network nodes with a mathematical modelling method. To reduce insertion loss and crosstalk noise, XRing constructs efficient shortcuts between network nodes that suffer long signal paths and creates openings on ring waveguides so that the PDN can easily access the network nodes without causing waveguide crossings. The experimental results show that XRing outperforms other WRONoC routers in reducing insertion loss and crosstalk noise. In particular, more than 98% of signals in XRing do not suffer first-order crosstalk noise, which significantly enhances the signal quality. |
Extended Abstracts
Time | Label | Presentation Title Authors |
---|---|---|
14:24 CET | SD5.10 | EXPLOITING ASSERTIONS MINING AND FAULT ANALYSIS TO GUIDE RTL-LEVEL APPROXIMATION Speaker: Samuele Germiniani, Università di Verona, IT Authors: Alberto Bosio1, Samuele Germiniani2, Graziano Pravadelli2 and Marcello Traiola3 1Lyon Institute of Nanotechnology, FR; 2Università di Verona, IT; 3Inria / IRISA, FR Abstract In Approximate Computing (AxC), several design exploration approaches and metrics have been proposed so far to identify the approximation targets at the gate level, but only a few of them work on RTL descriptions. In addition, the possibility of combining the information derived from assertions and fault analysis is still under-explored. To fill in the gap, this paper proposes an automatic methodology to guide the AxC design exploration at the RTL level. Two approximation techniques are considered, bit-width and statement reduction, while fault injection is used to mimic their effect on the design under approximation. Assertions are then dynamically mined from the original RTL description and the variation of their truth values is evaluated with respect to fault injections. These variations are then used to rank and cluster different approximation alternatives, according to their estimated impact on the functionality of the target design. The experiments carried out on a case study, show that the proposed approach represents a promising solution toward the automatization of AxC design exploration at RTL. |
14:24 CET | SD5.11 | AN EFFICIENT FAULT INJECTION ALGORITHM FOR IDENTIFYING UNIMPORTANT FFS IN APPROXIMATE COMPUTING CIRCUITS Speaker: Yutaka Masuda, Nagoya University, JP Authors: Jiaxuan Lu, Yutaka Masuda and Tohru Ishihara, Nagoya University, JP Abstract Approximate computing (AC) has attracted much attention, contributing to energy saving and performance improvement by performing the important computations accurately and approximating the others. To make AC circuits practical, we need to carefully determine how important each computation is, so that the unimportant computations can be approximated appropriately while maintaining the required computational quality. In this paper, we focus on the importance of computations at the flip-flop (FF) level and propose a novel importance evaluation methodology. The key idea of the proposed methodology is a two-step fault injection algorithm that extracts a near-optimal set of unimportant FFs in the circuit. In the first step, the proposed methodology derives the importance of each FF. Then, in the second step, it extracts the set of unimportant FFs in a binary search manner. Thanks to the two-step strategy, the proposed algorithm reduces the complexity of architecture exploration from an exponential order to a linear order without requiring an understanding of the functionality and behavior of the target application program. In a case study of an image processing accelerator, the proposed algorithm identifies candidate unimportant FFs depending on the given constraints. Bit-width scaling for the extracted FFs with the proposed algorithm reduces the circuit area by 29.6% and saves power dissipation by 35.8% under an ASIC implementation. Under an FPGA implementation, dynamic power dissipation is reduced by 37.0% while satisfying the PSNR constraint. |
14:24 CET | SD5.12 | HARDWARE-AWARE AUTOMATED NEURAL MINIMIZATION FOR PRINTED MULTILAYER PERCEPTRONS Speaker: Argyris Kokkinis, Aristotle University of Thessaloniki, GR Authors: Argyris Kokkinis1, Georgios Zervakis2, Kostas Siozios3, Mehdi Tahoori4 and Joerg Henkel4 1Aristotle University of Thessaloniki, GR; 2University of Patras, GR; 3Department of Physics, Aristotle University of Thessaloniki, GR; 4Karlsruhe Institute of Technology, DE Abstract The demand of many application domains for flexibility, stretchability, and porosity typically cannot be met by silicon VLSI technologies. Printed Electronics (PE) has been introduced as a candidate solution that can satisfy those requirements and enable the integration of smart devices on consumer goods at ultra-low cost, also enabling in situ and on-demand fabrication. However, the large feature sizes in PE constrain those efforts and prohibit the design of complex ML circuits due to area and power limitations, even though classification is the core task in most printed applications. In this work, we examine, for the first time, the impact of neural minimization techniques, in conjunction with bespoke circuit implementations, on the area efficiency of printed Multilayer Perceptron classifiers. Results show that for up to 5% accuracy loss, up to 8x area reduction can be achieved. |
SD9 Emerging design technologies for future computing
Add this session to my calendar
Date: Wednesday, 19 April 2023
Time: 14:00 CET - 15:30 CET
Location / Room: Marble Hall
Session chair:
Aida Todri-Sanial, LIRMM, University of Montpellier, CNRS, FR
14:00 CET until 14:24 CET: Pitches of regular papers
14:24 CET until 15:30 CET: Interactive technical presentations by the authors of regular papers and extended abstracts
Regular Papers
Time | Label | Presentation Title Authors |
---|---|---|
14:00 CET | SD9.1 | SCALABLE COHERENT OPTICAL CROSSBAR ARCHITECTURE USING PCM FOR AI ACCELERATION Speaker: Dan Sturm, University of Washington, US Authors: Dan Sturm and Sajjad Moazeni, University of Washington, US Abstract Recent advancements in artificial intelligence (AI) and machine learning (ML) have been challenging our conventional computing paradigms by demanding enormous computing power at a dramatically faster pace than Moore's law. Analog optical computing has recently been proposed as a new approach to achieve large compute power (TOPS) at high energy efficiency (TOPS/W), which makes it suitable for AI acceleration in datacenters and supercomputers. However, implementations proposed so far suffer from a lack of scalability, large footprints and high power consumption, and the lack of practical system-level architectures that could be integrated within existing datacenter architectures for real-world applications. In this work, we present a truly scalable optical AI accelerator based on a crossbar architecture. We have considered all major roadblocks and address them in this design. Weights are stored on chip using phase change material (PCM) that can be monolithically integrated in silicon photonic processes. This coherent crossbar architecture can be extended to large scales without the need for any multi-wavelength laser sources. All electro-optical components and circuit blocks are modeled based on performance metrics measured in a monolithic 45nm silicon photonics process, and the chip can be co-packaged with advanced SoCs and HBM memories. We also present system-level modeling and analysis of our chip's performance for the ResNet-50 V1.5 neural network, considering all critical parameters, including memory size, array size, photonic losses, and the energy consumption of peripheral electronics including ADCs and DACs. Both on-chip SRAM and off-chip DRAM energy overheads have been considered in this modeling. We additionally address how a dual-core crossbar design can eliminate programming time overhead at practical SRAM block sizes and batch sizes. Our results show that a 128 x 128 instance of the proposed architecture can achieve inferences per second (IPS) similar to the Nvidia A100 GPU at 15.4× lower power and 7.24× lower area. |
14:03 CET | SD9.2 | MIXED-SIGNAL MEMRISTOR-BASED ITERATIVE MONTGOMERY MODULAR MULTIPLICATION Speaker: Mehdi Kamal, University of Southern California, US Authors: Mehdi Kamal and Massoud Pedram, University of Southern California, US Abstract In this paper, we present a mixed-signal implementation of the iterative Montgomery multiplication algorithm (called X-IMM) for use in large arithmetic word size (LAWS) computations. LAWS is mainly utilized in security applications such as lattice-based cryptography, where the width of the input operands may be equal to or larger than 1,024 bits. The proposed architecture is based on an iterative implementation of the Montgomery multiplication (MM) algorithm, where some critical parts of the multiplication are computed in the analog domain by mapping them onto a memristor crossbar. Using a memristor crossbar reduces the area usage and latency of the modular multiplication unit compared to its fully digital implementation. The devised mixed-signal MM implementation is scalable: smaller X-IMMs can be cascaded to support dynamically adjustable, larger operand sizes at runtime. The effectiveness of the proposed MM structure is assessed in a 45nm technology, and comparative studies show that the proposed 1,024-bit Radix-4 (Radix-16) Montgomery multiplication architecture provides about 13% (22%) higher GOPS/mm^2 compared to state-of-the-art digital implementations of iterative Montgomery multipliers. |
14:06 CET | SD9.3 | ODLPIM: A WRITE-OPTIMIZED AND LONG-LIFETIME RERAM-BASED ACCELERATOR FOR ONLINE DEEP LEARNING Speaker: Heng Zhou, Huazhong University of Science & Technology, CN Authors: Heng Zhou, Bing Wu, Huan Cheng, Wei Zhao, Xueliang Wei, Jinpeng Liu, Dan Feng and Wei Tong, Huazhong University of Science & Technology, CN Abstract ReRAM-based Processing-In-Memory (PIM) architectures have demonstrated high energy efficiency and performance in deep neural network (DNN) acceleration. Most existing PIM accelerators for DNNs focus on offline batch learning (OBL), which requires the whole dataset to be available before training. However, in the real world, data instances arrive sequentially, and the data pattern may even change, a phenomenon called concept drift. OBL requires expensive retraining to handle concept drift, whereas online deep learning (ODL) has been shown to be a better solution for keeping the model evolving over streaming data. Unfortunately, when ODL optimizes models over a large-scale data stream in a PIM system, unbalanced writes are more severe than in OBL due to the heavier weight updates, resulting in the amplification of unbalanced writes and lifetime deterioration. In this work, we propose ODLPIM, an online deep learning PIM accelerator that extends system lifetime through algorithm-hardware co-optimization. ODLPIM adopts a novel write-optimized parameter update (WARP) scheme that reduces non-critical weight updates in hidden layers. Besides, a table-based inter-crossbar wear-leveling (TIWL) scheme is proposed and applied in the hardware controller to achieve wear-leveling between crossbars for lifetime improvement. Experiments show that WARP reduces weight updates by 15.25% on average and by up to 24% compared to training without WARP, and prolongs system lifetime by 9.65% on average and by up to 26.81%, with a negligible rise in cumulative error rate (up to 0.31%). By combining WARP with TIWL, the lifetime of ODLPIM is improved by an average of 12.59X and up to 17.73X. |
14:09 CET | SD9.4 | SAT-BASED QUANTUM CIRCUIT ADAPTATION Speaker: Sebastian Brandhofer, University of Stuttgart, Institute of Computer Architecture and Computer Engineering and Center for Integrated Quantum Science and Technology, DE Authors: Sebastian Brandhofer1, Jinwoong Kim2, Siyuan Niu3 and Nicholas Bronn4 1University of Stuttgart, DE; 2TU Delft, NL; 3Université de Montpellier, FR; 4IBM Thomas J. Watson Research Center, US Abstract As the nascent field of quantum computing develops, an increasing number of quantum hardware modalities, such as superconducting electronic circuits, semiconducting spins, trapped ions, and neutral atoms, have become available for performing quantum computations. These quantum hardware modalities exhibit varying characteristics and implement different universal quantum gate sets that may, for example, contain several distinct two-qubit quantum gates. Adapting a quantum circuit from a, possibly hardware-agnostic, universal quantum gate set to the quantum gate set of a target hardware modality has a crucial impact on the fidelity and duration of the intended quantum computation. However, current quantum circuit adaptation techniques only apply a specific decomposition or allow only for local improvements to the target quantum circuit, potentially resulting in a quantum computation with less fidelity or more qubit idle time than necessary. These issues are further aggravated by the multiple options for hardware-native quantum gates, which render multiple universal quantum gate sets accessible to a hardware modality. In this work, we developed a satisfiability modulo theories model that determines an optimized quantum circuit adaptation given a set of allowed substitutions and decompositions, a target hardware modality and the quantum circuit to be adapted. We further discuss the physics of the semiconducting spins hardware modality, show possible implementations of distinct two-qubit quantum gates, and evaluate the developed model on the semiconducting spins hardware modality. Using the developed quantum circuit adaptation method on a noisy simulator, we show that the Hellinger fidelity could be improved by up to 40% and the qubit idle time could be decreased by up to 87% compared to alternative quantum circuit adaptation techniques. |
14:12 CET | SD9.5 | ULTRA-DENSE 3D PHYSICAL DESIGN UNLOCKS NEW ARCHITECTURAL DESIGN POINTS WITH LARGE BENEFITS Speaker: Tathagata Srimani, Stanford University, US Authors: Tathagata Srimani1, Robert Radway1, Jinwoo Kim2, Kartik Prabhu1, Dennis Rich1, Carlo Gilardi1, Priyanka Raina1, Max Shulaker3, Sung Kyu Lim2 and Subhasish Mitra1 1Stanford University, US; 2Georgia Tech, US; 3Massachusetts Institute of Technology, US Abstract This paper focuses on iso-on-chip-memory-capacity and iso-footprint Energy-Delay-Product (EDP) benefits of ultra-dense 3D, e.g., monolithic 3D (M3D), computing systems vs. corresponding 2D designs. Simply folding existing 2D designs into corresponding M3D physical designs yields limited EDP benefits (~1.4×). New M3D architectural design points that exploit M3D physical design are crucial for large M3D EDP benefits. We perform comprehensive architectural exploration and detailed M3D physical design using foundry M3D process design kit and standard cell library for front-end-of-line (FEOL) Si CMOS logic, on-chip back-end-of-line (BEOL) memory, and a single layer of on-chip BEOL FETs. We find new M3D AI/ML accelerator architectural design points that have iso-footprint, iso-on-chip-memory-capacity EDP benefits ranging from 5-11.5× vs. corresponding 2D designs (containing only FEOL Si CMOS and on-chip BEOL memory). We also present an analytical framework to derive architectural insights into these benefits, showing that our principles extend to many architectural design points across various device technologies. |
14:15 CET | SD9.6 | MEMRISTOR-SPIKELEARN: A SPIKING NEURAL NETWORK SIMULATOR FOR STUDYING SYNAPTIC PLASTICITY UNDER REALISTIC DEVICE AND CIRCUIT BEHAVIORS Speaker: Yuming Liu, University of Chicago, US Authors: Yuming Liu1, Angel Yanguas-Gil2, Sandeep Madireddy2 and Yanjing Li1 1University of Chicago, US; 2Argonne National Laboratory, US Abstract We present the Memristor-Spikelearn simulator (open-sourced), which is capable of incorporating detailed memristor and circuit models in simulation to enable thorough study of synaptic plasticity in spiking neural networks under realistic device and circuit behaviors. Using this simulator, we demonstrate that: (1) a detailed device model is essential for simulating synaptic plasticity workloads, because results obtained using a simplified model can be misleading (e.g., it can overestimate test accuracy by up to 21.9%); (2) detailed simulation helps to determine the proper range of conductance values to represent weights, which is critical in order to achieve the desired accuracy-energy tradeoff (e.g., increasing the conductance values by 10× can increase accuracy from 70% to 83% at the price of 20× higher energy); and (3) detailed simulation also helps to determine an optimized circuit structure, which is another important design parameter that can yield different accuracy-energy tradeoffs. |
14:18 CET | SD9.7 | EXPLOITING KERNEL COMPRESSION ON BNNS Speaker: Franyell Silfa, UPC, ES Authors: Franyell Silfa, Jose Maria Arnau and Antonio González, UPC, ES Abstract Binary Neural Networks (BNNs) are showing tremendous success on realistic image classification tasks. Notably, their accuracy is similar to the state-of-the-art accuracy obtained by full-precision models tailored to edge devices. In this regard, BNNs are very amenable to edge devices since they employ 1 bit to store the inputs and weights, and thus their storage requirements are low. Moreover, BNN computations are mainly done using xnor and pop-count operations, which are implemented very efficiently using simple hardware structures. Nonetheless, supporting BNNs efficiently on mobile CPUs is far from trivial since their benefits are hindered by frequent memory accesses to load weights and inputs. In BNNs, a weight or an input is stored using one bit, and to increase storage and computation efficiency, several of them are packed together as a sequence of bits. In this work, we observe that the number of unique sequences representing a set of weights or inputs is typically low (e.g., 512). Also, we have seen that during the evaluation of a BNN layer, a small group of unique sequences is employed more frequently than others. Accordingly, we propose exploiting this observation by using Huffman encoding to encode the bit sequences and then using an indirection table to decode them during the BNN evaluation. Also, we propose a clustering-based scheme to identify the most common sequences of bits and replace the less common ones with similar common sequences. As a result, we decrease the storage requirements and memory accesses since the most common sequences are encoded with fewer bits. In this work, we extend a mobile CPU by adding a small hardware structure that can efficiently cache and decode the compressed sequences of bits. We evaluate our scheme using the ReActNet model with the ImageNet dataset on an ARM CPU. Our experimental results show that our technique can reduce the memory requirement by 1.32x and improve performance by 1.35x. |
14:21 CET | SD9.8 | AXI-PACK: NEAR-MEMORY BUS PACKING FOR BANDWIDTH-EFFICIENT IRREGULAR WORKLOADS Speaker: Chi Zhang, ETH Zurich, CH Authors: Chi Zhang1, Paul Scheffler1, Thomas Benz1, Matteo Perotti2 and Luca Benini1 1ETH Zurich, CH; 2ETH Zürich, CH Abstract Data-intensive applications involving irregular memory streams are inefficiently handled by modern processors and memory systems highly optimized for regular, contiguous data. Recent work tackles these inefficiencies in hardware through core-side stream extensions or memory-side prefetchers and accelerators, but fails to provide end-to-end solutions which also achieve high efficiency in on-chip interconnects. We propose AXI-Pack, an extension to ARM's AXI4 protocol introducing bandwidth-efficient strided and indirect bursts to enable end-to-end irregular streams. AXI-Pack adds irregular stream semantics to memory requests and avoids inefficient narrow-bus transfers by packing multiple narrow data elements onto a wide bus. It retains full compatibility with AXI4 and does not require modifications to non-burst-reshaping interconnect IPs. To demonstrate our approach end-to-end, we extend an open-source RISC-V vector processor to leverage AXI-Pack at its memory interface for strided and indexed accesses. On the memory side, we design a banked memory controller efficiently handling AXI-Pack requests. On a system with a 256-bit-wide interconnect running FP32 workloads, AXI-Pack achieves near-ideal peak on-chip bus utilizations of 87% and 39%, speedups of 5.4x and 2.4x, and energy efficiency improvements of 5.3x and 2.1x over a baseline using an AXI4 bus on strided and indirect benchmarks, respectively. |
Extended Abstracts
Time | Label | Presentation Title Authors |
---|---|---|
14:24 CET | SD9.9 | SIMSNN: A WEIGHT-AGNOSTIC RERAM-BASED SEARCH-IN-MEMORY ENGINE FOR SPIKING NEURAL NETWORK ACCELERATION Speaker: Fangxin Liu, Shanghai Jiao Tong University, CN Authors: Fangxin Liu, Xiaokang Yang and Li Jiang, Shanghai Jiao Tong University, CN Abstract Bio-plausible spiking neural networks (SNNs) have gained great momentum due to their inherent efficiency in processing event-driven information. The dominant computation in SNNs, matrix bit-wise AND-add operations, is naturally suited to processing-in-memory (PIM) architectures. The long input spike train of SNNs and the bit-serial processing mechanism of PIM, however, incur considerable latency and frequent analog-to-digital conversion, offsetting the performance gain and energy efficiency. In this paper, we propose a novel Search-in-Memory (SIM) architecture to accelerate SNN inference, named SIMSnn. Rather than processing the input bit-by-bit over multiple time steps, SIMSnn can take in a sequence of spikes and search the result by parallel associative matches in the CAM crossbar. We explore the cascade search mechanism and the temporal pipeline design to enhance the parallelism of the search across time windows. The proposed SIMSnn can leverage the non-structured pruning mechanism, which is unusable for most PIM architectures, to further reduce the CAM overhead. As a weight-agnostic SNN accelerator, SIMSnn can adapt to various evolving SNNs without rewriting the crossbar array. Experimental results show that the proposed SIMSnn achieves 25.3x higher energy efficiency and 13.7x speedup on average compared with the ISAAC-like design. Compared to the state-of-the-art PIM design, NEBULA, SIMSnn can also achieve up to 7.9x energy savings and 5.7x speedup. |
14:24 CET | SD9.10 | BOMIG: A MAJORITY LOGIC SYNTHESIS FRAMEWORK FOR AQFP LOGIC Speaker: Tsung-Yi Ho, The Chinese University of Hong Kong, CN Authors: Rongliang Fu1, Junying Huang2, Mengmeng Wang3, Yoshikawa Nobuyuki3, Bei Yu4, Tsung-Yi Ho4 and Olivia Chen5 1The Chinese University of Hong Kong, CN; 2State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing, China, CN; 3Yokohama National University, JP; 4The Chinese University of Hong Kong, HK; 5Tokyo City University, JP Abstract Adiabatic quantum-flux-parametron (AQFP) logic, an energy-efficient superconductor logic with no static power consumption and ultra-low switching energy, is a promising candidate for energy-efficient computing systems. Due to the native majority function in AQFP logic, which can represent more complex logic with the same cost as the AND/OR function, the design of AQFP circuits differs from AND-OR-inverter-based logic circuits. In addition, AQFP logic imposes path-balancing requirements and fan-out limitations, making traditional majority-based logic optimization methods inapplicable. This paper proposes a global optimization method over the majority-inverter graph (MIG) to minimize the Josephson junction (JJ) number and circuit depth of AQFP circuits. MIG-based transformation methods are first illustrated to construct the feasible domain. The normalized energy-delay-product (EDP), the product of the JJ number and circuit depth of AQFP circuits, is used as the objective function. Then, Bayesian optimization is used to explore the globally optimal transformation sequence applied to AQFP MIG-based logic optimization. Experimental results show that the proposed method achieves a significant improvement in the JJ number and circuit depth compared with the state-of-the-art. |
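The objective in SD9.10 is the product of the JJ count and the circuit depth, minimized over sequences of MIG transformations. The sketch below only illustrates the shape of that search loop with a random search and a made-up cost model; the transformation names and costs are assumptions, and the paper itself drives real MIG rewriting with Bayesian optimization rather than random sampling.

```python
import random

# Assumed rule names standing in for MIG rewriting moves.
TRANSFORMS = ["majority", "associativity", "distributivity", "complement", "relevance"]

def mock_cost(sequence):
    """Pretend cost model returning (jj_count, depth) for a transformation sequence."""
    rng = random.Random(hash(tuple(sequence)))
    jj = 1000 - 20 * len(set(sequence)) + rng.randint(-30, 30)
    depth = 40 - len(sequence) % 7 + rng.randint(-3, 3)
    return jj, depth

def normalized_edp(sequence):
    jj, depth = mock_cost(sequence)
    return jj * depth                     # normalized energy-delay-product proxy

best_seq, best_val = None, float("inf")
for _ in range(200):                      # budget of candidate sequences
    seq = random.choices(TRANSFORMS, k=random.randint(3, 10))
    val = normalized_edp(seq)
    if val < best_val:
        best_seq, best_val = seq, val
print(best_val, best_seq)
```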
SE1 Optimized software architecture towards an improved utilization of hardware features
Add this session to my calendar
Date: Wednesday, 19 April 2023
Time: 14:00 CET - 15:30 CET
Location / Room: Okapi Room 0.8.3
Session chair:
Michele Lora, University of Southern California & University of Verona, IT
14:00 CET until 14:24 CET: Pitches of regular papers
14:24 CET until 15:30 CET: Interactive technical presentations by the authors of regular papers and extended abstracts
Regular Papers
Time | Label | Presentation Title Authors |
---|---|---|
14:00 CET | SE1.1 | MARB: BRIDGE THE SEMANTIC GAP BETWEEN OPERATING SYSTEM AND APPLICATION MEMORY ACCESS BEHAVIOR Speaker: Tianyue Lu, Chinese Academy of Sciences, CN Authors: Haifeng Li1, Ke Liu1, Ting Liang2, Zuojun Li3, Tianyue Lu2, Yisong Chang2, Hui Yuan4, Yinben Xia4, Yungang Bao5, Mingyu Chen5 and Yizhou Shan6 1ICT, CN; 2Chinese Academy of Sciences, CN; 3Institute of Computing Technology, CN; 4Huawei, CN; 5ICT, CAS, CN; 6Huawei Cloud, CN Abstract The virtual memory subsystem (VMS) is a long-standing and integral part of an operating system (OS). It plays a vital role in enabling remote memory systems over fast data center networks and is promising in terms of transparency and generality. Specifically, these systems use three VMS mechanisms: demand paging, page swapping, and page prefetching. However, the VMS inherent data path is costly, which takes a huge toll on performance. Despite prior efforts to propose page swapping and prefetching algorithms to minimize the occurrences of the data path, they still fall short due to the semantic gap between the OS and applications – the VMS has limited knowledge of its running applications' memory access behaviors. In this paper, orthogonal to prior efforts, we take a fundamentally different approach by building an efficient framework to collect full memory access traces at the local bus, and make them available to the OS through CPU cache. Consequently, the page swapping and page prefetching can use this trace to make better decisions, thereby improving the overall performance of systems. We implement a proof-of-concept prototype on commodity x86 servers using a hardware-based memory tracking tool. To showcase our framework's benefits, we integrate it with a state-of-the-art remote memory system and the default kernel page eviction subsystem. Our evaluation shows promising improvements. |
14:03 CET | SE1.2 | SAT-MAPIT: A SAT-BASED MODULO SCHEDULING MAPPER FOR COARSE GRAIN RECONFIGURABLE ARCHITECTURES Speaker: Cristian Tirelli, Università della Svizzera italiana, IT Authors: Cristian Tirelli1, Lorenzo Ferretti2 and Laura Pozzi1 1USI Lugano, CH; 2University of California, Los Angeles, US Abstract Coarse-Grain Reconfigurable Arrays (CGRAs) are emerging low-power architectures aimed at accelerating compute-intensive application loops. The acceleration that a CGRA can ultimately provide, however, heavily depends on the quality of the mapping, i.e., on how effectively the loop is compiled onto the given platform. State-of-the-art compilation techniques achieve mapping through modulo scheduling, a strategy which attempts to minimize the II (Iteration Interval) needed to execute a loop, and they do so usually through well-known graph algorithms, such as Max-Clique Enumeration. We address the mapping problem through a SAT formulation, instead, and thus explore the solution space more effectively than current SoA tools. To formulate the SAT problem, we introduce an ad-hoc schedule called the kernel mobility schedule (KMS), which we use in conjunction with the data-flow graph and the architectural information of the CGRA in order to create a set of boolean statements that describe all constraints to be obeyed by the mapping for a given II. We then let the SAT solver efficiently navigate this complex space. As in other SoA techniques, the process is iterative: if a valid mapping does not exist for the given II, the II is increased and a new KMS and set of constraints are generated and solved. Our experimental results show that SAT-MapIt obtains better results compared to SoA alternatives in 47.72% of the benchmarks explored: sometimes finding a lower II, and in others even finding a valid mapping when none could previously be found. |
14:06 CET | SE1.3 | LIVENESS-AWARE CHECKPOINTING OF ARRAYS FOR EFFICIENT INTERMITTENT COMPUTING Speaker: Youngbin Kim, ETRI, KR Authors: Youngbin Kim, Yoojin Lim and Chaedeok Lim, ETRI, KR Abstract Intermittent computing enables computing in environments that may experience frequent and unpredictable power failures, such as energy harvesting systems. It relies on checkpointing to preserve computing progress between power cycles, which often incurs significant overhead due to energy-expensive writes to Non-Volatile Memory (NVM). In this paper, we present LACT (Liveness-Aware CheckpoinTing), an approach to reducing the size of checkpointed data by exploiting the liveness of memory objects: excluding dead memory objects from checkpointing does not affect the correctness of the program. In particular, LACT can analyze the liveness of arrays, which take up most of the memory space but are not analyzable by existing methods for detecting the liveness of scalar objects. Using the liveness information of arrays, LACT determines the minimized checkpoint range for the arrays at compile time without any runtime additions. Our evaluation shows that LACT achieves an additional reduction of checkpointed data size of 37.8% on average over the existing state-of-the-art technique. Also, our experiments in a real energy harvesting environment show that LACT can reduce the execution time of applications by 27.7% on average. |
14:09 CET | SE1.4 | SERICO: SCHEDULING REAL-TIME I/O REQUESTS IN COMPUTATIONAL STORAGE DRIVES Speaker: Yun HUANG, City University of Hong Kong, CN Authors: Yun HUANG1, Nan Guan2, Shuhan BAI3, Tei-Wei Kuo4 and Jason Xue2 1City University of Hong Kong, CN; 2City University of Hong Kong, HK; 3City University of Hong Kong; Huazhong University of Science and Technology, CN; 4National Taiwan University, TW Abstract The latency and energy consumption caused by I/O accesses are significant in data-centric computing systems. A Computational Storage Drive (CSD) can largely reduce data movement, and thus reduce I/O latency and energy consumption, by performing near-data processing, i.e., offloading some data processing to processors inside the storage device. In this paper, we study the problem of how to efficiently utilize the limited processing and memory resources of a CSD to simultaneously serve multiple I/O requests from different applications with different real-time requirements. We propose SERICO, a novel technique for scheduling computational I/O requests in CSDs. The key idea of SERICO is to perform admission control of real-time computational I/O requests by online schedulability analysis, to avoid wasting the processing capacity of the CSD on meaningless work for requests that are deemed to violate their timing constraints anyway. Each admitted computational I/O request is served in a controlled manner with carefully designed parameters, to meet its timing constraint with minimal memory cost. We evaluate SERICO with both synthetic workloads on simulators and representative applications on a realistic CSD platform. Experiment results show that SERICO significantly outperforms the baseline method currently used by the CSD device and the standard deadline-driven scheduling approach. |
14:12 CET | SE1.5 | REGION-BASED FLASH CACHING WITH JOINT LATENCY AND LIFETIME OPTIMIZATION IN HYBRID SMR STORAGE SYSTEMS Speaker: Zhengang Chen, Capital Normal University, CN Authors: Zhengang Chen1, Guohui Wang1, Zhiping Shi2, Yong Guan3 and Tianyu Wang4 1College of Information Engineering, Capital Normal University, CN; 2Beijing Key Laboratory of Electronic System Reliability Technology, Capital Normal University, CN; 3International Science and Technology Cooperation Base of Electronic System Reliability and Mathematical Interdisciplinary, Capital Normal University, CN; 4The Chinese University of Hong Kong, HK Abstract The frequent Read-Modify-Write operations (RMWs) in Shingled Magnetic Recording (SMR) disks severely degrade the random write performance of the system. Although the adoption of persistent cache (PC) and built-in NAND flash cache alleviates some of the RMWs, when the cache is full, the triggered write-back operations still prolong I/O response time, and the erasure of NAND flash also sacrifices its lifetime. In this paper, we propose a Region-based Co-optimized strategy named Multi-Regional Collaborative Management (MCM) to optimize the average response time by separately managing sequential/random and hot/cold data, and to extend the NAND flash lifetime by a region-aware wear leveling strategy. The experimental results show that our MCM reduces the average response time by 71% and RMWs by 96% on average compared with Skylight (the baseline). Compared with the state-of-the-art flash-based cache (FC) approach, we can still reduce the average response time and flash erase operations by 17.2% and 33.32%, respectively. |
14:15 CET | SE1.6 | GEM-RL: GENERALIZED ENERGY MANAGEMENT OF WEARABLE DEVICES USING REINFORCEMENT LEARNING Speaker: Toygun Basaklar, University of Wisconsin - Madison, US Authors: Toygun Basaklar1, Yigit Tuncel1, Suat Gumussoy2 and Umit Ogras1 1University of Wisconsin - Madison, US; 2Siemens Corporate Technology, US Abstract Energy harvesting (EH) and management (EM) have emerged as enablers of self-sustained wearable devices. Since EH alone is not sufficient for self-sustainability due to uncertainties of ambient sources and user activities, there is a critical need for a user-independent EM approach that does not rely on expected EH predictions. We present a generalized energy management framework (GEM-RL) using multi-objective reinforcement learning. GEM-RL learns the trade-off between utilization and the battery energy level of the target device under dynamic EH patterns and battery conditions. It also uses a lightweight approximate dynamic programming (ADP) technique that utilizes the trained MORL agent to optimize the utilization of the device over a longer period. Thorough experiments show that, on average, GEM-RL achieves Pareto front solutions within 5.4% of the offline Oracle for a given day. For a 7-day horizon, it achieves utility within 4% of the offline Oracle and up to 50% higher utility compared to baseline EM approaches. The hardware implementation of GEM-RL on a wearable device shows negligible execution time (1.98 ms) and energy consumption (23.17 µJ) overhead. |
14:18 CET | SE1.7 | VIX: ANALYSIS-DRIVEN COMPILER FOR EFFICIENT LOW-PRECISION DIFFERENTIABLE INFERENCE Speaker: Ashitabh Misra, University of Illinois at Urbana Champaign, US Authors: Ashitabh Misra, Jacob Laurel and Sasa Misailovic, University of Illinois at Urbana-Champaign, US Abstract As large quantities of stochastic data are processed onboard tiny edge devices, these systems must constantly make decisions under uncertainty. This challenge necessitates principled embedded compiler support for time- and energy-efficient probabilistic inference. However, compiling probabilistic inference to run on the edge is significantly understudied, and the existing research is limited to computationally expensive MCMC algorithms. Hence, these works cannot leverage faster variational inference algorithms which can better scale to larger data sizes that are representative of realistic workloads in the edge setting. However, naively writing code for differentiable inference on resource-constrained edge devices is challenging due to the need for expensive floating point computations. Even when using reduced precision, a developer still faces the challenge of choosing the right quantization scheme, as gradients can be notoriously unstable in the face of low precision. To address these challenges, we propose ViX, which is the first compiler for low-precision probabilistic programming with variational inference. ViX generates optimized variational inference code in reduced precision by automatically exploiting Bayesian domain knowledge and analytical mathematical properties to ensure that low-precision gradients can still be effectively used. ViX can scale inference to much larger data sets than previous compilers for resource-constrained probabilistic programming while attaining both high accuracy and significant speedup. Our evaluation of ViX across 7 benchmarks shows that ViX-generated code is up to 8.15× faster than performing the same variational inference in 32-bit floating point and also up to 22.67× faster than performing the variational inference in 64-bit double precision, all with minimal accuracy loss. Further, on a subset of our benchmarks, ViX can scale inference to data sizes between 16× and 80× larger than the existing state-of-the-art tool Statheros. |
14:21 CET | SE1.8 | CHAMELEON: DUAL MEMORY REPLAY FOR ONLINE CONTINUAL LEARNING ON EDGE DEVICES Speaker: Shivam Aggarwal, National University of Singapore, SG Authors: Shivam Aggarwal, Kuluhan Binici and Tulika Mitra, National University of Singapore, SG Abstract Once deployed on edge devices, a deep neural network model should dynamically adapt to newly discovered environments and personalize its utility for each user. The system must be capable of continual learning, i.e., learning new information from a temporal stream of data in situ without forgetting previously acquired knowledge. However, the prohibitive intricacies of such a personalized continual learning framework stand at odds with limited compute and storage on edge devices. Existing continual learning methods rely on massive memory storage to preserve the past data while learning from the incoming data stream. We propose Chameleon, a hardware-friendly continual learning framework for user-centric training with dual replay buffers. The proposed strategy leverages the hierarchical memory structure available on most edge devices, introducing a short-term replay store in the on-chip memory and a long-term replay store in the off-chip memory to acquire new information while retaining past knowledge. Extensive experiments on two large-scale continual learning benchmarks demonstrate the efficacy of our proposed method, achieving better or comparable accuracy than existing state-of-the-art techniques while reducing the memory footprint by roughly 16x. Our method achieves up to 7x speedup and energy efficiency on edge devices such as ZCU102 FPGA, NVIDIA Jetson Nano and Google's EdgeTPU. Our code is available at https://github.com/ecolab-nus/Chameleon. |
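For readers unfamiliar with modulo scheduling as used in SE1.2, the sketch below shows only the iterative structure described in the abstract: start from a lower-bound iteration interval (II) and retry at II+1 until a mapping exists. The ResMII-style lower bound and the stubbed feasibility check are assumptions standing in for the kernel mobility schedule and the SAT solver, which are not reproduced here.

```python
import math

def res_mii(num_ops, num_pes):
    # Classic resource-constrained lower bound on the II.
    return math.ceil(num_ops / num_pes)

def try_map_at_ii(dfg_ops, cgra_pes, ii):
    """Stub standing in for building the kernel mobility schedule and calling
    a SAT solver on the mapping constraints for this II (assumption)."""
    return ii >= math.ceil(len(dfg_ops) / cgra_pes) + 1   # pretend one extra cycle is needed

def map_loop(dfg_ops, cgra_pes, max_ii=64):
    ii = res_mii(len(dfg_ops), cgra_pes)
    while ii <= max_ii:
        if try_map_at_ii(dfg_ops, cgra_pes, ii):
            return ii
        ii += 1                     # no valid mapping: relax the II and retry
    return None

print(map_loop(dfg_ops=list(range(17)), cgra_pes=4))   # -> 6 with this stub
```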
Extended Abstracts
Time | Label | Presentation Title Authors |
---|---|---|
14:24 CET | SE1.9 | FAGC: FREE SPACE FRAGMENTATION AWARE GC SCHEME BASED ON OBSERVATIONS OF ENERGY CONSUMPTION Speaker: Ying Yuan, Huazhong University of Science & Technology, CN Authors: Lihua Yang1, Zhipeng Tan2, Fang Wang2, Yang Xiao2, Wei Zhang1 and Biao He3 1National University of Defense Technology, CN; 2Huazhong University of Science & Technology, CN; 3Huawei Technologies Co., LTD, CN Abstract Smartphones are everyday necessities with a limited power supply. Charging a smartphone twice a day or more affects the user experience. Flash-Friendly File System (F2FS) is a widely used log-structured file system for smartphones. Free space fragmentation in F2FS mainly consists of invalid blocks, which cause performance degradation. F2FS reclaims invalid blocks by garbage collection (GC). We explore the energy consumption of GC and the effect of GC on reducing free space fragments. We observe that the energy consumption of one background GC is large but its effect on reducing free space fragments is limited. These observations motivate us to improve the energy efficiency of GC. Based on data analysis, we reassess how much free space constitutes a free space fragment and use a free space fragmentation factor to quickly measure the degree of free space fragmentation. We propose the free space fragmentation aware GC scheme (FAGC), which optimizes the selection of victim segments and the migration of valid blocks. Experiments on a real platform show that FAGC reduces the GC count by 82.68% and 74.51% compared with traditional F2FS and the latest GC optimization of F2FS, ATGC, respectively. FAGC reduces energy consumption by 164.37 J and 100.64 J compared to traditional F2FS and ATGC, respectively, for a synthetic benchmark. |
14:24 CET | SE1.10 | TRANSLIB: A LIBRARY TO EXPLORE TRANSPRECISION FLOATING-POINT ARITHMETIC ON MULTI-CORE IOT END-NODES Speaker: Seyed Ahmad Mirsalari, Università di Bologna, IT Authors: Seyed Ahmad Mirsalari1, Giuseppe Tagliavini1, Davide Rossi1 and Luca Benini2 1Università di Bologna, IT; 2ETH Zurich, CH | Università di Bologna, IT Abstract Reduced-precision floating-point (FP) arithmetic is being widely adopted to reduce memory footprint and execution time on battery-powered Internet of Things (IoT) end-nodes. However, reduced-precision computations must meet end-to-end precision constraints to be acceptable at the application level. This work introduces TransLib, an open-source kernel library based on transprecision computing principles, which provides knobs to exploit different FP data types (i.e., float, float16, and bfloat16), also considering the trade-off between homogeneous and mixed-precision solutions. We demonstrate the capabilities of the proposed library on PULP, a 32-bit microcontroller (MCU) coupled with a parallel, programmable accelerator. On average, TransLib kernels achieve an IPC of 0.94 and a speed-up of 1.64× using 16-bit vectorization. The parallel variants achieve a speed-up of 1.97×, 3.91×, and 7.59× on 2, 4, and 8 cores, respectively. The memory footprint reduction is between 25% and 50%. Finally, we show that mixed-precision variants increase the accuracy by 30× at the cost of 2.09× execution time and 1.35× memory footprint compared to the vectorized float16 variant. |
14:24 CET | SE1.11 | CFU PLAYGROUND: WANT A FASTER ML PROCESSOR? DO IT YOURSELF! Speaker: Shvetank Prakash, Harvard, US Authors: Shvetank Prakash1, Timothy Callahan2, Joseph Bushagour3, Colby Banbury4, Alan Green2, Pete Warden5, Tim Ansell2 and Vijay Janapa Reddi1 1Harvard University, US; 2Google, US; 3Purdue University, US; 4Harvard, US; 5Stanford University, US Abstract The rise of machine learning (ML) has necessitated the development of innovative processing engines. However, development of specialized hardware accelerators can incur enormous one-time engineering expenses that should be avoided in low-cost embedded ML systems. In addition, embedded systems have tight resource constraints that prevent them from affording the "full-blown" machine learning (ML) accelerators seen in many cloud environments. In embedded situations, a custom function unit (CFU) that is more lightweight is preferable. We offer CFU Playground, an open-source toolchain for accelerating embedded machine learning (ML) on FPGAs through the use of CFUs. |
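The accuracy side of the transprecision trade-off explored by TransLib (SE1.10) can be reproduced in a few lines of numpy. The sketch below compares a dot product in float32, float16 and an emulated bfloat16; the mantissa-truncation emulation and the toy vectors are assumptions, and TransLib itself targets the PULP platform rather than numpy.

```python
import numpy as np

def to_bfloat16(x):
    """Emulate bfloat16 by zeroing the low 16 bits of the float32 encoding (assumption)."""
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    return (bits & np.uint32(0xFFFF0000)).view(np.float32)

rng = np.random.default_rng(0)
a = rng.standard_normal(4096).astype(np.float32)
b = rng.standard_normal(4096).astype(np.float32)

ref = np.dot(a.astype(np.float64), b.astype(np.float64))    # high-precision reference
for name, xa, xb in [("float32", a, b),
                     ("float16", a.astype(np.float16), b.astype(np.float16)),
                     ("bfloat16", to_bfloat16(a), to_bfloat16(b))]:
    err = abs(float(np.dot(xa, xb)) - ref) / abs(ref)
    print(f"{name:9s} relative error: {err:.2e}")
```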
SS2 Physical attacks and countermeasures
Add this session to my calendar
Date: Wednesday, 19 April 2023
Time: 14:00 CET - 15:30 CET
Location / Room: Gorilla Room 1.5.1
Session chair:
Arthur Beckers, NXP, BE
14:00 CET until 14:27 CET: Pitches of regular papers
14:27 CET until 15:30 CET: Interactive technical presentations by the authors of regular papers and extended abstracts
Regular Papers
Time | Label | Presentation Title Authors |
---|---|---|
14:00 CET | SS2.1 | TABLE RE-COMPUTATION BASED LOW ENTROPY INNER PRODUCT MASKING SCHEME Speaker: Wei Cheng, Télécom Paris & Secure-IC S.A.S, FR Authors: Jingdian Ming1, Yongbin Zhou2, Wei Cheng3 and Huizhong Li1 1Chinese Academy of Sciences, CN; 2Nanjing University of Science and Technology, CN; 3LTCI, Telecom Paris, Institut Polytechnique de Paris, 91120, Palaiseau, FR Abstract Masking is a popular countermeasure due to its provable security. Table re-computation based Boolean masking (BM) is efficient for a small number of masking shares, and addition chain based inner product masking (IPM) provides a higher security order than BM. As a result, the natural question is: can we design a masking scheme that costs close to that of re-computation based BM while providing security comparable to that of addition chain based IPM? In this paper, we propose a table re-computation based IPM scheme that provides 3rd-order security while being slightly more expensive than table re-computation based BM. Furthermore, we improve the side-channel security of IPM by randomly selecting the parameter L from an elaborated low entropy set, which we call low entropy inner product masking (LE-IPM). On an Intel Core i7-4790 CPU and an ARM Cortex-M4 based MCU, we implemented four masking schemes for AES, namely the addition chain based IPM and table re-computation based BM, IPM, and LE-IPM. Our proposals perform slightly slower (by about 0.8 times) than table re-computation based BM but significantly faster (at least 30 times) than addition chain based IPM. Furthermore, we assess the security of our proposals using a standard method, the test vector leakage assessment (TVLA) methodology. Our proposals provide the expected security against side-channel attacks according to the evaluation. |
14:03 CET | SS2.2 | SCFI: STATE MACHINE CONTROL-FLOW HARDENING AGAINST FAULT ATTACKS Speaker: Pascal Nasahl, TU Graz, AT Authors: Pascal Nasahl, Martin Unterguggenberger, Rishub Nagpal, Robert Schilling, David Schrammel and Stefan Mangard, TU Graz, AT Abstract Fault injection (FI) is a powerful attack methodology allowing an adversary to entirely break the security of a target device. As finite-state machines (FSMs) are fundamental hardware building blocks responsible for controlling systems, inducing faults into these controllers enables an adversary to hijack the execution of the integrated circuit. A common defense strategy mitigating these attacks is to manually instantiate FSMs multiple times and detect faults using a majority voting logic. However, as each additional FSM instance only provides security against one additional induced fault, this approach scales poorly in a multi-fault attack scenario. In this paper, we present SCFI: a strong, probabilistic FSM protection mechanism ensuring that control-flow deviations from the intended control-flow are detected even in the presence of multiple faults. At its core, SCFI consists of a hardened next-state function absorbing the execution history as well as the FSM's control signals to derive the next state. When either the absorbed inputs, the state registers, or the function itself are affected by faults, SCFI triggers an error with no detection latency. We integrate SCFI into a synthesis tool capable of automatically hardening arbitrary unprotected FSMs without user interaction and open-source the tool. Our evaluation shows that SCFI provides strong protection guarantees with a better area-time product than FSMs protected using classical redundancy-based approaches. Finally, we formally verify the resilience of the protected state machines using a pre-silicon fault analysis tool. |
14:06 CET | SS2.3 | EASIMASK - TOWARDS EFFICIENT, AUTOMATED, AND SECURE IMPLEMENTATION OF MASKING IN HARDWARE Speaker: Fabian Buschkowski, Ruhr-University Bochum, DE Authors: Fabian Buschkowski1, Pascal Sasdrich2 and Tim Güneysu3 1Ruhr-University, DE; 2Ruhr-Universität Bochum, DE; 3Ruhr-Universität Bochum & DFKI, DE Abstract Side-Channel Analysis (SCA) is a major threat to implementations of mathematically secure cryptographic algorithms. Applying masking countermeasures to hardware-based implementations is both time-consuming and error-prone due to side-effects buried deeply in the hardware design process. As a consequence, we propose our novel framework EASIMASK in this work. Our semi-automated framework enables designers who have little experience with hardware implementation, physical security, or the application of countermeasures to create a securely masked hardware implementation from an abstract description of a cryptographic algorithm. Its design flow relieves the developer of many challenges in the masking process of hardware implementations, while the generated implementations match the efficiency of hand-optimized designs from experienced security engineers. The modular approach can be mapped to arbitrary instantiations using different languages and transformations. We have verified the functionality, security, and efficiency of generated designs for several state-of-the-art symmetric cryptographic algorithms, such as the Advanced Encryption Standard (AES), Keccak, and PRESENT. |
14:09 CET | SS2.4 | OBFUSLOCK: AN EFFICIENT OBFUSCATED LOCKING FRAMEWORK FOR CIRCUIT IP PROTECTION Speaker: Hai Zhou, Northwestern University, US Authors: You Li, Guannan Zhao, Yunqi He and Hai Zhou, Northwestern University, US Abstract With the rapid evolution of the IC supply chain, circuit IP protection has become a critical realistic issue for the semiconductor industry. One promising technique to resolve the issue is logic locking. It adds key inputs to the original circuit such that only authorized users can get the correct function, and it modifies the circuit to obfuscate it against structural analysis. However, there is a trilemma among locking, obfuscation, and efficiency in all existing logic locking methods: at most two of the three objectives can be achieved. In this work, we propose ObfusLock, the first logic locking method that simultaneously achieves all three objectives: locking security, obfuscation safety, and locking efficiency. ObfusLock is based on solid mathematical proofs, incurs small overheads (<5% on average), and has passed experimental tests of various existing attacks. |
14:12 CET | SS2.5 | TEMPERATURE IMPACT ON REMOTE POWER SIDE-CHANNEL ATTACKS ON SHARED FPGAS Speaker: Ognjen Glamocanin, EPFL, CH Authors: Ognjen Glamocanin, Hajira Bazaz, Mathias Payer and Mirjana Stojilovic, EPFL, CH Abstract To answer the growing demand for hardware acceleration, Amazon, Microsoft, and many other major cloud service providers have included field-programmable gate arrays (FPGAs) in their datacenters. However, researchers have shown that cloud FPGAs, when shared between multiple tenants, face the threat of remote power side-channel analysis (SCA) attacks. FPGA time-to-digital converter (TDC) sensors enable adversaries to sense voltage fluctuations and, in turn, break cryptographic implementations or extract confidential information with the help of machine learning (ML). The operating temperature of the TDC sensor affects the traces it acquires, but its impact on the success of remote power SCA attacks has largely been ignored in literature. This paper attempts to fill in this gap. We focus on two attack scenarios: correlation power analysis (CPA) and ML-based profiling attacks. We show that the temperature impacts the success of the remote power SCA attacks: with the ambient temperature increasing, the success rate of the CPA attack decreases. In-depth analysis reveals that TDC sensor measurements suffer from temperature-dependent effects, which, if ignored, can lead to misleading and overly optimistic results of ML-based profiling attacks. We evaluate and stress the importance of following power side-channel trace acquisition guidelines for minimizing the temperature effects and, consequently, obtaining a more realistic measure of success for remote ML-based profiling attacks. |
14:15 CET | SS2.6 | APUF PRODUCTION LINE FAULTS: UNIQUENESS AND TESTING Speaker: Yeqi Wei, University of Illinois Chicago, US Authors: Yeqi Wei1, Wenjing Rao2 and Natasha Devroye2 1University of Illinois Chicago, US; 2University of Illinois at Chicago, US Abstract Arbiter Physically Unclonable Functions (APUFs) are low-cost hardware security primitives that may serve as unique digital fingerprints for ICs. To fulfill this role, it is critical for manufacturers to ensure that a batch of PUFs coming off the same design and production line have different truth tables, and uniqueness / inter-PUF-distance metrics have been defined to measure this. This paper points out that a widely-used uniqueness metric fails to capture some special cases, which we remedy by proposing a modified uniqueness metric. We then look at two fundamental APUF-native production line fault models that severely affect uniqueness: the mu (abnormal mean of a delay difference element) and sigma (abnormal variance of a delay difference element) faults. We propose test and diagnosis methods aimed at these two APUF production line faults, and show that these low-cost techniques can efficiently and effectively detect such faults, and pinpoint the element of abnormality, without the (costly) need to directly measure the uniqueness metric of a PUF batch. |
14:18 CET | SS2.7 | FAULT MODEL ANALYSIS OF DRAM UNDER ELECTROMAGNETIC FAULT INJECTION ATTACK Speaker: Longtao Guo, Tianjin University, CN Authors: Qiang Liu, Longtao Guo and Honghui Tang, Tianjin University, CN Abstract Electromagnetic fault injection (EMFI) attacks pose serious threats to the security of integrated circuits. Memory storing sensitive code and data has become the first choice of attack target. This work performs a thorough characterization of the induced faults and the associated fault model of EMFI attacks on DRAM. Specifically, we first carry out a set of experiments to analyse the sensitivity of various types of memory to EMFI. The analysis shows that DRAM is more sensitive to EMFI than EEPROM, Flash, and SRAM in this experiment. Then, we classify the induced faults in DRAM and formulate the fault models. Finally, we find the underlying reasons that explain the observed fault models by circuit-level simulation of DRAM under EMFI. The in-depth understanding of the fault models will guide the design of DRAM against EMFI attacks. |
14:21 CET | SS2.8 | EXPANDING IN-CONE OBFUSCATED TREE FOR ANTI SAT ATTACK Speaker: RuiJie Wang, National TsingHua University, CN Authors: RuiJie Wang1, Li-Nung Hsu1, Yung-Chih Chen2 and TingTing Hwang1 1National Tsing Hua University, TW; 2National Taiwan University of Science and Technology, TW Abstract Logic locking is a hardware security technology to protect circuit designs from overuse, piracy, and reverse engineering. It protects a circuit by inserting key gates to hide the circuit functionality, so that the circuit is functional only when a correct key is applied. In recent years, encrypting the point function, e.g., AND-tree, in a circuit has been shown to be promising to resist SAT attack. However, the encryption technique may suffer from two problems: First, the tree size may not be large enough to achieve desired security. Second, SAT attack could break the encryption in one iteration when it finds a specific input pattern, called remove-all DIP. Thus, in this paper, we present a new method for constructing the obfuscated tree. We first apply the sum-of-product transformation to find the largest AND-tree in a circuit, and then insert extra variables with the proposed split-compensate operation to further enlarge the AND-tree and mitigate the remove-all DIP issue. The experimental results show that the proposed obfuscated tree can effectively resist SAT attack. |
14:24 CET | SS2.9 | SHELL: SHRINKING EFPGA FABRICS FOR LOGIC LOCKING Speaker: Mark Tehranipoor, University of Florida, US Authors: Hadi Mardani Kamali1, Kimia Zamiri Azar1, Farimah Farahmandi1 and Mark Tehranipoor2 1University of Florida, US; 2Intel Charles E. Young Preeminence Endowed Chair Professor in Cybersecurity, Associate Chair for Research and Strategic Initiatives, ECE Department, University of Florida, US Abstract The utilization of fully reconfigurable logic and routing modules may be considered as one potential and even provably resilient technique against intellectual property (IP) piracy and integrated circuit (IC) overproduction. The embedded FPGA (eFPGA) is one instance that could be used for IP redaction, hiding the functionality through the untrusted stages of the IC supply chain. The eFPGA architecture, albeit reliable, unnecessarily inflates the die size even though it is supposed to be applied at fine granularity to small modules/IPs. In this paper, we propose SheLL, which primarily embeds the interconnects (routing channels) of the design and secondarily twists the minimal logic parts of the design into the eFPGA architecture. In SheLL, the eFPGA architecture is customized for this specific logic locking methodology, allowing us to minimize the overhead of the eFPGA fabric as much as possible. Our experimental results demonstrate that SheLL guarantees robustness against notable attacks while its overhead is significantly lower compared to existing eFPGA-based competitors. |
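SS2.1 builds on the classical table re-computation countermeasure. As background, the sketch below shows the plain first-order Boolean-masked variant of that idea on a 4-bit S-box (the PRESENT S-box is used only as an example); the paper's inner-product extension and low-entropy parameter selection are not reproduced here.

```python
import secrets

SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]   # PRESENT S-box, 4-bit (example choice)

def recompute_table(sbox, m_in, m_out):
    """Return T' with T'[x ^ m_in] = sbox[x] ^ m_out for all x."""
    masked = [0] * len(sbox)
    for x in range(len(sbox)):
        masked[x ^ m_in] = sbox[x] ^ m_out
    return masked

x = 0x7                                    # sensitive value (never handled in the clear below)
m_in, m_out = secrets.randbelow(16), secrets.randbelow(16)
masked_x = x ^ m_in                        # only the masked value is looked up
table = recompute_table(SBOX, m_in, m_out)
masked_y = table[masked_x]
assert masked_y ^ m_out == SBOX[x]         # unmasking recovers S(x)
```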
Extended Abstracts
Time | Label | Presentation Title Authors |
---|---|---|
14:27 CET | SS2.10 | HIGHLIGHTING TWO EM FAULT MODELS WHILE ANALYZING A DIGITAL SENSOR LIMITATIONS Speaker: Roukoz Nabhan, Mines Saint-Etienne, FR Authors: Roukoz Nabhan1, Jean-Max Dutertre1, Jean-Baptiste Rigaud1, Jean-Luc Danger2 and Laurent Sauvage2 1Mines Saint-Etienne, FR; 2Télécom ParisTech, FR Abstract Fault injection attacks can be carried out against an operating circuit by exposing it to EM perturbations. These attacks can be detected using embedded digital sensors based on the EM fault injection mechanism, such as the one introduced by El-Baze et al., which uses the sampling fault model. We experimentally tested the efficiency of this sensor embedded in the AES accelerator of an FPGA. It proved effective when the target was clocked at moderate frequency (the injected faults were consistent with the sampling fault model). As the clock frequency was progressively increased, faults started to escape detection, which raises warnings about possible limitations of the sampling model. Further tests at frequencies close to the target's maximum frequency revealed faults injected according to a timing fault model. Both series of experimental results ascertain that EM injection can follow at least two different fault models. Undetected faults and the existence of different fault injection mechanisms cast doubt upon the use of sensors based on a single model. |
14:27 CET | SS2.11 | SECURING HETEROGENEOUS 2.5D ICS AGAINST IP THEFT THROUGH DYNAMIC INTERPOSER OBFUSCATION Speaker: Jonti Talukdar, Duke University, US Authors: Jonti Talukdar1, Arjun Chaudhuri1, Jinwoo Kim2, Sung-Kyu Lim2 and Krishnendu Chakrabarty1 1Duke University, US; 2Georgia Tech, US Abstract Recent breakthroughs in heterogeneous integration (HI) technologies using 2.5D and 3D ICs have been key to advances in the semiconductor industry. However, heterogeneous integration has also led to several sources of distrust due to the use of third-party IP, testing, and fabrication facilities in the design and manufacturing process. Recent work on 2.5D IC security has only focused on attacks that can be mounted through rogue chiplets integrated in the design. Thus, existing solutions implement inter-chiplet communication protocols that prevent unauthorized data modification and interruption in a 2.5D system. However, none of the existing solutions offer inherent security against IP theft. We develop a comprehensive threat model for 2.5D systems indicating that such systems remain vulnerable to IP theft. We present a method that prevents IP theft by obfuscating the connectivity of chiplets on the interposer using reconfigurable interconnection networks. We also evaluate the PPA impact and security offered by our proposed scheme. |
14:27 CET | SS2.12 | WARM-BOOT ATTACK ON MODERN DRAMS Speaker: SHUO WANG, University of Florida, CN Authors: Yichen Jiang, Shuo Wang, Renato Jansen Figueiredo and Yier Jin, University of Florida, US Abstract Memory plays a critical role in storing almost all computation data for various applications, including those with sensitive data such as bank transactions and critical business management. As a result, protecting memory security from attackers with physical access is ultimately important. Various memory attacks have been proposed, among which "cold boot" and RowHammer are two leading examples. DRAM manufacturers have deployed a series of protection mechanisms to counter these attacks. Even with the latest protection techniques, DRAM may still be vulnerable to attackers with physical access. In this paper, we proposed a novel "warm boot" attack which utilizes external power supplies to bypass the existing protection mechanisms and steal the data from the modern SODIMM DDR4 memory. The proposed "warm boot" attack is applied to various DRAM chips from different brands. Based on our experiments, the "warm boot" attack can achieve as high as 94% data recovery rate from SODIMM DDR4 memory. |
14:27 CET | SS2.13 | LOW-COST FIRST-ORDER SECURE BOOLEAN MASKING IN GLITCHY HARDWARE Speaker: Dilip Kumar S V, COSIC, KU Leuven, BE Authors: Dilip Kumar S V, Josep Balasch, Benedikt Gierlichs and Ingrid Verbauwhede, KU Leuven, BE Abstract We describe how to securely implement the logical AND of two bits in hardware in the presence of glitches without the need for fresh randomness, and we provide guidelines for the composition of circuits. As a case study, we design, implement and evaluate a DES core. Our goal is an overall practically relevant tradeoff between area, latency, randomness cost, and security. We focus on first-order secure Boolean masking and we do not aim for provable security. The resulting DES engine shows no evidence of first-order leakage in a non-specific leakage assessment with 50M traces. |
14:27 CET | SS2.14 | TIPLOCK: KEY-COMPRESSED LOGIC LOCKING USING THROUGH-INPUT-PROGRAMMABLE LOOKUP-TABLES Speaker: Kaveh Shamsi, University of Texas at Dallas, US Authors: Kaveh Shamsi and Rajesh Datta, University of Texas at Dallas, US Abstract Herein we explore using logic elements that can be programmed through their inputs for logic locking. For this purpose, we design a novel through-input-programmable (TIP) lookup-table (LUT) element and develop algorithms to find cuts in the circuit that can be mapped to such elements while maintaining programmability. Our proposed TIPLock flow achieves area savings of 50-70% compared to the traditional approach of using a key-vector-long scan-chain. |
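As background to SS2.13, the sketch below shows a textbook first-order masked AND gadget (Trichina-style) over two Boolean shares that consumes one fresh random bit per gate. The contribution described in that abstract is precisely to avoid such per-gate fresh randomness in glitchy hardware, so this is only the conventional baseline, not the authors' construction.

```python
import secrets

def share(bit):
    """Split a bit into two Boolean shares whose XOR equals the bit."""
    r = secrets.randbelow(2)
    return r, bit ^ r

def masked_and(a0, a1, b0, b1):
    r = secrets.randbelow(2)               # fresh randomness for this gate (the cost SS2.13 removes)
    c0 = r
    c1 = (((a0 & b0) ^ r) ^ (a0 & b1)) ^ (a1 & b0) ^ (a1 & b1)
    return c0, c1

for a in (0, 1):
    for b in (0, 1):
        a0, a1 = share(a)
        b0, b1 = share(b)
        c0, c1 = masked_and(a0, a1, b0, b1)
        assert c0 ^ c1 == (a & b)          # unmasked result equals the plain AND
```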
W02 3D Integration: Heterogeneous 3D Architectures and Sensors
Add this session to my calendar
Date: Wednesday, 19 April 2023
Time: 14:00 CET - 18:00 CET
Location / Room: Nightingale Room 2.6.1/2
Organisers:
Pascal VIVET, CEA List, FR
Peter Ramm, Fraunhofer EMFT, DE
Mustafa Badaroglu, QUALCOMM, US
Subhasish Mitra, Stanford University, US
Workshop Description
3D technologies are becoming more and more pervasive in digital architectures, as a strong enabler for heterogeneous integration. With current nanometric technologies approaching their scaling limits, 3D integration technology is paving the way to a wide architectural scope, with reduced cost, reduced form factor, and increased energy efficiency, allowing a wide variety of heterogeneous architectures. Due to the high amount of required data and associated memory capacity, ML and AI accelerators could benefit from 3D integration not only for HPC, but also for the edge and embedded HPC. 3D integration and its associated architectures are opening a wide spectrum of system solutions, from chiplet-based partitioning for High Performance Computing to various sensors such as fully integrated image sensors embedding AI features, but also for the next generation of computing architectures: AI accelerators, in-memory computing, quantum computing, etc.
The 3D Integration Workshop was held at the DATE conference from 2009 to 2015 and again in 2022. With the continued evolution of 3D technologies in terms of interconnect density and their evolving manufacturing ecosystem, there is a strong need to pursue research efforts on key aspects of architecture and design, exploiting the potential capabilities offered by 3D integration.
The goal of the 3D Integration Workshop is to bring together experts from both academia and industry, interested in this exciting and rapidly evolving field, in order to update each other on the latest state-of-the-art, exchange ideas, and discuss future challenges.
This half-day event consists of a plenary keynote, invited talks, and regular presentations.
Technical Program
Tentative schedule, under construction
Keynote
Session Chair : Peter Ramm, Fraunhofer, Germany
14:00 – 14:30 Chiplets for AI – AI for chiplets
Paul Franzon, North Carolina State University, USA
Session 1 : Chiplet based systems
14:30 – 14:45 Occamy - A 432-core RISC-V Based 2.5D Chiplet System for Ultra-Efficient (Mini) Floating-Point Computation
Gianna Paulin, ETH-Z, Switzerland.
14:45 – 15:00 Toward industrialization of 2.5D/3D heterogeneous solutions for ASICs
Fady Abouzeid, Philippe Roche, STMicroelectronics, France.
15:00 – 15:15 Energy-Efficient Communication in 2.5D Integrated Systems
Vasilis F. Pavlidis, Aristotle University of Thessaloniki, Greece.
15:15 – 15:30 Why Advanced Packaging & 3D Integration Does Matter to Everybody
Anna Fontanelli, Monozukuri SpA, Rome, Italy
15:30 – 16:15 Coffee Break
Session 2 : Advanced 3D architecture and design methodology
16:15 – 16:30 Temperature-Aware Design of 3D-Stacked Accelerators
Ayse K. Coskun, Boston University, USA.
16:30 – 16:45 Thermally aware 3D sign-off and design enablement of 3-dies stack
Mohamed Naeim and Dragomir Milojevic, IMEC, ULB, Belgium
16:45 – 17:00 3D Integration and Advanced Packaging for Modular Quantum Computer based on Diamond Spin Qubits
Ryoichi Ishihara, TU Delft, Netherlands
17:00 – 17:15 Efficient In Sensor processing based on advanced 3D technologies
Sébastien Thuriès, CEA List, France
17:15 – 17:30 Integrating Fault Tolerance for 2.5D/3D Chiplets Using the Advanced Interface Bus (AIB)
Antoine Rouget, STMicroelectronics / CEA, LIST, Grenoble, France
17:30 – 17:45 Efficient and Reliable Hardware Architectures based on Vertical Nanowire FETs
Bastien Deveautour, Institute of Nanotechnology (INL), France
Key Dates
Abstract Submission deadline | |
---|---|
Notification of Acceptance | |
Presentations and posters ready | 26 March 2023 |
Workshop | Wednesday, 19 April 2023 - 14:00 - 18:00 |
Workshop Committee
- General co-Chairs:
- P. Vivet – CEA-LIST, IRT Nanoelec (FR)
- M. Badaroglu, Qualcomm, (BE)
- Program Chair:
- P. Ramm, Fraunhofer EMFT (DE)
- Special Session Chair
- S. Mitra, Stanford University (USA)
- Industrial Liaison Chair
- Eric Ollier, CEA-Leti, IRT Nanoelec (FR)
Past editions
The 3D Integration workshop took place from 2009 to 2015 and was restarted in 2022.
- DATE 2009: https://past.date-conference.com/date09/conference/workshop-W5
- DATE 2010: https://past.date-conference.com/date10/conference/workshop-W5
- DATE 2011: https://past.date-conference.com/date11/conference/workshop-W5
- DATE 2012: https://past.date-conference.com/date12/conference/workshop-W5
- DATE 2013: https://past.date-conference.com/date13/conference/workshop-W5
- DATE 2014: https://past.date-conference.com/date14/conference/workshop-W5
- DATE 2015: https://past.date-conference.com/date15/conference/workshop-W05
- DATE 2022: https://date22.date-conference.com/workshop/w02
FS9 Focus session: Learning-Oriented Reliability Improvement of Computing Systems From Transistor to Application Level
Add this session to my calendar
Date: Wednesday, 19 April 2023
Time: 16:30 CET - 18:00 CET
Location / Room: Gorilla Room 1.5.3
Session chair:
Christian Pilato, Politecnico di Milano, IT
Session co-chair:
Behnaz Ranjbar, TU Dresden, DE
Time | Label | Presentation Title Authors |
---|---|---|
16:30 CET | FS9.1 | ESTIMATING DEVICE AND CIRCUIT RELIABILITY Speaker: Hussam Amrouch, University of Stuttgart, DE Authors: Florian Klemme1, Paul Genssler1 and Hussam Amrouch2 1University of Stuttgart, DE; 2TU Munich, DE Abstract The pivotal issue of reliability is of colossal concern to circuit designers. Transistor self-heating is an ever-increasing challenge because transistor scaling is reaching atomic levels at which quantum confinement becomes substantially prominent. With more confined 3D structures (e.g., TSMC Nanosheet FETs and Intel Ribbon FETs), heat arising in the transistor's channel cannot be easily dissipated and is hence "trapped" there. This, in turn, largely accelerates the underlying aging mechanisms in transistors. At design time, it is profoundly challenging to estimate close-to-the-edge safety margins that keep aging and self-heating effects during the entire projected lifetime at bay. This is because foundries do not share their calibrated physics-based models, which comprise highly confidential technology and material parameters. In this talk, we will demonstrate how machine learning techniques (both classical and brain-inspired methods) open new doors for foundries to train accurate models that empower circuit designers to estimate the actual impact of aging from the material and transistor level all the way up to the circuit and processor level without sharing any confidential physics-based models. Further, we will demonstrate how well-established EDA tools can be employed to propagate self-heating effects from individual devices at the transistor level all the way up to complete large processors at the final layout level. |
16:53 CET | FS9.2 | IMPROVING ARCHITECTURAL RELIABILITY Speaker: Aviral Shrivastava, Arizona State University, US Authors: Jinhyo Jung1, HwiSoo So1, Kyoungwoo Lee1, Shail Dave2 and Aviral Shrivastava3 1Yonsei University, KR; 2Arizona State University, US; 3School of Computing and Augmented Intelligence, Arizona State University, US Abstract As device scaling continues and fault rates increase, assessing the reliability of safety-critical systems is becoming increasingly important. Exploring the impacts of hardware faults at the architecture level is desirable since it can provide valuable insights into the reliability of the system. However, it is difficult to control the timing and location of the fault in actual hardware. With the help of hardware simulators, it becomes much easier to inject hardware faults, but each experimental trial becomes much slower. Recent works have proposed to incorporate the idea of machine learning in tackling this problem. In this session, we briefly discuss the difficulties in modeling or improving reliability at the architecture level. Then, we present learning-oriented methodologies that can be applied to alleviate those challenges. Specifically, we describe how we can develop a design methodology that makes reliability a first-class metric in design explorations of efficient embedded systems, while integrating application- and circuit-level estimations. |
17:15 CET | FS9.3 | IMPROVING APPLICATION RELIABILITY THROUGH OS Speaker: Akash Kumar, TU Dresden, DE Authors: Behnaz Ranjbar and Akash Kumar, TU Dresden, DE Abstract Due to technology scaling in modern embedded platforms, the safety and reliability issues have increased tremendously, which often accelerate aging, lead to permanent faults, and cause unreliable execution of applications. Failure during an application execution in some embedded systems like avionics may cause catastrophic consequences. Therefore, managing reliability under all circumstances of stress and environmental changes is crucial during run-time. Machine-Learning techniques are recently being employed for dynamic reliability optimization, which adapts to varying workloads and system conditions. These techniques can learn from past events and make better decisions to improve the system's performance. In this talk, we provide a survey of approaches that aim to improve reliability through learning for embedded platforms. Then, we discuss the open challenges and limitations within this domain for future academic and industrial works. |
17:38 CET | FS9.4 | RELIABILITY ANALYSIS ON A FAULT-TOLERANT TIMING-GUARANTEED SYSTEM Speaker: Ji-Yung Lin, KU Leuven, BE Authors: Ji-Yung Lin1, Pieter Weckx2, Subrat Mishra2, Francky Catthoor2 and Dwaipayan Biswas2 1KU Leuven, BE; 2IMEC, BE Abstract Fault-tolerant mechanisms like check-pointing and rollback-recovery play an essential role in ensuring functional correctness in the presence of register-level errors. However, these mechanisms induce execution time overhead. To ensure timing guarantees, the time overhead needs to be mitigated by the real-time scheduling mechanism, which can switch to a higher processor speed to compensate for the overhead. We analyzed the interplay of both mechanisms, which are required to simultaneously reach the guaranteed performance and reliability. We developed a system model which integrates two sub-systems: a check-pointing and rollback-recovery system tackling register-level error occurrences for functional correctness, and on top of it, a real-time scheduling system ensuring the timing guarantees. Analysis by a cycle-accurate simulation flow shows that both reliability and time overhead are highly sensitive to the error probability of registers. Moreover, there is an error rate wall (at around 10^{-6} to 10^{-5} errors per cycle in our demonstrated system) beyond which the time overhead becomes too high for a feasible system to ensure reliability. |
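A rough feel for the checkpointing trade-off discussed in FS9.4 can be obtained from a first-order analytical model. The sketch below is an illustrative back-of-the-envelope calculation with assumed parameters (checkpoint interval, checkpoint and rollback costs, per-cycle error probability); it is not the session's cycle-accurate simulation flow.

```python
def expected_cycles(work, tau, C, R, p):
    """First-order model: a tau-cycle segment plus checkpoint of cost C is retried,
    paying rollback cost R per failure, until it completes error-free."""
    q = (1.0 - p) ** tau                   # probability a segment survives without error
    per_segment = (tau + C) / q + R * (1.0 / q - 1.0)
    return (work / tau) * per_segment

work = 10_000_000                          # cycles of useful work (assumption)
for p in (1e-8, 1e-6, 1e-5, 1e-4):
    t = expected_cycles(work, tau=10_000, C=500, R=2_000, p=p)
    print(f"p={p:.0e}  time overhead = {(t / work - 1.0) * 100:6.1f}%")
```

Even this crude model shows the overhead blowing up sharply as the per-cycle error probability grows, which is qualitatively the "error rate wall" behaviour described in the abstract.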
LBR2 Late Breaking Results: new ideas for low power and reliable computing
Add this session to my calendar
Date: Wednesday, 19 April 2023
Time: 16:30 CET - 18:00 CET
Location / Room: Okapi Room 0.8.3
Session chair:
Jie Han, University of Alberta, CA
Time | Label | Presentation Title Authors |
---|---|---|
16:30 CET | LBR2.1 | AN ULTRA-LOW-POWER SERIAL IMPLEMENTATION FOR SIGMOID AND TANH USING CORDIC ALGORITHM Speaker: Yaoxing CHANG, CSEM & ETH Zurich, CH Authors: Yaoxing Chang1, Petar Jokic2, Stephane Emery2 and Luca Benini3 1Swiss Center for Electronics and Microtechnology (CSEM), Swiss Federal Institute of Technology (ETH Zurich), CH; 2Swiss Center for Electronics and Microtechnology (CSEM), CH; 3Swiss Federal Institute of Technology (ETH Zurich), University of Bologna, CH Abstract Activation functions (AFs) such as sigmoid and tanh play an important role in neural networks (NNs). Their efficient implementation is critical for always-on edge devices. In this work, we propose a serial-arithmetic architecture for AFs in edge audio applications using the CORDIC algorithm. The design enables a dynamic trade-off between throughput/latency and accuracy, and achieves higher area and power efficiency than conventional methods such as look-up table (LUT)- and piece-wise linear (PWL)-based implementations. Considering the throughput difference among the designs, we evaluate average power consumption taking into account active and idle working cycles for the same applications. Synthesis results in a 22 nm process show that our CORDIC-based design has an area of 545.77 μm² and an average power of 0.69 μW for a keyword spotting task, achieving a reduction of 36.92% and 71.72% in average power consumption compared to LUT- and PWL-based implementations, respectively. |
16:33 CET | LBR2.2 | PROCESS VARIATION RESILIENT CURRENT-DOMAIN ANALOG IN MEMORY COMPUTING Speaker: Kailash Prasad, IIT Gandhinagar, IN Authors: Kailash Prasad, Sai Shubham, Aditya Biswas and Joycee Mekie, IIT Gandhinagar, IN Abstract In-Memory Computing (IMC) has emerged as one of the energy-efficient solutions for data- and compute-intensive machine learning applications. Analog IMC architectures have high throughput, but limited bit precision. Process variation further degrades the bit precision. This work proposes an efficient way to track process variation and compensate for it to achieve high bit resolution, which, to the best of our knowledge, is the first such proposal. PV tracking is achieved by using an additional SRAM column, and compensation by a non-conventional word-line driver. The proposed circuit can be added to any analog IMC architecture to make it resilient to process variations. To demonstrate the versatility of the proposal, we have implemented and analyzed 2-bit dot product operations in IMC architectures with six different SRAM cell configurations, and 2-bit, 4-bit, and 8-bit dot products on 6T SRAM IMC. For these, we report a reduction of 4× to 14× in the standard deviation of statistical variations in bit-line voltage for different SRAM cells, and an increase in the bit resolution from 2 bits to 4 or 6 bits. |
16:36 CET | LBR2.3 | ANALYSIS OF QUANTIZATION ACROSS DNN ACCELERATOR ARCHITECTURE PARADIGMS Speaker: Tom Glint, IIT Gandhinagar, IN Authors: Tom Glint1, Chandan Jha2, Manu Awasthi3 and Joycee Mekie1 1IIT Gandhinagar, IN; 2German Research Center for Artificial Intelligence, DE; 3Ashoka University, IN Abstract Quantization techniques promise to significantly reduce the latency, energy, and area associated with multiplier hardware. This work, to the best of our knowledge, for the first time shows the system-level impact of quantization on SOTA DNN accelerators from different digital accelerator paradigms. Based on the placement of data and compute site, we identify SOTA designs from Conventional Hardware Accelerators (CHA), Near Data Processors (NDP), and Processing-in-Memory (PIM) paradigms and show the impact of quantization when inferencing CNN and Fully Connected Layer (FCL) workloads. We show that the 32-bit implementation of the SOTA design from PIM consumes less energy than the 8-bit implementation of the SOTA design from CHA for FCL, while the trend reverses for CNN workloads. Further, PIM has stable latency when scaling the word size, while CHA and NDP suffer a 20% to 2x slowdown when doubling the word size. |
16:39 CET | LBR2.4 | DIVIDE AND VERIFY: USING A DIVIDE-AND-CONQUER STRATEGY FOR POLYNOMIAL FORMAL VERIFICATION OF COMPLEX CIRCUITS Speaker: Alireza Mahzoon, University of Bremen, DE Authors: Rolf Drechsler1 and Alireza Mahzoon2 1University of Bremen | DFKI, DE; 2University of Bremen, DE Abstract With the rapid growth in the size and complexity of digital circuits, the possibility of bug occurrence has significantly increased. In order to avoid the enormous financial loss due to the production of buggy circuits, using scalable formal verification methods is essential. The scalability of a verification method for a specific design is proven by showing that the method has polynomial space and time complexities. Unfortunately, not all verification methods have a polynomial complexity, particularly when it comes to the verification of large and complex designs. In this paper, we propose a divide-and-conquer strategy for Polynomial Formal Verification (PFV) of complex circuits. Instead of using a monolithic proof engine to verify the entire design, we break the verification task down into several problems, which can be solved in polynomial space and time using a hybrid proof engine. As a case study, we investigate the PFV of a RISC-V processor using our divide-and-conquer strategy. |
16:42 CET | LBR2.5 | IMPROVING DESIGN UNDERSTANDING OF PROCESSORS LEVERAGING DATAPATH CLUSTERING Speaker: Katharina Ruep, Johannes Kepler University Linz, AT Authors: Katharina Ruep and Daniel Grosse, Johannes Kepler University Linz, AT Abstract In this paper, we present a novel approach for design understanding of processors. Our approach uses hierarchical clustering to identify datapath similarities based on control signal vectors. The resulting dendrogram captures the closeness of instructions wrt. their datapath and control in visual form. We demonstrate how our approach helps in design understanding for a RISC-V processor without looking into the HDL code. |
16:45 CET | LBR2.6 | ELECTRICAL RULE CHECKING OF INTEGRATED CIRCUITS USING SATISFIABILITY MODULO THEORY Speaker: Oussama Oulkaid, University Grenoble Alpes, CNRS, Grenoble INP, VERIMAG, 38000 Grenoble, France; Aniah, 38000 Grenoble, France, FR Authors: Bruno Ferres1, Oussama Oulkaid2, Ludovic Henrio1, Mehdi Khosravian2, Matthieu Moy1, Gabriel Radanne1 and Pascal Raymond3 1Univ Lyon, EnsL, UCBL, CNRS, Inria, LIP, F-69342, LYON Cedex 07, France., FR; 2Aniah, 38000 Grenoble, France, FR; 3University Grenoble Alpes, CNRS, Grenoble INP, VERIMAG, 38000 Grenoble, France, FR Abstract We consider the verification of electrical properties of circuits to identify potential violations of electrical design rules, also called Electrical Rule Checking (ERC). We present a general approach based on Satisfiability Modulo Theory (SMT) to verify that these errors cannot occur in a given circuit. We claim that our approach is scalable and more precise than existing analyses, like voltage propagation. We applied these techniques to a specific type of errors, the missing level shifters. On an industrial case-study, our technique is able to flag 25% of the warnings raised by the voltage propagation analysis as being false alarms. |
16:48 CET | LBR2.7 | EXPLORATION OF DECISION SUB-NETWORK ARCHITECTURES FOR FPGA-BASED DYNAMIC DNNS Speaker: Anastasios Dimitriou, University of Southampton, GB Authors: Anastasios Dimitriou, Mingyu Hu, Jonathon Hare and Geoff Merrett, University of Southampton, GB Abstract Dynamic Deep Neural Networks (DNNs) can achieve faster execution and less computationally intensive inference by spending fewer resources on easy-to-recognise or less informative parts of an input. They make data-dependent decisions, which strategically deactivate a model's components, e.g. layers, channels or sub-networks. However, dynamic DNNs have only been explored and applied on conventional computing systems (CPU+GPU) and programmed with libraries designed for static networks, limiting their effects. In this paper, we propose and explore two approaches for efficiently realising the sub-networks that make these decisions on FPGAs. A pipeline approach targets the use of the existing hardware to execute the sub-network, while a parallel approach uses dedicated circuitry for it. We explore the performance of each using the BranchyNet early exit approach on LeNet-5, and evaluate on a Xilinx ZCU106. The pipeline approach is 36% faster than a desktop CPU. It consumes 0.51 mJ per inference, 16x lower than a non-dynamic network on the same platform and 8x lower than an Nvidia Jetson Xavier NX. The parallel approach executes 17% faster than the pipeline approach when no early exits are taken during dynamic inference, but incurs a 28% increase in energy consumption. |
16:51 CET | LBR2.8 | INTERACTIVE TECHNICAL PRESENTATIONS BY THE AUTHORS Speaker: Authors of the session, DATE, BE Author: Session Chairs, DATE, BE Abstract Participants can freely interact with authors during their interactive technical presentations. |
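As general background for LBR2.1 above, the following is a plain-Python software model of hyperbolic-mode CORDIC evaluating tanh (and sigmoid via sigmoid(x) = 0.5 * (1 + tanh(x/2))). It is an illustrative floating-point reference only, not the authors' serial fixed-point hardware; the repeated iterations at i = 4 and i = 13 follow the textbook convergence schedule, and inputs are assumed to lie in the basic convergence range (roughly |z| < 1.1 for tanh).

```python
import math

def tanh_cordic(z, n_iter=16):
    """Hyperbolic-mode CORDIC (rotation mode) evaluating tanh(z).
    Illustrative software model; valid without argument reduction for |z| <~ 1.1."""
    # Standard iteration schedule: i = 1, 2, 3, ... with i = 4 and i = 13 repeated.
    indices, i = [], 1
    while len(indices) < n_iter:
        indices.append(i)
        if i in (4, 13) and len(indices) < n_iter:
            indices.append(i)          # repeat iteration to guarantee convergence
        i += 1
    x, y = 1.0, 0.0                    # the CORDIC gain cancels in the ratio y/x
    for i in indices:
        d = 1.0 if z >= 0.0 else -1.0  # drive the residual angle z towards 0
        x, y, z = (x + d * y * 2.0**-i,
                   y + d * x * 2.0**-i,
                   z - d * math.atanh(2.0**-i))
    return y / x                       # sinh(z0)/cosh(z0) = tanh(z0)

def sigmoid_cordic(x, n_iter=16):
    return 0.5 * (1.0 + tanh_cordic(x / 2.0, n_iter))

print(tanh_cordic(0.5), math.tanh(0.5))             # ~0.4621 in both cases
print(sigmoid_cordic(1.0), 1 / (1 + math.exp(-1)))  # ~0.7311 in both cases
```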
M02 Nervous Systems – From Spiking Neural Networks and Reservoir Computing to Neuromorphic Fault-tolerant Hardware
Add this session to my calendar
Date: Wednesday, 19 April 2023
Time: 16:30 CET - 18:00 CET
Location / Room: Okapi Room 0.8.2
Organisers:
Martin A. Trefzer, University of York, GB
Jim Harkin, Ulster University, GB
Speakers:
Martin A. Trefzer, University of York, GB
Jim Harkin, Ulster University, GB
Presenters:
Shimeng Wu, University of York, GB
Andrew Walter, University of York, GB
Technology scaling has enabled the fast advancement of computing architectures through high-density integration of components and cores, and the provision of powerful systems on chip (SoC), e.g. NVIDIA Jetson, AMD/Xilinx UltraScale+ FPGA, ARM big.LITTLE. However, such systems are running hot and becoming more prone to failures and timing violations as clock speed limits are reached. Therefore, parts of SoCs must be turned off to stay within thermal limits ("dark silicon"). This shifts the challenge away from making designs smaller, setting the new focus on systems that are ultra-low power, resilient and autonomous in their adaptation to anomalies, faults, timing violations and performance degradation. There is a significant increase in the number of temporary faults caused by radiation, and of permanent faults due to manufacturing defects and stress. ITRS (https://irds.ieee.org/) estimates significant device failure rates, e.g. due to wear-out, in the short term. Hence, a critical requirement for such systems is to perform detection and analysis effectively at runtime, within a minimal area and power overhead. This is at odds with the current state of the art, including error-correcting codes (ECC), built-in self-test (BIST), localized fault detection, and triple modular redundancy (TMR) strategies, all of which result in prohibitively high system overheads and an inability to adapt to, locate or predict faults. At the same time, technology diversification (More than Moore) is making fast progress, delivering technologies such as memristors and graphene nanowires. The current major issue with these technologies is large device variability, which prevents efficient scaling and usability. Here, not even systematic error-correction or fault-control strategies are available yet.
This Nervous System on Chip tutorial therefore discusses bio-inspired solutions that are becoming viable as neuromorphic hardware design concepts mature. We will briefly introduce the principles of spiking neural networks, biological nervous systems, unconventional computing, and how to translate key concepts into functional hardware systems. We will primarily focus on SNNs for fault tolerance, nervous-system sense/act pathways, and multi-objective novelty search as an artificial nervous system design methodology. Case studies will include an efficient SNN-based approach to detecting timing violations in digital hardware, consider how efficient neuromorphic hardware may be achieved using a reservoir computing model, and highlight the challenges ahead. There will be some opportunity to run, for example, SNN, reservoir computing, or novelty search examples in simulation during a hands-on session.
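The hands-on part of this tutorial (M02.1.6 below) lists Brian2 among its prerequisites. As a flavour of what such a simulation looks like, here is a minimal leaky integrate-and-fire example in Brian2; it is an illustrative sketch only, not the tutorial's actual material, and all parameter values are arbitrary.

```python
from brian2 import NeuronGroup, SpikeMonitor, run, ms

# Four leaky integrate-and-fire neurons, each driven by a different constant input.
eqs = '''
dv/dt = (I - v) / tau : 1
I : 1 (constant)
tau : second (constant)
'''
group = NeuronGroup(4, eqs, threshold='v > 1', reset='v = 0', method='euler')
group.I = [0.0, 1.1, 1.5, 2.0]   # only the first neuron stays below threshold
group.tau = 10 * ms
spikes = SpikeMonitor(group)

run(100 * ms)
print(spikes.count)              # number of spikes emitted by each neuron
```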
M02.1 Nervous Systems - Tutorial Programme
Add this session to my calendar
Date: Wednesday, 19 April 2023
Time: 16:30 CET - 18:00 CET
Location / Room: Okapi Room 0.8.2
Chair:
Martin A. Trefzer, University of York, GB
Time | Label | Presentation Title Authors |
---|---|---|
16:30 CET | M02.1.1 | INTRODUCTION TO SNNS Speaker: Jim Harkin, Ulster University, GB |
16:45 CET | M02.1.2 | NEUROMORPHIC HARDWARE OVERVIEW Speaker: Martin A. Trefzer, University of York, GB |
17:00 CET | M02.1.3 | APPLICATIONS OF SNNS - NEUROMORPHIC EMBEDDED SENSORS AND NETWORKS FOR FAULT-TOLERANCE Speaker: Jim Harkin, Ulster University, GB |
17:15 CET | M02.1.4 | NERVOUS SYSTEMS CONCEPT - MICROCIRCUITS AS BUILDING BLOCKS FOR NEUROMORPHIC ARCHITECTURES Speaker: Martin A. Trefzer, University of York, GB |
17:30 CET | M02.1.5 | HANDS-ON SESSION: SNNS IN VHDL Speaker: Shimeng Wu, University of York, GB Abstract Prerequisites for live participation are an installation of Xilinx Vivado 2022.1 (or a later version). Tutorial resources are available from https://www-users.york.ac.uk/~mt540/nervous-systems/index.html#resources |
17:30 CET | M02.1.6 | HANDS-ON SESSION: SNNS WITH BRIAN2 & PYTHON Speaker: Andrew Walter, University of York, GB Abstract Prerequisites for live participation are an installation of Python 3.10, along with Brian2 2.5.1, numpy 1.23.3, matplotlib 3.6.1 (or later versions). Tutorial resources are available from https://www-users.york.ac.uk/~mt540/nervous-systems/index.html#resources |
M03 Embedded FPGAs (eFPGA) and Applications to IP Protection via eFPGA Redaction
Add this session to my calendar
Date: Wednesday, 19 April 2023
Time: 16:30 CET - 18:00 CET
Location / Room: Toucan Room 2.7.1/2
Organisers:
Christian Pilato, Politecnico di Milano, IT
Pierre-Emmanuel Gaillardon, University of Utah, US
Ramesh Karri, New York University, US
Benjamin Tan, University of Calgary, CA
With the rise of open-source hardware and the never-ending requirements for computational power for modern applications, companies are increasingly interested in investments to create novel chips. However, these investments can be undermined by malicious actors in the semiconductor supply chain who can reverse engineer the chip design, steal hardware intellectual property, and make unauthorized copies of the original design. Protecting hardware intellectual property is therefore becoming a critical concern, given the huge investments behind developing novel architectures.
eFPGA redaction is a novel, promising technique that aims to thwart reverse engineering attacks on integrated circuits (ICs) by exploiting the flexibility of reconfigurable devices. Critical IC parts are mapped onto and replaced by specific reconfigurable blocks (called embedded FPGAs - eFPGAs) with a two-fold goal: (1) during fabrication, the reconfigurable devices can implement arbitrary functions, without revealing the intended functionality; (2) during execution, they can be configured to implement the correct functionality by classic FPGA programming methods. In this context, novel tools like OpenFPGA can automate and significantly accelerate the development cycle of customizable FPGA architectures. Such tools can generate Verilog netlists for these customized FPGA fabrics, which can be directly used to generate production-ready layouts.
This tutorial presents ALICE, a design flow that leverages OpenFPGA to explore a chip design (described at the behavioral register-transfer level), identify the best modules for redaction, and create the corresponding eFPGAs. This framework automates the process of eFPGA redaction, enabling its use in industrial environments.
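For intuition only, the module-selection step described above can be viewed as a constrained ranking problem over candidate RTL modules. The toy Python sketch below illustrates that idea with a simple greedy value-per-cost policy; the module names, scores and the selection heuristic are invented for illustration and do not represent ALICE's actual algorithm or OpenFPGA's interfaces.

```python
# Hypothetical illustration: choose which RTL modules to redact onto an eFPGA
# fabric under an area budget, preferring modules deemed most security-critical.
# All names and numbers are made up for this sketch.

modules = [
    # (name, security_value, eFPGA area cost in LUT-equivalents)
    ("crypto_core",   9.0, 1200),
    ("dsp_filter",    4.0,  800),
    ("bus_arbiter",   2.5,  300),
    ("debug_monitor", 1.0,  150),
]

def select_for_redaction(candidates, area_budget):
    """Greedy value-per-cost selection; a stand-in for a real exploration step."""
    chosen, used = [], 0
    for name, value, cost in sorted(candidates, key=lambda m: m[1] / m[2], reverse=True):
        if used + cost <= area_budget:
            chosen.append(name)
            used += cost
    return chosen, used

picked, area = select_for_redaction(modules, area_budget=1500)
print(picked, area)   # ['bus_arbiter', 'crypto_core'] 1500
```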
SD2 High-level synthesis and verification
Add this session to my calendar
Date: Wednesday, 19 April 2023
Time: 16:30 CET - 18:00 CET
Location / Room: Gorilla Room 1.5.1
Session chair:
Katell Morin-Allory, Université Grenoble Alpes, FR
16:30 CET until 16:54 CET: Pitches of regular papers
16:54 CET until 18:00 CET: Interactive technical presentations by the authors of regular papers and extended abstracts
Regular Papers
Time | Label | Presentation Title Authors |
---|---|---|
16:30 CET | SD2.1 | TOWARDS HIGH-LEVEL SYNTHESIS OF QUANTUM CIRCUITS Speaker: Christian Pilato, Politecnico di Milano, IT Authors: Chao Lu1, Christian Pilato2 and Kanad Basu1 1University of Texas at Dallas, US; 2Politecnico di Milano, IT Abstract In recent years, there has been a proliferation of quantum algorithms, primarily due to their exponential speedup over their classical counterparts. Quantum algorithms find applications in various domains, including machine learning, molecular simulation, and cryptography. However, extensive knowledge of linear algebra and quantum mechanics is required to program a quantum computer, which might not be feasible for traditional software programmers. Moreover, the current quantum programming paradigm makes it difficult to scale and integrate quantum circuits to achieve complex functionality. To this end, in this paper, we introduce QHLS, a quantum high-level synthesis (HLS) framework. To the best of our knowledge, this is the first HLS framework for quantum circuits. The proposed QHLS allows quantum programmers to start with high-level behavioral descriptions (e.g., C, C++) and automatically generate the corresponding quantum circuit; thus reducing the complexity of programming a quantum computer. Our experimental results demonstrate the success of QHLS in translating high-level behavioral software programs containing arithmetic, logical and conditional statements. |
16:33 CET | SD2.2 | MIRROR: MAXIMIZING THE RE-USABILITY OF RTL THROUGH RTL TO C COMPILER Speaker: Benjamin Carrion Schaefer, University of Texas at Dallas, US Authors: Md Imtiaz Rashid and Benjamin Carrion Schaefer, University of Texas at Dallas, US Abstract This work presents an RTL-to-C compiler called MIRROR that maximizes the re-usability of the generated C code for High-Level Synthesis (HLS). The uniqueness of the compiler is that it generates C code by using libraries of pre-characterized RTL micro-structures that are uniquely identifiable through perceptual hashes. This allows C descriptions that include arrays and loops to be generated quickly. These are important because HLS tools extensively use synthesis directives in the form of pragmas to control how to synthesize these constructs. For example, arrays can be synthesized as registers or RAM, and loops can be fully unrolled, partially unrolled, not unrolled, or pipelined. Setting different pragma combinations leads to designs with unique area vs. performance and power trade-offs. Based on this, the main goal of our compiler is to parse synthesizable RTL descriptions specified in Verilog, which have a fixed micro-architecture with a specific area, performance and power profile, and generate C code for HLS that can then be re-synthesized with different pragma combinations, generating a variety of new micro-architectures with different area vs. performance trade-offs. We call this 'maximizing the re-usability of the RTL code' because it enables a path to re-target any legacy RTL description to applications with different constraints. |
16:36 CET | SD2.3 | HIGH-LEVEL SYNTHESIS VERSUS HARDWARE CONSTRUCTION Speaker: Georgi Gaydadjiev, University of Groningen, Plekhanov RUE, NL Authors: Alexander Kamkin1, Mikhail Chupilko1, Mikhail Lebedev1, Sergey Smolov1 and Georgi Gaydadjiev2 1ISP RAS, RU; 2University of Groningen, NL Abstract Application-specific systems with FPGA accelerators are often designed using high-level synthesis or hardware construction tools. Nowadays, there are many frameworks available, both open-source and commercial. In this work, we aim at a fair comparison of several languages (and tools), including Verilog (our baseline), Chisel, Bluespec SystemVerilog (Bluespec Compiler), DSLX (XLS), MaxJ (MaxCompiler), and C (Bambu and Vivado HLS). Our analysis has been carried out using a representative example of 8×8 inverse discrete cosine transform (IDCT), a widely used algorithm in JPEG and MPEG decoders. The metrics under consideration include: (a) the degree of automation (how much less code is required compared to Verilog), (b) the controllability (possibility to achieve given design characteristics, namely a given ratio of the performance and area), and (c) the flexibility (ease of design modifications to achieve certain characteristics). Rather than focusing on computational kernels only, we use AXI-Stream wrappers for the synthesized implementations, which allows the characteristics of the designs to be evaluated adequately when they are used as parts of real systems. Our study shows clear examples of what impact specific optimizations (tool settings and source code modifications) have on the overall system performance and area. It emphasizes how important it is to be able to control the balance between the communication interface utilization and the computational kernel performance, and delivers clear guidelines for next-generation tools for designing FPGA-accelerator-based systems. |
16:39 CET | SD2.4 | TPP: ACCELERATE APPLICATION LAUNCH VIA TWO-PHASE PREFETCHING ON SMARTPHONE Speaker: Ying Yuan, Huazhong University of Science & Technology, CN Authors: Ying Yuan, Zhipeng Tan, Shitong Wei, Lihua Yang, Wenjie Qi, Xuanzhi Wang and Cong Liu, Huazhong University of Science & Technology, CN Abstract Fast app launch is crucial to the user experience and is one of the eternal pursuits of manufacturers. Page faults are a critical factor leading to long app launch latency. Prefetching is the current method of reducing page faults during app launch. Before the app launch, prefetching all demanded pages of the target app can speed up the app launch effectively, but it always uses several hundred MB of memory, leading to memory pressure and slowing the launch of other apps. Prefetching during application launch uses memory effectively; however, current methods are not aware of the order in which pages are accessed, causing noticeable accessing-prefetching order inversions, which limits the acceleration of app launch. In order to accelerate the application launch effectively with little memory usage, we propose a Two-Phase Prefetching schema (TPP), which performs prefetching via two phases: 1) Before the app launch, to increase the efficiency of memory usage in prefetching, TPP prefetches a few critical pages with app prediction, which is based on Long Short-Term Memory (LSTM) with high accuracy. 2) During app launch, TPP prefetches the rest of the critical pages via an order-aware sliding window method, resolving the accessing-prefetching order inversions and significantly reducing the app launch latency. We evaluate TPP on a Google Pixel 3; compared to the state-of-the-art method, TPP reduces the application launch time by up to 52.5% (37% on average), and the data prefetched before the target application starts is only 1.31 MB on average. |
16:42 CET | SD2.5 | USING HIGH-LEVEL SYNTHESIS TO MODEL SYSTEMVERILOG PROCEDURAL TIMING CONTROLS Speaker: Luca Ezio Pozzoni, Politecnico di Milano, IT Authors: Luca Pozzoni1, Fabrizio Ferrandi1, Loris Mendola2, Alfio Palazzo2 and Francesco Pappalardo2 1Politecnico di Milano, IT; 2STMicroelectronics, IT Abstract In modern SoC designs, digital components' development and verification processes often depend on the component's interactions with other digital and analog modules on the same die. While designers can rely on a wide range of tools and practices for validating fully-digital models, porting the same workflow to mixed models' development requires significant efforts from the designers. A common practice is to use Real Number Modeling techniques to generate HDL-based behavioral models of analog components to efficiently simulate mixed models using only event-based simulations rather than Analog Mixed Signals (AMS) simulations. However, some of these models' language features are not synthesizable with existing synthesis tools, requiring additional efforts from the designers to generate post-tapeout prototypes. This paper presents a methodology for transforming some non-synthesizable SystemVerilog language features related to timing controls into functionally-equivalent synthesizable Verilog constructs. The resulting synthesizable models replicate their respective RNMs' behavior while explicitly managing delay controls and event expressions. The RNMs are first transformed using the MLIR framework and then synthesized with open-source HLS tools to obtain FPGA-synthesizable Verilog models. |
16:45 CET | SD2.6 | R-LDPC: REFINING BEHAVIOR DESCRIPTIONS IN HLS TO IMPLEMENT HIGH-THROUGHPUT LDPC DECODER Speaker: Yifan Zhang, Wuhan National Laboratory for Optoelectronics, CN Authors: Yifan Zhang1, Qiang Cao1, Jie Yao2 and Hong Jiang3 1Wuhan National Laboratory for Optoelectronics, CN; 2Huazhong University of Science & Technology, CN; 3UT Arlington, US Abstract High-Level Synthesis (HLS) translates high-level behavior descriptions to Register-Transfer Level (RTL) implementations in modern Field-Programmable Gate Arrays (FPGAs), accelerating domain-specific hardware developments. Low-Density Parity-Check (LDPC), as a powerful error-correction code family, has been widely implemented in hardware for building a reliable data channel over a noisy physical channel in communication and storage applications. Leveraging HLS to rapidly prototype high-performance LDPC decoders is intriguing, with high scalability and low hardware-dependence, but generally is sub-optimal due to the lack of accurate and precise behavior descriptions in HLS to characterize iteration- and circuit-level implementation details. This paper proposes an HLS-based QC-LDPC decoder with scalable throughput by precisely refining the LDPC behavior descriptions, R-LDPC for short. To this end, R-LDPC first adopts an HLS-based LDPC decoder microarchitecture with a module-level pipeline. Second, R-LDPC offers a multi-instance-sharing one (MSO) description to explicitly define shared parts and non-shared parts for an array of check-node updating-units (CNU), eliminating redundant function modules and addressing circuits. Third, R-LDPC designs efficient single-stage and multi-stage shifters to eliminate unnecessary bit-selection circuits. Finally, R-LDPC provides invalid-element aware loop scheduling before the compile phase to avoid some unnecessary stalls at runtime. We implement an R-LDPC decoder; compared to the original HLS-based implementation, R-LDPC reduces the hardware consumption by up to 56% and the latency by up to 67%, and improves the decoding throughput by up to 300%. Furthermore, R-LDPC adapts to different scales, LDPC standards, and code rates, and can achieve 9.9 Gbps decoding throughput on a Xilinx U50. |
16:48 CET | SD2.7 | AN AUTOMATED VERIFICATION FRAMEWORK FOR HALIDEIR-BASED COMPILER TRANSFORMATIONS Speaker: Qingshuang Sun, Northwestern Polytechnical University, CN Authors: Yanzhao Wang1, Fei Xie1, Zhenkun Yang2, Jeremy Casas2, Pasquale Cocchini2 and Jin Yang2 1Portland State University, US; 2Intel Corporation, US Abstract HalideIR is a popular intermediate representation for compilers in domains such as deep learning, image processing, and hardware design. In this paper, we present an automated verification framework for HalideIR-based compiler transformations. The framework conducts verification using symbolic execution in two steps. Given a compiler transformation, our automated verification framework first uses symbolic execution to enumerate the compiler transformation's paths, and then utilizes symbolic execution to verify whether the output program for each transformation path is equivalent to its source. We have successfully applied this framework to verify 46 transformations from the three most-starred HalideIR-based compilers on GitHub and detected 4 transformation bugs undetected by manually crafted unit tests. (A minimal SMT-based equivalence-checking sketch, for illustration only, follows this session's listing.) |
16:51 CET | SD2.8 | CHISELFV: A FORMAL VERIFICATION FRAMEWORK FOR CHISEL Speaker: Mufan Xiang, East China Normal University, CN Authors: Mufan Xiang1, Yongjian Li2 and Yongxin Zhao3 1East China Normal University, CN; 2Chinese Academy of Sciences, Institute of Software, Laboratory of Computer Science, CN; 3East China Normal University, CN Abstract Modern digital hardware is becoming ever more complex, and agile development, an effective practice from software development, has been introduced into hardware design. Furthermore, as a new hardware construction language, Chisel helps to raise the level of hardware design abstraction with the support of object-oriented and functional programming. Chisel plays a crucial role in future hardware design and open-source hardware development. However, formal verification support for Chisel is still limited. In this paper, we propose ChiselFV, a formal verification framework that supports detailed formal hardware property descriptions and integrates mature formal hardware verification flows based on SymbiYosys. It builds on top of Chisel and uses Scala to drive the verification process. Thus the framework can be seen as an extension of Chisel. ChiselFV makes it easy to verify hardware designs formally when implementing them in Chisel. |
Extended Abstracts
Time | Label | Presentation Title Authors |
---|---|---|
16:54 CET | SD2.10 | EMNAPE: EFFICIENT MULTI-DIMENSIONAL NEURAL ARCHITECTURE PRUNING FOR EDGEAI Speaker: Hao Kong, Nanyang Technological University, SG Authors: Hao Kong1, Xiangzhong Luo1, Shuo Huai1, Di Liu2, Ravi Subramaniam3, Christian Makaya3, Qian Lin3 and Weichen Liu1 1Nanyang Technological University, SG; 2Yunnan University, CN; 3HP Inc., US Abstract In this paper, we propose a multi-dimensional pruning framework, EMNAPE, to jointly prune the three dimensions (depth, width, and resolution) of convolutional neural networks (CNNs) for better execution efficiency on embedded hardware. In EMNAPE, we introduce a two-stage evaluation strategy to evaluate the importance of each pruning unit and identify the computational redundancy in the three dimensions. Based on the evaluation strategy, we further present a heuristic pruning algorithm to progressively prune redundant units from the three dimensions for better accuracy and efficiency. Experiments demonstrate the superiority of EMNAPE over existing methods. |
16:54 CET | SD2.13 | METRIC TEMPORAL LOGIC WITH RESETTABLE SKEWED CLOCKS Speaker: Alberto Bombardelli, Fondazione Bruno Kessler, IT Authors: Alberto Bombardelli and Stefano Tonetta, FBK, IT Abstract The formal verification of distributed real-time systems is particularly challenging due to the intertwining of timing constraints and synchronization and communication mechanisms. Real-time properties are usually expressed in Metric Temporal Logic (MTL), an extension of Linear-time Temporal Logic (LTL) with metric constraints over time. One of the issues to apply these methods to distributed systems is that clocks are not perfectly synchronized and the local properties may refer to different, possibly skewed, clocks, which are reset for synchronization. Local components and properties, therefore, may refer to time points that are not guaranteed to be monotonic. In this paper, we investigate the specification of temporal properties of distributed systems with resettable skewed clocks. In order to take into account the synchronization of clocks, the local temporal operators are interpreted over resettable skewed clocks. We extend MTL with metric operators that are more suitable to express bounds over non-monotonic time. |
16:54 CET | SD2.14 | POLYNOMIAL FORMAL VERIFICATION OF FLOATING POINT ADDERS Speaker: Jan Kleinekathöfer, University of Bremen, DE Authors: Jan Kleinekathöfer1, Alireza Mahzoon1 and Rolf Drechsler2 1University of Bremen, DE; 2University of Bremen | DFKI, DE Abstract In this paper, we present our verifier that takes advantage of Binary Decision Diagrams (BDDs) with case splitting to fully verify a floating point adder. We demonstrate that the traditional symbolic simulation using BDDs has an exponential time complexity and fails for large floating point adders. However, polynomial bounds can be ensured if our case splitting technique is applied in the specific points of the circuit. The efficiency of our verifier is demonstrated by experiments on an extensive set of floating point adders with different exponent and significand sizes. |
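The transformation-verification idea in SD2.7 above, checking that a rewritten program is equivalent to its source, can be illustrated in a few lines with an off-the-shelf SMT solver. The sketch below uses the z3-solver Python bindings on two tiny bit-vector rewrites; it is a minimal illustration of equivalence checking in general, not the paper's HalideIR framework or its symbolic-execution flow.

```python
# pip install z3-solver
from z3 import BitVec, Solver, unsat

def equivalent(lhs, rhs):
    """True iff the two bit-vector expressions agree on every input."""
    solver = Solver()
    solver.add(lhs != rhs)            # search for a counterexample
    return solver.check() == unsat    # no counterexample means equivalence

x = BitVec('x', 32)
print(equivalent(x * 8, x << 3))      # True: a valid strength-reduction rewrite
print(equivalent(x * 8, x << 2))      # False: a buggy rewrite is caught
```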
SE3 Efficient utilization of heterogeneous hardware architectures running machine learning-based applications
Add this session to my calendar
Date: Wednesday, 19 April 2023
Time: 16:30 CET - 18:00 CET
Location / Room: Marble Hall
Session chair:
Zdenek Vasicek, BRNO UNIVERSITY OF TECHNOLOGY, CZ
16:30 CET until 16:54 CET: Pitches of regular papers
16:54 CET until 18:00 CET: Interactive technical presentations by the authors of regular papers and extended abstracts
Regular Papers
Time | Label | Presentation Title Authors |
---|---|---|
16:30 CET | SE3.1 | BLOCK GROUP SCHEDULING: A GENERAL PRECISION-SCALABLE NPU SCHEDULING TECHNIQUE WITH PRECISION-AWARE MEMORY ALLOCATION Speaker: Seokho Lee, Hanyang University, KR Authors: Seokho Lee1, Younghyun Lee1, Hyejun Kim2, Taehoon Kim2 and Yongjun Park3 1Department of Artificial Intelligence, Hanyang University, KR; 2Hanyang University, KR; 3Yonsei University, KR Abstract Precision-scalable neural processing units (PSNPUs) efficiently provide native support for quantized neural networks. However, with the recent advancements of deep neural networks, PSNPUs are affected by a severe memory bottleneck owing to the need to perform an extreme number of simple computations simultaneously. In this study, we first analyze whether the memory bottleneck issue can be solved using conventional neural processing unit scheduling techniques. Subsequently, we introduce new capacity-aware memory allocation and block-level scheduling techniques to minimize the memory bottleneck. Compared with the baseline, the new method achieves up to 2.26× performance improvements by substantially relieving the memory pressure of low-precision computations without hardware overhead. |
16:33 CET | SE3.2 | FPGA-BASED ACCELERATOR FOR RANK-ENHANCED AND HIGHLY-PRUNED BLOCK-CIRCULANT NEURAL NETWORKS Speaker: Haena Song, Pohang University of Science and Technology, KR Authors: Haena Song, Jongho Yoon, Dohun Kim, Eunji Kwon, Tae-Hyun Oh and Seokhyeong Kang, Pohang University of Science and Technology, KR Abstract Numerous network compression methods have been proposed to deploy deep neural networks in resource-constrained embedded systems. Among them, block-circulant matrix (BCM) compression is one of the promising hardware-friendly methods for both acceleration and compression. However, it has several limitations: (i) limited representation due to the structural characteristics of the circulant matrix, (ii) restrictions on the compression parameter, and (iii) the need for specialized dataflow in BCM-compressed network accelerators. In this paper, a rank-enhanced and highly-pruned block-circulant matrix compression (RP-BCM) framework is proposed to overcome these limitations. RP-BCM comprises two stages: Hadamard-BCM and BCM-wise pruning. Also, a dedicated skip scheme is introduced into the processing-element design to maintain high parallelism under BCM-wise sparsity. Furthermore, we propose specialized dataflow for a BCM-compressed network, rather than the conventional CNN dataflow on FPGA. As a result, the proposed method reduces parameters and FLOPs for ResNet-50 on ImageNet by 92.4% and 77.3%, respectively. Moreover, compared to GPU, the proposed hardware design achieves a 3.1x improvement in energy efficiency on the Xilinx PYNQ-Z2 FPGA board for ResNet-18 trained on ImageNet. |
16:36 CET | SE3.3 | LOSSLESS SPARSE TEMPORAL CODING FOR SNN-BASED CLASSIFICATION OF TIME-CONTINUOUS SIGNALS Speaker: Johnson Loh, IDS, RWTH Aachen, DE Authors: Johnson Loh and Tobias Gemmeke, RWTH Aachen University, DE Abstract Ultra-low power classification systems using spiking neural networks (SNN) promise efficient processing for mobile devices. Temporal coding represents activations in an artificial neural network (ANN) as binary signaling events in time, thereby minimizing circuit activity. Discrepancies in numeric results are inherent to common conversion schemes, as the atomic computing unit, i.e. the neuron, performs algorithmically different operations, thus potentially degrading the SNN's quality of service (QoS). In this work, a lossless conversion method is derived in a top-down design approach for continuous time signals using electrocardiogram (ECG) classification as an example. As a result, the converted SNN achieves identical results compared to its fixed-point ANN reference. The computations implied by the proposed method result in a novel hybrid neuron model located between the integrate-and-fire (IF) and conventional ANN neurons, whose numerical result is equivalent to the latter. Additionally, a dedicated SNN accelerator is implemented in 22 nm FDSOI CMOS suitable for continuous real-time classification. The direct comparison with an equivalent ANN counterpart shows that power reductions of 2.32x and area reductions of 7.22x are achievable without loss in QoS. |
16:39 CET | SE3.4 | NAF: DEEPER NETWORK/ACCELERATOR CO-EXPLORATION FOR CUSTOMIZING CNNS ON FPGA Speaker: Wenqi Lou, University of Science and Technology of China, CN Authors: Wenqi Lou, Jiaming Qian, Lei Gong, Xuan Wang, Chao Wang and Xuehai Zhou, USTC, CN Abstract Recently, algorithm and hardware co-design for neural networks (NNs) has become the key to obtaining high-quality solutions. However, prior works lack consideration of the underlying hardware and thus suffer from a severely unbalanced neural architecture and hardware architecture search (NA-HAS) space on FPGAs, failing to unleash the performance potential. Nevertheless, a deeper joint search leads to a larger (multiplicative) search space, highly challenging the search. To this end, we propose an efficient differentiable search framework NAF, which jointly searches the networks (e.g., operations and bitwidths) and accelerators (e.g., heterogeneous multicores and mappings) under a balanced NA-HAS space. Concretely, we design a coarse-grained hardware-friendly quantization algorithm and integrate it at a block granularity into the co-search process. Meanwhile, we design a highly optimized block processing unit (BPU) with key dataflow configurable. Afterward, a dynamic hardware generation algorithm based on modeling and heuristic rules is designed to perform the critical HAS and fast generate hardware feedback. Experimental results show that compared with the previous state-of-the-art (SOTA) co-design works, NAF improves the throughput by 1.99×-6.84× on Xilinx ZCU102 and energy efficiency by 17%-88% under similar accuracy on the ImageNet dataset. |
16:42 CET | SE3.5 | ESRU: EXTREMELY LOW-BIT AND HARDWARE-EFFICIENT STOCHASTIC ROUNDING UNIT DESIGN FOR 8-BIT DNN TRAINING Speaker: Sung En Chang, Northeastern University, US Authors: Sung-En Chang1, Geng Yuan1, Alec Lu2, Mengshu Sun1, Yanyu Li1, Xiaolong Ma3, Zhengang Li1, Yanyue Xie1, Minghai Qin4, Xue Lin1, Zhenman Fang2 and Yanzhi Wang1 1Northeastern University, US; 2Simon Fraser University, CA; 3Clemson University, US; 4Self-employed, US Abstract Stochastic rounding is crucial in the low-bit (e.g., 8-bit) training of deep neural networks (DNNs) to achieve high accuracy. One of the drawbacks of prior studies is that they require a large number of high-precision stochastic rounding units (SRUs) to guarantee low-bit DNN accuracy, which involves considerable hardware overhead. In this paper, we use extremely low-bit SRUs (ESRUs) to save a large number of hardware resources during low-bit DNN training. However, a naively designed ESRU introduces a biased distribution of random numbers, causing accuracy degradation. To address this issue, we further propose an ESRU design with a plateau-shape distribution. The plateau-shape distribution in our ESRU design is implemented with the combination of an LFSR and an inverted LFSR, which avoids LFSR packing and turns an inherent LFSR drawback into an advantage in our efficient ESRU design. Experimental results using state-of-the-art DNN models demonstrate that, compared to the prior 24-bit SRU with 24-bit pseudo-random number generators (PRNG), our 8-bit ESRU with 3-bit PRNG reduces the SRU hardware resource usage by 9.75 times while achieving slightly higher accuracy. |
16:45 CET | SE3.6 | CLASS-BASED QUANTIZATION FOR NEURAL NETWORKS Speaker: Wenhao Sun, TU Munich, DE Authors: Wenhao Sun1, Grace Li Zhang2, Huaxi Gu3, Bing Li1 and Ulf Schlichtmann1 1TU Munich, DE; 2TU Darmstadt, DE; 3Xidian University, CN Abstract In deep neural networks (DNNs), there are a huge number of weights and multiply-and-accumulate (MAC) operations. Accordingly, it is challenging to apply DNNs on resource-constrained platforms, e.g., mobile phones. Quantization is a method to reduce the size and the computational complexity of DNNs. Existing quantization methods either require hardware overhead to achieve a non-uniform conversion or focus on model-wise and layer-wise uniform conversions, which are not as fine-grained as filter-wise quantization. In this paper, we propose a class-based quantization method to determine the minimum number of quantization bits for each filter or neuron in DNNs individually. In the proposed method, the importance score of each filter or neuron with respect to the number of classes in the dataset is first evaluated. The larger the score is, the more important the filter or neuron is and thus the larger the number of quantization bits should be. Afterwards, a search algorithm is adopted to exploit the different importance of filters and neurons to determine the number of quantization bits of each filter or neuron. Experimental results demonstrate that the proposed method can maintain the inference accuracy with low bit-width quantization. Given the same number of quantization bits, the proposed method can also achieve a better inference accuracy than the existing methods. |
16:48 CET | SE3.7 | ROAD-RUNNER: COLLABORATIVE DNN PARTITIONING AND OFFLOADING ON HETEROGENEOUS EDGE SYSTEMS Speaker: Manolis Katsaragakis, National TU Athens, GR Authors: Andreas Kakolyris1, Manolis Katsaragakis1, Dimosthenis Masouros1 and Dimitrios Soudris2 1National TU Athens, GR; 2National Technical University of Athens, GR Abstract Deep Neural Networks (DNNs) are becoming extremely popular for many modern applications deployed at the edge of the computing continuum. Despite their effectiveness, DNNs are typically resource intensive, making it prohibitive to be deployed on resource- and/or energy-constrained devices found in such environments. To overcome this limitation, partitioning and offloading part of the DNN execution from edge devices to more powerful servers has been introduced as a prominent solution. While previous works have proposed resource management schemes to tackle this problem, they usually neglect the high dynamicity found in such environments, both regarding the diversity of the deployed DNN models, as well as the heterogeneity of the underlying hardware infrastructure. In this paper, we present RoaD-RuNNer, a framework for DNN partitioning and offloading for edge computing systems. RoaD-RuNNer relies on its prior knowledge and leverages collaborative filtering techniques to quickly estimate performance and energy requirements of individual layers over heterogeneous devices. By aggregating this information, it specifies a set of Pareto optimal DNN partitioning schemes that trade-off between performance and energy consumption. We evaluate our approach using a set of well-known DNN architectures and show that our framework i) outperforms existing state-of-the-art approaches by achieving 9.58× speedup on average and up to 88.73% less energy consumption, ii) achieves high prediction accuracy by limiting the prediction error down to 3.19% and 0.18% for latency and energy, respectively and iii) provides lightweight and dynamic performance characteristics. |
16:51 CET | SE3.8 | PRUNING AND EARLY-EXIT CO-OPTIMIZATION FOR CNN ACCELERATION ON FPGAS Speaker: Guilherme Korol, UFRGS, BR Authors: Guilherme Korol1, Michael Jordan2, Mateus Beck Rutzig3, Jeronimo Castrillon4 and Antonio Carlos Schneider Beck1 1Universidade Federal do Rio Grande do Sul, BR; 2UFRGS, BR; 3UFSM, BR; 4TU Dresden, DE Abstract The challenge of processing heavy-load ML tasks, particularly CNN-based ones on resource-constrained IoT devices, has encouraged the use of edge servers. The edge offers performance levels higher than the end devices and better latency and security levels than the Cloud. On top of that, the rising complexity of ML applications, the ever-increasing number of connected devices, and the current demands for energy efficiency require optimizing such CNN models. Pruning and early-exit are notable optimizations that have been successfully used to alleviate the computational cost of inference. However, these optimizations have not yet been exploited simultaneously: while pruning is usually applied at design time, which involves retraining the CNN before deployment, early-exit is inherently dynamic. In this work, we propose AdaPEx, a framework that exploits the intrinsic reconfigurable FPGA capabilities so both can be cooperatively employed. AdaPEx first explores the trade-off between pruning and early-exit at design time, creating a design space never exploited in the state-of-the-art. Then, AdaPEx applies FPGA reconfiguration as a means to enable the combined use of pruning and early-exit dynamically. At run-time, this allows matching the inference processing to the current edge conditions and a user-configurable accuracy threshold. In a smart IoT application, AdaPEx processes up to 1.32x more inferences and improves EDP by up to 2.55x over the state-of-the-art FPGA-based FINN accelerator. |
Extended Abstracts
Time | Label | Presentation Title Authors |
---|---|---|
16:54 CET | SE3.9 | LATTICE QUANTIZATION Speaker: Clement Metz, CEA, Paris-Saclay University, FR Authors: Clement Metz1, Thibault Allenet1, Johannes Thiele2, Antoine Dupret1 and Olivier Bichler1 1CEA, FR; 2CEA / Axelera.ai, CH Abstract Post-training quantization of neural networks consists of quantizing a model without retraining or hyperparameter search, while being fast and data-frugal. In this paper, we propose LatticeQ, a novel post-training weight quantization method designed for deep convolutional neural networks (DCNNs). Contrary to scalar rounding widely used in state-of-the-art quantization methods, LatticeQ uses a quantizer based on lattices (discrete algebraic structures). LatticeQ exploits the inner correlations between the model parameters to the benefit of minimizing quantization error. We achieve state-of-the-art results in post-training quantization. In particular, we achieve ImageNet classification results close to full precision on Resnet-18/50, with little to no accuracy drop for 4-bit models. (A generic scalar-quantization sketch, for background only, follows this session's listing.) |
16:54 CET | SE3.10 | MITIGATING HETEROGENEITIES IN FEDERATED EDGE LEARNING WITH RESOURCE-INDEPENDENCE AGGREGATION Speaker: Zhao Yang, Northwestern Polytechnical University, CN Authors: Zhao Yang and Qingshuang Sun, Northwestern Polytechnical University, CN Abstract Heterogeneities have emerged as a critical challenge in Federated Learning (FL). In this paper, we identify the cause of FL performance degradation due to heterogeneity issues: the locally communicated parameters exhibit feature mismatches and feature-representation range mismatches, resulting in ineffective global model generalization. To address this, heterogeneity-mitigating FL is proposed to improve the generalization of the global model with resource-independence aggregation. Instead of linking local model contributions to their occupied resources, we look for contributing parameters directly in each node's training results. |
16:54 CET | SE3.11 | MULTISPECTRAL FEATURE FUSION FOR DEEP OBJECT DETECTION ON EMBEDDED NVIDIA PLATFORMS Speaker: Thomas Kotrba, TU Wien, AT Authors: Thomas Kotrba1, Martin Lechner1, Omair Sarwar2 and Axel Jantsch1 1TU Wien, AT; 2Mission Embedded GmbH, AT Abstract Multispectral images can improve object detection systems' performance due to their complementary information, especially in adverse environmental conditions. To use multispectral image data in deep-learning-based object detectors, a fusion of the information from the individual spectra, e.g., inside the neural network, is necessary. This paper compares the impact of general fusion schemes in the backbone of the YOLOv4 object detector. We focus on optimizing these fusion approaches for an NVIDIA Jetson AGX Xavier and elaborating on their impact on the device in physical metrics. We optimize six different fusion architectures in the network's backbone for the TensorRT framework and compare their inference time, power consumption, and object detection performance. Our results show that multispectral fusion approaches with little design effort can benefit resource usage and object detection metrics compared to individual networks. |
16:54 CET | SE3.12 | RANKSEARCH: AN AUTOMATIC RANK SEARCH TOWARDS OPTIMAL TENSOR COMPRESSION FOR VIDEO LSTM NETWORKS ON EDGE Speaker: Chenchen Ding, Southern University of Science and Technology, CN Authors: Changhai Man1, Cheng Chang2, Chenchen Ding3, Ao Shen3, Hongwei Ren3, Ziyi Guan4, Yuan Cheng5, Shaobo Luo3, Rumin Zhang3, Ngai Wong4 and Hao Yu3 1Georgia Tech, US; 2University of California, Los Angeles, US; 3Southern University of Science and Technology, CN; 4University of Hong Kong, HK; 5Shanghai Jiao Tong University, CN Abstract Various industrial and domestic applications call for optimized lightweight video LSTM network models at the edge. The recent tensor-train method can transform space-time features into tensors, which can be further decomposed into low-rank network models for lightweight video analysis at the edge. The rank selection of the tensors is, however, performed manually with no optimization. This paper formulates a rank search algorithm to automatically decide tensor ranks with consideration of the trade-off between network accuracy and complexity. A fast rank search method, called RankSearch, is developed to find optimized low-rank video LSTM network models at the edge. Results from experiments show that RankSearch achieves a 4.84× reduction in model complexity, and a 1.96× speed-up in runtime while delivering a 3.86% accuracy improvement compared with manually ranked models. |
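Several contributions in this session (e.g. SE3.6 and SE3.9) revolve around weight quantization. As shared background, the sketch below shows plain uniform symmetric per-tensor quantization with round-to-nearest in NumPy, the baseline that such methods refine, together with the quantization error they aim to reduce; it is a generic illustration, not LatticeQ or the class-based scheme described above.

```python
import numpy as np

def quantize_symmetric(w, n_bits=4):
    """Uniform symmetric per-tensor quantization with round-to-nearest."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = float(np.max(np.abs(w))) / qmax        # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(64, 3, 3, 3).astype(np.float32)   # a toy conv filter bank
q, s = quantize_symmetric(w, n_bits=4)
w_hat = dequantize(q, s)
print("mean squared quantization error:", float(np.mean((w - w_hat) ** 2)))
```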
CC Closing Ceremony
Add this session to my calendar
Date: Wednesday, 19 April 2023
Time: 18:00 CET - 18:30 CET
Location / Room: Darwin Hall
Session chair:
Ian O’Connor, Ecole Centrale de Lyon, FR
Session co-chair:
Robert Wille, TU Munich, DE
Time | Label | Presentation Title Authors |
---|---|---|
18:00 CET | CC.1 | CLOSING REMARKS Speaker: Ian O'Connor and Robert Wille, DATE, BE Authors: Ian O'Connor1 and Robert Wille2 1Lyon Institute of Nanotechnology, FR; 2TU Munich, DE Abstract Closing Remarks from DATE Chairs |
18:15 CET | CC.2 | SAVE THE DATE 2024 Speaker and Author: Andy Pimentel, University of Amsterdam, NL Abstract SAVE the DATE 2024 |