Our research centers on the systematic design (CAD) of hardware/software systems, ranging from embedded systems to HPC platforms. One principal research direction is domain-specific computing, which tackles the complex programming and design challenges of parallel heterogeneous computer architectures. Domain-specific computing strictly separates the concerns of algorithm development and target architecture implementation, including parallelization and low-level implementation details. The key idea is to exploit the knowledge inherent in a particular problem area or field of application, i.e., a particular domain, in a targeted manner and thus to master the complexity of heterogeneous systems. Such domain knowledge can be captured by suitable abstractions, augmentations, and notations, e.g., libraries, domain-specific programming languages (DSLs), or combinations of both (e.g., embedded DSLs implemented via template metaprogramming). On this basis, patterns can be used to transform and optimize the input description in a goal-oriented way during compilation and, finally, to generate code for a specific target architecture. Thus, DSLs provide high productivity and typically also high performance. We develop DSLs and target platform languages to capture both domain and architecture knowledge, which is used during the different phases of compilation, parallelization, mapping, and code generation for a wide variety of architectures, e.g., multi-core processors, GPUs, MPSoCs, and FPGAs. All these steps usually involve exploring and optimizing the vast space of design options and trading off multiple objectives, such as performance, cost, energy, or reliability.
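To make the separation of algorithm description and target-specific code generation more concrete, the following is a minimal sketch of an embedded DSL written in Python. It is not the API of Hipacc or of any other tool mentioned on this page; the class names (Expr, Pixel, Const, BinOp) and the two generator functions are hypothetical and chosen only for illustration. A 1-D smoothing kernel is described once as an expression tree, and the same description is then rendered as a sequential loop or as a CUDA-style kernel body.

```python
class Expr:
    """Base class: operator overloading builds an expression tree."""
    def __add__(self, other):
        return BinOp("+", self, _wrap(other))
    def __radd__(self, other):
        return BinOp("+", _wrap(other), self)
    def __mul__(self, other):
        return BinOp("*", self, _wrap(other))
    def __rmul__(self, other):
        return BinOp("*", _wrap(other), self)

class Const(Expr):
    def __init__(self, value):
        self.value = value

class Pixel(Expr):
    """Input sample at a relative offset dx (hypothetical 1-D accessor)."""
    def __init__(self, dx):
        self.dx = dx

class BinOp(Expr):
    def __init__(self, op, lhs, rhs):
        self.op, self.lhs, self.rhs = op, lhs, rhs

def _wrap(value):
    return value if isinstance(value, Expr) else Const(value)

def emit(e, index):
    """Render the expression tree as C-like source text."""
    if isinstance(e, Const):
        return f"{float(e.value)!r}f"
    if isinstance(e, Pixel):
        sign, off = ("+", e.dx) if e.dx >= 0 else ("-", -e.dx)
        return f"in[{index} {sign} {off}]"
    return f"({emit(e.lhs, index)} {e.op} {emit(e.rhs, index)})"

def gen_cpu(e):
    """Target 1: sequential loop (border samples are skipped)."""
    return f"for (int i = 1; i < n - 1; ++i) out[i] = {emit(e, 'i')};"

def gen_gpu(e):
    """Target 2: CUDA-style kernel body, one thread per output sample."""
    return ("int i = blockIdx.x * blockDim.x + threadIdx.x;\n"
            f"if (i >= 1 && i < n - 1) out[i] = {emit(e, 'i')};")

def evaluate(e, data, i):
    """Reference interpreter, useful for checking generated code."""
    if isinstance(e, Const):
        return e.value
    if isinstance(e, Pixel):
        return data[i + e.dx]
    lhs, rhs = evaluate(e.lhs, data, i), evaluate(e.rhs, data, i)
    return lhs + rhs if e.op == "+" else lhs * rhs

# The algorithm is written once, without any target-specific detail.
blur = 0.25 * Pixel(-1) + 0.5 * Pixel(0) + 0.25 * Pixel(1)
print(gen_cpu(blur))
print(gen_gpu(blur))
```

A real DSL compiler would additionally apply domain-specific transformations (e.g., border handling, tiling, or memory-layout selection) between the tree and the emitted code; the sketch only shows the basic split between description and generation.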
Research projects
Diffusion-weighted imaging and quantitative susceptibility mapping of the breast, liver, prostate, and brain
Development of new MRI pulse sequences
Development of new MRI post-processing schemes
Joint evaluation of new MR methods with radiology
Domain-specific Computing for Medical imaging
Hipacc – the Heterogeneous Image Processing Acceleration Framework
AI Laboratory for System-level Design of ML-based Signal Processing Applications
Architecture Modeling and Exploration of Algorithms for Medical Image Processing
Neural Approximate Accelerator Architecture Optimization for DNN Inference on Lightweight FPGAs
(Third Party Funds Single)
Term: 1 May 2024 - 30 April 2027
Funding source: DFG-Einzelförderung / Sachbeihilfe (EIN-SBH)
Embedded Machine Learning (ML) is a fast-growing field that comprises ML algorithms, hardware, and software capable of performing on-device sensor data analyses at extremely low power, thus enabling a range of always-on and battery-powered applications and services. Running ML-based applications on embedded edge devices attracts considerable research and business interest for many reasons, including accessibility, privacy, latency, cost, and security. Embedded ML is primarily represented by artificial intelligence (AI) at the edge (EdgeAI) and on tiny, ultra-resource-constrained devices, a.k.a. TinyML. TinyML demands energy efficiency and low latency while keeping accuracy at acceptable levels, which mandates optimization of the software and hardware stack.
GPUs are the default platform for DNN training workloads due to the high parallelism offered by their massive number of processing cores. However, GPUs are often not an optimal solution for DNN inference acceleration because of their high energy cost and lack of reconfigurability, especially for high-sparsity models or customized architectures. Field Programmable Gate Arrays (FPGAs), on the other hand, offer potentially lower latency and higher efficiency than GPUs, together with high customizability, faster time-to-market, and a potentially longer useful life than ASIC solutions.
In the context of TinyML, NA³Os focuses on a neural approximate accelerator-architecture co-search specifically targeting lightweight FPGA devices. This project investigates design techniques to optimally and automatically map DNNs to resource-constrained FPGAs while exploiting principles of approximate computing. Our particular topics of investigation include:
Techniques for fast and automated design space exploration of mappings of DNNs defined by a set of approximate operators and a set of FPGA platform constraints.
Investigation of a hardware-aware neural architecture co-search methodology targeting FPGA-based DNN accelerators.
Evaluation of robustness vs. energy efficiency tradeoffs.
Finally, all developed methods shall be evaluated experimentally by providing a proper synthesis path and comparing the quality of generated solutions with state-of-the-art solutions.
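As a rough illustration of what a design space exploration over approximate operators and platform constraints involves, the sketch below enumerates all assignments of one operator variant per DNN layer, discards those exceeding a resource budget, and keeps the Pareto-optimal trade-offs between energy and error. It is not the NA³Os method: the operator library, the per-layer workloads, the cost numbers, and the LUT budget are all invented for illustration, and a practical flow would replace exhaustive enumeration with a heuristic search.

```python
from itertools import product

# Hypothetical operator library (all numbers invented for illustration):
# name -> (LUTs per instance, energy units per MAC, error units per MAC)
OPERATORS = {
    "exact":    (120, 100, 0),
    "approx_a": (80, 70, 10),
    "approx_b": (50, 45, 35),
}
LAYER_MACS = (4, 2, 1)   # assumed relative workload of three DNN layers
LUT_BUDGET = 300         # assumed FPGA resource constraint

def evaluate(mapping):
    """Return (LUTs, energy, error) for one operator choice per layer."""
    luts = sum(OPERATORS[op][0] for op in mapping)
    energy = sum(m * OPERATORS[op][1] for m, op in zip(LAYER_MACS, mapping))
    error = sum(m * OPERATORS[op][2] for m, op in zip(LAYER_MACS, mapping))
    return luts, energy, error

def dominates(a, b):
    """a dominates b: no worse in every objective, better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def explore():
    """Exhaustively enumerate mappings; keep feasible, non-dominated ones."""
    feasible = []
    for mapping in product(OPERATORS, repeat=len(LAYER_MACS)):
        luts, energy, error = evaluate(mapping)
        if luts <= LUT_BUDGET:            # platform constraint
            feasible.append((mapping, (energy, error)))
    return [(m, obj) for m, obj in feasible
            if not any(dominates(other, obj) for _, other in feasible)]

for mapping, (energy, error) in sorted(explore(), key=lambda p: p[1]):
    print(mapping, "energy:", energy, "error:", error)
```

With these assumed numbers, the all-exact mapping is infeasible under the LUT budget, so every point on the resulting front uses at least one approximate operator.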
Chip design is the essential step when developing microelectronics for specific products and applications. Competence in chip design can strengthen Germany's innovation and competitiveness and increase its technological sovereignty in Europe. In order to leverage this potential, the German and European chip design ecosystem is to be expanded. To this end, the BMBF has launched the Microelectronics Design Initiative with four key areas of focus: a strong network as a central exchange platform, training and further education for talented individuals and specialists, research projects to strengthen design capabilities, and expanding research structures.
Project Goals
The aim of the project is to develop modern AI chips that are designed with a particular focus on security, trustworthiness, and energy efficiency in various application scenarios. Another goal is to implement a seamless transition from software-based AI algorithm development to efficient hardware implementation. The focus here is on the close linking of AI and hardware in the design process as well as the development of various AI accelerators and corresponding architectures. The end result should be an automated design methodology that extends from the AI software to the AI hardware.
The focus of our chair within DI-EDAI is, in particular, on the development of a co-exploration approach that optimizes both neural network models and associated AI-specific microprocessor extensions, taking into account non-functional requirements (e.g., cost, speed, accuracy, energy, security). The results, in the form of hardware blocks and EDA software, shall be published as open source and contribute to creating an ecosystem for designing sustainable and transparent AI systems.
Automatic Cross-Layer Synthesis of High Performance, (Ultra-)Low Power Hardware Implementations from Data Flow Specifications by Integration of Emerging FeFET Technology
High-throughput data and signal processing applications are preferably specified as dataflow networks, as these naturally allow the exploitation of parallelism both globally (at the level of a network of communicating actors) and locally (at the actor level), e.g., by implementing each actor as a hardware circuit. Today, a few system-level design approaches exist to aid an algorithm designer in compiling a dataflow network to a set of processors or, alternatively, in synthesizing the network directly in hardware to achieve high processing speeds. But embedded systems, particularly in the context of IoT applications, have additional requirements: safe operation, even in an environment of intermittent power shortages, and, in general, (ultra-)low power consumption. Taken together, these requirements seem contradictory.
Our proposed project, named HiLoDa (High performance, (ultra-)Low power Dataflow) Nets, attacks this apparent conflict in requirements by a) introducing, exploiting, and integrating, for the first time, emerging FeFET technology for the design of actor networks, i.e., by investigating and designing persistable FIFO-based memory units. b) In particular, circuit devices capable of operating in a mixed volatile/non-volatile mode shall be modeled, characterized, and designed. c) By combining the system-level concept of dataflow, which is based on self-scheduled activations of computations, with emerging CMOS-compatible FeFET technology, inactive actors or even subnets shall gain the capability of self-powering (power-down and wake-up). In addition, for a continuously safe mode of operation, a power-down must also be triggered upon any intermittent shortage of power supply. Analogously, actors shall perform an auto-wakeup after recovery from a power shortage, but also subject to fireability.
HiLoDa Nets will be able to combine high clock-speed data processing of each synthesized actor circuit in power-on mode and automatic state retention using FeFET technology in power-off mode, self-triggered during time intervals of either data unavailability or power shortage. d) A fully automatic cross-layer synthesis from system-level dataflow specification to optimized circuit implementation involving FeFET devices shall be developed. This includes e) the DSE (design space exploration) of actor clusterings at the system level to explore individual power domains for the optimization of throughput, circuit cost, energy savings, and endurance. Finally, f) HiLoDa Nets shall be compared to conventional CMOS technology implementations with respect to energy consumption for applications such as spiking neural networks. Likewise, shutdown (backup) and recovery latencies from power shortages shall be evaluated and optimized.
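The self-powering behavior described above can be mimicked in software. The following is a purely behavioral sketch, not the HiLoDa implementation: the Actor class, its retained field (standing in for a non-volatile FeFET backup), and the power trace are assumptions made for illustration, and the FIFOs are simply assumed to be persistent. An actor fires only when it is fireable and power is available; otherwise it backs up its state and powers down, and it wakes up again once both conditions hold.

```python
from collections import deque

class Actor:
    """Dataflow actor with self-triggered power-down and wake-up."""
    def __init__(self, name, func, inputs, output):
        self.name, self.func = name, func
        self.inputs, self.output = inputs, output   # FIFOs, assumed persistent
        self.powered = True
        self.state = 0          # volatile state (here: number of firings)
        self.retained = None    # stand-in for a non-volatile backup

    def fireable(self):
        return all(len(fifo) > 0 for fifo in self.inputs)

    def power_down(self):
        if self.powered:
            self.retained = self.state   # back up state before power is cut
            self.state = None            # volatile content is lost
            self.powered = False

    def wake_up(self):
        if not self.powered:
            self.state = self.retained   # restore from the backup
            self.powered = True

    def step(self, power_ok):
        """One scheduling step; returns True if the actor fired."""
        if not power_ok or not self.fireable():
            self.power_down()            # no data or no power: go to sleep
            return False
        self.wake_up()                   # power is back and data is waiting
        tokens = [fifo.popleft() for fifo in self.inputs]
        self.output.append(self.func(*tokens))
        self.state += 1
        return True

# Two-actor pipeline: double every token, then add one.
q_in, q_mid, q_out = deque([1, 2, 3]), deque(), deque()
double = Actor("double", lambda x: 2 * x, [q_in], q_mid)
inc = Actor("inc", lambda x: x + 1, [q_mid], q_out)

# Assumed power trace with one intermittent shortage in the third step.
for power_ok in [True, True, False, True, True]:
    for actor in (double, inc):
        actor.step(power_ok)

print(list(q_out), double.powered, double.retained)
```

In this run, both actors sleep through the power shortage, resume with their firing counts intact, and finally power down on their own once the input FIFO is empty.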
Artificial Intelligence (AI) methods have quickly progressed from research to productive applications in recent years. Typical AI models (e.g., deep neural networks) yield high memory demands and computational efforts for training and when making predictions during operation. This is opposed to the typically limited resources of embedded controllers used in automotive or industrial applications. To comply with these limitations, AI models must be streamlined on different levels to be applicable to a given specific embedded target hardware, e.g., by architecture and feature selection, pruning, and other compression techniques. Currently, model adaptation to fit the target hardware is achieved by iterative, manual changes in a “trial-and-error” manner: the model is designed, trained, and compiled to the target hardware while applying different optimization techniques. The model is then checked for compliance with the hardware constraints, and the cycle is repeated if necessary. This approach is time-consuming and error-prone.
Therefore, this project, funded by the Schaeffler Hub for Advanced Research at Friedrich-Alexander-Universität Erlangen-Nürnberg (SHARE at FAU), seeks to establish guidelines for hardware selection and a systematic toolchain for optimizing and embedding AI in order to reduce the current efforts of porting machine learning models to automotive and industrial devices.
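The manual design, check, and repeat cycle described above can in principle be automated. The sketch below is not the project's toolchain; it only illustrates the idea with one compression technique (magnitude pruning) and one constraint (a memory budget). The cost model, which assumes one byte per nonzero weight and ignores sparse-format overhead, and all function names are assumptions. A realistic flow would also retrain the model and check its accuracy after each step.

```python
def prune_smallest(weights, k):
    """Magnitude pruning: zero out the k weights with smallest magnitude."""
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned

def footprint_bytes(weights, bytes_per_weight=1):
    """Assumed cost model: only nonzero weights occupy memory."""
    return sum(1 for w in weights if w != 0.0) * bytes_per_weight

def fit_to_budget(weights, budget_bytes):
    """Prune step by step until the model meets the hardware constraint."""
    for k in range(len(weights) + 1):
        candidate = prune_smallest(weights, k)
        if footprint_bytes(candidate) <= budget_bytes:
            return candidate, k
    raise ValueError("budget cannot be met")  # only for negative budgets

weights = [0.9, -0.05, 0.3, 0.01, -0.7, 0.2]
model, removed = fit_to_budget(weights, budget_bytes=3)
print(model, removed)
```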
This project is funded by the German Research Foundation (DFG) within the Priority Program SPP 2377 "Scalable Data Management for Future Hardware".
HYPNOS explores how emerging non-volatile memory (NVM) technologies could beneficially replace not only main memory in modern embedded processor architectures, but potentially also one or multiple levels of the cache hierarchy or even the registers. It further investigates how to optimize such a hybrid (mixed volatile/non-volatile) memory hierarchy to offer good speed and energy tradeoffs for a multitude of application programs while providing persistence of data structures and processing state in a simple and efficient way.
On the one hand, completely non-volatile (memory) processors (NVPs) that have emerged for IoT devices are known to suffer from the slow writes of current NVM technologies as well as from an endurance that is orders of magnitude lower than that of, e.g., SRAM, thus prohibiting operation at GHz speeds. On the other hand, existing computer solutions with NVM main memory require the programmer to explicitly persist data structures through the cache hierarchy.
HYPNOS (named after the Greek god of sleep) systematically attacks this intertwined performance/endurance/programmability gap by taking a hardware/software co-design approach.
Our investigations include techniques for
a) design space exploration of hybrid NVM memory processor architectures with respect to speed and energy consumption, including hybrid (mixed volatile) register and cache-level designs,
b) offering instruction-level persistence for (non-transactional) programs in case of, e.g., instantaneous power failures through low-cost and low-latency control unit (hardware) design of checkpointing and recovery functions, and additionally providing
c) application-programmer (software) persistence control on a multi-core HYPNOS system for user-defined checkpointing and recovery from these and other errors or access conflicts backed by size-limited hardware transactional memory (HTM).
d) The explored processor architecture designs and different types of NVM technologies will be systematically evaluated for achievable speed and energy gains, and for testing co-designed backup and recovery mechanisms, e.g., wakeup latencies, etc., using a gem5-based multi-core simulation platform and using ARM processors with HTM instruction extensions.
As benchmarks, i) simple data structures, ii) sensor (peripheral device) I/O and finally iii) transactional database applications shall be investigated and evaluated.
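The idea of checkpointing to non-volatile memory and recovering after a power failure can be sketched in a few lines. The following software model is not the HYPNOS hardware mechanism; it uses a common two-slot (shadow) scheme in which the last complete checkpoint is never overwritten and a single index update acts as the commit point. The class NonVolatileStore, the PowerFailure exception, and the fail_midway switch used to inject a failure are all hypothetical.

```python
import copy

class PowerFailure(Exception):
    """Simulated loss of power."""

class NonVolatileStore:
    """Two checkpoint slots plus the index of the last complete one."""
    def __init__(self):
        self.slots = [None, None]
        self.valid = None            # index of the last committed slot

    def checkpoint(self, state, fail_midway=False):
        # Always write to the slot that does NOT hold the valid checkpoint.
        target = 1 if self.valid == 0 else 0
        self.slots[target] = None    # slot content is undefined while writing
        if fail_midway:
            raise PowerFailure("power lost while writing checkpoint")
        self.slots[target] = copy.deepcopy(state)
        self.valid = target          # commit: one small, final update

    def recover(self):
        """Return the last committed state, or None if there is none."""
        if self.valid is None:
            return None
        return copy.deepcopy(self.slots[self.valid])

nvm = NonVolatileStore()
state = {"counter": 5}
nvm.checkpoint(state)                # first checkpoint succeeds

state["counter"] = 9
try:
    nvm.checkpoint(state, fail_midway=True)   # power fails during backup
except PowerFailure:
    state = None                     # volatile state is lost

state = nvm.recover()                # wake-up: restore last committed state
print(state)
```

Because the interrupted write only touched the inactive slot, recovery returns the earlier checkpoint rather than a partially written one.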
New information technologies always also open up new ways of committing crimes, which are often referred to as "cybercrime". Given the dependence of highly developed societies on (critical) IT infrastructures, this kind of crime today threatens the stability of our economic and social system. However, the new information technologies also open up new possibilities for law enforcement, such as automated data collection and analysis on the Internet or surveillance programs covertly introduced into IT systems (Trojans).
The effectiveness of these new methods of so-called "forensic computing" regularly raises the question of their impact on the fundamental rights of those affected. The restriction of legal jurisdiction to nation states creates additional problems. In this project, established researchers from computer science and law have joined forces to systematically explore the still rather ill-defined research field of cybercrime as well as the criminal liability for and prosecution of cybercrime, to uncover fundamental relationships, and to make the area as a whole more tractable.
The research in the Research Training Group therefore has the potential to shape, for many years to come, the technical and methodological standards for handling digital traces, their usefulness for law enforcement, and the national and international interpretation and shaping of law. At the same time, we counteract the shortage of scientifically and methodologically trained specialists in industry, public administration, and law enforcement agencies.
Cybercrime and Forensic Computing -- Hardware Security
(Third Party Funds Group – Sub project)
Overall project: Research Training Group 2475: Cybercrime and Forensic Computing
Term: 1 October 2019 - 1 October 2028
Funding source: DFG / Graduiertenkolleg (GRK)
URL: https://www.cybercrime.fau.de
This project is funded by the German Research Foundation (DFG) within the Research Training Group 2475 "Cybercrime and Forensic Computing". Cybercrime is becoming an ever greater threat in view of the growing societal importance of information technology. At the same time, new opportunities are emerging for law enforcement, such as automated data collection and analysis on the Internet or via surveillance programs. But how should the fundamental rights of those affected be handled when "forensic computing" is used? The RTG "Cybercrime and Forensic Computing" brings together experts in computer science and law to investigate the research field of "prosecution of cybercrime" in a systematic way.
At the Chair of Computer Science 12, aspects of hardware security are investigated. The focus is on researching techniques to extract information and traces from technical devices via side channels. In addition to processing input data into output data, the physical implementation of a system emits further, so-called side-channel information into its environment. Known side channels include, for example, the data-dependent timing behavior of an algorithm implementation as well as power consumption, electromagnetic radiation, and temperature development.
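The data-dependent timing behavior mentioned above can be illustrated with a small, self-contained sketch. It is a textbook example and not a technique taken from the project: an early-exit comparison leaks how many leading bytes of a guess are correct, which allows a secret to be recovered byte by byte. To keep the example deterministic, the number of loop iterations is used as a stand-in for measured execution time; this proxy, the secret value, and the function names are assumptions. Python's hmac.compare_digest is shown as a comparison designed to avoid this kind of leak.

```python
import hmac

SECRET = b"k3y!"   # hypothetical secret stored in a device

def leaky_equal(secret, guess):
    """Early-exit comparison; returns (result, number of byte comparisons).
    The step count stands in for the observable execution time."""
    steps = 0
    if len(secret) != len(guess):
        return False, steps
    for s, g in zip(secret, guess):
        steps += 1
        if s != g:
            return False, steps      # stops at the first mismatch
    return True, steps

def recover_secret(oracle, length):
    """Recover the secret one byte at a time from the 'timing' alone."""
    known = b""
    for pos in range(length):
        for candidate in range(256):
            guess = known + bytes([candidate]) + bytes(length - pos - 1)
            ok, steps = oracle(guess)
            # A correct byte lets the comparison run at least one step longer.
            if ok or steps > pos + 1:
                known += bytes([candidate])
                break
    return known

recovered = recover_secret(lambda g: leaky_equal(SECRET, g), len(SECRET))
print(recovered)

# Countermeasure: a comparison whose run time does not depend on where
# the first mismatch occurs.
print(hmac.compare_digest(SECRET, recovered))
```

The attack needs at most 256 guesses per byte instead of trying all combinations, which is why such leaks matter both for attackers and for forensic analysis of devices.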
Spieck, J., Walter, D., Waschkeit, J., & Teich, J. (2025). Co-Design of Systems-on-Chip for Sustainability. In Proceedings of the NG-RES 2025: Sixth Workshop on Next Generation Real-Time Embedded Systems. Barcelona, Spain.
Esper, K., & Teich, J. (2024). History-Based Run-Time Requirement Enforcement of Non-Functional Properties on MPSoCs. In Patrick Meumeu Yomsi, Stefan Wildermann (Eds.), Fifth Workshop on Next Generation Real-Time Embedded Systems (NG-RES 2024) (pp. 4:1-4:11). Munich, DE: Saarbrücken/Wadern: Schloss Dagstuhl – Leibniz-Zentrum für Informatik.
Groth, S., Schmid, M., Teich, J., & Hannig, F. (2024). Estimating the Execution Time of CNN Inference on GPUs. In Proceedings of the 27th Workshop on Methods and Description Languages for Modelling and Verification of Circuits and Systems (MBMV) (pp. 53-62). Kaiserslautern, DE.
Heidorn, C., Hannig, F., Riedelbauch, D., Strohmeyer, C., & Teich, J. (2024). OpTC – A Toolchain for Deployment of Neural Networks on AURIX TC3xx Microcontrollers. In André Casal Kulzer, Hans-Christian Reuss, Andreas Wagner (Eds.), Proceedings of the 2024 Stuttgart International Symposium on Automotive and Engine Technology (pp. 65–81). Stuttgart, DE: Wiesbaden: Springer Vieweg.
Karim, A., Falk, J., Schmidt, D., & Teich, J. (2024). Self-Powering Dataflow Networks – Concepts and Implementation. In Proceedings of the 22nd ACM-IEEE International Symposium on Formal Methods and Models for System Design (MEMOCODE) (pp. 69-74). Raleigh, NC, US.
Plagwitz, P., Hannig, F., Teich, J., & Keszöcze, O. (2024). Compiler-based Processor Network Generation for Neural Networks on FPGAs. In Proceedings of the 27th Workshop on Methods and Description Languages for Modelling and Verification of Circuits and Systems (MBMV) (pp. 41-52). Kaiserslautern, DE.
Plagwitz, P., Hannig, F., Teich, J., & Keszöcze, O. (2024). DSL-based SNN Accelerator Design using Chisel. In 2024 27th Euromicro Conference on Digital System Design (DSD). Paris, FR.
Plagwitz, P., Hannig, F., Teich, J., & Keszöcze, O. (2024). SNN vs. CNN Implementations on FPGAs: An Empirical Evaluation. In Proceedings of the 20th International Symposium on Applied Reconfigurable Computing. Architectures, Tools, and Applications (ARC). Aveiro, PT: Springer.
Walter, D., Brand, M., Heidorn, C., Witterauf, M., Hannig, F., & Teich, J. (2024). ALPACA: An Accelerator Chip for Nested Loop Programs. In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS). Singapore, SG.
Sabih, M., Yayla, M., Hannig, F., Teich, J., & Chen, J.-J. (2023). Robust and Tiny Binary Neural Networks using Gradient-based Explainability Methods. In Eiko Yoneki, Luigi Nardi (Eds.), EuroMLSys '23: Proceedings of the 3rd Workshop on Machine Learning and Systems (pp. 87–93). Rome, IT: New York, NY, US: Association for Computing Machinery (ACM).
Sixdenier, P.-L., Wildermann, S., Ottens, M., & Teich, J. (2023). Seque: Lean and Energy-aware Data Management for IoT Gateways. In Proceedings of the IEEE International Conference on Edge Computing and Communications (EDGE). Chicago, IL, US: IEEE.
Hahn, T., Becher, A., Wildermann, S., & Teich, J. (2022). Raw Filtering of JSON data on FPGAs. In Proceedings of the 2022 Conference & Exhibition on Design, Automation & Test in Europe. Antwerpen, BE.
Hahn, T., Wildermann, S., & Teich, J. (2022). Auto-Tuning of Raw Filters for FPGAs. In IEEE Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications. Belfast, United Kingdom.
Heidorn, C., Meyerhöfer, N., Schinabeck, C., Hannig, F., & Teich, J. (2022). Hardware-Aware Evolutionary Filter Pruning. In Proceedings of the International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS XXII) (pp. 283-299). Pythagoreio, Samos, GR: Cham, Switzerland: Springer Nature.
Mishra, A., Hannig, F., Teich, J., & Sabih, M. (2022). MOSP: Multi-Objective Sensitivity Pruning of Deep Neural Networks. In 2022 IEEE 13th International Green and Sustainable Computing Conference (IGSC) (pp. 1-8). Pittsburgh, PA, US (virtual): Institute of Electrical and Electronics Engineers (IEEE).
Snelting, G., Teich, J., Fried, A., Hannig, F., & Witterauf, M. (2022). Compilation and Code Generation for Invasive Programs. In Jürgen Teich, Jörg Henkel, Andreas Herkersdorf (Eds.), Invasive Computing. (pp. 309-333). FAU University Press.
Teich, J., Brand, M., Hannig, F., Heidorn, C., Walter, D., & Witterauf, M. (2022). Invasive Tightly-Coupled Processor Arrays. In Jürgen Teich, Jörg Henkel, Andreas Herkersdorf (Eds.), Invasive Computing. (pp. 177-202). FAU University Press.
Teich, J., Esper, K., Falk, J., Pourmohseni, B., Schwarzer, T., & Wildermann, S. (2022). Basics of Invasive Computing. In Jürgen Teich, Jörg Henkel, Andreas Herkersdorf (Eds.), Invasive Computing. (pp. 69-95). FAU University Press.
Teich, J., Henkel, J., & Herkersdorf, A. (2022). Introduction to Invasive Computing. In Jürgen Teich, Jörg Henkel, Andreas Herkersdorf (Eds.), Invasive Computing. (pp. 1-66). FAU University Press.
Trautmann, J., Patsiatzis, N., Becher, A., Teich, J., & Wildermann, S. (2022). Real-Time Waveform Matching with a Digitizer at 10 GS/s. In IEEE Proceedings of the 32nd International Conference on Field Programmable Logic and Applications. Belfast, United Kingdom.
Alhaddad, S., Förstner, J., Groth, S., Grünewald, D., Grynko, Y., Hannig, F.,... Wende, F. (2021). HighPerMeshes -- A Domain-Specific Language for Numerical Algorithms on Unstructured Grids. In Proceedings of the 18th International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms (HeteroPar) in Euro-Par 2020: Parallel Processing Workshops. Warsaw, PL: Springer.
Keszöcze, O., Brand, M., Witterauf, M., Heidorn, C., & Teich, J. (2021). Aarith: An Arbitrary Precision Number Library. In Proceedings of the ACM/SIGAPP Symposium On Applied Computing. virtual conference, KR.
Sabih, M., Hannig, F., & Teich, J. (2021). Fault-Tolerant Low-Precision DNNs using Explainable AI. In 2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W). Virtual Workshop: IEEE Xplore.
Streit, F.-J., Krüger, P., Becher, A., Schlumberger, J., Wildermann, S., & Teich, J. (2021). CHOICE – A Tunable PUF-Design for FPGAs. In IEEE Proceedings of the 31st International Conference on Field Programmable Logic and Applications. Dresden, Germany.
Streit, F.-J., Krüger, P., Becher, A., Wildermann, S., & Teich, J. (2021, December). Design and Evaluation of a Tunable PUF Architecture for FPGAs. Paper presentation at International Conference on Field-Programmable Technology (FPT), Auckland, NZ.
Streit, F.-J., Wildermann, S., Pschyklenk, M., & Teich, J. (2021). Providing Tamper-Secure SoC Updates through Reconfigurable Hardware. In Springer Proceedings of the 17th International Symposium on Applied Reconfigurable Computing. Rennes, FR: Springer Computer Science Proceedings.
Lengauer, C., Apel, S., Bolten, M., Chiba, S., Rüde, U., Teich, J., ... Schmitt, J. (2020). ExaStencils – Advanced Multigrid Solver Generation. In Hans-Joachim Bungartz, Severin Reiz, Benjamin Uekermann, Philipp Neumann, Wolfgang E. Nagel (Eds.), Software for Exascale Computing – SPPEXA 2016-2019 (Lecture Notes in Computational Science and Engineering, pp. 405-452). Cham: Springer.
Qiao, B., Reiche, O., Özkan, M.A., Teich, J., & Hannig, F. (2020). Efficient Parallel Reduction on GPUs with Hipacc. In Proceedings of the 23rd International Workshop on Software and Compilers for Embedded Systems (SCOPES) (pp. 58-61). Sankt Goar, DE.
Streit, F.-J., Fritz, F., Becher, A., Wildermann, S., Werner, S., Schmidt-Korth, M.,... Teich, J. (2020). Secure Boot from Non-Volatile Memory for Programmable SoC-Architectures. In IEEE Proceedings of the 13th International Symposium on Hardware Oriented Security and Trust. San José, USA, US.
Özkan, M.A., Pérard-Gayot, A., Membarth, R., Slusallek, P., Leißa, R., Hack, S.,... Hannig, F. (2020). AnyHLS: High-Level Synthesis with Partial Evaluation. In Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS). Hamburg, DE.
Research projects
Open-Source Design Tools for the Co-Development of AI Algorithms and AI Chips
(Third Party Funds Single)
Funding source: Bundesministerium für Bildung und Forschung (BMBF)
URL: https://www.elektronikforschung.de/projekte/di-edai
Automatic Cross-Layer Synthesis of High Performance, (Ultra-)Low Power Hardware Implementations from Data Flow Specifications by Integration of Emerging FeFET Technology
(Third Party Funds Single)
Funding source: Deutsche Forschungsgemeinschaft (DFG)
URL: https://www.cs12.tf.fau.de/forschung/projekte/hiloda-nets/
High-throughput data and signal processing applications are preferably specified as dataflow networks, as these naturally allow parallelism to be exploited both globally (at the level of a network of communicating actors) and locally at the actor level, e.g., by implementing each actor as a hardware circuit. Today, a few system-level design approaches exist that help an algorithm designer compile a dataflow network to a set of processors or, alternatively, synthesize the network directly in hardware to achieve high processing speeds. But embedded systems, particularly in the context of IoT applications, have additional requirements: safe operation, even in an environment of intermittent power shortages, and, in general, (ultra-)low power consumption. Taken together, these requirements seem contradictory.
Our proposed project, named HiLoDa (High performance, (ultra-)Low power Dataflow) Nets, attacks this conflict in requirements by a) introducing, exploiting, and integrating for the first time emerging FeFET technology into the design of actor networks, i.e., by investigating and designing persistable FIFO-based memory units. b) In particular, circuit devices able to operate in a mixed volatile/non-volatile mode shall be modeled, characterized, and designed. c) By combining the system-level concept of dataflow, which is based on self-scheduled activations of computations, with emerging CMOS-compatible FeFET technology, inactive actors or even subnets shall gain the capability of self-powering (power-down and wake-up). In addition, for a continuously safe mode of operation, a power-down must also be triggered upon any intermittent shortage of power supply. Analogously, actors shall perform an auto-wakeup after recovery from a power shortage, but only if they are fireable.
HiLoDa Nets will be able to combine high clock-speed data processing of each synthesized actor circuit in power-on mode and automatic state retention using FeFET technology in power-off mode, self-triggered during time intervals of either data unavailability or power shortage. d) A fully automatic cross-layer synthesis from system-level dataflow specification to optimized circuit implementation involving FeFET devices shall be developed. This includes e) the DSE (design space exploration) of actor clusterings at the system level to explore individual power domains for the optimization of throughput, circuit cost, energy savings, and endurance. Finally, f) HiLoDa Nets shall be compared to conventional CMOS technology implementations with respect to energy consumption for applications such as spiking neural networks. Likewise, shutdown (backup) and recovery latencies from power shortages shall be evaluated and optimized.
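The interplay of fireability, self-triggered power-down, and state retention described above can be sketched in software. This is only a behavioral illustration, not the project's circuit design: the `Actor` class, its fields, and the accumulate function are hypothetical, and plain Python attributes stand in for FeFET-backed FIFOs and state registers.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Actor:
    """Behavioral model of one dataflow actor (all names are illustrative)."""
    name: str
    inputs: deque        # input FIFO; assumed persistent across power-off
    outputs: deque       # output FIFO; assumed persistent across power-off
    consume: int = 1     # tokens needed per firing
    powered: bool = False
    state: int = 0       # actor state; assumed retained in power-off mode

    def fireable(self) -> bool:
        """An actor may fire once enough tokens are available on its input."""
        return len(self.inputs) >= self.consume

    def step(self, power_ok: bool = True) -> bool:
        """Fire once if possible; otherwise power down and keep all state."""
        if not power_ok or not self.fireable():
            self.powered = False  # self-triggered power-down, nothing is lost
            return False
        self.powered = True       # auto-wakeup: power is back and data is there
        tokens = [self.inputs.popleft() for _ in range(self.consume)]
        self.state += sum(tokens)  # example computation: running sum
        self.outputs.append(self.state)
        return True


acc = Actor("acc", inputs=deque([1, 2, 3]), outputs=deque(), consume=2)
acc.step()                 # fires on tokens 1 and 2
acc.step()                 # only one token left: powers down, state kept
acc.inputs.append(4)
acc.step(power_ok=False)   # power shortage: stays off, tokens retained
acc.step()                 # wakes up and fires on tokens 3 and 4
```

The sketch fires only when both conditions hold (power available and enough input tokens), which mirrors the two power-down triggers named in the text: data unavailability and power shortage.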
Optimization and Toolchain for Embedding AI
(Third Party Funds Single)
Funding source: Industry
Artificial Intelligence (AI) methods have quickly progressed from research to productive applications in recent years. Typical AI models (e.g., deep neural networks) have high memory demands and computational costs, both for training and for making predictions during operation. This conflicts with the typically limited resources of embedded controllers used in automotive or industrial applications. To comply with these limitations, AI models must be streamlined on different levels to be applicable to a specific embedded target hardware, e.g., by architecture and feature selection, pruning, and other compression techniques. Currently, model adaptation to fit the target hardware is achieved by iterative, manual changes in a “trial-and-error” manner: the model is designed, trained, and compiled to the target hardware while applying different optimization techniques. The model is then checked for compliance with the hardware constraints, and the cycle is repeated if necessary. This approach is time-consuming and error-prone.
Therefore, this project, funded by the Schaeffler Hub for Advanced Research at Friedrich-Alexander-Universität Erlangen-Nürnberg (SHARE at FAU), seeks to establish guidelines for hardware selection and a systematic toolchain for optimizing and embedding AI in order to reduce the current efforts of porting machine learning models to automotive and industrial devices.
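To illustrate how the manual design/compress/check cycle could be turned into a systematic search, the sketch below enumerates compression settings and keeps those that satisfy a hardware constraint. It is an assumption-laden toy, not the project's toolchain: the size model (parameters kept × bit width), the candidate pruning ratios and bit widths, and the flash budget are all hypothetical, and model size is used as a crude stand-in for accuracy.

```python
from itertools import product
from typing import List, Tuple


def model_size_bytes(num_params: int, keep_ratio: float, bits: int) -> int:
    """Rough size estimate: remaining parameters times bits per parameter."""
    return int(num_params * keep_ratio * bits / 8)


def feasible_configs(
    num_params: int,
    flash_budget_bytes: int,
    keep_ratios: Tuple[float, ...] = (1.0, 0.75, 0.5, 0.25),
    bit_widths: Tuple[int, ...] = (32, 16, 8),
) -> List[Tuple[float, int, int]]:
    """Return all (keep_ratio, bits, size) settings that fit the budget.

    Results are sorted from largest to smallest model, on the assumption
    that less aggressive compression tends to preserve more accuracy.
    """
    configs = []
    for keep, bits in product(keep_ratios, bit_widths):
        size = model_size_bytes(num_params, keep, bits)
        if size <= flash_budget_bytes:  # hardware constraint check
            configs.append((keep, bits, size))
    return sorted(configs, key=lambda c: c[2], reverse=True)


# Hypothetical case: 1M-parameter model, 1 MB of flash on the controller.
options = feasible_configs(num_params=1_000_000, flash_budget_bytes=1_000_000)
best_keep, best_bits, best_size = options[0]
```

In a real flow, each surviving setting would still have to be trained and validated; the point of the sketch is only that the compliance check can be automated rather than repeated by hand.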
HYPNOS – Co-Design of Persistent, Energy-efficient and High-speed Embedded Processor Systems with Hybrid Volatility Memory Organisation
(Third Party Funds Group – Sub project)
Term: 21. September 2022 - 21. September 2025
Funding source: DFG / Schwerpunktprogramm (SPP)
URL: https://spp2377.uos.de/
This project is funded by the German Research Foundation (DFG) within the Priority Program SPP 2377 "Scalable Data Management for Future Hardware".
HYPNOS explores how emerging non-volatile memory (NVM) technologies could beneficially replace not only main memory in modern embedded processor architectures, but potentially also one or multiple levels of the cache hierarchy or even the registers. It further investigates how to optimize such a hybrid-volatile memory hierarchy to offer good speed and energy tradeoffs for a multitude of application programs while providing persistence of data structures and processing state in a simple and efficient way.
On the one hand, completely non-volatile (memory) processors (NVPs) that have emerged for IoT devices are known to suffer from the slow writes of current NVM technologies as well as from an endurance that is orders of magnitude lower than that of, e.g., SRAM, thus prohibiting operation at GHz speeds. On the other hand, existing computer solutions with NVM main memory require the programmer to explicitly persist data structures through the cache hierarchy.
HYPNOS (named after the Greek god of sleep) systematically attacks this intertwined performance/endurance/programmability gap by taking a hardware/software co-design approach.
Our investigations include techniques for
a) design space exploration of hybrid NVM memory processor architectures with respect to speed and energy consumption, including hybrid (mixed volatile) register and cache-level designs,
b) offering instruction-level persistence for (non-transactional) programs in case of, e.g., instantaneous power failures through low-cost and low-latency control unit (hardware) design of checkpointing and recovery functions, and additionally providing
c) application-programmer (software) persistence control on a multi-core HYPNOS system for user-defined checkpointing and recovery from these and other errors or access conflicts, backed by size-limited hardware transactional memory (HTM).
d) The explored processor architecture designs and different types of NVM technologies will be systematically evaluated for achievable speed and energy gains, and for testing co-designed backup and recovery mechanisms, e.g., wakeup latencies, etc., using a gem5-based multi-core simulation platform and using ARM processors with HTM instruction extensions.
As benchmarks, i) simple data structures, ii) sensor (peripheral device) I/O and finally iii) transactional database applications shall be investigated and evaluated.
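The idea of instruction-level persistence with checkpointing and recovery can be sketched with a toy interpreter. This is a software analogy under stated assumptions, not the HYPNOS hardware design: `TinyCore`, its two-instruction ISA, and the `nv_checkpoint` attribute (standing in for non-volatile storage) are hypothetical.

```python
import copy


class TinyCore:
    """Toy accumulator machine that checkpoints after every instruction."""

    def __init__(self, program):
        self.program = program      # list of (opcode, operand) pairs
        self.pc = 0                 # volatile program counter
        self.regs = {"acc": 0}      # volatile register file
        self.nv_checkpoint = None   # assumed to survive power failures

    def checkpoint(self):
        """Persist the architectural state (pc and registers)."""
        self.nv_checkpoint = (self.pc, copy.deepcopy(self.regs))

    def power_fail(self):
        """Model an instantaneous power failure: volatile state is lost."""
        self.pc = 0
        self.regs = {"acc": 0}

    def recover(self):
        """Restore the last persisted state, if any."""
        if self.nv_checkpoint is not None:
            pc, regs = self.nv_checkpoint
            self.pc, self.regs = pc, copy.deepcopy(regs)

    def done(self):
        return self.pc >= len(self.program)

    def step(self):
        """Execute one instruction, then checkpoint (instruction-level persistence)."""
        op, arg = self.program[self.pc]
        if op == "add":
            self.regs["acc"] += arg
        elif op == "mul":
            self.regs["acc"] *= arg
        self.pc += 1
        self.checkpoint()


core = TinyCore([("add", 2), ("mul", 5), ("add", 1)])
core.step()
core.step()          # acc == 10, pc == 2, both persisted
core.power_fail()    # volatile copies are wiped
core.recover()       # execution resumes after the second instruction
while not core.done():
    core.step()
```

Checkpointing after every instruction is the simplest possible policy; it makes recovery trivial but would be costly on real NVM, which is one reason the tradeoff between backup frequency, latency, and endurance needs to be explored.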
GRK2475: Cybercrime and Forensic Computing
(Third Party Funds Group – Overall project)
Funding source: DFG / Graduiertenkolleg (GRK)
URL: https://www.cybercrime.fau.de/
New information technologies always open up new ways of committing crimes, which are often referred to as "cybercrime". Given the dependence of highly developed societies on (critical) IT infrastructures, this form of crime now threatens the stability of our economic and social system. However, the new information technologies also open up new possibilities for law enforcement, such as automated data collection and analysis on the Internet or surveillance programs (Trojans) covertly planted in IT systems.
The effectiveness of these new methods of so-called "forensic computing" regularly raises the question of their impact on the fundamental rights of those affected. The restriction of legal jurisdiction to nation states creates additional problems. In this project, established researchers from computer science and law have joined forces to systematically explore the still rather ill-defined research field of cybercrime as well as the criminal liability for and prosecution of cybercrime, to uncover fundamental relationships, and to make the field as a whole more tractable.
The research in the Research Training Group therefore has the potential to shape, for many years to come, the technical and methodological standards for handling digital evidence, its benefit for law enforcement, and the national and international interpretation and development of law. At the same time, we counteract the shortage of scientifically and methodologically trained specialists in industry, public administration, and law enforcement agencies.
Cybercrime and Forensic Computing -- Hardware Security
(Third Party Funds Group – Sub project)
Term: 1. October 2019 - 1. October 2028
Funding source: DFG / Graduiertenkolleg (GRK)
URL: https://www.cybercrime.fau.de
Cybercrime is becoming an ever greater threat in view of the growing societal importance of information technology. At the same time, new opportunities are emerging for law enforcement, such as automated data collection and analysis on the Internet or via surveillance programs. But how do you deal with the fundamental rights of those affected when "forensic informatics" is used? The RTG "Cybercrime and Forensic Informatics" brings together experts in computer science and law to investigate the research field of "prosecution of cybercrime" in a systematic way.
At the Chair of Computer Science 12, aspects of hardware security are investigated. The focus is on researching techniques to extract information and traces from technical devices via side channels. In addition to transforming input data into output data, the physical implementation of a system emits further, so-called side-channel information to its environment. Known side channels are, for example, the data-dependent timing behavior of an algorithm implementation, as well as power consumption, electromagnetic radiation, and heat dissipation.
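A data-dependent timing side channel can be illustrated with a simple byte-string comparison. The sketch below is a generic textbook example, not a technique taken from the project; `leaky_equal` is a hypothetical helper that counts comparison steps as a deterministic stand-in for measured execution time, while `hmac.compare_digest` from the Python standard library is designed to avoid this content-dependent early exit.

```python
import hmac
from typing import Tuple


def leaky_equal(secret: bytes, guess: bytes) -> Tuple[bool, int]:
    """Early-exit comparison.

    Returns (equal, steps), where steps is the number of byte comparisons
    performed. The step count grows with the length of the matching prefix,
    so an observer who can measure it learns how much of the guess is right.
    """
    steps = 0
    if len(secret) != len(guess):
        return False, steps
    for s, g in zip(secret, guess):
        steps += 1
        if s != g:
            return False, steps  # leaves early: the leak
    return True, steps


def constant_time_equal(secret: bytes, guess: bytes) -> bool:
    """Comparison whose duration does not depend on where the inputs differ."""
    return hmac.compare_digest(secret, guess)


_, steps_wrong_first = leaky_equal(b"secret", b"xxxxxx")   # stops at byte 1
_, steps_wrong_second = leaky_equal(b"secret", b"sxxxxx")  # stops at byte 2
```

Because the second guess takes one step longer than the first, an observer can recover the secret byte by byte from timing alone, without ever seeing the comparison result for the full value.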