[AI Hardware & Systems Design Track]: Cloud Resiliency in the Age of High-Performance Computing | Kisaco Research

As high-performance computing (HPC) and artificial intelligence (AI) drive unprecedented advances, cloud strategy becomes vital. With cloud infrastructure increasingly integral to demanding computational workloads, maintaining the availability and robustness of these systems is paramount.

This panel will delve into the critical intersection of HPC/AI and cloud technology, spotlighting strategies for ensuring uninterrupted operations in the face of emerging challenges. The session brings together leading experts to examine architectural design paradigms that foster robustness, redundancy trade-offs, load balancing, intelligent fault detection, and predictive monitoring mechanisms. Experts will share best practices for optimizing resource allocation, orchestrating seamless workload migrations, and deploying resilient cloud-native solutions. By exploring real-world cases, emerging trends, and practical lessons, this discussion aims to equip data center and cloud professionals to elevate their resiliency strategies amid evolving computational demands.

Sponsor(s): 
proteanTecs
Speaker(s): 
Moderator

Alam Akbar

Director, Product Marketing
proteanTecs

Alam Akbar is a veteran of the semiconductor industry with experience spanning multiple engineering, product management, and product marketing roles. He holds a Bachelor of Science degree in Electrical Engineering from Texas A&M and an MBA from Santa Clara University.

Alam began his career at Synopsys as an Application Consultant, where he helped grow the company's market share in the signoff domain. He then joined the business management team at Cadence, where he helped launch a new physical verification solution. After Cadence, Alam joined Intel Foundry Services as a design kit program manager, and then moved into the client compute group as director of product marketing. There, he helped scale Intel's storage business and developed product strategy for new memory solutions for the PC market.

At proteanTecs, he's part of a team that’s bringing greater insight into the health and performance of semiconductors across the value chain, from the design stage to in-field operation, and all the steps in between.

Panellists

Venkat Ramesh

Hardware Systems Engineer
Meta

Venkat Ramesh is a Hardware Systems Engineer in Meta's Infrastructure Org.

As a Technical Lead in the Release-to-Production team, Venkat has been at the helm of pivotal initiatives aimed at bringing various AI/ML Accelerator, Compute, and Storage platforms into the Meta fleet. His multifaceted technical background spans roles across software development, performance engineering, NPI, and hardware health telemetry across hyperscalers and hardware providers.

Deeply passionate about AI hardware resiliency, Venkat currently focuses on building tools and methodologies to enhance hardware reliability, performance, and efficiency for rapidly evolving AI workloads and technologies.

Yun Jin

Engineering Director
Meta

Yun Jin is Engineering Director of Infrastructure at Meta, where he leads Meta's strategy for private cloud capacity and efficiency. Before Meta, Yun held engineering leadership roles at PPLive, Alibaba Cloud, and Microsoft. Yun has worked on large-scale distributed systems, cloud, and big data for 20 years.

Paolo Faraboschi

Vice President and HPE Fellow; Director, AI Research Lab, HP Labs
HPE

Paolo Faraboschi leads research in the Systems Research Lab at HP Labs. His technical interests lie at the intersection of hardware and software and include low-power servers and systems-on-a-chip; workload-optimized, highly parallel, and distributed systems; ILP and VLIW processor architectures; compilers; and embedded systems. Faraboschi’s current research focuses on next-generation data-centric systems. His work on system-level integration for low-energy servers and scale-out architectures is a key element of the HP Moonshot System, HP’s new class of software-defined servers built to address the energy efficiency challenges of hyperscale datacenters.

Previously, Faraboschi led HP Labs research in system-level modeling and simulation, an effort that resulted in the COTSon open-source simulation platform. He is also the founder of HP’s Barcelona Research Office, which pioneered research in content-processing systems. Before that, Faraboschi was technical lead for the Custom-Fit Processors Project at HP Labs, Cambridge (MA), building highly optimized, software-defined CPU cores. In that role, he was the principal architect of the instruction set architecture for the Lx/ST200 family of VLIW embedded processor cores (developed with STMicroelectronics), which have been used for over a decade in a variety of audio, video, and imaging consumer products, including HP's printers and scanners.

A regular keynote speaker at conferences and industry events, Faraboschi is an IEEE Fellow for "contributions to embedded processor architecture & system-on-chip technology." An active member of the computer architecture community, he also serves regularly on IEEE program and organizational committees, was guest editor of the 2012 edition of IEEE Micro Top Picks, and is co-author (with Josh Fisher and Cliff Young) of the book “Embedded Computing: a VLIW Approach to Architecture, Compilers and Tools.” A co-holder of 24 granted patents and several other patent applications, and co-author of over 65 scientific publications, Faraboschi received his M.S. and Ph.D. (Dottora)
