Content provided by Latitude Media. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Latitude Media or its podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process described here: https://it.player.fm/legal.
Will inference move to the edge?

47:47
 

Today virtually all AI compute takes place in centralized data centers, driving the demand for massive power infrastructure.

But as workloads shift from training to inference, and AI applications become more latency-sensitive (autonomous vehicles, anyone?), there's another pathway: migrating a portion of inference from centralized computing to the edge. Instead of a gigawatt-scale data center in a remote location, we might see a fleet of smaller data centers clustered around an urban core. Some inference might even shift to our devices.

So how likely is a shift like this, and what would need to happen for it to substantially reshape AI power?

In this episode, Shayle talks to Dr. Ben Lee, a professor of electrical engineering and computer science at the University of Pennsylvania, as well as a visiting researcher at Google. Shayle and Ben cover topics like:

  • The three main categories of compute: hyperscale, edge, and on-device

  • Why training is unlikely to move from hyperscale

  • The low latency demands of new applications like autonomous vehicles

  • How generative AI is training us to tolerate longer latencies

  • Why distributed inference doesn't face the same technical challenges as distributed training

  • Why consumer devices may limit model capability

Resources:

Credits: Hosted by Shayle Kann. Produced and edited by Daniel Woldorff. Original music and engineering by Sean Marquand. Stephen Lacey is our executive editor.

Catalyst is brought to you by EnergyHub. EnergyHub helps utilities build next-generation virtual power plants that unlock reliable flexibility at every level of the grid. See how EnergyHub helps unlock the power of flexibility at scale, and deliver more value through cross-DER dispatch with their leading Edge DERMS platform, by visiting energyhub.com.

Catalyst is brought to you by Bloom Energy. AI data centers can't wait years for grid power, and with Bloom Energy's fuel cells, they don't have to. Bloom Energy delivers affordable, always-on, ultra-reliable onsite power, built for chipmakers, hyperscalers, and data center leaders looking to power their operations at AI speed. Learn more by visiting BloomEnergy.com.

