
Microsoft's Vertical AI Strategy: Agents, Models, and Power

Here's my interview with Jay Parikh, the executive vice president of Core AI at Microsoft. In this interview we talk about AI infrastructure, whether there are any dark GPUs, and whether energy is a limiter. Given his experience in security, we also talk about AI hacking and AI security in general, and why his team is returning to the office full-time. Enjoy.

Matt: Jay, thanks for joining me today.

Jay: Yeah, it's great to be here, man. Thank you.

Matt: You joined Microsoft about a year ago, correct? You report directly to Satya, and you're overseeing the newly combined Core AI team, which brings together the developer division and the core infrastructure teams. First, break down what those two teams did, and what this new Core AI team does under your leadership.

🚀 Key Takeaways:
  • Microsoft is focused on reinventing tools for building software in the AI era.
  • Security and trust for AI must be integrated from the start due to the complex tasks AI agents perform.
  • Microsoft aims to empower builders by creating a vertically integrated organization focused on AI.

1. The Vision Behind Core AI

Jay: Absolutely. We created the Core AI team in January of this year, and in May, at Microsoft Build, we rolled out this product strategy, this vision. The idea was: in the new world of AI, what is it going to take to help builders, developers, and enterprises be successful with this technology? There are a lot of different pieces: all these different models, different frameworks, and now protocols like MCP. How do you put all this together in a new stack? And I say "stack" in quotes because it's evolving; it's not something where we have a blueprint and it's a cookie-cutter thing. It's changing every week. It's a different way of working and a different way of engaging with our customers.

What we're focused on at the top layer is reinventing and re-imagining all the tools you need to build software a different way in this AI era. That sits on top of a platform we call Foundry, our agent factory. That's where these agents, these AI applications, are going to be built and deployed, and where you can observe those agents in the enterprise workspace.

Then, for organizations big or small, security and trust for AI has to be baked in from the start. These agents and AI applications are not as deterministic as the software we've built in the past, where we could go through a compliance checklist or a set of security policies to make sure everything was set up correctly. Here you've got some thinking, planning, and reasoning, calling out to different tools, with access to different data. These things are going to do more complex tasks in the enterprise, so security and trustworthy AI has to be there from the start.

And then for our enterprises, we want to give them a flexible deployment strategy. A lot of this stuff today runs in the cloud, but as we think about different sectors and different geos in the world, some of these agentic applications are going to run at the edge, on edge devices. So the programming model we're building has to span all four of those areas.

Matt: As you thought about bringing these two teams together, what was your vision for having this singular team? What was the reason for needing to bring these two different parts of the organization together? And how do you build a cohesive culture with two separate teams? I know it's all under Microsoft, but how did you think about aligning that culture?

Jay: In some ways it's simple, because the end goal for us is to serve builders, to serve developers. Everything has to accrue to serving that persona. But even the notion of a developer today is changing, and that's why I refer to folks as builders. Everything we do, in terms of our product, the way we work together, the way we measure our progress, the way we engage with the end user and the companies we work with, is about looking through the entire tech stack to make sure this stuff all adds up, in a way where it's going to be easy for developers to get their arms around this technology, to really ramp up creativity and collaboration, and not be stumbling around with a lot of things that don't connect, or are hard, or insecure, or not observable, or just missing parts. That's why this is, in some ways, a vertically integrated organization whose focus is to empower every builder out there to shape the future with AI. These are the components I think are necessary to put together to accomplish that.

🧠 Why it matters: "The end goal for us is to serve builders, serve developers. Everything has to accrue to serving that persona."
💡 Tip: "I think being in person enables us to learn faster, so that we can really stay along that exponential trajectory this technology is on right now."
📊 Stat: "I would say 90% of the conversations end up talking about the cultural transformation. The way we work is changing, and everybody is trying to learn from each other: how is this technology enabling us to work differently? Nobody has the perfect recipe, so we've got to stay curious, experiment all the time, and openly share what's working and what's not."


Jay Parikh discussing Microsoft's AI strategy.

2. Optimizing AI Deployment and Model Selection

Jay: We need to be progressing as fast as the technology is progressing. Microsoft covers a lot of different products and a lot of different end-user personas, whether it's somebody in human resources, on the finance team, on an engineering team, or in a security ops center. We serve all of those personas, and we're trying to build our products in a way that brings AI as a superpower to every one of those departments and end users.

For us in Core AI, for example, we have a focus on a program called NGE Thrive. There are three pillars to it, but a big part of it, Matt, is understanding how we're spending our time and what we can do to free up "run the business" time so it can be reallocated to creative time, where we could be improving our products and delivering more value to our customers and partners. We can use AI to be more time-efficient with run-the-business and administrative work.

Today, if you're a product team, you come up with a prototype, get some user feedback on it, iterate, come up with prototype two, and keep going. Now, with the power of AI, I can specify all of this in a document and tell the agents, "Hey, build me five prototypes," all at the same time. Now I have five prototypes instead of one that I can iterate on, and I can engage different end users to get feedback faster. So in product-making, one, we get to explore more of that creative space, and two, we get a tighter feedback loop from customers and end users, which ultimately means we can help our end customers be more successful with AI.

And a big part of this, when I talk to customers and partners out there: I would say 90% of the conversations end up talking about the cultural transformation. And I mean culture as in the way we work. It's changing, and everybody is trying to learn from each other: how is this technology enabling us to work differently, how is it changing the way we work together, and how do we learn from that? I don't think anybody has the perfect recipe, but we've got to stay curious, we've got to be experimenting all the time, and we've got to be openly sharing what's working, what's not working, what we need to tweak, and what we're going to do more of if we like a new pattern.

Matt: On changing work, I'm wondering what you're seeing internally. I've heard a lot of people talk about converging roles: product management, product development, developers, designers. The delineation between these roles is blurring every day with artificial intelligence. What are you seeing internally?

Jay: Absolutely, we're seeing that too. Different functions are, one, collaborating much more closely, because they can use these tools to do things that were maybe unreachable or unapproachable for them before. You can have a lower-level systems engineer who may never have really understood how to build a UI, like a website, saying, "I have an idea for how this new feature could show up to our end users, let me mock that up." Now they can use any AI tool, prompt it, and generate concepts and visuals to show the rest of the team for feedback. The same thing happens with folks in product management, design, or other functions: they're able to fix bugs just by prompting, assigning the issue to Copilot, and the pull request is there. Maybe the engineer still has to review it and make sure the code fits our standards. That, I think, is super cool: everybody is able to learn more of that continuum of software, being able to create something, build it, deploy it, and operate it. That's accessible to more and more functions in the company.

Matt: Is there a particular role you've seen become really superpowered because of AI, something that maybe surprised you?

Jay: I think it's less about the role and more about the individual, what the individual brings. As I think about this, I think about two groups of people, and everybody can self-identify with this; most people in tech are probably somewhere in between. In group one, every time you use a Copilot or an AI-powered tool, you're absolutely astonished and amazed at how incredible this thing is. You're like, "Oh my gosh, this thing did that?" Then there's group two, who are most of the time frustrated that the AI tool or Copilot isn't delivering on what they expect of it. Usually, in group one, your ambitions and expectations for AI are too low and you're not using it frequently enough. In group two, you're using it a lot and you're pushing your understanding and the capabilities of the AI. What's really interesting there is that those people stay curious: they'll try different models, they'll try different techniques for context engineering, and they'll learn about evals, fine-tuning, and reinforcement learning. So with that curve of exponential advancement happening in AI, they're riding the learning curve, whereas the people in group one are probably using it infrequently and are too amazed by the tools, instead of having some active frustration because they're pushing it with very complex tasks that it doesn't always deliver on.

Matt: It's such an exciting time, because it's so new. There aren't many golden paths to follow, so you're forging new paths and new directions in which to use these tools. It's quite exciting for the curious person, as you said.

Jay: Yeah, and we have teams doing a whole range of things. We have teams that are very advanced in how they're using AI, and they're generating lots and lots of output, new things they're building. They don't look at a single line of code, because they've become masters at context engineering and then at verification and validation. Code, for them, is an intermediary state. And it's not just delegating a simple task to one agent: they're orchestrating and organizing a team of agents. They fire off different types of agents, those agents all work together on a complex task the team specified up front, and then they create the verification systems. Some of them are pretty cool, because the verification agents feed back to the coding agents to fix their own problems.

Matt: I want to switch gears for a moment and talk about powering all of these amazing innovations. When you think about the data centers: first of all, congratulations on Fairwater, which you opened up last week, I believe. What are the biggest constraints today? I'm hearing a lot of folks talk about GPU constraints, but it seems more and more like power constraints are the true constraints, especially in the US. Is that what you're seeing?

Jay: I think it depends on what people's ambitions and plans are. There is a lot of supply-chain scaling up happening, whether it's power, land, or the big heavy equipment, like transformers, that goes into running a hyperscale data center. Those are all things the industry is obviously rallying on; people are scaling up their manufacturing, and we're all teaming up to tackle those challenges. And it depends on the geo: what the constraints are here in the US versus what might be true in Europe. There are some countries in the world that have put moratoriums on building any more data centers, because they rode that boom for a while and now there's a pause, whereas others are open for business and want to invest, build, and scale more rapidly right now. I think this is not atypical of an infrastructure acceleration boom. But it's exciting, because a lot of really interesting engineering challenges are being sorted out in terms of how to do these things cheaper, better, faster.

There's another aspect of this, which is the actual hardware. What these AI systems required last year versus this year versus next year is changing. If you look at, say, Nvidia's hardware roadmap, the generational differences change the entire system, from cooling to power to what the network looks like. And with these more advanced agents we're building and starting to deploy in enterprises, it's interesting: these agents make a lot of tool calls and talk to a lot of other systems, and that drives up the amount of conventional compute, storage, and network, not GPU load. The more capable these agents become in the enterprise, and the higher-ROI workflows, tasks, or programs they can tackle, the more they need to interact with those other enterprise systems, all the things we've been building for decades. Now you're able to get a lot more throughput, and it drives up the utilization of the conventional stuff too. So we have a whole system-scaling problem here. As much money and focus as there is on the GPU and the AI data center, it is a system that is evolving, growing, and advancing at a very rapid rate, beyond just the chip or a particular AI data center.

Matt: That makes a lot of sense. But over the last few weeks in particular, there's been a lot of discussion about an AI bubble and infrastructure investment. Gavin Baker made an analogy to the fiber rollout in the mid-90s: as fiber was getting rolled out for the internet, 95% of it was dark, but he said today there are no dark GPUs. Then Satya, just a couple of weeks ago, said that's true, but we actually can't source enough power, and we have GPUs just sitting idle waiting for that power. So maybe talk a little bit about the current state of GPU utilization within Microsoft.

Jay: For us, the focus is on making sure all of our workloads are efficient, and this doesn't just apply to GPUs; it applies to CPUs as well. Efficiency of any workload across Microsoft is a top priority. We have focused efforts and focused teams, and an incredible amount of science goes into understanding the different components and services: the CPU and the GPU, but also the network, memory bandwidth, all these different components that you're managing and inspecting, and then optimizing, maybe in the way you train a model or the way you do inferencing. The neat thing for Microsoft, I find, is that we have a lot of different workloads. We have our first-party workloads, like M365 or GitHub, and we have all of these third-party workloads from our customers. Being able to learn from these different workloads and how they're changing, and then to optimize our entire stack, the hardware stack, the software stack, and the application stack, to get the most use out of these GPUs, is something we spend a lot of time on and are measuring, adjusting, tweaking, and improving every day.
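The agent-team pattern Jay describes, firing off multiple agents against a spec, verifying their output, and feeding verification failures back to the coding agent, can be sketched roughly as below. This is a hypothetical Python sketch: `call_agent` and `verify` are stand-ins for a real model API and a real test harness, not any particular Microsoft product.

```python
# Sketch of agent orchestration with a verification feedback loop.
# `call_agent` and `verify` are placeholders (assumptions), not a real SDK.
from dataclasses import dataclass

@dataclass
class Result:
    task: str
    output: str
    ok: bool

def call_agent(role: str, prompt: str) -> str:
    # Placeholder for a real model/agent call.
    return f"[{role}] draft for: {prompt}"

def verify(output: str) -> tuple[bool, str]:
    # Placeholder verifier: a real one might run tests or evals.
    return ("draft" in output, "missing draft marker")

def orchestrate(spec: str, tasks: list[str], max_retries: int = 2) -> list[Result]:
    """Fan a spec out to worker agents and verify each result."""
    results = []
    for task in tasks:
        prompt = f"{spec}\nTask: {task}"
        output = call_agent("coder", prompt)
        ok, feedback = verify(output)
        retries = 0
        while not ok and retries < max_retries:
            # Verification feedback loops back into the coding agent,
            # the pattern Jay describes: agents fixing their own problems.
            output = call_agent("coder", f"{prompt}\nFix: {feedback}")
            ok, feedback = verify(output)
            retries += 1
        results.append(Result(task, output, ok))
    return results
```

In practice the workers would run in parallel and the verifier would be its own agent; the key design choice is that verification output is structured feedback, not just a pass/fail flag, so the coding agent has something to act on.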

| Step/Tool | Details |
| --- | --- |
| AI-Powered Prototyping | Use AI to generate multiple prototypes simultaneously from a single document, enabling faster iteration and feedback. |
| Model Router | Use Microsoft's Model Router to automatically select the best model for each application based on cost, performance, and quality preferences. |
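The Model Router behavior described above can be pictured as a scoring policy over a model catalog. This is a hypothetical sketch: the model names, the numeric profiles, and the weighted-sum scoring are illustrative assumptions, not Foundry's actual catalog or routing algorithm.

```python
# Sketch of a model-routing policy: score each model against the
# caller's preference dials and pick the best fit. All numbers and
# names below are illustrative assumptions.
CATALOG = {
    "small-fast":     {"cost": 0.9, "latency": 0.9, "quality": 0.5},
    "mid-balanced":   {"cost": 0.6, "latency": 0.6, "quality": 0.7},
    "frontier-large": {"cost": 0.2, "latency": 0.3, "quality": 0.95},
}

def route(prefs: dict[str, float]) -> str:
    """Pick the model whose profile best matches the preference dials.

    Every axis is higher-is-better ("cost" here means cost-efficiency),
    weighted by the caller's preferences, e.g. {"latency": 1.0}.
    """
    def score(profile: dict[str, float]) -> float:
        return sum(prefs.get(axis, 0.0) * value for axis, value in profile.items())
    return max(CATALOG, key=lambda name: score(CATALOG[name]))
```

A latency-sensitive agent would be routed to the small model and a quality-first one to the frontier model; a real router would learn these profiles from benchmarks rather than hard-coding them.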

AI Model Selection: Pros & Cons

👍 Pros

  • Choice of diverse models allows customization for specific tasks.
  • Enterprises can integrate their data to enhance model performance.

👎 Cons

  • Requires understanding of different models and their characteristics.
  • Integration of enterprise data can raise security and privacy concerns.

3. Security Concerns with AI Deployment

Matt: But specifically: is there enough energy to power all of the GPUs Microsoft has, for now and over the next 12, 18, 24 months?

Jay: There are a lot of constraints there, because part of it is making sure you're building this physical plant, this capacity, in a way that's grounded in the demand curve as well. Keeping those two in roughly similar shape is an important thing we do. There are times when you'll find you have a surplus of something, and times when you have a shortage, and that's just an active thing the teams are always watching. So Satya said, hey, we're short on power; there are lots of people working to find more power to light up more GPUs. And whatever GPUs and infrastructure we have, we want to keep getting the most out of it. That's where we partner with lots of other folks, and also put a lot of effort internally into optimizing that stack.

Matt: And how important do you think model efficiency is going to be in the broader picture of the data-center rollout and the energy requirements? These models are seemingly getting smarter, smaller, and more efficient. How important is that in the total equation as you think about rolling out AI to enterprises?

Jay: On the last topic, it's worth calling out that you can always talk about how much more power you need, but for the power you have turned up, with GPUs plugged in and doing work, you want to get the most out of it, because that is the most cost-effective and the fastest from a lead-time perspective. We manage a massive GPU fleet, and if we can find even a fraction of a percent of improvement, that unlocks a lot of capability and a lot of supply to feed our Copilot family of products or to serve our customers. These two things really do balance each other out, because building data centers takes time, while that vertical efficiency program, making our own infrastructure more and more efficient over time, is something we control.

Now, on the model front, I think the trajectory is very clear, because people do care about cost and they do care about latency. You're absolutely right, Matt: there are workloads in the enterprise that require the big, slower, expensive models, and then there is a whole range of workloads that can use a smaller, more targeted, role-specific or job-specific model. They'll usually start with a smaller open-source model and fine-tune it, or do some distillation of it, to serve that agent or AI use case. Often, new applications land on some big model; then, as teams learn and want to put more workload into it, they optimize the frameworks, the model, and the runtime to get faster performance and lower cost, while keeping within some tolerance on accuracy, security, and safety. Those things are super important in the enterprise, so they don't get lax on those elements.

Matt: So as you're talking to your clients and thinking about deployment for them: for their cutting-edge, sophisticated use cases, they're taking the top frontier models, which tend to be a bit higher latency and higher cost. But as they figure out what works, you're saying they can almost verticalize, taking smaller, more specific models for that task. And in parallel, the smaller models are getting better; they're learning from the larger models. Is that the trend you're seeing?

Jay: I would add to those two elements in two ways. One: what we're bringing to our enterprises is a platform that lets them manage a whole range of models for their different applications, because they have complex deployments and complex jobs to be done. For example, we have a capability we call Model Router. It can be hard for enterprises, as they work to scale up their ambitions and their projects with AI, to know which models to pick for each application. With Model Router, you can just send your application, your agent, through the router, and provide it some input: do you care about low cost, fast performance, or the absolute best quality? You set those dials, and the router picks the underlying models, because it's been trained to understand the characteristics of those models. We want to take that overhead off the enterprise's plate, to help them use multiple models and optimize each workload.

The other category I'd add is that in the enterprise, these models get smarter not just based on what the labs or the open-source community are doing; they get smarter because you bring your enterprise data to them. Being able to bring your customer data, your supply-chain data, or your marketing data next to the open-source or closed-source model, and to fine-tune or reinforcement-learn those models, makes these applications more capable. They can handle more complex workflows or jobs, and ultimately deliver a more noticeable ROI. That enterprise data is an important part: it's not just having something open or closed source; you have to be able to bring that enterprise context close and evolve the model as well.

Matt: Since you mentioned open and closed source, I want to go a little deeper. How do you advise your customers to think about open versus closed source? A few leading Chinese companies are coming out with incredible open-source models. Does it really matter, open versus closed? Is there a security element, a safety element? Or is it just whichever model is best for the use case? How do you think about it at Microsoft, and how do you advise your customers to think about it?

Jay: Absolutely. For us, as part of building a platform, a big part of what we care about is bringing that choice, that ecosystem: there are 11,000 models in Foundry today. If I'm using GitHub Copilot, I have a long list of models to choose from, or I can build custom agents and bring my custom models to that equation as well. We want to support choice, because, one, the space is advancing super fast, and being dogmatic about one or the other isn't what builders and developers around the world want.

The second thing I would say is that when we work with our customers, it starts with understanding where they are and what they're trying to accomplish. That's much more about discovery, and then giving them ideas, advice, or proof points: hey, another customer used this type of model to handle a similar project or objective in their enterprise; here's a company that did this with an open-source model, or started open source and went to closed source, or the reverse. I don't think there's a binary way to look at this. Every customer is, one, at a different point in this journey of AI transformation, and two, their jobs to be done are actually quite different. Where there are examples we can package up and say, "This is something you can use to get started," that's how we'll work with our customers, but a lot of that is with our partners and our field teams being embedded, understanding the priorities and objectives of the customer.

Matt: Let me touch on one last part of model diversity. It sounds like that's not only a strategic decision for Microsoft. I've heard there's potentially an omni-model future: one large model to do the job of any artificial-intelligence workload you need. It sounds like you don't agree with that.

Jay: I don't know that I don't agree with it. I'd say, right now, given what we see in the ecosystem, the technology is advancing at an incredible rate. Is there one model that can do everything in the world? I don't think that happens anytime soon. But I do think this choice, being able to use different models for different jobs, and to bring your enterprise IP, your organization's data and context, all glued together with the model in the scaffolding we're providing, does allow for faster diffusion and more success with AI. Now, if there are models that are just way out there in terms of capability and they become the default, that will be another day, and we'll have to think through what that programming paradigm looks like. But there's so much context and so many different problems to be solved out there, whether in healthcare, biology, supply chain, climate, finance processes, or industrial processes, that it's going to be hard for any one model to be the best at everything. I think there's always going to be a need for some smaller model or some custom model for those specific problems. As AI becomes more and more diffuse in the world, and more adoption happens in organizations and enterprises, I think this is only going to accelerate the creativity and the customization of the underlying components.

Matt: Okay. Let's switch to the dynamic with OpenAI for a moment. Satya mentioned in an interview recently that Microsoft has access to the latest IP from OpenAI; they get the latest models, which is great. How do you and your team think about building on top of the research OpenAI has done, versus taking a completely different path? How do you balance those two?

Jay: Absolutely, both are areas of focus for us. A lot of what we do in our partnership with OpenAI is learning from and taking that IP, that technology, and putting it into our products in a very sophisticated way. That involves not just prompts; it involves other post-training around these models, and maybe combining these models with other models to solve different opportunities or create new paradigms or features in our products. On the other side, the work we're doing with the MAI team is to build our own models, as we've announced some of these in the past couple of months. We're building that infrastructure and that capability, and we're shipping things there. So it's two parallel threads in terms of what we're pushing on, and we're fortunate to be able to learn from both efforts.

Matt: Let's continue on your interaction and dynamic with Mustafa and the MAI team. As they're building and releasing incredible things, what is that communication like, as you determine what to build into production and what to deploy to your customers?

Jay: It's high bandwidth in terms of our collaboration. It's infrastructure collaboration, algorithm-research collaboration, getting better at evals, and working together on safety and trust issues. I would largely say: think of it as one team. The teams are working together and bringing all of their different talent and diversity to the table to solve the priority problems we have, and some of them fit more with one team than the other. For example, the work we're doing to drive model development for software development is largely done with teams more connected to GitHub Copilot and our developer tools. That's just where we divide and conquer, but there's a lot of collaboration happening between the two.

Matt: Perfect. I want to talk a little bit about safety and security. You were at Lacework and spent years in cloud security. Obviously there are a lot of attack vectors for artificial intelligence, whether you're talking about theft of model weights, poisoning the weights themselves, or using AI itself; I don't know if you saw that paper from Anthropic about a completely autonomous system being used for hacking. What are you most worried about? Which attack vector keeps you up at night, if any?

⚠️ [Warning Label]: The attack vectors that most security people worry about are the ones that you don't know about yet. The ones that you know about, you can put in mitigations to prevent them, or to catch them and mitigate them quickly.

Final Verdict

I think we're in a place where, just as we talk about how AI can help you create and solve problems that humankind may not have been able to solve for decades, those discoveries are now reachable. But the flip side is: what can this technology do to break down a lot of the security conventions we've had in the past? Because one, it can operate really, really fast; two, it's advancing really, really fast; and three, it's learning along the way too. So what we're doing, and it's throughout the company, but even here at Ignite, is making sure everything we do hooks into the overall security of the enterprise, of the organization. For example, an agent created in our platform will get an ID. That ID is tracked in Entra; it has policy and compliance attached; you can track it, you can grant it access or not, and if it's doing something it's not supposed to, you can deactivate it. These aren't things that are an afterthought for us. They're designed in from the get-go when we're building a new part of the platform or a new tool: providing that observability of what the agent is, making sure it adheres to your company's compliance, governance, or security guidelines and the expectations you have, and then being able to act on these things, being able to say, "Hey, I want to trace through what this agent did yesterday in this customer support case."
And being able to see, line by line, everything it did: all the tools it called, all the data it accessed, and, if there was a human in the loop, what they approved or didn't approve, and being able to really learn from how these things are operating. Security has got to be there from the start for us.

Okay, and last question for you, Jay. What is your most contrarian belief about artificial intelligence today?

I don't know if this is my most contrarian, but back to some of the things we were talking about: we often get the question, or it's part of the dialogue in interviews like this, "How many lines of code does AI write for you?" And that is a completely nonsensical and, like, dumb stat.

Yeah, absolutely.

Because I think that's measuring something that doesn't really make sense in this world. You really have to focus on what you're able to do now that you couldn't do before. I'll give you an example. I was meeting with a bunch of execs from the financial services industry, and one of the things they were realizing is that they've had a decade, fifteen years, of technical debt that they've just been kicking down the road, because it's too expensive, they don't have the right talent, it takes too long, and the opportunity cost of diverting effort to clean up some of this stuff versus handling new work is just too high for them. But now, all of a sudden, the ability to shrink that technical debt, to modernize their code bases, to modernize their infrastructure, to get into a healthier security posture, all of that is more tractable. So focus on those outcomes, not on how many lines of code my AI wrote for me today.

All right, well, Jay, thank you so much. I appreciate your time.

