CTO Shawn Edwards on Bloomberg’s evolving data and AI approaches

Overview

Shawn Edwards, chief technology officer at Bloomberg L.P., joins host Maryfran Johnson for this CIO Leadership Live interview. They discuss Bloomberg's evolving data and AI approaches, constant product innovation cycles, leveraging alternative data sources, genAI's practical uses vs. its limitations and more.


Transcript

[This transcript was auto-generated.]
Maryfran Johnson 0:03
Hello, good afternoon and welcome to CIO Leadership Live. I'm Maryfran Johnson, your host for the show and the CEO of Maryfran Johnson Media. This video show and podcast is produced with the support of CIO.com and the digital media division of Foundry, which is an IDG company. We're streaming live to you right now on LinkedIn, and also to our CIO channel on YouTube. Our viewers are most cordially invited to join in today's conversation by submitting questions of your own. We have editors watching the stream, and they'll be happy to pass them along to me and my guest, who today, I'm very pleased to say, is Shawn Edwards. He is the CTO of Bloomberg. He oversees the development of Bloomberg's global technology strategy, and he runs a uniquely influential CTO office that has been instrumental in developing all sorts of innovative products for Bloomberg's market data, analytics, news and community offerings. You all know what Bloomberg is, of course. It was founded in 1981, is still based in New York City, and provides economic, financial and other real-time data, research and information to financial companies and organizations around the world. As one of the world's leading financial media organizations, Bloomberg News produces roughly 5,000 stories a day. It's listed at number 33 on Forbes' list of America's largest private companies, and employs more than 21,000 people around the world. Shawn has been with Bloomberg since 2003, and five years after he got there, in 2008, he created the CTO office, where he assembled what is today a team of about 300 IT experts working on everything from leading-edge tech research and user experience design to product management. His CTO office also runs Bloomberg's information security, risk and compliance offices, and the company's overall machine learning strategy, in partnership with Bloomberg's AI engineering group. His team today works directly with academic researchers from top universities around the world on new ways to apply machine learning and AI to finance. And one of the most critical recent growth initiatives at Bloomberg, which we'll be talking about on today's show, was the creation of a new line of products centered around alternative data, which promises to become an essential aspect of Bloomberg's financial analytics. Before he joined Bloomberg, Shawn worked for Bear Stearns & Co. as a managing director in the fixed income trading group, and he's also held positions at Mentor Graphics and IBM. Shawn, it's great to have you here. Thank you.
Shawn Edwards 2:53
It's great to be here, Maryfran. Thank you for having me.
Maryfran Johnson 2:57
All right, let's start out with just a few of those kind of dazzling numbers, a few examples that give us an idea about the scope, the scale and the speed of the data-driven universe that you are part of at Bloomberg. All those market-moving systems, a lot of that goes through your office. So talk a little bit about that.
Shawn Edwards 3:18
Yeah, you know, the financial markets where we sit, the capital markets, are in a unique position in the world in dealing with a huge amount of heterogeneous data. Just to give you a few data points: Bloomberg ingests over 300 billion messages from exchanges around the world, we call them ticks, every single day. Just as a comparison, the number of tweets on Twitter at its peak was about 500 million a day, so that gives you the comparison. And we take every one of those messages and we have to process it. It's not just about taking it and passing it on. We have to normalize it, we do real-time calculations on it, we store it, we distribute it around the world, and it kicks off massive amounts of calculations. So that's from the world of structured data. In the unstructured data world, we ingest over two and a half million documents every day of various types from something like 120,000 sources, and it's everything from a copy of an EDGAR filing to a press release to transcripts of meetings and earnings calls. Just a massive amount of unstructured data. That gives you a flavor of the amount of data that we're dealing with. And on top of all that, there is a massive amount of calculations and computations going on; the notional value traded or exchanged through some of our systems is in the trillions of dollars. So it's a very large world, and it's heterogeneous. We're not dealing with one type of calculation or one type of data source, which I find fascinating and interesting.
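[Editor's note: to make the normalization step Shawn describes more concrete, here is a minimal, hypothetical Python sketch of mapping a raw exchange message onto a common tick schema. The venues, field names and checks are invented for illustration and are not Bloomberg's actual formats or pipeline.]

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Tick:
    """A normalized market-data message (illustrative schema only)."""
    symbol: str
    price: float
    size: int
    venue: str
    received_at: datetime

def normalize(raw: dict, venue: str) -> Tick:
    """Map a venue-specific payload onto the common Tick schema.

    Each venue names its fields differently, so normalization is a
    per-venue mapping plus basic validation before the tick is stored,
    distributed, and fed into downstream real-time calculations.
    """
    field_map = {
        "XNAS-DEMO": {"symbol": "sym", "price": "px", "size": "qty"},
        "XLON-DEMO": {"symbol": "ticker", "price": "last", "size": "volume"},
    }[venue]
    price = float(raw[field_map["price"]])
    size = int(raw[field_map["size"]])
    if price <= 0 or size < 0:
        raise ValueError(f"implausible tick from {venue}: {raw}")
    return Tick(
        symbol=str(raw[field_map["symbol"]]).upper(),
        price=price,
        size=size,
        venue=venue,
        received_at=datetime.now(timezone.utc),
    )

print(normalize({"sym": "abc", "px": "101.25", "qty": "300"}, "XNAS-DEMO"))
```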
Maryfran Johnson 5:19
Well, I think it is. I saw some of that when I was looking around and doing some of the research for our talk today: the company sits on over 100 petabytes of data, and there's access to something like 575 different exchange products. Probably a lot of those are ones that stemmed from the research and the work you do in the CTO office.
Shawn Edwards 5:41
Yeah, when I first created the CTO office, we were focused largely on researching and bringing in technology capabilities. Well, maybe I'll start and talk a little bit about the mission of the CTO office, which is still the same. When we started, our mission was really to help set the technology direction for the company. It's evolved a bit, and now it's not just technology direction, it's also technology product direction for the company. Some of the early work that we did was around the real-time systems and architecture for the company. Real-time data and market data, the data from venues, is such an important component of the ingredients that we use to drive our analytics, and it's an important part of what we offer our customers. Bloomberg has had to build one of the world's largest private networks just to deal with this type of real-time data. We have exchange connectivity and systems located around the world to process the data and get it out to all of our customers. Just that system alone is a really interesting area of research, and we continue to focus on it. It's not something you can just stop at a moment in time; market data is growing exponentially. It's something that we are continuing to tackle.
Maryfran Johnson 7:16
Well, and when we first started talking about this, one of the things you said about the strategic focus of your office is that the first hat that you wear, and you wear a lot of hats, is collaboration. You're working with your team of about 300 people, but you are also very deeply tapped into the Global Engineering Group, which is some 8,000 data and systems engineers. Do I have that right?
Shawn Edwards 7:44
That's right. Yeah, it's growing every day, so the number does change, quite frankly. So yes, we wear a couple of different hats. My team helps lead the research for both technology and product ideas. We do that in a very hands-on fashion. We build prototypes in the lab with our engineering partners and with our product people. We collaborate with lots of teams internally, our sales team, our operations team. We also lead the efforts around our collaborations with the broader tech community. We establish and run the open source office, and we lead all the academic funding programs. We actually run a program called the Bloomberg Fellowship Program, which is essentially where we collaborate with PhD students and postdocs around the world and fund their research for a year. They come and intern with us, sit side by side with our engineers and people in the CTO office, and we collaborate on their research; we end up co-authoring research in peer-reviewed journals and conferences. Both of those are a fantastic way to stay on top of some of the most interesting innovations in science and in infrastructure technology. Clearly the open source world is important; just like any modern tech company, we use open source software throughout our tech stack, everywhere from the developer tools to the infrastructure to some of our most advanced analytic systems, including AI systems. So in the research phase we are doing a lot of experimentation, and we're doing a lot of communication about these ideas. One thing to point out, though, is that this research is very directed. This is not, you know, the days of Bell Labs or the Watson Research Center, where we're doing fundamental scientific research. This is all about what types of problems we think we have to tackle to bring really interesting products in to solve our customers' problems. So it's very directed, and we try to go from research to products as quickly as possible; we try to shorten that time. The other hat we wear is that we are product managers. When some of these ideas make sense, and we socialize them and start to develop them into capabilities or infrastructure or external products, we act as the product managers for those systems and solutions. My team happens to be the product manager for most of our core infrastructure at Bloomberg.
Maryfran Johnson 10:43
And Shawn, when you say you're product managers, you're referring to the kind of product management where you don't just create the product and hand it over; you stay with it for the lifetime of it, essentially. How long ago did your CTO office expand into that? Because in a lot of companies, product management is a reasonably new function for the IT organization. It could be they've been doing it for three to five years, but not much longer than that.
Shawn Edwards 11:13
We did it in phases, and there wasn't a clear delineation. Bloomberg has always had a very strong product culture; they just used different terminology. Even before I came along, they were very concerned about each pixel on the screen, about going to see customers and taking that feedback and optimizing what you're working on, all the things that are essential to product management. So in the beginning, we would stay with a product or capability for a while, but we would tend to hand it off after some time. What we realized is that some of the things we've been building are very advanced technically. We're building products that we didn't build before, products, for instance, for quants, mathematicians on the street, to code in Python to talk to our APIs and our data. For that kind of thing you need product managers with deep technology experience. And so we felt it was better for us to stay with our products indefinitely, sometimes for longer periods of time, and it's all a little bit in flux.
Maryfran Johnson 12:28
Well, because your team there, your 300, they are PhD scientists, they are machine learning and AI experts. They sound like they're very deeply technical, but they also have a lot of business acumen from working at Bloomberg. How do you make sure that you keep that business acumen as high as the technical acumen? How do you approach retention and education?
Shawn Edwards 13:01
The core culture of what we do at Bloomberg is that we're obsessed with solving our customers' problems. And so our product managers and our engineers are deeply embedded in solving those customer problems. They go see customers; they are at the whiteboard with our product managers and our designers and salespeople. When you look at the footprint of where our engineers work, they work in New York City and in London, those are the two biggest hubs, and we have smaller offices around the world, San Francisco and Frankfurt, but largely they're in the financial centers around the world. Why? Because we always wanted the people building the product to be right next to our customers, right next to the salespeople. So it's impossible not to be absorbing all of this, not to be involved and not to be close to it. In fact, we don't have a philosophy of offshoring. We don't do that. We don't believe in building some product way out somewhere far away. We believe in building the product and coding while sitting right next to the product managers and the sales force. So right here in New York City, in this building at 731 Lexington, while I'm speaking, lots of engineers are sitting side by side with their product teams and customers every day.
Maryfran Johnson 14:38
And it sounds, too, that you get to stay hands-on. I don't know if you're still hands-on the keyboard or still doing hands-on work, but it does sound like you don't let the rest of the team have all the fun. How do you keep yourself kind of au courant on all of these different technologies without chasing them?
Shawn Edwards 15:01
Look, I surround myself with people far smarter than me in any one of these technologies. And in the strategic areas that we're investing in, you know, the CTO office doesn't focus on all the technology in the company; we focus on key strategic areas, and that list changes over time. Where we do focus, it's about collaborating, like I said, with other people to understand what the problems are. My job is to help set them in the right direction; my job is to help them pull in all the right resources and the right ideas. But with that, I do get to join them in deep discussions on technology and architecture and product direction. It is the best part of the job; it is what keeps me coming to work. I like to say I have the best job in the company.
Maryfran Johnson 16:04
You know, in interviews with you over the years, you're always saying that, and you have had kind of an extraordinary longevity in a CTO role. But the more we talk about it, the more I realize that it is far from a typical CTO role. How fun that you got to create your dream job, and then you get to keep it for 16 years.
Shawn Edwards 16:25
Yeah, you know, I had the pleasure of being able to work with the founders of this company, directly with Tom Secunda, one of the founders of the company, for many years, an incredible product person, one of the best in the world. I got to work closely with Mike Bloomberg, and I still do. And this company is dynamic; it is never resting on its laurels. We have an obsession with doing better for our customers than what we did yesterday. We don't look a lot at what our competitors do. That's kind of a thing at Bloomberg: why look at your competitors? Because then you're just going to be a me-too product. Think about what we're doing, and think about what problems we haven't solved. What are they? What is our customer struggling with? The whole idea is that we're going to come up with solutions that customers couldn't even have dreamed up; that's our job, to dream those up.
Maryfran Johnson 17:17
Solutions they never knew they always needed and wanted.
Shawn Edwards 17:20
But the other thing is that Bloomberg is a place where people spend a lot of time; they spend their careers here. I'm 20 years here. There are plenty of other people at 20 years; a lot of our senior leadership has been here 20-plus years. It is a place where, if you're the right type of person with the right personality, it's an amazing place to work. Obviously I think that, since I've been here for so long. But the length of tenure, I think, shows that for a lot of people.
Maryfran Johnson 18:01
Yeah, well, and I think that's also something that helps companies that are able to stay private. I worked for many years at IDG, which was a private company, the producer of Computerworld, CIO magazine, InfoWorld, all of these tech publications. And it was quite wonderful to be part of a private company, because you get more of a focused view of what the company should be and what the mission is, and that sort of thing. I wanted to veer away a little bit, before we get into talking more about AI and the new alternative data initiatives that you've mentioned, and go up to that kind of big-picture view across the financial services industry. This was something I asked you before, when we were preparing for this, about what your industry peers are struggling with the most today, and I was a little surprised to find out that it's still data, it's still the massive amounts of data. Talk about why that is. Especially to people who don't work with data at the kind of depth and levels that you all do, it seems like, why can't we finally get our arms around this, once and for all, in businesses?
Shawn Edwards 19:19
There's one thing about collecting data and storing it and having search engines to point at it, and I think both proprietary and open source software have really made that almost a solved problem. But what you quickly find when you're trying to use data is that it takes so much more than access to the data to be able to generate insights out of it, to really understand what's going on in the world. The data has to be linked. The data has to be processed and organized in a way that you can ask the right questions, that you can derive insights out of it. And so the part that I think a lot of people realize now, and Bloomberg realized from day one, is that domain expertise has to go hand in hand with data. We have been collecting high-quality data for over 40 years. It's a great asset that we have. But it wasn't just the fact that we were collecting the data; it was the fact that we had teams and teams of people who became experts and understood every facet of that data, understood what was good, what was bad, understood how to process it, but more importantly understood its place in relationship to all the other data and all the systems that we have. Having a pile of data is one thing. Using it is usually very, very hard. Just to give an example, it's often quoted that 80% of a quant's job, a mathematician's job on the street, is just processing data, getting it into the right form. That's a lot of time spent not generating some other insight that you could have been generating; it's time spent away from looking at the next idea. And so even at Bloomberg, we've been rethinking and reimagining even our structured data, and then we'll get to unstructured data; I'm sure we'll talk about that with AI. Even the data we've been collecting for many years, we used it very effectively in certain ways. But then when we started thinking about building cross-asset-class analytics, being able to run a query across all of our data domains, looking at bond information and company information, but also macro information or even weather information, how do you make it simple to ask a question that joins across all of them? It's not trivial. So we have been spending a lot of time modeling our data, creating this unified Bloomberg data model that captures all the relationships between these entities and objects. We built, as far as we know, one of the largest knowledge graphs there is, which captures these relationships and allows you to traverse through this data to the nth order, if you will, an almost indefinite kind of relationship traversal. And that allows you to ask questions you couldn't ask before, if you didn't organize it this way, if you didn't connect it in this way. And so data is still a very hard problem, and there's no shortcut. We found that there's no shortcut: you can go ahead and get any system you want, but if you don't have the domain knowledge to process and link this data, then you're not getting everything you can get out of your data. It's also labor intensive, because it's not about typing on the keyboard, it's about thinking about all these relationships, how to connect them, and what makes sense. This data is messy.
It's not like it's all indexed on one thing. The financial markets look at the world at a very broad scope, in a very broad manner. And so there are all these heterogeneous, to use that term, lots of different types of objects and relationships that you're trying to capture.
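[Editor's note: the nth-order relationship traversal Shawn mentions can be illustrated with a tiny toy graph. The sketch below is a generic breadth-first traversal over a handful of made-up entities and relationships in Python; it is not Bloomberg's data model or knowledge graph.]

```python
from collections import deque

# A toy knowledge graph: entity -> list of (relationship, entity).
# The entities and relationships are invented for illustration.
GRAPH = {
    "AcmeCorp":      [("issued", "AcmeBond2030"), ("member_of", "Industrials")],
    "AcmeBond2030":  [("issued_by", "AcmeCorp"), ("rated_by", "RatingAgencyX")],
    "Industrials":   [("contains", "AcmeCorp"), ("sensitive_to", "SteelPrices")],
    "SteelPrices":   [("driven_by", "IronOreSupply")],
    "RatingAgencyX": [],
    "IronOreSupply": [],
}

def traverse(start: str, max_hops: int):
    """Breadth-first traversal up to max_hops relationships away,
    yielding each reachable entity with the path that led to it."""
    seen = {start}
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if path:
            yield node, path
        if len(path) >= max_hops:
            continue
        for rel, neighbor in GRAPH.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [(node, rel, neighbor)]))

# Ask an "nth order" question: what is connected to AcmeCorp within 3 hops?
for entity, path in traverse("AcmeCorp", max_hops=3):
    print(entity, "via", " -> ".join(f"{a}-[{r}]->{b}" for a, r, b in path))
```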
Maryfran Johnson 23:49
Well, and I think about how many companies, especially in the last five years, have created an entire chief data officer position that often sits at the same level and right next to a CIO or a CTO. I don't know, do you have a position like that at Bloomberg? Or is that essentially what you're doing in the CTO office?
Shawn Edwards 24:08
We do have a global head of data; we have a data division. They operate differently from a lot of those chief data officers at, let's say, the banks. Our head of data runs the organization that's responsible for ingestion and processing, and this person has a lot of the domain experts who are creating the initial insights on that data, which then gets to other teams, and they use it to power their analytics. So I would say it's a different kind of job, but we do have somebody who is globally looking at our data assets.
Maryfran Johnson 24:54
All right. One of the other areas that we had talked about, and I'm kind of itching to get into it too, is AI. When you and I talked about AI, you said, well, you know, there's a really big whoop right now about genAI, but we've been doing AI for 15 years. And it's reminiscent to me of some of the other CIOs I've talked with; they're like, we're not exactly new to AI. It's just that generative AI has so captured the minds of marketers and CEOs, and probably more people who don't really understand the complexities of data the way folks like you would. But let's talk about what you're doing in that area, because you said you've got a whole story around genAI and you'd like to talk about it. So let's go ahead and veer over.
Shawn Edwards 25:50
Well, let's start with AI in general. As you mentioned, we have been building machine learning models for over 15 years. We have hundreds, if not thousands, of models running in production right now, doing everything from processing that massive amount of data I mentioned, that ingest of data, to building analytics and insights and deriving new datasets. We also use it directly in our terminal, our flagship product; we build features out of it. When customers are using some of our most popular functions, these models are generating a lot of the information and features that are there. Let me give you an example. Out of those two and a half million documents that we ingest every day, we have multiple machine learning models looking at every one of those documents. We have some models determining which topics are discussed in, let's say, a news story. Others are doing named entity disambiguation, basically understanding what people and what companies are being discussed in there; we map them to our IDs and map them to our databases. We have other ones doing things like sentiment analysis. So that's the unstructured world. We have other models that are generating prices for bonds in over-the-counter markets, generating prices for bonds that are illiquid. This is our bread and butter. This is something that we are using every single day. So it is kind of funny when people are excited about AI. It is great; it's an exciting field we've been talking about for years, something we've been building up our capabilities in for quite a while. We have established teams and processes for running AI at scale at Bloomberg, and we invest quite a bit in the people and the technology. We have a very large data science platform; you can imagine what that is, lots of interesting software running on a massive amount of GPUs. We collaborate with everyone on the domain aspects of building any of these products. Any of these models are always built with the domain experts I mentioned before, the subject matter experts in all these different fields, whether they're on our data team or on our product team; we're always building our AI models hand in hand with them. We're using our rich datasets in AI, and we're using AI to enhance our datasets. But getting to genAI, it's really interesting. This all obviously captured the world's attention when OpenAI released their product, and now my uncle and aunt are coming to talk to me about AI, right?
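[Editor's note: a minimal sketch of the kind of per-document pipeline Shawn describes, where several independent models each annotate every incoming document with topics, linked entities and sentiment. The keyword-based "models" below are toy stand-ins for real trained models, and the entity IDs are invented; this is not Bloomberg's pipeline.]

```python
# Each function stands in for a trained model; real systems would call
# dedicated topic, entity-disambiguation, and sentiment models instead.

ENTITY_DB = {"acme corp": "ID0001", "globex": "ID0002"}  # invented IDs

def tag_topics(text: str) -> list[str]:
    topics = {"earnings": ["revenue", "profit", "guidance"],
              "m&a": ["acquisition", "merger"]}
    lowered = text.lower()
    return [t for t, kws in topics.items() if any(k in lowered for k in kws)]

def link_entities(text: str) -> dict[str, str]:
    lowered = text.lower()
    return {name: eid for name, eid in ENTITY_DB.items() if name in lowered}

def score_sentiment(text: str) -> int:
    pos, neg = ["beat", "growth", "record"], ["miss", "decline", "loss"]
    words = text.lower().split()
    return sum(w in pos for w in words) - sum(w in neg for w in words)

def process_document(doc: str) -> dict:
    """Run every model over one document and collect the annotations."""
    return {
        "topics": tag_topics(doc),
        "entities": link_entities(doc),
        "sentiment": score_sentiment(doc),
    }

print(process_document("Acme Corp reported record revenue and raised guidance."))
```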
Maryfran Johnson 28:59
And they must sit you down and say, Shawn, you do this kind of stuff, don't you? Should I be worried about this? Should I be excited about it? And fortunately, you told me one of your functions, one of the behind-the-scenes things that you do at Bloomberg, is that you keep silly stuff from invading the company; you don't chase every technology trend out there. We had an interesting chat about blockchain and how everybody wanted to throw lots of dollars at that, and you guys looked at it and said, it's really just another database, you know?
Shawn Edwards 29:36
Yeah, exactly. We looked at it from a first-principles kind of approach and said, you know, it's a database that has these properties. When do we need those properties? We talked to almost everybody on the West Coast and the East Coast, and we held our ground, and we were right: we don't need blockchain to solve the problems that we have, and I think it's kind of died down. So you're absolutely right; I often joke and say half my job is keeping the silly things out of the company. But genAI is not a silly thing. We are convinced that large language models will play an important role in solving the kinds of problems we're solving. However, there has been a tremendous amount of hype, and I think everybody has been writing that 2024 is when everybody comes down from the high and the excitement of the overhype, and people are going to realize it's not everything that people are saying. So we were never in the camp that genAI is the solution to every problem; we have a particular philosophy for how we think about large language models. First of all, these models operate very differently from our existing AI models. These models are more general: out of the box, you can ask one to do sentiment analysis, you can do information extraction, you can do a lot of the things that we've previously built individual models for. The point I want to make, though, is that we will continue to invest in and build what you might call the traditional AI models, which is kind of funny to say since they're still the state of the art, because we operate at this very interesting intersection of constraints. I mentioned that we ingest a massive amount of data, so there's huge volume, but our processing and our analytics also often have to operate at very low latency. We usually operate in the tens of milliseconds to do a lot of our processing, unlike many other tech firms, which can usually take a little time to deliver a message or analyze something. The other constraint is that our precision has to be extremely high. Bloomberg is a trusted source of information on the street, so we have to have correct numbers. The intersection of those three constraints usually means that we have to build very high-precision models, custom-tuned for a particular problem, that can operate at low latency, and we obsess over the quality of that. Large language models might be able to do sentiment analysis, but they do it in a very slow fashion; it takes seconds to get an answer. They're not there yet to process the amount of data that we have. And the accuracy is a big problem; I'll get to that in a second. So to finish the interesting part, the positives: they're general, they have a broad knowledge of the world. Trained on a huge amount of data, they really do, in quotes, understand a lot of parts of the world, a lot of different topics that you can talk about with them. And the interesting thing is that they're accessible: you program through language, you give them sentences and paragraphs. For all of our other models, you need an AI engineer to program with Python or CUDA or something.
Maryfran Johnson 33:42
Yes, you can just go in and talk to it, really.
Shawn Edwards 33:47
It opens the door to it. But now the challenges, like we were just getting to, are that they have some limitations, and I think that's what the world is finding right now. They hallucinate, right, where they give wrong answers. There's no ground truth in these models; they are predicting the next word. Ultimately, these large language models do it in an incredibly interesting, very advanced way, but ultimately they have a statistical view of what the next word should be in a sentence. That's different from having a ground truth. So they hallucinate, or give factually wrong answers.
Maryfran Johnson 34:26
They create a kind of science fiction view of the data.
Shawn Edwards 34:32
Most of the time it's good, but they do give wrong answers often enough that it's tough to use them for a problem where you need super high accuracy. The other problem is that they don't really understand logical reasoning all that well. They don't do math. They don't understand the temporal nature of the information that you trained them on. So there are limitations. Our philosophy on this is that they're still useful, in the sense that if you combine them with our trusted data and our trusted analytics, we can use their properties of understanding language. So instead of using the large language model to answer questions by itself, we're not going to have a Bloomberg ChatGPT type of thing where the model has been trained on lots of information and we trust the model to give the right answer; that's where you get in trouble. Instead, you teach the model to talk to our databases, teach the model to talk to our analytics, embed it into our workflows, and it becomes just one ingredient that you mix with all of our trusted sources of information in our workflow. With that, you can solve some really interesting problems.
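[Editor's note: a schematic sketch of the general pattern Shawn describes, where the language model is not trusted to answer from memory but is handed figures retrieved from a trusted store and asked only to phrase the answer. The retrieval function, the data, and the stubbed-out model call are all hypothetical, not Bloomberg's implementation.]

```python
# Trusted store stands in for a curated database; values are invented.
TRUSTED_FUNDAMENTALS = {
    ("AcmeCorp", "revenue_2023"): 12_400_000_000,
}

def retrieve(company: str, field: str) -> float:
    """Fetch a figure from the trusted source instead of asking the model."""
    return TRUSTED_FUNDAMENTALS[(company, field)]

def call_llm(prompt: str) -> str:
    """Stand-in for any LLM client; here it just echoes the grounded prompt."""
    return f"(model would phrase an answer using only: {prompt!r})"

def answer(question: str, company: str, field: str) -> str:
    value = retrieve(company, field)          # ground truth from the database
    prompt = (
        "Answer the user's question using ONLY the figure provided.\n"
        f"Question: {question}\n"
        f"{company} {field} = {value}"
    )
    return call_llm(prompt)                   # the model only does the language part

print(answer("What was AcmeCorp's 2023 revenue?", "AcmeCorp", "revenue_2023"))
```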
Maryfran Johnson 36:00
Well, you used the phrase where you said we use it to nourish our data, which has me thinking of the data as a big, hungry animal at the table. Give us an example of something happening right now where it's nourishing data in a way that was a much harder road to come by before.
Shawn Edwards 36:23
One of the areas we're really excited about is the unstructured data in these documents, the millions of documents that we have. It's typically been very difficult or time-consuming to build a machine learning model that can extract information out of a particular document. Like I said before, we build particular models to go and look at a document and extract information, let's say earnings from a company's filings, and we process that and everything else. But when somebody in the company comes to us and says, hey, we want to extract this kind of information, it's usually a whole different model that you have to build. Large language models make ad hoc exploration of unstructured data and unstructured documents a reality. They allow people to mine data, and they make the information in those documents almost as liquid as the data that's in our databases. You now have a different way of thinking about this data that's been kind of difficult to use; you can make it accessible. And the idea would be eventually to work on joining it with your structured data. Earlier I kept harping on the fact that we have to link our data sources. Now, linking the information that's contained in the unstructured world with the databases and the structured data is incredibly promising and powerful, and that's some of what we're working on. When we look at the kinds of products that we are now releasing, they have been first and foremost about efficiencies: how do you allow people to digest and understand this massive flow of information, and how do you allow people to interact with our products in an easier, simpler way, to discover the information that we have and to interact with our systems in a more natural way. I think that's what's really exciting. That's what we're focusing on.
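[Editor's note: the idea of making unstructured documents "as liquid as" database rows can be sketched as extraction into a fixed schema that is then joined with a structured record. The regex extractor below is a toy stand-in for an LLM or trained extraction model, and the records and IDs are invented.]

```python
import re

STRUCTURED_DB = {"ID0001": {"name": "AcmeCorp", "sector": "Industrials"}}  # invented

def extract_fields(document: str) -> dict:
    """Pull a fixed schema out of free text (toy stand-in for a model)."""
    revenue = re.search(r"revenue of \$([\d.]+) billion", document)
    guidance = re.search(r"guidance of \$([\d.]+) billion", document)
    return {
        "revenue_bn": float(revenue.group(1)) if revenue else None,
        "guidance_bn": float(guidance.group(1)) if guidance else None,
    }

def join_with_structured(entity_id: str, document: str) -> dict:
    """Link what was extracted from the document to the structured record."""
    return {**STRUCTURED_DB[entity_id], **extract_fields(document)}

doc = "AcmeCorp posted revenue of $3.1 billion and gave guidance of $3.4 billion."
print(join_with_structured("ID0001", doc))
```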
Maryfran Johnson 38:45
Well, you'd mentioned that last year, I think in March, there was a paper published on BloombergGPT, the world's first finance-specific LLM, large language model. Is that essentially something happening in your quant research area? Or is that one of many such products you've already rolled out? Or is it kind of the first of its kind?
Shawn Edwards 39:11
So that was a research project. That is an example of the kinds of research that we do. Both the CTO office and our AI engineering teams partnered on exploring how we would go about building a domain-specific large language model for finance. As you mentioned, at that time there wasn't one that existed. But we also set out to do something novel and interesting, and part of the reason the paper was very popular is that we wanted to build a model that was both a domain-specific model and also a generic model. What we did, we ended up building a 50 billion parameter model, that's the size of the large language model, and it was trained on 700 billion tokens; you can think of tokens roughly as words. So a large amount of documents, a lot of data. Half of that data was from kind of an open source world, think of things like Wikipedia and the Library of Congress; the other half was financial data that Bloomberg had. The paper really goes through how we built it, really most of the paper, but also the evaluations: how does it compare, how does it perform against benchmarks. And what we showed was that you can build a model that is just as good in the generic domain, the generic questions of English and generic problems, but performs much better in-domain, in finance. So this was a research project. We did not then take this model and just release it, because of all the other issues that we mentioned, the hallucination problem and how you build safety into these models; there was a lot for us to learn before we released a model like that. Since then, we have been building more and more powerful models. We are building models that now understand not just companies and news, but also derivatives, municipal bonds, and a lot of macro data that Bloomberg has been collecting for years from central banks, their reports and their transcripts, which are really rich discussions of macro information and the economies of the world. So we've been gathering lots of real high-quality data, but there's a lot of work that goes into curating; you have to process the data before you train on it. In the meantime, we've been building successively larger models, different models of different sizes that we use for different things. And we are just now releasing a series of products that use our large language models. We just released this week our first product that uses large language models: earnings call transcript summarization, with AI-generated summaries. And we do it in a different way, I think, than most other people do it, a very interesting way. So much time was spent to make sure it's accurate, that it doesn't hallucinate. There's a lot of processing that goes along with it; it's not just sticking in a large language model and saying go ahead and summarize it. There's much more to it than that, and a lot of checks to make sure it's doing the right thing, and training on those models. This is just the first of a series of products that will be coming out in the next several months and throughout the year that are really advancing the state of the art in the use of AI.
But more importantly, we're solving customer problems. Take transcripts themselves. If you think about the problem we're solving, analysts are inundated with lots of information, sell-side reports, etc. During earnings call season, they have to either sit through hours of these calls or read the calls of other companies they sometimes cover.
Maryfran Johnson 43:27
There are companies interpreting the calls; I know I've read some of those myself. And I think if I had to read more than one or two a day, I would get a little suicidal. They are very dense.
Shawn Edwards 43:39
We're helping our users get the essence, the most important things that were said in that call, but we can also bring some of our structured data to light to give people context as well. When a CEO or a CFO gives guidance on a particular fundamental number, we can check whether that's above or below expectations; we can look at historical trends of their debt paid out, etc. So we can do a lot to help customers understand what really happened, because sometimes it's not just a number, it's also what's going on around the number, the context. This will help our customers digest and look at more transcripts, maybe even for companies that they might not be covering. They might cover a company, but this could help them look at a supplier to that company, or that company's customers, and get a better sense overall of what's going on in the world.
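[Editor's note: a small sketch of the kind of context check Shawn describes, comparing a number mentioned on a call against consensus expectations and the recent historical trend from structured data. The figures and formatting are made up for illustration and do not represent Bloomberg's product logic.]

```python
def contextualize_guidance(guidance: float, consensus: float,
                           history: list[float]) -> str:
    """Compare stated guidance with consensus and the recent reported trend."""
    vs_consensus = (guidance - consensus) / consensus
    trend = (history[-1] - history[0]) / history[0] if history else 0.0
    direction = "above" if vs_consensus > 0 else "below"
    return (f"Guidance of {guidance:.2f} is {abs(vs_consensus):.1%} {direction} "
            f"consensus of {consensus:.2f}; trailing trend over the period "
            f"was {trend:+.1%}.")

# Invented figures: CFO guides EPS of 2.10 vs. consensus of 2.00,
# with the last four reported quarters as history.
print(contextualize_guidance(2.10, 2.00, [1.80, 1.85, 1.92, 1.98]))
```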
Maryfran Johnson 44:43
I have two really excellent questions that have come in from our alert and watchful listeners here. One of them is asking if you could please elaborate a little more on your ongoing data quality challenges and how you and your team ensure the best data possible, constantly.
Shawn Edwards 45:05
Well, first and foremost, it starts with using judgment about what the high-quality sources of data are. You have to be careful about where you get your data and whom you get it from.
Maryfran Johnson 45:17
Who is reading about this?
Shawn Edwards 45:21
High-quality sources, right. There are a lot of people who put a lot of thought into sourcing the right data. A lot of what we have done, from a processing and quality perspective, is understanding what the data should look like, what the anomalies are, what problems look like. There's not one type of approach; it's really a collection of approaches. There are rules-based systems, where people have typed up rules and you have scripts looking at and processing this data, capturing a lot of their domain knowledge in programs. There are sanity checks, where you're looking at the differences between yesterday and today and whether there are some really abnormal kinds of movements. And there are machine learning tools and approaches that we use to build a statistical understanding of what the data looks like. So there's a multifaceted approach to data quality and data processing; there's not one thing to do, and it takes a lot of experience building that up over time. It's a huge effort that we put into it, an investment we continually make because it's so important, and we get better at it. And I will say that the systems also rely on human beings. When something's wrong, when there's an anomaly, when something says this isn't right here, it kicks it out to a human being. We have experts who live and breathe a particular domain, this data, and they can look at it and say, something's wrong here. So this is still human-in-the-loop. This is not about getting rid of the humans; it's about augmenting our people. When I joined the company, before we started using machine learning in our data division, it was largely rows and rows of data clerks, people who would use rules-based scripts and engines, but largely they were typing into our databases, copy-and-paste data entry. So much has been automated, but we haven't gotten rid of any of those people; that team has only grown. Those people are the experts who deal with the exceptions. They're also the ones who train our models, who give feedback to the models; they are the ones keeping the systems knowledgeable, training and tuning those systems over time. So data processing is really interesting in and of itself.
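[Editor's note: a minimal sketch of two of the layered checks Shawn lists, a typed-up rule and a day-over-day sanity check, with anything that fails routed to a human review queue. The thresholds and data are invented; a real system would add statistical and machine learning checks on top.]

```python
REVIEW_QUEUE = []  # anomalies land here for a human expert to inspect

def rule_check(record: dict) -> bool:
    """Hand-written domain rule: prices must be positive."""
    return record["price"] > 0

def day_over_day_check(today: float, yesterday: float, limit: float = 0.25) -> bool:
    """Sanity check: flag moves larger than `limit` (25%) versus yesterday."""
    return abs(today - yesterday) / yesterday <= limit

def process(record: dict, yesterday_price: float) -> None:
    if rule_check(record) and day_over_day_check(record["price"], yesterday_price):
        print("accepted:", record)
    else:
        REVIEW_QUEUE.append(record)          # kick it out to a human being
        print("sent to human review:", record)

process({"symbol": "ABC", "price": 101.0}, yesterday_price=100.0)   # passes
process({"symbol": "XYZ", "price": 150.0}, yesterday_price=100.0)   # flagged
```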
Maryfran Johnson 48:09
Yes. Well, there was another question, too, and this gets into the security aspects of data. Are there any insights from you on how to tackle the data security issue, especially in an AI-enabled world of the future? The preface to this question was how easily we connect and retrieve data from all these different sources, and how important that makes data quality as well as security. And then there are some sources that, of course, have national security aspects associated with sharing them across supply chain ecosystems. I know that the topic of cybersecurity is always a very dicey, thin-ice area for CTOs or CIOs to speak to, but if you have insights or advice you'd like to offer, then by all means, here's your opportunity.
Shawn Edwards 49:00
Well, I think it's a pretty complex answer, because it's a complex question. First and foremost, it starts with getting the right data sources. It's also how you think about processing this data and the systems that you build. We just don't take data from anywhere; we have connections to sources, to exchanges, over private lines, and we build security into all of our infrastructure. Every one of our employees has biometric authentication to get into our systems, and then there are multiple layers of that. There are a lot of checks that go back and forth on this data. We capture the original documents and original sources, and we actually provide transparency to our customers. Oftentimes we'll have a company's fundamentals shown, their revenue numbers or their debt numbers, and you can click right on the Bloomberg terminal and see the actual PDF that we sourced from a government website or from whatever source we got it from. We use that ourselves to validate a lot of the sources. So source data, source information, is vitally important. I don't know if there's any silver bullet I can give in terms of advice; it's an investment along the entire chain. Security can't be an afterthought; it has to be built from the ground up into everything we do. And I think that's been the ethos, understood from the beginning when the company started.
Maryfran Johnson 50:39
Yeah, well, and I always love examples that say, well, this isn't the silver bullet, because from what I can tell, the only thing a silver bullet has ever killed is a vampire in fiction. No, silver bullets really don't exist in the real world, although maybe we wish they did. Now, another topic I wanted to touch on, because I know that this is close to your heart, is creating the right environment for innovation to thrive. It's pretty clear how you're doing that today, but over your 16-year career as the CTO at Bloomberg, how has the way you approach that changed or adapted or matured over time?
Shawn Edwards 51:20
Yeah, like you said, it's a topic that's near and dear to my heart. It's something we think about a lot here; I think a lot about it. But I have to say it first starts with the company culture that I came to 20 years ago, which we are building on top of. Mike Bloomberg is famous for creating an open environment, both at the company and also when he became mayor; he brought that same kind of open seating arrangement, an environment where people who are working on projects are sitting next to each other. I mentioned how we get to the whiteboard and how we collaborate across teams. I mentioned how the engineers sit here; the way we sit is not based on reporting chains, it's based on what you're working on and what teams you're working with. And so you'll have people from different departments all sitting around each other. That spirit of an innovative environment that allows for the free flow and exchange of ideas is something that I've been trying to build upon during my time. We helped build the first UX design department at Bloomberg, and suddenly we had people with really interesting backgrounds, artists.
Maryfran Johnson 52:48
They were not quants and math majors.
Shawn Edwards 52:51
Right, they weren't programmers or quant math majors. They brought a different perspective on the environment, on the problems and how to solve them, people with psychology degrees, etc. This was some time ago, when I first started, and it really drove home this concept that if we could create an environment where we have a collection of different disciplines, a collection of different backgrounds, people from very diverse histories and experiences, all working focused on a particular problem, what I saw was an incredible exchange of ideas, a real challenging of ideas. And you have to foster an environment where challenging ideas is okay.
Maryfran Johnson 53:44
Psychological safety, I've heard that called recently. Yes.
Shawn Edwards 53:46
If you can build upon that and really encourage people, then I think you can get some of the best solutions out of people, you can get the best ideas, and it creates an incredible environment for people to thrive, where they're all learning from each other. And so I've been trying to do that, and to drive home the idea that we are really dependent on each other. The best ideas don't necessarily come from the researcher who might be an expert in the field; the best idea might come from a salesperson or an operations person. A really great idea can come from our customers. A really great idea can come from almost anywhere. We have to be open to accepting those ideas and challenging our way of thinking. So we do a lot of this work within the CTO office: sharing ideas, presenting and challenging, creating this ecosystem of knowledge sharing, if you will. It's trying new things, experimenting with an idea and seeing what works and what doesn't work, and being open to people with backgrounds that might be non-traditional and bringing them into the mix. So it's been exciting. It's also an area where you have to train people to communicate properly. There's a lot of focus on how we communicate our ideas, how we present our ideas, not just to the experts who have the same background as you, but to people who don't have that expertise. There was really interesting research by Katherine Phillips, I think her name was; she was vice dean at Columbia Business School and passed away some time ago. Some of her research studied why diverse teams work better, and it showed that it wasn't necessarily because there were new ideas; that was part of it, but it was because it made people uncomfortable. It made people work harder at explaining and defending and challenging each other's ideas. So all the new ideas are one part of it, but the other part is that you have to communicate differently, you have to express things differently, and it makes you think about your problems differently. So that's a big focus of ours.
Maryfran Johnson 56:10
If you're in your total comfort zone, you'll just be the way you always are, and your brain will be running along the same tracks. Whereas when you're being a little more careful to be direct and respectful, I think people listen differently as well when they're in diverse groups like that. Very last question for you in these last few minutes. There was a press release out that was throwing your name all around the internet about alt data, Bloomberg making alternative data accessible alongside traditional financial data. This was a couple of months ago, in September. First of all, for the psych majors and journalism majors out there: is alternative data unstructured data, or what?
Shawn Edwards 56:51
Yeah, you know, it's kind of funny; we joke and say it's a silly name, alternative data. But it's just non-traditional data for financial markets. It's the data that the companies haven't reported, or the stuff that the markets haven't reported, and it typically comes in really interesting ways. It's often the exhaust of other businesses. It's oftentimes unstructured data, but sometimes structured data that is very complex; it typically involves advanced processing capabilities and techniques and advanced data science to deal with it. So it's things like credit card observations, credit card receipts; it's things like satellite imagery and shipping information, things that traditionally in finance were very much on the outside, on the edge. And true to Bloomberg's original mission to bring transparency and efficiency into the market, we're looking at how we take this complex data that only a few hedge funds know how to process and bring it to the masses. How do we take this and give it to all the analysts and portfolio managers so they can get observations and insights out of this data? We've released some interesting capabilities on the Bloomberg terminal for everyone, and we're going to continue looking at really interesting assets and data sources to generate new insights for people.
Maryfran Johnson 58:21
Yeah, well, I can see how it would be a little more nerve-wracking, too, because the chances of getting something a little skewed with non-traditional data are probably a lot higher. I imagine your models probably figure that out.
Shawn Edwards 58:35
You know, I don't know if the hard part is generating the insights; the hard part is tying it back and making it useful to somebody who's analyzing a company. It's great to have all the satellite imagery or this credit card data, but what does that mean? What does that mean to an analyst who's looking at Starbucks same-store sales or Lululemon online sales? How do you tie it back to the information that we're already used to looking at? How do you tie it back to the company's reported data, the KPIs? How do you allow them to nowcast with it? How do you make it really simple? Then you have to give them all the statistics about the error bars and the reliability of the data. So you do have to do that part of it, but the hard part is actually tying it back to what they understand and the other information about the company.
Maryfran Johnson 59:32
Okay, well, that's a really great explanation, thank you. I can tell you work with people in universities and the academics that you all have around you, because, well, you've probably been like this all your life, but you really explain things very well. So thank you. Thanks for that, and thanks for joining us here today. We got a couple of great questions from our audience, and you've been an absolute delight to talk to, so I can't thank you enough for being here on CIO Leadership Live. All right. If you joined us late for this conversation today, do not despair. You can watch the full interview, this whole episode, later right here on LinkedIn, but also on CIO.com and on CIO's YouTube channel. CIO Leadership Live is also available as an audio podcast wherever you find your podcasts today, and I hope that you enjoyed and learned from this conversation with CTO Shawn Edwards of Bloomberg as much as I did. I'll be back again next month on Wednesday, February 17, at noon Eastern, when I'll be joined by Karl Pierburg, who is the Senior Vice President and CTO at AMB Sports and Entertainment, which, among other things, is the parent company of the Atlanta Falcons. Do take a moment to subscribe to our CIO YouTube channel, where you can find more than 120 of these similarly fabulous interviews with some of the leading lights in the CIO and CTO world today. Thanks so much for joining us, and we'll see you here again next time.