In today’s global economy, financial institutions manage data across multiple physical locations and countries. Regulations such as the General Data Protection Regulation (GDPR) have changed how organizations can move data across borders. With this in mind, our team at IBM Research has designed a new library for “fusion AI” that allows machine learning models to be trained on data at different locations, possibly belonging to different institutions, without moving the data from its original location, while addressing privacy and trust concerns. Fusion AI helps organizations build predictive AI models without the security and regulatory problems that arise from moving data to a central repository. It also maintains the full audit trail and lineage of the process on the blockchain in a tamper-proof and privacy-preserving manner.
Why fusion AI?
Every institution manages data differently. Data could be stored in many different systems and locations and across geographic boundaries: in a private data center, a private cloud, a public cloud, or any combination of these. Some institutions prefer not to move private data onto the public cloud. Others might have no restrictions on sharing data across sites, but the data might be too large to share. In other cases, several organizations may want to train a machine learning model jointly without revealing their data to one another. In addition, moving data into one central location is time-consuming and may not be legally possible, or may be undesirable due to security concerns.
How fusion AI works
Fusion AI is based on distributed agents that run machine learning algorithms on data at different locations and work jointly with a Federation Manager, which is responsible for fusing the locally trained models. Specifically, each agent trains its own model locally and then sends only the resulting model parameters to the Federation Manager. The Federation Manager combines these parameters and sends the combined model back to the local agents. The net result is that each location gets a model equivalent to one trained on all the silos of data, even though no location ever sees another’s data. That’s what we call fusion AI. Fusion AI is designed with a suite of algorithms that address varying requirements, allowing institutions to choose the approach best suited to them.
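The train-locally, fuse-centrally loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the fusion AI library itself: the post does not say which fusion algorithm is used, so the sketch assumes a size-weighted average of parameters (the well-known federated averaging idea), a toy linear model, and hypothetical names (local_train, fuse, federated_fit).

```python
from typing import List, Tuple

Params = Tuple[float, float]          # (slope w, intercept b) of y = w*x + b
Silo = List[Tuple[float, float]]      # local (x, y) pairs that never leave the agent


def local_train(params: Params, data: Silo, lr: float = 0.1, epochs: int = 20) -> Params:
    """Agent step: refine the shared parameters on local data with gradient descent."""
    w, b = params
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error over this silo only.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def fuse(local_params: List[Params], sizes: List[int]) -> Params:
    """Manager step (assumed rule): average parameters, weighted by silo size."""
    total = sum(sizes)
    w = sum(p[0] * n for p, n in zip(local_params, sizes)) / total
    b = sum(p[1] * n for p, n in zip(local_params, sizes)) / total
    return (w, b)


def federated_fit(silos: List[Silo], rounds: int = 10) -> Params:
    """Repeat: agents train locally, the manager fuses, the fused model is sent back."""
    global_params: Params = (0.0, 0.0)
    for _ in range(rounds):
        # Only parameters cross the network; the raw (x, y) pairs stay in each silo.
        local_params = [local_train(global_params, data) for data in silos]
        global_params = fuse(local_params, [len(data) for data in silos])
    return global_params


# Two hypothetical silos, both drawn from y = 2x + 1.
silos = [
    [(-1.0, -1.0), (0.0, 1.0), (1.0, 3.0)],
    [(-2.0, -3.0), (2.0, 5.0)],
]
w, b = federated_fit(silos)
print(f"fused model: y = {w:.3f}x + {b:.3f}")
```

In this toy setup the fused parameters approach the slope and intercept that generated both silos, although neither agent ever receives the other's data points. A production system would use richer models and whichever fusion algorithm suits the institution's requirements.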
Fusion AI saves time and money
The benefits of this method are that data does not have to move, stays private, and can be in different formats. Because fusion AI can use data in different formats, technology teams no longer have to convert all the different silos of data into common storage such as a relational database, flat files, or blockchain. Instead, the data stays in its existing format and on its existing hardware. A company could keep some of its data in a private data center and some in the public cloud; the two don’t have to be combined in order to create a model that uses insights from both. The data can also stay in different countries, so financial institutions need not contend with the cross-border transfer rules in regulations like GDPR that apply when data moves across borders. The technology can save companies time and money: instead of spending time on data collection, technology teams can train AI models to better automate processes and, ultimately, provide better client service.
Fusion AI is fast and accurate
By moving model parameters instead of data, fusion AI reduces network transit time by 5x or more. And because the AI models are trained in parallel, overall training time improves as well. We see this performance improvement as a big benefit. Another benefit is the improved accuracy that comes when machine learning models are trained on diverse data rather than on a single silo.
Fusion AI builds trust
Another benefit of fusion AI is trust based on blockchain. Each local agent records a hash of its data on the blockchain, along with additional information such as the dataset’s name, its owner, and the date and time it was used in training. This information provides a unique, immutable, and verifiable fingerprint of exactly what data was used during training, without sharing the data itself. Local agents also record information about the training process on the blockchain, such as the technique, input data, and input model parameters used (or hashes of them), along with the accuracy achieved with each local training set. The final model (or a hash of it) is put on the blockchain as well. This way, we can use the blockchain to see the historical provenance of how a model was trained: exactly what data it was trained with, where the data came from, and whether some data came from a malicious agent and actually decreased the accuracy. The blockchain-based approach for infusing trust into the distributed training process fits into the broader theme of trusted AI and, in particular, into the category of lineage services, which help to ensure that all components and events of the AI system are trackable and verifiable.
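To make the fingerprinting idea concrete, here is a minimal Python sketch. It is an assumption-laden illustration rather than the actual fusion AI implementation: an in-memory, hash-chained list stands in for a real blockchain, SHA-256 is an assumed choice of hash, and the record fields, dataset name, owner, and accuracy value are all hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 digest that identifies the data without revealing it."""
    return hashlib.sha256(data).hexdigest()


class Ledger:
    """Append-only, hash-chained list; a stand-in for a blockchain (assumption)."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.blocks = []

    @staticmethod
    def _hash_block(record: dict, prev_hash: str) -> str:
        body = json.dumps({"record": record, "prev_hash": prev_hash}, sort_keys=True)
        return hashlib.sha256(body.encode("utf-8")).hexdigest()

    def append(self, record: dict) -> dict:
        """Link the new record to the previous block's hash and store it."""
        prev_hash = self.blocks[-1]["block_hash"] if self.blocks else self.GENESIS
        block = {
            "record": record,
            "prev_hash": prev_hash,
            "block_hash": self._hash_block(record, prev_hash),
        }
        self.blocks.append(block)
        return block

    def verify(self) -> bool:
        """Recompute every hash; any edited record or broken link returns False."""
        prev_hash = self.GENESIS
        for block in self.blocks:
            expected = self._hash_block(block["record"], block["prev_hash"])
            if block["prev_hash"] != prev_hash or block["block_hash"] != expected:
                return False
            prev_hash = block["block_hash"]
        return True


ledger = Ledger()
local_data = b"account_id,balance\n1,100\n2,250\n"  # hypothetical local dataset

# The agent records a fingerprint and metadata, never the data itself.
ledger.append({
    "type": "dataset",
    "name": "branch_a_accounts",      # hypothetical dataset name
    "owner": "Bank A",                # hypothetical owner
    "used_at": datetime.now(timezone.utc).isoformat(),
    "data_hash": fingerprint(local_data),
})
# The agent also records how the local training run went.
ledger.append({
    "type": "training",
    "technique": "linear_regression",  # hypothetical technique label
    "data_hash": fingerprint(local_data),
    "accuracy": 0.91,                  # illustrative value, not a measured result
})
print("ledger valid:", ledger.verify())
```

Anyone holding the original dataset can recompute its fingerprint and compare it with the recorded data_hash, and editing any stored record afterwards makes verify() return False. A real deployment would rely on a distributed blockchain platform rather than a single in-memory list.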
Industry applications of fusion AI
We have seen a lot of interest in fusion AI from the financial services industry. Fusion AI is becoming one of the most secure ways to provide more accurate and trusted machine learning models across international banks and other institutions. For instance, it can be used to build predictive models of risk to an overall financial portfolio based on liquidity in different countries, or to combine private data with public data such as social media posts to get insights into the credit scores of small and medium-sized enterprises to make more informed investment decisions.
Technologies like fusion AI enable IBM to offer a secure and trusted cloud platform for financial services, a differentiated platform that enables financial institutions to scale and collaborate with their FinTech partners and the rest of their ecosystem.
In addition to financial services, fusion AI offers advantages to any organization that manages a large amount of data across multiple sites or agencies, such as healthcare organizations and consortiums where individual members retain their local health data. Fusion AI is now available to IBM clients, and our team is interested in working with new partners to deploy fusion AI technologies into global organizations.
The post Fusion AI Builds Trust, Improves Global Data Management appeared first on IBM Blog Research.
via IBM Blog Research https://ibm.co/2cYaEHU
October 17, 2018 at 11:45AM