Archive for the ‘Machine Learning’ Category

New York Institute of Finance and Google Cloud launch a Machine Learning for Trading Specialisation on Coursera – HedgeWeek

The New York Institute of Finance (NYIF) and Google Cloud have launched a new Machine Learning for Trading Specialisation available exclusively on the Coursera platform.

The Specialisation helps learners leverage the latest AI and machine learning techniques for financial trading.

Amid the Fourth Industrial Revolution, nearly 80 per cent of financial institutions cite machine learning as a core component of business strategy and 75 per cent of financial services firms report investing significantly in machine learning. The Machine Learning for Trading Specialisation equips professionals with key technical skills increasingly needed in the financial industry today.

Composed of three courses in financial trading, machine learning, and artificial intelligence, the Specialisation features a blend of theoretical and applied learning. Topics include analysing market data sets, building financial models for quantitative and algorithmic trading, and applying machine learning in quantitative finance.

"As we enter an era of unprecedented technological change within our sector, we're proud to offer up-skilling opportunities for hedge fund traders and managers, risk analysts, and other financial professionals to remain competitive through Coursera," says Michael Lee, Managing Director of Corporate Development at NYIF. "The past ten years have demonstrated the staying power of AI tools in the finance world, further proving the importance for both new and seasoned professionals to hone relevant tech skills."

The Specialisation is particularly suited for hedge fund traders, analysts, day traders, those involved in investment management or portfolio management, and anyone interested in constructing effective trading strategies using machine learning. Prerequisites include basic competency with Python, familiarity with pertinent libraries for machine learning, a background in statistics, and foundational knowledge of financial markets.

"Cutting-edge technologies, such as machine and reinforcement learning, have become increasingly commonplace in finance," says Rochana Golani, Director, Google Cloud Learning Services. "We're excited for learners on Coursera to explore the potential of machine learning within trading. Looking beyond traditional finance roles, we're also excited for the Specialisation to support machine learning professionals seeking to apply their craft to quantitative trading strategies."


Iguazio pulls in $24m from investors, shows off storage-integrated parallelised, real-time AI/machine learning workflows – Blocks and Files

Workflow-integrated storage supplier Iguazio has received $24m in C-round funding and announced its Data Science Platform. This is deeply integrated into AI and machine learning processes, and accelerates them to real-time speeds through parallel access to multi-protocol views of a single storage silo using data container tech.

The firm said digital payment platform provider Payoneer is using it for proactive fraud prevention with real-time machine learning and predictive analytics.

Yaron Weiss, VP Corporate Security and Global IT Operations (CISO) at Payoneer, said of Iguazio's Data Science Platform: "We've tackled one of our most elusive challenges with real-time predictive models, making fraud attacks almost impossible on Payoneer."

He said Payoneer had built a system which adapts to new threats and enables it to prevent fraud with minimum false positives. The system's predictive machine learning models continuously identify suspicious fraud and money laundering patterns.

Weiss said fraud used to be detected retroactively with offline machine learning models; customers could only block users after damage had already been done. Now Payoneer can take the same models and serve them in real time against fresh data.

The Iguazio system uses a low latency serverless framework, a real-time multi-model data engine and a Python eco-system running over Kubernetes. Iguazio claims an estimated 87 per cent of data science models which have shown promise in the lab never make it to production because of difficulties in making them operational and able to scale.

The platform is based on so-called data containers that store normalised data from multiple sources: incoming stream records, files, binary objects, and table items. The data is indexed and encoded by a parallel processing engine. It's stored in the most efficient way to reduce data footprint while maximising search and scan performance for each data type.

Data containers are accessed through a V3IO API and can be read as any type regardless of how the data was ingested. Applications can read, update, search, and manipulate data objects, while the data service ensures data consistency, durability, and availability.

Customers can submit SQL or API queries for file metadata, to identify or manipulate specific objects without long and resource-consuming directory traversals, eliminating any need for separate and non-synchronised file-metadata databases.

So-called API engines use offload techniques for common transactions, analytics queries, real-time streaming, time-series, and machine-learning logic. They accept data and metadata queries, distribute them across all CPUs, and leverage data encoding and indexing schemes to eliminate I/O operations. Iguazio claims this provides orders-of-magnitude faster analytics and eliminates network chatter.

The Iguazio software is claimed to be able to accelerate the performance of tools such as Apache Hadoop and Spark by up to 100 times without requiring any software changes.

The Data Science Platform can run on-premises or in the public cloud. The Iguazio website contains much detail about its components and organisation.

Iguazio will use the $24m to fund product innovation and support global expansion into new and existing markets. The round was led by INCapital Ventures, with participation from existing and new investors, including Samsung SDS, Kensington Capital Partners, Plaza Ventures and Silverton Capital Ventures.


Federated machine learning is coming – here’s the questions we should be asking – Diginomica

A few years ago, I wondered how edge data would ever be useful given the enormous cost of transmitting all the data to either the centralized data center or some variant of cloud infrastructure. (It is said that 5G will solve that problem).

Consider, for example, applications of vast sensor networks that stream a great deal of data at small intervals. Vehicles on the move are a good example.

There is telemetry from cameras, radar, sonar, GPS and LIDAR, the latter about 70MB/sec. This could quickly amount to four terabytes per day (per vehicle). How much of this data needs to be retained? Answers I heard a few years ago were along two lines:

My counterarguments at the time were:

Introducing TensorFlow Federated, via The TensorFlow Blog:

This centralized approach can be problematic if the data is sensitive or expensive to centralize. Wouldn't it be better if we could run the data analysis and machine learning right on the devices where that data is generated, and still be able to aggregate together what's been learned?

Since I looked at this a few years ago, the distinction between an edge device and a sensor has more or less disappeared. Sensors can transmit via wifi (though there is an issue of battery life, and if they're remote, that's a problem); the definition of the edge has widened quite a bit.

Devices for decentralized data collection and processing have become more powerful and can do an impressive amount of computing. A case in point is Intel's Neural Compute Stick 2, a computer vision and deep learning accelerator powered by the Intel Movidius Myriad X VPU, which plugs into a Pi for less than $70.

But for truly distributed processing, the Apple A13 chipset in the iPhone 11 has a few features that boggle the mind. From "Inside Apple's A13 Bionic system-on-chip": the Neural Engine is a custom block of silicon separate from the CPU and GPU, focused on accelerating machine learning computations. The CPU has a set of "machine learning accelerators" that perform matrix multiplication operations up to six times faster than the CPU alone. It's not clear how exactly this hardware is accessed, but for tasks like machine learning (ML) that use lots of matrix operations, the CPU is a powerhouse. Note that this matrix multiplication hardware is part of the CPU cores and separate from the Neural Engine hardware.

This should raise the question, "Why would a smartphone have neural net and machine learning capabilities, and does that have anything to do with the data transmission problem for the edge?" A few years ago, I thought the idea wasn't feasible, but the capability of distributed devices has accelerated. How far-fetched is this?

Let's roll the clock back thirty years. Each fall, the finance department of a large diversified organization would prepare a package of spreadsheets for every part of the organization that had budget authority. The sheets would start with low-level detail, official assumptions, etc., until they all rolled up to a small number of summary sheets that were submitted to headquarters. This was a terrible, cumbersome way of doing things, but it does, in a way, presage the concept of federated learning.

Another idea that vanished is push technology, which imposed the same kind of network load as centralizing sensor data, just in the opposite direction. About twenty-five years ago, when everyone had a networked PC on their desk, the PointCast Network used push technology. It did not perform as well as expected, reportedly because its traffic burdened corporate networks with excessive bandwidth use, and it was banned in many places. If Federated Learning works, those problems have to be addressed.

Though this estimate changes every day, there are 3 billion smartphones in the world and 7 billion connected devices. You can almost hear the buzz in the air of all of that data that is always flying around. The canonical image of ML is that all of that data needs to find a home somewhere so that algorithms can crunch through it to yield insights. There are a few problems with this, especially if the data is coming from personal devices, such as smartphones, Fitbits, and even smart homes.

Moving highly personal data across the network raises privacy issues. It is also costly to centralize this data at scale. Storage in the cloud is asymptotically approaching zero in cost, but the transmission costs are not. That includes both local WiFi from the devices (or even cellular) and the long-distance transmission from the local collectors to the central repository. This is all very expensive at this scale.

Suppose large-scale AI training could be done on each device, bringing the algorithm to the data rather than vice versa? It would be possible for each device to contribute to a broader application while not having to send its data over the network. This idea has become respectable enough that it has a name: Federated Learning.

Jumping ahead, there is no controversy that compromising device performance and user experience to train a network, or compressing a model and settling for lower accuracy, are not acceptable alternatives. In Federated Learning: The Future of Distributed Machine Learning:

To train a machine learning model, traditional machine learning adopts a centralized approach that requires the training data to be aggregated on a single machine or in a datacenter. This is practically what giant AI companies such as Google, Facebook, and Amazon have been doing over the years. This centralized training approach, however, is privacy-intrusive, especially for mobile phone users. To train or obtain a better machine learning model under such a centralized training approach, mobile phone users have to trade their privacy by sending their personal data stored inside phones to the clouds owned by the AI companies.

The federated learning approach decentralizes training across mobile phones dispersed across geography. The presumption is that they collaboratively develop a machine learning model while keeping their personal data on their phones. An example is building a general-purpose recommendation engine for music listeners. While the personal data and personal information are retained on the phone, I am not at all comfortable that data contained in the result sent to the collector cannot be reverse-engineered, and I haven't heard a convincing argument to the contrary.

Here is how it works. A computing group, for example, is a collection of mobile devices that have opted to be part of a large-scale AI program. A model is "pushed" to each device, which executes it locally and learns as the model processes the data. There are some alternatives to this. Homogeneous models imply that every device is working with the same schema of data. Alternatively, there are heterogeneous models where harmonization of the data happens in the cloud.

Here are some questions in my mind.

Here is the fuzzy part: federated learning sends the results of the learning as well as some operational detail such as model parameters and corresponding weights back to the cloud. How does it do that and preserve your privacy and not clog up your network? The answer is that the results are a fraction of the data, and since the data itself is not more than a few GB, that seems plausible. The results sent to the cloud can be encrypted with, for example, homomorphic encryption (HE). An alternative is to send the data as a tensor, which is not encrypted because it is not understandable by anything but the algorithm. The update is then aggregated with other user updates to improve the shared model. Most importantly, all the training data remains on the users' devices.
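To make the aggregation step concrete, here is a minimal sketch in plain Python. It assumes the updates are combined by a sample-count-weighted average of model weights (often called federated averaging), which is a common choice but not something the article specifies. The linear model, the function names, and the learning rate are all hypothetical, and no encryption is shown.

```python
from typing import List, Sequence, Tuple

Weights = List[float]          # [slope, intercept] of a toy linear model
Sample = Tuple[float, float]   # (x, y) pair that never leaves the device


def local_update(weights: Weights, data: Sequence[Sample], lr: float = 0.1) -> Weights:
    """Train on one device: one pass of gradient descent for y ~ w*x + b."""
    w, b = weights
    for x, y in data:
        err = (w * x + b) - y   # prediction error on this private sample
        w -= lr * err * x
        b -= lr * err
    return [w, b]


def federated_average(updates: Sequence[Tuple[Weights, int]]) -> Weights:
    """Combine (weights, sample_count) pairs into one weighted mean (assumed rule)."""
    total = sum(count for _, count in updates)
    dim = len(updates[0][0])
    return [sum(wts[i] * count for wts, count in updates) / total for i in range(dim)]


def run_round(global_weights: Weights, devices: Sequence[Sequence[Sample]]) -> Weights:
    """One round: push the model out, train locally, send back only weights."""
    updates = [(local_update(global_weights, data), len(data)) for data in devices]
    return federated_average(updates)
```

Only the two numbers in each weight list and a sample count cross the network in this sketch; the `(x, y)` samples stay where they were generated. Whether those weights can be reverse-engineered is exactly the open question raised in this piece, and the sketch does nothing to settle it.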

In CDO Review, The Future of AI May Be In Federated Learning:

Federated Learning allows for faster deployment and testing of smarter models, lower latency, and less power consumption, all while ensuring privacy. Also, in addition to providing an update to the shared model, the improved (local) model on your phone can be used immediately, powering experiences personalized by the way you use your phone.

There is a lot more to say about this. The privacy claims are a little hard to believe. When an algorithm is pushed to your phone, it is easy to imagine how this can backfire. Even the tensor representation can create a problem. Indirect reference to real data may be secure, but patterns across an extensive collection can surely emerge.


Clean data, AI advances, and provider/payer collaboration will be key in 2020 – Healthcare IT News

In 2020, the importance of clean data, advancements in AI and machine learning, and increased cooperation between providers and payers will rise to the fore among important healthcare and health IT trends, predicts Don Woodlock, vice president of HealthShare at InterSystems.

All of these trends are good news for healthcare provider organizations, which are looking to improve the delivery of care, enhance the patient and provider experiences, achieve optimal outcomes, and trim costs.

The importance of clean data will become clear in 2020, Woodlock said.

"Data is becoming an increasingly strategic asset for healthcare organizations as they work toward a true value-based care model," he explained. "With the power of advanced machine learning models, caregivers can not only prescribe more personalized treatment, but they can even predict and hopefully prevent issues from manifesting."

"However, there is no machine learning without clean data, meaning the data needs to be aggregated, normalized and deduplicated," he added.
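As a rough illustration of the three steps Woodlock names (aggregate, normalize, deduplicate), here is a small Python sketch. The record fields, the matching rule (same normalized name and date of birth), and the function names are assumptions made for the example; real patient matching is considerably more involved.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]


def normalize(record: Record) -> Record:
    """Put one record into a consistent form (hypothetical fields)."""
    return {
        "name": " ".join(record["name"].split()).title(),  # collapse spaces, fix case
        "dob": record["dob"].strip(),
        "source": record["source"],
    }


def clean(*sources: Iterable[Record]) -> List[Record]:
    """Aggregate several feeds, normalize each record, and drop duplicates."""
    seen: Dict[tuple, Record] = {}
    for source in sources:                  # aggregate
        for record in source:
            tidy = normalize(record)        # normalize
            key = (tidy["name"], tidy["dob"])
            seen.setdefault(key, tidy)      # deduplicate: first record seen wins
    return list(seen.values())


clinic = [{"name": "  jane   DOE", "dob": "1980-01-02 ", "source": "clinic"}]
lab = [
    {"name": "Jane Doe", "dob": "1980-01-02", "source": "lab"},
    {"name": "john smith", "dob": "1975-05-05", "source": "lab"},
]
patients = clean(clinic, lab)
```

Here the two spellings of the same patient collapse into one record, so `patients` holds two entries rather than three.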


"Data science teams spend a significant part of their day cleaning and sorting data to make it ready for machine learning algorithms, and as a result, the rate of innovation slows considerably as more time is spent on prep than experimentation," he said. "In 2020, healthcare leaders will better see the need for clean data as a strategic asset to help their organization move forward smartly."

This year, AI and machine learning will move from "if and when" to "how and where," Woodlock predicted.

"AI certainly is at the top of the hype cycle, but the use in practice currently is very low in healthcare," he noted. "This is not such a bad thing as we need to spend time perfecting the technology and finding the areas where it really works. In 2020, I foresee the industry moving toward useful, practical use-cases that work well, demonstrate value, fit into workflows, and are explainable and bias-free."

"Well-developed areas like image recognition and conversational user experiences will find their foothold in healthcare along with administrative use-cases in billing, scheduling, staffing and population management where the patient risks are lower," he added.

In 2020, there will be increased collaboration between payers and providers, Woodlock contended.

"The healthcare industry needs to be smarter and more inclusive of all players, from patient to health system to payer, in order to truly achieve a high-value health system," he said.

"Payers and providers will begin to collaborate more closely in order to redesign healthcare as a platform, not as a series of disconnected events," he concluded. "They will begin to align all efforts on a common goal: positive patient and population outcomes. Technology will help accelerate this transformation by enabling seamless and secure data sharing, from the patient to the provider to the payer."

InterSystems will be at booth 3301 at HIMSS20.

Twitter: @SiwickiHealthIT
Email the writer: bill.siwicki@himssmedia.com
Healthcare IT News is a HIMSS Media publication.


An Open Source Alternative to AWS SageMaker – Datanami


There's no shortage of resources and tools for developing machine learning algorithms. But when it comes to putting those algorithms into production for inference, outside of AWS's popular SageMaker, there's not a lot to choose from. Now a startup called Cortex Labs is looking to seize the opportunity with an open source tool designed to take the mystery and hassle out of productionalizing machine learning models.

Infrastructure is almost an afterthought in data science today, according to Cortex Labs co-founder and CEO Omer Spillinger. A ton of energy is going into choosing how to attack problems with data (why, use machine learning of course!). But when it comes to actually deploying those machine learning models into the real world, it's relatively quiet.

"We realized there are two really different worlds to machine learning engineering," Spillinger says. "There's the theoretical data science side, where people talk about neural networks and hidden layers and back propagation and PyTorch and TensorFlow. And then you have the actual system side of things, which is Kubernetes and Docker and Nvidia and running on GPUs and dealing with S3 and different AWS services."

Both sides of the data science coin are important to building useful systems, Spillinger says, but it's the development side that gets most of the glory. AWS has captured a good chunk of the market with SageMaker, which the company launched in 2017 and which has been adopted by tens of thousands of customers. But aside from just a handful of vendors working in the area, such as Algorithmia, the general data-building public has been forced to go it alone when it comes to inference.

A few years removed from UC Berkeley's computer science program and eager to move on from their tech jobs, Spillinger and his co-founders were itching to build something good. So when it came to deciding what to do, they decided to stick with what they knew, which was working with systems.


"We thought that we could try and tackle everything," he says. "We realized we're probably never going to be that good at the data science side, but we know a good amount about the infrastructure side, so we can help people who actually know how to build models get them into their stack much faster."

Cortex Labs' software begins where the development cycle leaves off. Once a model has been created and trained on the latest data, Cortex Labs steps in to handle the deployment into customers' AWS accounts using its Kubernetes engine (AWS is the only supported cloud at this time; on-prem inference clusters are not supported).

"Our starting point is a trained model," Spillinger says. "You point us at a model, and we basically convert it into a Web API. We handle all the productionalization challenges around it."

That could be shifting inference workloads from CPUs to GPUs in the AWS cloud, or vice versa. It could be automatically spinning up more AWS servers under the hood when calls to the ML inference service are high, and spinning down the servers when that demand starts to drop. On top of its built-in AWS cost-optimization capabilities, the Cortex Labs software logs and monitors all activities, which is a requirement in today's security- and regulatory-conscious climate.
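The spin-up and spin-down behavior described above boils down to a sizing rule. The sketch below is one simple way such a rule could look; it is not Cortex Labs' algorithm, and the function name, the per-replica capacity figure, and the replica limits are all hypothetical.

```python
import math


def desired_replicas(
    requests_per_sec: float,
    capacity_per_replica: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Return how many inference servers to run for the current load.

    Rounds up so demand is always covered, then clamps to the allowed range
    so the service neither scales to zero nor grows without bound.
    """
    needed = math.ceil(requests_per_sec / capacity_per_replica)
    return max(min_replicas, min(max_replicas, needed))


busy = desired_replicas(450, capacity_per_replica=100)   # traffic spike
quiet = desired_replicas(0, capacity_per_replica=100)    # overnight lull
```

A real autoscaler would also smooth the load signal and wait between changes to avoid flapping; those details are left out here.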

"Cortex Labs is a tool for scaling real-time inference," Spillinger says. "It's all about scaling the infrastructure under the hood."
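The idea of pointing a tool at a trained model and getting a Web API back can be pictured with Python's standard library alone. This is only a sketch of the general pattern, not how Cortex Labs implements it: `predict` is a stand-in for a real trained model, and the JSON shape, handler name, and port are assumptions.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Stand-in for a trained model: a fixed linear scorer (hypothetical)."""
    weights = [0.5, -0.25]
    return sum(w * x for w, x in zip(weights, features))


def handle_request(body: bytes):
    """Map a JSON request body to (HTTP status, JSON response bytes)."""
    try:
        features = json.loads(body)["features"]
        return 200, json.dumps({"prediction": predict(features)}).encode()
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, missing key, or non-numeric input
        return 400, json.dumps({"error": "expected a JSON object with a features list"}).encode()


class PredictHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper: read the POST body, score it, write JSON back."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_request(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Start the API; blocks until interrupted."""
    HTTPServer(("", port), PredictHandler).serve_forever()
```

Calling `serve()` starts a single-process server that accepts POST requests with a body such as `{"features": [2, 4]}`. Everything the article lists as the hard part (GPUs, scaling, logging, monitoring) sits outside this sketch.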

Cortex Labs delivers a command line interface (CLI) for managing deployments of machine learning models on AWS.

"We don't help at all with the data science," Spillinger says. "We expect our audience to be a lot better than us at understanding the algorithms and understanding how to build interesting models and understanding how they affect and impact their products. But we don't expect them to understand Kubernetes or Docker or Nvidia drivers or any of that. That's what we view as our job."

The software works with a range of frameworks, including TensorFlow, PyTorch, scikit-learn, and XGBoost. The company is open to supporting more. "There's going to be lots of frameworks that data scientists will use, so we try to support as many of them as we can," Spillinger says.

Cortex Labs' software knows how to take advantage of EC2 spot instances, and integrates with AWS services like Elastic Kubernetes Service (EKS), Elastic Container Service (ECS), Lambda, and Fargate. The Kubernetes management alone may be worth the price of admission.

"You can think about it as a Kubernetes that's been massaged for the data science use case," Spillinger says. "There's some similarities to Kubernetes in the usage. But it's a much higher level of abstraction because we're able to make a lot of assumptions about the use case."

There's a lack of publicly available tools for productionalizing machine learning models, but that's not to say that they don't exist. The tech giants, in particular, have been building their own platforms for doing just this. Airbnb, for instance, has its BigHead offering, while Uber has talked about its system, called Michelangelo.

"But the rest of the industry doesn't have these machine learning infrastructure teams, so we decided we'd basically try to be that team for everybody else," Spillinger says.

Cortex Labs' software is distributed under an open source license and is available for download from its GitHub Web page. Making the software open source is critical, Spillinger says, because of the need for standards in this area. There are proprietary offerings in this arena, but they don't have a chance of becoming the standard, whereas Cortex Labs does.

"We think that if it's not open source, it's going to be a lot more difficult for it to become a standard way of doing things," Spillinger says.

Cortex Labs isn't the only company talking about the need for standards in the machine learning lifecycle. Last month, Cloudera announced its intention to push for standards in machine learning operations, or MLOps. Anaconda, which develops a data science platform, also is backing

Eventually, the Oakland, California-based company plans to develop a managed service offering based on its software, Spillinger says. But for now, the company is eager to get the tool into the hands of as many data scientists and machine learning engineers as it can.

