In the current wave of digital transformation, organizations are moving compute tasks and the associated data storage from on-premises hardware to the cloud, in particular Amazon Web Services (AWS), Microsoft Azure, and Google Cloud. Advantages of this cloud model include pay-as-you-go infrastructure that flexibly scales up and down with demand, the ability to access the cloud from any connected location, and the future-proofing of being able to adopt multiple, interconnected cloud services (e.g., machine learning or ML) as compute and storage requirements change.
The next wave of digital transformation is a synthesis of on-premises and cloud workloads, in which the same compute and storage cloud services are also available on-premises, in particular at the “edge,” near or at the location where the data is generated. This combined model gives the best of both worlds: workloads can run at the edge and in the cloud, and the optimal split can be determined and even changed dynamically. As an example, autonomous cars need to process and store data at the edge (i.e., on the car) to make real-time auto-pilot decisions without relying on a network connection. At the same time, sharing that anonymized data with other data sources at a central cloud location enables efficient data archiving, combined ML model training, and other collective analysis.
For this edge-cloud synthesis to work, the cloud services, and specifically their programming APIs, must be available at the edge. For example, popular AWS services such as S3 for data storage and Lambda for serverless computing need to run at the edge without requiring cloud access. Programmers can then use the same APIs, or even the same programs, at the edge and in the cloud, and edge and cloud workloads can interoperate.
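To make this concrete, here is a minimal sketch (not a prescribed implementation) of how a single S3 program could target either the cloud or an edge endpoint. The endpoint URL, bucket, and key below are hypothetical placeholders for a local S3-compatible store; only the endpoint configuration changes, while the application logic and API calls stay the same.

```python
import boto3

# Hypothetical address of an S3-compatible object store running at the edge.
EDGE_ENDPOINT = "https://s3.edge.example.local"

def make_s3_client(use_edge: bool):
    """Return an S3 client aimed at either the edge endpoint or the AWS cloud."""
    if use_edge:
        return boto3.client("s3", endpoint_url=EDGE_ENDPOINT)
    return boto3.client("s3")  # default AWS regional endpoint

# The same application code runs unchanged against either target.
s3 = make_s3_client(use_edge=True)
s3.put_object(
    Bucket="sensor-data",                   # hypothetical bucket
    Key="telemetry/device-42/latest.json",  # hypothetical key
    Body=b'{"temp_c": 21.4}',
)
```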
Some cloud APIs are especially useful at the edge because of the edge's specific nature: raw data is generated there, and real-time decisions are needed there. One example is the ability to filter or process raw data, as done by AWS S3 Select. Because raw data is created at the edge but not usually read back, the data write capability matters much more than the data read capability, so write-oriented APIs such as the AWS SageMaker Feature Store are useful. Finally, APIs and programs that analyze data for real-time decision-making (e.g., AI and ML functions) are needed. Below, we describe examples of how these analytics APIs are, or could be, used at the edge.
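As an illustration of the filtering capability, a minimal S3 Select request might look like the sketch below. The bucket, object key, column names, and query are hypothetical, and the same call could be pointed at an edge S3-compatible endpoint so that only the matching rows ever leave the device.

```python
import boto3

s3 = boto3.client("s3")  # or pass endpoint_url to target a local S3-compatible store

# Filter a CSV object server-side so only the matching rows cross the wire.
resp = s3.select_object_content(
    Bucket="sensor-data",                # hypothetical bucket
    Key="readings/2024-01-01.csv",       # hypothetical object
    ExpressionType="SQL",
    Expression=(
        "SELECT s.device_id, s.temp_c FROM s3object s "
        "WHERE CAST(s.temp_c AS FLOAT) > 80"
    ),
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    OutputSerialization={"CSV": {}},
)

# The response is an event stream; print the filtered records as they arrive.
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode("utf-8"))
```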
Bringing the cloud to the data
One of the first appliances to support Amazon's Greengrass IoT platform was the AWS Snowball. It allowed teams to securely run cloud applications and serverless code and to process data from the field even when physically disconnected from the Internet. For example, one early use case was an Oregon State University project that used a Snowball to collect hundreds of terabytes of real-time oceanographic data to improve environmental sustainability.
Edge analytics allowed the project participants to analyze this data while at sea using their existing cloud applications, configuration, and analytics models. Later, they could refine their analytic algorithms in the cloud and push the improvements back out to the edge to improve future operations and research projects.
PetaGene was an early adopter of S3 Object Lambda to process sizeable genomic data sets in the cloud. Its tools use algorithms built on S3 Object Lambda to compress genomic data by a factor of 11 without loss, and other S3 Object Lambda functions can selectively retrieve data. Refactoring these kinds of applications to run at the edge can make it easier to compress data locally before sending it to the cloud, and to transform compressed data into the appropriate format at the edge.
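PetaGene's compression algorithms are its own, but the general S3 Object Lambda pattern they build on looks roughly like the sketch below: a Lambda function fetches the stored object, transforms it, and returns the result to the caller in place of the stored bytes. Here plain gzip decompression stands in for a domain-specific genomic transform, and the same pattern could be hosted at an edge location.

```python
import gzip
import urllib.request

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    """S3 Object Lambda handler: transparently decompress gzip data on retrieval.

    A generic stand-in for a domain-specific transform such as genomic decompression.
    """
    ctx = event["getObjectContext"]

    # Fetch the original (compressed) object via the presigned URL S3 provides.
    with urllib.request.urlopen(ctx["inputS3Url"]) as resp:
        compressed = resp.read()

    transformed = gzip.decompress(compressed)

    # Return the transformed bytes to the caller instead of the stored object.
    s3.write_get_object_response(
        RequestRoute=ctx["outputRoute"],
        RequestToken=ctx["outputToken"],
        Body=transformed,
    )
    return {"statusCode": 200}
```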
Detecting problematic device behavior and other anomalies
Improving the security of devices in the field is one of the lesser-appreciated aspects of edge analytics. Many early IoT devices have struggled with automating security patches; their developers never imagined that a permanently connected appliance would be the perfect launching pad for distributed denial-of-service attacks such as the Mirai botnet. A new mindset is required as hackers find new ways to break into large and dangerous things such as cars and water treatment systems.
Organizations are starting to adopt anomaly-detection algorithms to identify new threats or faults sooner and with fewer false positives than rule-based approaches. These algorithms analyze data at high fidelity, retain a small subset as a reference point, and discard the rest. Anomaly-detection algorithms tend to run in the cloud, where it is easy to develop and tune them using tools such as SageMaker. However, the same kinds of algorithms could improve the security and resilience of edge devices if they were extended to run against local S3 buckets.
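A minimal sketch of that idea follows, assuming an S3-compatible store on the device and a generic unsupervised detector (scikit-learn's IsolationForest) in place of a tuned production model; the endpoint, bucket, and column names are hypothetical.

```python
import io

import boto3
import pandas as pd
from sklearn.ensemble import IsolationForest

# Hypothetical local S3-compatible endpoint on the edge device.
s3 = boto3.client("s3", endpoint_url="https://s3.edge.example.local")

# Pull recent device telemetry from a local bucket.
obj = s3.get_object(Bucket="device-logs", Key="telemetry/latest.csv")
df = pd.read_csv(io.BytesIO(obj["Body"].read()))

# Flag unusual device behavior with an unsupervised anomaly detector.
model = IsolationForest(contamination=0.01, random_state=0)
df["anomaly"] = model.fit_predict(df[["bytes_out", "conn_per_min", "cpu_pct"]])

suspicious = df[df["anomaly"] == -1]  # -1 marks outliers, 1 marks normal records
print(f"{len(suspicious)} anomalous records flagged for review")
```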
Similarly, anomaly-detection algorithms can be applied to identify fraudulent credit card transactions at the point of sale for retail or e-commerce purchases. In this case, a fraud-detection model runs at the edge in real time, while the transaction stream is also sent to the cloud, where combined data from multiple sources is used to create and update the ML model. The updated model can then be pushed back out to the edge periodically.
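A simplified sketch of that loop might look like the following, with hypothetical bucket, key, and feature names. It assumes the model is trained in the cloud (for example, with SageMaker), published to S3 as a pickled object, and periodically pulled down by the edge device, which then scores transactions locally with no round trip to the cloud.

```python
import pickle

import boto3

cloud_s3 = boto3.client("s3")  # AWS cloud, where the model is trained and published
MODEL_BUCKET, MODEL_KEY = "fraud-models", "latest/model.pkl"  # hypothetical names

def refresh_model():
    """Periodically pull the latest cloud-trained fraud model down to the edge."""
    obj = cloud_s3.get_object(Bucket=MODEL_BUCKET, Key=MODEL_KEY)
    return pickle.loads(obj["Body"].read())

def score_transaction(model, txn):
    """Score a single transaction locally at the point of sale."""
    features = [[txn["amount"], txn["merchant_risk"], txn["hour_of_day"]]]
    return model.predict(features)[0]  # e.g., 1 = likely fraud, 0 = legitimate

model = refresh_model()
txn = {"amount": 912.50, "merchant_risk": 0.7, "hour_of_day": 3}
if score_transaction(model, txn) == 1:
    print("Hold transaction for verification")
```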
Other use cases for anomaly-detection algorithms include identifying faulty manufacturing components or missed steps in an automated assembly line process and auto-classifying unusual behaviors captured by surveillance cameras.
Optimizing for autonomy
Most automakers have decided that autonomous cars require expensive LiDAR arrays. Tesla has stubbornly bucked this trend, betting that edge analytics built into each vehicle can provide adequate performance using low-cost cameras. It has made progress by developing an edge analytics infrastructure that efficiently processes most data locally, minimizing the data uploaded to the cloud for further analysis. The jury is still out on whether Tesla will achieve true self-driving capability with this approach.
Nevertheless, Tesla has optimized this process to operate as a byproduct of normal consumer behavior. In contrast, competitors hire large teams of specialized drivers to work out the bugs for their more expensive equipment. Other industries could apply this same principle to different kinds of equipment and applications by using local analytics to reduce sensor complexity or bandwidth requirements.
New tools for automating data collection, summarization, and analysis could make this kind of large-scale data collection and local analysis accessible to more companies. For example, companies already using Amazon SageMaker might be able to parse large data sets at the edge, store them in local S3 buckets, and then upload appropriate summaries to the cloud daily.
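One possible shape for that daily workflow is sketched below, with hypothetical endpoint, bucket, and column names: full-resolution data stays in the local bucket, and only compact per-device summaries are shipped to the cloud.

```python
import datetime
import io

import boto3
import pandas as pd

edge_s3 = boto3.client("s3", endpoint_url="https://s3.edge.example.local")  # local bucket
cloud_s3 = boto3.client("s3")                                               # AWS cloud

def upload_daily_summary(day: datetime.date):
    """Summarize a day's raw edge data and ship only the summary to the cloud."""
    raw = edge_s3.get_object(Bucket="raw-sensor-data", Key=f"{day}/readings.csv")
    df = pd.read_csv(io.BytesIO(raw["Body"].read()))

    # Keep full-resolution data local; send compact per-device statistics upstream.
    summary = df.groupby("device_id")["value"].agg(["mean", "min", "max", "count"])
    cloud_s3.put_object(
        Bucket="fleet-summaries",            # hypothetical cloud bucket
        Key=f"daily/{day}.csv",
        Body=summary.to_csv().encode("utf-8"),
    )

upload_daily_summary(datetime.date.today())
```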
Edge computing is still a work in progress. The first wave of digital transformation helped IT-focused companies gain a competitive edge. The next wave will extend existing cloud development practices, applications, and data models to the edge to help accelerate the digital transformation for other industries.
About the Author
Gary Ogasawara is Cloudian’s Chief Technology Officer, responsible for setting the company’s long-term technology vision and direction. Before assuming this role, he was Cloudian’s founding engineering leader. Prior to Cloudian, Gary led the Engineering team at eCentives, a search engine company. He also led the development of real-time commerce and advertising systems at Inktomi, an Internet infrastructure company. Gary holds a Ph.D. in Computer Science from the University of California at Berkeley, specializing in uncertainty reasoning and machine learning.