Under the Hood of Egnyte’s AI Engine


November 1, 2023

Egnyte embeds AI into a variety of content-related workflows – from file administration to document search and discovery, and even data compliance. These have been purposely designed to be simple to use while hiding the complex and intelligent layers underneath that enable them. To power these capabilities, Egnyte has developed an architecture that must:

Maintain an active ecosystem of integrations.
Perform at scale for thousands of customers, some of which have thousands of users.
Allow continuous technology evolution and innovation without disruption.
Maintain privacy and security of customers.

In this article, we’ll share some of the components of the underlying architecture that make this possible.

Ingest Layer: Data Sources and Integrations

As part of the ingest layer, Egnyte connects to a variety of data sources. These include Egnyte’s own cloud object storage repository as well as Microsoft offerings such as SharePoint, OneDrive, and Exchange Online. Google repositories such as Google Docs and Gmail are also connected, as well as other repositories such as Box and Dropbox, and bulk storage like Microsoft Azure, Google Cloud, and Amazon S3.

In addition to connecting to these “formal” repositories, Egnyte also integrates with Microsoft 365 tools, Google Workspace, Slack, and dozens of other “secondary” repositories where documents are stored in the course of everyday business. This ecosystem forms the foundation of the Egnyte platform architecture.

Overall, thousands of petabytes are accessible in storage, and hundreds of terabytes are streamed each day from meshed data centers around the world.

Foundational AI Model Layer

In the next layer, Egnyte has assembled a collection of various best-of-breed AI models to perform specific tasks related to all that data. Each AI model has been carefully chosen to optimize accuracy and performance for individual tasks required by a service above it.

Not only is each AI model chosen specifically for its task, but the individual models also continue to change over time as more effective technologies emerge. To support that, the Egnyte architecture has been designed to isolate AI models and provide abstraction, so that the services above are not disrupted when changes are needed. This allows our users to continue using our AI services seamlessly, even as we make constant improvements. Here are just a few examples of the various types of AI models currently in use by Egnyte:

Text classification is based on Transformer technology coupled with Deep Learning models, which allows vast amounts of data to be analyzed, greatly improving both accuracy and performance.
Object detection is provided by some standard object classification models, while general image classification is provided by other models. These have been replaced over time. If text in an image is detected, Egnyte uses AI models that provide Optical Character Recognition (OCR) to support, for example, reading a driver's license number from a photo of a license.
Document classification has been based on a Support Vector Machine (SVM) model for years. It’s proven so valuable that Egnyte has now introduced the capability for customers to train it on customized documents themselves.
Multivariate Anomaly Detection (MAD) is enabled by still other AI models. This enables Egnyte to issue alerts that combine information from different sources to detect unusual user behavior, such as impossible travel logins (from geographically distant locations within a short period) or unusual file access.
Audio/Video transcription is being enhanced using Large Language Models to augment Natural Language Processing. This new approach overcomes many of the accuracy problems of previous transcription technologies that are used in the industry.
Document Summarization and Queries are being supported by combinations of GPT and PaLM, with others being evaluated in parallel.
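To make the "impossible travel" example from the list above concrete, here is a minimal sketch of one such rule. It is not Egnyte's detector, and a single rule is far simpler than multivariate anomaly detection. The `Login` type, the 900 km/h speed limit, and the 100 km minimum distance are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0
MAX_PLAUSIBLE_SPEED_KMH = 900.0  # assumption: roughly airliner cruising speed
MIN_DISTANCE_KM = 100.0          # assumption: ignore coarse geo-IP jitter


@dataclass(frozen=True)
class Login:
    """Hypothetical login event with a geolocated source."""
    user: str
    timestamp_s: float  # seconds since the epoch
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def is_impossible_travel(prev: Login, curr: Login,
                         max_speed_kmh: float = MAX_PLAUSIBLE_SPEED_KMH) -> bool:
    """Flag two logins whose implied travel speed exceeds max_speed_kmh."""
    distance = haversine_km(prev.lat, prev.lon, curr.lat, curr.lon)
    if distance < MIN_DISTANCE_KM:
        return False  # too close together to be meaningful
    hours = (curr.timestamp_s - prev.timestamp_s) / 3600.0
    if hours <= 0:
        return True  # distant locations at the same instant
    return distance / hours > max_speed_kmh
```

In a real system a rule like this would be only one signal among several, combined with others (such as unusual file access) before an alert is raised.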

AI model integration is a complicated process. Each model goes through a rigorous evaluation and testing process before integration into the architecture, followed by more testing and monitoring after deployment. Egnyte requires not only performance at scale but also a high level of accuracy and privacy for customer data. Further, each model is continuously reviewed for its fitness for purpose as newer AI models become available. Additional specifics can be provided to customers under a Non-Disclosure Agreement (NDA), if necessary.
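One common way to let models be reviewed and replaced without disturbing the services that depend on them is to hide each model behind a small interface. The sketch below illustrates that idea only. The names `TextClassifier`, `ModelRegistry`, and `tag_document`, and the toy keyword classifiers, are hypothetical and do not describe Egnyte's actual code.

```python
from typing import Dict, Protocol


class TextClassifier(Protocol):
    """Interface that services depend on, instead of a concrete model."""

    def classify(self, text: str) -> str: ...


class KeywordClassifierV1:
    """Toy stand-in for an older model."""

    def classify(self, text: str) -> str:
        return "financial" if "invoice" in text.lower() else "general"


class KeywordClassifierV2:
    """Toy stand-in for a newer model that recognizes more terms."""

    def classify(self, text: str) -> str:
        lowered = text.lower()
        if "invoice" in lowered or "receipt" in lowered:
            return "financial"
        return "general"


class ModelRegistry:
    """Maps a task name to whichever model currently serves that task."""

    def __init__(self) -> None:
        self._models: Dict[str, TextClassifier] = {}

    def register(self, task: str, model: TextClassifier) -> None:
        self._models[task] = model  # replaces any previous model for the task

    def resolve(self, task: str) -> TextClassifier:
        return self._models[task]


def tag_document(registry: ModelRegistry, text: str) -> str:
    """Service-level code: unaware of which model version is registered."""
    return registry.resolve("text-classification").classify(text)
```

Swapping `KeywordClassifierV1` for `KeywordClassifierV2` is then a single `register` call, and `tag_document` does not change.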

Customized Egnyte AI Models Add Customer Value

Beyond general-purpose AI models, Egnyte integrates proprietary data models to train the AI models within each layer. Models learn from working with millions of documents that are generated by many thousands of customers across dozens of vertical industries. Based on more than 10 years of experience, this is where Egnyte creates a unique value for customers. Many of the techniques here are proprietary, but some examples include:

Customized Classes and Entities - Developed by Egnyte, based on experience with millions of documents and files across dozens of industries.
Fine Tuned Models - Developed to further adapt the general AI Models in the layer below to specific tasks that are required by our services.
Parameter Efficient Fine-Tuning - Used to fine-tune a small number of (extra) model parameters while freezing most parameters of the pre-trained Large Language Models (LLMs). Based on experience, this reduces processing footprint and improves performance.
Customized Models - Have been introduced and trained over a number of years for particular vertical industries. For example, building specification documents in the Architecture, Engineering, and Construction (AEC) industry or safety documents in the Life Sciences industry can be detected more accurately.
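To give a rough sense of why parameter-efficient fine-tuning shrinks the processing footprint, the sketch below counts trainable parameters for a low-rank adapter in the style of LoRA, one widely used PEFT technique. The article does not say which PEFT method Egnyte uses, so LoRA, the layer size, and the rank are assumptions for illustration.

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole d_out x d_in weight matrix is updated."""
    return d_out * d_in


def low_rank_adapter_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a LoRA-style adapter.

    The pre-trained weight W (d_out x d_in) stays frozen. Only two small
    matrices are trained: B (d_out x rank) and A (rank x d_in), whose
    product B @ A is added to W.
    """
    return rank * (d_out + d_in)


def trainable_fraction(d_out: int, d_in: int, rank: int) -> float:
    """Share of parameters that are trained relative to full fine-tuning."""
    return low_rank_adapter_params(d_out, d_in, rank) / full_finetune_params(d_out, d_in)


# Assumed example: one 4096 x 4096 projection layer with adapter rank 8.
EXAMPLE_FULL = full_finetune_params(4096, 4096)            # 16,777,216
EXAMPLE_ADAPTER = low_rank_adapter_params(4096, 4096, 8)   # 65,536
EXAMPLE_FRACTION = trainable_fraction(4096, 4096, 8)       # 1/256, about 0.39%
```

For this assumed layer, the adapter trains 1/256 of the parameters that full fine-tuning would, which is where the savings in memory and compute come from.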

Taken together, these “Egnyte-specific” enhancements provide a better user experience and make our systems faster by optimizing compute resources in our cloud. Verdicts become more accurate with better false positive and false negative rejection capabilities and scoring.

Most importantly, these capabilities make our customers' data more secure by detecting sensitive information to be protected and then protecting it with intelligent safeguards.

Egnyte also uses proprietary software to constantly tune the underlying models and provide a “sanity check” on outputs. Where 100% accuracy is impossible, such as with alert detection, Egnyte provides tunable scoring and notification options so that customers can customize the sensitivity of the reports and alerts.
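Tunable sensitivity of this kind is often implemented as a score threshold that the customer can move. The following is a minimal sketch under that assumption. The `Finding` type, the sensitivity names, and the threshold values are hypothetical and are not Egnyte's actual settings.

```python
from dataclasses import dataclass
from typing import List

# Assumed mapping: higher sensitivity means a lower score is enough to alert.
SENSITIVITY_THRESHOLDS = {"low": 0.9, "medium": 0.7, "high": 0.5}


@dataclass(frozen=True)
class Finding:
    """Hypothetical detection result with a confidence score in [0, 1]."""
    description: str
    score: float


def alerts_for(findings: List[Finding], sensitivity: str = "medium") -> List[Finding]:
    """Return only the findings that meet the chosen sensitivity threshold."""
    threshold = SENSITIVITY_THRESHOLDS[sensitivity]
    return [f for f in findings if f.score >= threshold]
```

A customer who sees too many false positives would move toward "low" sensitivity, while one who cannot afford to miss an event would move toward "high" and accept more noise.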

User Privacy When Working with AI

Egnyte takes the responsibility to protect our customers' data very seriously. Our architecture has been assembled and integrated to prevent leakage of customer data outside defined security boundaries. Even when data is passed to a service such as Azure OpenAI, it stays within a security boundary and is encrypted in flight. More importantly, the security boundary for that user is extended to the service only when needed and is retracted after the transaction completes. All user data is purged from the AI service between transactions and is never exposed to other users, to Egnyte’s employees, or to the AI service provider.
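The "extend, use, retract, purge" pattern described above resembles a scoped resource. The sketch below shows that shape with a Python context manager. `ExternalAISession` is a hypothetical stand-in, not an Azure OpenAI client, and encryption in flight is not modeled here.

```python
from contextlib import contextmanager
from typing import Iterator, Optional


class ExternalAISession:
    """Hypothetical session with an external AI service (illustration only)."""

    def __init__(self) -> None:
        self._payload: Optional[str] = None
        self.is_open = True

    def submit(self, text: str) -> str:
        if not self.is_open:
            raise RuntimeError("session is closed")
        self._payload = text  # data exists in the session only while it is open
        return f"processed {len(text)} characters"

    def purge(self) -> None:
        """Drop any held data and close the session."""
        self._payload = None
        self.is_open = False

    @property
    def holds_data(self) -> bool:
        return self._payload is not None


@contextmanager
def scoped_session() -> Iterator[ExternalAISession]:
    """Open a session for one transaction and always purge it afterwards."""
    session = ExternalAISession()
    try:
        yield session
    finally:
        session.purge()  # runs even if the transaction raises an exception
```

Because the purge sits in a `finally` clause, the session is emptied and closed whether the transaction succeeds or fails.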

Conclusion

Over the years, Egnyte has developed a sophisticated architecture that’s required to support continuing innovation. However, Egnyte doesn’t pursue advanced technologies just for the sake of staying current with the industry. Instead, each step has been focused on making the experience of our customers better. Accurate document classification helps our customers to locate and protect sensitive information in their environments. Optical Character Recognition helps them to locate sensitive information in photos and images for additional protection. These technologies, in turn, enable our customers to maintain compliance with privacy regulations and protect intellectual property. Meanwhile, technologies like Multivariate Anomaly Detection help to correlate widespread events and turn them into actionable alerts. This helps to prevent compromises of users’ accounts and company information. Taken together, our customers benefit from reduced administrative workload, simplified user workflows, and better document management while reducing overall costs. If you are interested in how Egnyte can help your organization become more effective and efficient, contact your Egnyte representative for more information.



Author

Amrit Jassal
