Teach Egnyte’s AI to Recognize Your Documents for Better Security

Egnyte has been developing and using AI technology with Machine Learning (ML) for quite some time. We use it internally to detect sensitive information for our customers so that policies can be put in place to protect that information, and we continue to find new ways to implement these models to better support our customers.

The Egnyte AI engine does this in several ways. First, the AI has been trained to detect sensitive information, such as phone numbers or addresses in documents, spreadsheets, and even images. However, this capability goes beyond simple pattern matching; it uses context clues in the surrounding text to detect sensitive information—similar to the way a human would make the same determination. For example, it can decide whether a string of nine digits is a harmless figure or a sensitive Social Security number that must be protected. 
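To make the idea of context clues concrete, here is a minimal sketch in Python. It is not Egnyte's implementation: the cue list, the 40-character window, and the function name `find_likely_ssns` are all assumptions made for illustration, and a trained model would learn such cues rather than hard-code them.

```python
import re

# Hypothetical context cues (an assumption for this sketch, not Egnyte's list).
SSN_CUES = ("ssn", "social security", "taxpayer")

# Nine digits, optionally grouped 3-2-4 with hyphens.
NINE_DIGITS = re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b")


def find_likely_ssns(text, window=40):
    """Return nine-digit strings whose nearby text contains an SSN cue."""
    lowered = text.lower()
    hits = []
    for match in NINE_DIGITS.finditer(lowered):
        # Look at a fixed number of characters on either side of the match.
        start = max(0, match.start() - window)
        end = min(len(lowered), match.end() + window)
        context = lowered[start:end]
        if any(cue in context for cue in SSN_CUES):
            hits.append(match.group())
    return hits


print(find_likely_ssns("Employee SSN: 123-45-6789 on file."))   # ['123-45-6789']
print(find_likely_ssns("Order total was 123456789 units."))     # []
```

The same nine digits are flagged in the first sentence and ignored in the second, purely because of the words around them.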

Egnyte can detect (and protect) hundreds of different types of personal information in this way. The platform’s AI can also:

  • Support a customer’s custom keyword lists if needed.
  • Detect unusual behaviors such as an unusual number of file accesses or logins. The system can then flag potentially malicious or dangerous activities for IT, helping to better secure your environment.
  • Detect document types. For example, using advanced heuristics, it can detect common formats for invoices, contracts, and personal resumes, all of which should be protected. 
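As a rough illustration of the unusual-behavior item in the list above, the sketch below flags a daily file-access count that sits far above a user's own history. The z-score approach, the threshold of 3.0, and the function name `is_unusual` are assumptions chosen for simplicity; the post does not say which technique Egnyte uses.

```python
from statistics import mean, stdev


def is_unusual(history, today, threshold=3.0):
    """Flag today's count if it is far above the historical mean.

    `history` is a list of past daily counts; `threshold` is the number
    of standard deviations treated as unusual (an illustrative default).
    """
    if len(history) < 2:
        return False  # not enough data to estimate a spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return today > mu  # any increase over a perfectly flat history
    return (today - mu) / sigma > threshold


daily_file_accesses = [20, 25, 22, 18, 24, 21, 23]
print(is_unusual(daily_file_accesses, 24))    # False
print(is_unusual(daily_file_accesses, 400))   # True
```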

We’re excited to share that the last capability on that list, document type detection, has been extended so customers can train the AI on their own document formats.

How It Helps 

Suppose your organization routinely generates a document type that is unique to the company. That data needs protection from unauthorized access, but, in the past, it wouldn’t be caught by Egnyte automatically because it doesn’t follow a standard format used across most businesses. 

Such a document might be a routine progress report from the Development team, a product design spec, machine settings in a factory, or other information. It may also be a standardized format that contains personal information like personnel reviews or complaints filed with Human Resources. Or, it may even be a routine order with a supply chain partner or client. 

In these cases, the documents don’t necessarily follow a template that’s common across businesses, but they do typically follow a standardized format established by your organization. With this latest update to the platform, the Egnyte AI can now be trained in minutes to recognize and protect your unique document formats, too.

How It Works

To illustrate how easy this is, let’s take a quick look at how you can train Egnyte’s AI to recognize your document types. From the Egnyte Secure & Govern screen, go to Settings and select Content Classification. A new menu option called “Custom Document Classes” appears. From there, click Create New Class.

A screenshot of the Egnyte UI.

From there, fill out the name and description for the new class. Then, simply upload some example documents to Egnyte by dragging and dropping them into the window. These will be the training documents Egnyte’s AI will analyze.

A screenshot of the Egnyte UI.

We recommend you provide 20 documents to fine-tune the analysis. However, if the documents are very similar, you may be able to get by with far fewer, perhaps as few as five to 10. The system analyzes the documents and finds similarities in format, structure, terms, etc. It then builds a model so it can recognize similar documents in the future.
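For readers curious about what "finding similarities in terms" can look like, here is a deliberately simple sketch: it pools the word counts of the sample documents into one profile and scores new text by cosine similarity against it. This is a stand-in technique for illustration only. The function names and sample texts are hypothetical, and Egnyte's actual model also considers format and structure, which this sketch ignores.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profile(samples):
    """Pool the term counts of all training samples into one profile."""
    profile = Counter()
    for text in samples:
        profile.update(term_counts(text))
    return profile


def score(profile, text):
    """Similarity of a new document to the class profile, from 0.0 to 1.0."""
    return cosine(profile, term_counts(text))


# Hypothetical training samples for a "progress report" class.
samples = [
    "Weekly progress report: milestone status, blockers, next steps",
    "Weekly progress report: sprint status, risks, next steps",
]
profile = build_profile(samples)
print(score(profile, "Weekly progress report: release status, blockers, next steps"))
print(score(profile, "Lunch menu: soup, salad, sandwiches"))
```

The report-like text scores close to 1.0, while the lunch menu shares no terms with the samples and scores 0.0.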

A screenshot of the Egnyte UI.

The status indicator shows that the file has been uploaded and included in the sample group by the AI for analysis. Once the files are uploaded, analysis takes only a few minutes (as opposed to days with other systems).

A screenshot of the Egnyte UI.

If some documents don’t seem to align with the new model, you’ll be given the opportunity to remove them. This is important because files that don’t match could negatively impact the accuracy of the training set.  

The similarity score shows how closely the documents match. In most cases, documents may be 95% similar, which provides very high confidence in detecting matches. In other cases, such as the one shown above, you may see lower scores, which indicate more differences between individual documents. You can add more documents to the sample, remove lower-scoring documents, or both to improve confidence in the results.
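The review step described above amounts to a simple triage over per-document scores. The sketch below shows one way to express it, assuming scores are fractions between 0 and 1. The 0.95 cutoff echoes the 95% figure mentioned above but is only an illustrative default, and the file names and the function `split_by_similarity` are hypothetical.

```python
def split_by_similarity(scores, threshold=0.95):
    """Separate samples to keep from samples worth reviewing.

    `scores` maps a document name to its similarity score (0.0 to 1.0).
    Returns (keep, review), with `review` sorted lowest score first.
    """
    keep = {name: s for name, s in scores.items() if s >= threshold}
    review = sorted(
        (name for name, s in scores.items() if s < threshold),
        key=scores.get,
    )
    return keep, review


# Hypothetical scores for an uploaded sample set.
scores = {
    "report_jan.docx": 0.97,
    "report_feb.docx": 0.96,
    "meeting_notes.docx": 0.61,
    "report_mar_draft.docx": 0.88,
}
keep, review = split_by_similarity(scores)
print(sorted(keep))   # ['report_feb.docx', 'report_jan.docx']
print(review)         # ['meeting_notes.docx', 'report_mar_draft.docx']
```

Listing the weakest matches first puts the documents most likely to hurt the training set at the top of the review list.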

After the Document Type has been generated, the documents can be found by Egnyte, and you can create a policy to manage those documents. For example, Egnyte provides a tool to create Content Safeguard policies to prevent unauthorized sharing of documents or Content Lifecycle Policies to automatically perform Retention, Archival, and Deletion on the documents. 

Here, a policy is being created to find and protect cybersecurity artifacts:

A screenshot of the Egnyte UI.

From there, click Configure in the “Document Type” panel. The new custom document class will appear as an option. Once you select it and hit Save, it will be added to, and managed by, this policy.

A screenshot of the Egnyte UI.

And that’s it. In a matter of minutes, you’ve improved your organization’s ability to track and secure sensitive information.

How to Get Started

Teaching Egnyte’s AI to recognize your document types is quick, simple, and very useful in managing your sensitive information. You can customize policies and rules that help protect documents by preventing unauthorized sharing or disclosure.

You can receive alerts when these sensitive documents are misused. You can even manage the lifecycle of these documents with automated Retention, Archiving, and Deletion (RAD) policies. Most importantly, all of this happens without interfering with your users’ workflows or burdening your IT team with additional tasks.

Contact your Egnyte representative for more information on how you can get this capability for your organization.

David Buster

Senior Manager, Security and Governance; Product Marketing
