
Egnyte Protect relies heavily on machine learning to defend you against insider attacks. Powered by Google’s Document Understanding AI, which was just announced at Google Next 2019, our advanced insider attack detection identifies access anomalies that could indicate a data leak.

Insider attacks are common

Insider attacks are one of the most serious risks in our connected world. In 2018, 53% of companies surveyed had confirmed insider attacks against their organization in the previous 12 months.

Data theft

One serious type of insider attack is data theft, in which an employee downloads a substantial amount of data. A typical case is a disgruntled or laid-off employee stealing data for their next employer.

Egnyte Protect addresses this problem by alerting administrators when a user accesses an unusually large number of files. We will alert you if your employee, John Smith, suddenly downloads 1,000 files today while he usually accesses only around 25 files per day.
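As a rough illustration of the idea, the check can be thought of as comparing today's count against a user's own history. The sketch below is a simplified assumption, not Egnyte Protect's actual model: the function name, the z-score rule, and the threshold of 3 standard deviations are all hypothetical choices made for the example.

```python
from statistics import mean, stdev

def is_volume_anomaly(daily_counts, today_count, threshold=3.0):
    """Flag today's file count if it sits far above the user's usual level.

    daily_counts: past per-day file access counts for one user.
    threshold: hypothetical cutoff, in standard deviations above the mean.
    """
    if len(daily_counts) < 2:
        return False  # not enough history to estimate a baseline
    baseline = mean(daily_counts)
    # Avoid dividing by zero when the history is perfectly flat.
    spread = max(stdev(daily_counts), 1.0)
    z_score = (today_count - baseline) / spread
    return z_score > threshold

# John usually accesses around 25 files per day.
history = [22, 27, 25, 24, 26, 28, 23]
print(is_volume_anomaly(history, 1000))  # a sudden 1,000-file day
print(is_volume_anomaly(history, 27))    # an ordinary day
```

With this history, the 1,000-file day is flagged and the 27-file day is not. A production system would likely also account for weekly seasonality and for users with very little history.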

Not all data is created equal

Though the volume of user activity is a useful signal, the consequences of data theft vary depending on the type of documents being stolen. For a startup, patent application drafts may be of extreme importance, while RFIs could be crucial for consulting companies. We need a way to identify the business document type (not just the MIME type) of a file.

Automatic document type detection with machine learning

With the exponential rise in the number of files in corporate file repositories, it is unrealistic to assume that document types will be manually labeled by humans. Organizations typically handle this problem with corporate document templates, though such internal guidelines can be easily side-stepped. Therefore, there is a need for an automated way to identify the document type of a file without any human involvement.

With the help of Google’s Document Understanding AI, we can automatically classify documents by type, such as contracts, NDAs, employee lists or patent applications. By providing enough examples of such types of documents, classifiers built with Google AutoML Natural Language can be used to identify document types in large file repositories.

Business insights about data leaks

With this capability, Egnyte Protect’s anomaly models can also incorporate the business document types being accessed in file download events. We can not only inform administrators when an abnormally high number of files has been downloaded, but we can also provide information about the types of downloaded files: how many contracts, how many NDAs, how many employee lists, etc.
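To make the idea of a per-type breakdown concrete, here is a minimal sketch. It assumes each download event has already been labeled with a document type by a classifier; the event format, the file paths, and the function name are hypothetical and only serve the example.

```python
from collections import Counter

def summarize_by_type(downloads):
    """Count downloaded files per business document type.

    downloads: iterable of (file_path, document_type) pairs, where the
    document type is assumed to come from an upstream classifier.
    Returns (document_type, count) pairs, most frequent first.
    """
    counts = Counter(doc_type for _path, doc_type in downloads)
    return counts.most_common()

# Hypothetical labeled download events for one alert.
events = [
    ("/legal/acme_msa.pdf", "contract"),
    ("/legal/acme_nda.pdf", "NDA"),
    ("/legal/globex_msa.pdf", "contract"),
    ("/hr/staff_2019.xlsx", "employee list"),
    ("/legal/initech_msa.pdf", "contract"),
    ("/legal/globex_nda.pdf", "NDA"),
]

for doc_type, count in summarize_by_type(events):
    print(f"{count} x {doc_type}")
```

An administrator reading the alert then sees not just that six files were downloaded, but that three of them were contracts, two were NDAs, and one was an employee list.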

Detecting espionage

Another serious type of insider attack is espionage, i.e., accessing confidential data that is not relevant to a user’s role for the benefit of an external actor.

For espionage detection, the number of downloaded documents is irrelevant. What matters is the ability to detect that a user is accessing types of files different from those they usually access.

Profiling users’ download activity using machine learning

Based on the classifiers built with Google AutoML Natural Language, we can build a personalized profile of the types of files a given user usually accesses. For instance, John might access about 4 invoices, 10 NDAs, and no patent applications per week.

We can then use these classifiers to verify that a user’s file access profile remains stable. If it does not, the user is downloading files that are unusual relative to their profile, possibly to access confidential data not relevant to their role.

Continuing our earlier example, let’s assume that John suddenly accesses 152 NDAs and 110 patent applications. Egnyte Protect would then inform the administrator about the shift in access profile as a potential espionage attack. If John is indeed accessing patent applications for espionage, Egnyte Protect will alert you in a timely manner and help secure your organization.
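One simple way to quantify such a shift is to compare the mix of document types rather than the raw counts. The sketch below uses total variation distance between the usual and the current distribution. This measure, the function names, and the alert threshold of 0.3 are assumptions chosen for illustration, not a description of how Egnyte Protect computes it.

```python
def to_distribution(type_counts):
    """Convert per-type access counts into proportions that sum to 1."""
    total = sum(type_counts.values())
    if total == 0:
        raise ValueError("profile has no accesses")
    return {doc_type: n / total for doc_type, n in type_counts.items()}

def profile_shift(usual_counts, current_counts):
    """Total variation distance between two access profiles.

    Returns a value between 0 (identical mix of document types)
    and 1 (no overlap at all). Volume is ignored on purpose.
    """
    usual = to_distribution(usual_counts)
    current = to_distribution(current_counts)
    all_types = set(usual) | set(current)
    return 0.5 * sum(
        abs(usual.get(t, 0.0) - current.get(t, 0.0)) for t in all_types
    )

# John's usual week versus the suspicious one from the example.
usual_week = {"invoice": 4, "NDA": 10, "patent application": 0}
suspicious_week = {"NDA": 152, "patent application": 110}

ALERT_THRESHOLD = 0.3  # hypothetical cutoff
shift = profile_shift(usual_week, suspicious_week)
print(f"shift = {shift:.2f}, alert = {shift > ALERT_THRESHOLD}")
```

Here the shift is about 0.42, driven mostly by patent applications going from none of John's accesses to roughly 42% of them. Because the counts are normalized first, a user who simply has a busier week with the same mix of documents produces a shift close to zero.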
