AI at Egnyte: The First Ten Years

July 17, 2023

In the 1960s, Theodore Levitt published his now famous treatise in the Harvard Business Review in which he warned CEOs against being “product oriented instead of customer oriented.”

Among the many examples cited was the buggy whip industry. As Levitt wrote, “had the industry defined itself as being in the transportation business rather than in the buggy whip business, it might have survived. It would have done what survival always entails — that is, change.”

I sometimes cite this article when people ask me how we arrived at the unconventional set of solutions we offer today. Egnyte, after all, had its beginnings as a means for migrating on-premises file shares to the cloud, whereas today our solutions go way beyond what a legacy file share was ever designed to do. In fact, it’s a difference in kind, not degree.

For example, beyond just helping people share and secure files, our platform enables IT and security teams to gain visibility into the vast amounts of data buried within their company content. To date, these AI-powered solutions have helped tens of thousands of companies transform everything from their data security posture management and critical business processes to their IT cost structures.

We got here by consistently refusing to define our scope narrowly, unlike those buggy whip manufacturers in Levitt’s article. Instead, we took the uncomfortable path of focusing on wider customer jobs-to-be-done (and adapting to technological shifts surrounding those jobs).

The first big inflection began around 2013 when we started to notice firsthand the exponential growth of unstructured data. This presented both a challenge and an opportunity.

The challenge was that more digital data meant greater sprawl, data security and compliance risk, and increased complexity for knowledge workers trying to locate information in the course of getting their jobs done.

The opportunity was that more digital data meant more machine learning training sets were available. Coupled with significant improvements in algorithms and compute power, this paved the way to unlock vast troves of knowledge previously buried in offline (and, for that matter, online) file formats.

The First Wave of AI at Egnyte: 2013-2016

Late in 2013, while continuing to help customers move their file server workloads to our cloud, Egnyte set about launching an AI-powered classification service, which would be capable of gleaning insights into unstructured data stored on any cloud.

At the center of the new offering was an intelligence engine enabling customers to identify and classify sensitive data such as PII, PCI, PHI, and IP, tag that data, run alerts, and apply policies (such as retention, archival, or access rights management policies) based on the metadata tags.
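To make the classify, tag, and apply-policy flow concrete, here is a minimal sketch. It is not Egnyte's engine: the regular expressions, the tag names used as dictionary keys, and the policy names (`restrict_access`, `retain_7_years`) are all hypothetical stand-ins, and a real classifier would rely on far richer signals than two patterns.

```python
import re
from dataclasses import dataclass, field

# Hypothetical detection patterns, for illustration only.
PATTERNS = {
    "PII": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),       # SSN-like number
    "PCI": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),  # 16-digit card-like number
}

# Hypothetical mapping from metadata tag to the policy it triggers.
POLICIES = {
    "PII": "restrict_access",
    "PCI": "retain_7_years",
}


@dataclass
class Document:
    name: str
    text: str
    tags: set = field(default_factory=set)


def classify(doc: Document) -> Document:
    """Attach a metadata tag for every pattern that appears in the text."""
    for tag, pattern in PATTERNS.items():
        if pattern.search(doc.text):
            doc.tags.add(tag)
    return doc


def policies_for(doc: Document) -> list:
    """Look up the policies implied by the document's tags."""
    return sorted(POLICIES[tag] for tag in doc.tags if tag in POLICIES)
```

The point of the design is that policies key off the tags rather than the raw content, so a retention or access rule can change without re-scanning every file.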

We began receiving recognition for this innovation in early 2016 when Gartner listed Egnyte as a representative File Analysis provider. Note: None of the “buggy whip” providers were mentioned in this report or the ones that were released in the subsequent years.

Our assumption was that customers who found sensitive data sprawled across their other file systems would, naturally, want to move that data to a safe and well-lit environment, i.e., Egnyte’s cloud. In practice, though, this was only partly true (for example, when a statutory, regulatory, or contractual obligation required them to do so). However, even when they didn’t move that data over to Egnyte’s repository, customers always found important information they didn’t know existed, often mislabeled and misplaced, buried in their own data. So, we knew we were on to something.

The Second Wave of AI at Egnyte: 2016-2019

The first-generation intelligence engine was built on a combination of Machine Learning and Natural Language Processing, which we have improved upon ever since through constant fine-tuning as newer modeling paradigms were introduced.

Where the first wave solution focused on automating data classification, the second wave targeted a more expansive set of use cases, including:

● Preventing exposure of sensitive information across file repositories.

● Analyzing and taking action on redundant, obsolete, or trivial information.

● Speeding up the delivery of the right files in the office, from home, or on the go.

In addition to improving upon the original ML and NLP models as noted above, it was during this time that we started to employ ‘Deep Learning’ techniques to predict the expected behavior of a user. These models have enabled Egnyte to flag unusual activity that, for instance, might be indicative of an employee preparing to leave the company for a competitor with valuable information in tow, or a piece of malware preparing to wreak its own form of havoc.
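The underlying idea of comparing a user's current activity against their expected behavior can be illustrated with a far simpler stand-in than a deep learning model. The sketch below swaps in a plain statistical baseline (a z-score over a user's past daily download counts); the function name, the threshold of 3.0, and the choice of download counts as the signal are all assumptions made for illustration.

```python
from statistics import mean, stdev


def is_unusual(history, today, threshold=3.0):
    """Flag today's count if it sits more than `threshold` standard
    deviations above the user's own historical mean."""
    if len(history) < 2:
        # Not enough history to estimate a spread; do not flag.
        return False
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly flat history: any increase is a deviation.
        return today > mu
    return (today - mu) / sigma > threshold
```

For example, a user who normally downloads around ten files a day and suddenly pulls four hundred would be flagged, while a day with eleven downloads would not. A learned model can capture much subtler patterns (time of day, file types, locations), but the flagging step follows the same shape: predict what is normal, then measure the gap.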

Also around this time, the GDPR was coming into effect. So, Egnyte’s AI started becoming a vital tool for customers to find documents containing PII for a specific data subject for the purpose of complying with disclosure and deletion requests. This information was relatively easy to locate in structured databases, but locating personal identifiers within files was like finding a “needle in a haystack” (something our AI was able to do). Sometimes, the PII was even stored in image files (as in the case of a driver’s license), which required an even more advanced set of techniques (Machine Vision) in addition to the usual text-based classification patterns.

Last but not least, Egnyte introduced several AI-powered “invisible apps” that operate in the background, without requiring any user intervention. One such app generates content recommendations within and around search queries, and another is the “Smart Cache” app (also employing Deep Learning techniques), which preloads large files locally based on anticipated user need at a given time and place.

The Third Wave of AI at Egnyte: 2019-2022

Beginning around 2019, we started to introduce our third wave of AI applications. These included advanced classification to detect document types. For example, Egnyte can now recognize whether a particular PDF is a resume versus a contract versus a sales proposal. These classifiers are used both to map documents to labels and to apply downstream policies, such as data retention and minimization policies.

During this time, we also began using BERT to generate embeddings (representations) of documents provided by our customers, so that customers could train their own custom models, on their own data, for their own particular purposes.

Egnyte’s ability to recognize document types (in general) as well as custom types (unique to a specific customer) has improved the efficacy of all of our models, much in the same way that it’s easier to recognize a long-lost classmate at a college reunion than on a random subway in Tokyo. Context matters for AI, just as it does for humans.

Speaking of humans, a key part of our strategy is that we’ve constantly fine-tuned our models over the years using actual human feedback. In cases of inconclusive detection, Egnyte will sometimes ask a user for confirmation of a verdict, and then use that input to further improve the model. Additionally, in recent years, we have started employing subject-matter experts in each of our main industry verticals, and part of their job is to help educate our models on domain-specific terminology.
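A feedback loop of this kind can be sketched in a few lines. This is an illustrative outline only: the `FeedbackLoop` class, the 0.4 and 0.8 confidence thresholds, and the `ask_user` callback are hypothetical names and values, not a description of how Egnyte implements it.

```python
from dataclasses import dataclass, field


@dataclass
class FeedbackLoop:
    low: float = 0.4    # hypothetical: at or below this, treat as not sensitive
    high: float = 0.8   # hypothetical: at or above this, treat as sensitive
    labeled: list = field(default_factory=list)  # examples saved for retraining

    def decide(self, doc_id, score, ask_user):
        """Return a verdict, asking a human only when the score is inconclusive."""
        if score >= self.high:
            return True
        if score <= self.low:
            return False
        # Inconclusive band: defer to the user and keep the answer
        # as a labeled example for a later fine-tuning pass.
        verdict = bool(ask_user(doc_id))
        self.labeled.append((doc_id, verdict))
        return verdict
```

Only the inconclusive cases reach a person, which keeps the number of interruptions low while still producing labeled examples exactly where the model is least certain.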

Looking Ahead: From the Fourth…to Nth Wave

Taking a page from Levitt’s playbook, Egnyte has always prided itself on being more customer-oriented than product (category) oriented. To this day, we spend a much higher percentage of revenues than our peers on R&D, which is driven disproportionately by customer feedback.

This has sometimes resulted in us being miscategorized and misunderstood (it probably would have been much easier, in the 2010s, had we stuck to being a “file server in the cloud”), but it’s the only way we know how to operate - following the customer, not the “category”.

It’s the reason why our solutions go way beyond what a legacy file share was ever capable of, and also why, while others are just beginning their AI journeys, we’re entering our second decade of AI-fueled product development.

Watch out for many more AI-related announcements in the weeks and months ahead. While we still don’t know exactly where our customers will take us in the future, I am now more excited and confident than ever in our direction.
