Everything you need to know about managing data in the cloud

Even the biggest and most security-obsessed corporates (aka the banks) are adopting a cloud-first policy for managing and storing data. We examine how we got to where we are today and consider what an enterprise needs in order to migrate its data to the cloud.

At the turn of the century, ‘cloud’ was not part of the nomenclature. The few vendors in the space were known as managed service providers (MSPs) offering hosted solutions. Corporates found it extremely hard to trust anyone other than themselves with a copy of their data. But the lure of switching to a cloud provider became irresistible for smaller companies, which lacked the resources to manage their own servers and infrastructure efficiently.

The sales pitch for storing data in the cloud was simple: “You don’t put your money under your mattress, you put it in the bank. Why manage your data on your own server? Give it to a specialist archive cloud provider.”

Global Relay has now been helping companies access the cloud for 22 years, since the days when on-premise enterprise archives were the norm. The early adopters were often less sophisticated firms that recognized they now had the option of using these simpler services. Archive was an ideal candidate for the cloud: it is self-contained and, despite the complications of the multitude of data types and sources that require connectors for a clean and complete migration, it is a much easier and less expensive approach than continued on-premise management.

Insufficient controls

In the early 2000s, there were not as many data types to be captured and archived as there are now. Finance firms started out storing their own emails and Bloomberg messages. Communications from public providers such as Yahoo, Hotmail, and AOL followed, but these were problematic because they were controlled by the end user. An individual could set up archiving for Bank of America email at Yahoo, then move to Citi and still have access to their Bank of America archive. There were not sufficient controls to give financially regulated firms confidence in archiving public messaging.

As message types proliferated in finance, it became harder for in-house solutions to layer each new feed into the typical enterprise vault. This trend has only continued: email led to social media, then mobile phone records and beyond.

Companies managing this themselves would eventually be unable to cope with all the different feeds to archive, manage, and supervise. This complexity resulted in data gaps and outages, uncovered when improperly produced data was passed to regulators and the courts during litigation. That was less worrying to the IT team than to the legal, compliance, and risk professionals who made the determination to outsource. There was no more tolerance for data gaps, badly reconciled data, or weak methods.

The maturity of reconciliation, and the ability of sophisticated vendors to deliver it, was a big driver behind outsourcing. Reconciliation requires cooperation between the vendors who provide the messaging and archive services and the customers who consume the messaging. Specialist archive vendors are used to working with messaging platforms to ensure that the data matches, and that it can be recovered if it does fall through a gap.

Cooperation provides the reconciliation component, which is the number one requirement for meeting the stricter regulatory imperatives in industries where books and records must be retained for specified periods and produced to regulators on request. The capture and reconciliation of all your data is a fundamental regulatory requirement; quality, accuracy, and completeness are impossible without a strong reconciliation and controls methodology.
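
To make the idea concrete, here is a minimal sketch of what feed-to-archive reconciliation can look like in principle. The function and data shapes are illustrative only and are not Global Relay’s or any vendor’s actual implementation.

```python
# Illustrative sketch of feed-to-archive reconciliation (not any vendor's actual
# implementation): compare the message IDs the source platform says it delivered
# against the IDs the archive actually ingested for the same period.

def reconcile(feed_manifest: set[str], archived_ids: set[str]) -> dict:
    """Return the gaps on each side so missing messages can be re-requested."""
    return {
        "missing_from_archive": sorted(feed_manifest - archived_ids),   # delivered but never ingested
        "unexpected_in_archive": sorted(archived_ids - feed_manifest),  # ingested but not on the manifest
        "matched": len(feed_manifest & archived_ids),
    }

# Example: the messaging vendor's daily manifest vs. what the archive stored.
manifest = {"msg-001", "msg-002", "msg-003", "msg-004"}
archived = {"msg-001", "msg-002", "msg-004"}

print(reconcile(manifest, archived))
# {'missing_from_archive': ['msg-003'], 'unexpected_in_archive': [], 'matched': 3}
```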

Most companies simply reach a stage where the complexity and risk of doing this internally outweigh any cost and control considerations. Once this point is reached, the key questions are always: “Can we migrate everything to you? How will you deal with all the connectors?”

Many archive companies started to build data connectors. As this specialized vendor sector consolidated, the businesses that thrived were those that invested in both archive and connector technology.

Migrating data from a traditional on-premise approach to the cloud shifts spending from capital expenditure to operational expenditure. There is no more lumpy hardware investment and no amortization policy to manage.

Labor costs

The debate over whether cloud is more expensive continues, and there are reasonable arguments on both sides. It is important to consider the labor costs attached to maintaining your own infrastructure – these usually tip the balance in favor of cloud. If you then factor in intangible but potentially serious risk and security issues, the decision becomes simpler. Established cloud companies have mature security strategies backed by publicly referenced controls under standards such as ISO 27001.

A monthly recurring subscription to a cloud service is no different from a phone or cable bill. Subscription businesses are appealing because they are more recession-proof than traditional pay-per-use models. Telcos rarely go out of business, and a subscription to software is no different.

In the beginning, the biggest impediment to customer acquisition was the trust needed to hand over data to a new provider. No customer was comfortable being the first to hand over its data. But once that hurdle was crossed, with the help of references and the assurance gained from meeting management, it became a question of the data’s ongoing security and access to it.

Encryption for security

The next significant challenge was encryption. Two-factor authentication ensures that two layers of security are in place to restrict access to data, and its arrival prompted the introduction of RSA keys. RSA is a public-key cryptosystem in which the private key generates digital signatures and the corresponding public key verifies them. It was the very beginning of internal controls.
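
To make the private/public split concrete, here is a minimal sketch using the widely available Python cryptography package. It is illustrative only and not tied to any particular vendor’s controls.

```python
# Minimal illustration of the RSA split described above: the private key signs,
# the public key verifies. Requires the third-party "cryptography" package.
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

message = b"archived record: trade confirmation #1234"

# Sign with the private key (kept secret by its owner).
signature = private_key.sign(
    message,
    padding.PSS(mgf=padding.MGF1(hashes.SHA256()), salt_length=padding.PSS.MAX_LENGTH),
    hashes.SHA256(),
)

# Verify with the public key (safe to distribute); raises InvalidSignature if tampered with.
public_key.verify(
    signature,
    message,
    padding.PSS(mgf=padding.MGF1(hashes.SHA256()), salt_length=padding.PSS.MAX_LENGTH),
    hashes.SHA256(),
)
print("signature verified")
```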

Controls evolved in parallel with an assortment of new certifications related to data management and verification, such as SOC 1, SOC 2, and then ISO 27001. These control frameworks became part of the fabric for hosted vendors: they defined the security controls needed to ensure vendors were acting on best practice, and they provided the best source of independent evaluation through exacting audit.

Eventually, best practice for encryption demanded that vendors no longer have access to the keys. This evolved into the hardware security module (HSM) that underpins the cloud today. The HSM manages a key that can never leave it; the key only exists while it remains in the HSM, and without physical access to the HSM the data can never be decrypted. Providers now offer HSM technology and key management services that assure customers their data cannot be decrypted without their keys.
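
The usual pattern behind this is envelope encryption: each record is encrypted with a data key, and that data key is itself wrapped by a master key held inside the HSM. The sketch below only simulates the idea; HsmSimulator is a hypothetical stand-in for a real HSM or key management interface (for example PKCS#11 or a cloud KMS), not an actual product API.

```python
# Conceptual sketch of envelope encryption with an HSM-held master key.
# HsmSimulator is a hypothetical stand-in: in production, wrap_key/unwrap_key
# would be calls into a real HSM, and the master key would never be visible
# to application code at all.
from cryptography.fernet import Fernet


class HsmSimulator:
    def __init__(self):
        self._master_key = Fernet.generate_key()  # in a real HSM this never leaves the device

    def wrap_key(self, data_key: bytes) -> bytes:
        return Fernet(self._master_key).encrypt(data_key)

    def unwrap_key(self, wrapped_key: bytes) -> bytes:
        return Fernet(self._master_key).decrypt(wrapped_key)


hsm = HsmSimulator()

# Encrypt a record with a fresh data key, then keep only the wrapped copy of that key.
data_key = Fernet.generate_key()
ciphertext = Fernet(data_key).encrypt(b"confidential archived message")
wrapped_data_key = hsm.wrap_key(data_key)
del data_key  # only the wrapped key is stored alongside the ciphertext

# Decryption requires the HSM to unwrap the data key first.
plaintext = Fernet(hsm.unwrap_key(wrapped_data_key)).decrypt(ciphertext)
print(plaintext)
```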

The tug of war between public and private cloud continues. There has been a noted concentration of service among Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). Canalys estimates that the three biggest providers now hold 58% of the cloud infrastructure market (AWS 32%, Azure 19%, GCP 7%).

Some market commentators are beginning to raise concerns about the oligopoly power in the hands of the Big Three, with cost control the chief concern. As an example, Apple is expected to spend $300m on GCP in 2021, a 50% jump on its 2020 spend. It already uses AWS to help handle iCloud data storage.

Analytics and AI

Archive vendors initially had to work out how to charge for their service, and some chose to price by storage ($xxx per gigabyte). This approach was not popular because the ultimate cost was unpredictable, so vendors moved to user-based models and charged $xx per user per month.

Charging models have now gone full circle, with a return to the old methods driven by the arrival of AWS and Azure, which charge for both storage and CPU, taking a bite at both ends!
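
A back-of-the-envelope comparison shows why the choice of model matters. The rates and volumes below are invented placeholders, not any vendor’s actual prices; they simply show how the bill scales under each model.

```python
# Toy comparison of the pricing models described above. All rates are made-up
# placeholders purely to show how each model scales.
users = 500
storage_gb = 20_000          # total archived data
cpu_hours = 3_000            # monthly compute for search/analytics

per_gb_rate = 0.05           # $/GB/month   (storage-based model)
per_user_rate = 15.00        # $/user/month (user-based model)
cloud_gb_rate = 0.03         # $/GB/month   (public cloud style: storage...)
cloud_cpu_rate = 0.40        # $/CPU-hour   (...plus compute)

print(f"storage-based: ${storage_gb * per_gb_rate:,.2f}")
print(f"user-based:    ${users * per_user_rate:,.2f}")
print(f"storage + CPU: ${storage_gb * cloud_gb_rate + cpu_hours * cloud_cpu_rate:,.2f}")
```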

The rise of analytics and artificial intelligence (AI) has increased the demand for processing power and for larger metadata stores beyond the core data. When comparing business models and cloud maturity, public cloud is getting very expensive: the market wants AI and real-time analytics that are extremely CPU-intensive. A private cloud provider like Global Relay can offer the CPU processing power customers need at 25% of the cost of a public cloud provider. The business has the scale required to do this, rather than resorting to the public cloud route that many of its competitors are taking. This offers a huge commercial advantage and cost savings that can be passed on to customers, and the lower service cost is complemented by more mature and sophisticated technology.

Veterans of the digital age still remember dial-up, though not fondly! Internet connectivity has evolved fast, and gone are the days when archive data providers were sending hard drives by courier all over the planet in encrypted cases; at the time, there was no way to transfer encrypted data over the internet at any scale.

Fiber on telco networks became commonplace around 10 years ago, and bandwidth began to increase considerably. Cloud migration today is facilitated by this incredible capacity to pass data over the internet. Global Relay’s own connectivity evolution has been powered by its data center, which boasts twelve 10-gigabit connections. So many layers of development have combined with mobile chip technologies and mobile bandwidth to offer advances in technology and consumer solutions that were unthinkable a decade ago.

Data protection across borders

The expectation now is the ability to access any and every piece of corporate communication in a second through a handheld device: an API-driven mobile search query should return an instant result. The smartest person in the room could be the one with the fastest access to the most data, best organized and analyzed by the most refined AI models. There is a lot at stake when competitive advantage is the priority.
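
As an illustration of what such an API-driven query might look like, the sketch below calls a hypothetical archive search endpoint. The URL, parameters, fields, and token are placeholders and do not describe any real vendor’s API.

```python
# Hypothetical example of an API-driven archive search; the endpoint, parameters,
# response fields, and token are placeholders, not a real product API.
import requests

ARCHIVE_SEARCH_URL = "https://archive.example.com/api/v1/search"  # placeholder URL

response = requests.get(
    ARCHIVE_SEARCH_URL,
    params={"q": "project falcon", "from": "2021-01-01", "to": "2021-03-31", "limit": 25},
    headers={"Authorization": "Bearer <access-token>"},  # placeholder credential
    timeout=10,
)
response.raise_for_status()

# Print a one-line summary per hit, assuming the placeholder response shape.
for hit in response.json().get("results", []):
    print(hit.get("timestamp"), hit.get("sender"), hit.get("subject"))
```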

But the current state is the product of a lengthy history that has taken time to evolve and to become normalized, secure, and reliable.

The past reveals corporate indifference to data sovereignty and ownership. Legal recognition and the requirement for books and records were not really concepts until the turn of the century. Regulation around data privacy has changed dramatically in this decade as consumers and lawmakers have recognized the value of data and the need to protect it. In tandem, a number of highly regulated sectors such as finance have created prescriptive rules stating which communications count as business correspondence and must be retained to meet recordkeeping requirements.

But the road toward unified data protection has been a bumpy one, and there are still notable variances across national borders (EU to US being the best example) that conflict with the proposition of a globally accessible public cloud.

Most private cloud companies exist for their customers: the customer owns its data and the provider has no access to it. But not all data houses have the same philosophy or business model. Some scrape it, analyze it, access it, and even resell it, and privacy can be jeopardized. Navigating this terrain as a cloud company, with new privacy rules in each country, has become very complicated. Cloud vendors need an in-depth understanding of every country in which they deliver services to cover all the legal, regulatory, and privacy angles.