Moving Data is Expensive

Two hundred years ago, it could take months to transport a letter between or across continents.

Then came the telephone, and one could place a call and communicate information in real time. Gradually, the connectedness of the world placed heavy demands on data access. Achieving data access through traditional approaches, such as moving or copying data to where the compute is, carries costs, complexity, and delays. Organizations tend to address the symptom of the problem, “how to move more data faster over distance,” when they should be asking “how to achieve real-time data access in support of applications without requiring data to be moved or copied.”

There are three main sources of cost when moving or copying data across distance: the cost of an underutilized network, the cost of productivity lost while data is being moved or copied, and the cost of storing duplicates of data in multiple locations.

Underutilized Network

It is common to deploy a 1 Gigabit or 10 Gigabit network connection between two sites. The longer the distance between the two sites, the more latency should be expected. Typically, an application using storage located in the same datacenter may expect latency in the single-digit milliseconds, but when reaching over distance, latency can run to tens or hundreds of milliseconds. The greater the latency, the lower the achievable throughput. This is due to the use of TCP/IP over distance. TCP was designed for accuracy and resiliency at the cost of timeliness: a sender can only have a limited amount of unacknowledged data in flight, so every additional millisecond of round-trip time is time spent waiting rather than sending. As a result, the very features that make TCP reliable create network inefficiencies over distance.
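The relationship between latency and throughput can be illustrated with a minimal sketch. It assumes the common rule of thumb that a single TCP connection cannot exceed its window size divided by the round-trip time; the 1 MiB window, the round-trip times, and the function name are hypothetical values chosen for illustration and do not come from any measured network.

```python
# Sketch: upper bound on single-connection TCP throughput (window / RTT).
# The window size and RTT values below are illustrative assumptions.

GIGABIT_LINK_MB_PER_S = 125.0  # 1 Gbps expressed in megabytes per second


def window_limited_throughput(window_bytes, rtt_ms,
                              link_mb_per_s=GIGABIT_LINK_MB_PER_S):
    """Return the best-case throughput in MB/s (10^6 bytes per second).

    The sender can have at most one window of unacknowledged data in
    flight per round trip, so throughput is capped at window / RTT and
    can never exceed the capacity of the link itself.
    """
    rtt_s = rtt_ms / 1000.0
    window_bound = window_bytes / rtt_s / 1_000_000
    return min(window_bound, link_mb_per_s)


if __name__ == "__main__":
    window = 1024 * 1024  # hypothetical 1 MiB window
    for rtt_ms in (1, 10, 100):
        rate = window_limited_throughput(window, rtt_ms)
        print(f"RTT {rtt_ms:>3} ms: at most {rate:.1f} MB/s")
```

Under these assumptions the link itself is the limit at 1 ms, while at 100 ms the same connection tops out at roughly a tenth of a Gigabit link, regardless of how much bandwidth has been purchased.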

Example: it is not unusual to see only ~12 MBps throughput on a Gigabit connection, which is only about 10% of the available bandwidth (125 MBps). If the requirement is to move 10 TB across 3,000 miles, even with latency under 100 milliseconds, it would take 9.6 days. Put another way, if a Gigabit Ethernet connection costs $1,000 per month and you use less than 10% of the available bandwidth, the price of moving data is not $0.00309 per gigabyte ($3.09/TB) but about $0.032 per gigabyte ($32/TB).
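The arithmetic behind this example can be checked with a short sketch. It assumes decimal units (1 TB = 1,000,000 MB), a 30-day billing month, and a sustained throughput of 12 MB/s, which is just under 10% of a Gigabit link's 125 MB/s and reproduces the 9.6-day figure; the function names are illustrative only.

```python
# Sketch: transfer time and effective cost per terabyte on a Gigabit link.
# Assumes decimal units and a 30-day month; names are illustrative.

SECONDS_PER_DAY = 86_400
LINK_MB_PER_S = 125.0  # 1 Gbps expressed in megabytes per second


def transfer_days(data_tb, throughput_mb_per_s):
    """Days needed to move data_tb terabytes at a sustained throughput."""
    data_mb = data_tb * 1_000_000
    return data_mb / throughput_mb_per_s / SECONDS_PER_DAY


def cost_per_tb(monthly_cost, utilization, days_in_month=30):
    """Dollars per terabyte actually moved, given link utilization (0..1)."""
    seconds = days_in_month * SECONDS_PER_DAY
    tb_moved = LINK_MB_PER_S * utilization * seconds / 1_000_000
    return monthly_cost / tb_moved


if __name__ == "__main__":
    print(f"10 TB at 12 MB/s: {transfer_days(10, 12):.1f} days")
    print(f"Cost at 100% utilization: ${cost_per_tb(1000, 1.0):.2f}/TB")
    print(f"Cost at 10% utilization:  ${cost_per_tb(1000, 0.10):.2f}/TB")
```

At full utilization the link moves 324 TB in a month, which gives the $3.09/TB figure. At exactly 10% utilization the cost is about $31/TB, and it climbs further as utilization drops below 10%.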

Loss in Productivity

Data created at the edge must be accessed and processed by applications in the datacenter. The need to move data to the application incurs a productivity penalty. Take media and entertainment: editors, colorists, and special effects artists in multiple locations may sit idle waiting for data to become accessible. A 30-minute delay across 200 animators may result in roughly $400K in unintended cost. Data may have to be moved multiple times, incurring the productivity penalty each time.

Proliferation of Data Duplicates

Every time data is moved or copied, storage resources must be made available to store it. Whether it is persistent storage or a caching device, disk drives are deployed to catch the data being sent. Moving 10 TB requires 10 TB of storage to be available in every location that needs access to the data. The cost of storage varies from $120/TB/yr for an archive tier to $720/TB/yr for a high-performance tier. Every copy created incurs an added storage cost. These estimates are only approximate; procuring small amounts of storage may be even more costly, since economies of scale kick in at over 40 TB.
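To make the cost of duplicates concrete, here is a minimal sketch using the per-terabyte rates quoted above. The number of locations (three) is a hypothetical value chosen for illustration, and the function name is not taken from any tool.

```python
# Sketch: annual storage cost of keeping a full copy of a data set
# in every location that needs access to it.

ARCHIVE_RATE = 120.0    # $/TB/yr, archive tier (from the text)
HIGH_PERF_RATE = 720.0  # $/TB/yr, high-performance tier (from the text)


def annual_copy_cost(data_tb, locations, rate_per_tb_year):
    """Yearly cost in dollars of storing data_tb in each location."""
    return data_tb * locations * rate_per_tb_year


if __name__ == "__main__":
    data_tb = 10    # size of the data set, as in the text
    locations = 3   # hypothetical number of sites holding a copy
    low = annual_copy_cost(data_tb, locations, ARCHIVE_RATE)
    high = annual_copy_cost(data_tb, locations, HIGH_PERF_RATE)
    print(f"{locations} copies of {data_tb} TB: ${low:,.0f} to ${high:,.0f} per year")
```

With these assumptions, three copies of a 10 TB data set cost between $3,600 and $21,600 per year, and the total grows linearly with every additional location.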

Example: In legal eDiscovery, sensitive information must be reviewed and analyzed, often requiring large data sets to be moved. To guard against unintended exposure, service providers must insure the transfer at a high cost. Data is often moved by loading it onto a storage system and shipping it to the service provider. At the end of the eDiscovery review, the data must be completely destroyed. If the data cannot be moved at all, additional costs, complexity, and delays are incurred in deploying compute near the data.

It is not enough to calculate the cost of acquisition, power and cooling, floor space, and administration; determining the total cost of ownership also requires considering opportunity cost. In financial institutions, it is standard to calculate the opportunity cost of an investment, but in the enterprise, determining the costs associated with an investment, or with the lack of one, is often seen as too complex. It is clear that data mobility is expensive, not only in dollars spent per terabyte moved, but also in hidden costs such as lost productivity, lost revenue, a diminished addressable market, and exposure to security risks.

So what, then, are the consequences when one organization chooses to rely solely on mail while another leverages telephony?


About the Author

Noemi Greyzdorf is the VP of marketing at Vcinity. Her previous roles as director of product marketing at Quantum and VP, strategy and alliances at Cambridge Computer System gave her access to innovative companies and a rich network across the technology industry. As research manager, Storage, at IDC, Noemi focused her research on emerging storage architectures.

Featured image: ©Max_776