For enterprises dabbling with the Hadoop data analytics framework, on-premises deployment is usually the route they choose. New research suggests it may be time to start thinking about Hadoop in the cloud.
The market for so-called “Hadoop as a Service” is projected to grow to $16.1 billion by 2020, according to a report from Allied Market Research. While the exact size of the market is hard to gauge, growing acceptance of cloud computing among CIOs suggests the market is headed in that direction.
Hadoop is an open-source framework for processing and storing vast amounts of business data. It breaks data into manageable chunks that programmers can then structure, move into a relational database and study or visualize.
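That “break it into chunks, then process” model is the classic MapReduce pattern at Hadoop’s core. As a rough illustration only (not drawn from the article), here is a word-count job written in the style of a Hadoop Streaming mapper and reducer, with the cluster-wide shuffle-and-sort phase simulated locally:

```python
# Minimal sketch of Hadoop's MapReduce model: a word-count job in the style
# of Hadoop Streaming, where a mapper emits key/value pairs and a reducer
# aggregates them. The shuffle/sort step, which Hadoop runs across the
# cluster, is simulated locally here for illustration.
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word on the line.
    for word in line.split():
        yield word.lower(), 1

def reducer(word, counts):
    # Reduce phase: sum all counts seen for one word.
    return word, sum(counts)

def run_job(lines):
    # Simulated shuffle: sort mapper output by key, then group by key
    # so each reducer call sees all values for a single word.
    pairs = sorted(
        (pair for line in lines for pair in mapper(line)),
        key=itemgetter(0),
    )
    return dict(
        reducer(word, (count for _, count in group))
        for word, group in groupby(pairs, key=itemgetter(0))
    )

if __name__ == "__main__":
    data = ["big data in the cloud", "big clusters process big data"]
    print(run_job(data))
```

On a real cluster, Hadoop distributes the map and reduce tasks across machines and handles the shuffle itself; the structure of the job, though, stays this simple.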
At this point, customers who run Hadoop in the cloud are more or less confined to cloud- or Web-first companies, social media companies, gaming firms and other organizations with fluctuating data-processing needs. “The technology is yet to enter into the mainstream commercial market,” the Allied report said.
Research around Hadoop as a Service “is very early,” Gartner Inc. analyst Nick Heudecker told CIO Journal. Gartner doesn’t do primary research on the space, but he said the sector is drawing interest because it makes data processing easier and potentially less expensive. Outsourcing Hadoop to a cloud provider could help alleviate the challenge of building and managing Hadoop clusters on-site, or integrating Hadoop with existing enterprise technology. A cloud solution could also mean a company needs fewer Hadoop-trained employees.
The exact definition of Hadoop as a Service is still being ironed out, and likely will be debated for a while. Allied defines it as “a method of using Hadoop technology without physical installation of the infrastructure on premises.”
That includes what the research firm calls “run it yourself” offerings like Amazon Web Services Elastic MapReduce, which lets companies quickly move data into the cloud for processing without having to download the Hadoop software. In this case, Amazon’s cloud runs the analysis, but companies are responsible for managing jobs and other activities associated with Hadoop’s operations, Allied said. Mr. Heudecker called offerings like Amazon EMR “Hadoop running on infrastructure as a service.”
Other companies in the HaaS space, such as Altiscale or Qubole, are what Allied calls “pure play” providers. They run fully managed services for Hadoop in the cloud and require less systems administration from customers, the Allied report said. Services include getting Hadoop up and running, updating software on a regular basis and expanding capacity when needed.
Altiscale has about 16 customers, and the average customer has about 90 terabytes of data stored there, founder and CEO Raymie Stata told CIO Journal. Most of those customers are digital media or Web-based companies, what he calls “Hadoop veterans.” But he is starting to get inquiries from larger enterprises, some of which are looking into Hadoop for the first time. “The off-the-shelf nature of what we do seems to be what’s compelling to them,” he said.
Hadoop in the cloud could essentially eliminate the time and resources required to complete a complex Hadoop installation on premises. That said, there are issues around security and reliability that still must be addressed.
“For Hadoop, concerns about the cloud are higher,” said Mr. Stata, a former CTO of Yahoo Inc. “It’s valuable data assets, it’s a lot of data, and certainly the NSA stuff hasn’t helped.” Gartner’s Mr. Heudecker added that Hadoop vendors are making acquisitions in the security space, but because the space is in early stages, “that story is still being fleshed out.”
As CIO Journal reported in April, Hadoop is gaining traction as a key tool companies can use to break chunks of data into smaller pieces, which they can then analyze. Hadoop vendors have also been bringing in big bucks from VCs.