How did companies manage their big data? How do they manage it now, and what will come next in this domain?
Here’s a bit of history… When the data scientists at Synchronoss Technologies started a brand new project in 2014 to store and manage the huge amounts of information created in their business and extract insights from it, they thought they’d found the perfect solution in a piece of software named Hadoop.
Hadoop can distribute and organize data across a nearly unlimited number of computers far more cheaply than a typical data center, and it finally made it possible to derive insights from big data sets. “We could just put everything in there,” said Suren Nathan, Synchronoss’ vice chairman of engineering, digital transformation, and analytics.
But it all quickly became a mess: Hadoop provided no services for discovery or search. It did offer some tools to bring order out of the chaos, but it remained an open-source ecosystem, and it was up to Synchronoss to figure out how to fit a dozen or more open-source components together, which was not easy at all.
Synchronoss’ experience is very common. In fall 2017, Gartner estimated that 85% of companies’ big data projects failed.
Did that mean the entire concept of big data was a failure? Of course not. Data is too important an asset to ignore; indeed, the vast majority of companies believe it is their most important asset. Moreover, those failed projects broke ground for entirely new products and services, such as ready-to-use data warehouses and machine learning in the cloud, which deliver value to customers without extra integration and management effort. As a result, big data changed the way companies relate to customers and other stakeholders.
But big data is still a work in progress, and it is unclear where it may lead the market. Today, a number of ready-made big data solutions (which let users concentrate on deriving value) coexist with the established open-source ecosystems (which work more like do-it-yourself kits).
One thing is certain: companies that do not manage their big data will become market outcasts in the near future.
Back to Synchronoss’ experience: the company acquired Razorsight Corp., an analytics provider, in 2015. It also made a few major changes:
- They built their big data implementation on a Hadoop cluster running MapR’s distribution.
- They used Spark as a faster alternative to MapReduce.
- In 2017, Synchronoss planned to adopt MLlib to give its automated analytics applications extra capabilities, such as detecting fraudulent activity and violations of mobile-device security policies.
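For context on the Spark-vs-MapReduce point above, here is a minimal single-machine sketch of the MapReduce model in plain Python (an illustration only, not Synchronoss’ actual pipeline). Hadoop’s MapReduce writes intermediate results to disk between phases; Spark’s speedup comes largely from keeping those intermediates in memory.

```python
from collections import defaultdict

def map_phase(records):
    """Map step: emit (key, 1) pairs -- here, a word count."""
    for record in records:
        for word in record.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    """Shuffle step: group values by key, as the framework does across nodes."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce step: aggregate each key's list of values."""
    return {key: sum(values) for key, values in groups.items()}

lines = ["spark speeds up mapreduce", "mapreduce writes to disk"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
# counts["mapreduce"] -> 2
```

In a real cluster, the shuffle is where intermediate data lands on disk under MapReduce; Spark pipelines these stages in memory, which is the performance difference the list item refers to.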
Now one of their products is Analytics platform: “Our Synchronoss Analytics platform is a cloud-based SaaS platform that improves customer experience, retention, acquisition, monetization and financial performance. Our Synchronoss Analytics platform analyzes substantial volumes of data from internal and external sources to deliver daily business insights to our customers’ executives and leverages data science, machine learning, artificial intelligence, or AI, and workflow automation while integrating with internal systems to deliver insights around customer behavior, sentiment, and operational performance. Our Synchronoss Analytics platform drives significant financial benefits to our service providers and enterprise customers and plays a key role in their efforts to monetize content and customer data in this era of digital transformation.”
So, in just a few years, the company went from integrating big data components by hand, to ready-to-use packages, to the cloud, shifting its focus toward applications along the way.
And it is not just Synchronoss’ experience. “Data analytics is one of the main growth factors in our cloud business,” said William Vambenepe, group product manager for data processing and analytics at Google Cloud. “Cloud vendors are focusing on taking the complexity out of big data deployments to turn services into utilities for business users. We’re building it to be like the plumbing in your home,” he said.
“These applications have so many moving parts in a multi-vendor pipeline that customers are placing a premium on consistency and simplicity integrated out of the box,” said James Kobielus, an analyst at Wikibon. “The infrastructure equivalent of Steve Jobs’ ‘It just works’ is becoming more important.”
Big data is now set to move in several more interesting directions.
- Wikibon predicts that growth in big-data infrastructure and analytics/application databases will slow by 2023. Meanwhile, advanced analytical applications will grow rapidly, and machine learning will have an even greater impact on IoT.
- Systems of intelligence (such as products and services from Google, Amazon, and IBM) will define the future. Powered by big data and machine learning, they will give companies the opportunity to adopt new business models.
- Machine learning is already built into many IT infrastructure components and consumer-facing services. For instance, the evolving technology will help scan images to pick out cancerous cells. “A lot of amazing things will come out of machine learning,” said Gartner’s Leganza.
- Another trend is self-service and automated data preparation. Cleansing and normalizing data still take up a great deal of data scientists’ time. Informatica and IBM are working on this problem, along with newer companies such as Talend, Trifacta, and Jitterbit. “Big data still needs to be simplified for regular users and developers; it’s way too complex,” said Kobielus from Wikibon. Solving that will be the industry’s next triumph. Vendors such as Tableau and DataRobot are focusing on the front end of self-service: “Self-service and consumability remain hurdles for mass adoption,” said Kobielus.
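The cleansing and normalization work these vendors aim to automate can be sketched in plain Python. The field names and rules below are hypothetical, just to show the kind of tedium that eats data scientists’ time:

```python
def clean_record(raw):
    """Normalize one raw record: trim whitespace, unify case, coerce types.
    Returns None for records too damaged to keep. (Hypothetical schema.)"""
    name = raw.get("name", "").strip().title()
    country = raw.get("country", "").strip().upper()
    try:
        revenue = float(str(raw.get("revenue", "")).replace(",", ""))
    except ValueError:
        return None  # unparseable amount: drop rather than poison aggregates
    if not name or not country:
        return None
    return {"name": name, "country": country, "revenue": revenue}

raw_rows = [
    {"name": "  acme corp ", "country": "us", "revenue": "1,200.50"},
    {"name": "Globex", "country": "DE", "revenue": "oops"},  # dropped
]
cleaned = [r for r in (clean_record(row) for row in raw_rows) if r is not None]
# cleaned -> [{"name": "Acme Corp", "country": "US", "revenue": 1200.5}]
```

Multiply this by hundreds of fields and sources and the appeal of automated preparation tools becomes clear.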
- The need to make big data more usable inside companies is driving a new focus on making data more structured, standardized, and consistent: exactly the characteristics of SQL databases. Microsoft’s Azure SQL Data Warehouse and Google’s BigQuery emerged because companies needed to leverage their standard SQL investments in a big-data context. Various open-source projects also support SQL to differing degrees.
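To illustrate the point about reusing standard SQL investments, the snippet below runs a plain SQL aggregation against Python’s built-in SQLite, which stands in for a cloud warehouse here (the table and data are made up). The query itself is standard SQL and would run essentially unchanged on an engine such as BigQuery:

```python
import sqlite3

# In-memory SQLite as a stand-in for a cloud warehouse. The value of
# SQL-on-big-data is that this exact skill set transfers: the query below
# is ordinary standard SQL. (Table and column names are illustrative.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, action TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("u1", "purchase", 10.0), ("u1", "purchase", 4.5), ("u2", "view", 0.0)],
)
rows = conn.execute(
    "SELECT user_id, SUM(amount) AS total "
    "FROM events WHERE action = 'purchase' "
    "GROUP BY user_id ORDER BY total DESC"
).fetchall()
# rows -> [('u1', 14.5)]
```

The same analysts who wrote queries like this against on-premises databases can point them at petabyte-scale engines, which is why SQL support keeps spreading through the big data stack.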
Mariani from AtScale calls this trend “data lake 2.0”: cloud vendors put things like catalogs on top of file systems until the result looks like Hadoop. Combined with serverless cloud infrastructure (systems that cloud vendors provision automatically, without users touching the controls), people can apply their preferred reporting tools to semistructured data without loading everything into a data warehouse.
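A toy sketch of the schema-on-read idea behind “data lake 2.0”: semistructured records are interpreted at query time, with missing fields tolerated, instead of being forced through an upfront warehouse load. The data and field names below are invented, and an in-memory buffer stands in for files in object storage:

```python
import io
import json

# Newline-delimited JSON as a stand-in for raw files sitting in a data lake.
lake_file = io.StringIO(
    '{"device": "phone", "os": "android", "crashes": 3}\n'
    '{"device": "phone", "os": "ios", "crashes": 1}\n'
    '{"device": "tablet", "os": "android"}\n'  # missing "crashes" field
)

# Schema-on-read: apply the structure while reading. A missing field gets a
# default at query time instead of failing an upfront load into a warehouse.
records = [json.loads(line) for line in lake_file]
crashes_by_os = {}
for rec in records:
    crashes_by_os[rec["os"]] = crashes_by_os.get(rec["os"], 0) + rec.get("crashes", 0)
# crashes_by_os -> {"android": 3, "ios": 1}
```

Catalogs and serverless query engines layered over such files do this interpretation at scale, which is why users can point reporting tools straight at the lake.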
Author: ASD team