Stéphane Walter - Perspective

Data tutorials, tools and languages

Spark Structured Streaming: performance testing

Spark is an open-source distributed computing framework that is more efficient than Hadoop, supports three main languages (Scala, Java and Python) and has rapidly carved out a significant niche in Big Data projects thanks to its ability to process high volumes of data in batch and streaming mode. Its 2.0 version introduced us to…

Data tutorials, tools and languages

Spark Structured Streaming: from data transformation to unit testing

Spark is an open-source distributed computing framework that is more efficient than Hadoop and supports three main languages (Scala, Java and Python). It has rapidly carved out a significant niche in Big Data projects thanks to its ability to process high volumes of data in batch and streaming mode. Its 2.0 version introduced us to a…

Data tutorials, tools and languages

Spark Structured Streaming: from data management to processing maintenance

Spark is an open-source distributed computing framework that is more efficient than Hadoop, supports three main languages (Scala, Java and Python) and has rapidly carved out a significant niche in Big Data projects thanks to its ability to process high volumes of data in batch and streaming mode. Its 2.0 version introduced us to…

Data Strategy

DataOps: data specification and documentation recommendations for Big Data projects

To exploit the full potential of Big Data projects, proper data documentation is essential. DataOps principles help set up an adequate approach, which is a prerequisite for the success of all ensuing projects and for adding value to all the company’s data.

Specific characteristics of Big Data projects

A modern Big Data architecture should help: generate output…

Data tutorials, tools and languages

[TUTORIAL] First steps with Zeppelin

Zeppelin is the ideal companion for any Spark installation. It is a notebook that lets you perform interactive analytics in a web browser: you can execute Spark code and view the results in table or graph form. To find out more, follow the guide!

Data tutorials, tools and languages

Tutorial: How to Install a Hadoop Cluster

You have read many articles on Hadoop and now you want to get familiar with it, but how do you install and apply this new technology? The recommended approach is to install a turnkey virtual machine supplied by a major vendor.
