EclairJS – Putting a Spark in Web Apps


Presentation by David Fallside from IBM; the images are extracted from the presentation.

Introduction

Web application development has been moving from Java to NodeJS and JavaScript, which offer a simple yet rich environment through NPM.

EclairJS is a NodeJS library that provides bindings to a Spark application:

  • An RDD is bound to a JS object that is made immutable
  • Spark operators are transparently mapped to JS functions (e.g. flatMap, filter, …)
  • Every Spark operator mapped returns a promise

Using promises makes it possible to emulate the way Spark uses its DAG:

  • Transformations return a new object and are added to the DAG
  • Actions execute the whole DAG to get a result
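
To make the transformation/action split concrete, here is a minimal sketch in plain JavaScript. It is a toy model, not the EclairJS API: the `LazyDataset` class and its methods are hypothetical names made up for illustration, an in-memory array stands in for distributed data, and the "DAG" is reduced to a simple chain of steps. It only shows the idea that transformations record work on a new immutable object, while an action runs the recorded plan and hands back a promise.

```javascript
// Toy model only: LazyDataset is a hypothetical class, NOT part of EclairJS.
class LazyDataset {
  constructor(source, plan = []) {
    this.source = source; // plain array standing in for distributed data
    this.plan = plan;     // recorded transformations (a simple chain here)
    Object.freeze(this);  // the dataset object itself is immutable
  }

  // Transformations: record one more step and return a NEW dataset.
  // Nothing is computed at this point.
  map(fn) {
    return new LazyDataset(this.source, [...this.plan, (rows) => rows.map(fn)]);
  }

  filter(fn) {
    return new LazyDataset(this.source, [...this.plan, (rows) => rows.filter(fn)]);
  }

  flatMap(fn) {
    return new LazyDataset(this.source, [...this.plan, (rows) => rows.flatMap(fn)]);
  }

  // Actions: execute the whole recorded plan and resolve with the result.
  collect() {
    return Promise.resolve().then(() =>
      this.plan.reduce((rows, step) => step(rows), this.source)
    );
  }

  count() {
    return this.collect().then((rows) => rows.length);
  }
}

const lines = new LazyDataset(["to be or", "not to be"]);

// Two transformations: only the plan grows, no data is touched yet.
const words = lines
  .flatMap((line) => line.split(" "))
  .filter((word) => word !== "or");

// The action triggers execution and returns a promise.
const pending = words.count();
pending.then((n) => console.log(`remaining words: ${n}`));
```

In this sketch `lines` is left untouched by the chained calls, which mirrors the immutability of RDDs, and the work only happens once `count()` is called.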


EclairJS - Code semantics

Architecture

EclairJS has two main components:

  • Client: JS API, installed with NPM
  • Server: JS providing Java mapping and able to run in the JVM using Oracle Nashorn, has to be run

The server also uses Jupyter Notebook to provide a WebSocket endpoint between the client and the server.


EclairJS - Architecture

Performance

In terms of performance, Spark’s native Java API is much faster; however, EclairJS is twice as fast as PySpark, Spark’s Python API.


EclairJS - Performances

Conclusion

EclairJS seems to be a great project if you need to integrate Spark jobs into a web application.

