Internal code refactoring

Vitessce has a relatively large and complex code base for a JavaScript project. Like many other JavaScript projects, we not only develop a library for distribution on NPM but also an internal development web application and a documentation website. This alone presents many challenges for JavaScript build tools, namely supporting hot module reloading in development while also producing a single-file bundle for NPM. Often we also want to produce a matrix of different builds: minified/unminified, development/production, esm/umd/commonjs. If we have any custom CSS that goes along with our JavaScript, we also need to think about how that will get included in the bundle.

However, in Vitessce, many other things complicate both the development and production setups.

Luckily, there are new ways to address these complications due to improvements in language standards and build tooling. Thanks to discussions with Trevor Manz, we have been able to greatly improve our build setup.

In the following sections I describe each requirement and how we approached it in the new setup, in the hopes that it may be useful to others.

Previous build setup

We previously used ejected create-react-app (CRA) scripts. The base CRA scripts work well for publishing web apps but are not meant for bundling a library. We had to eject the base scripts to add support for the library build. The resulting scripts consisted of two Webpack configurations which used many plugins that were difficult to maintain but supported many of our requirements.

Failed attempts

While I previously made attempts to simplify our bundle scripts by using Rollup or Vite in place of CRA, these efforts were never successful. They usually required hours of debugging to find a perfect (ordered) set of plugins and parameter settings to support all of the requirements of Vitessce and its dependencies. It often felt like a game of whack-a-mole, fixing each issue just for another to emerge. After days of debugging, I achieved fully working setups on a few occasions. Unfortunately, even on these successful occasions (i.e., bundles, tests, and development server with no errors), I would still encounter issues preventing full adoption: the bundle size would be larger, or hot reloading in the development server would be unacceptably slow.

Incentives for this work

As a graduate student writing academic software, there is little external incentive to spend time to improve these build processes (or address tech debt in general), as they take time away from the implementation of new features. It is difficult to communicate to others the current problems and the benefits that a smaller, faster, modular, or more modern JavaScript bundle would enable. It is not possible to quantify how many people didn't use your package because the bundle size was too large or its format was not compatible with direct inclusion in an HTML page via CDN.

Further, it is difficult to estimate how long this work will take. It often requires major code refactoring, during which any unrelated code changes by other team members will need to be re-implemented in the refactored code base. For external consumers of the NPM package, bundle structure or format changes often constitute breaking changes. All of these factors mean that this work is often performed outside of normal work hours so that it does not prevent the development of new features or conflict with the work of other team members.

The problems

Our previous bundle was very large at 127 MB. The Unpkg CDN refuses to serve files from packages this large, preventing direct usage in HTML files and on sites like ObservableHQ. It also prevents dynamically loading Vitessce from a CDN for our Python or R widgets, for example. For library consumers, our unminified UMD build was 32.7 MB, slowing down their build processes and bloating their bundles. For end users, such a large bundle of JavaScript increases network usage and slows down page load times. For potential library adopters, seeing these numbers on NPM adds a sense of sticker shock that could prevent adoption.

New build setup, by requirement

Bundle size and individual component bundles

One factor contributing to the overall bundle size is that we also included self-contained builds for individual React components. With such a large main bundle file, library consumers who only want to use individual components can achieve better development performance in their own environments by directly importing from the individual component bundles. However, this requirement adds redundant code to the bundle and contributes to the overall bundle size on NPM, as each individual component also requires a matrix of builds (minified/unminified, development/production, etc).

We addressed this by adopting a "monorepo" repository structure. This way, we can implement individual components in their own sub-packages which can be published to NPM as standalone bundles. Practically, this meant using PNPM for dependency management (as opposed to NPM), which implements many features for monorepo ("workspaces") management, including recursive installation and parallel script running.

Minimizing bundle steps in sub-packages

As browser, NodeJS, and JavaScript module standards have matured, there is less and less need for a complex bundle step in between source code development and published code deployment, as described by others. In short, when the consumer of your library is another JavaScript library, it makes more sense for them to import from the original source code. On the other hand, when the consumer of your library is an HTML page, it currently makes more sense for them to import a single-file, minified bundle. As browser module features like importmaps improve, perhaps these considerations will evolve.

In the meantime, bundlers can be useful for generating the single-file, minified bundles used in the browser. Historically, another role of a bundler has been to resolve non-JS and non-standard file imports. For instance, it has become common to want to import CSS, SCSS, SVG, JSON, GLSL, and Web Worker files via JS-like syntax (e.g., import obj from './file.json'). But such non-JS import statements are not supported in the JavaScript standard and therefore will result in an error when run in a browser. A bundler provides the ability to transform non-standard source code into browser-compatible code.

On the other hand, in order to publish re-usable source code to NPM and avoid bundle steps, we need to avoid these non-standard imports. We have achieved this by using only tsc as a simple build step in our sub-packages, with only minor exceptions. Where there are exceptions, the separation of concerns enabled by the monorepo structure allows more complex bundle steps to be scoped to the sub-repo code and its requirements, reducing complexity, improving performance, and allowing different bundler tools to be used in each context. For example, we make exceptions in the following places:

  • The main vitessce package: For both backwards compatibility and to enable inclusion in HTML pages, we generate a matrix of single-file bundles using Vite and its library mode.
  • The @vitessce/icons sub-package: To enable SVGs to be imported and transformed into React components automatically, we use Vite and vite-plugin-svgr to produce a plain JS bundle that can be consumed by other monorepo packages. (Note: This setup does not allow SVGs to be hot-reloaded during development, but we can probably assume they will not change frequently.)
  • The @vitessce/workers sub-package: To enable Web Workers to be in-lined, we use Rollup + rollup-plugin-web-worker-loader. (This comes with the same hot reloading caveat as the @vitessce/icons case above.)
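As a rough illustration of the first exception above, a Vite library-mode configuration that yields part of such a build matrix might look like the following. This is a minimal sketch rather than the actual Vitessce configuration: the entry path, the global name, the output file naming, and the `createLibraryConfig` helper are all assumptions made for the example.

```typescript
// Hypothetical sketch of a Vite library-mode config for a main package.
// The entry path, global name, and file naming are assumptions for illustration.
type LibFormat = 'es' | 'umd' | 'cjs' | 'iife';

function createLibraryConfig(minify: boolean) {
  return {
    build: {
      // Keep files from earlier runs so minified and unminified builds can coexist.
      emptyOutDir: false,
      minify,
      lib: {
        entry: 'src/index.js', // assumed entry point
        name: 'vitessce', // assumed global variable name for the UMD build
        formats: ['es', 'umd'] as LibFormat[],
        // Produces names such as index.umd.min.js or index.es.js.
        fileName: (format: string) => `index.${format}${minify ? '.min' : ''}.js`,
      },
    },
  };
}

// In vite.config.ts, an object like this would be the default export.
const libraryConfig = createLibraryConfig(true);
```

Running the build once per variant (for example, selecting `minify` through an environment variable) would then fill in the rest of the matrix.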

Website development with hot reloading

We have two website requirements in this project: a minimal development website (an index.html page with a list of demos), and a documentation website. These websites both need to consume the library code, ideally with hot reloading support during development.

In the same way that the monorepo's separation of concerns facilitates different bundling tools scoped to each sub-repo, it also enables different development servers in each "sub-website". This allows us to use Vite development server mode and its React plugin for the minimal development website. Meanwhile, we can continue to use Docusaurus and its internal Webpack configuration for the documentation website. To enable hot reloading during development of both websites, rather than tsc, we run tsc --watch so that the dist directory of each sub-package is updated on every source code change. To further improve performance, we enable TypeScript's "incremental" mode with "incremental": true.

React 18

To leverage the benefits of React 18, we use the TypeScript option "jsx": "react-jsx", which also comes with the minor benefit that JSX code can be written without explicitly importing React at the top of each file. To use React 18 in the context of Docusaurus and Material UI dependencies, which do not yet specify compatibility above React 17 in their peerDependencies, we use PNPM's peerDependencyRules in the root of the repository to override them and ensure that we are only depending on React 18 (as two versions of React cannot be included in the same web page).

Externalizing React

Because the Vitessce React component is intended to be consumed by a React application (or other React component library), we can assume the consumer is using their own copy of React. In fact, including two copies of React in the same web application typically causes a website to crash. To make React an external dependency in the main development and production bundles, we can use the rollupOptions part of the Vite configuration:

...,
rollupOptions: {
  external: ['react', 'react-dom'],
  output: {
    globals: {
      react: 'React',
      'react-dom': 'ReactDOM',
    },
  },
},
...

Then, we can use the React-less Vitessce bundle in different contexts: UMD, ESM with importmap, or d3.require.alias.

Managing and publishing sub-packages

To streamline the management of many sub-packages, we use PNPM and its meta-updater plugin. This allows syncing changes to every sub-package's package.json file, for instance to update the main "version" or particular dependency version values.
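To give a sense of how this syncing works, here is a minimal sketch of the kind of function a meta-updater configuration (normally `.meta-updater/main.mjs`) exports by default: it receives the workspace directory and returns an updater per file name. The `SHARED_VERSION` and `PINNED` values and the `createUpdaters` name are hypothetical and only serve the example.

```typescript
// Sketch of the shape of a meta-updater configuration.
// SHARED_VERSION and PINNED are hypothetical values, not taken from Vitessce.
type Manifest = {
  name?: string;
  version?: string;
  dependencies?: Record<string, string>;
  [key: string]: unknown;
};

const SHARED_VERSION = '0.0.0';
const PINNED: Record<string, string> = { react: '^18.0.0' };

function createUpdaters(_workspaceDir: string) {
  return {
    // Called once per workspace package with its parsed package.json.
    'package.json': (manifest: Manifest, _dir: string): Manifest => {
      const dependencies = manifest.dependencies
        ? { ...manifest.dependencies }
        : undefined;
      if (dependencies) {
        // Only rewrite dependencies that the package already declares.
        for (const name of Object.keys(dependencies)) {
          const pinned = PINNED[name];
          if (pinned !== undefined) dependencies[name] = pinned;
        }
      }
      return dependencies
        ? { ...manifest, version: SHARED_VERSION, dependencies }
        : { ...manifest, version: SHARED_VERSION };
    },
  };
}
```

In the real configuration file, a function like `createUpdaters` would be the default export, and running the meta-updater would rewrite each sub-package's package.json with the returned manifest.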

GLSL shaders and glslify shader modules

To implement custom DeckGL shaders and extensions, we need to write GLSL shader code. We use the glslify shader module system to leverage published GLSL functions such as colormaps. Writing GLSL code in .glsl files enables IDE support but introduces the need for a build step when they are imported into JS files. The glslify module system also requires a build step to substitute strings like #pragma glslify: plasma = require("glsl-colormap/plasma") with the actual colormap function. As a workaround to prevent a build step, we can use the glslify CLI and simply include the shaders as strings in our JS files. This is not ideal, so we will need to look into ways to address the IDE support and modularity that were lost.
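For illustration, a shader kept as a plain string might look like the sketch below. The GLSL here is a hypothetical grayscale helper, not code from Vitessce or from a glslify module; the template-literal interpolation is just one possible way to keep small pieces reusable without a build step, not necessarily the approach we will settle on.

```typescript
// Hypothetical shader chunk stored as a string (illustrative GLSL only).
const grayscaleChunk = `
float grayscale(vec3 color) {
  return dot(color, vec3(0.299, 0.587, 0.114));
}
`;

// The chunk is spliced into a full fragment shader with plain string interpolation,
// so no bundler plugin is needed to resolve it.
const fragmentShader = `
precision highp float;
uniform vec3 uColor;
${grayscaleChunk}
void main() {
  float g = grayscale(uColor);
  gl_FragColor = vec4(vec3(g), 1.0);
}
`;
```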

Style definitions

To style our React components, we previously used a combination of SCSS and CSS-in-JS (via Material UI v4 + JSS). SCSS requires a build step to convert the source code to plain CSS code. Further, CSS imports in general also require a build step as they must be transformed from the JS-like import syntax to JS-compatible syntax. To address this, we have manually converted all SCSS to JSS using Material UI's makeStyles function and theming utilities. (Note: we have not upgraded to Material UI v5 which no longer uses JSS so this solution may not be advised in a new project.) Using JSS everywhere comes with the benefits of standardizing our styling code and facilitating direct style injection for consumers (i.e., no need for library users to include or think about a separate CSS file). While JSS auto-generates class names by default (which prevents naming conflicts), the jss-global-plugin enables global class names to be used to refer to elements defined by dependencies such as react-grid-layout.
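As a minimal sketch of what this looks like, the object below has the shape accepted by Material UI v4's makeStyles, with one locally scoped rule and one `@global` block. The `title` rule and all property values are hypothetical; the two global selectors are class names used by react-grid-layout.

```typescript
// Plain style-definition object in the shape accepted by makeStyles (Material UI v4).
// Rule names and values are hypothetical.
const styles = {
  // Locally scoped rule: JSS generates a unique class name for it.
  title: {
    fontSize: '14px',
    color: '#333333',
  },
  // Global rules keep their selectors unchanged, so they can target
  // elements rendered by dependencies such as react-grid-layout.
  '@global': {
    '.react-grid-item': {
      transition: 'none',
    },
    '.react-resizable-handle': {
      opacity: 0.5,
    },
  },
};

// Usage inside a component module (requires @material-ui/core v4):
//   const useStyles = makeStyles(styles);
//   const classes = useStyles(); // classes.title is the generated class name
```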

Testing

To modernize our testing infrastructure, we have adopted Jest + Vitest + React Testing Library and upgraded to the latest version of Cypress. However, there were a few hiccups, namely using code that depends on Web Workers in unit tests and serving test fixtures as static files during unit tests. Jest + Vitest use jsdom rather than a real web browser during unit tests, for efficiency. However, jsdom lacks support for web workers, which can be addressed by including the jsdom-worker library in the test setup script. To serve static files during unit testing, we can hook into Vite's development server and add static file routes via serve-static.
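The two workarounds can be sketched roughly as follows. This is an assumption-laden outline rather than our exact setup: the setup file path, the `/fixtures` route, and the `staticFixturesPlugin` helper are hypothetical, and the middleware is passed in as a parameter so the sketch stays free of external imports (in practice it would be the handler returned by serve-static for the fixtures directory).

```typescript
// Test-related configuration; the setup file path is hypothetical.
const testConfig = {
  test: {
    environment: 'jsdom',
    // The setup file would contain a single line: import 'jsdom-worker';
    setupFiles: ['./test-setup.ts'],
  },
};

// Minimal local types standing in for the dev server's connect-style middleware stack.
type Middleware = (req: unknown, res: unknown, next: () => void) => void;
type DevServerLike = {
  middlewares: { use: (route: string, handler: Middleware) => void };
};

// Vite plugin that mounts a middleware on the development server
// through the configureServer hook.
function staticFixturesPlugin(route: string, handler: Middleware) {
  return {
    name: 'serve-test-fixtures',
    configureServer(server: DevServerLike) {
      server.middlewares.use(route, handler);
    },
  };
}
```

The plugin object would be added to the `plugins` array of the Vite configuration alongside the `test` options.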

Bundle analysis

It can be difficult to know what matters when it comes to the minified bundle size, and where to focus efforts when trying to reduce it. Visualizing the bundle composition is a great way to understand which dependencies or source files are contributing. Luckily, rollup-plugin-visualizer is compatible with this build setup because Vite uses the Rollup plugin API. From this visual analysis, I found that three different versions of the Zarr package were being included in the bundle, which I resolved using PNPM's overrides in the root of the monorepo.

Concluding thoughts

Hopefully, most JavaScript developers only have a small subset of these requirements, but maintainers of more complex projects might find these notes useful. If you are interested in learning more, check out the Vitessce repository or a minimal example.