Teaching a Billion People to Code: How JupyterLite Is Scaling the Impossible

By The New Stack

Key Concepts

  • QuantStack: A company founded by Sylvain Corlay in 2016, initially to support his work as a core Jupyter developer. It has grown to a team of about 30 and contributes to Jupyter, conda-forge, and Apache Arrow.
  • Jupyter Project: An open-source project providing interactive computing environments, widely adopted in academia and industry.
  • Jupyter Notebook: A web-based interactive computing environment that allows users to create and share documents containing live code, equations, visualizations, and narrative text.
  • JupyterLab: A next-generation web-based user interface for Project Jupyter, offering a more flexible and extensible environment than the classic Jupyter Notebook.
  • conda-forge: A community-led collection of recipes, build infrastructure, and distributions for the conda package manager, primarily for scientific software.
  • Apache Arrow: A cross-language development platform for in-memory data, designed for efficient data processing and analytics.
  • JupyterLite: A distribution of Jupyter that runs entirely in the web browser, including language kernels, utilizing WebAssembly for code execution.
  • WebAssembly (Wasm): A binary instruction format for a stack-based virtual machine, designed as a portable compilation target for high-level languages like C, C++, and Rust, enabling them to run on the web.
  • Capytale (Jupyter Deployment): A large-scale deployment of Jupyter in the French high school system, serving approximately half a million registered users annually, with code execution happening in the end user's browser.
  • Emscripten: A compiler toolchain that enables developers to compile C and C++ code to WebAssembly.
  • Emscripten-forge: A software distribution developed by QuantStack, based on the Emscripten toolchain, that provides packages built for WebAssembly.
  • Accessibility: Ensuring that software is usable by people with disabilities, a key challenge for the Jupyter ecosystem.
  • Real-time Collaboration: The ability for multiple users to work on the same document or code simultaneously.
  • Package Management: The process of installing, updating, configuring, and removing software packages.

QuantStack and Its Origins

QuantStack was founded in 2016 by Sylvain Corlay, initially as a one-person company to support his work as a core developer of the Jupyter project. The company grew rapidly as Jupyter adoption exploded and demand rose for commercial support and consulting services around its ecosystem. Today, QuantStack has a team of approximately 30 people, half of them based in France and the other half distributed across the rest of Europe (Germany, Austria, Spain, and the UK).

While initially focused on Jupyter, QuantStack has expanded its involvement to other key open-source ecosystems, including the conda-forge project (a software distribution for science) and the Apache Arrow project (a cross-language development platform for in-memory data). The business has primarily been service-based, but there are indications of future product development.

Sylvain Corlay's Background and Bloomberg's Role in Jupyter

Sylvain Corlay's journey into open-source development began at Bloomberg, where he worked as a Quantitative Analyst ("Quant"). His role involved auditing and creating financial models, as well as addressing mathematical challenges within the company. During this time, the Python scientific stack became his preferred toolset.

When Jupyter Notebook was introduced, Corlay was impressed and successfully advocated for Bloomberg to invest in the project. Bloomberg went on to become a major funder of Jupyter, notably incubating JupyterLab and donating it to the community. JupyterLab is now a widely adopted, multi-stakeholder project used by millions globally. When Corlay relocated to France for family reasons, he started QuantStack as a way to continue his open-source contributions.

Innovations and Future Directions in the Jupyter Ecosystem

JupyterLite: Bringing Jupyter to the Browser

A significant development highlighted in the conversation is JupyterLite. This project is a distribution of Jupyter that runs entirely within the web browser, including the language kernels responsible for executing user code. This is achieved through WebAssembly builds of interpreters for languages like Python, Octave, and R.
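To make the idea of an in-browser kernel concrete, here is a minimal sketch of how Python code can tell whether it is executing inside a WebAssembly build of the interpreter. It relies on the fact that CPython builds compiled with Emscripten report "emscripten" as their platform string. The function names are illustrative and are not part of any JupyterLite API.

```python
import sys


def running_in_browser() -> bool:
    """Return True when this interpreter was compiled with Emscripten.

    CPython builds that target WebAssembly through Emscripten report
    "emscripten" as their platform string; native builds report values
    such as "linux", "darwin", or "win32".
    """
    return sys.platform == "emscripten"


def describe_runtime() -> str:
    """Summarize where the current interpreter is executing."""
    if running_in_browser():
        where = "in a WebAssembly build (for example, in the browser)"
    else:
        where = "on a native host"
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return f"Python {version} running {where}"


print(describe_runtime())
```

A notebook author could use a check like this to skip steps that need a native host, such as spawning subprocesses, when the same notebook is opened in a browser-only deployment.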

Key benefits of JupyterLite:

  • Scalability: It offers unprecedented scalability for Jupyter deployments, with examples already serving millions of users.
  • Use Cases:
    • Presenting results: Ideal for showcasing findings without requiring server-side computation.
    • Edge computation: Performing computations directly on the user's device.
    • Documenting open-source projects: Providing interactive examples and documentation.
    • Educational purposes: Making coding accessible for learning.

Example: Capytale (Jupyter Deployment in French High Schools)

A prime example of a successful Jupyter-based deployment running entirely in the browser is Capytale, used in the French high school system. Initially launched during the pandemic by the Paris school district, it has since expanded nationwide.

  • Scale: Serves approximately half a million registered users annually, with over 200,000 user sessions per week.
  • Architecture: A single server runs the content management system that holds the teaching materials (notebooks). Static assets are delivered via CDNs, and all code execution happens in the end users' browsers.
  • Efficiency: This model avoids the need for large-scale cloud deployments of JupyterHub and Docker images, making it highly cost-effective for educational purposes, especially for tasks like learning basic programming concepts (e.g., for loops, greatest common divisor).
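For a sense of how light these workloads are, the following is a hypothetical exercise of the kind mentioned above: a loop and a greatest common divisor function. It is an illustration only and is not taken from the deployment's actual teaching materials. Code like this needs no server-side resources, which is why running it in the student's browser works well.

```python
def gcd(a: int, b: int) -> int:
    """Greatest common divisor computed with Euclid's algorithm."""
    a, b = abs(a), abs(b)
    # Replace (a, b) with (b, a mod b) until the remainder is zero.
    while b != 0:
        a, b = b, a % b
    return a


# A for loop applying the function to several pairs of numbers.
for x, y in [(48, 18), (17, 5), (100, 75)]:
    print(f"gcd({x}, {y}) = {gcd(x, y)}")
```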

Challenges in Scaling and Accessibility

As the Jupyter ecosystem grows, several challenges need to be addressed:

  • Accessibility: Improving web accessibility standards for Jupyter is crucial, especially for the large student user base. Statistics indicate that a significant portion of the student population has disabilities requiring web adaptations. This is a complex task involving major code refactors, and it's not typically funded by corporate clients. QuantStack has received a grant from the Linux Foundation to improve keyboard accessibility in JupyterLab, but comprehensive accessibility requires more resources.
  • Global Reach and Infrastructure: The vision is to make Jupyter accessible to billions. For countries with large, young populations like Nigeria (projected to grow by 100-200 million people in the next 20 years), relying on cloud providers in other regions is not ideal due to infrastructure limitations and sovereignty concerns. JupyterLite offers a low-tech, static page solution that can be served via CDNs, making it a viable option for teaching programming at a global scale.
  • Real-time Collaboration: Implementing effective peer-to-peer real-time collaboration within a purely browser-based compute environment remains a significant challenge.
  • Package Management: Expanding the availability of software tools and packages for the browser is an ongoing effort. QuantStack has developed Emscripten-forge, a software distribution based on the Emscripten toolchain, which includes hundreds of packages for Python, R, and C++. However, maintaining and expanding this library requires a community effort.
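The "static page" point above can be sketched with nothing more than Python's standard library. The example below serves a directory of pre-built files over HTTP with no application logic on the server, which is the role a CDN or any static host would play in a real deployment. The function names and the directory contents are assumptions made for illustration; a real site would contain the files produced by a JupyterLite build rather than a single placeholder page.

```python
import functools
import http.client
import http.server
import tempfile
import threading
from pathlib import Path


def make_static_server(site_dir: str, port: int = 8000) -> http.server.ThreadingHTTPServer:
    """Create an HTTP server that only hands out files found in site_dir."""
    root = Path(site_dir).resolve()
    if not root.is_dir():
        raise FileNotFoundError(f"no such directory: {root}")
    # Bind the stock file-serving handler to the chosen directory.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(root)
    )
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


def serve_in_background(server: http.server.ThreadingHTTPServer) -> threading.Thread:
    """Run the server loop in a daemon thread and return that thread."""
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return thread


def fetch(port: int, path: str) -> tuple:
    """Request a path from the local server; return (status, body text)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.read().decode("utf-8")
    finally:
        conn.close()


# Demo: a temporary directory stands in for the built site (assumption).
with tempfile.TemporaryDirectory() as site:
    Path(site, "index.html").write_text("<h1>static site</h1>", encoding="utf-8")
    server = make_static_server(site, port=0)  # port 0 picks a free port
    serve_in_background(server)
    status, body = fetch(server.server_port, "/index.html")
    print(status, body)
    server.shutdown()
    server.server_close()
```

Because the server does nothing except return files, the cost per user stays close to the cost of bandwidth, and the same files can be mirrored on infrastructure located in the country where the students are.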

Funding Open Source and QuantStack's Model

The discussion touched upon the perennial challenge of funding open-source projects, where companies benefiting from the software don't always contribute proportionally. However, Jupyter is considered fortunate due to significant support from large corporations like Bloomberg and AWS, as well as hedge funds. These organizations engage positively with the open-source community, addressing their priorities while contributing to the ecosystem. While direct funding for broad initiatives like full accessibility is rare, these conversations can influence corporate priorities.

QuantStack's growth model is deliberately "pedestrian" and organic, avoiding VC funding and the pressure to capture the entire community. The company aims to build a sustainable business with a traditional economic approach, focusing on linear team growth and low employee turnover.

Future News from QuantStack

Sylvain Corlay hinted at upcoming news from QuantStack. He emphasized that the company is actively developing JupyterLite and that several strands of its work, including package management, JupyterLite, and real-time collaboration, are converging into a significant, yet-to-be-announced development. He encouraged followers to stay tuned for these announcements.

Conclusion

The conversation with Sylvain Corlay of QuantStack provided a deep dive into the evolution and future of the Jupyter ecosystem. Key takeaways include the transformative potential of JupyterLite for global scalability and reach, the ongoing challenges in areas like accessibility and real-time collaboration, and QuantStack's commitment to a sustainable, community-focused approach to open-source development. Corlay indicated that further announcements from QuantStack are on the way.
