Aug 2021 - Apr 2022 | InfuseAI | Product Designer

PipeRider: Data Observability Tool for Data Scientists

Objectives

To increase user engagement and drive the growth of our product, we set out to conduct user research and gather feedback, identify areas for improvement, and potentially develop a new product.

Role

I collaborated closely with the Product Director, 2 Product Managers, and 8 engineers to initiate the research process, generate ideas, validate solutions, and successfully launch the product.

Outcome

Successfully brought the product from concept to launch, generated initial interest, and secured additional funding.

Process overview

Why we started a new product

InfuseAI offers PrimeHub, an open-source MLOps platform that streamlines research for data scientists by simplifying dataset loading and resource management. To sustain growth, we conducted new user research to identify new opportunities and improve our product.

Our original product, PrimeHub.

How we recruited research participants

To understand the pain points in the ML deployment process, I first helped the team create a survey. Then, in collaboration with the community manager, we distributed the survey across various data science communities. The survey was shared in 23 communities and we received 115 responses, resulting in 37 participants for further research.

A message that I sent to a well-known data scientist with 20k+ followers.

Social media posts

Fun fact #1

What’s the tip for reaching out to strangers on LinkedIn?

When reaching out to strangers on the internet, my approach is to be human and think about how I would engage in a conversation with them in person. I try to ask questions that will make the person feel comfortable and willing to respond. Since I have worked for several startups that had limited resources in the beginning, I have had ample experience in reaching out to strangers online. Additionally, I make sure to tailor my approach to the specific person and their interests in order to build a connection and increase the likelihood of them responding positively.

Research

What we learned from talking to 20 AI practitioners

After talking with the data scientists, we realized that they care a lot about model monitoring and collaborating on it, rather than about the overall deployment process.

Challenge #1

Data Scientists are using Google Sheets to keep track of machine learning models

This is a common practice among data scientists as it allows them to easily organize and track the different models they are working on, along with their corresponding performance metrics and parameters.

Challenge #2

Data Scientists are using Notion + GSheet to collaborate

Notion and Google Sheets are popular tools used by data scientists to collaborate on projects. Notion is often used for project management and documentation while Google Sheets is used to store and share data. This combination of tools allows data scientists to easily share and access information, streamline communication and improve collaboration.

Challenge #3

Spreadsheets get out of control when they have multiple models

As the number of models increases, it can become difficult to maintain an organized and accurate spreadsheet. This can lead to errors and inconsistencies in the data, and it can also become increasingly time-consuming to update and manage the spreadsheet. This can result in data scientists spending more time on data organization and less time on actual data analysis, which can hinder the project's progress.

Ideation

Initial ideation

Based on the pain points we collected from interviewing data scientists, we organized a series of workshops to generate solutions and brainstorm ideas. These workshops provided an opportunity for the team to collaborate and visualize our ideas, which helped to bring our concepts to life and gain a better understanding of how they could be implemented. By sharing our ideas with the team, we were able to gather feedback and make adjustments as needed to ensure that our solutions were effective and met the needs of the data scientists.

Part of the workshop I facilitated to help the team visualize the ideas

Design

Moving forward with the sketches

After the workshop where we shared our sketches, I assisted the team in consolidating all of the ideas generated during the session. This involved analyzing the concepts that were presented and identifying common themes and ideas that could be combined to create a cohesive and comprehensive solution. By combining the ideas, we were able to further visualize the concept and develop a clear understanding of how it could be implemented in practice. This helped us to refine our approach and ensure that the solutions we proposed were effective and met the needs of the data scientists.


Collecting feedback on the design

Test

Testing prototype with more interviewees

Designed with the modern data stack in mind, PipeRider fully supports a wide range of popular data sources, including Snowflake, BigQuery, Redshift, Postgres, SQLite, DuckDB, CSV, and Parquet.
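As an illustration of what connecting one of these sources can look like, here is a minimal sketch of a PipeRider-style data source configuration; the file path, key names, and credential values below are assumptions for illustration, not the exact schema:

```yaml
# .piperider/config.yml (illustrative sketch; exact keys may differ by version)
dataSources:
  - name: sales_warehouse      # hypothetical project data source
    type: postgres             # one of the supported source types
    connection:
      host: localhost
      port: 5432
      user: analyst
      dbname: sales
```

Each supported source type (Snowflake, BigQuery, DuckDB, and so on) would take its own connection parameters in place of the Postgres fields shown here.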

Assumption workshop

Test with potential users

Define

Testing with customers - realizing a deeper pain...

After testing our initial concept and conducting further interviews with data scientists and machine learning practitioners, we realized there was a deeper pain point to address. Instead of offering yet another general model monitoring platform, we needed to solve a specific problem they were facing. We took a more in-depth look at their challenges, identified the areas where they needed the most support, and tailored our solution to those requirements. This ultimately led to a more effective solution that addressed their specific pain points and delivered greater value to data scientists and machine learning practitioners.

Challenge #1

Data versioning and tracking its impact across the whole pipeline

The non-linear nature of data versioning makes it challenging to effectively manage and track dynamic, massive datasets using traditional version control tools. Instead of focusing on the actual content of the dataset, data scientists often rely on tracking changes to the dataset's metadata, such as training-serving skew, feature definition changes, implicit data dependency, and bias. However, this approach does not provide a comprehensive view of the entire ML pipeline and makes it difficult to reason about the correlation between changes made at different stages. This lack of a tool to manage complexity in these pipelines can be a major obstacle to building robust ML systems.

Challenge #2

Visibility and Resilience of an ML system

The complex nature of ML pipelines reduces the reproducibility and resilience of an ML system:

- A subtle change in a feature definition can have a massive impact on the resulting model quality.
- An infrastructure failure can change the distribution of the data, which then becomes a modeling problem.
- Debugging such issues requires having the whole picture, which is expensive and hard.

Challenge #3

Real-time performance metrics for ML systems

Real-time performance metrics are crucial for evaluating the quality of ML systems. These metrics (e.g. the accuracy of a recommendation system) are domain-specific, and every problem requires a different way to acquire them. The faster you can get feedback from the real world, the faster you can iterate and improve your ML project's quality.

Design

Our proposal - a simple, timeline-based tracker for data scientists to keep track of data

PipeRider, a pipeline-wide change management tool, aims to solve this hard problem by standing on the shoulders of these existing tools and providing a better feedback loop on metadata changes.

Competitor screenshots and collections on Miro

Fun fact #2

How did we decide the branding colors?

That is a story of its own, so I decided to skip most of it here: it took a full month of brainstorming sessions to arrive at the brand, the idea, the product, and the design. In short, we hosted a branding workshop to decide on the brand, a couple of other workshops to discuss the problem, and a separate hackathon to come up with the solution.

Design

Defining critical user journeys

After successfully implementing the initial solution, we gained confidence in the idea and moved forward to develop the critical user journeys that were necessary to fully realize the potential of our solution. This involved identifying the key steps and actions that users would need to take in order to effectively utilize the system, and designing the user interface and user experience to support these journeys. By focusing on the critical user journeys, we were able to create a solution that was intuitive and easy to use, and that effectively addressed the needs of our target users.

Critical user journeys that the team brainstormed together on

UJ 1 - Onboard New User

The user goal for this step is to successfully create an account, understand the basic features of the platform and be able to navigate the interface.

UJ 2 - Create New Project

The user goal for this step is to be able to create a new project, set the project's properties, and organize the data and models within the project.

UJ 3 - Project Setting

The user goal for this step is to be able to customize and manage the settings of the projects, such as adding collaborators, setting permissions, and configuring integrations.

UJ 4 - Experiment Comparison

The user goal for this step is to be able to compare the performance of different models and experiments, and to identify the best-performing models.

UJ 5 - Timeline

The user goal for this step is to be able to view the history of changes made to the project, including the performance of different models over time, and to understand the impact of different changes on the project's performance.

UJ 6 - User Settings

The user goal for this step is to be able to manage the user's account settings, such as changing the password, updating the profile, and managing the email notifications.

Design

Prototype - Defining Information Architecture

Before beginning the design phase, we also defined the information architecture (IA) to ensure that the design would cover everything within the critical user journey. This process involved organizing the content and functionality of the platform in a way that made it easy for users to find what they were looking for and complete the key actions they needed to take. This step was crucial in ensuring that the design would be intuitive and user-friendly, and that it would effectively support the critical user journeys identified earlier. By defining the IA together as a team, we were able to ensure that the design was comprehensive and met the needs of our target users.

Information architecture we made on Figjam

Test

Improving the design along the way...

During the prototype phase, we received feedback from customers on various aspects of the design, including the need for filters and the ability to connect multiple datasets on S3.


Challenge #1

Filters

By providing a filter, the data scientists can quickly filter through the data and focus on the important data for their analysis.

Challenge #2

A way to handle multi datasets

With multi-dataset support, data scientists can easily combine data from different sources and gain a holistic view of the pipeline. This allows them to more effectively manage and track dynamic, massive datasets, and to make better use of the data to improve the performance of their models.

Design

Final design


User onboarding (Simplified)

To ease onboarding, we decided to use OAuth with Google and GitHub, as these are the most popular services among data scientists. This allows for a secure and efficient signup without the need to create a separate account. Once users complete onboarding, they are directed to an example project where they can explore the platform and familiarize themselves with its features, giving them a sense of the platform's capabilities before creating their own project.

Creating a new project

During project creation, users can choose the project name, customize the color scheme, and follow a step-by-step guide to install PipeRider in their Jupyter notebook. This allows for a personalized and streamlined experience, as users can tailor the project to their specific needs and preferences.

Timeline

The timeline feature in PipeRider is a key component of the platform as it enables data scientists to gain a comprehensive understanding of the machine learning models they are working with. It provides an overview of a model's performance over time, and allows data scientists to identify any potential issues or inconsistencies. This feature is crucial because it empowers data scientists to make informed decisions about their models and take action to improve their performance. It also helps them keep track of the different stages of their models and reason about the correlation between changes made at different stages, which is important for managing complexity in the pipeline.

Tracking and comparing experiments

The tracking and comparing page in PipeRider enables data scientists to keep track of all their models in one place. It allows them to easily compare the performance of different models and identify the best-performing ones. This feature is crucial for data scientists as it helps them stay organized and avoid losing track of their models during experimentation, make informed decisions, and quickly identify models that need further improvement.

Managing datasets

The dataset page in PipeRider allows data scientists to connect and manage multiple datasets in one centralized location. It enables them to easily access and analyze data from various sources, streamlining their workflow and helping them to make better use of their data.

Results

Successfully released the product in May 2022

We released the product in May 2022 and onboarded a few early customers, whose feedback we used to learn and improve the product.

Raised extra funding

With PipeRider, the company also secured additional funding, allowing the team to develop the product further.

Learnings

1. Everyone is a designer

As the services that VdoTok provides consist of complicated connection data, it was initially a big headache to present 50+ metrics on a web page. To address this, I first categorized the data into different levels and categories, and then came up with ideas to visualize the data without taking up too much space.

2. Work closely with engineers

As the end-users of this product are the developers, I worked closely with the developers in our own company during the design process. This ensured regular feedback from the developers to improve the design.

Check other case studies

Redesigning Lokalise's Marketing Content Management Page

Designing a User Feedback System to Improve Lokalise Messages

From Idea to Launch: Designing an AI-Enabled Scheduling App with OrganAI