Every PIM project begins with the task of cleaning up your product data. If that sounds daunting, don’t worry. In this article, we share advice on what to expect and how to prepare.

When to start your product data cleansing

There are always opportunities to clean your product data.

But the truth is, most companies only do product data cleanup when they officially begin a PIM analysis or implementation. And that’s perfectly fine.

Wherever you begin your PIM journey is the right place to start. The most important thing is that you don’t skip this step.

Common data issues to flag during product data cleanup

As PIM experts, we look for a variety of data issues. Here are a few of the most common, all of which cause problems when it comes time for data ingestion:

  • Format inconsistencies: Disparate formats are commonly found in open text fields and for units of measure. For example, “inches” may be represented inconsistently with letters, numbers, abbreviations, symbols, etc.
  • Multiple data points in a single field: Users sometimes squeeze several data points, like color, style, and description, into the same field instead of giving each one its own field.
  • Missing context: Sometimes a field is defined as a measurement but doesn’t specify the unit. For example, is it ounces, inches, or volts? Context is critical.
  • Misspellings: Spreadsheet programs, often used to manage data before PIM, don’t run spell checks automatically. And most busy users won’t take the extra step to trigger proofreading, which often leads to misspellings.
  • Duplicate unique identifiers: This is another sneaky culprit behind big data issues. Sometimes we find two rows of differing data for a single SKU or UPC. That’s a big problem, because neither users nor automations can tell which one to use.
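
As an illustration of the last check, here is a minimal Python sketch that flags identifiers appearing on more than one row with differing data. It assumes rows are already loaded as dictionaries (for example with `csv.DictReader`) and that the identifier column is called `sku`; the function name and sample fields are hypothetical.

```python
from collections import defaultdict


def find_conflicting_duplicates(rows, key="sku"):
    """Group rows by identifier and return only the identifiers whose rows disagree.

    Exact repeats of the same row are not treated as conflicts.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    conflicts = {}
    for identifier, group in groups.items():
        # Turn each row into a sorted tuple of (column, value) pairs so that
        # identical rows collapse to one entry regardless of column order.
        distinct = {tuple(sorted(row.items())) for row in group}
        if len(distinct) > 1:
            conflicts[identifier] = group
    return conflicts


rows = [
    {"sku": "A100", "color": "Red", "width": "12 in"},
    {"sku": "A100", "color": "Blue", "width": "12 in"},
    {"sku": "B200", "color": "Black", "width": "8 in"},
    {"sku": "B200", "color": "Black", "width": "8 in"},
]
for sku, group in find_conflicting_duplicates(rows).items():
    print(sku, [row["color"] for row in group])  # prints: A100 ['Red', 'Blue']
```

Exact repeats (like `B200` here) can often be collapsed automatically; conflicting rows still need a person to decide which one is correct.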

What to expect during a product data clean-up project

When you work with Ntara, you can expect detailed analysis and guidance on cleaning up product data, along with some opportunities to automate or speed things up.

Whether the project involves one file or many, we carefully analyze each column of data against five categories: product and item segmentation, duplication, data consistency, completeness, and the presence of marketing content.

Using a rubric score—zero to five—we establish a clear picture of the data’s condition and urgency for cleanup (for each category and the file as a whole).
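
For a category like completeness, part of that picture can be computed. The Python sketch below is purely illustrative and assumes a simple mapping that is not Ntara’s actual rubric: the share of populated cells, scaled onto zero to five and rounded down, for each column and for the file as a whole.

```python
def to_score(filled, total):
    """Scale filled/total onto 0-5, rounding down so only a fully populated set scores 5."""
    return 5 * filled // total if total else 0


def completeness_scores(rows, columns):
    """Return (per-column scores, whole-file score) for the given columns."""
    per_column = {}
    filled_overall = 0
    for column in columns:
        # A cell counts as filled if it holds anything other than blanks or None.
        filled = sum(1 for row in rows if str(row.get(column) or "").strip())
        per_column[column] = to_score(filled, len(rows))
        filled_overall += filled
    return per_column, to_score(filled_overall, len(rows) * len(columns))


rows = [
    {"sku": "A100", "color": "Red", "description": ""},
    {"sku": "B200", "color": "", "description": ""},
    {"sku": "C300", "color": "Blue", "description": "Steel wall shelf"},
    {"sku": "D400", "color": "Green", "description": None},
]
per_column, overall = completeness_scores(rows, ["sku", "color", "description"])
print(per_column, overall)  # {'sku': 5, 'color': 3, 'description': 1} 3
```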

A detailed analysis and recommended actions guide the next steps. We explain what’s wrong with the data, and how and when to fix it.

Ways to speed up the data cleansing process

Remember, product data cleanup will likely involve manual work. It takes time to correct all that random, mismatched data. But there are ways we can assist or help speed things up.

Often, if a column follows a clear pattern, we can apply an automated fix with a script to get you further down the line before the manual process begins.
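
As an example of a pattern-based fix, here is a minimal Python sketch that rewrites the inconsistent “inches” values described earlier into one form. The spellings accepted by the regular expression are assumptions; anything that doesn’t match comes back as `None` and is left for the manual pass.

```python
import re

# Assumed variants: 12", 12”, 12″, 12 in, 12 in., 12 ins, 12 inch, 12 inches (any case).
INCH_VALUE = re.compile(
    r'\s*(\d+(?:\.\d+)?)\s*(?:"|”|″|in\.?|ins|inch(?:es)?)\s*',
    re.IGNORECASE,
)


def normalize_inches(value):
    """Return the value as '<number> in', or None if a person needs to look at it."""
    match = INCH_VALUE.fullmatch(value)
    return f"{match.group(1)} in" if match else None


column = ['12"', "12 inches", "12.5 IN.", "8in", "twelve inches", "12 x 8 in"]
cleaned = [normalize_inches(v) for v in column]
print(cleaned)  # ['12 in', '12 in', '12.5 in', '8 in', None, None]
```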

There are creative things we’ve done over the years to help our clients. For example, one client had data in tables inside PDFs, so our developer wrote a script that extracted those tables and converted the data into a format ready for ingestion.

Sometimes, we find issues that we can remedy for you during data ingestion — and we mark those opportunities, as well.

In one instance, we found a field with inconsistencies around cases and units (e.g., “10 cases of 1,000 units”). We learned that the client really wanted to capture both data points, so we offered to split that data into two fields during ingestion.
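
A split like this can be sketched in a few lines of Python. The sketch assumes values follow the “N cases of M units” wording from the example; the output field names `case_count` and `units_per_case` are hypothetical, and values that don’t fit the pattern come back as `None` for review.

```python
import re

# Assumed wording: "<cases> case(s) of <units> unit(s)", with optional thousands separators.
PACK_SIZE = re.compile(
    r"\s*(\d[\d,]*)\s+cases?\s+of\s+(\d[\d,]*)\s+units?\s*", re.IGNORECASE
)


def split_pack_size(value):
    """Split one combined value into two numeric fields, or return None."""
    match = PACK_SIZE.fullmatch(value)
    if match is None:
        return None
    cases, units = (int(group.replace(",", "")) for group in match.groups())
    return {"case_count": cases, "units_per_case": units}


print(split_pack_size("10 cases of 1,000 units"))
# {'case_count': 10, 'units_per_case': 1000}
print(split_pack_size("10/1000"))  # None
```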

In another example, the client had a field for “feet” with formatting inconsistencies. To streamline, we suggested keeping just the number — and we noted the unit of measure elsewhere, so they didn’t lose that context altogether.
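
In code, that recommendation might look like the following Python sketch: the unit is declared once on a hypothetical attribute definition, and each value keeps only its number. The accepted feet spellings and the function name are assumptions for illustration; compound values such as “6 ft 3 in” are deliberately rejected rather than guessed at.

```python
import re

# Hypothetical attribute definition: the unit is recorded here once, not in every value.
LENGTH_ATTRIBUTE = {"name": "length", "type": "decimal", "unit": "ft"}

# Assumed variants: 6, 6', 6’, 6 ft, 6 ft., 6 foot, 6 feet (any case).
FEET_VALUE = re.compile(
    r"\s*(\d+(?:\.\d+)?)\s*(?:'|’|ft\.?|feet|foot)?\s*", re.IGNORECASE
)


def feet_to_number(raw):
    """Return just the numeric part of a feet value, or None if it isn't a plain feet value."""
    match = FEET_VALUE.fullmatch(raw)
    return float(match.group(1)) if match else None


values = ["6 ft", "6'", "6.5 FEET", "10", "6 ft 3 in"]
print([feet_to_number(v) for v in values], LENGTH_ATTRIBUTE["unit"])
# [6.0, 6.0, 6.5, 10.0, None] ft
```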

Adopting the right product data cleanup mindset

It’s easy to get overwhelmed with all the recommendations. The best thing you can do is take it one step at a time, one data set at a time. Don’t try to look at all the problems all at once.

Another mindset trick? Go into the process expecting bad data.

Some dirty data isn’t the end of the world. Identifying problems is your opportunity to get the data clean. After all, messy data is likely part of why you needed PIM in the first place.

Setting expectations and a reasonable pace also helps you push through when overwhelm creeps in, because the worst thing you can do is rush the process and settle for “good enough.”

Sometimes, a client team will decide to reduce the stringency of the rules just to get the data into PIM as quickly as possible. Or they tell themselves, “Let’s just get the dirty data in now, and we’ll clean it up later,” but then later never comes.

Instead of lowering your standards, take an MVP approach.

Identify the low-hanging fruit that you absolutely need to get done for your MVP, and handle that first.

And for the rest? Establish that you won’t ingest the bad data now, but you’ll keep it on the back burner to do a second pass later on.
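
One way to make that split explicit is a small gate in the load script. In this minimal Python sketch, the MVP-required fields and the validator are hypothetical examples; rows that fail are not dropped but set aside with the reasons they failed, which becomes the to-do list for the second pass.

```python
def partition_for_mvp(rows, required, validators):
    """Split rows into (ingest_now, back_burner).

    A row is ingested now only if every MVP-required field is filled in and passes
    its validator (if it has one). Other rows are kept with a list of problems.
    """
    ingest_now, back_burner = [], []
    for row in rows:
        problems = []
        for field in required:
            value = str(row.get(field) or "").strip()
            if not value:
                problems.append(f"{field}: missing")
            elif field in validators and not validators[field](value):
                problems.append(f"{field}: invalid value {value!r}")
        if problems:
            back_burner.append({"row": row, "problems": problems})
        else:
            ingest_now.append(row)
    return ingest_now, back_burner


required = ["sku", "name", "width_in"]
# Hypothetical rule: width must be a plain number such as "12" or "12.5".
validators = {"width_in": lambda v: v.replace(".", "", 1).isdigit()}
rows = [
    {"sku": "A100", "name": "Wall shelf", "width_in": "12"},
    {"sku": "B200", "name": "", "width_in": "12"},
    {"sku": "C300", "name": "Bracket", "width_in": "twelve"},
]
now, later = partition_for_mvp(rows, required, validators)
print([row["sku"] for row in now])  # ['A100']
for item in later:
    print(item["row"]["sku"], item["problems"])
# B200 ['name: missing']
# C300 ["width_in: invalid value 'twelve'"]
```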

What you’ll need to succeed

Drawing from our own PIM experience, here are key factors in getting the best product data cleanup results:

Employees with time and attention to detail

You’ll need employees with dedicated time carved out to do the work. Ideally, these are people who will examine the data at a granular level instead of rushing through the process.

An example of what the standard should look like

Take a moment to consider: What does our perfect record look like? Even if you don’t have that perfect record today, if you’ve thought about it or discussed it as a team, that gives you a solid head start.
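
It can help to write that answer down in a form both people and scripts can read. Here is a hypothetical Python example for a shelving product; the fields, types, and units are illustrative assumptions, not a recommended model. The helper lists which parts of the target record an existing row doesn’t fill yet.

```python
from dataclasses import dataclass, fields


@dataclass
class ShelfRecord:
    """Hypothetical 'perfect record': one field per data point, units in the field names."""
    sku: str
    name: str
    color: str
    width_in: float
    depth_in: float
    marketing_description: str


def gaps(row):
    """Names of target fields that a raw row leaves missing or blank."""
    return [f.name for f in fields(ShelfRecord) if not str(row.get(f.name) or "").strip()]


today = {"sku": "A100", "name": "Wall shelf", "color": "Red", "width_in": "12"}
print(gaps(today))  # ['depth_in', 'marketing_description']
```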

An open mind and understanding of your ecosystem

At a practical level, it helps to understand the state of your ecosystem and why your data matters.

As much as possible, we like to know at the beginning: Where’s the best source of truth for your data? What attributes are most important? What systems or channels are important? And where do you have blockers?

That gives us a much clearer starting point than simply, “This is the data we need to clean.”

Having this holistic picture helps us think about the best ways to model the data to meet your goals, along with the easiest, most efficient ways for your people to enrich the data. Because ultimately, that’s what will help keep your data clean and effective for the long term.

A way to confirm you’ve met your goals

Finished cleaning up your data? Take a final look and take steps to keep it clean.

We perform a spot check to make sure everything looks good. But we also have a failsafe in place during ingestion to PIM: our developers build error handling into the ingestion code. If something is off, the code records an error showing what went wrong, so both teams know exactly where to apply a fix.
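
The exact error handling depends on the PIM and its import tools, but the general pattern can be sketched in Python. This is a generic illustration with a hypothetical `to_pim_record` transform: each row is converted inside a `try` block, and a failure is recorded with the row’s line number, SKU, and error instead of halting the whole load.

```python
import logging

log = logging.getLogger("pim_ingestion")


def ingest(rows, transform):
    """Transform every row; collect failures instead of stopping at the first one."""
    loaded, errors = [], []
    # start=2 assumes the rows came from a CSV whose line 1 is the header.
    for line_no, row in enumerate(rows, start=2):
        try:
            loaded.append(transform(row))
        except (KeyError, ValueError) as exc:
            errors.append({"line": line_no, "sku": row.get("sku"), "error": repr(exc)})
            log.warning("line %d (sku=%s): %r", line_no, row.get("sku"), exc)
    return loaded, errors


def to_pim_record(row):
    """Hypothetical transform: requires a numeric width."""
    return {"sku": row["sku"], "width_in": float(row["width"])}


rows = [
    {"sku": "A100", "width": "12"},
    {"sku": "B200", "width": "12 in"},  # ValueError: not a plain number
    {"sku": "C300"},                    # KeyError: no width column
]
loaded, errors = ingest(rows, to_pim_record)
for error in errors:
    print(error["line"], error["sku"], error["error"])
```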

Take the first steps in product data cleansing

Before diving into your PIM project or ecommerce venture, you must make sure your product data is clean, consistent, and ready for ingestion.

Don’t go through it alone. Our PIM experts can help. To speak with a consultant about your product data, get in touch.
