The world of data science and information technology is a constantly evolving landscape: new tools and methodologies are created and updated every day, while many others quickly become obsolete.
Every organization has its own ecosystem of applications, but even the most advanced organizations sometimes fall behind the bleeding edge of technology in certain areas. This is completely normal, and it is part of what makes this field exciting.
Having some technological blind spots is perfectly acceptable. Rather than worrying about always using the latest and greatest tools, organizations should focus on cultivating a data-centered culture (which has been linked to increased productivity and market value) and on combating technological illiteracy.
In this blog post, we address some of these problems and suggest how traditional organizations can use data to become more efficient.
What is a traditional organization?
When we think about data science and data engineering, we usually imagine "tech companies" operating in sectors such as banking, telecommunications, retail and, more recently, marketing. These companies now rely on data so heavily that it is hard to imagine them functioning without an efficient data infrastructure.
However, many organizations, especially in more traditional sectors such as politics, economic agencies, railroads, manufacturing and water management (to name a few), operate differently.
These organizations usually have one or more of these characteristics:
- their processes still rely mostly on intuition, rather than on a combination of intuition and hard numerical evidence;
- many processes are manual (for example, sending files by email or receiving information from external parties);
- most of the know-how (or at least the "tricks of the trade") lives in the experience of senior members, and information is shared through ad-hoc, undocumented manual processes;
- there is a general difficulty in understanding the day-to-day impact of being data-driven.
These companies have usually been around for a long time, and operate in sectors that, for one reason or another (strict regulations, a strong traditional culture, or a less "data-intensive" nature), have not yet been shaken up by the new age of data science.
In these situations, the path to investing in proper data pipelines and algorithms can be very unclear, even though the payoff can be substantial, as the following real-life examples show.
Benefits of improving your "data efficiency"
At Daredata, one of our goals is to help companies become more data-driven. Being data-driven helps companies navigate uncertain times, thanks to more reliable data and better decisions based on accurate, real-time information.
One of the ways we do this is by exploring a company's data to its full potential and looking for parts of the business that are not using it efficiently. The following three examples illustrate how we have done this.
Generating sales forecasts
In one of our projects, we developed a sales forecasting model for a large producer and distributor of beverages. This was a large, well-established and relatively old company, but its data infrastructure was rudimentary in some respects.
Our model used historical sales data, enriched with external data (weather and social events, such as football games), to predict future sales. These sales forecasts were then sent to the sales team, to aid them in generating monthly objectives.
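To make the idea of "enriching" sales history more concrete, here is a minimal Python sketch of how daily sales could be joined with external data into model-ready rows. It is not the model we built for this client: the field names, the one-week lag feature and the plain dict/set inputs (weather readings and a football-game calendar) are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class FeatureRow:
    day: date
    sales_last_week: float   # sales on the same weekday one week earlier
    temperature_c: float     # from a hypothetical weather feed
    has_football_game: bool  # True if a hypothetical event calendar lists a match
    target_sales: float      # the value a model would learn to predict


def build_features(sales, weather, events):
    """Join daily sales with external data into model-ready rows.

    sales:   dict[date, float]  historical units sold per day
    weather: dict[date, float]  mean temperature per day
    events:  set[date]          days with a football game
    Days without a week-old sales value or a weather reading are skipped.
    """
    rows = []
    for day in sorted(sales):
        last_week = day - timedelta(days=7)
        if last_week not in sales or day not in weather:
            continue
        rows.append(FeatureRow(
            day=day,
            sales_last_week=sales[last_week],
            temperature_c=weather[day],
            has_football_game=day in events,
            target_sales=sales[day],
        ))
    return rows


# Usage with two weeks of toy data and one match day.
start = date(2023, 6, 1)
sales = {start + timedelta(days=i): 100.0 + i for i in range(14)}
weather = {d: 25.0 for d in sales}
events = {start + timedelta(days=10)}
rows = build_features(sales, weather, events)
```

Rows like these could then be fed to any regression or time-series model; keeping the join explicit makes it easy to add further external signals later.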
For most geographical regions, the new forecasts led to very significant improvements in accuracy. However, since the previous forecasts had been made and adjusted based on the experience of senior sales managers, it took a while to gain internal support for the tool. One of our main focuses was to demonstrate that the new system was not there to replace anyone, but rather to support decision-making and help the company make use of its data.
Why did we consider this a traditional organization: forecasts were generated based on intuition; decision-making was not very data-driven.
Gains for the organization: Better use of its historical data to generate accurate sales forecasts, less time spent by sales managers manually adjusting forecasts, and more realistic sales objectives.
Collecting event data
On another project, we collaborated with a start-up in the wearable devices business. The goal was to build data pipelines and a web server to collect and store the events triggered by users through the company's application, plus a dashboard to visualize this data.
In this case, a big part of the technology was already in place. However, the company lacked the in-house experience to handle data-related tasks and was relying on software from external providers. Most of these applications were extremely expensive and hid most of their useful features behind premium subscriptions.
The data inefficiency here was mainly cost-related, as the company was paying a lot more than it needed to for the same data goals. The pipelines we implemented are holding up extremely well, and are able to process millions of events per day!
Why did we consider this a traditional organization: inefficient data infrastructure, locked into external service providers, and not enough in-house expertise to handle all the data.
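As a rough illustration of what an event-collection pipeline does at its entry point, the sketch below validates incoming JSON events and hands them to storage in batches. It is not the pipeline we built: the required fields, the batch size and the `sink` callback are hypothetical, and a production system would add a web framework, retries and a real storage backend.

```python
import json
from datetime import datetime

# Hypothetical minimal schema for an app event.
REQUIRED_FIELDS = {"user_id", "event_type", "timestamp"}


class EventBuffer:
    """Validates raw JSON events and passes them to a sink in fixed-size batches."""

    def __init__(self, sink, batch_size=500):
        self.sink = sink              # callable that persists a list of events
        self.batch_size = batch_size
        self.pending = []

    def ingest(self, raw):
        event = json.loads(raw)
        if not isinstance(event, dict):
            raise ValueError("event must be a JSON object")
        missing = REQUIRED_FIELDS.difference(event)
        if missing:
            raise ValueError(f"missing fields: {sorted(missing)}")
        # Raises ValueError for malformed ISO 8601 timestamps.
        datetime.fromisoformat(event["timestamp"])
        self.pending.append(event)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Hand any buffered events to the sink, e.g. on a timer or at shutdown."""
        if self.pending:
            self.sink(self.pending)
            self.pending = []


# Usage: collect batches in a list instead of a real database.
stored_batches = []
buffer = EventBuffer(stored_batches.append, batch_size=2)
for i in range(3):
    buffer.ingest(json.dumps({"user_id": i, "event_type": "step_goal_reached",
                              "timestamp": "2023-06-01T10:00:00"}))
buffer.flush()
```

Batching writes is one common way to keep per-event overhead low when volumes reach millions of events per day.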
Gains for the organization: Cost savings, more reliable data pipelines, more possibilities for analytics and different dashboards.
Organizing the business landscape
Finally, we did a project for a government agency, which was possibly our most "traditional" client to date. This agency worked with a large number of organizations, with the goal of attracting foreign investment.
This project really showed how data science can boost productivity across an entire organization and help everyone do a better job. By using algorithms to sort and curate the agency's entire business landscape, we were able to identify organizations of interest to our client, as well as exclude organizations that were less likely to bring value.
In doing so, we increased data efficiency upstream by helping the commercial team decide which organizations to invest their time in.
Why did we consider this a traditional organization: hard data was not being used to segment the business landscape, and as a result a lot of time was wasted chasing fruitless prospects.
Gains for the organization: Increased productivity across the organization.
What does this mean for you?
After reading the examples above, do you feel like some part of your organization might be data inefficient?
If so, the good news is that you don't need to break the bank to start solving the problem: all you need is a low-risk, low-cost, high-value proof of concept (PoC).
Let's give a concrete example of how this could work. Imagine a fictional company, ACME Corp., which sells consumer goods. ACME is not very data-driven, and it also has the following characteristics:
- Lack of specialized resources to develop impactful data science projects;
- Unawareness of how it could make better use of its data, and of what kind of pipelines it could implement;
- A large backlog of improvements and optimizations to be done;
- Too much data and no pipelines to process it effectively.
ACME reached out to us wanting to improve its sales forecasts, which were underperforming and leading to wasted stock in some locations and shortages in others.
The first step was to write user stories. These are short sentences with the following structure:
As a <role>, I need to <summary of the needs> so that I <summary of the results>
In this case, one of the user stories could be, for example: "As a logistics manager, I need to be able to accurately estimate the sales volume in each location, so that I am able to better distribute resources and avoid stock waste/shortage".
This approach focuses on the user experience, which is not that common in data projects, where the first steps are usually technical implementations. It allows every stakeholder to give feedback from day one and makes sure everyone agrees on the main goals. This, in turn, leads to smoother buy-in for the project across the company.
Scoping out the existing data infrastructure
After agreeing on the user stories, we spent a few days scoping out what data was available, where it was stored, and how accessible it was. It turned out that ACME had all its historical sales data stored in a very rudimentary database.
The data looked good enough to train a forecasting model, so we decided to follow that path. After a couple of weeks of experimenting with the historical data, we concluded that a basic version of the model could predict sales with 95% accuracy on a small subset of locations and a limited time interval.
This was an improvement over ACME's forecasts, which were based on manual adjustments and the intuition of the logistics managers. Furthermore, there was a good chance that the results could get even better with proper model fine-tuning.
After we presented these promising results to ACME, the decision was made to move forward with a longer-term project. The goals were to:
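The post does not say which metric stands behind a figure like "95% accuracy". One common convention for sales forecasts is accuracy = 1 - MAPE (mean absolute percentage error). The sketch below assumes that convention and compares a model's forecasts with a manual baseline on toy numbers, not on real data.

```python
def mape(actual, forecast):
    """Mean absolute percentage error. Assumes no actual value is zero."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("expected two non-empty sequences of equal length")
    return sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def accuracy(actual, forecast):
    """Forecast accuracy expressed as 1 - MAPE (a common, not universal, convention)."""
    return 1.0 - mape(actual, forecast)


# Toy numbers for illustration only.
actual_sales = [100, 200, 400]
model_forecast = [90, 210, 380]
manual_forecast = [80, 250, 300]

model_acc = accuracy(actual_sales, model_forecast)    # about 0.933
manual_acc = accuracy(actual_sales, manual_forecast)  # about 0.767
print(f"model: {model_acc:.1%}, manual: {manual_acc:.1%}")
```

Scoring both forecasts with the same metric on the same locations and period is what makes a claim like "an improvement over the manual forecasts" checkable.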
- automate the transfer of daily sales data, and implement data pipelines to sanitize it;
- improve the model's results;
- complement our data science work with strong software engineering practices, such as unit testing and continuous deployment.
These practices help ensure that our work holds up after we leave, and make it much easier for future developers to focus on what's really important: exploring the data to its full potential.
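To illustrate the first and last goals together, here is a hypothetical sanitization step for a batch of daily sales records, with a unit test that pins down its behavior. The record layout and the cleaning rules (normalizing store IDs, dropping missing or negative unit counts, de-duplicating per store and day) are assumptions made for the example, not ACME's actual rules.

```python
import unittest
from typing import NamedTuple, Optional


class SaleRecord(NamedTuple):
    store_id: str
    day: str                 # ISO date string, e.g. "2023-06-01"
    units: Optional[float]   # None when the source system sent no value


def sanitize(records):
    """Clean one batch of daily sales records (hypothetical rules):
    - normalize store IDs (strip whitespace, upper-case);
    - drop rows with missing or negative unit counts;
    - keep only the first row per (store_id, day) pair.
    """
    seen = set()
    clean = []
    for rec in records:
        if rec.units is None or rec.units < 0:
            continue
        store = rec.store_id.strip().upper()
        key = (store, rec.day)
        if key in seen:
            continue
        seen.add(key)
        clean.append(SaleRecord(store, rec.day, rec.units))
    return clean


class SanitizeTest(unittest.TestCase):
    def test_drops_bad_and_duplicate_rows(self):
        raw = [
            SaleRecord(" lis01 ", "2023-06-01", 12),
            SaleRecord("LIS01", "2023-06-01", 12),  # duplicate after normalization
            SaleRecord("OPO02", "2023-06-01", -3),  # negative units
            SaleRecord("OPO02", "2023-06-02", None),  # missing units
            SaleRecord("OPO02", "2023-06-03", 7),
        ]
        self.assertEqual(
            sanitize(raw),
            [SaleRecord("LIS01", "2023-06-01", 12),
             SaleRecord("OPO02", "2023-06-03", 7)],
        )
```

Running `python -m unittest` against the module executes the test; in a continuous deployment setup, the same test would typically run automatically before every release.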
During the final stretch of the project, we also helped ACME hire an in-house data scientist to maintain and expand upon our work.
Data inefficiency can manifest itself in many subtle ways, and can lead to significant losses in productivity and revenue. At Daredata, we believe that even organizations in more traditional or less data-intensive areas can benefit from data science and data engineering, if there's enough enthusiasm and desire to cultivate a data culture.
If you identified with the challenges mentioned in this post, we would love to exchange some ideas. You can find our contact details on our website, https://daredata.engineering/.
Thank you for reading!