Is it time to think outside the spreadsheet?
The purpose of computation is insight, not numbers — Richard Hamming
Today, making decisions and communicating with numbers is still really hard. As an engineering team, we’ve been exploring the technical challenges of today’s programming languages and their impact on accessibility in data modeling.
One decision we’ve made on this journey is to build a new language to humanize computation with numbers in new ways. We wanted to share some of the challenges we see with existing programs…
To understand the current difficulties in programming and data modeling, we first need to look at how we got here. Computers started as special-purpose calculating machines, each built for a single task, such as calculating the trajectory of a bullet, decrypting a message, or predicting tides. But as hardware progressed, people started building general-purpose computers. On the earlier machines, the inputs were effectively the program; the new multi-purpose computers became programmable.
However, giving instructions to computers (programming) is no easy feat: you need to be familiar with many details of the computer's hardware and software, and with how to phrase instructions in a way it will understand. For most people this doesn't come naturally, and it takes years of dedication to build the necessary expertise.
Remember the first high-level languages, like COBOL? They were easier to understand than assembly language but still required highly specialized skills. Since then, many other high-level languages have appeared. Some, like Python or Ruby, are more approachable; others, like C, are more powerful and performant but less forgiving.
Yet less than 1% of the population knows how to code, and fewer than half of the world's programmers know Python, which most data tools require.
We’re living in an increasingly data-rich world, yet most people can’t use the best tools.
VisiCalc, released in 1979, was a breakthrough toward democratizing programming and working with numbers. It introduced a grid of cells where users could enter numbers, quickly relate cells to one another, and perform calculations.
With this new spreadsheet software, tasks that used to take hours by hand, or that required specialized programmers and expensive computers, became available to anyone who could type and enter a formula.
With VisiCalc and the spreadsheet applications that followed, people built all kinds of applications: data entry, payroll, taxes, accounting, operational planning, logistics, and forecasting. At the time (and still today), this revolution let users quickly adapt software to their needs, and it made spreadsheets the way most of us think with numbers, despite the existence of more powerful programming languages and tools.
Even though spreadsheet applications started to democratize access to information and personalization, they came with drawbacks. They force users to solve problems by thinking "inside the box", which has led to information silos, errors, and plenty of unnecessary manual work. The reasons people build software generally fall into three buckets:
Decision making: Software is used to model past, present, or future reality. These models may then be used to plan, create forecasts, or explore different scenarios for better decision making.
Analyzing: Software is used to analyze the past for purposes such as extracting the behavior of a system to make better predictions, or extracting indicators that help in decision making.
Communicating: People need to communicate with one another in many different forms. Messaging, social networks, and e-mail applications fall into this category, but so do tools like Word or PowerPoint.
We can't predict all the ways people will use data analysis software. But for modeling, analysis, and communication, the tools available today are too limited.
Don't get me wrong: we don't think spreadsheets are bad. Quite the opposite, they're essential. It's how they are used that can expose businesses to risk and keep individuals from using data and numbers in more meaningful ways.
Here are some examples of areas that are currently problematic for many users:
With traditional programming languages and spreadsheets, people work with plain numbers. But numbers have meaning. They represent a quantity of some subject (like the amount of sugar in a beverage, the area of an office, or the percentage of a transaction fee). This means that numbers have an implicit unit (square meters, dollars, kilograms, etc.) and an implicit subject (grain of wheat, medicine, salary, office space, a transaction, etc.).
However, traditional programming languages treat numbers as pure values without context. This makes programs error-prone, from unit conversion errors to scale and subject errors. Such errors can lead to loss of value, loss of human life, the loss of space probes, and many other consequences that would be avoidable if the system understood the context.
We need a way to make computers aware of the meaning and context behind the numbers they process: for example, 2 kg of flour vs. 300 g of sugar.
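To make the idea concrete, here is a minimal Python sketch of a number that carries both its unit and its subject. It is not Decipad's language or any real library's API: the `Quantity` class, the small `UNITS` table, and the rule that only quantities of the same subject may be added are all illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical unit table: each unit maps to (dimension, factor to the base unit).
UNITS = {
    "g": ("mass", 1.0),
    "kg": ("mass", 1000.0),
    "m2": ("area", 1.0),
    "USD": ("money", 1.0),
}


@dataclass(frozen=True)
class Quantity:
    """A number that carries its unit and the subject it measures."""

    value: float
    unit: str
    subject: str

    def to(self, unit: str) -> "Quantity":
        """Convert to another unit of the same dimension, or raise TypeError."""
        dim_from, factor_from = UNITS[self.unit]
        dim_to, factor_to = UNITS[unit]
        if dim_from != dim_to:
            raise TypeError(f"cannot convert {dim_from} ({self.unit}) to {dim_to} ({unit})")
        return Quantity(self.value * factor_from / factor_to, unit, self.subject)

    def __add__(self, other: "Quantity") -> "Quantity":
        """Add a quantity of the same subject; the result uses self's unit."""
        # Illustrative (strict) rule: different subjects cannot be added.
        if self.subject != other.subject:
            raise TypeError(f"cannot add {other.subject} to {self.subject}")
        return Quantity(self.value + other.to(self.unit).value, self.unit, self.subject)


flour = Quantity(2, "kg", "flour")
sugar = Quantity(300, "g", "sugar")

print(flour.to("g"))                        # Quantity(value=2000.0, unit='g', subject='flour')
print(Quantity(300, "g", "flour") + flour)  # Quantity(value=2300.0, unit='g', subject='flour')

try:
    flour + sugar
except TypeError as err:
    print("rejected:", err)  # rejected: cannot add sugar to flour

try:
    flour.to("USD")
except TypeError as err:
    print("rejected:", err)  # rejected: cannot convert mass (kg) to money (USD)
```

Refusing to add flour to sugar is a deliberately strict choice for this sketch; a real system might allow it under some combined subject. The point is that the unit and subject travel with the number, so mistakes surface as errors instead of silently wrong results.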
If you use spreadsheets, you can probably relate to this:
It's often easier to build a new model than to expand an existing one to new circumstances.
Spreadsheet models are hard to adapt. Let's take an example:
You want to model the monthly cost of a new physical therapy business you are starting. You start with your biggest cost: office space.
Cost of office space = (total area) × (monthly cost per unit of area)
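Writing the units into the same formula shows why they matter. With hypothetical figures of 120 m² of office space at 25 USD per square meter per month, the area units cancel and the result comes out as a monthly cost in dollars:

```latex
% Hypothetical figures: 120 m^2 of office space at 25 USD per m^2 per month
\text{office cost}
  = 120\,\text{m}^2 \times 25\,\frac{\text{USD}}{\text{m}^2 \cdot \text{month}}
  = 3000\,\frac{\text{USD}}{\text{month}}
```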
Let’s say the business is going well. You’ve hired a team. Now, you need the model to account for space as a function of the number of employees.
The cost of office space now depends on the unit cost of office space area and the number of employees.
The amount of office space is no longer a single number. It is now a function of the number of employees.
And so on. As the model grows more complex, numbers turn into relations between different quantities.
Spreadsheet software does not let us easily declare relationships between entities. Instead, each small calculation is usually hard-coded. Because the software has no information about how these entities relate, the programmer or spreadsheet author has to update the calculations by hand. Not only is this more complex and time-consuming, it's also error-prone.
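As a rough illustration of what declaring relationships could look like, the Python sketch below stores each quantity as either an input value or a formula over other named quantities, and computes results by resolving those dependencies on demand. The `evaluate` function, the quantity names, and all figures are hypothetical, and the sketch omits things a real tool would need, such as cycle detection and units.

```python
from typing import Callable, Dict, Union

# A formula receives a lookup function and returns a value computed from other names.
Formula = Callable[[Callable[[str], float]], float]
# A model maps each name to either a constant input or a formula.
Model = Dict[str, Union[float, Formula]]


def evaluate(model: Model, name: str) -> float:
    """Resolve a name by recursively evaluating the declarations it depends on."""
    node = model[name]
    if callable(node):
        return node(lambda dep: evaluate(model, dep))
    return node


# Version 1: office space is a single number (hypothetical figures).
model: Model = {
    "total_area": 120.0,     # m2
    "cost_per_area": 25.0,   # USD per m2 per month
    "office_cost": lambda get: get("total_area") * get("cost_per_area"),
}
print(evaluate(model, "office_cost"))  # 3000.0

# Version 2: area now depends on headcount. Only the declaration of
# total_area changes; the office_cost relation is reused untouched.
model.update({
    "employees": 8.0,
    "area_per_employee": 10.0,  # m2 per employee (hypothetical)
    "total_area": lambda get: get("employees") * get("area_per_employee"),
})
print(evaluate(model, "office_cost"))  # 2000.0
```

When the model grows, only the declaration of `total_area` changes; the `office_cost` relation stays as it was, instead of every dependent calculation being rewritten by hand.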
Why should adding complexity to a model also mean redoing a part of our program?
As the models in our spreadsheets grow more complex, we quickly fall prey to them. A spreadsheet soon stops being clear to anyone other than its author, and often even the author loses track!
A newcomer can't easily tell where to start, because the spreadsheet hides the relationships between cells. Decoding it, and the context behind it, takes serious effort.
While other languages like Python may be harder to learn, the ability to contribute to someone else's work lays a foundation for community, extensibility, and a shared language for connected thought and learning. Spreadsheets, by contrast, create information silos.
Spreadsheets also have no built-in way to collaborate across documents. They can hardly be reused other than by making copies, and they cannot easily embed data and calculations from other spreadsheets without a lot of effort.
We need a better and more transparent way for people to collaborate, to take data and calculations out of their silos and allow people to communicate and reuse them easily.
In most situations, complex spreadsheets are not used for communication.
Instead, their values or charts are extracted and embedded in a document, e-mail, or PowerPoint presentation. They lose their provenance, and the recipient has no good way to validate the process behind the data or its correctness. They're getting a value out of context, in a static document.
Imagine if the medium you communicated in were also the medium where the calculations and data lived. All the data would be traceable to its origin and easy to understand for anyone willing to look.
Documents like these are not meant to be static. They are meant to be living things, updated in real time, and created, changed, and consumed by many people collaborating.
At Decipad, we believe a better and more accessible way to create, collaborate and share number-related knowledge is possible.
The software engineering and open source communities have opened up the world. They have helped me personally express my creativity and sharpen my ingenuity, and have allowed me to help others be more productive and efficient. We need to make it easier for more people to use data tools that promote collaboration, extensibility, and shared learning.
We're just getting started, but in the coming months we will share how we're approaching issues like units, dimensions, reuse, and more.