SeattleDataGuy

Posted on Feb 15, 2021

What In The World Is Dremio And Why Is It Valued At 1 Billion Dollars?

#bigdata #datascience #database #culture

The CTO's Focus Of The Week

Last year we watched as Snowflake skyrocketed in value. Now nearing 100 billion dollars in value, the start-up had the largest software IPO in history as of its debut.

But at the end of the day, Snowflake is a data warehouse and analytics company.

And it's not the only database and analytics company that is currently pushing to be part of the data boom.

Two companies in the last few weeks have both raised upwards of 100 million dollars with billion-dollar valuations.

These companies are Dremio and Cockroach Labs.

Cockroach Labs has just raised another round of over $100 million, totaling more than $355 million in funding.

Both companies focus on different angles of data.

Dremio's focus is on eliminating the reliance on data-warehousing. They feel the practice is holding companies back and companies will need to modernize their data stacks to fully take advantage of said data.

Cockroach Labs has a scalable database management system that gives developers the classic benefits of an RDBMS along with the scalability of NoSQL systems. It scales horizontally and can survive disk, machine, rack, and even datacenter failures with minimal latency disruption and no manual intervention.

But really, why all the funding?

Why is so much venture funding going into these data companies?

Because companies that can master big data, data analytics, and data science are proving to have an edge.

Last year McKinsey came out with a study that found that companies could attribute 20% of their bottom lines to AI and machine learning (based on the companies they surveyed).

Data makes a huge difference.

And companies that allow you to make that difference quickly will be rewarded.

That's a great segue into our next section.

I had two people sign up for office hours with a data consultant and I wanted to share some of the insights I provided around best data practices.

Ask A Data Consultant --- Office Hours

I held my first office hours this week and I think it went great. I got to answer a lot of great questions and hopefully helped provide a lot of insights for those who signed up.

I have listed two of the questions that were asked below because I think they can provide a lot of great perspective to anyone in the data field.

Next Open Office Hours

Questions From Last Week's Office Hours

1. Should I get a separate Tableau server for external and internal clients?

Best practices suggest that you have two Tableau servers: one for your external users and one for your internal users.

Why?

Think about it this way.

Tableau is a door into your company's other dashboards and data sets. Kind of like the image below.

So if you use only one Tableau server for both internal and external users, you are exposing that door, even with a lock on it, to the outside world. That is a risk.

You are providing a portal for your external customers to access internal information.

If none of the dashboards on your internal server are sensitive, then some people may argue that you don't need two servers. But overall, you never know what data someone will put on your Tableau server. (Also, let's be real, many people don't even know 2/3 of the data that is on their Tableau server.)

This is why it is better to just assume the worst and have two servers. It reduces the risk of someone seeing something they shouldn't.

2. How can I reduce the number of joins in a star-schema to provide a better experience for data scientists and analysts?

One of the easy ways to reduce the number of joins your analysts and data scientists perform is to create an analytical table.

By that I mean a pre-joined table or materialized view built from the tables that are commonly used together. This does mean storing more data, but you reduce the amount of compute used because the join runs once instead of in every query, and you reduce the likelihood of join errors.

Let's look at an example.

Below is an example of a star-schema model and a pre-joined table.

Instead of giving your data scientists only this core level of data, you could create another table.

Some people might add a "_plus" or "_analytical" suffix at the end to denote the fact that the table is not a core table. Instead, it is a denormalized version of the data that is easy for data scientists and analysts to use. However, it also comes with more risk.

The more tables the new analytical table depends on, the more likely it is that a failure in one of them breaks it. It's just how it goes.

Still, there are a lot of benefits to creating this analytical layer, so I would recommend developing it.
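To make the idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (fact_sales, dim_customer, dim_product, fact_sales_analytical), the columns, and the sample rows are all made up for illustration, and a real warehouse would more likely use a materialized view or a scheduled job than a plain CREATE TABLE AS.

```python
import sqlite3


def build_star_schema(conn):
    """Create a tiny, hypothetical star schema: one fact table, two dimensions."""
    conn.executescript("""
        CREATE TABLE dim_customer (
            customer_id INTEGER PRIMARY KEY, customer_name TEXT, region TEXT);
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
        CREATE TABLE fact_sales (
            sale_id INTEGER PRIMARY KEY, customer_id INTEGER,
            product_id INTEGER, quantity INTEGER, amount REAL);

        INSERT INTO dim_customer VALUES (1, 'Acme', 'West'), (2, 'Globex', 'East');
        INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware');
        INSERT INTO fact_sales VALUES
            (1, 1, 1, 2, 20.0), (2, 1, 2, 1, 15.0), (3, 2, 1, 5, 50.0);
    """)


def build_analytical_table(conn):
    """Join the core tables once and store the result under an '_analytical' suffix."""
    conn.executescript("""
        DROP TABLE IF EXISTS fact_sales_analytical;
        CREATE TABLE fact_sales_analytical AS
        SELECT s.sale_id, c.customer_name, c.region,
               p.product_name, p.category, s.quantity, s.amount
        FROM fact_sales s
        JOIN dim_customer c ON c.customer_id = s.customer_id
        JOIN dim_product p ON p.product_id = s.product_id;
    """)


def revenue_by_region(conn):
    """An analyst-style query against the pre-joined table: no joins required."""
    return conn.execute(
        "SELECT region, SUM(amount) FROM fact_sales_analytical "
        "GROUP BY region ORDER BY region"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_star_schema(conn)
    build_analytical_table(conn)
    print(revenue_by_region(conn))
```

Note that build_analytical_table has to be rerun whenever the core tables change, which is exactly the extra dependency risk mentioned above.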

Articles Worth Reading

No One Knows What Anything Is Worth

Opinion: As coverage of Wall Street Bets and TikTok finance gurus becomes mainstream, the idea that people know what an asset is worth is becoming less reliable every day. Alex Wilhelm covers this concept in his article, No One Knows What Anything Is Worth.

First Few Sentences: It was yet another week of startups that became unicorns going public, only to see their valuation soar. Already marked up by their IPO pricing, seeing so many unicorns achieve such rich public-market valuations made us wonder who was mispricing whom.

It's a matter of taste, a semantic argument, a tempest in a teacup. What matters more is that precisely no one knows what anything is worth, and that's making a lot of people rich and/or mad.

What Is Data Virtualization

Opinion: As various workarounds for ETLs emerge, a question developers and managers need to start asking is: what is data virtualization?

This article is slightly older, but it will give you a general idea of what data virtualization is.

First Few Sentences: At its core, data virtualization falls within the domain of data integration. But unlike traditional data integration where Extract-Transform-Load (ETL) processes are used to physically move copies of data from disparate sources into a single, unified data warehouse, there is no physical movement of data with data virtualization. Source data remains where it is --- no additional copies are created and it is not moved physically anywhere when data virtualization technology is used. Instead, different views or snapshots of enterprise-wide data are provided through a virtualized service layer designed on top of disparate sources. In other words, data is accessed from where it resides.
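As a toy illustration of that idea (not how any particular data virtualization product works), the sketch below uses Python's built-in sqlite3 module with two separate in-memory databases standing in for disparate source systems. The database alias crm, the table names, and the sample rows are all assumptions made up for the example. A temporary view joins the two sources at query time, so nothing is copied.

```python
import sqlite3


def open_virtual_layer():
    """Return a connection with a view that spans two separate databases."""
    # The main in-memory database stands in for an orders system.
    conn = sqlite3.connect(":memory:")
    # A second, independent in-memory database stands in for a CRM.
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.executescript("""
        CREATE TABLE main.orders (
            order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        CREATE TABLE crm.customers (
            customer_id INTEGER PRIMARY KEY, name TEXT);

        INSERT INTO main.orders VALUES (1, 10, 25.0), (2, 11, 40.0);
        INSERT INTO crm.customers VALUES (10, 'Ada'), (11, 'Grace');

        -- In SQLite only TEMP views may reference more than one database.
        -- The view stores no rows; it is evaluated against the sources on read.
        CREATE TEMP VIEW customer_orders AS
        SELECT c.name, o.amount
        FROM main.orders o
        JOIN crm.customers c ON c.customer_id = o.customer_id;
    """)
    return conn


if __name__ == "__main__":
    conn = open_virtual_layer()
    print(conn.execute("SELECT name, amount FROM customer_orders").fetchall())
```

Because the view holds no data of its own, an update to main.orders shows up the next time customer_orders is queried, with no ETL step in between.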

5 Data Analytics Challenges Companies Face in 2021 With Solutions

A Few Sentences: Integrating data into strategy is proving to be a differentiator for businesses of all sizes.

The cliche term "Data-Driven" is for more than just a billion-dollar tech company.

Companies like DiscoverOrg and MVF are using data to help drive their decisions and create better products.

Even smaller companies are finding savings and new revenue opportunities left and right thanks to data.

However, this is all easier said than done.

Companies We Are Watching

Pecan.ai

Everyone wants to give analysts more power. Fivetran wants to get data to analysts faster with ELTs, dbt wants to make it easier to create transforms, and Pecan.ai wants to give analysts the ability to quickly create ML models.

Actually, let me rephrase that in Bronfman's own words.

"The innovative thing about Pecan is that we do all of the data preparation and data, engineering and data processing, and [complete the] various technical steps [for you],"

Pecan.ai has model templates that allow your team to easily implement constant models. For example, Pecan can automatically update a data repository with data the algorithm is measuring, such as churn rate.

This start-up came out of stealth mode last year and recently raised $11 million in funding. We look forward to seeing what will come of it.

Equalum

Equalum helps transport data from one platform to another. Nir Livneh, previously director of product management at Dell acquisition target Quest Software and a big data architect in the U.S. Army, found it difficult to manage even this basic task.

He constantly needed to move data across highly disparate systems, and he struggled to find a product that could help manage these flows in real time.

After trying (and failing) to find a plug-and-play product capable of performing real-time analytics on data sets from a range of apps, he created his own --- Equalum --- in 2015. It has now raised $7 million, and we are curious to see if it will deliver.

The End Of Day Two

2021 is not slowing down.

VCs, SPACs, and r/wallstreetbets seem to be driving investments to the moon.

Data companies are getting billion-dollar valuations and 19-year-olds are becoming paper millionaires.

The Roaring '20s have arrived.

But as a technology consultant, I care less about the funny-money valuations and more about the tech.

And as a reader, I hope you do too!

Thanks for following along. We hope to see you back soon!

If you have questions about data, technology, or programming or have an article or company you think is worth sharing, then please send them our way!

DEV Community