bg gradient

Google BigQuery Data types - An Ultimate Guide

Google BigQuery offers a number of data types that you can use to store your data. Having a firm understanding of these data types is necessary if you want to take full advantage of the warehousing tool's on-demand offerings and capabilities. By understanding the different data types available, you can choose the best option for storing your data in BigQuery. BigQuery is used to store and query very large datasets, so it's important to understand the capabilities of the system before working with it. It can be overwhelming trying to keep track of all the different data types available in BigQuery. To help you out, we've compiled a list of the most common data types that you'll come across:

Understanding the different Google BigQuery Data Types

NUMERIC

The NUMERIC data type is used to store numbers. This data type can store up to 38 digits of precision and 9 decimal places. Numbers stored as NUMERIC data types will have 64-bit precision.

Arithmetic/ Mathematical Functions

You can use the mathematical functions with the NUMERIC data type to perform various arithmetic operations. These include addition, subtraction, multiplication, division, modulo, exponential function, natural logarithm, square root, etc.

The following numerical functions can be performed on the NUMERIC data type:

ADD()

SUB()

MULTIPLY()

DIVIDE()

MOD()

EXP()

LN()

LOG2()

LOG10()

POW()

SQRT()

CEILING() and FLOOR()

ABS()

SIGN()

TRUNCATE()

Standard Compliant Floating Point Division

The IEEE_divide function can be used to avoid division by zero errors in SQL queries. This function returns NaN (not-a-number) when a division by zero is attempted, which enables the query to run without returning any errors. For example:

IEEE_divide(1,0) = NaN

IEEE_divide(2,3) = 0.666666

Using specific functions like IEEE_divide can help to prevent errors and ensure that SQL queries run smoothly.

Safe Functions

With the help of the SAFE command, you can return NULL for mathematical operations that would otherwise return an error. This is because the SAFE command enables you to perform undefined operations safely. For example, if you try to take the logarithm of a negative number, it will usually result in an error. However, with the SAFE command, it will simply return NULL. This can be extremely helpful in cases where you need to perform operations that may not have a defined answer.

Precise Decimal Calculations with Numeric 

Numeric data types are essential for financial and accounting applications that require exact values for calculations. BigQuery uses 16 Bytes for storage and can represent decimal figures accurately, making it an ideal choice for such applications. Let's say we want to perform a calculation and want our answer to be precise up to the 18th decimal place. We can use the NUMERIC data type with a scale of 18 to perform such precise calculations.

BOOLEAN

The BOOLEAN data type is used to store true/false values. Values stored as BOOLEAN data type can be either 1 (true) or 0 (false).

Logical Operations

You can use logical operators with the BOOLEAN data type to perform various operations. These include: and, or, not.

The following logical operators can be performed on the BOOLEAN data type:

AND()

OR()

NOT()

These are generally used in SQL queries to filter data based on certain conditions.

Conditional Operations

Conditional operations are used to check if a condition is met and then return a certain value accordingly. If the condition is not met, then another value is returned.

The following is the syntax of a conditional operation that can be performed on the BOOLEAN data type:

CASE expr

  WHEN expr_to_match THEN result

  [ ... ]

  [ ELSE else_result ]

STRING

BigQuery supports many functions for working with strings. Strings can be of variable length and must be quoted with single, double, or triple quotation marks. The following functions are available for string manipulation:

Casting and Coercion

The CAST function can be used to convert most data types from one type to another. However, using CAST may cause a query to fail if BigQuery is unable to perform the conversion. To avoid this, you can use SAFE_CAST. This will protect your queries from errors that could occur during casting. BigQuery will automatically convert the result type of an argument expression to another type if needed to match function signatures. 

For example, if a function is defined to take a single argument of type FLOAT64 and an expression being used as an argument has a result type of INT64, the expression's result will be converted to the FLOAT64 type before func() is computed. This allows for greater flexibility when working with functions.

Concatenation

The CONCAT command is a useful tool for manipulating strings. It can be used to concatenate separate parts of different strings into one, making it easy to combine text from different cells or the results of mathematical operations. This can be helpful when trying to create a single string from multiple sources, or when needing to perform complex string operations in a concise way.

For example, let's say we have two cells, A1 and A2, and each cell contains a different word:

A1: Hello

A2: World

If we wanted to combine these two words into one cell, we could use the CONCAT command. The resulting cell would look like this:

HelloWorld

Regular Expressions

Regular expressions are an important skill for anyone working with data. They can be used to match, search and replace text, or to validate a string input. For example, you may want to use a regular expression to extract certain parts of a string, or to validate whether a string has a specific format. Regular expressions can also be used to replace or remove certain characters.

BigQuery supports  RegEx functions like REGEXP_CONTAINS, REGEXP_EXTRACT, REGEXP_EXTRACT_ALL, REGEXP_REPLACE, etc which can be used to perform various operations on strings.

REGEXP_CONTAINS: This function returns true if the given string contains a match of the regular expression, or false otherwise.

REGEXP_EXTRACT: This function extracts the first matched sequence of characters that matches the regular expression pattern.

REGEXP_EXTRACT_ALL: This function extracts all matches of the regular expression from the given string.

REGEXP_REPLACE: This function replaces all matches of the regular expression in the given string with a replacement string. If there are no matches, it returns the original string.

BYTES

Although bytes and strings might look interchangeable, they are actually two different types that cannot be used interchangeably. Most functions that work on strings will also work on bytes, but the bytes version operates on raw data instead of Unicode characters. It's important to remember that bytes are distinct from strings, and trying to use them interchangeably can cause errors. Bytes are represented using either single, double, or triple quotation marks, but must always begin with the letter B (for example: b'hello').

TIME

The TIME data type represents a time of day, with no reference to a specific date. TIME values can range from 00:00:00 to 23:59:59. BigQuery supports various data types for Time instances they include:

TIME: A TIME value represents a time of day, as might be displayed on a clock.

TIMESTAMP: For representing an absolute point in time, a TIMESTAMP value is used.

DATE

If we want to represent a day, month and year, DATE values are used.  DATE values can range from 0000-01-01 to 9999-12-31. BigQuery supports various data types for Date instances they include:

DATE: A DATE value represents a day, with no reference to a specific time or time zone.

DATETIME: For representing an absolute point in time that includes both date and time components, DATETIME values are used.

GEOGRAPHY

Points, lines, and polygons on the surface of the earth can be represented using the Geography data type in BigQuery. Positions are represented in the World Geodetic System 1984 (WGS84) reference ellipsoid, which is also used by the Global Positioning System (GPS). This means that longitude and latitude can be used directly in BigQuery to find the location of a place. The geography type is a collection of the following objects:

Point geography:

A point on the Earth's surface is defined by its latitude and longitude. Latitude is a measure of how far north or south of the equator a point is, and is measured in degrees. Longitude is a measure of how far east or west of the prime meridian a point is, and is measured in degrees (or sometimes in time). Together, latitude and longitude define a point's position on the Earth's surface.

Linestring geography:

A linestring is a sequence of points that connects to form a line. Unlike polygons, linestrings do not have defined sides. Instead, they are created by connecting a series of latitude and longitude coordinates to create a path.

Polygon geography:

The polygon is the most complex of the three geometry types. A polygon is defined by a sequence of points, which together form the polygon's exterior boundary. Each point in the boundary has a latitude and longitude, which together define its position on the Earth's surface. In addition to an exterior boundary, a polygon can also have one or more interior boundaries, which define holes in the polygon.

ARRAY

The ARRAY data type stores an ordered list of zero or more values of the same data type. Arrays are used to store multiple values in a single column, and can be useful for working with data that is represented as a list (for example, a JSON object).

STRUCT

A STRUCT is an ordered collection of fields, each of which has a name and a data type. Structs can be used to store complex data types in a single column and can be useful for working with data that is represented as a set of key-value pairs (for example, a JSON object).

Conclusion

Datatypes are an important concept of BigQuery, and understanding how to use them is critical to getting the most out of the platform. In this article, we've looked at the different data types available in BigQuery, as well as how to convert between them. Boltic can help you to store and query very large datasets, Boltic is a freemium tool that makes it easy to work with BigQuery. With our easy-to-use interface, you can easily upload data, create tables, and run queries.

Our team can also help you to optimise your queries for performance and cost. Sign up for Boltic free account and get all of the benefits of BigQuery without having to worry about the technical details. Contact us today to learn more about how we can help you get the most out of BigQuery.

FAQ

What is the record data type in BigQuery?

The record data type in BigQuery is used to store a collection of fields, each of which has a name and a data type. Records can be used to store complex data types in a single column and can be useful for working with data that is represented as a set of key-value pairs (for example, a JSON object).

How many types of tables are there in BigQuery?

There are two types of tables in BigQuery: logical tables and physical tables. Logical tables are the tables that you query and physical tables are the underlying storage for those logical tables. Physical tables can be either materialised or non-materialized. Materialised physical tables are stored in BigQuery, while non-materialized physical tables are not stored and are only used for query optimization.

What is the schema in BigQuery?

arrow down
A schema is a description of the structure of a BigQuery table. It can be specified when a table is created, or it can be generated automatically by BigQuery. Schemas are used to define the data types of fields in a table, as well as the relationships between fields.

What is the STRUCT data type in BigQuery?

STRUCT is used to store an ordered collection of fields, each of which has a name and a data type. They are useful for representing complex data types in a single column. It really comes in handy when you have data that is represented as a set of key-value pairs (for example, a JSON object).

What is the difference between float and numeric in BigQuery?

The float data type is a 32-bit floating-point number, while the numeric data type is an arbitrary-precision decimal number. Float has a smaller range and precision than numeric, but it is faster to compute with. Numeric should be used for values that require a large range or high precision.

How do I change the datatype of a column in BigQuery?

To change the datatype of a column in BigQuery, you can use the ALTER TABLE statement. Using this statement, you can change the datatype of a column to any other valid BigQuery datatype.
Kickstart your journey with Boltic & make data operation easy
Try Now

Spend less time building pipelines and more time scaling your business

Manage Big Data operations with a free forever plan
No credit card required

Boltic is a Modern Enterprise Grade Data Platform for businesses of all sizes and industry. Built with the vision to simplify data exploration and make work life easier with automation.

Solve advanced data problems, automate ETL workflows, build and share reports at scale. Easily integrate data from multiple sources, transforming it, and sending it to desired destinations.

© 2024 Shopsense Retail Technologies  |  #MadeInIndia