opensource.google.com

Menu

Full support of PostgreSQL engine comes to Logica

Tuesday, September 26, 2023

Logica is a logic programming language designed for intuitive and efficient data manipulation, which we open sourced in 2020. It compiles to SQL, providing access to the power of SQL engines with the convenience of a logic programming syntax.

When it was open sourced, Logica's only fully supported engine was BigQuery, a powerful data warehouse, executing queries with high parallelization and processing terabytes of data within seconds.

Modern machines can store and process significant amounts of data, even within a single computer. Thus relational SQL databases are as popular as ever. They contain a lot of data and its analysis is important. Among open source database options, PostgreSQL and SQLite are some of the most popular database engines (example1, example2). Logica added support for SQLite in 2021.

Now we are pleased to announce a new release of Logica that adds support for PostgreSQL.

As Logica compiles to SQL, it is natural to extend the language to use PostgreSQL as the engine. However, there are nuances in the SQL dialect of Postgres which require addressing. The biggest distinction is that PostgreSQL requires types of records to be explicitly spelled out in your query, while BigQuery determines the types automatically.

For example, consider a Logica predicate where for each user we collect a list of records with information about their purchases.

UserPurchases(

    user_id:,

    user_name:,

    purchases? List= {item_name:, item_price:}) distinct :-

  Purchase(purchase_id:, user_id:, item_name:, item_price:),

  UserInfo(user_id:, user_name:);


We can translate this Logica predicate to GoogleSQL to run on BigQuery as follows:

SELECT

  user_id,

  user_name,

  ARRAY_AGG(STRUCT(item_name as item_name, item_price as item_price)) as purchases

FROM

  Purchases INNER JOIN

  UserInfo USING (user_id)

GROUP BY 1, 2;

Logica's record {item_name:, item_price:} simply compiles into GoogleSQL's STRUCT(item_name as item_name, item_price as item_price).

However, in the dialect of PostgreSQL composite types must be explicitly defined and specified. In our example, we need to define the type PurchaseRecord with fields item_name and item_price. We should also specify in the query that the purchases column is aggregating records of type PurchaseRecord. Thus PostgreSQL query for our predicate would be written like so.

CREATE TYPE PurchaseRecord as (item_name text, item_price numeric);


SELECT

  user_id,

  user_name,

  ARRAY_AGG(ROW(item_name,

                item_price)::PurchaseRecord) AS purchases

FROM

  Purchase INNER JOIN

  UserInfo USING (user_id)

GROUP BY UserInfo.user_id, UserInfo.user_name;


Records and lists are also useful as intermediates in calculations, even if the input and output data are normalized. For example, we have a table called ItemSales and want to find a list of most sold items in each of the stores that the table describes. Specifically, we want to assemble a table with information about the top three most sold items among all of the stores. For each of the items, we may want to list the department of the store where the item is being sold. This can be achieved intuitively using the ArgMax3 aggregate function, which accumulates all the information about the items that we need, and no extra join is needed.


# Collecting information of top 3 most sold items for each store.

StoreTopItemsCollection(store) ArgMax3= {item:,

                                         department:} -> sales_volume :-

  ItemSales(store:, item:, department:, sales_volume:);


# Flattening top items collection.

StoreTopItems(store:, item:, department:) :-

  {item:, department:} in StoreTopItemsCollection(store);


To support the PostgreSQL engine, we extended the Logica compiler with type inference. Logica now infers data types for all expressions that a user employs. For records and arrays, Logica specifies their type in the produced SQL, just as PostgreSQL requires. Commands to create necessary types are produced as part of the compiled SQL. In this collab, we show an example of a program that writes a PostgreSQL table, and in this collab, we show how to give type hints when the program does not have enough information for complete inferences.

As a byproduct of type inference, we were able to improve error messages. Now that we know the types, we can point to the user where a mistake is made within the Logica program, rather than the user having to debug the generated SQL statement.

PostgreSQL is a popular and powerful engine. It is easy to start your own instance (maybe just in CoLab!), or use a serverless option. We are excited to provide users of Logica with the option to run on Postgres. If you already use PostgreSQL, we encourage you to give Logica a try, it is a joy to write data analysis with logic programming! If you have any feedback or questions, please share at the discussion section of Logica repository.

By Evgeny Skvortsov, Software Engineer – Google

.