opensource.google.com

Menu

Posts from September 2023

Full support of PostgreSQL engine comes to Logica

Tuesday, September 26, 2023

Logica is a logic programming language designed for intuitive and efficient data manipulation, which we open sourced in 2020. It compiles to SQL, providing access to the power of SQL engines with the convenience of a logic programming syntax.

When it was open sourced, Logica's only fully supported engine was BigQuery, a powerful data warehouse, executing queries with high parallelization and processing terabytes of data within seconds.

Modern machines can store and process significant amounts of data, even within a single computer. Thus relational SQL databases are as popular as ever. They contain a lot of data and its analysis is important. Among open source database options, PostgreSQL and SQLite are some of the most popular database engines (example1, example2). Logica added support for SQLite in 2021.

Now we are pleased to announce a new release of Logica that adds support for PostgreSQL.

As Logica compiles to SQL, it is natural to extend the language to use PostgreSQL as the engine. However, there are nuances in the SQL dialect of Postgres which require addressing. The biggest distinction is that PostgreSQL requires types of records to be explicitly spelled out in your query, while BigQuery determines the types automatically.

For example, consider a Logica predicate where for each user we collect a list of records with information about their purchases.

UserPurchases(

    user_id:,

    user_name:,

    purchases? List= {item_name:, item_price:}) distinct :-

  Purchase(purchase_id:, user_id:, item_name:, item_price:),

  UserInfo(user_id:, user_name:);


We can translate this Logica predicate to GoogleSQL to run on BigQuery as follows:

SELECT

  user_id,

  user_name,

  ARRAY_AGG(STRUCT(item_name as item_name, item_price as item_price)) as purchases

FROM

  Purchases INNER JOIN

  UserInfo USING (user_id)

GROUP BY 1, 2;

Logica's record {item_name:, item_price:} simply compiles into GoogleSQL's STRUCT(item_name as item_name, item_price as item_price).

However, in the dialect of PostgreSQL composite types must be explicitly defined and specified. In our example, we need to define the type PurchaseRecord with fields item_name and item_price. We should also specify in the query that the purchases column is aggregating records of type PurchaseRecord. Thus PostgreSQL query for our predicate would be written like so.

CREATE TYPE PurchaseRecord as (item_name text, item_price numeric);


SELECT

  user_id,

  user_name,

  ARRAY_AGG(ROW(item_name,

                item_price)::PurchaseRecord) AS purchases

FROM

  Purchase INNER JOIN

  UserInfo USING (user_id)

GROUP BY UserInfo.user_id, UserInfo.user_name;


Records and lists are also useful as intermediates in calculations, even if the input and output data are normalized. For example, we have a table called ItemSales and want to find a list of most sold items in each of the stores that the table describes. Specifically, we want to assemble a table with information about the top three most sold items among all of the stores. For each of the items, we may want to list the department of the store where the item is being sold. This can be achieved intuitively using the ArgMax3 aggregate function, which accumulates all the information about the items that we need, and no extra join is needed.


# Collecting information of top 3 most sold items for each store.

StoreTopItemsCollection(store) ArgMax3= {item:,

                                         department:} -> sales_volume :-

  ItemSales(store:, item:, department:, sales_volume:);


# Flattening top items collection.

StoreTopItems(store:, item:, department:) :-

  {item:, department:} in StoreTopItemsCollection(store);


To support the PostgreSQL engine, we extended the Logica compiler with type inference. Logica now infers data types for all expressions that a user employs. For records and arrays, Logica specifies their type in the produced SQL, just as PostgreSQL requires. Commands to create necessary types are produced as part of the compiled SQL. In this collab, we show an example of a program that writes a PostgreSQL table, and in this collab, we show how to give type hints when the program does not have enough information for complete inferences.

As a byproduct of type inference, we were able to improve error messages. Now that we know the types, we can point to the user where a mistake is made within the Logica program, rather than the user having to debug the generated SQL statement.

PostgreSQL is a popular and powerful engine. It is easy to start your own instance (maybe just in CoLab!), or use a serverless option. We are excited to provide users of Logica with the option to run on Postgres. If you already use PostgreSQL, we encourage you to give Logica a try, it is a joy to write data analysis with logic programming! If you have any feedback or questions, please share at the discussion section of Logica repository.

By Evgeny Skvortsov, Software Engineer – Google

Showing Our Work: A Study In Understanding Open Source Contributors

Thursday, September 7, 2023

In 2022, the research team within Google’s Open Source Programs Office launched an in-depth study to better understand open source developers, contributors, and maintainers. Since Alphabet is a large consumer of and contributor to open source, our primary goals were to investigate the evolving needs and motivations of open source contributors, and to learn how we can best support the communities we depend on. We also wanted to share our findings with the community in order to further research efforts and our collective understanding of open source work.

Key findings from this work suggest that community leaders should:

  • Value your time together and apart: Lack of time was cited as the leading reason ‘not to contribute’ as well as motivation to ‘leave a community’. This should encourage community leaders to adopt practices that ensure that they are making the most of the time they have together. One example: some projects have planned breaks, no-meeting weeks, or official slowdowns during holidays or popular conference weeks.
  • Invest in documentation: Contributors and maintainers expressed that task variety, delegation, and onboarding new maintainers could help to reduce burnout in open source. Documentation is one way to make individual knowledge accessible to the community. In addition to technical and procedural overviews, documentation can also be used to clarify roles, tasks, expectations, and a path to leadership.
  • Always communicate with care: Contributors prefer projects that have welcoming communities, clear onboarding paths, and a code of conduct. Communication is the primary way for community leaders to promote welcoming and inclusive communities and set norms around language and behavior (as documented in a Code of Conduct). Communication is also how we build relationships, trust, and respect for each other.

  • Create spaces for anonymous feedback: Variable answers between demographic subsets in our research suggest that while systematic approaches can be taken to reduce burnout, there is no one-size-fits-all approach. Feedback is a valuable tool for any project to adjust to the evolving needs of their contributor and user communities. When designed appropriately, surveys can serve as safe, anonymous, retaliation-free spaces for individuals to provide honest feedback.

How do contributors select projects?

We asked respondents to share their most important criteria when selecting an open source project to contribute to in their personal time. The top responses were: welcoming community, clear onboarding path, and code of conduct.
Base: 517 international OSS developers, contributors, maintainers and students who worked on open source in their personal time

Within Google’s Open Source Programs office, we are constantly looking for ways to improve support for contributors inside and outside of Google. Studies such as this one provide guidance to our programs and investments in the community. This work helps us to see we should continue to:

  • Invest in documentation competency: Google Season of Docs provides support for open source projects to improve their documentation and gives professional technical writers an opportunity to gain experience in open source.
  • Document roles and promote tactics that recognize work within communities: The ACROSS project continues to work with projects and communities to establish consistent language to define roles, responsibilities, and work done within open source projects.
  • Exercise and discuss ‘better’ practices within the community: While we continually seek to improve our engagement practices within communities, we will also continue to share these experiences with the broader community in hopes that we can all learn from our successes and challenges. For example, we’ve published documentation around our release process, including resources for the creation and management of a code of conduct.

This research, along with other articles authored by the OSPO research team is now available on our site.

By Sophia Vargas – Researcher, Google Open Source Programs Office

GSoC 2023: project results and feedback part 1

Wednesday, September 6, 2023



In 2023, Google Summer of Code brought 966 new contributors into open source software development to work with open source organizations on a 12+ week project. We had 168 participating open source organizations with mentors and contributors from over 75 countries this year.

For 19 years, Google Summer of Code has thrived due to the enthusiasm of our open source communities and the 19k+ volunteer mentors that spend from 50-150 hours mentoring each of our 20k contributors since 2005! This year, there are 168 mentoring organizations and over 1,950 mentors participating in the 2023 program. A sincere thank you to our mentors and organization administrators for guiding and supporting our contributors this year. We are also looking forward to hosting many of the 2023 GSoC Mentors on campus this fall for the annual Mentor Summit.

September 4th concluded the standard 12-week project timeline and we are pleased to announce that 628 contributors have successfully completed this year’s program as of today, September 5th, 2023. Congratulations to all the contributors and mentors that have wrapped up their summer coding projects!

2023 has shown us that GSoC continues to grow in popularity with students and developers 19 years after the program began. GSoC had a record high 5,679 contributor applicants from 106 countries submit their project proposals this year. We also had huge interest in the program with over 43,765 registrants from 160 countries applying to the program during the two week application period.

The final step of every GSoC program is to hear back from mentors and contributors on their experiences through evaluations. This helps GSoC Admins continuously improve the program and gives us a chance to see the impact the program has on so many individuals! Some notable results and comments from the standard 12-week project length evaluations are below:

  • 95.63% of contributors think that GSoC helped their programming skills
  • 99.06% of contributors would recommend their GSoC mentors
  • 97.81% of contributors will continue working with their GSoC organization
  • 99.84% of contributors plan to continue working on open source
  • 82.81% of contributors said they would consider being a mentor
  • 96.25% of contributors said they would apply to GSoC again

Here’s some of what our GSoC 2023 Contributors had to say about the program!


At the suggestion of last year’s contributors, we added multiple live talks throughout the coding period to bring contributors together, providing tips to help them make the most of their GSoC experience. Each of these talks were attended on average by 42% of the 2023 GSoC contributors.

Another request from our previous contributors was to hear more about the cool projects their colleagues did over the summer and the opportunity to talk about their own projects with others. Over the coming weeks we are hosting three lightning talk sessions where over 40 of the 2023 contributors will have the opportunity to present their project learnings to the other contributors and their mentors.

We’ll be back in a couple of months to give a final update on the GSoC projects that will conclude later this year. Almost 30% of contributors (286 contributors) are still completing their projects, so please stay tuned for their results in part two of this blog post later this year!

By Perry Burnham – Google Open Source

.