opensource.google.com

Menu

Improving Developer Experience for Writing Structured Data

Thursday, November 14, 2019

Though we’re still waiting on the full materialization of the promise of the Semantic Web, search engines—including Google—are heavy consumers of structured data on the web through Schema.org. In 2015, pages with Schema.org markup accounted for 31.3% of the web. Among SEO communities, interest in Schema.org and structured data has been on the rise in recent years.

Yet, as the use of structured data continues to grow, the developer experience in authoring pieces of structured data remains spotty. I ran into this as I was trying to write my own snippets of JSON-LD. It turns out, the state-of-the-art way of writing JSON-LD is to: read the Schema.org reference; try writing a JSON literal on your own; when you think you’re done, paste the JSON into a validator (like Google’s structured data testing tool); see what’s wrong, fix; and repeat, as needed.

If it’s your first time writing JSON-LD, you might spend a few minutes figuring out how to represent an enum or boolean, looking for examples as needed.

Enter schema-dts

My experience left me with a feeling that things could be improved; writing JSON-LD should be no harder than any JSON that is constrained by a certain schema. This led me to create schema-dts (npm, github) a TypeScript-based library (and an optional codegen tool) with type definitions of the latest Schema.org JSON-LD spec.

The thinking was this: Just as IDEs (and, later, language server protocols for lightweight code editors) supercharge our developer experience with as-you-type error highlighting and code completions, we can supercharge the experience of writing those JSON-LD literals.

With IDEs and language server protocols, the write-test-debug loop was made much tighter. Developers get immediate feedback on the basic correctness of the code they write, rather than having to save sporadically and feed their code to a compiler for that feedback. With schema-dts, we try to take validators like the structured data testing tool out of the critical path of write-test-debug. Instead, you can use a library to type-check your JSON, reporting errors as you type, and offering completions for `@type`s, property names, and their values.


Thanks to TypeScript’s structural typing and discriminated unions, the general shape of Schema.org’s JSON-LD can be well-represented in TypeScript typings. I have previously described the type theory behind creating a TypeScript structure that expresses the Schema.org class structure, enumerations, `DataType`s, and properties.

Schema-dts includes two related pieces: the ‘default’ schema-dts NPM package which includes the latest Schema.org definitions, and the schema-dts-gen CLI which allows you to create your own typing definitions from Schema.org-like .nt N-Triple files. The CLI also has flags to control whether deprecated classes, properties, and enums should be included, what `@context` should be assumed by objects you write, etc.

Goals and Non-Goals

The goal of schema-dts isn’t to make type definitions that accept all legal Schema.org JSON literals. Rather, it is to make sure we provide typings that always (or almost always) result in legal Schema.org JSON-LD literals that search engines would accept. In the process, we’d like to make sure it’s as general as possible, without sacrificing type checking and useful completions.

For instance, RDF’s perspective is that structured data is property-centric, and the Schema.org reference of the domains and ranges of properties is only a suggestion for what values are inferred as. RDF actually permits values of any type to be assigned to a property. Instead, schema-dts will actually constrain you by the Schema.org values.

***

If you’re passionate about structured data, try schema-dts and join the conversation on GitHub!

By: Eyas Sharaiha, Geo Engineering & Open Source scheme-dts Project
.