Google Cloud recently launched AutoML Entity Extraction for Healthcare, a low-code tool used to build information extraction models for healthcare applications. There remains a significant execution roadblock on AutoML DIY initiatives caused by the complexity of translating the human cognitive process into machine-readable instructions. Today, this translation occurs thanks to human annotators who annotate text for relevant insights. Yet, training human annotators is a complex endeavor which requires knowledge across fields like linguistics and neuroscience, as well as a good understanding of the business domain. With AutoML, Google wanted to democratize who can build AI. The Healthcare Text Annotation Guidelines are a starting point for annotation projects deployed for healthcare applications.
The guidelines provide a reference for training annotators in addition to explicit blueprints for several healthcare annotation tasks. The annotation guidelines cover the following:
- The task of medical entity extraction with examples from medical entity types like medications, procedures, and body vitals.
- Additional tasks with defined examples, such as entity relation annotation and entity attribute annotation. For instance, the guidelines specify how to relate a medical procedure entity to the source medical condition entity, or how to capture the attributes of a medication entity like dosage, frequency, and route of administration.
- Guidance for annotating an entity’s contextual information like temporal assessment (e.g., current, family history, clinical history), certainty assessment (e.g., unlikely, somewhat likely, likely), and subject (e.g., patient, family member, other).
We’re thrilled to open source this project. We hope the community will contribute to the refinement and expansion of the Healthcare Text Annotation Guidelines, so they mirror the ever-evolving nature of healthcare.
By Andreea Bodnari, Product Manager and Mikhail Begun, Program Manager—Google Cloud AI