Coding your way into cinemas

Monday, March 19, 2018

This is a guest post from apertus°, an open source organization that participated in Google Summer of Code last year and is back for 2018!

The apertus° AXIOM project is bringing the world’s first open hardware/free software digital motion picture production camera to life. The project has a rich history and a steadfast adherence to the open source ethos; all aspects of development have always revolved around supporting and utilising free technologies. The challenge of building a sophisticated digital cinema camera was a perfect fit for Google Summer of Code 2017. But let’s start at the beginning: why did the team behind the project embark on their journey?

Modern Cinematography

For over a century, film was dominated by analog cameras and celluloid, but in the late 2000s things changed radically with the adoption of digital projection in cinemas. It was a natural next step, then, for filmmakers to shoot and produce films digitally. Certain applications in science, large format photography and fine arts still hold onto 35mm film processing, but the reduction in costs and improved workflows associated with digital image capture have revolutionised how we create and consume visual content.

The DSLR revolution

Photo by Matthew Pearce licensed CC SA 2.0.
Filmmaking has long been considered an expensive discipline accessible only to a select few. This all changed with the adoption of movie recording capabilities in digital single-lens reflex (DSLR) cameras. For multinational corporations, this “new” feature was a relatively straightforward addition to existing models, as most compact digital photo cameras could already record video clips. This was the first time that a large diameter image sensor, a vital component for creating the shallow depth of field we consider cinematic, appeared in consumer cameras. In recent times, user groups such as the Magic Lantern community have stepped up to contribute to the DSLR revolution first-hand.

Magic Lantern

Photo by Dave Dugdale licensed CC BY-SA 2.0.
Magic Lantern is a free and open source software add-on that runs from a camera’s SD/CF card. It adds a host of new features to Canon’s DSLRs that weren't included from the factory, such as allowing users to record high-dynamic range (HDR) video or 14-bit uncompressed RAW video. It’s a community project and many filmmakers simply wouldn’t have bought a Canon camera if it weren’t for the features that Magic Lantern pioneered. Because installing Magic Lantern doesn’t replace the stock Canon firmware or modify the read-only memory (ROM) but runs alongside it, it is both easy to remove and carries little risk. Originally developed for filmmaking, Magic Lantern’s feature base has expanded to include tools useful for still photography as well.

Starting the revolution for real 

Of course, Magic Lantern has been held back by the underlying proprietary hardware routines on existing camera models. So, in 2014 a team of developers and filmmakers around the apertus° project joined forces with the Magic Lantern team to lay the foundation for a totally independent, open hardware, free software, digital cinema camera. They ran a successful crowdfunding campaign for initial development, and they completed hardware development of the first developer kits in 2016. Unlike traditional cameras, the AXIOM is designed to be completely modular, and so continuously evolve, thereby preventing it from ever becoming obsolete. How the camera evolves is determined by its user community, with its design files and source code freely available and users encouraged to duplicate, modify and redistribute anything and everything related to the camera.

While the camera is primarily intended for motion picture production, AXIOM can be useful in many other applications. Individuals in science, astronomy, medicine, aerial mapping, and industrial automation, as well as those who record events or talks at conferences, have expressed interest in the camera. A modular, open source device for digital imaging allows users to build a system that meets their unique requirements. One such company, Mavrx Inc., which uses aerial imagery to provide actionable insight for the agriculture industry, chose the camera because it enabled them not only to process data more efficiently than comparable cameras, but also to reconfigure its form factor so that it could be installed alongside existing equipment.

Google Summer of Code 2017

Continuing their journey, apertus° participated in Google Summer of Code for the first time in 2017. They received about 30 applications from interested students, from which they needed to select three. Projects ranged from field programmable gate array (FPGA) centered video applications to creating Linux kernel drivers for specific camera hardware. Similarly, an open hardware project for live event streaming and conference recording is working on FPGA projects around video interfaces and processing.

After some preliminary work, the students came to grips with the camera’s operating processes and all three dove in enthusiastically. One student failed the first evaluation and another the second, but the third successfully completed their work.

That student, Vlad Niculescu, worked on defining control loops for a voltage controller using VHSIC Hardware Description Language (VHDL) for a potential future AXIOM Beta Power Board, an FPGA-driven smart switching regulator for increasing the power efficiency and improving flexibility around voltage regulation.
Left: The printed circuit board (PCB) for testing the switching regulator FPGA logic. Right: After final improvements, the voltage ripple was reduced to around 30mV at a 2V target voltage.
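To give a feel for what such a control loop does, here is a purely illustrative Python simulation (the actual work was implemented in VHDL on the FPGA): each cycle it measures the modelled output voltage, and a simple proportional-integral rule nudges the switching duty cycle toward a 2V target. The plant model, gains, and step count are invented for illustration and do not reflect the real hardware.

```python
# Conceptual sketch only, not the AXIOM Beta Power Board's VHDL: a digital
# proportional-integral (PI) loop regulating a switching converter's output.
def simulate_regulator(v_target=2.0, v_in=5.0, steps=2000, kp=0.05, ki=0.002):
    v_out = 0.0       # modelled output voltage
    integral = 0.0    # accumulated error (integral term)
    for _ in range(steps):
        error = v_target - v_out
        integral += error
        # PI control law, with the duty cycle clamped to its physical range.
        duty = min(max(kp * error + ki * integral, 0.0), 1.0)
        # Crude first-order model of the output filter tracking duty * v_in.
        v_out += 0.1 * (duty * v_in - v_out)
    return v_out

print(round(simulate_regulator(), 3))  # settles close to the 2 V target
```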
Vlad had this to say about his experience:

“The knowledge I acquired during my work with this project and apertus° was very satisfying. Besides the electrical skills gained I also managed to obtain other, important universal skills. One of the things I learned was that the key to solving complex problems can often be found by dividing them into small blocks so that the greater whole can be easily observed by others. Writing better code and managing the stages of building a complex project have become lessons that will no doubt become valuable in the future. I will always be grateful to my mentor as he had the patience to explain everything carefully and teach me new things step by step, and also to apertus° and Google’s Summer of Code program, without which I may not have gained the experience of working on a project like this one.”

We are grateful for Vlad’s work and congratulate him on successfully completing the program. If you find open hardware and video production interesting, we encourage you to reach out and join the community: apertus° is back for Google Summer of Code 2018.

By Sebastian Pichelhofer, apertus°, and Tim 'mithro' Ansell

Googlers on the road: FOSSASIA Summit 2018

Friday, March 16, 2018

In a week’s time, free and open source enthusiasts of all kinds will gather in Singapore for FOSSASIA Summit 2018. Established in 2009, the annual event attracts more than 3,000 attendees; this year it runs from March 22nd to 25th.

FOSSASIA logo licensed LGPL-2.1.
FOSSASIA Summit is organized by FOSSASIA, a nonprofit organization that focuses on Asia and brings people together around open technology both in-person and online. The organization is also home to many open source projects and is a regular participant in the Google Open Source team’s student programs, Google Summer of Code and Google Code-in.

Our team is excited to be among those attending and speaking at the conference this year, and we’re proud that Google Cloud is a sponsor. If you’re around, please come say hello. The highlight of our travel is meeting the students and mentors who have participated in our programs!

Here are the Googlers who will be giving presentations:

Thursday, March 22nd

2:00pm Real-world Machine Learning with TensorFlow and Cloud ML by Kaz Sato

Friday, March 23rd

9:30am BigQuery codelab by Jan Peuker
10:30am Working with Cloud DataPrep by KC Ayyagari
1:00pm Extract, analyze & translate Text from Images with Cloud ML APIs by Sara Robinson
2:00pm Bitcoin in BigQuery: blockchain analytics on public data by Allen Day
2:40pm What can we learn from 1.1 billion GitHub events and 42 TB of code? by Felipe Hoffa
2:40pm Engaging IoT solutions with Machine Learning by Markku Lepisto
2:45pm CloudML Engine: Qwik Start by Kaz Sato
3:20pm Systems as choreographed behavior with Kubernetes by Jan Peuker
4:00pm The Assistant by Manikantan Krishnamurthy

Saturday, March 24th

10:30am Google Summer of Code and Google Code-in by Stephanie Taylor
10:30am Zero to ML on Google Cloud Platform by Sara Robinson
11:00am Building a Sustainable Open Tech Community through Coding Programs, Contests and Hackathons panel including Stephanie Taylor
11:05am Codifying Security and Modern Secrets Management by Seth Vargo
1:00pm Open Source Education panel including Cat Allman
5:00pm Everything as Code by Seth Vargo

We look forward to seeing you there!

By Josh Simmons, Google Open Source

Semantic Image Segmentation with DeepLab in TensorFlow

Thursday, March 15, 2018

Cross-posted on the Google Research Blog.

Semantic image segmentation, the task of assigning a semantic label, such as “road”, “sky”, “person”, or “dog”, to every pixel in an image, enables numerous new applications, such as the synthetic shallow depth-of-field effect shipped in the portrait mode of the Pixel 2 and Pixel 2 XL smartphones and mobile real-time video segmentation. Assigning these semantic labels requires pinpointing the outline of objects, and thus imposes much stricter localization accuracy requirements than other visual entity recognition tasks such as image-level classification or bounding box-level detection.

Today, we are excited to announce the open source release of our latest and best performing semantic image segmentation model, DeepLab-v3+ [1]*, implemented in TensorFlow. This release includes DeepLab-v3+ models built on top of a powerful convolutional neural network (CNN) backbone architecture [2, 3] for the most accurate results, intended for server-side deployment. As part of this release, we are additionally sharing our TensorFlow model training and evaluation code, as well as models already pre-trained on the Pascal VOC 2012 and Cityscapes benchmark semantic segmentation tasks.
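For anyone who wants to try the released models, below is a minimal, hedged sketch of running an exported frozen DeepLab-v3+ graph on a single image. It is not the official demo: the checkpoint path and input image are placeholders, and the tensor names and 513-pixel input size follow the conventions of the project's demo notebook, so adjust them if your export differs. It assumes a recent TensorFlow with the compat.v1 graph APIs, plus NumPy and Pillow.

```python
import numpy as np
import tensorflow as tf
from PIL import Image

# Placeholder path to one of the released frozen DeepLab-v3+ graphs.
FROZEN_GRAPH = "deeplabv3_pascal_trainval/frozen_inference_graph.pb"
INPUT_SIZE = 513  # input size used by the demo notebook (assumption)

graph = tf.Graph()
with graph.as_default():
    graph_def = tf.compat.v1.GraphDef()
    with tf.io.gfile.GFile(FROZEN_GRAPH, "rb") as f:
        graph_def.ParseFromString(f.read())
    tf.compat.v1.import_graph_def(graph_def, name="")

# Resize so the longer side is at most INPUT_SIZE pixels.
image = Image.open("example.jpg").convert("RGB")
scale = INPUT_SIZE / max(image.size)
resized = image.resize(
    (int(scale * image.size[0]), int(scale * image.size[1])), Image.LANCZOS)

with tf.compat.v1.Session(graph=graph) as sess:
    # Output is a per-pixel map of class indices (e.g. Pascal VOC labels).
    seg_map = sess.run(
        "SemanticPredictions:0",
        feed_dict={"ImageTensor:0": [np.asarray(resized)]})[0]

print("Classes present in the prediction:", np.unique(seg_map))
```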

Since the first incarnation of our DeepLab model [4] three years ago, improved CNN feature extractors, better object scale modeling, careful assimilation of contextual information, improved training procedures, and increasingly powerful hardware and software have led to improvements with DeepLab-v2 [5] and DeepLab-v3 [6]. With DeepLab-v3+, we extend DeepLab-v3 by adding a simple yet effective decoder module to refine the segmentation results especially along object boundaries. We further apply the depthwise separable convolution to both atrous spatial pyramid pooling [5, 6] and decoder modules, resulting in a faster and stronger encoder-decoder network for semantic segmentation.
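As a concrete illustration of that building block, here is a small, hypothetical TensorFlow (Keras) sketch of an atrous depthwise separable convolution applied at several dilation rates and concatenated, in the spirit of atrous spatial pyramid pooling. It is not the released DeepLab code: the rates, channel counts, and layer arrangement are illustrative assumptions, and it assumes a recent TensorFlow 2.x where DepthwiseConv2D supports dilation_rate.

```python
import tensorflow as tf

def atrous_separable_conv(x, filters, rate):
    # Depthwise 3x3 convolution with an atrous (dilation) rate: one dilated
    # filter per channel, which is cheaper than a standard dilated convolution.
    x = tf.keras.layers.DepthwiseConv2D(
        kernel_size=3, dilation_rate=rate, padding="same", use_bias=False)(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.ReLU()(x)
    # Pointwise 1x1 convolution mixes channels after the depthwise step.
    x = tf.keras.layers.Conv2D(filters, 1, padding="same", use_bias=False)(x)
    x = tf.keras.layers.BatchNormalization()(x)
    return tf.keras.layers.ReLU()(x)

# Apply the block at several dilation rates and concatenate the branches,
# echoing atrous spatial pyramid pooling. Rates and widths are illustrative.
inputs = tf.keras.Input(shape=(None, None, 256))
branches = [atrous_separable_conv(inputs, 256, rate) for rate in (6, 12, 18)]
outputs = tf.keras.layers.Concatenate()(branches)
aspp = tf.keras.Model(inputs, outputs)
aspp.summary()
```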

Modern semantic image segmentation systems built on top of convolutional neural networks (CNNs) have reached accuracy levels that were hard to imagine even five years ago, thanks to advances in methods, hardware, and datasets. We hope that publicly sharing our system with the community will make it easier for other groups in academia and industry to reproduce and further improve upon state-of-art systems, train models on new datasets, and envision new applications for this technology.

By Liang-Chieh Chen and Yukun Zhu, Google Research

We would like to thank Iasonas Kokkinos, Kevin Murphy, and Alan L. Yuille (co-authors of DeepLab-v1 and -v2) for their support and valuable discussions, as well as Mark Sandler, Andrew Howard, Menglong Zhu, Chen Sun, Derek Chow, Andre Araujo, Haozhi Qi, Jifeng Dai, and the Google Mobile Vision team.

  1. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation, Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam, arXiv:1802.02611, 2018.
  2. Xception: Deep Learning with Depthwise Separable Convolutions, François Chollet, Proc. of CVPR, 2017.
  3. Deformable Convolutional Networks — COCO Detection and Segmentation Challenge 2017 Entry, Haozhi Qi, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng, Yichen Wei, and Jifeng Dai, ICCV COCO Challenge Workshop, 2017.
  4. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs, Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L. Yuille, Proc. of ICLR, 2015.
  5. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L. Yuille, TPAMI, 2017.
  6. Rethinking Atrous Convolution for Semantic Image Segmentation, Liang-Chieh Chen, George Papandreou, Florian Schroff, and Hartwig Adam, arXiv:1706.05587, 2017.

* DeepLab-v3+ is not used to power Pixel 2's portrait mode or real-time video segmentation. These are mentioned in the post as examples of features this type of technology can enable.