According to the official documentation, the API is defined as follows:
Text recognition is the process of detecting text in images and video streams and recognizing the text contained therein. Once detected, the recognizer determines the actual text in each block and segments it into lines and words. The Text API detects text in Latin-based languages (French, German, English, etc.), in real time, on device.
Text Structure
- a Block is a contiguous set of text lines, such as a paragraph or column,
- a Line is a contiguous set of words on the same vertical axis,
- a Word is a contiguous set of alphanumeric characters on the same vertical axis.
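These three levels map onto the API's classes (TextBlock, Line, and Element), and each level exposes the next through getComponents(). As a minimal sketch of walking this hierarchy, assuming you already have a detected TextBlock (we'll obtain them later in this article; the helper name logTextStructure is my own):

import android.util.Log;

import com.google.android.gms.vision.text.Text;
import com.google.android.gms.vision.text.TextBlock;

// Walk the Block -> Line -> Word hierarchy of one detected TextBlock.
private void logTextStructure(TextBlock block) {
    Log.d("TextStructure", "Block: " + block.getValue());
    for (Text line : block.getComponents()) {        // a block's components are Lines
        Log.d("TextStructure", "  Line: " + line.getValue());
        for (Text word : line.getComponents()) {     // a line's components are Elements (words)
            Log.d("TextStructure", "    Word: " + word.getValue());
        }
    }
}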
Project configuration
Add the play-services-vision library to the dependencies block of your app module's build.gradle:
compile 'com.google.android.gms:play-services-vision:9.8.0'
Because you will be using the device's camera to capture text, add the CAMERA permission to your AndroidManifest.xml:
<uses-permission android:name="android.permission.CAMERA"/>
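Note that on Android 6.0 (API level 23) and above, CAMERA is a dangerous permission, so the manifest entry alone is not enough: you must also request it at runtime before starting the camera. A minimal sketch using the v4 support library (the request code constant is my own choice):

import android.Manifest;
import android.content.pm.PackageManager;
import android.support.v4.app.ActivityCompat;
import android.support.v4.content.ContextCompat;

// Hypothetical request code; any app-unique int works.
private static final int REQUEST_CAMERA_PERMISSION = 100;

// Returns true if the CAMERA permission is already granted,
// otherwise asks the user and returns false for now.
private boolean checkCameraPermission() {
    if (ContextCompat.checkSelfPermission(this, Manifest.permission.CAMERA)
            != PackageManager.PERMISSION_GRANTED) {
        // The user's answer arrives in onRequestPermissionsResult().
        ActivityCompat.requestPermissions(this,
                new String[]{Manifest.permission.CAMERA}, REQUEST_CAMERA_PERMISSION);
        return false;
    }
    return true;
}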
Defining the activity layout
The layout contains a SurfaceView to display the preview frames captured by the camera. I also add a TextView to display the contents of the recognized text:
activity_main.xml
<?xml version="1.0" encoding="utf-8"?>
<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:padding="16dp">

    <SurfaceView
        android:id="@+id/surface_view"
        android:layout_width="match_parent"
        android:layout_height="match_parent"
        android:layout_alignParentLeft="true"
        android:layout_centerVertical="true" />

    <TextView
        android:id="@+id/text_value"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:text="No text"
        android:layout_alignParentBottom="true"
        android:textColor="@android:color/white"
        android:textSize="20sp" />

</RelativeLayout>
Capturing text with the camera device
In MainActivity, declare fields for the two views and the CameraSource, then bind the views in onCreate():
private SurfaceView cameraView;
private TextView textBlockContent;
private CameraSource cameraSource;

@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_main);

    cameraView = (SurfaceView) findViewById(R.id.surface_view);
    textBlockContent = (TextView) findViewById(R.id.text_value);
}
Now, we're going to create a TextRecognizer object. This detector processes images and determines what text appears within them. Once it's initialized, a TextRecognizer can be used to detect text in all kinds of images:
TextRecognizer textRecognizer = new TextRecognizer.Builder(getApplicationContext()).build();
Just like that, the TextRecognizer is built. However, it might not work yet: if the device does not have enough storage, or Google Play services can't download the OCR dependencies, the TextRecognizer object may not be operational. Before we start using it to recognize text, we should check that it's ready. We'll add this check right after initializing the TextRecognizer:

if (!textRecognizer.isOperational()) {
    Log.w("MainActivity", "Detector dependencies are not yet available.");
}
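Once the recognizer is operational, it is not limited to camera streams: you can run it on any single image by wrapping a Bitmap in a Frame and calling detect() directly. A minimal sketch (loadBitmap() is a hypothetical helper standing in for however you obtain the image):

import android.graphics.Bitmap;
import android.util.SparseArray;
import com.google.android.gms.vision.Frame;
import com.google.android.gms.vision.text.TextBlock;

// Recognize text in a single still image instead of a camera stream.
Bitmap bitmap = loadBitmap();  // hypothetical helper; any Bitmap source works
Frame frame = new Frame.Builder().setBitmap(bitmap).build();
SparseArray<TextBlock> blocks = textRecognizer.detect(frame);
for (int i = 0; i < blocks.size(); i++) {
    Log.d("MainActivity", "Detected: " + blocks.valueAt(i).getValue());
}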
To fetch a stream of images from the device's camera and display them in the SurfaceView, create a new instance of the CameraSource class using CameraSource.Builder. Because the CameraSource needs a TextRecognizer, we initialize it with the instance we've just built above:
cameraSource = new CameraSource.Builder(getApplicationContext(), textRecognizer)
        .setFacing(CameraSource.CAMERA_FACING_BACK)
        .setRequestedPreviewSize(1280, 1024)
        .setRequestedFps(2.0f)
        .setAutoFocusEnabled(true)
        .build();
The low requested frame rate (2 fps) keeps CPU usage down, since running OCR on every preview frame is expensive.
Next, add a callback to the SurfaceHolder of the SurfaceView so that you know when you can start drawing the preview frames. The callback should implement the SurfaceHolder.Callback interface. Inside the surfaceCreated() method, call the start() method of the CameraSource to start drawing the preview frames; in surfaceDestroyed(), call stop() to close the camera and stop sending frames to the underlying frame detector:
cameraView.getHolder().addCallback(new SurfaceHolder.Callback() {
    @Override
    public void surfaceCreated(SurfaceHolder holder) {
        try {
            // Requires the CAMERA permission to be granted (see above).
            //noinspection MissingPermission
            cameraSource.start(cameraView.getHolder());
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }

    @Override
    public void surfaceChanged(SurfaceHolder holder, int format, int width, int height) {
    }

    @Override
    public void surfaceDestroyed(SurfaceHolder holder) {
        cameraSource.stop();
    }
});
Most importantly, we need to tell the TextRecognizer what to do when it detects a text block. Create an instance of a class that implements the Detector.Processor interface and pass it to the setProcessor() method of the TextRecognizer. Inside the receiveDetections() method, which we must override, we display the detected SparseArray of TextBlock objects in the TextView like this:
textRecognizer.setProcessor(new Detector.Processor<TextBlock>() {
    @Override
    public void release() {
    }

    @Override
    public void receiveDetections(Detector.Detections<TextBlock> detections) {
        Log.d("Main", "receiveDetections");
        final SparseArray<TextBlock> items = detections.getDetectedItems();
        if (items.size() != 0) {
            textBlockContent.post(new Runnable() {
                @Override
                public void run() {
                    StringBuilder value = new StringBuilder();
                    for (int i = 0; i < items.size(); ++i) {
                        TextBlock item = items.valueAt(i);
                        value.append(item.getValue());
                        value.append("\n");
                    }
                    // Update the TextView with the recognized text blocks.
                    textBlockContent.setText(value.toString());
                }
            });
        }
    }
});
Finally, override the onDestroy() method of the Activity to stop the camera and release the resources of the camera and the underlying detector:
@Override
protected void onDestroy() {
    super.onDestroy();
    cameraSource.release();
}
Run the activity and scan a block of text on paper; you should see a result like this:
Conclusions
With a SurfaceView to preview camera frames, a CameraSource to feed them to the detector, and a TextRecognizer with a Detector.Processor to handle the results, the Mobile Vision Text API gives you real-time, on-device OCR for Latin-based languages in just a few dozen lines of code.
Read more:
- Face detection with Mobile Vision API
- Barcode/QR code reading with Mobile Vision API