textorient

v1.0.5
Published: Apr 4, 2025 License: MIT Imports: 8 Imported by: 3

README

textorient

textorient is a Go package that runs a small embedded neural network (using NCNN) to determine the orientation of text in an image. The network is trained to produce one of four outputs:

  • 0: 0 degrees
  • 1: 90 degrees
  • 2: 180 degrees
  • 3: 270 degrees

The training program and data for this package can be found at textorient-train. The neural network weights and params files text_angle_classifier.ncnn.bin and text_angle_classifier.param are included in this package. The model is small and efficient, and runs fine on a CPU.

The model's accuracy on the validation set is 80%, which is why we run the model on approximately 100 randomly sampled tiles from the image, and choose the majority vote of the predictions.
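The majority vote described above can be sketched in a few lines. This is an illustrative, standard-library-only sketch, not the package's own code: the function name majorityVote and the tie-breaking rule (lowest class wins) are assumptions made for the example.

```go
package main

import "fmt"

// majorityVote returns the orientation class (0..3) that occurs most often
// in preds. Ties go to the lowest class index, and out-of-range predictions
// are ignored. Both choices are assumptions made for this sketch.
func majorityVote(preds []int) int {
	var counts [4]int
	for _, p := range preds {
		if p >= 0 && p < len(counts) {
			counts[p]++
		}
	}
	best := 0
	for class := 1; class < len(counts); class++ {
		if counts[class] > counts[best] {
			best = class
		}
	}
	return best
}

func main() {
	// Hypothetical per-tile predictions: six of ten tiles say class 1 (90 degrees).
	preds := []int{1, 1, 0, 1, 3, 1, 2, 1, 1, 0}
	fmt.Println(majorityVote(preds)) // prints 1
}
```

Because individual tiles are wrong some of the time, aggregating many of them lets the occasional bad prediction be outvoted.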

How it works

The function Orient.StraightenAndMakeUpright() consists of a few steps:

  1. Run docangle to compute the angle of the page (see docangle notes below).
  2. Rotate the image by the inverse of the angle found in step 1. The image is now straight, but it could be rotated by -90, 90, or 180 degrees (-90 is the same as 270 degrees).
  3. Extract a sample of 200 tiles, each tile 32x32 pixels, from the image. Run the neural network on each of these tiles, and get their orientation (0, 90, 180, or 270 degrees). Pick the majority vote of their orientations, and rotate the image by the inverse, so that the page is upright and straight.
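The "rotate by the inverse" part of step 3 is simple modular arithmetic. The sketch below assumes that class k means the text is rotated by k*90 degrees, and that the correction is measured in the same rotational direction as the detection. The name correctionDegrees is hypothetical and not part of the package.

```go
package main

import "fmt"

// correctionDegrees returns the rotation, in degrees within [0, 360), that
// undoes the detected orientation class. Class k is assumed to mean that the
// text is rotated by k*90 degrees.
func correctionDegrees(class int) int {
	detected := (class % 4) * 90
	return (360 - detected) % 360
}

func main() {
	for class := 0; class < 4; class++ {
		fmt.Println(class, correctionDegrees(class))
	}
	// prints:
	// 0 0
	// 1 270
	// 2 180
	// 3 90
}
```

A correction of 270 degrees is the same as -90 degrees, which matches the -90, 90, and 180 degree cases listed in step 2.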

About docangle:
docangle runs a dumb algorithm that knows nothing about text or its orientation, but it is quite good at identifying a few degrees of rotation of the page. Note that in its default setting, it only looks for slight rotation (-2.5 to +2.5 degrees). The algorithm is brute force, and would be very slow if it were run over a much wider range.
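To see why a wider range is costly, here is a generic brute-force search sketch. It is not docangle's algorithm: the step size, the toy score function, and the name bestAngle are all assumptions. It only illustrates that the number of score evaluations grows linearly with the width of the search range.

```go
package main

import (
	"fmt"
	"math"
)

// bestAngle evaluates score at every candidate angle in [minDeg, maxDeg],
// spaced stepDeg apart, and returns the highest-scoring angle together with
// the number of evaluations performed.
func bestAngle(minDeg, maxDeg, stepDeg float64, score func(deg float64) float64) (best float64, evaluations int) {
	if stepDeg <= 0 || maxDeg < minDeg {
		return minDeg, 0
	}
	bestScore := math.Inf(-1)
	steps := int(math.Floor((maxDeg-minDeg)/stepDeg + 1e-9))
	for i := 0; i <= steps; i++ {
		deg := minDeg + float64(i)*stepDeg
		s := score(deg)
		evaluations++
		if s > bestScore {
			bestScore, best = s, deg
		}
	}
	return best, evaluations
}

func main() {
	// Toy score that peaks at 1.5 degrees. A real scorer would have to
	// process the image at each candidate angle, which is the expensive part.
	score := func(deg float64) float64 { return -math.Abs(deg - 1.5) }

	angle, evals := bestAngle(-2.5, 2.5, 0.5, score)
	fmt.Println(angle, evals) // prints 1.5 11

	_, wide := bestAngle(-45, 45, 0.5, score)
	fmt.Println(wide) // prints 181
}
```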

Example

import (
	"github.com/bmharper/textorient"
	"github.com/bmharper/cimg"
)

func example(img *cimg.Image) error {
	// Load the neural network.
	// You'll typically do this once, because loading has a fixed cost.
	orient, err := textorient.NewOrient()
	if err != nil {
		return err
	}
	// Free the C++ NCNN object when we're done with it.
	defer orient.Close()

	params := textorient.NewWhiteLinesParams()
	// Tweak the search range if necessary
	params.MinDeltaDegrees = -2.7
	params.MaxDeltaDegrees = 2.7
	straight, err := orient.StraightenAndMakeUpright(img, params)
	if err != nil {
		return err
	}

	// ... Send straight onto an OCR service, etc.
	_ = straight

	return nil
}

NCNN dependency

This package depends on the NCNN library, which we interface with using cgo.

Prebuilt libraries and include files are included in this repo, inside the include and lib directories. If your platform is not included, then please submit a PR.

Documentation

Constants

const (
	Angle0   = 0
	Angle90  = 1
	Angle180 = 2
	Angle270 = 3
)
const TileSize = 32

Functions

func NewWhiteLinesParams

func NewWhiteLinesParams() *docangle.WhiteLinesParams

Create a new WhiteLinesParams with defaults

func Perplexity

func Perplexity(img *cimg.Image) float32

Return a measure of how "interesting" the image is. When selecting tiles for training or inference, we choose the tiles with the highest perplexity. This allows us to ignore blank tiles, or tiles with very little visual information.
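The idea of ranking tiles by how much visual information they carry can be sketched with a stand-in score. The package's actual Perplexity formula is not documented here, so the sketch below uses plain gray-level variance instead. The names variance, topTiles, fillTile, and demoTiles are hypothetical, and the sketch uses the standard library's image.Gray rather than cimg.Image.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
	"sort"
)

// variance is a stand-in "interest" score: the variance of the gray levels
// in the tile. It is NOT the package's Perplexity function.
func variance(tile *image.Gray) float64 {
	b := tile.Bounds()
	n := float64(b.Dx() * b.Dy())
	if n == 0 {
		return 0
	}
	var sum, sumSq float64
	for y := b.Min.Y; y < b.Max.Y; y++ {
		for x := b.Min.X; x < b.Max.X; x++ {
			v := float64(tile.GrayAt(x, y).Y)
			sum += v
			sumSq += v * v
		}
	}
	mean := sum / n
	return sumSq/n - mean*mean
}

// topTiles returns up to n tiles, ordered from highest to lowest score.
func topTiles(tiles []*image.Gray, n int) []*image.Gray {
	sorted := append([]*image.Gray(nil), tiles...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return variance(sorted[i]) > variance(sorted[j])
	})
	if n < 0 {
		n = 0
	}
	if n > len(sorted) {
		n = len(sorted)
	}
	return sorted[:n]
}

// fillTile builds a size x size gray tile from a per-pixel function.
func fillTile(size int, pixel func(x, y int) uint8) *image.Gray {
	tile := image.NewGray(image.Rect(0, 0, size, size))
	for y := 0; y < size; y++ {
		for x := 0; x < size; x++ {
			tile.SetGray(x, y, color.Gray{Y: pixel(x, y)})
		}
	}
	return tile
}

// demoTiles returns a blank tile, a low-contrast tile, and a high-contrast tile.
func demoTiles() []*image.Gray {
	blank := fillTile(32, func(x, y int) uint8 { return 255 })
	stripes := fillTile(32, func(x, y int) uint8 {
		if y%2 == 0 {
			return 100
		}
		return 150
	})
	checker := fillTile(32, func(x, y int) uint8 {
		if (x+y)%2 == 0 {
			return 0
		}
		return 255
	})
	return []*image.Gray{blank, stripes, checker}
}

func main() {
	// Keep the two most "interesting" tiles; the blank tile is dropped.
	for _, t := range topTiles(demoTiles(), 2) {
		fmt.Println(variance(t))
	}
}
```

With this stand-in score, the blank tile scores zero and is never selected ahead of a tile that contains any contrast.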

func SplitImage

func SplitImage(img *cimg.Image, numTiles, size int) []*cimg.Image

Split an image up into size x size square tiles, and return numTiles samples
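As a rough picture of the tiling step, the sketch below enumerates the non-overlapping square tiles that fit inside an image of a given size. How SplitImage actually lays out and samples its tiles is not specified here, so the grid layout and the name tileRects are assumptions, and the standard library's image.Rectangle stands in for real image data.

```go
package main

import (
	"fmt"
	"image"
)

// tileSize mirrors the documented TileSize constant.
const tileSize = 32

// tileRects lists the non-overlapping size x size tiles that fit entirely
// inside a width x height image, row by row. Partial tiles at the right and
// bottom edges are skipped.
func tileRects(width, height, size int) []image.Rectangle {
	if size <= 0 {
		return nil
	}
	var rects []image.Rectangle
	for y := 0; y+size <= height; y += size {
		for x := 0; x+size <= width; x += size {
			rects = append(rects, image.Rect(x, y, x+size, y+size))
		}
	}
	return rects
}

func main() {
	rects := tileRects(100, 70, tileSize)
	fmt.Println(len(rects)) // prints 6 (3 columns x 2 rows)
	fmt.Println(rects[0])   // prints (0,0)-(32,32)
}
```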

Types

type Orient

type Orient struct {
	// contains filtered or unexported fields
}

An Orientation neural network

func NewOrient

func NewOrient() (*Orient, error)

NewOrient creates a new Orient struct and loads the neural network. The network must be closed after use, or you will leak C++ memory.

func (*Orient) Close

func (o *Orient) Close()

Close the neural network (free the C++ NCNN object)

func (*Orient) GetImageOrientation

func (o *Orient) GetImageOrientation(img *cimg.Image) (int, error)

Run on a whole image, and return one of 4 angles (Angle0, Angle90, Angle180, Angle270)

func (*Orient) MakeUpright added in v1.0.1

func (o *Orient) MakeUpright(img *cimg.Image) (*cimg.Image, error)

MakeUpright runs the neural network to determine if the page is upright. If necessary, rotate the page by -90, 90, or 180 degrees and return the upright image. If the page is already upright, return 'img'

func (*Orient) Straighten added in v1.0.1

func (o *Orient) Straighten(img *cimg.Image, params *docangle.WhiteLinesParams) *cimg.Image

Use github.com/bmharper/docangle to compute the angle of the page, and rotate the image to negate that angle. If the angle is 0, return 'img'

func (*Orient) StraightenAndMakeUpright added in v1.0.1

func (o *Orient) StraightenAndMakeUpright(img *cimg.Image, params *docangle.WhiteLinesParams) (*cimg.Image, error)

Combine Straighten and MakeUpright

Directories

Path            Synopsis
cmd/orient      command
cmd/straighten  command
