Overview
Package contentdiff provides HTML comparison and difference detection functionality.
This package compares HTML documents to identify changes between different versions of a bookmarked web page. It extracts and compares:
- Text content (rendered text from the page)
- Links (href and anchor text)
- Multimedia elements (images, videos)
The comparison uses the Myers diff algorithm (via sergi/go-diff) to compute differences in text content. For links and multimedia, it performs a set-based comparison to identify additions and removals.
The package is used to show users what has changed on a bookmarked page between snapshots, making it easy to track content updates, new links, or removed sections.
Example usage:

	reader1 := strings.NewReader(oldHTML)
	reader2 := strings.NewReader(newHTML)
	diffs, err := contentdiff.DiffHTML(reader1, reader2)
	if err != nil {
		return err
	}
	// Display text changes
	for _, d := range diffs.Text {
		fmt.Printf("[%s] %s\n", d.Type, d.Text)
	}
Types
type Diffs
type Diffs struct {
	Text       TextDiffs `json:"text"`
	Multimedia TextDiffs `json:"multimedia"`
	Link       LinkDiffs `json:"link"`
}
Diffs contains all types of differences between two HTML documents.
type HTMLContent
type HTMLContent struct {
	Links      []Link   `json:"links"`
	Multimedia []string `json:"multimedia"`
	Text       string   `json:"text"`
}
HTMLContent represents extracted content from an HTML document.
func ExtractHTMLContent
func ExtractHTMLContent(r io.Reader) *HTMLContent
ExtractHTMLContent extracts text, links, and multimedia from an HTML document.
type LinkDiffs (added in v0.8.0)
type LinkDiffs []LinkDiff
LinkDiffs is a list of LinkDiff items.