parser

package
v0.3.0 Latest
Warning

This package is not in the latest version of its module.

Published: Jun 29, 2025 License: MIT Imports: 5 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type ExtractionStats

type ExtractionStats struct {
	TotalFound          int // Total anchor tags with href found
	Valid               int // Valid links extracted
	EmptyHrefs          int // Empty href attributes
	FilteredOut         int // Links filtered out (javascript:, mailto:, etc.)
	RelativeURLs        int // Relative URLs that were resolved
	ResolutionErrors    int // Errors during relative URL resolution
	InvalidURLs         int // Invalid URLs after resolution
	NormalizationErrors int // Errors during URL normalization
}

ExtractionStats holds statistics about a link extraction.

func (*ExtractionStats) String

func (s *ExtractionStats) String() string

String returns a human-readable representation of the stats.

type LinkExtractor

type LinkExtractor struct {
	// contains filtered or unexported fields
}

LinkExtractor extracts and filters links from HTML content.

func NewLinkExtractor

func NewLinkExtractor(logger *slog.Logger) *LinkExtractor

NewLinkExtractor creates a new LinkExtractor instance.

func (*LinkExtractor) ExtractLinks

func (le *LinkExtractor) ExtractLinks(baseURL, htmlContent string) ([]string, error)

ExtractLinks extracts and filters links from HTML content. baseURL is used to resolve relative URLs to absolute URLs. htmlContent is the HTML content to parse. It returns a slice of valid, filtered absolute URLs.

func (*LinkExtractor) ExtractLinksWithStats

func (le *LinkExtractor) ExtractLinksWithStats(baseURL, htmlContent string) ([]string, *ExtractionStats, error)

ExtractLinksWithStats extracts links and also returns the ExtractionStats for that extraction.

func (*LinkExtractor) ExtractSameDomainLinks

func (le *LinkExtractor) ExtractSameDomainLinks(baseURL, htmlContent string) ([]string, error)

ExtractSameDomainLinks extracts links that belong to the same domain as the base URL.
