Documentation
Constants
This section is empty.
Variables
Functions
This section is empty.
Types
type Webinfo
type Webinfo struct {
URL string `json:"url,omitempty"` // Original page URL
Location string `json:"location,omitempty"` // Location
Canonical string `json:"canonical,omitempty"` // Canonical URL
Title string `json:"title,omitempty"` // Page title
Description string `json:"description,omitempty"` // Meta description
ImageURL string `json:"image_url,omitempty"` // Representative image URL
UserAgent string `json:"user_agent,omitempty"` // User-Agent used to fetch the page
}
Webinfo holds common metadata extracted from a web page. It captures information useful for previews or link metadata:
- URL: the original page URL.
- Location: the final URL of the response, after any redirects.
- Canonical: the canonical URL declared by the page (if any).
- Title: the page title.
- Description: a short summary or meta description for the page.
- ImageURL: a representative image URL suitable for previews.
- UserAgent: the User-Agent string used to fetch the page.
Every field is a string. A field is left empty when the corresponding metadata is not present, and empty fields are omitted from the JSON encoding (omitempty).
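As an illustration of the omitempty tags, the following sketch marshals a partially filled value. The struct is a local copy of the definition shown above so the example compiles on its own; the `encode` helper is hypothetical and not part of the package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Webinfo is a local copy of the struct documented above.
type Webinfo struct {
	URL         string `json:"url,omitempty"`
	Location    string `json:"location,omitempty"`
	Canonical   string `json:"canonical,omitempty"`
	Title       string `json:"title,omitempty"`
	Description string `json:"description,omitempty"`
	ImageURL    string `json:"image_url,omitempty"`
	UserAgent   string `json:"user_agent,omitempty"`
}

// encode (hypothetical helper) marshals w to JSON.
// Fields holding the empty string are dropped because of omitempty.
func encode(w Webinfo) (string, error) {
	b, err := json.Marshal(w)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	s, err := encode(Webinfo{URL: "https://example.com/", Title: "Example"})
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(s) // prints {"url":"https://example.com/","title":"Example"}
}
```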
func Fetch
Fetch retrieves metadata from the web page at urlStr and returns it as a *Webinfo.
Behavior:
- Parses urlStr and performs an HTTP GET using the provided context (ctx).
- If userAgent is empty, a default dummy User-Agent string is used.
- Uses an HTTP client and sets the User-Agent request header.
- Peeks at up to the first 1024 bytes of the response to detect the page character encoding via charset.DetermineEncoding (which also considers the response Content-Type). If an encoding is detected or inferred by name, the response body is decoded accordingly before HTML parsing.
Parsing and extracted fields:
- Parses the document head with goquery and extracts:
  - Title: from <title>, then overridden by meta[property="twitter:title"] or meta[property="og:title"] if present.
  - Description: from meta[name="description"], then overridden by meta[property="twitter:description"] or meta[property="og:description"].
  - ImageURL: from meta[property="twitter:image"] or meta[property="og:image"].
  - Canonical: from link[rel="canonical"].
- The returned Webinfo contains at least:
  - URL: the original urlStr (string form).
  - Location: the final request URL (after redirects) from the response.
  - UserAgent: the User-Agent actually used.
Error handling and resource cleanup:
- Network, URL parsing, encoding detection, and HTML parsing errors are wrapped with contextual information (including the URL).
- The response body is closed in a deferred function; any close error is joined with the returned error.
- On error, Fetch returns a nil *Webinfo and a non-nil error.
Notes and guarantees:
- The first 1024 bytes are peeked (without advancing the reader) to determine encoding.
- DetermineEncoding's boolean return value is ignored (some encodings like Shift_JIS may be reported inconsistently); the detected encoding or a named encoding (via encoding.GetEncoding) is preferred.
- The function honors context cancellation for the HTTP request.
- Caller should assume that a non-nil *Webinfo is returned only on success; otherwise, info is nil.
func (*Webinfo) DownloadImage
func (w *Webinfo) DownloadImage(ctx context.Context, destDir string, temporary bool) (outPath string, err error)
DownloadImage downloads the image pointed to by w.ImageURL and saves it to destDir, returning the path of the saved file (outPath) or an error.
Behavior:
- The method has a *Webinfo receiver and returns an error if w is nil or if ImageURL is empty.
- ctx is used to control/cancel the underlying HTTP request.
- destDir is cleaned with filepath.Clean. If it is non-empty, the directory (and any required parents) will be created with mode 0750. If destDir is empty, file creation uses the system/default behavior for temporary or current directories.
- If `temporary` is true, the image is written to a temporary file (created via the package-level `createFile` helper which wraps `os.CreateTemp`) and the temporary file path is returned. If the URL path does not contain a filename, `temporary` is forced true.
- If `temporary` is false, the image is written to `destDir` with the filename taken from the URL path. If the URL filename has no extension, an extension is appended (see extension resolution below). Existing files will be truncated by the underlying `createFile`/`os.Create` behavior.
HTTP download and content-type/extension resolution:
- The image is fetched using an HTTP GET performed with the provided context; the request User-Agent is set via getUserAgent(w.UserAgent).
- Extension resolution order:
  1) Extension from the URL path (if present).
  2) Extension(s) derived from the Content-Type response header via mime.ExtensionsByType.
  3) If still unknown, the first up-to-512 bytes of the body are read and http.DetectContentType is used to guess the content type, then mime.ExtensionsByType.
  4) If no extension can be determined, ".img" is used as a fallback.
- When bytes are sniffed from the body, they are prepended back to the reader so the full image is written to disk. When multiple extensions are returned by mime.ExtensionsByType the implementation picks the last returned extension.
- File creation is performed via the package-level `createFile` variable which tests may override to simulate create failures.
Resource management and errors:
- The response body and any created file are closed using deferred cleanup; any close errors are joined into the returned error.
- I/O, network and OS errors are returned (wrapped with contextual information).
- On success, outPath contains the path to the saved image file; on error, outPath is empty and err describes the failure.
Notes:
- The function may truncate an existing destination file with the same name.
- The exact behavior of temporary file placement when destDir is empty follows the semantics of os.CreateTemp.
func (*Webinfo) DownloadThumbnail
func (w *Webinfo) DownloadThumbnail(ctx context.Context, destDir string, width int, temporary bool) (outPath string, err error)
DownloadThumbnail downloads the image referenced by the Webinfo receiver, scales it to the requested width (preserving aspect ratio), and writes the resulting thumbnail image to disk.
The method returns the path to the created thumbnail file or an error.
Behavior details:
- If the receiver is nil, ErrNullPointer is returned.
- If width <= 0, a default width of 150 pixels is used.
- destDir is cleaned and, if non-empty, created with mode 0750 (os.MkdirAll).
- The original image is always downloaded to a temporary file via DownloadImage(..., true). That temporary original file is removed when the function returns (even on error).
- The original image file is opened and decoded. If decoding fails, an error is returned.
- The thumbnail height is computed to preserve aspect ratio: newH = round(width * origH / origW). newH is clamped to at least 1 pixel.
- The image is resized using a Catmull-Rom resampler into an RGBA image of size width x newH.
- The output format/extension is chosen from the decoded format: jpeg/jpg → .jpg, png → .png, gif → .gif. Unknown formats fall back to PNG.
- If `temporary` is true, the thumbnail file is created via the package-level `createFile` helper (which wraps `os.CreateTemp`) in `destDir` using the pattern "webinfo-thumb-*<ext>"; the temporary file path is returned.
- If `temporary` is false, the output filename is derived from the original image URL basename (falling back to "webinfo-image") and named "<base>-thumb<ext>" in `destDir`.
- The encoder used to write the thumbnail is the package-level `outputImage` function variable; tests may replace this variable to simulate encoder failures. The image decoding step uses the package-level `decodeImage` wrapper around `image.Decode`, which tests may also override.
- Files are properly closed with deferred cleanup; any close/remove errors are joined into the returned error using the errs package.
- All filesystem, download, and image-processing errors are wrapped with contextual information (e.g., paths, URL) before being returned.
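The size, extension, and naming rules above can be sketched without the resampling step. All three helpers are hypothetical. Two details are assumptions made for this sketch rather than documented behavior: rejecting non-positive source dimensions, and stripping the original extension from the basename before appending "-thumb".

```go
package main

import (
	"fmt"
	"math"
	"path"
	"strings"
)

const defaultWidth = 150 // default width stated in the text above

// thumbSize (hypothetical) returns the effective width and the height
// that preserves the aspect ratio, clamped to at least 1 pixel.
func thumbSize(width, origW, origH int) (w, h int, err error) {
	if origW <= 0 || origH <= 0 {
		return 0, 0, fmt.Errorf("invalid image dimensions %dx%d", origW, origH)
	}
	if width <= 0 {
		width = defaultWidth
	}
	h = int(math.Round(float64(width) * float64(origH) / float64(origW)))
	if h < 1 {
		h = 1
	}
	return width, h, nil
}

// thumbExt (hypothetical) maps a decoded format name to an output
// extension, falling back to PNG for unknown formats.
func thumbExt(format string) string {
	switch strings.ToLower(format) {
	case "jpeg", "jpg":
		return ".jpg"
	case "gif":
		return ".gif"
	default:
		return ".png"
	}
}

// thumbName (hypothetical) builds "<base>-thumb<ext>" from the image
// URL path, using "webinfo-image" when no basename is available.
func thumbName(urlPath, ext string) string {
	base := path.Base(urlPath)
	base = strings.TrimSuffix(base, path.Ext(base)) // assumption: drop the old extension
	if base == "" || base == "." || base == "/" {
		base = "webinfo-image"
	}
	return base + "-thumb" + ext
}

func main() {
	w, h, err := thumbSize(0, 1200, 630)
	if err != nil {
		fmt.Println("size failed:", err)
		return
	}
	fmt.Println(w, h)                                          // prints 150 79
	fmt.Println(thumbName("/img/cover.jpeg", thumbExt("jpeg"))) // prints cover-thumb.jpg
}
```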
Parameters:
- ctx: context for cancellation and timeouts passed to DownloadImage and other operations.
- destDir: destination directory for the thumbnail (cleaned). If empty, creation uses the current directory semantics of os.Create/os.CreateTemp.
- width: desired thumbnail width in pixels (defaults to 150 if <= 0).
- temporary: if true, create a uniquely-named temporary file; otherwise create a stable filename based on the original image basename.
Returns:
- outPath: filesystem path to the created thumbnail file (valid when err == nil).
- err: non-nil on failure; common failure reasons include a nil receiver (ErrNullPointer), a missing image URL (ErrNoImageURL), download errors, decode errors, filesystem errors, and invalid image dimensions.
