Performing Text Searches
To perform full-text searches across bookmarks in this project, use the BookmarkService.search method or the /api/bookmarks/search REST endpoint. The search system uses an in-memory inverted index (SearchIndex) that ranks results based on keyword frequency in titles and descriptions.
Searching via the Service
The BookmarkService provides a high-level facade for searching. It delegates the query to the SearchIndex and returns a list of Bookmark objects.
from app.services.bookmark_service import BookmarkService
service = BookmarkService()
# Search for bookmarks containing "python" and "tutorial"
# Results are ranked by how often these words appear in title/description
results = service.search(query="python tutorial", limit=10)
for bookmark in results:
    print(f"Found: {bookmark.title} ({bookmark.url})")
Searching via the REST API
The search functionality is exposed via a GET request to the /api/bookmarks/search endpoint.
# Search for "recipes" with a limit of 5 results
curl "http://localhost:5000/api/bookmarks/search?q=recipes&limit=5"
The API implementation in app/routes/bookmarks.py handles the request as follows:
@bookmarks_bp.route("/search", methods=["GET"])
def search_bookmarks():
    # Read the query string and an optional integer limit (defaults to 20)
    query = request.args.get("q", "")
    limit = request.args.get("limit", 20, type=int)
    results = _service.search(query, limit=limit)
    return jsonify({"results": [b.to_dict() for b in results], "count": len(results)})
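For callers that prefer Python over curl, the request can be sketched with the standard library alone. This is a minimal sketch, not part of the project: the helper names build_search_url and search_bookmarks are hypothetical, and the base URL is assumed to match the curl example above. The fields of each result depend on Bookmark.to_dict, which is not shown here, so the sketch returns the dictionaries as-is.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the app is served locally on port 5000, as in the curl example.
BASE_URL = "http://localhost:5000"


def build_search_url(query: str, limit: int = 20, base_url: str = BASE_URL) -> str:
    """Build the search URL, percent-encoding the query string."""
    params = urlencode({"q": query, "limit": limit})
    return f"{base_url}/api/bookmarks/search?{params}"


def search_bookmarks(query: str, limit: int = 20, base_url: str = BASE_URL) -> list[dict]:
    """Call the endpoint and return the list stored under the "results" key."""
    with urlopen(build_search_url(query, limit, base_url), timeout=10) as response:
        payload = json.load(response)
    return payload["results"]
```

With the application running, search_bookmarks("recipes", limit=5) issues the same request as the curl command. Building the URL with urlencode keeps multi-word queries such as "python tutorial" correctly encoded.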
How Search Works
The SearchIndex class (found in app/services/search_service.py) manages the search logic using the following rules:
- Tokenization: Queries and bookmark content (title + description) are converted to lowercase and split into tokens using the regex [a-z0-9]+.
- Stop Word Removal: Common words that do not add search value are ignored. The following stop words are filtered out: the, a, an, and, or, but, in, on, at, to, for, is, it.
- AND Logic: The search uses an "AND" strategy. If you search for "python tutorial", a bookmark must contain both "python" and "tutorial" to be included in the results.
- Ranking: Results are ranked by a simple relevance score. The score is calculated by counting the total occurrences of all query tokens within the bookmark's title and description.
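The four rules above can be sketched as a small inverted index. This is an illustrative stand-in, not the project's SearchIndex: the class name TinySearchIndex, its method signatures, the use of integer IDs, and the tie-break by ID are all assumptions. It also assumes a word repeated in the query is counted once when scoring, which the rules above do not specify.

```python
import re
from collections import Counter, defaultdict

STOP_WORDS = {"the", "a", "an", "and", "or", "but", "in", "on", "at", "to", "for", "is", "it"}
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text, split it on the token regex, and drop stop words."""
    return [t for t in TOKEN_RE.findall(text.lower()) if t not in STOP_WORDS]


class TinySearchIndex:
    """Hypothetical stand-in for SearchIndex; not the project's actual class."""

    def __init__(self) -> None:
        # token -> {bookmark_id -> occurrences in title + description}
        self._postings: dict[str, dict[int, int]] = defaultdict(dict)

    def index_bookmark(self, bookmark_id: int, title: str, description: str = "") -> None:
        self.remove(bookmark_id)  # re-indexing replaces any stale entries
        for token, count in Counter(tokenize(f"{title} {description}")).items():
            self._postings[token][bookmark_id] = count

    def remove(self, bookmark_id: int) -> None:
        for postings in self._postings.values():
            postings.pop(bookmark_id, None)

    def search(self, query: str, limit: int = 20) -> list[int]:
        tokens = set(tokenize(query))
        if not tokens:
            return []  # the query was empty or contained only stop words
        # AND logic: keep bookmarks that appear in every token's postings.
        matches = set.intersection(*(set(self._postings.get(t, {})) for t in tokens))

        def score(bookmark_id: int) -> int:
            # Relevance: total occurrences of all query tokens in the bookmark.
            return sum(self._postings[t][bookmark_id] for t in tokens)

        # Highest score first; ties broken by ID (an assumption of this sketch).
        return sorted(matches, key=lambda bid: (-score(bid), bid))[:limit]
```

Indexing "Python Tutorial" with the description "A python tutorial for beginners" gives that bookmark a score of 4 for the query "python tutorial", so it ranks ahead of a bookmark that mentions each word once.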
Automatic Index Updates
You do not need to manually update the search index. The BookmarkService automatically keeps the SearchIndex in sync during standard operations:
- Creation: create_bookmark calls self._search.index_bookmark(bookmark).
- Updates: update_bookmark re-indexes the bookmark whenever the title or description changes.
- Initialization: When the application starts, SearchIndex performs a one-time rebuild by loading up to 10,000 bookmarks from the repository.
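The synchronization behavior described above might look roughly like the following. This is a sketch under stated assumptions, not the project's code: SketchBookmarkService, RecordingIndex, and the Bookmark dataclass are hypothetical, the index stub only records which bookmarks it was asked to index, and the startup rebuild is performed by the service here for brevity, whereas the text above attributes it to SearchIndex.

```python
from dataclasses import dataclass

REBUILD_LIMIT = 10_000  # cap on bookmarks loaded at startup, per the text above


@dataclass
class Bookmark:
    id: int
    url: str
    title: str
    description: str = ""


class RecordingIndex:
    """Hypothetical index stub that only records which bookmarks were indexed."""

    def __init__(self) -> None:
        self.indexed_ids: list[int] = []

    def index_bookmark(self, bookmark: Bookmark) -> None:
        self.indexed_ids.append(bookmark.id)


class SketchBookmarkService:
    """Illustrative facade; the real BookmarkService may differ."""

    def __init__(self, existing: list[Bookmark], index: RecordingIndex) -> None:
        self._bookmarks = {b.id: b for b in existing}
        self._search = index
        # One-time rebuild at startup, capped at REBUILD_LIMIT bookmarks.
        for bookmark in existing[:REBUILD_LIMIT]:
            self._search.index_bookmark(bookmark)

    def create_bookmark(self, bookmark: Bookmark) -> Bookmark:
        self._bookmarks[bookmark.id] = bookmark
        self._search.index_bookmark(bookmark)
        return bookmark

    def update_bookmark(self, bookmark_id: int, **changes: str) -> Bookmark:
        bookmark = self._bookmarks[bookmark_id]
        # Only title and description are searchable, so only they trigger re-indexing.
        text_changed = any(
            name in changes and changes[name] != getattr(bookmark, name)
            for name in ("title", "description")
        )
        for name, value in changes.items():
            setattr(bookmark, name, value)
        if text_changed:
            self._search.index_bookmark(bookmark)
        return bookmark
```

In this sketch, changing only the URL leaves the index untouched, while changing the title triggers a re-index.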
Troubleshooting and Limitations
Missing Results for Common Words
If a search query consists entirely of stop words (e.g., searching for "the and"), the search method will return an empty list because all tokens were filtered out.
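A caller can detect this case before searching and prompt the user for more specific terms. The helper below is a hypothetical sketch: searchable_terms is not part of the project, and it assumes the token regex and stop word list documented under How Search Works.

```python
import re

# Assumption: mirrors the stop word list documented for SearchIndex.
STOP_WORDS = frozenset(
    {"the", "a", "an", "and", "or", "but", "in", "on", "at", "to", "for", "is", "it"}
)


def searchable_terms(query: str) -> list[str]:
    """Return the query tokens that would survive stop word removal."""
    return [t for t in re.findall(r"[a-z0-9]+", query.lower()) if t not in STOP_WORDS]


def is_searchable(query: str) -> bool:
    """True if at least one token remains after filtering."""
    return bool(searchable_terms(query))
```

An application could call is_searchable on user input and show a hint when it returns False, rather than presenting an unexplained empty result list.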
Fields Not Indexed
The search index only processes the title and description fields. Text that appears only in the following fields cannot be matched by a query:
- URLs (e.g., searching for "github.com")
- Tags (e.g., searching for a tag name like "work")
- Collection names
Case Sensitivity
Search is entirely case-insensitive. Both the index and the queries are normalized to lowercase during the tokenization phase in SearchIndex._tokenize.
Result Limits
The default limit for search results is 20. You can raise it with the limit parameter, but because the index is held entirely in memory, the system is best suited to small and medium-sized datasets.