Making String Search Easier Across Databases

Searching for information in applications is rarely as simple as matching an exact string. Users don’t always remember the full text; instead, they rely on fragments. When buying a product online, for instance, they might type only the brand (“Samsung”) or only the model (“Galaxy S24”), but rarely both together. In financial systems, the same happens when looking up a transaction by just part of the description.

This type of partial search has become crucial for modern systems. In e-commerce, it drives product discovery. In finance, it helps locate records quickly. And in countless other domains, it shapes how people interact with data. To meet this demand, databases have evolved to provide capabilities that go beyond strict equality, allowing queries that can check whether text contains, starts with, or ends with a given fragment.

In this article, we will explore these capabilities, the trade-offs of performing such searches directly in the database, and how they compare with the use of dedicated search engines.

Search Term Capabilities and Their Trade-Offs

Databases have long recognized that exact matches are not enough. To improve usability, they expose operators that allow searching by partial terms — for example, checking if a string contains a keyword, starts with a prefix, or ends with a suffix. These operations make it possible to support the way people actually search: fragmentary, approximate, and often incomplete.
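To pin down what the three operators mean, here is a minimal plain-Java sketch that applies them to the product example from the introduction. It assumes case-sensitive matching, as Java's own String methods do; the `MatchOperator` enum is hypothetical and not part of any database API, and individual databases may differ (for example, in case handling or collation).

```java
import java.util.List;
import java.util.stream.Collectors;

public class MatchOperatorDemo {

    // Hypothetical enum modelling the three partial-match operators.
    enum MatchOperator {
        CONTAINS {
            @Override
            boolean matches(String value, String term) {
                return value.contains(term);
            }
        },
        STARTS_WITH {
            @Override
            boolean matches(String value, String term) {
                return value.startsWith(term);
            }
        },
        ENDS_WITH {
            @Override
            boolean matches(String value, String term) {
                return value.endsWith(term);
            }
        };

        // Case-sensitive check, mirroring java.lang.String semantics.
        abstract boolean matches(String value, String term);
    }

    // Keeps only the values that satisfy the chosen operator for the given term.
    static List<String> filter(List<String> values, MatchOperator op, String term) {
        return values.stream()
                .filter(v -> op.matches(v, term))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        var products = List.of("Samsung Galaxy S24", "Samsung Galaxy Tab", "Apple iPhone 15");
        System.out.println(filter(products, MatchOperator.CONTAINS, "Galaxy"));
        System.out.println(filter(products, MatchOperator.STARTS_WITH, "Samsung"));
        System.out.println(filter(products, MatchOperator.ENDS_WITH, "S24"));
    }
}
```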

The choice, however, is not just about what is possible but about when it makes sense. Should these searches run directly in the database, or should they be delegated to a specialized search engine?

Database search (contains, startsWith, endsWith)
  • When it makes sense: small to medium datasets, transactional systems, and simple keyword lookups (e.g., product name, transaction description).
  • ✅ Strengths: simple to implement, fewer moving parts, no need to sync data.
  • ⚠️ Limitations: performance degrades on very large datasets, and search features are limited.

Search engine (Elasticsearch, Lucene, etc.)
  • When it makes sense: large datasets, full-text search, fuzzy matching, autocomplete, and relevance ranking and scoring.
  • ✅ Strengths: built for scale, advanced search features, optimized indexing.
  • ⚠️ Limitations: extra infrastructure, higher operational cost, and the need to keep data in sync with the database.

The decision is not binary but contextual. For many use cases, database search is efficient enough and far more straightforward. For others, especially where scalability and advanced search features are critical, a search engine becomes essential.
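To make the performance caveat in the comparison above more concrete: a prefix search can exploit sorted order, much like an ordered index, while a substring search generally has to examine every value. The sketch below illustrates that difference with an in-memory `TreeSet` standing in for an index. It is an analogy under that assumption, not a description of how any particular database plans these queries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class PrefixVersusContains {

    // Prefix search: jump to the first candidate in sorted order and stop
    // as soon as a value no longer starts with the prefix.
    static List<String> startsWith(NavigableSet<String> sorted, String prefix) {
        List<String> result = new ArrayList<>();
        for (String value : sorted.tailSet(prefix, true)) {
            if (!value.startsWith(prefix)) {
                break; // all values sharing the prefix are contiguous, so no later value can match
            }
            result.add(value);
        }
        return result;
    }

    // Substring search: ordering does not help, so every value is inspected.
    static List<String> contains(NavigableSet<String> sorted, String fragment) {
        List<String> result = new ArrayList<>();
        for (String value : sorted) {
            if (value.contains(fragment)) {
                result.add(value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        NavigableSet<String> titles = new TreeSet<>(List.of(
                "Avengers: Endgame", "Star Wars: A New Hope",
                "Star Wars: The Last Jedi", "The Avengers"));
        System.out.println(startsWith(titles, "Star"));
        System.out.println(contains(titles, "Avengers"));
    }
}
```

The prefix loop touches only the matching range plus one extra element, whereas the substring loop always walks the whole set; at database scale, that difference is what the trade-off refers to.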

With this background in place, we can now move from theory to practice. Let’s explore a concrete example to see how these search term capabilities can be applied directly in a database-driven Java application.

Preparing the Setup

Let’s create a small application that demonstrates how partial search works in a real scenario. Movies are a perfect example: we rarely remember their titles in full. Sometimes we recall just the beginning (“Star Wars…”), sometimes the end (“…Endgame”), and other times only a fragment in the middle (“War”). Our goal is to build a sample that finds movie titles from whichever fragment we remember.

For this, we will use Eclipse JNoSQL, which, in its latest release, introduces three essential improvements:

  • Performance enhancements: Queries are now executed more efficiently, particularly for string operations.
  • Driver updates: All major NoSQL drivers have been refreshed to align with the most recent stable versions. This includes ArangoDB, Cassandra, Couchbase, Neo4j, OrientDB, Elasticsearch, HBase, Jedis, and Apache TinkerPop.
  • New string capabilities: Support for contains, startsWith, and endsWith operators, aligned with the minimal expression set defined in Jakarta Query.

For this demonstration, we’ll use ArangoDB to explore document capabilities. However, the setup is flexible: by switching the driver, you can use another document database such as MongoDB or Oracle NoSQL without changing the core logic.

Our project will be a Java SE Maven application, with the reference sample available here:
https://github.com/JNOSQL/demos-se/tree/main/arangodb.

Beyond CDI and JSON-B, the additional requirement is the ArangoDB driver dependency:

<dependency>
    <groupId>org.eclipse.jnosql.databases</groupId>
    <artifactId>jnosql-arangodb</artifactId>
    <version>${jnosql.version}</version>
</dependency>

Once this dependency is in place, we need a running database instance. For simplicity, we can use Docker to start ArangoDB locally:

docker run -e ARANGO_NO_AUTH=1 -d --name arangodb-instance -p 8529:8529 arangodb/arangodb

With the setup complete, we are now ready to design the model that will represent our movies. This model will allow us to explore the new search capabilities in action.

Creating the Model and Repository

The next step is to define the model that will represent our movies. Since a movie in this context is not meant to be updated once stored (we don’t “edit” a movie’s title or its release year), we can model it as an immutable type using a Java record.

import jakarta.nosql.Entity;
import jakarta.nosql.Id;
import jakarta.nosql.Column;

@Entity
public record Movie(@Id String id, @Column String title, @Column int releaseYear) {

    public static Movie of(String title, int releaseYear) {
        return new Movie(null, title, releaseYear);
    }
}
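Because records support compact constructors, simple invariants can be enforced at construction time, which suits an immutable entity. The variant below is a hypothetical extension, not part of the sample: `ValidatedMovie` drops the Jakarta NoSQL annotations so it compiles with plain Java, and it rejects missing or blank titles before anything reaches the database.

```java
import java.util.Objects;

// Hypothetical plain-Java variant of the Movie record (annotations omitted so it
// compiles without Jakarta NoSQL on the classpath).
public record ValidatedMovie(String id, String title, int releaseYear) {

    // Compact constructor: runs for every instantiation, including via the factory below.
    public ValidatedMovie {
        Objects.requireNonNull(title, "title must not be null");
        if (title.isBlank()) {
            throw new IllegalArgumentException("title must not be blank");
        }
    }

    // Mirrors Movie.of: the id is left null, leaving it to the persistence layer.
    public static ValidatedMovie of(String title, int releaseYear) {
        return new ValidatedMovie(null, title, releaseYear);
    }

    public static void main(String[] args) {
        var movie = ValidatedMovie.of("Star Wars: A New Hope", 1977);
        System.out.println(movie); // ValidatedMovie[id=null, title=Star Wars: A New Hope, releaseYear=1977]
    }
}
```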

This code uses Jakarta NoSQL annotations:

  • @Entity marks the class as a persistent entity to be managed by the database.
  • @Id identifies the primary key of the entity. Here it is a String; Movie.of leaves it null so that the database can assign it when the movie is saved.
  • @Column maps each property of the record to a column (or field) in the database document.

With the entity defined, the next step is to enable database communication. Jakarta Data makes this process simple: instead of writing custom data access layers, we can declare a repository interface and let the framework generate the implementation.

import jakarta.data.repository.CrudRepository;
import jakarta.data.repository.Repository;

import java.util.List;

@Repository
public interface MovieRepository extends CrudRepository<Movie, String> {

    List<Movie> findByTitleContains(String title);

    List<Movie> findByTitleStartsWith(String title);

    List<Movie> findByTitleEndsWith(String title);
}

  • CrudRepository provides basic operations such as save, delete, and find by ID.
  • findByTitleContains enables substring searches, matching any movie title that contains the given fragment.
  • findByTitleStartsWith finds all movies where the title begins with the specified prefix.
  • findByTitleEndsWith returns movies whose titles end with the given suffix.
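For unit tests that should not need a running ArangoDB instance, one option is a hand-written in-memory stand-in that mimics the three derived queries. The sketch below is hypothetical: `InMemoryMovies` is not part of Jakarta Data or Eclipse JNoSQL, the Movie record is redeclared without annotations, ids are not generated, and matching is case-sensitive, which may not reflect every database's behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical in-memory stand-in for MovieRepository's three derived queries.
public class InMemoryMovies {

    // Plain record mirroring the article's Movie entity, without annotations.
    public record Movie(String id, String title, int releaseYear) { }

    private final List<Movie> movies = new ArrayList<>();

    // Stores the movie as-is; unlike a real repository, no id is generated.
    public Movie save(Movie movie) {
        movies.add(movie);
        return movie;
    }

    public List<Movie> findByTitleContains(String title) {
        return filter(m -> m.title().contains(title));
    }

    public List<Movie> findByTitleStartsWith(String title) {
        return filter(m -> m.title().startsWith(title));
    }

    public List<Movie> findByTitleEndsWith(String title) {
        return filter(m -> m.title().endsWith(title));
    }

    private List<Movie> filter(Predicate<Movie> predicate) {
        return movies.stream().filter(predicate).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        var repo = new InMemoryMovies();
        repo.save(new Movie("1", "Star Wars: Return of the Jedi", 1983));
        repo.save(new Movie("2", "Avengers: Endgame", 2019));
        System.out.println(repo.findByTitleEndsWith("Jedi"));
    }
}
```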

These methods illustrate the new string search operators: contains, startsWith, and endsWith. They are part of the minimal expression set defined in Jakarta Query, making queries more consistent across databases.
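Jakarta Data derives each query from the method name: the part after `findBy` names the property, and a suffix such as `Contains` selects the operator. The following is a deliberately simplified, hypothetical sketch of that idea for single-property methods; `MethodNameParser` and `Condition` are illustrative names, and this is not how Eclipse JNoSQL's parser is implemented.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified illustration of deriving a condition from a method name.
// Real Jakarta Data providers support many more keywords (And, Or, OrderBy, ...).
public class MethodNameParser {

    // A single derived condition: which property, compared with which operator.
    record Condition(String property, String operator) { }

    // Operator suffixes recognized by this sketch; anything else is treated as equality.
    private static final List<String> OPERATORS = List.of("Contains", "StartsWith", "EndsWith");

    static Condition parse(String methodName) {
        String prefix = "findBy";
        if (!methodName.startsWith(prefix) || methodName.length() == prefix.length()) {
            throw new IllegalArgumentException("Unsupported method: " + methodName);
        }
        String rest = methodName.substring(prefix.length());
        for (String operator : OPERATORS) {
            // Require a non-empty property name before the operator suffix.
            if (rest.endsWith(operator) && rest.length() > operator.length()) {
                String property = rest.substring(0, rest.length() - operator.length());
                return new Condition(decapitalize(property), operator);
            }
        }
        return new Condition(decapitalize(rest), "Equals");
    }

    // "Title" -> "title", matching the record component name.
    private static String decapitalize(String name) {
        return name.substring(0, 1).toLowerCase(Locale.ROOT) + name.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(parse("findByTitleContains"));   // Condition[property=title, operator=Contains]
        System.out.println(parse("findByTitleStartsWith")); // Condition[property=title, operator=StartsWith]
        System.out.println(parse("findByReleaseYear"));     // Condition[property=releaseYear, operator=Equals]
    }
}
```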

With both the model and repository in place, we are ready to bring everything together. In the next section, we will execute the application, insert sample data, and demonstrate how these new search capabilities work in practice.

Executing the Application

With the model and repository defined, the final step is to execute the application and observe how the search expressions behave in practice. The following code sets up a simple example using CDI to bootstrap the container, inserts a few well-known movie titles, performs searches with the new operators, and prints the results:

import jakarta.enterprise.inject.se.SeContainer;
import jakarta.enterprise.inject.se.SeContainerInitializer;
import org.eclipse.jnosql.mapping.DatabaseQualifier;

import java.util.List;

public class App6 {

    public static void main(String[] args) {

        try (SeContainer container = SeContainerInitializer.newInstance().initialize()) {

            // Look up the repository implementation generated for the document database
            var repository = container
                    .select(MovieRepository.class, DatabaseQualifier.ofDocument())
                    .get();

            // Insert the sample movies; keep the returned instances for cleanup
            var avengers1 = repository.save(Movie.of("The Avengers", 2012));
            var avengers2 = repository.save(Movie.of("Avengers: Age of Ultron", 2015));
            var avengers3 = repository.save(Movie.of("Avengers: Infinity War", 2018));
            var avengers4 = repository.save(Movie.of("Avengers: Endgame", 2019));

            var starWars1 = repository.save(Movie.of("Star Wars: A New Hope", 1977));
            var starWars2 = repository.save(Movie.of("Star Wars: The Empire Strikes Back", 1980));
            var starWars3 = repository.save(Movie.of("Star Wars: Return of the Jedi", 1983));
            var starWars4 = repository.save(Movie.of("Star Wars: The Phantom Menace", 1999));
            var starWars5 = repository.save(Movie.of("Star Wars: Attack of the Clones", 2002));
            var starWars6 = repository.save(Movie.of("Star Wars: Revenge of the Sith", 2005));
            var starWars7 = repository.save(Movie.of("Star Wars: The Force Awakens", 2015));
            var starWars8 = repository.save(Movie.of("Star Wars: The Last Jedi", 2017));
            var starWars9 = repository.save(Movie.of("Star Wars: The Rise of Skywalker", 2019));

            // Run the three partial-match queries
            List<Movie> warMovies = repository.findByTitleContains("War");
            List<Movie> starMovies = repository.findByTitleStartsWith("Star");
            List<Movie> jediMovies = repository.findByTitleEndsWith("Jedi");

            System.out.println("War Movies: " + warMovies);
            System.out.println("Star Movies: " + starMovies);
            System.out.println("Jedi Movies: " + jediMovies);

            // Remove the sample data so the example can be rerun from a clean state
            repository.deleteAll(List.of(
                avengers1, avengers2, avengers3, avengers4,
                starWars1, starWars2, starWars3, starWars4,
                starWars5, starWars6, starWars7, starWars8, starWars9));
        }
    }

    private App6() {
    }
}

This simple program highlights the three new search operators in action:

  • findByTitleContains("War") retrieves all movies with War anywhere in the title (Avengers: Infinity War, plus every Star Wars title, since “Wars” contains “War”).
  • findByTitleStartsWith("Star") returns every movie title beginning with Star (the nine Star Wars titles).
  • findByTitleEndsWith("Jedi") locates movies that finish with Jedi (Return of the Jedi, The Last Jedi).
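The expected matches can be checked without a database by applying the equivalent java.lang.String methods to the same thirteen titles. This sanity check assumes the database compares case-sensitively, as Java does; a real database may return results in a different order and, depending on the database and its configuration, may handle case differently.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Sanity check of the expected matches using plain java.lang.String semantics
// (case-sensitive); no database involved.
public class ExpectedMatches {

    static final List<String> TITLES = List.of(
            "The Avengers", "Avengers: Age of Ultron", "Avengers: Infinity War", "Avengers: Endgame",
            "Star Wars: A New Hope", "Star Wars: The Empire Strikes Back", "Star Wars: Return of the Jedi",
            "Star Wars: The Phantom Menace", "Star Wars: Attack of the Clones", "Star Wars: Revenge of the Sith",
            "Star Wars: The Force Awakens", "Star Wars: The Last Jedi", "Star Wars: The Rise of Skywalker");

    // Returns the titles that satisfy the given predicate, in declaration order.
    static List<String> filter(Predicate<String> predicate) {
        return TITLES.stream().filter(predicate).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(filter(t -> t.contains("War")).size());    // 10
        System.out.println(filter(t -> t.startsWith("Star")).size()); // 9
        System.out.println(filter(t -> t.endsWith("Jedi")));          // [Star Wars: Return of the Jedi, Star Wars: The Last Jedi]
    }
}
```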

Running the application shows how these minimal expressions allow us to query data in a way that reflects real user behavior — remembering only part of a title rather than its entirety. The example also demonstrates how Jakarta Data and Eclipse JNoSQL simplify persistence logic by reducing it to declarative repository methods.

With this, we’ve seen how to set up the environment, define the model, create the repository, and execute queries using the new expressions. In the final section, we will step back to reflect on when it makes sense to rely on these database capabilities and when to consider specialized search engines.

Conclusion

In this article, we explored the challenge of searching strings in databases, where users often remember only fragments of information. We saw how operations like contains, starts with, and ends with can make applications more intuitive, and we demonstrated these capabilities with a simple movie catalog example using Java and Eclipse JNoSQL.

The difficulty with these expressions is the lack of consistency across databases. While SQL relies on LIKE, its behavior is not always uniform (case sensitivity, for example, depends on the database and its collation), and in the NoSQL world each system defines its own syntax: MongoDB with $regex, ArangoDB with AQL functions such as LIKE() and STARTS_WITH(), and so on. This fragmentation makes applications harder to maintain and reduces portability.
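In SQL, all three operators typically reduce to LIKE patterns, which is where subtle differences creep in: a user fragment such as 50% or file_name contains wildcard characters that must be escaped to be matched literally. The helper below is a hypothetical sketch (`LikePatterns` is an illustrative name) that builds such patterns, assuming the query declares a backslash as its escape character with ESCAPE '\'; escape handling and defaults vary by database.

```java
public class LikePatterns {

    enum Mode { CONTAINS, STARTS_WITH, ENDS_WITH }

    // Escapes LIKE metacharacters so the fragment is matched literally.
    // Assumes the query declares ESCAPE '\' (backslash) as the escape character.
    static String escape(String fragment) {
        StringBuilder sb = new StringBuilder(fragment.length() + 8);
        for (char c : fragment.toCharArray()) {
            if (c == '\\' || c == '%' || c == '_') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Wraps the escaped fragment with '%' wildcards according to the operator.
    static String pattern(String fragment, Mode mode) {
        String escaped = escape(fragment);
        return switch (mode) {
            case CONTAINS -> "%" + escaped + "%";
            case STARTS_WITH -> escaped + "%";
            case ENDS_WITH -> "%" + escaped;
        };
    }

    public static void main(String[] args) {
        // Intended as the bound parameter of a query such as:
        //   SELECT * FROM movie WHERE title LIKE ? ESCAPE '\'
        System.out.println(pattern("War", Mode.CONTAINS));        // %War%
        System.out.println(pattern("50%", Mode.STARTS_WITH));     // 50\%%
        System.out.println(pattern("file_name", Mode.ENDS_WITH)); // %file\_name
    }
}
```

Passing the pattern as a bound parameter rather than concatenating it into the SQL string also keeps the user fragment out of the query text itself.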

To solve this, Jakarta EE 12 introduces Jakarta Query, an effort to establish a common query language across databases, starting with a minimal set of string expressions (discussion here). The new release of Eclipse JNoSQL 1.1.10 aligns with this vision by supporting these operators, enhancing performance, and updating drivers. This allows developers to build portable, efficient applications that start simple with database-native search and can scale toward dedicated engines like Elasticsearch only when truly needed.
