Author: Manuel Lemos
Updated on: 2023-05-19
Posted on: 2023-05-19
Package: PHP NGram Comparator
When people ask a software application a question, the software needs to recognize that the same question can be worded in different ways.
This package can parse sentences so that an application can determine whether one question is very similar to another that asks about the same problem.
In this way, the package can serve as a base for artificial intelligence applications that need to understand what humans are asking in specific languages.
About the PHP NGram Comparator Package
The basic purpose of this package is to compare strings to find their level of similarity.
In more detail, this is what it does:
It can take a string and parse it to get its shingles and n-gram words in an array.
The package can also compare the respective n-gram word arrays of two strings and return the level of similarity as a percentage.
It can also compare two strings and return the number of n-gram words that match.
The package can also take arrays of words from two phrases and generate arrays suitable for training language models.
N-grams are contiguous sequences of n items from a given sample of text.
Shingles are overlapping sequences of words.
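To make these definitions concrete, here is a minimal sketch of how word-level n-grams can be produced with a sliding window. The function name word_ngrams and the choices to lowercase the text and split on whitespace are assumptions made for illustration; this is not the package's own get_ngrams() implementation.

```php
<?php
// Hypothetical helper for illustration; not the package's get_ngrams() code.
function word_ngrams(string $text, int $n): array
{
    // Guard against window sizes that cannot produce any n-gram.
    if ($n < 1) {
        return [];
    }
    // Assumption: lowercase the text and split on runs of whitespace.
    $words = preg_split('/\s+/', strtolower(trim($text)), -1, PREG_SPLIT_NO_EMPTY);
    $ngrams = [];
    // Slide a window of $n words over the text, one word at a time,
    // so consecutive n-grams overlap by $n - 1 words.
    for ($i = 0; $i + $n <= count($words); $i++) {
        $ngrams[] = implode(' ', array_slice($words, $i, $n));
    }
    return $ngrams;
}

print_r(word_ngrams('how do I reset my password', 2));
// Yields: "how do", "do i", "i reset", "reset my", "my password"
```

Because each window starts one word after the previous one, neighbouring n-grams share words, which is the overlapping behaviour the shingle definition describes.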
The class includes the following methods:
- get_ngrams($text, $n): Takes a string of text and an integer n, splits the text into n-grams, and returns them as an array.
- compare_strings_ngram_pct($string1, $string2, $n): Takes two strings and an integer n, splits both strings into n-grams, and returns the percentage of n-grams that match.
- compare_strings_ngram_max_size($string1, $string2): Takes two strings, splits them into n-grams of varying lengths, and returns the size of the largest matching n-gram.
- get_shingles($text, $shingle_size): Takes a string of text and an integer shingle_size, splits the text into shingles of that size, and returns them as an array.
- train_ngram_model($tokenized_text=, $n=): Takes an array of tokenized text and an integer n. It loops through each sentence in the tokenized text, creates n-grams of length n, counts the frequency of each n-gram, and returns an array of n-gram counts.
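The comparison and counting steps described in the list above can be sketched roughly as follows. All function names here (sketch_tokenize, sketch_ngrams, sketch_similarity_pct, sketch_ngram_counts) are hypothetical, and the similarity formula is an assumption: shared distinct n-grams divided by all distinct n-grams of both strings. The package may define its percentage differently.

```php
<?php
// Illustrative helpers only; names and formulas are assumptions,
// not the package's actual code.

// Assumption: lowercase the text and split on runs of whitespace.
function sketch_tokenize(string $text): array
{
    return preg_split('/\s+/', strtolower(trim($text)), -1, PREG_SPLIT_NO_EMPTY);
}

// Build word n-grams from an array of words with a sliding window.
function sketch_ngrams(array $words, int $n): array
{
    $ngrams = [];
    if ($n < 1) {
        return $ngrams;
    }
    for ($i = 0; $i + $n <= count($words); $i++) {
        $ngrams[] = implode(' ', array_slice($words, $i, $n));
    }
    return $ngrams;
}

// Assumed definition: shared distinct n-grams / all distinct n-grams, times 100.
function sketch_similarity_pct(string $a, string $b, int $n): float
{
    $setA = array_unique(sketch_ngrams(sketch_tokenize($a), $n));
    $setB = array_unique(sketch_ngrams(sketch_tokenize($b), $n));
    $union = array_unique(array_merge($setA, $setB));
    if (count($union) === 0) {
        return 0.0; // Neither string produced any n-gram.
    }
    $shared = array_intersect($setA, $setB);
    return 100.0 * count($shared) / count($union);
}

// Count how often each n-gram occurs across a list of tokenized sentences.
function sketch_ngram_counts(array $sentences, int $n): array
{
    $counts = [];
    foreach ($sentences as $words) {
        foreach (sketch_ngrams($words, $n) as $gram) {
            $counts[$gram] = ($counts[$gram] ?? 0) + 1;
        }
    }
    return $counts;
}

// Two phrasings of the same question share 3 of their 7 distinct bigrams.
printf("%.1f%%\n", sketch_similarity_pct(
    'how do I reset my password',
    'how can I reset my password',
    2
));
```

Dividing by the union of distinct n-grams keeps the score symmetric, so the result does not depend on which string is passed first. A different denominator, such as the n-gram count of the first string only, would give different percentages.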
This package was considered notable for implementing its benefits in a way that is worth noticing.