Lexicographic order
In mathematics, the lexicographic order (also known as lexical order or alphabetical order) is a generalization of the alphabetical ordering of words, which is based on the ordering of their letters, starting from the first.
Definition
Given two partially ordered sets A and B, the lexicographic order on the Cartesian product A × B is defined as
- (a,b) ≤ (a′,b′) if and only if a < a′ or (a = a′ and b ≤ b′).
The result is a partial order. If A and B are totally ordered, then the result is a total order as well.
More generally, one can define the lexicographic order on the Cartesian product of n ordered sets, on the Cartesian product of a countably infinite family of ordered sets, and on the union of such sets.
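As a minimal sketch of the definition for pairs, the helper below (the name `lex_le` is illustrative, not from the source) transcribes the rule directly. It assumes the components are ordinary Python values that support `<`, `==` and `<=`. Python's built-in tuple comparison is itself lexicographic, so the two agree.

```python
def lex_le(p, q):
    """Return True if pair p = (a, b) is <= pair q = (a2, b2) lexicographically."""
    (a, b), (a2, b2) = p, q
    # (a, b) <= (a2, b2)  iff  a < a2, or a == a2 and b <= b2
    return a < a2 or (a == a2 and b <= b2)

pairs = [(2, 1), (1, 3), (1, 2), (2, 0)]

# Built-in tuple comparison uses the same rule, so sorted() orders the
# pairs by first component and breaks ties by the second.
print(sorted(pairs))  # [(1, 2), (1, 3), (2, 0), (2, 1)]
```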
Motivation and uses
The name of the lexicographic order comes from its generalizing the order given to words in a dictionary: a sequence of letters (that is, a word)
- a_1 a_2 ... a_k
appears in a dictionary before a sequence
- b_1 b_2 ... b_k
if and only if, at the first position i where a_i differs from b_i, the letter a_i comes before b_i in the alphabet.
That comparison assumes both sequences are the same length. To ensure they are the same length, the shorter sequence is usually padded at the end with enough "blanks" (a special symbol that is treated as coming before any other symbol). This also allows ordering of phrases. For the purpose of dictionaries, etc., padding with blank spaces is always done. See alphabetical order.
For example, the word "Thomas" appears before "Thompson" in dictionaries because the letter 'a' comes before the letter 'p' in the alphabet. The 5th letter is the first that is different in the two words; the first 4 letters are "Thom" in both. Because it is the first difference, the 5th letter is the most significant difference (for an alphabetical ordering).
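The padded comparison described above can be sketched as follows. The function name `dictionary_less` and the choice of the empty string as the "blank" are assumptions made for illustration; the empty string works because Python treats it as smaller than any non-empty string.

```python
BLANK = ""  # hypothetical padding symbol: compares below every real letter

def dictionary_less(u, v):
    """Return True if word u comes strictly before word v in dictionary order."""
    n = max(len(u), len(v))
    # Pad the shorter word at the end so both have length n.
    pu = list(u) + [BLANK] * (n - len(u))
    pv = list(v) + [BLANK] * (n - len(v))
    for x, y in zip(pu, pv):
        if x != y:
            # The first differing position decides the comparison.
            return x < y
    return False  # the words are identical

print(dictionary_less("Thomas", "Thompson"))  # True: 'a' < 'p' at the 5th letter
print(dictionary_less("Thom", "Thomas"))      # True: blank < 'a'
```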
A lexicographical ordering may not coincide with conventional alphabetical ordering. For example, the numerical order of Unicode codepoints does not always correspond to traditional alphabetic orderings of the characters, which vary from language to language. So the lexicographic ordering induced by codepoint value sorts strings in an unambiguous canonical order, but it does not necessarily "alphabetize" them in the conventional sense.
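A small illustration of this mismatch: Python compares strings by code point, so every uppercase ASCII letter sorts before every lowercase one. The case-folded sort shown for contrast is only a rough stand-in for conventional alphabetization, which in general requires language-specific collation rules.

```python
words = ["apple", "Zebra", "banana"]

# Code point order: 'Z' (U+005A) precedes 'a' (U+0061).
by_codepoint = sorted(words)

# Rough approximation of dictionary order: ignore case when comparing.
by_folded = sorted(words, key=str.casefold)

print(by_codepoint)  # ['Zebra', 'apple', 'banana']
print(by_folded)     # ['apple', 'banana', 'Zebra']
```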
An important property of the lexicographical order is that it preserves well-orders, that is, if A and B are well-ordered sets, then the product set A × B with the lexicographical order is also well-ordered.
An important exploitation of lexicographical ordering is expressed in the ISO 8601 date formatting scheme, which expresses a date as YYYY-MM-DD. This date ordering lends itself to straightforward computerized sorting of dates such that the sorting algorithm does not need to treat the numeric parts of the date string any differently from a string of non-numeric characters, and the dates will be sorted into chronological order. Note, however, that for this to work, there must always be four digits for the year, two for the month, and two for the day, so for example single-digit days must be padded with a zero yielding '01', '02', ..., '09'.
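The following sketch checks this on a few made-up dates: sorting zero-padded YYYY-MM-DD strings as plain text gives the same result as sorting by the parsed dates, while the unpadded variants sort out of chronological order.

```python
from datetime import date

padded = ["2021-10-05", "2021-02-11", "1999-12-31", "2021-02-09"]

# Plain string sort versus sort by the actual calendar date.
as_text = sorted(padded)
as_dates = sorted(padded, key=date.fromisoformat)
print(as_text == as_dates)  # True

# Without zero padding the string order is no longer chronological:
# "2021-10-5" sorts first because '1' < '2' in the month field.
unpadded = ["2021-2-9", "2021-10-5", "2021-2-11"]
print(sorted(unpadded))  # ['2021-10-5', '2021-2-11', '2021-2-9']
```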
Digit strings of equal length behave the same way: for three-digit numbers such as 101, 102, 103, ..., 109, 110, 111, 112, ..., 200, 201, 202, the lexicographic order of the strings coincides with the numerical order.
Another generalization of lexical ordering occurs in social choice theory (the theory of elections). Consider an election in which there are 4 candidates A, B, C and D, each voter expresses a top-to-bottom ordering of the candidates, and the voters' orderings are as follows:
18% | 17% | 33% | 32% |
---|---|---|---|
A | B | C | D |
B | A | D | B |
C | C | A | A |
D | D | B | C |
The MinMax voting method is a simple Condorcet method that counts the votes as in a round-robin tournament (all possible pairings of candidates) and judges each candidate according to its largest "pairwise" defeat. The winner is the candidate whose largest defeat is the smallest. In the example:
- The largest defeat of A is by D: 65% (33%+32%) rank D over A.
- The largest defeat of B is by D: 65% (33%+32%) rank D over B.
- The largest defeat of C is by A (or B): 67% (18%+17%+32%) rank A over C (and B over C).
- The largest defeat of D is by C: 68% (18%+17%+33%) rank C over D.
MinMax declares a tie between A and B since the largest defeats for both are the same size, 65%. This is like saying "Thomas" and "Thompson" should be at the same position because they have the same first letter. However, if the defeats are compared lexically, we have the MinLexMax method. With MinLexMax, because the largest defeats of A and B are the same size, their next largest defeats are then compared:
- A's next largest defeat is 0%. (This is a padding, since A has only one defeat.)
- B's next largest defeat is by A: 51% (18%+33%) rank A over B.
Since B's next largest defeat is larger than A's, MinLexMax elects A, which makes more sense than the MinMax tie since a majority rank A over B.
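The MinMax and MinLexMax computations above can be reproduced with a short sketch. The data structure and function names (`ballots`, `support`, `defeats`) are illustrative assumptions. Python compares lists lexicographically and treats a list that is a prefix of another as smaller, which here plays the role of padding with 0%.

```python
# Voter blocs from the table: (share in percent, ranking from top to bottom).
ballots = [(18, "ABCD"), (17, "BACD"), (33, "CDAB"), (32, "DBAC")]
candidates = "ABCD"

def support(x, y):
    """Percentage of voters who rank x above y."""
    return sum(share for share, ranking in ballots
               if ranking.index(x) < ranking.index(y))

def defeats(x):
    """Sizes of the pairwise defeats of x, largest first."""
    sizes = [support(y, x) for y in candidates
             if y != x and support(y, x) > support(x, y)]
    return sorted(sizes, reverse=True)

# MinMax looks only at the largest defeat, so A and B tie at 65.
minmax_scores = {c: max(defeats(c), default=0) for c in candidates}
print(minmax_scores)  # {'A': 65, 'B': 65, 'C': 67, 'D': 68}

# MinLexMax compares the whole defeat lists lexicographically:
# [65] for A is smaller than [65, 51] for B.
winner = min(candidates, key=defeats)
print(winner)  # A
```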
Another usage in social choice theory is the Ranked Pairs voting method. Although usually defined by a procedure that constructs the order of finish, Ranked Pairs is equivalent to finding which of all possible orders of finish is best according to a minlexmax comparison of the majorities they reverse. In the example above, the Ranked Pairs order of finish is ABCD (which elects A). ABCD affirms the majorities who rank A over B, A over C, B over C and C over D, and reverses the majorities who rank D over A and D over B. The largest majority that ABCD reverses is 65%. The only other ordering that wouldn't reverse a larger majority is BACD (which also reverses 65%). ABCD is a better order of finish than BACD because the lexically relevant set of majorities—the majorities on which ABCD and BACD disagree—is {A over B} and BACD reverses the largest majority in this set.
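A brute-force sketch of this characterization for the example: every possible order of finish is scored by the list of majorities it reverses, largest first, and the lists are compared lexicographically. This is not the usual Ranked Pairs procedure, and comparing the full lists rather than only the majorities on which two orders disagree is a simplification that gives the same answer here. All names are illustrative.

```python
from itertools import combinations, permutations

ballots = [(18, "ABCD"), (17, "BACD"), (33, "CDAB"), (32, "DBAC")]

def support(x, y):
    """Percentage of voters who rank x above y."""
    return sum(share for share, ranking in ballots
               if ranking.index(x) < ranking.index(y))

# Map each pairwise majority (winner, loser) to its size in percent.
majorities = {}
for x, y in combinations("ABCD", 2):
    if support(x, y) > support(y, x):
        majorities[(x, y)] = support(x, y)
    elif support(y, x) > support(x, y):
        majorities[(y, x)] = support(y, x)

def reversed_majorities(order):
    """Sizes of the majorities that `order` contradicts, largest first."""
    sizes = [size for (winner, loser), size in majorities.items()
             if order.index(winner) > order.index(loser)]
    return sorted(sizes, reverse=True)

orders = ["".join(p) for p in permutations("ABCD")]
best = min(orders, key=reversed_majorities)

print(best)                         # ABCD
print(reversed_majorities("ABCD"))  # [65, 65]
print(reversed_majorities("BACD"))  # [65, 65, 51]
```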
Case of multiple products
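Suppose (A_1, A_2, ..., A_n) is an n-tuple of sets, with respective total orderings (<_1, <_2, ..., <_n). The dictionary ordering <^d of A_1 × A_2 × ... × A_n is then
- (a_1, a_2, ..., a_n) <^d (b_1, b_2, ..., b_n) ⟺ (∃ m, 1 ≤ m ≤ n)(∀ i < m)(a_i = b_i) ∧ (a_m <_m b_m).
That is, one of the terms satisfies a_m <_m b_m and all the preceding terms are equal.
Informally, a_1 represents the first letter, a_2 the second and so on when looking up a word in a dictionary, hence the name.
This could be more elegantly stated by recursively defining the ordering of any set C = A_j × A_{j+1} × ... × A_k, represented by <^d(C). This will satisfy
- a <^d(A_i) a′ ⟺ a <_i a′
- (a, b) <^d(A_i × B) (a′, b′) ⟺ a <^d(A_i) a′ ∨ (a = a′ ∧ b <^d(B) b′)
where b, b′ ∈ B = A_{i+1} × A_{i+2} × ... × A_k.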
To put it more simply, compare the first terms. If they are equal, compare the second terms – and so on. The relationship between the first corresponding terms that are not equal determines the relationship between the entire elements.
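The term-by-term rule can be written as a short three-way comparison. This is a sketch for equal-length sequences whose components support `<`; the name `lex_compare` is an assumption for illustration.

```python
def lex_compare(a, b):
    """Compare equal-length sequences; return -1, 0 or 1."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    for x, y in zip(a, b):
        if x != y:
            # The first pair of unequal terms decides the result.
            return -1 if x < y else 1
    return 0  # all terms are equal

print(lex_compare((1, 2, 3), (1, 2, 4)))  # -1
print(lex_compare((1, 5, 0), (1, 2, 9)))  # 1
print(lex_compare((7, 7), (7, 7)))        # 0
```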
Groups and vector spaces
If the component sets are ordered groups then the result is a non-Archimedean group, because e.g. n(0,1) < (1,0) for all n.
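A quick numerical illustration for the group Z × Z with componentwise addition and lexicographic order (a finite check, not a proof): no multiple of (0, 1) ever reaches (1, 0). The helper `scale` is hypothetical.

```python
def scale(n, v):
    """n-fold sum v + v + ... + v in the product group, computed componentwise."""
    return tuple(n * x for x in v)

# Python tuples compare lexicographically, so the first component dominates:
# (0, n) < (1, 0) however large n is.
print(all(scale(n, (0, 1)) < (1, 0) for n in range(1, 10_000)))  # True
```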
If the component sets are ordered vector spaces over R (in particular just R), then the result is also an ordered vector space.
Ordering of sequences of various lengths
Given a partially ordered set A, the above considerations allow one to define naturally a lexicographical partial order <^d over the free monoid A* formed by the set of all finite sequences of elements in A, with sequence concatenation as the monoid operation, as follows:
- u <^d v if
- u is a proper prefix of v, or
- u = wau′ and v = wbv′, where w is the longest common prefix of u and v, a and b are members of A such that a < b, and u′ and v′ are members of A*.
If < is a total order on A, then so is the lexicographic order <^d on A*. If A is a finite and totally ordered alphabet, A* is the set of all words over A, and we retrieve the notion of dictionary ordering used in lexicography that gave its name to the lexicographic orderings. However, in general this is not a well-order, even though it is on the alphabet A; for instance, if A = {a, b}, the language {a^n b | n ≥ 0} has no least element: ... <^d aab <^d ab <^d b. A well-order for strings, based on the lexicographical order, is the shortlex order.
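The descending chain can be observed directly, since Python's string comparison is lexicographic and 'a' precedes 'b'. The sketch below only checks the first few words of the language; each additional leading 'a' produces a strictly smaller word, so no word can be the least one.

```python
# The words b, ab, aab, aaab, ... of the language { a^n b | n >= 0 }.
words = ["a" * n + "b" for n in range(6)]
print(words)  # ['b', 'ab', 'aab', 'aaab', 'aaaab', 'aaaaab']

# Each word is strictly smaller than the one with one fewer leading 'a'.
print(all(words[n + 1] < words[n] for n in range(5)))  # True
```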
Similarly we can also compare a finite and an infinite string, or two infinite strings.
Comparing strings of different lengths can also be modeled as comparing strings of infinite length by right-padding finite strings with a special value that is less than any element of the alphabet.
This ordering is the ordering usually used to order character strings, including in dictionaries and indexes.
Quasi-lexicographic order
The quasi-lexicographic order on the free monoid A* over an ordered alphabet A orders strings firstly by length, so that the empty string comes first, and then within strings of fixed length n, by lexicographic order on A^n.[1]
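In Python this order can be sketched with a sort key that puts the length first and the string itself second. The function name `shortlex_key` is illustrative, and the alphabet is assumed to be ordered by code point.

```python
def shortlex_key(word):
    """Sort key: compare by length first, then lexicographically."""
    return (len(word), word)

words = ["b", "ab", "", "aab", "a", "ba", "aa"]

print(sorted(words, key=shortlex_key))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'aab']

# For contrast, the plain lexicographic order of the same words:
print(sorted(words))
# ['', 'a', 'aa', 'aab', 'ab', 'b', 'ba']
```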
Generalization
Consider the set of functions f from a well-ordered set X to a totally ordered set Y. For two such functions f and g, the order is determined by the values for the smallest x such that f(x) ≠ g(x).
If Y is also well-ordered and X is finite, then the resulting order is a well-order. As already shown above, if X is infinite this is in general not the case.
If X is infinite and Y has more than one element, then the resulting set Y^X is not a countable set, see also cardinal exponentiation.
Alternatively, consider the functions f from an inversely well-ordered X to a well-ordered Y with minimum 0, restricted to those that are non-zero at only a finite subset of X. The result is well-ordered. Correspondingly we can also consider a well-ordered X and apply lexicographical order where a higher x is a more significant position. This corresponds to exponentiation of ordinal numbers Y^X. If X and Y are countable then the resulting set is also countable.
Monomials
In algebra it is traditional to order the terms of a polynomial by ordering the monomials in the indeterminates. This is fundamental for obtaining a normal form. Such matters are typically left implicit in discussion between humans, but must be dealt with exactly in computer algebra. In practice one has an alphabet of indeterminates X, Y, ... and orders all monomials formed from them by a variant of lexicographical order. For example, if one decides to order the alphabet by
- X < Y < ...
and also to look at higher terms first, that means ordering
- ... < X^3 < X^2 < X
and also
- X < Y^k for all k.
There is some flexibility in ordering monomials, and this can be exploited in Gröbner basis theory.
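One way to realize the ordering in the example is sketched below, under two assumptions not spelled out in the text: a monomial X^i Y^j is represented by its exponent tuple (i, j), and the listed inequalities describe the order in which terms are written, highest first. Sorting the tuples in descending lexicographic order then writes the powers of X from highest to lowest, followed by the pure powers of Y.

```python
def show(exponents, names=("X", "Y")):
    """Format an exponent tuple such as (3, 0) as 'X^3'."""
    parts = [name if e == 1 else f"{name}^{e}"
             for name, e in zip(names, exponents) if e > 0]
    return "*".join(parts) or "1"

# Exponent tuples (i, j) standing for X^i * Y^j.
monomials = [(1, 0), (0, 5), (3, 0), (0, 1), (2, 0)]

# Descending lexicographic order on the exponent tuples.
listing = sorted(monomials, reverse=True)
print(", ".join(show(m) for m in listing))  # X^3, X^2, X, Y^5, Y
```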
Decimal fractions
For decimal fractions, read digit by digit from the decimal point, a < b holds in the numerical order if and only if it holds in the lexicographic order of the digit strings, provided that representations ending in a recurring 9, such as .399999..., are excluded from the set of strings representing numbers. With that restriction there is an order-preserving bijection between the strings and the numbers.
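A finite sketch of this correspondence follows. It assumes each number is given by the digits after the decimal point with no trailing zeros, which stands in for an infinite expansion padded with zeros. The helper `value` is illustrative and uses exact rational arithmetic to avoid floating-point rounding.

```python
from fractions import Fraction

def value(digits):
    """Exact value of 0.d1d2d3... for a finite string of decimal digits."""
    return Fraction(int(digits), 10 ** len(digits)) if digits else Fraction(0)

# Digits after the decimal point: .5, .25, .399, .4, .04, .123
digit_strings = ["5", "25", "399", "4", "04", "123"]

lexicographic = sorted(digit_strings)
numerical = sorted(digit_strings, key=value)

print(lexicographic)               # ['04', '123', '25', '399', '4', '5']
print(lexicographic == numerical)  # True
```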
Reverse lexicographic order
In a common variation of lexicographic order, one compares elements by reading from the right instead of from the left, i.e., the right-most component is the most significant, e.g. applied in a rhyming dictionary.
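For strings, this right-to-left comparison can be sketched by sorting on the reversed word, which groups words with the same ending as a rhyming dictionary would. The word list is made up for illustration.

```python
words = ["nation", "cat", "station", "bat"]

# Compare from the last letter backwards by using the reversed word as key.
rhyme_order = sorted(words, key=lambda w: w[::-1])

print(rhyme_order)  # ['nation', 'station', 'bat', 'cat']
```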
In the case of monomials one may sort the exponents downward, with the exponent of the first base variable as primary sort key, e.g.:
- .
Alternatively, sorting may be done by the sum of the exponents, downward.
See also
- Collation
- Colexicographical order
- Kleene–Brouwer order
- Lexicographic preferences
- Orders on the Cartesian product of totally ordered sets
- Lexicographic order on R^n
- Lexicographic order topology on the unit square
- Long line (topology)
- Product order
- Lyndon word
- Lexicographically minimal string rotation
References
- [1] Calude, Cristian (1994). Information and randomness. An algorithmic perspective. EATCS Monographs on Theoretical Computer Science. Springer-Verlag. p. 1. ISBN 3-540-57456-5. Zbl 0922.68073.