I'm having trouble getting my head around conditionals in regex (standard info). For example, here's a query for parsing opuses out of the titles of classical pieces. Ultimately I want to make an UPDATE out of it, but the SELECT suffices to see what's going on. The parts I'm interested in are in the first lines of the hairy, concatenated regex patterns. In reality, I don't think I want all of these groups, but it helps to see what's happening if they're there now.
The query's three columns are:
- ok: A working demonstration of a parsed title using a lookahead in the first group to capture everything before "Op", "BMW", or the like. Heaven knows how hairy this will grow when trying to work with many albums, and the options in the first line of reg.lookahead need to be written twice the way I have it now.
- notok: A conditional approach that fails in all cases. I'm trying to catch everything before the opus by seeing if the opus group participates in the overall match. I haven't been able to do this.
- worksortorder: A sorting field formed from the numerical parts of the opus and the number, if present. By that I mean that "Op. 1" will yield "0001-00" and "Op. 2 No. 1" is "0002-01".
Here's the query. It looks at everything in the active playlist.
WITH re AS (SELECT '^(.*(?=[Oo]p|BMW|bmw|KV?))?'
|| '(([Oo]p|BMW|bmw|KV?)\.? ?(?:[Pp]osth?\.? ?)?(\d+))?'
|| ' ?([Nn]o\.? (\d+))?(.*)$'
AS lookahead
, '^((?(2).*))' -- This is the broken part
|| '(([Oo]p|BMW|bmw|KV?)\.? ?(?:[Pp]osth?\.? ?)?(\d+))?'
|| ' ?([Nn]o\.? (\d+))?(.*)$'
AS conditional
)
SELECT regexp_replace(re.lookahead, '$1—$2—$3—$4—$5—$6—$7', title)
AS ok
, regexp_replace(re.conditional, '$1—$2—$3—$4—$5—$6—$7', title)
AS notok
, nullif(padl(regexp_replace(re.lookahead, '$4', title), 4, '0')
|| '-'
|| padl(regexp_replace(re.lookahead, '$6', title), 2, '0'), '0000-00')
AS worksortorder
FROM Playlist, re
WHERE playlist_index = active_playlist()
And here's some sample output. I've omitted the notok column for space considerations, but it fails in every row as the second and third rows of the ok column do here—lacking opuses altogether. Basically, nothing is found and the whole title goes into the final (.*) group. That is how I want such lines handled, but not every line! I've also edited the "No. 1" out of the first part of "Op. 9"'s title just to see what happens.
ok worksortorder
Nocturne in E Minor, —Op. Post. 72—Op—72—No. 1—1—: Andante 0072-01
——————Nocturne in C-sharp minor (1830): Lento con gran espressione
——————Nocturne in C minor (1837)
3 Nocturnes, —Op. 9—Op—9———in B-flat minor: Larghetto 0009-00
3 Nocturnes, —Op. 9—Op—9—No. 2—2— in E-flat major: Andante 0009-02
3 Nocturnes, —Op. 9—Op—9—No. 3—3— in B major: Allegretto 0009-03
3 Nocturnes, —Op. 15—Op—15—No. 1—1— in F major: Andante cantabile 0015-01
3 Nocturnes, —Op. 15—Op—15—No. 2—2— in F-sharp major: Larghetto 0015-02
3 Nocturnes, —Op. 15—Op—15—No. 3—3— in G minor: Lento 0015-03
2 Nocturnes, —Op. 27—Op—27—No. 1—1— in C-sharp minor: Larghetto 0027-01
2 Nocturnes, —Op. 27—Op—27—No. 2—2— in D-flat major: Lento sostenuto 0027-02
2 Nocturnes, —Op. 32—Op—32—No. 1—1— in B major: Andante sostenuto 0032-01
2 Nocturnes, —Op. 32—Op—32—No. 2—2— in A-flat major: Lento 0032-02
2 Nocturnes, —Op. 37—Op—37—No. 1—1— in G minor: Andante sostenuto 0037-01
2 Nocturnes, —Op. 37—Op—37—No. 2—2— in G major: Andantino 0037-02
2 Nocturnes, —Op. 48—Op—48—No. 1—1— in C minor: Lento 0048-01
2 Nocturnes, —Op. 48—Op—48—No. 2—2— in F-sharp minor: Andantino 0048-02
2 Nocturnes, —Op. 55—Op—55—No. 1—1— in F minor: Andante 0055-01
2 Nocturnes, —Op. 55—Op—55—No. 2—2— in E-flat major: Lento sostenuto 0055-02
2 Nocturnes, —Op. 62—Op—62—No. 1—1— in B major: Andante 0062-01
2 Nocturnes, —Op. 62—Op—62—No. 2—2— in E major: Lento 0062-02
So, any tips on how I can get that conditional approach to work? Barring that, are there any other good ways to accomplish this sort of thing? I imagine that better minds than mine have done something like this.