01-30-15, 07:23 PM   #1
Resike
A Pyroguard Emberseer
Join Date: Mar 2010
Posts: 1,290
String strip

Calling this function on strings that contain the "ê" character breaks other string functions like string.len: they don't return anything.

Lua Code:
print(string.gsub("PLAYERNAME-Aggra(Português)", "%-[%a+()]+", ""), string.len(x))

Does anyone know a hack to make this work properly?
01-30-15, 08:09 PM   #2
Choonstertwo
A Chromatic Dragonspawn
 
Join Date: Jan 2011
Posts: 194
You could use a slightly less restrictive pattern that matches everything after the first dash:
lua Code:
("PLAYERNAME-Aggra(Português)"):gsub("%-.+", "")

You could also use the WoW-specific strsplit function:
lua Code:
(strsplit("-", "PLAYERNAME-Aggra(Português)", 2))

Note the extra set of parentheses to discard all return values except the first.
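To see what the two suggestions return outside the game, here is a small sketch in plain Lua 5.1. Since strsplit is a WoW global, `first_field` below is a hypothetical stand-in that only reproduces the first return value of `(strsplit("-", s, 2))` using a plain (non-pattern) string.find; it is not Blizzard's implementation.

```lua
-- First suggestion: drop everything from the first dash onwards.
local full = "PLAYERNAME-Aggra(Português)"
local stripped = (full:gsub("%-.+", ""))  -- extra parentheses keep only the string, not the count

-- Hypothetical stand-in for (strsplit("-", s, 2)): return the text before the
-- first occurrence of `delim`, or the whole string if there is none.
local function first_field(delim, s)
  local pos = string.find(s, delim, 1, true)  -- plain find: `delim` is not a pattern
  if pos then
    return s:sub(1, pos - 1)
  end
  return s
end

print(stripped)                -- PLAYERNAME
print(first_field("-", full))  -- PLAYERNAME
```

Both cut at the ASCII dash and discard everything after it, so neither can leave half of a multibyte character behind.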
01-31-15, 12:34 AM   #3
Phanx
Cat.
 
Join Date: Mar 2006
Posts: 5,617
@Choonster: There's nothing wrong with the gsub part. The problem is that Lua's string.len is not UTF8-aware. Blizzard does provide a strlenutf8 function that is UTF8-aware, however.
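The byte-versus-character difference is easy to show in plain Lua. `utf8_len` here is a hypothetical helper, not Blizzard's strlenutf8; it assumes well-formed UTF-8 and counts every byte that is not a continuation byte (continuation bytes fall in the range 128-191).

```lua
-- Count UTF-8 characters by counting the bytes that start a character,
-- i.e. every byte outside the continuation range \128-\191.
-- Assumes the input is valid UTF-8.
local function utf8_len(s)
  local _, count = string.gsub(s, "[^\128-\191]", "")
  return count
end

local server = "Português"
print(string.len(server))  -- 10 bytes: ê is two bytes (\195\170)
print(utf8_len(server))    -- 9 characters
```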

But the given example code should not work anyway, unless you've defined x somewhere else:
Code:
print(string.gsub("PLAYERNAME-Aggra(Português)", "%-[%a+()]+", ""), string.len(x))
__________________
Retired author of too many addons.
Message me if you're interested in taking over one of my addons.
Don’t message me about addon bugs or programming questions.
01-31-15, 01:08 AM   #4
Banknorris
A Chromatic Dragonspawn
 
Join Date: Oct 2014
Posts: 153
Originally Posted by Phanx
There's nothing wrong with the gsub part.
The result of the gsub part is
Code:
PLAYERNAMEês)
Looks broken to me.
%a does not account for ê, only e or E (and other letters of course).
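This can be checked directly. The sketch runs the original pattern and also tests ê against %a; it assumes the default C locale, where %a only covers the ASCII letters.

```lua
local result, count = string.gsub("PLAYERNAME-Aggra(Português)", "%-[%a+()]+", "")
-- The match "-Aggra(Portugu" stops at the first byte of ê (\195),
-- because neither byte of ê is a letter to %a in the C locale.
print(result, count)     -- prints PLAYERNAMEês) and 1
print(("ê"):find("%a"))  -- nil
```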
01-31-15, 03:38 AM   #5
Resike
A Pyroguard Emberseer
Join Date: Mar 2010
Posts: 1,290
Originally Posted by Banknorris
The result of the gsub part is
Code:
PLAYERNAMEês)
Looks broken to me.
%a does not account for ê, only e or E (and other letters of course).
It strips the string properly for me, but I actually used the "%-[%a+é'()]+" pattern in my real function.

Last edited by Resike : 01-31-15 at 03:45 AM.
01-31-15, 03:39 AM   #6
Resike
A Pyroguard Emberseer
Join Date: Mar 2010
Posts: 1,290
Originally Posted by Phanx
@Choonster: There's nothing wrong with the gsub part. The problem is that Lua's string.len is not UTF8-aware. Blizzard does provide a strlenutf8 function that is UTF8-aware, however.

But the given example code should not work anyway, unless you've defined x somewhere else:
Code:
print(string.gsub("PLAYERNAME-Aggra(Português)", "%-[%a+()]+", ""), string.len(x))
Yeah, my bad:

Lua Code:
print(string.gsub("PLAYERNAME-Aggra(Português)", "%-[%a+é'()]+", ""), string.len(string.gsub("PLAYERNAME-Aggra(Português)", "%-[%a+é'()]+", "")))

I think the string library should always be UTF-8-aware. The Blizzard strxxx ones aren't, though.

Last edited by Resike : 01-31-15 at 03:54 AM.
01-31-15, 03:41 AM   #7
Resike
A Pyroguard Emberseer
Join Date: Mar 2010
Posts: 1,290
Originally Posted by Choonstertwo
You could use a slightly less restrictive pattern that matches everything after the first dash:
lua Code:
("PLAYERNAME-Aggra(Português)"):gsub("%-.+", "")

You could also use the WoW-specific strsplit function:
lua Code:
(strsplit("-", "PLAYERNAME-Aggra(Português)", 2))

Note the extra set of parentheses to discard all return values except the first.
The reason I used the %a+ pattern is that I sometimes also have to handle strings like this:

"PetName-ServerName <OwnerName-ServerName>"

I haven't tried the built-in strsplit, but I don't think it would make a difference; the default Blizzard functions usually handle UTF-8 even worse.
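A quick check of why the "everything after the first dash" pattern does not fit this second format: .+ runs to the end of the string, so the owner part disappears as well.

```lua
local pet = "PetName-ServerName <OwnerName-ServerName>"
-- "%-.+" starts at the first dash and greedily consumes the rest of the
-- string, including " <OwnerName-ServerName>".
local stripped = (pet:gsub("%-.+", ""))
print(stripped)  -- PetName
```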
01-31-15, 03:47 AM   #8
Resike
A Pyroguard Emberseer
Join Date: Mar 2010
Posts: 1,290
I managed to make it work with a pattern like this:

Lua Code:
print(string.gsub("PLAYERNAME-Aggra(Português)", "%-[%a+êé'()]+", ""), string.len(string.gsub("PLAYERNAME-Aggra(Português)", "%-[%a+êé'()]+", "")))

Returns: "PLAYERNAME", 10

The weird part is that this version also strips properly, but it breaks string.len:

Lua Code:
print(string.gsub("PLAYERNAME-Aggra(Português)", "%-[%a+é'()]+", ""), string.len(string.gsub("PLAYERNAME-Aggra(Português)", "%-[%a+é'()]+", "")))

Returns: "PLAYERNAME"

And I only added "é" to the pattern to properly handle some French server names like "Chants éternels".

Last edited by Resike : 01-31-15 at 03:58 AM.
01-31-15, 04:41 AM   #9
Choonstertwo
A Chromatic Dragonspawn
 
Join Date: Jan 2011
Posts: 194
Originally Posted by Resike
The weird part is that this version also strips properly, but it breaks string.len:

Lua Code:
print(string.gsub("PLAYERNAME-Aggra(Português)", "%-[%a+é'()]+", ""), string.len(string.gsub("PLAYERNAME-Aggra(Português)", "%-[%a+é'()]+", "")))

Returns: "PLAYERNAME"

And I only added "é" to the pattern to properly handle some French server names like "Chants éternels".
This isn't actually stripping properly, it's just broken in such a way that it looks like it's working.

é and ê are both two-byte characters that share their first byte. When you use é in the pattern, you're actually using \195\169. Since Lua's string functions operate on bytes rather than UTF-8 characters, the first byte (195) matches the first byte of ê (\195\170) and leaves behind the second byte (170), which is invalid by itself. When WoW's print function encounters this invalid byte, it simply ignores anything after it.
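The byte values above can be confirmed with string.byte. The sketch also shows that a character class such as [é] is just the set of bytes {195, 169}, which is why the lone 195 of ê gets eaten while 170 survives.

```lua
-- string.byte(s, 1, -1) returns every byte of s.
print(("é"):byte(1, -1))  -- 195 169
print(("ê"):byte(1, -1))  -- 195 170

-- "[é]" is a set containing the two bytes \195 and \169, not one character,
-- so only the shared leading byte of ê is removed.
local leftover = (("ê"):gsub("[é]", ""))
print(leftover:byte(1, -1))  -- 170
```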

This snippet escapes any bytes > 127 (the end of the ASCII-compatible section of UTF-8):
lua Code:
local stripped = string.gsub("PLAYERNAME-Aggra(Português)", "%-[%a+é'()]+", "")
local escaped = stripped:gsub(".", function(c) local b = c:byte() if b > 127 then return "\\" .. b end end)
print(escaped, #stripped) -- Output: PLAYERNAME\170s) 13
01-31-15, 05:01 AM   #10
Lombra
A Molten Giant
 
Join Date: Nov 2006
Posts: 554
Why not "%-[^ ]+"?
__________________
Grab your sword and fight the Horde!
01-31-15, 05:10 AM   #11
Resike
A Pyroguard Emberseer
Join Date: Mar 2010
Posts: 1,290
Originally Posted by Choonstertwo
This isn't actually stripping properly, it's just broken in such a way that it looks like it's working.

é and ê are both two-byte characters that share their first byte. When you use é in the pattern, you're actually using \195\169. Since Lua's string functions operate on bytes rather than UTF-8 characters, the first byte (195) matches the first byte of ê (\195\170) and leaves behind the second byte (170), which is invalid by itself. When WoW's print function encounters this invalid byte, it simply ignores anything after it.

This snippet escapes any bytes > 127 (the end of the ASCII-compatible section of UTF-8):
lua Code:
local stripped = string.gsub("PLAYERNAME-Aggra(Português)", "%-[%a+é'()]+", "")
local escaped = stripped:gsub(".", function(c) local b = c:byte() if b > 127 then return "\\" .. b end end)
print(escaped, #stripped) -- Output: PLAYERNAME\170s) 13
I suspected something like this was going on in the background. I guess there is nothing to do about it, since that's how gsub is supposed to work.

Listing accented characters in a set is just a bad way to handle characters that share the same leading byte, at least in this case.
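One way around the shared-leading-byte problem, offered as a sketch rather than something proposed in the thread, is to stop listing individual accented letters and add the whole byte range \128-\255 to the set instead. In UTF-8 every byte of a multibyte character falls in that range, so the class always consumes whole characters. `REALM_SUFFIX` and `strip_realm` are hypothetical names.

```lua
-- ASCII letters, apostrophe and parentheses, plus every non-ASCII byte.
-- Because all bytes of a UTF-8 multibyte character are >= 128, ê (\195\170)
-- and é (\195\169) are matched in full and no lone byte is left behind.
local REALM_SUFFIX = "%-[%a'()\128-\255]+"

local function strip_realm(name)
  return (name:gsub(REALM_SUFFIX, ""))
end

print(strip_realm("PLAYERNAME-Aggra(Português)"))                 -- PLAYERNAME
print(strip_realm("PetName-ServerName <OwnerName-ServerName>"))  -- PetName <OwnerName>
```

Like the original pattern, this set does not include spaces, so a realm name containing a space would only be stripped up to that space.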

Last edited by Resike : 01-31-15 at 05:14 AM.
01-31-15, 05:11 AM   #12
Resike
A Pyroguard Emberseer
Join Date: Mar 2010
Posts: 1,290
Originally Posted by Lombra
Why not "%-[^ ]+"?
It would strip my trailing ">" too.
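A quick check of the problem with "%-[^ ]+": it stops at spaces but not at ">".

```lua
local pet = "PetName-ServerName <OwnerName-ServerName>"
-- "[^ ]+" stops at the space after the first realm, but the second realm is
-- followed directly by ">", which is not a space, so ">" is removed too.
local stripped = (pet:gsub("%-[^ ]+", ""))
print(stripped)  -- PetName <OwnerName
```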
01-31-15, 06:11 AM   #13
Lombra
A Molten Giant
 
Join Date: Nov 2006
Posts: 554
Originally Posted by Resike
It would strip my trailing ">" too.
So you need to process the following:
Code:
PlayerName-ServerName
PetName-ServerName <OwnerName-ServerName>
and the result in either case should be what exactly?
01-31-15, 07:46 AM   #14
Banknorris
A Chromatic Dragonspawn
 
Join Date: Oct 2014
Posts: 153
Lua Code:
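local function name_split(fullname)
    local name,server,ownername
    -- Pet format "PetName-Server <OwnerName-Server>": grab the part inside <...>
    local ownerstring = fullname:match(" <(.+)>$")
    if ownerstring then
        -- Name and optional server before the " <"
        name,server = fullname:match("([^-]+)-?(.*) <")
        -- Owner name is everything before the owner's first dash
        ownername = ownerstring:match("[^-]+")
    else
        -- Plain "PlayerName-ServerName" (server is empty if there is no dash)
        name,server = fullname:match("([^-]+)-?(.*)")
    end
    return name,server,ownername
end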
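A few example calls show what `name_split` returns for the formats discussed in this thread. The function is repeated so the snippet runs on its own in plain Lua. Because [^-]+ only breaks on the dash, non-ASCII bytes pass through untouched.

```lua
local function name_split(fullname)
    local name, server, ownername
    local ownerstring = fullname:match(" <(.+)>$")
    if ownerstring then
        name, server = fullname:match("([^-]+)-?(.*) <")
        ownername = ownerstring:match("[^-]+")
    else
        name, server = fullname:match("([^-]+)-?(.*)")
    end
    return name, server, ownername
end

print(name_split("PlayerName-ServerName"))                      -- PlayerName  ServerName  nil
print(name_split("PetName-ServerName <OwnerName-ServerName>"))  -- PetName  ServerName  OwnerName
print(name_split("PLAYERNAME-Aggra(Português)"))                -- PLAYERNAME  Aggra(Português)  nil
```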

Last edited by Banknorris : 01-31-15 at 10:12 AM.
01-31-15, 08:16 AM   #15
Voyager
A Fallenroot Satyr
Join Date: Dec 2009
Posts: 22
Originally Posted by Resike
It would strip my trailing ">" too.
Try this one:
Code:
    local name = "PêtName-Aggra'mar(Português) <Ownêrname-Aggra'mar(Português)>"
    if strfind(name, " <") then
        name = gsub(name, "%-[^ ]+([ >])", "%1")
    else
        name = gsub(name, "%-[^ ]+", "")
    end
It returns "PêtName <Ownêrname>" for me.

Last edited by Voyager : 01-31-15 at 08:25 AM.
01-31-15, 08:25 AM   #16
Resike
A Pyroguard Emberseer
Join Date: Mar 2010
Posts: 1,290
Originally Posted by Lombra
So you need to process the following:
Code:
PlayerName-ServerName
PetName-ServerName <OwnerName-ServerName>
and the result in either case should be what exactly?
PlayerName
PetName <OwnerName>

There are four special server names that are most likely to interfere:

-Azjol-Nerub
-Aggra(Português)
-Blade's Edge
-Marécage de Zangar

Last edited by Resike : 01-31-15 at 08:31 AM.
01-31-15, 08:30 AM   #17
Resike
A Pyroguard Emberseer
Join Date: Mar 2010
Posts: 1,290
Originally Posted by Voyager
Try this one:
Code:
    local name = "PêtName-Aggra'mar(Português) <Ownêrname-Aggra'mar(Português)>"
    if strfind(name, " <") then
        name = gsub(name, "%-[^ ]+([ >])", "%1")
    else
        name = gsub(name, "%-[^ ]+", "")
    end
It returns "PêtName <Ownêrname>" for me.
I think this could work too; I'm just not sure whether it's faster than the single gsub call. I need to test it.
01-31-15, 09:48 AM   #18
Voyager
A Fallenroot Satyr
Join Date: Dec 2009
Posts: 22
Originally Posted by Resike
I think this could work too; I'm just not sure whether it's faster than the single gsub call. I need to test it.
Seems like this could do the trick:
Code:
gsub(name, "%-[^ >]+", "")
Upvaluing gsub instead of using string.gsub or name:gsub is also faster.
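The single-pattern version is easy to try in plain Lua. The sketch caches string.gsub in a file-level local, which is what "upvaluing" refers to: inside the function it is read as an upvalue instead of looking up the global `string` table and its `gsub` field on every call. `strip_realms` is a hypothetical name.

```lua
local gsub = string.gsub  -- cached once; read as an upvalue inside strip_realms

-- Remove every "-Realm" suffix. The set [^ >] stops at a space or at ">",
-- so both "Name-Realm" and "Pet-Realm <Owner-Realm>" are handled in one call.
local function strip_realms(name)
  return (gsub(name, "%-[^ >]+", ""))
end

print(strip_realms("PlayerName-Aggra(Português)"))                 -- PlayerName
print(strip_realms("PetName-ServerName <OwnerName-ServerName>"))  -- PetName <OwnerName>
print(strip_realms("PlayerName-Azjol-Nerub"))                      -- PlayerName
print(strip_realms("PlayerName-Blade's Edge"))                     -- PlayerName Edge
```

The last line shows one limitation: because the set stops at a space, a realm name containing a space is only stripped up to that space. The thread does not settle whether realm names in these strings actually contain spaces, so that case is worth checking in game.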
01-31-15, 11:07 AM   #19
Banknorris
A Chromatic Dragonspawn
 
Join Date: Oct 2014
Posts: 153
Code:
gsub(name,"%-[^ >]+","")
Very nice! What is the purpose of the % in this pattern? I tried without it "-[^ >]+" and it apparently still works.
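On the question of the %: in a Lua pattern, - acts as a quantifier (zero or more repetitions, as few as possible) when it follows a single-character class. At the very start of a pattern there is nothing for it to modify, and in practice the matcher takes it literally, so both forms behave the same here. The manual still lists - among the magic characters, so %- is the documented-safe spelling, and it matters as soon as the dash follows something else:

```lua
local name = "Name-Realm"

-- Leading dash: both forms match the literal text "-Realm".
print((name:gsub("%-[^ >]+", "")))  -- Name
print((name:gsub("-[^ >]+", "")))   -- Name

-- After a character, an unescaped "-" becomes the lazy quantifier:
-- "e-" means "zero or more e, as few as possible", which matches the empty
-- string, so nothing is removed. "e%-" matches the literal text "e-".
print((name:gsub("e-", "")))   -- Name-Realm
print((name:gsub("e%-", "")))  -- NamRealm
```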
01-31-15, 11:26 AM   #20
Resike
A Pyroguard Emberseer
Join Date: Mar 2010
Posts: 1,290
Originally Posted by Voyager
Seems like this could do the trick:
Code:
gsub(name, "%-[^ >]+", "")
Upvaluing gsub instead of using string.gsub or name:gsub is also faster.
Yeah, this looks nice; in the previous version the extra strfind call was the downside.

WoWInterface » Developer Discussions » Lua/XML Help » String strip