WoWInterface

String strip (https://www.wowinterface.com/forums/showthread.php?t=51804)

Resike 01-30-15 07:23 PM

String strip
 
Calling this function on strings that contain the "ê" character breaks other string functions like string.len: they don't return anything.

Lua Code:
print(string.gsub("PLAYERNAME-Aggra(Português)", "%-[%a+()]+", ""), string.len(x))

Does anyone know a hack to make this work properly?

Choonstertwo 01-30-15 08:09 PM

You could use a slightly less restrictive pattern that matches everything after the first dash:
lua Code:
("PLAYERNAME-Aggra(Português)"):gsub("%-.+", "")

You could also use the WoW-specific strsplit function:
lua Code:
(strsplit("-", "PLAYERNAME-Aggra(Português)", 2))

Note the extra set of parentheses to discard all return values except the first.
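A minimal plain-Lua sketch of the multiple-return-value point, assuming standard Lua 5.1 or later outside WoW (where strsplit does not exist). before_first_dash is a hypothetical helper that mimics (strsplit("-", s, 2)) for this case; it is not a WoW API:

```lua
-- Assumption: plain Lua 5.1+, so WoW's strsplit is not available here.
local full = "PLAYERNAME-Aggra(Portugu\195\170s)" -- "Português" as UTF-8 bytes

-- string.gsub returns two values: the new string and the substitution count.
print(full:gsub("%-.+", ""))   -- two values: PLAYERNAME and 1

-- An extra pair of parentheses truncates the call to its first value.
print((full:gsub("%-.+", ""))) -- one value: PLAYERNAME

-- Hypothetical stand-in for (strsplit("-", s, 2)): everything before the first dash.
local function before_first_dash(s)
  return (s:match("^([^-]*)"))
end

print(before_first_dash(full)) -- PLAYERNAME
```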

Phanx 01-31-15 12:34 AM

@Choonster: There's nothing wrong with the gsub part. The problem is that Lua's string.len is not UTF8-aware. Blizzard does provide a strlenutf8 function that is UTF8-aware, however.

But the given example code should not work anyway, unless you've defined x somewhere else:
Code:

print(string.gsub("PLAYERNAME-Aggra(Português)", "%-[%a+()]+", ""), string.len(x))
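The byte-versus-character difference Phanx describes can be seen in plain Lua. Since strlenutf8 only exists inside WoW, this sketch uses a hypothetical utf8len helper that counts every byte that is not a UTF-8 continuation byte (128 to 191); it assumes the input is valid UTF-8:

```lua
-- Hypothetical helper (not WoW's strlenutf8): counts UTF-8 characters by
-- counting bytes that are NOT continuation bytes (128-191, i.e. 0x80-0xBF).
-- Assumes the input is valid UTF-8.
local function utf8len(s)
  local _, count = string.gsub(s, "[^\128-\191]", "")
  return count
end

local realm = "Portugu\195\170s" -- "Português": the ê is two bytes, 195 170
print(string.len(realm)) -- 10 (bytes)
print(utf8len(realm))    -- 9 (characters)
```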

Banknorris 01-31-15 01:08 AM

Quote:

Originally Posted by Phanx (Post 305864)
There's nothing wrong with the gsub part.

The result of the gsub part is
Code:

PLAYERNAMEês)
Looks broken to me.
%a does not account for ê, only e or E (and other letters of course).
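A small sketch of why the original pattern stops where it does, assuming plain Lua in the default C locale. Note also that inside [...] the + is a literal plus sign, not a repetition; only the + after the closing ] repeats the class:

```lua
local full = "PLAYERNAME-Aggra(Portugu\195\170s)" -- "Português" as UTF-8 bytes

-- "ê" is two bytes in UTF-8, and neither byte counts as a letter for %a.
print(("\195\170"):byte(1, -1)) -- 195 170
print(("\195\170"):find("%a"))  -- nil

-- The class [%a+()] means "an ASCII letter, '+', '(' or ')'", so the match
-- ends at the first byte of "ê" and the rest of the realm is left behind.
local stripped = full:gsub("%-[%a+()]+", "")
print(stripped) -- PLAYERNAMEês)
```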

Resike 01-31-15 03:38 AM

Quote:

Originally Posted by Banknorris (Post 305865)
The result of the gsub part is
Code:

PLAYERNAMEês)
Looks broken to me.
%a does not account for ê, only e or E (and other letters of course).

It strips the string properly for me, but I actually used the "%-[%a+é'()]+" pattern in my real function.

Resike 01-31-15 03:39 AM

Quote:

Originally Posted by Phanx (Post 305864)
@Choonster: There's nothing wrong with the gsub part. The problem is that Lua's string.len is not UTF8-aware. Blizzard does provide a strlenutf8 function that is UTF8-aware, however.

But the given example code should not work anyway, unless you've defined x somewhere else:
Code:

print(string.gsub("PLAYERNAME-Aggra(Português)", "%-[%a+()]+", ""), string.len(x))

Yeah my bad:

Lua Code:
print(string.gsub("PLAYERNAME-Aggra(Português)", "%-[%a+é'()]+", ""), string.len(string.gsub("PLAYERNAME-Aggra(Português)", "%-[%a+é'()]+", "")))

I think the string library should always be UTF-8-ready. The Blizzard strxxx ones aren't, though.

Resike 01-31-15 03:41 AM

Quote:

Originally Posted by Choonstertwo (Post 305858)
You could use a slightly less restrictive pattern that matches everything after the first dash:
lua Code:
("PLAYERNAME-Aggra(Português)"):gsub("%-.+", "")

You could also use the WoW-specific strsplit function:
lua Code:
(strsplit("-", "PLAYERNAME-Aggra(Português)", 2))

Note the extra set of parentheses to discard all return values except the first.

The reason I used the %a+ pattern is that sometimes I also have to handle strings like this:

"PetName-ServerName <OwnerName-ServerName>"

I haven't tried the built-in strsplit, but I don't think it would make a difference; the default Blizzard functions usually handle UTF-8 even worse.

Resike 01-31-15 03:47 AM

I managed to make it work with a pattern like this:

Lua Code:
print(string.gsub("PLAYERNAME-Aggra(Português)", "%-[%a+êé'()]+", ""), string.len(string.gsub("PLAYERNAME-Aggra(Português)", "%-[%a+êé'()]+", "")))

Returns: "PLAYERNAME", 10

The weird part is that this function strips properly too, but it breaks string.len:

Lua Code:
print(string.gsub("PLAYERNAME-Aggra(Português)", "%-[%a+é'()]+", ""), string.len(string.gsub("PLAYERNAME-Aggra(Português)", "%-[%a+é'()]+", "")))

Returns: "PLAYERNAME"

And I only added "é" to the pattern here to handle some French server names properly, like "Chants éternels".

Choonstertwo 01-31-15 04:41 AM

Quote:

Originally Posted by Resike (Post 305872)
The weird part is that this function strips properly too, but it breaks string.len:

Lua Code:
print(string.gsub("PLAYERNAME-Aggra(Português)", "%-[%a+é'()]+", ""), string.len(string.gsub("PLAYERNAME-Aggra(Português)", "%-[%a+é'()]+", "")))

Returns: "PLAYERNAME"

And I only added "é" to the pattern here to handle some French server names properly, like "Chants éternels".

This isn't actually stripping properly, it's just broken in such a way that it looks like it's working.

é and ê are both two-byte characters that share their first byte. When you use é in the pattern, you're actually using \195\169, and inside [...] those become two separate members of the set. Since Lua's string functions operate on bytes rather than UTF-8 characters, the first byte (195) matches the first byte of ê (\195\170) and leaves behind the second byte (170), which is invalid by itself. When WoW's print function encounters this invalid byte, it simply ignores anything after it.

This snippet escapes any bytes > 127 (the end of the ASCII-compatible section of UTF-8):
lua Code:
local stripped = string.gsub("PLAYERNAME-Aggra(Português)", "%-[%a+é'()]+", "")
local escaped = stripped:gsub(".", function(c)
    local b = c:byte()
    if b > 127 then return "\\" .. b end
end)
print(escaped, #stripped) -- Output: PLAYERNAME\170s) 13
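One possible workaround, sketched here as an alternative and not something proposed in the thread: every byte of a multi-byte UTF-8 character lies in the range 128 to 255, so adding that byte range to the class keeps accented letters whole without listing é, ê and friends one by one. strip_realm is a hypothetical helper name:

```lua
-- Sketch: treat any non-ASCII byte (128-255) as part of the realm name, so
-- multi-byte UTF-8 characters are never split in the middle.
local function strip_realm(name)
  return (name:gsub("%-[%a'()\128-\255]+", ""))
end

print(strip_realm("PLAYERNAME-Aggra(Portugu\195\170s)"))       -- PLAYERNAME
print(strip_realm("PetName-ServerName <OwnerName-ServerName>")) -- PetName <OwnerName>
print(strip_realm("PlayerName-Azjol-Nerub"))                    -- PlayerName
```

Realm names that contain a space (such as Chants éternels) are still cut at the first space, because the space is not in the class.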

Lombra 01-31-15 05:01 AM

Why not "%-[^ ]+"?

Resike 01-31-15 05:10 AM

Quote:

Originally Posted by Choonstertwo (Post 305873)
This isn't actually stripping properly, it's just broken in such a way that it looks like it's working.

é and ê are both two-byte characters that share their first byte. When you use é in the pattern, you're actually using \195\169. Since Lua's string functions operate on bytes rather than UTF-8 characters, the first byte (195) matches the first byte of ê (\195\170) and leaves behind the second byte (170), which is invalid by itself. When WoW's print function encounters this invalid byte, it simply ignores anything after it.

This snippet escapes any bytes > 127 (the end of the ASCII-compatible section of UTF-8):
lua Code:
local stripped = string.gsub("PLAYERNAME-Aggra(Português)", "%-[%a+é'()]+", "")
local escaped = stripped:gsub(".", function(c)
    local b = c:byte()
    if b > 127 then return "\\" .. b end
end)
print(escaped, #stripped) -- Output: PLAYERNAME\170s) 13

I guessed something like this was going on in the background. I guess there's nothing to do about it, since that's how gsub is supposed to work.

It's just a bad way to handle similar characters that share the same leading byte, at least in this case.

Resike 01-31-15 05:11 AM

Quote:

Originally Posted by Lombra (Post 305874)
Why not "%-[^ ]+"?

It would strip my trailing ">" too.

Lombra 01-31-15 06:11 AM

Quote:

Originally Posted by Resike (Post 305876)
It would strip my trailing ">" too.

So you need to process the following:
Code:

PlayerName-ServerName
PetName-ServerName <OwnerName-ServerName>

and the result in either case should be what exactly?

Banknorris 01-31-15 07:46 AM

Lua Code:
local function name_split(fullname)
    local name,server,ownername
    -- Pets look like "PetName-ServerName <OwnerName-ServerName>"
    local ownerstring = fullname:match(" <(.+)>$")
    if ownerstring then
        name,server = fullname:match("([^-]+)-?(.*) <")
        ownername = ownerstring:match("[^-]+") -- owner name without realm
    else
        -- Players look like "PlayerName-ServerName" (the realm may be absent)
        name,server = fullname:match("([^-]+)-?(.*)")
    end
    return name,server,ownername
end

Voyager 01-31-15 08:16 AM

Quote:

Originally Posted by Resike (Post 305876)
It would strip down my ending ">" too.

Try this one:
Code:

    local name = "PêtName-Aggra'mar(Português) <Ownêrname-Aggra'mar(Português)>"
    if strfind(name, " <") then
        name = gsub(name, "%-[^ ]+([ >])", "%1")
    else
        name = gsub(name, "%-[^ ]+", "")
    end

It returns "PêtName <Ownêrname>" for me.
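Voyager's snippet relies on WoW's global strfind and gsub aliases. A standalone sketch of the same two-branch logic for plain Lua 5.1 or later, recreating those aliases as locals; strip_realms is a hypothetical wrapper name:

```lua
-- Assumption: outside WoW, strfind and gsub are not globals, so they are
-- taken from the string library here (WoW provides them as global aliases).
local strfind, gsub = string.find, string.gsub

local function strip_realms(name)
  if strfind(name, " <") then
    -- Pet with owner: drop each "-Realm" but keep the " " or ">" after it.
    name = gsub(name, "%-[^ ]+([ >])", "%1")
  else
    -- Plain player name: drop the dash and everything up to a space.
    name = gsub(name, "%-[^ ]+", "")
  end
  return name
end

print(strip_realms("P\195\170tName-Aggra'mar(Portugu\195\170s) <Own\195\170rname-Aggra'mar(Portugu\195\170s)>"))
-- PêtName <Ownêrname>
```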

Resike 01-31-15 08:25 AM

Quote:

Originally Posted by Lombra (Post 305877)
So you need to process the following:
Code:

PlayerName-ServerName
PetName-ServerName <OwnerName-ServerName>

and the result in either case should be what exactly?

PlayerName
PetName <OwnerName>

There are four special server names that are most likely to interfere:

-Azjol-Nerub
-Aggra(Português)
-Blade's Edge
-Marécage de Zangar

Resike 01-31-15 08:30 AM

Quote:

Originally Posted by Voyager (Post 305882)
Try this one:
Code:

    local name = "PêtName-Aggra'mar(Português) <Ownêrname-Aggra'mar(Português)>"
    if strfind(name, " <") then
        name = gsub(name, "%-[^ ]+([ >])", "%1")
    else
        name = gsub(name, "%-[^ ]+", "")
    end

It returns "PêtName <Ownêrname>" for me.

I think this could work too; I'm just not sure whether it's faster than the single gsub call. I need to test it.

Voyager 01-31-15 09:48 AM

Quote:

Originally Posted by Resike (Post 305885)
I think this could work too; I'm just not sure whether it's faster than the single gsub call. I need to test it.

Seems like this could do the trick:
Code:

gsub(name, "%-[^ >]+", "")
Upvaluing gsub instead of using string.gsub or name:gsub is also faster.
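A sketch of the single-pattern version with gsub cached in a local ("upvalued"), run against the special server names Resike listed. strip_realms is a hypothetical wrapper, and no timing is measured here; caching only avoids looking up string.gsub (or the string metatable for name:gsub) on every call:

```lua
-- Cache string.gsub in a local so each call skips the global/table lookup.
-- (Inside WoW, gsub already exists as a global alias.)
local gsub = string.gsub

local function strip_realms(name)
  -- Parentheses keep only the new string, dropping gsub's count.
  return (gsub(name, "%-[^ >]+", ""))
end

print(strip_realms("PlayerName-Azjol-Nerub"))                    -- PlayerName
print(strip_realms("PlayerName-Aggra(Portugu\195\170s)"))        -- PlayerName
print(strip_realms("PetName-ServerName <OwnerName-ServerName>")) -- PetName <OwnerName>
print(strip_realms("PlayerName-Blade's Edge"))                   -- PlayerName Edge
```

The last call shows where this pattern falls short: if a realm name reaches the function with its spaces intact (Blade's Edge, Marécage de Zangar), everything after the first space is left behind.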

Banknorris 01-31-15 11:07 AM

Code:

gsub(name,"%-[^ >]+","")
Very nice! What is the purpose of the % in this pattern? I tried it without the % ("-[^ >]+") and it apparently still works.
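On the question about the %: in a Lua pattern, - is also the lazy "zero or more" quantifier, and %- is the documented way to match a literal dash. At the very start of a pattern there is nothing for - to repeat, so the stock Lua implementation happens to treat it literally; after a character it becomes a quantifier. A small sketch:

```lua
-- At the start of the pattern, the unescaped "-" happens to match literally.
print((("PetName-Realm"):gsub("-[^ >]+", "")))  -- PetName
print((("PetName-Realm"):gsub("%-[^ >]+", ""))) -- PetName

-- After a character, "-" is a quantifier: "t-R" means
-- "zero or more t (as few as possible), then R".
print((("Pet-Realm"):gsub("t-R", "")))  -- Pet-ealm
print((("Pet-Realm"):gsub("t%-R", ""))) -- Peealm
```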

Resike 01-31-15 11:26 AM

Quote:

Originally Posted by Voyager (Post 305888)
Seems like this could do the trick:
Code:

gsub(name, "%-[^ >]+", "")
Upvaluing gsub instead of using string.gsub or name:gsub is also faster.

Yeah, this looks nice; in the previous one, the extra strfind call was the downside.


All times are GMT -6.
