variable tipo sirve que para operaciones ejemplos definicion dato con caracteres cadenas cadena string sas string-comparison

string - tipo - Comparando cadenas con símbolos de diferentes alfabetos



tipo de dato string (2)

Me gustaría hacer eso:

Primero haría un mapa. Me refiero a qué letra en idioma ruso corresponde a qué letra en idioma inglés. Ejemplo:
б = b
v = v
...

Guardaría este mapa en una tabla separada o como macroVars. Luego crearía un macro loop, con la función tranwrd, que recorre el mapa, que fue creado.

El ejemplo aquí podría ser así.

data _null_; stringBefore = "без"; stringAfter = tranwrd(stringBefore,"а","a"); stringAfter = tranwrd(stringAfter,"б","b"); stringAfter = tranwrd(stringAfter,"в","v"); ... run;

Después de esta transformación, creo que puedes comparar tus cadenas.

Quiero comparar dos cadenas que contienen símbolos de diferentes alfabetos (por ejemplo, ruso e inglés). Quiero que los símbolos que se ven de manera similar se consideren iguales entre sí.

Por ejemplo, en la palabra "Mamá" la letra "o" es del alfabeto inglés (código 043E en Unicode), y en el mundo "Möm" la letra "о" es del alfabeto ruso (código 006F en Unicode). Entonces ("Mom" = "Mоm") => falso, pero quiero que sea cierto. ¿Hay alguna función SAS estándar o debería escribir una macro para hacerlo?

¡Gracias!


También codifiqué algunas funciones para tratar con erratas de diseño de teclado. Aquí está el código:

/***************************************************************************/ /* FUNCTION count_rus_letters RETURNS NUMBER OF CYRILLIC LETTERS IN STRING */ /***************************************************************************/ proc fcmp outlib=sasuser.userfuncs.mystring; FUNCTION count_rus_letters(string $); length letter $2; rus_count=0; len=klength(string); do i=1 to len; letter=ksubstr(string,i,1); if letter in ("А","а","Б","б","В","в","Г","г","Д","д","Е","е","Ё","ё","Ж","ж" "З","з","И","и","Й","й","К","к","Л","л","М","м","Н","н","О","о","П","п","Р","р", "С","с","Т","т","У","у","Ф","ф","Х","х","Ц","ц","Ч","ч","Ш","ш","Щ","щ","Ъ","ъ" "Ы","ы","Ь","ь","Э","э","Ю","ю","Я","я") then rus_count+1; end; return(rus_count); endsub; run; /**************************************************************************/ /* FUNCTION count_eng_letters RETURNS NUMBER OF ENGLISH LETTERS IN STRING */ /**************************************************************************/ proc fcmp outlib=sasuser.userfuncs.mystring; FUNCTION count_eng_letters(string $); length letter $2; eng_count=0; len=klength(string); do i=1 to len; letter=ksubstr(string,i,1); if rank(''A'') <= rank(letter) <=rank(''z'') then eng_count+1; end; return(eng_count); endsub; run; /**************************************************************************/ /* FUNCTION is_string_russian RETURNS 1 IF NUMBER OF RUSSIAN SYMBOLS IN */ /* STRING >= NUMBER OF ENGLISH SYMBOLS */ /**************************************************************************/ proc fcmp outlib=sasuser.userfuncs.mystring; FUNCTION is_string_russian(string $); length letter $2 result 8; eng_count=0; rus_count=0; len=klength(string); do i=1 to len; letter=ksubstr(string,i,1); if letter in ("А","а","Б","б","В","в","Г","г","Д","д","Е","е","Ё","ё","Ж","ж" "З","з","И","и","Й","й","К","к","Л","л","М","м","Н","н","О","о","П","п","Р","р", "С","с","Т","т","У","у","Ф","ф","Х","х","Ц","ц","Ч","ч","Ш","ш","Щ","щ","Ъ","ъ" "Ы","ы","Ь","ь","Э","э","Ю","ю","Я","я") then rus_count+1; if rank(''A'') <= rank(letter) <=rank(''z'') then eng_count+1; end; if rus_count>=eng_count then result=1; else result=0; return(result); endsub; run; /**************************************************************************/ /* FUNCTION fix_layout_misprints REPLACES MISPRINTED SYMBOLS BY ANALYSING */ /* LANGUAGE OF THE STRING (FOR ENGLISH STRING RUSSIAN SYMBOLS ARE */ /* REPLACED BY ENGLISH COPIES AND FOR RUSSIAN STRING SYMBOLS ARE */ /* REPLACED BY RUSSIAN COPIES) */ /**************************************************************************/ proc fcmp outlib=sasuser.userfuncs.mystring; FUNCTION fix_layout_misprints(string $) $ 1000; length letter $2 result $1000; eng_count=0; rus_count=0; len=klength(string); do i=1 to len; letter=ksubstr(string,i,1); if letter in ("А","а","Б","б","В","в","Г","г","Д","д","Е","е","Ё","ё","Ж","ж" "З","з","И","и","Й","й","К","к","Л","л","М","м","Н","н","О","о","П","п","Р","р", "С","с","Т","т","У","у","Ф","ф","Х","х","Ц","ц","Ч","ч","Ш","ш","Щ","щ","Ъ","ъ" "Ы","ы","Ь","ь","Э","э","Ю","ю","Я","я") then rus_count+1; if rank(''A'') <= rank(letter) <=rank(''z'') then eng_count+1; end; if rus_count>=eng_count then result=ktranslate(string,"АаВЕеКкМОоРрСсТХх","AaBEeKkMOoPpCcTXx"); else result=ktranslate(string,"AaBEeKkMOoPpCcTXx","АаВЕеКкМОоРрСсТХх"); return(result); endsub; run; /***********/ /* EXAMPLE */ /***********/ options cmplib=sasuser.userfuncs; data _null_; good_str="Иванов"; err_str="Ивaнов"; fixed_str=fix_layout_misprints(err_str); put "Good string=" good_str; put "Error string=" err_str; put "Fixed string=" fixed_str; rus_count_in_err=count_rus_letters(err_str); put "Count or Cyrillic symbols in error string=" rus_count_in_err; eng_count_in_err=count_eng_letters(err_str); put "Count or English symbols in error string=" eng_count_in_err; is_error_str_russian=is_string_russian(err_str); put "Is error string language Russian=" is_error_str_russian; if (good_str ne err_str) then put "Before clearing - strings are not equal to each other"; if (good_str = fixed_str) then put "After clearing - strings are equal to each other"; run;