Does anyone have a good Proper Case algorithm?


Does anyone have a trusted Proper Case or PCase algorithm (similar to a UCase or Upper)? I'm looking for something that takes a value such as "GEORGE BURDELL" or "george burdell" and turns it into "George Burdell".

I have a simple one that handles the simple cases. Ideally I'd like something that can handle things like "O'REILLY" and turn it into "O'Reilly", but I know that's harder.

I'm mainly focused on English, if that simplifies things.

UPDATE: I'm using C# as the language, but I can convert from almost anything (assuming the functionality exists).

I agree that the McDonald's scenario is hard. I meant to mention it along with my O'Reilly example, but didn't in the original post.

What programming language are you using? Many languages allow callback functions for regular-expression matches, which make it easy to customize the replacement. The regular expression is quite simple; you just need to match all word characters, like this:

/\w+/

Alternatively, you can capture the first character as a separate group:

/(\w)(\w*)/

Now you can access the first character and the remaining characters of the match separately. The callback function can simply return a concatenation of the two groups. In pseudo-Python (I don't actually know Python):

    def make_proper(match):
        # Upper-case the first character, lower-case the rest of the word
        return match.group(1).upper() + match.group(2).lower()

Incidentally, this would also handle the "O'Reilly" case, because "O" and "Reilly" would be matched separately and both capitalized. However, there are other special cases the algorithm doesn't handle well, for example "McDonald's" or, more generally, any word containing an apostrophe. For the latter the algorithm would produce "Mcdonald'S". You could add special handling for the apostrophe, but that would interfere with the first case. A perfect solution isn't possible. In practice, it might help to look at the length of the part after the apostrophe.
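The callback idea and the apostrophe-length heuristic can be sketched in real Python. This is only a sketch: the rule that a single letter after an apostrophe at the end of a word (like the s in McDonald's) stays lower case is an assumption based on the suggestion above, and it does nothing for the Mc prefix.

```python
import re

# A run of word characters, split into the first character and the rest.
WORD = re.compile(r"(\w)(\w*)")

def make_proper(match):
    """Upper-case the first character of a word, lower-case the rest."""
    return match.group(1).upper() + match.group(2).lower()

def proper_case(text: str) -> str:
    result = WORD.sub(make_proper, text)
    # Assumed heuristic: one letter after an apostrophe at the end of a word
    # is a suffix such as 's, not the start of a name part like O'Reilly.
    return re.sub(r"'(\w)\b", lambda m: "'" + m.group(1).lower(), result)
```

With this, "MCDONALD'S" becomes "Mcdonald's" and "o'reilly" becomes "O'Reilly"; the capital D in McDonald still needs separate handling.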

Here is a perhaps naive C# implementation:

    public class ProperCaseHelper
    {
        public string ToProperCase(string input)
        {
            string ret = string.Empty;
            var words = input.Split(' ');

            for (int i = 0; i < words.Length; ++i)
            {
                ret += wordToProperCase(words[i]);
                if (i < words.Length - 1) ret += " ";
            }

            return ret;
        }

        private string wordToProperCase(string word)
        {
            if (string.IsNullOrEmpty(word)) return word;

            // Standard case
            string ret = capitaliseFirstLetter(word);

            // Special cases:
            ret = properSuffix(ret, "'");
            ret = properSuffix(ret, ".");
            ret = properSuffix(ret, "Mc");
            ret = properSuffix(ret, "Mac");

            return ret;
        }

        private string properSuffix(string word, string prefix)
        {
            if (string.IsNullOrEmpty(word)) return word;

            string lowerWord = word.ToLower(), lowerPrefix = prefix.ToLower();
            if (!lowerWord.Contains(lowerPrefix)) return word;

            int index = lowerWord.IndexOf(lowerPrefix);

            // If the search string is at the end of the word ignore.
            if (index + prefix.Length == word.Length) return word;

            return word.Substring(0, index) + prefix +
                capitaliseFirstLetter(word.Substring(index + prefix.Length));
        }

        private string capitaliseFirstLetter(string word)
        {
            return char.ToUpper(word[0]) + word.Substring(1).ToLower();
        }
    }

You don't mention which language you want the solution in, so here's some pseudocode.

    Loop through each character
        If the previous character was an alphabet letter
            Make the character lower case
        Otherwise
            Make the character upper case
    End loop
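A direct Python translation of that pseudocode might look like this. It is my own sketch; "alphabet letter" is interpreted as `str.isalpha()`, which also accepts non-ASCII letters such as ë.

```python
def proper_case_by_previous_char(text: str) -> str:
    """Upper-case a character unless the previous character was a letter."""
    out = []
    prev_is_letter = False  # the first character is treated as a word start
    for ch in text:
        out.append(ch.lower() if prev_is_letter else ch.upper())
        prev_is_letter = ch.isalpha()
    return "".join(out)
```

Because an apostrophe is not a letter, this gives "O'Reilly" for free but also "Mcdonald'S", the same trade-off discussed in the regex answer.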

There's also this neat Perl script for title-casing text:

http://daringfireball.net/2008/08/title_case_update

But it seems that by proper case you mean... just for people's names.

A simple way to capitalize the first letter of each word (separated by a space):

    $words = explode(" ", $string);
    $result = "";
    for ($i = 0; $i < count($words); $i++) {
        $s = strtolower($words[$i]);
        // Upper-case only the first character of the word
        $s = substr_replace($s, strtoupper(substr($s, 0, 1)), 0, 1);
        $result .= "$s ";
    }
    $string = trim($result);

As for catching the "O'REILLY" example you gave: splitting the string on both spaces and ' wouldn't work, since it would capitalize any letter that appears after an apostrophe, i.e. the s in Fred's.

So I'd probably try something like:

    $words = explode(" ", $string);
    $result = "";
    for ($i = 0; $i < count($words); $i++) {
        $s = strtolower($words[$i]);
        if (substr($s, 0, 2) === "o'") {
            // Upper-case "o'" plus the next letter: o'reilly -> O'Reilly
            $s = substr_replace($s, strtoupper(substr($s, 0, 3)), 0, 3);
        } else {
            $s = substr_replace($s, strtoupper(substr($s, 0, 1)), 0, 1);
        }
        $result .= "$s ";
    }
    $string = trim($result);

This should catch O'Reilly, O'Clock, O'Donnell, etc. Hope it helps.

Note that this code has not been tested.
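For anyone who wants to try the same rule quickly, here is a Python transcription of that PHP logic. It is a sketch of the idea, not the poster's code, and it shares the limitation that only the literal "o'" prefix gets the extra capital.

```python
def proper_case_o_prefix(text: str) -> str:
    words = []
    for word in text.split(" "):
        s = word.lower()
        if s.startswith("o'"):
            # O'Reilly, O'Donnell: capitalize the O and the letter after the apostrophe
            s = s[:3].upper() + s[3:]
        else:
            # Everything else, including Fred's: capitalize only the first letter
            s = s[:1].upper() + s[1:]
        words.append(s)
    return " ".join(words)
```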

Kronoz, thanks. I found that in your function the line:

`if (!lowerWord.Contains(lowerPrefix)) return word;`

should read

`if (!lowerWord.StartsWith(lowerPrefix)) return word;`

so that "información" is not changed to "InforMacIón".

Best,

Enrique
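Here is a small Python sketch of the prefix rule with that fix applied (the function names are my own, not kronoz's). One caveat: if StartsWith is applied to every special case, the "'" and "." rules stop working too, because "O'Reilly" does not start with an apostrophe; so this sketch applies the start-of-word check only to the Mc/Mac prefixes and leaves the apostrophe rule out.

```python
def capitalise_first_letter(word: str) -> str:
    return word[:1].upper() + word[1:].lower()

def proper_prefix(word: str, prefix: str) -> str:
    """Capitalize the letter after `prefix` only when the word starts with it."""
    if not word.lower().startswith(prefix.lower()):
        return word  # "información" contains "mac" but does not start with it
    rest = word[len(prefix):]
    if not rest:
        return word  # the prefix is the whole word
    return prefix + capitalise_first_letter(rest)

def proper_name(word: str) -> str:
    ret = capitalise_first_letter(word)
    for prefix in ("Mc", "Mac"):
        ret = proper_prefix(ret, prefix)
    return ret
```

It still turns "Mackie" into "MacKie", which is why the NameCase port further down keeps an exception list.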

I use this as the TextChanged event handler for text boxes. It supports entering "McDonald": with allowCapital set to True, letters typed in upper case are kept as they are.

    Public Shared Function DoProperCaseConvert(ByVal str As String, Optional ByVal allowCapital As Boolean = True) As String
        Dim strCon As String = ""
        Dim wordbreak As String = " ,.1234567890;/\-()#$%^&*€!~+=@"
        Dim nextShouldBeCapital As Boolean = True

        'Improve to recognize all caps input
        'If str.Equals(str.ToUpper) Then
        '    str = str.ToLower
        'End If

        For Each s As Char In str.ToCharArray
            If allowCapital Then
                ' Keep the typed case unless this character starts a word
                strCon = strCon & If(nextShouldBeCapital, s.ToString.ToUpper, s)
            Else
                ' Char.ToLower is a Shared method, so it takes the character as an argument
                strCon = strCon & If(nextShouldBeCapital, s.ToString.ToUpper, Char.ToLower(s))
            End If

            ' A word-break character means the next character starts a new word
            If wordbreak.Contains(s.ToString) Then
                nextShouldBeCapital = True
            Else
                nextShouldBeCapital = False
            End If
        Next

        Return strCon
    End Function
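The same word-break idea in Python, as a rough sketch for experimenting outside a text box. The break characters are copied from the VB code; the function name is my own.

```python
# Characters after which the next character starts a new word (from the VB code).
WORD_BREAKS = set(" ,.1234567890;/\\-()#$%^&*€!~+=@")

def do_proper_case_convert(text: str, allow_capital: bool = True) -> str:
    out = []
    next_should_be_capital = True
    for ch in text:
        if next_should_be_capital:
            out.append(ch.upper())
        elif allow_capital:
            out.append(ch)  # keep the typed case, e.g. the D in McDonald
        else:
            out.append(ch.lower())
        next_should_be_capital = ch in WORD_BREAKS
    return "".join(out)
```

Note that ' is not in the break set, so "o'reilly" comes out as "O'reilly"; adding it fixes O'Reilly at the cost of Fred'S.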

@Zack: I'll post it as a separate answer.

Here's an example based on kronoz's post.

    // LINQPad "C# Program"; Dump() is LINQPad's output method.
    void Main()
    {
        List<string> names = new List<string>() {
            "bill o'reilly", "johannes diderik van der waals", "mr. moseley-williams",
            "Joe VanWyck", "mcdonald's", "william the third", "hrh prince charles",
            "h.r.m. queen elizabeth the third", "william gates, iii", "pope leo xii", "a.k. jennings"
        };
        names.Select(name => name.ToProperCase()).Dump();
    }

    // Define other methods and classes here
    // http://.com/questions/32149/does-anyone-have-a-good-proper-case-algorithm
    public static class ProperCaseHelper
    {
        public static string ToProperCase(this string input)
        {
            if (IsAllUpperOrAllLower(input))
            {
                // fix the ALL UPPERCASE or all lowercase names
                return string.Join(" ", input.Split(' ').Select(word => wordToProperCase(word)));
            }
            else
            {
                // leave the CamelCase or Propercase names alone
                return input;
            }
        }

        public static bool IsAllUpperOrAllLower(this string input)
        {
            return (input.ToLower().Equals(input) || input.ToUpper().Equals(input));
        }

        private static string wordToProperCase(string word)
        {
            if (string.IsNullOrEmpty(word)) return word;

            // Standard case
            string ret = capitaliseFirstLetter(word);

            // Special cases:
            ret = properSuffix(ret, "'");   // D'Artagnon, D'Silva
            ret = properSuffix(ret, ".");   // ???
            ret = properSuffix(ret, "-");   // Oscar-Meyer-Weiner
            ret = properSuffix(ret, "Mc");  // Scots
            ret = properSuffix(ret, "Mac"); // Scots

            // Special words:
            ret = specialWords(ret, "van");    // Dick van Dyke
            ret = specialWords(ret, "von");    // Baron von Bruin-Valt
            ret = specialWords(ret, "de");
            ret = specialWords(ret, "di");
            ret = specialWords(ret, "da");     // Leonardo da Vinci, Eduardo da Silva
            ret = specialWords(ret, "of");     // The Grand Old Duke of York
            ret = specialWords(ret, "the");    // William the Conqueror
            ret = specialWords(ret, "HRH");    // His/Her Royal Highness
            ret = specialWords(ret, "HRM");    // His/Her Royal Majesty
            ret = specialWords(ret, "H.R.H."); // His/Her Royal Highness
            ret = specialWords(ret, "H.R.M."); // His/Her Royal Majesty

            ret = dealWithRomanNumerals(ret);  // William Gates, III

            return ret;
        }

        private static string properSuffix(string word, string prefix)
        {
            if (string.IsNullOrEmpty(word)) return word;

            string lowerWord = word.ToLower();
            string lowerPrefix = prefix.ToLower();

            if (!lowerWord.Contains(lowerPrefix)) return word;

            int index = lowerWord.IndexOf(lowerPrefix);

            // If the search string is at the end of the word ignore.
            if (index + prefix.Length == word.Length) return word;

            return word.Substring(0, index) + prefix +
                capitaliseFirstLetter(word.Substring(index + prefix.Length));
        }

        private static string specialWords(string word, string specialWord)
        {
            if (word.Equals(specialWord, StringComparison.InvariantCultureIgnoreCase))
            {
                return specialWord;
            }
            else
            {
                return word;
            }
        }

        private static string dealWithRomanNumerals(string word)
        {
            List<string> ones = new List<string>() { "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX" };
            List<string> tens = new List<string>() { "X", "XX", "XXX", "XL", "L", "LX", "LXX", "LXXX", "XC", "C" };
            // assume nobody uses hundreds

            foreach (string number in ones)
            {
                if (word.Equals(number, StringComparison.InvariantCultureIgnoreCase))
                {
                    return number;
                }
            }

            foreach (string ten in tens)
            {
                foreach (string one in ones)
                {
                    if (word.Equals(ten + one, StringComparison.InvariantCultureIgnoreCase))
                    {
                        return ten + one;
                    }
                }
            }

            return word;
        }

        private static string capitaliseFirstLetter(string word)
        {
            return char.ToUpper(word[0]) + word.Substring(1).ToLower();
        }
    }

I wrote this today to use in an application I'm working on. I think the code is fairly self-explanatory with the comments. It's not 100% accurate in every case, but it will handle most Western names easily.

Examples:

mary-jane => Mary-Jane

o'brien => O'Brien

Joël VON WINTEREGG => Joël von Winteregg

jose de la acosta => Jose de la Acosta

The code is extensible, since you can add any string value to the arrays at the top to suit your needs. Study it and add any special handling you may need.
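The code this answer refers to isn't shown here, so this is a hypothetical Python sketch of the same table-driven idea, written to reproduce the examples above. The particle list is an assumption; extend it as needed. Particles are lower-cased only when they are not the first word, so a name that begins with "De" keeps its capital.

```python
import re

# Hypothetical particle list; add entries to suit your data.
LOWERCASE_PARTICLES = {"de", "la", "van", "von", "der", "den", "da", "di", "du"}

def name_case(name: str) -> str:
    words = []
    for i, word in enumerate(name.split()):
        lower = word.lower()
        if i > 0 and lower in LOWERCASE_PARTICLES:
            words.append(lower)  # keep particles such as "von" or "de la" lower case
        else:
            # Capitalize each part after the start, a hyphen or an apostrophe
            words.append(re.sub(r"(^|[-'])(\w)",
                                lambda m: m.group(1) + m.group(2).upper(), lower))
    return " ".join(words)
```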

Unless I've misunderstood your question, I don't think you need to roll your own; the TextInfo class can do it for you.

    using System.Globalization;

    CultureInfo.InvariantCulture.TextInfo.ToTitleCase("GeOrGE bUrdEll")

will return "George Burdell". You can also use your own culture if special rules are involved.

Update: Michael (in a comment on this answer) pointed out that this won't work if the input is all caps, because the method will assume it's an acronym. The naive workaround is to call .ToLower() on the text before passing it to ToTitleCase.

I did a quick C# port of https://github.com/tamtamchik/namecase, which is based on Lingua::EN::NameCase.

    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    // Named CIQNameCase (as in the test below): a C# class cannot contain a member with its own name.
    public static class CIQNameCase
    {
        static Dictionary<string, string> _exceptions = new Dictionary<string, string>
        {
            {@"\bMacEdo"     ,"Macedo"},
            {@"\bMacEvicius" ,"Macevicius"},
            {@"\bMacHado"    ,"Machado"},
            {@"\bMacHar"     ,"Machar"},
            {@"\bMacHin"     ,"Machin"},
            {@"\bMacHlin"    ,"Machlin"},
            {@"\bMacIas"     ,"Macias"},
            {@"\bMacIulis"   ,"Maciulis"},
            {@"\bMacKie"     ,"Mackie"},
            {@"\bMacKle"     ,"Mackle"},
            {@"\bMacKlin"    ,"Macklin"},
            {@"\bMacKmin"    ,"Mackmin"},
            {@"\bMacQuarie"  ,"Macquarie"}
        };

        static Dictionary<string, string> _replacements = new Dictionary<string, string>
        {
            {@"\bAl(?=\s+\w)"         , @"al"},     // al Arabic or forename Al.
            {@"\b(Bin|Binti|Binte)\b" , @"bin"},    // bin, binti, binte Arabic
            {@"\bAp\b"                , @"ap"},     // ap Welsh.
            {@"\bBen(?=\s+\w)"        , @"ben"},    // ben Hebrew or forename Ben.
            {@"\bDell([ae])\b"        , @"dell$1"}, // della and delle Italian.
            {@"\bD([aeiou])\b"        , @"d$1"},    // da, de, di Italian; du French; do Brasil
            {@"\bD([ao]s)\b"          , @"d$1"},    // das, dos Brasileiros
            {@"\bDe([lrn])\b"         , @"de$1"},   // del Italian; der/den Dutch/Flemish.
            {@"\bEl\b"                , @"el"},     // el Greek or El Spanish.
            {@"\bLa\b"                , @"la"},     // la French or La Spanish.
            {@"\bL([eo])\b"           , @"l$1"},    // lo Italian; le French.
            {@"\bVan(?=\s+\w)"        , @"van"},    // van German or forename Van.
            {@"\bVon\b"               , @"von"}     // von Dutch/Flemish
        };

        static string[] _conjunctions = { "Y", "E", "I" };

        static string _romanRegex = @"\b((?:[Xx]{1,3}|[Xx][Ll]|[Ll][Xx]{0,3})?(?:[Ii]{1,3}|[Ii][VvXx]|[Vv][Ii]{0,3})?)\b";

        /// <summary>
        /// Case a name field into its appropriate case format
        /// e.g. Smith, de la Cruz, Mary-Jane, O'Brien, McTaggart
        /// </summary>
        public static string NameCase(string nameString)
        {
            // Capitalize
            nameString = Capitalize(nameString);
            nameString = UpdateIrish(nameString);

            // Fixes for "son (daughter) of" etc
            foreach (var replacement in _replacements.Keys)
            {
                if (Regex.IsMatch(nameString, replacement))
                {
                    Regex rgx = new Regex(replacement);
                    nameString = rgx.Replace(nameString, _replacements[replacement]);
                }
            }

            nameString = UpdateRoman(nameString);
            nameString = FixConjunction(nameString);

            return nameString;
        }

        /// <summary>
        /// Capitalize first letters.
        /// </summary>
        private static string Capitalize(string nameString)
        {
            nameString = nameString.ToLower();
            nameString = Regex.Replace(nameString, @"\b\w", x => x.ToString().ToUpper());
            nameString = Regex.Replace(nameString, @"'\w\b", x => x.ToString().ToLower()); // Lowercase 's
            return nameString;
        }

        /// <summary>
        /// Update for Irish names.
        /// </summary>
        private static string UpdateIrish(string nameString)
        {
            if (Regex.IsMatch(nameString, @".*?\bMac[A-Za-z^aciozj]{2,}\b") || Regex.IsMatch(nameString, @".*?\bMc"))
            {
                nameString = UpdateMac(nameString);
            }
            return nameString;
        }

        /// <summary>
        /// Updates Irish Mac and Mc.
        /// </summary>
        private static string UpdateMac(string nameString)
        {
            MatchCollection matches = Regex.Matches(nameString, @"\b(Ma?c)([A-Za-z]+)");
            if (matches.Count == 1 && matches[0].Groups.Count == 3)
            {
                string replacement = matches[0].Groups[1].Value;
                replacement += matches[0].Groups[2].Value.Substring(0, 1).ToUpper();
                replacement += matches[0].Groups[2].Value.Substring(1);
                nameString = nameString.Replace(matches[0].Groups[0].Value, replacement);

                // Now fix "Mac" exceptions
                foreach (var exception in _exceptions.Keys)
                {
                    nameString = Regex.Replace(nameString, exception, _exceptions[exception]);
                }
            }
            return nameString;
        }

        /// <summary>
        /// Fix roman numeral names.
        /// </summary>
        private static string UpdateRoman(string nameString)
        {
            MatchCollection matches = Regex.Matches(nameString, _romanRegex);
            if (matches.Count > 1)
            {
                foreach (Match match in matches)
                {
                    if (!string.IsNullOrEmpty(match.Value))
                    {
                        nameString = Regex.Replace(nameString, match.Value, x => x.ToString().ToUpper());
                    }
                }
            }
            return nameString;
        }

        /// <summary>
        /// Fix Spanish conjunctions.
        /// </summary>
        private static string FixConjunction(string nameString)
        {
            foreach (var conjunction in _conjunctions)
            {
                nameString = Regex.Replace(nameString, @"\b" + conjunction + @"\b", x => x.ToString().ToLower());
            }
            return nameString;
        }
    }

This is my test method; everything seems to pass fine:

    // MSTest: needs "using System;" and "using Microsoft.VisualStudio.TestTools.UnitTesting;",
    // and the method goes inside a [TestClass].
    [TestMethod]
    public void Test_NameCase_1()
    {
        string[] names = {
            "Keith", "Yuri's", "Leigh-Williams", "McCarthy",
            // Mac exceptions
            "Machin", "Machlin", "Machar", "Mackle", "Macklin", "Mackie", "Macquarie", "Machado",
            "Macevicius", "Maciulis", "Macias", "MacMurdo",
            // General
            "O'Callaghan", "St. John", "von Streit", "van Dyke", "Van", "ap Llwyd Dafydd", "al Fahd",
            "Al", "el Grecco", "ben Gurion", "Ben", "da Vinci", "di Caprio", "du Pont", "de Legate",
            "del Crond", "der Sind", "van der Post", "van den Thillart", "von Trapp", "la Poisson",
            "le Figaro", "Mack Knife", "Dougal MacDonald", "Ruiz y Picasso", "Dato e Iradier", "Mas i Gavarró",
            // Roman numerals
            "Henry VIII", "Louis III", "Louis XIV", "Charles II", "Fred XLIX",
            "Yusof bin Ishak",
        };

        foreach (string name in names)
        {
            string name_upper = name.ToUpper();
            string name_cased = CIQNameCase.NameCase(name_upper);
            Console.WriteLine(string.Format("name: {0} -> {1} -> {2}", name, name_upper, name_cased));
            Assert.IsTrue(name == name_cased);
        }
    }
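For readers who want the Mac/Mc step on its own, here is a stripped-down Python sketch of it. Assumptions: the input has already been capitalized word by word (as Capitalize does), only a handful of the exceptions are included, and "Mac" is treated as a prefix only when at least two letters follow it, roughly mirroring the {2,} check in UpdateIrish so that "Mack" is left alone.

```python
import re

# A small subset of the exception table from the port above.
MAC_EXCEPTIONS = {"MacHin": "Machin", "MacKie": "Mackie",
                  "MacHado": "Machado", "MacIas": "Macias"}

# "Mc" followed by letters, or "Mac" followed by at least two letters.
MAC_PREFIX = re.compile(r"\b(Mc|Mac(?=[A-Za-z]{2}))([A-Za-z]+)")

def update_mac(name: str) -> str:
    """Capitalize the letter after Mc/Mac, then undo known false positives."""
    name = MAC_PREFIX.sub(
        lambda m: m.group(1) + m.group(2)[0].upper() + m.group(2)[1:], name)
    for wrong, right in MAC_EXCEPTIONS.items():
        name = re.sub(r"\b" + wrong, right, name)
    return name
```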

Lots of good answers here. Mine is pretty simple and only takes into account the names we have in our organization. You can expand it however you like. It isn't a perfect solution: it will change Vancouver to VanCouver, which is wrong, so adjust it if you use it.

Here was my solution in C#. It hard-codes the names in the program, but with a little work you could keep a text file outside the program, read in the exception names (i.e. Van, Mc, Mac) and loop over them.

    // Needs: using System; using System.Text.RegularExpressions;
    public static String toProperName(String name)
    {
        if (name != null)
        {
            if (name.Length >= 2 && name.ToLower().Substring(0, 2) == "mc") // Changes mcdonald to "McDonald"
                return "Mc" + Regex.Replace(name.ToLower().Substring(2), @"\b[a-z]", m => m.Value.ToUpper());

            if (name.Length >= 3 && name.ToLower().Substring(0, 3) == "van") // Changes vanwinkle to "VanWinkle"
                return "Van" + Regex.Replace(name.ToLower().Substring(3), @"\b[a-z]", m => m.Value.ToUpper());

            // Changes to title case but also fixes
            // apostrophes like O'HARE or o'hare to O'Hare
            return Regex.Replace(name.ToLower(), @"\b[a-z]", m => m.Value.ToUpper());
        }

        return "";
    }
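Following the suggestion to keep the prefixes outside the code, here is a Python sketch where the prefix list is plain data (in practice it could be loaded from a text file). The NO_PREFIX_WORDS stop list is my own addition to cope with the Vancouver problem; it is hypothetical and would have to be maintained by hand.

```python
import re

# Hypothetical data; per the answer above, these could be read from a text file.
PREFIXES = ["Mc", "Mac", "Van"]
NO_PREFIX_WORDS = {"vancouver", "mack", "macy"}  # words that merely start with a prefix

def title_words(text: str) -> str:
    # Upper-case every letter that starts a word, including after an apostrophe.
    return re.sub(r"\b[a-z]", lambda m: m.group().upper(), text.lower())

def to_proper_name(name: str) -> str:
    lower = name.lower()
    if lower not in NO_PREFIX_WORDS:
        for prefix in PREFIXES:
            if lower.startswith(prefix.lower()) and len(lower) > len(prefix):
                return prefix + title_words(lower[len(prefix):])
    return title_words(lower)
```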