guardar - manejo de cadenas en java

Explicar el uso de un vector de bits para determinar si todos los caracteres son únicos (10)

Creo que todas estas respuestas sí explican cómo funciona esto; sin embargo, quise dar mi opinión sobre cómo lo vi mejor, al cambiar el nombre de algunas variables, agregar algunas otras y agregarle comentarios:

public static boolean isUniqueChars(String str) { /* checker is the bit array, it will have a 1 on the character index that has appeared before and a 0 if the character has not appeared, you can see this number initialized as 32 0 bits: 00000000 00000000 00000000 00000000 */ int checker = 0; //loop through each String character for (int i = 0; i < str.length(); ++i) { /* a through z in ASCII are charactets numbered 97 through 122, 26 characters total with this, you get a number between 0 and 25 to represent each character index 0 for ''a'' and 25 for ''z'' renamed ''val'' as ''characterIndex'' to be more descriptive */ int characterIndex = str.charAt(i) - ''a''; //char ''a'' would get 0 and char ''z'' would get 26 /* created a new variable to make things clearer ''singleBitOnPosition'' It is used to calculate a number that represents the bit value of having that character index as a 1 and the rest as a 0, this is achieved by getting the single digit 1 and shifting it to the left as many times as the character index requires e.g. character ''d'' 00000000 00000000 00000000 00000001 Shift 3 spaces to the left (<<) because ''d'' index is number 3 1 shift: 00000000 00000000 00000000 00000010 2 shift: 00000000 00000000 00000000 00000100 3 shift: 00000000 00000000 00000000 00001000 Therefore the number representing ''d'' is 00000000 00000000 00000000 00001000 */ int singleBitOnPosition = 1 << characterIndex; /* This peforms an AND between the checker, which is the bit array containing everything that has been found before and the number representing the bit that will be turned on for this particular character. e.g. if we have already seen ''a'', ''b'' and ''d'', checker will have: checker = 00000000 00000000 00000000 00001011 And if we see ''b'' again: ''b'' = 00000000 00000000 00000000 00000010 it will do the following: 00000000 00000000 00000000 00001011 & (AND) 00000000 00000000 00000000 00000010 ----------------------------------- 00000000 00000000 00000000 00000010 Since this number is different than ''0'' it means that the character was seen before, because on that character index we already have a 1 bit value */ if ((checker & singleBitOnPosition) > 0) { return false; } /* Remember that checker |= singleBitOnPosition is the same as checker = checker | singleBitOnPosition Sometimes it is easier to see it expanded like that. What this achieves is that it builds the checker to have the new value it hasnt seen, by doing an OR between checker and the value representing this character index as a 1. e.g. If the character is ''f'' and the checker has seen ''g'' and ''a'', the following will happen ''f'' = 00000000 00000000 00000000 00100000 checker(seen ''a'' and ''g'' so far) = 00000000 00000000 00000000 01000001 00000000 00000000 00000000 00100000 | (OR) 00000000 00000000 00000000 01000001 ----------------------------------- 00000000 00000000 00000000 01100001 Therefore getting a new checker as 00000000 00000000 00000000 01100001 */ checker |= singleBitOnPosition; } return true; }

Estoy confundido acerca de cómo funcionaría un poco el vector para hacer esto (no demasiado familiar con los vectores de bits). Aquí está el código dado. ¿Podría alguien guiarme por esto?

public static boolean isUniqueChars(String str) { int checker = 0; for (int i = 0; i < str.length(); ++i) { int val = str.charAt(i) - ''a''; if ((checker & (1 << val)) > 0) return false; checker |= (1 << val); } return true; }

Particularmente, ¿qué está haciendo el checker ?

Hay un par de respuestas excelentes ya proporcionadas arriba. Entonces no quiero repetir todo lo que ya dije. Pero si quería agregar algunas cosas para ayudar con el programa anterior, ya que trabajé en el mismo programa y tuve algunas preguntas, pero después de pasar un tiempo, tengo más claridad sobre este programa.

Primero que nada, "checker" se usa para seguir al personaje que ya está atravesado en String para ver si hay algún personaje que se repita.

Ahora "checker" es un tipo de datos int, por lo que solo puede tener 32 bits o 4 bytes (dependiendo de la plataforma), por lo que este programa solo puede funcionar correctamente para un juego de caracteres dentro de un rango de 32 caracteres. Esa es la razón, este programa resta "a" de cada personaje para hacer que este programa se ejecute solo para caracteres en minúsculas. Sin embargo, si mezcla caracteres de mayúsculas y minúsculas, entonces no funcionaría.

Por cierto, si no resta "a" de cada carácter (ver la declaración a continuación), este programa funcionará correctamente solo para Cadena con mayúsculas o Cadena con solo minúsculas. Por lo tanto, el alcance del programa anterior aumenta de solo minúsculas a caracteres en mayúscula, pero no se pueden mezclar.

int val = str.charAt(i) - ''a'';

Sin embargo, quería escribir un programa genérico utilizando Bitwise Operation, que debería funcionar para cualquier carácter ASCII sin preocuparse por mayúsculas, minúsculas, números o cualquier carácter especial. Para hacer esto, nuestro "verificador" debe ser lo suficientemente grande como para almacenar 256 caracteres (tamaño de conjunto de caracteres ASCII). Pero una int en Java no funcionaría, ya que solo puede almacenar 32 bits. Por lo tanto, en el programa a continuación, estoy utilizando la clase BitSet disponible en JDK que puede tener cualquier tamaño definido por el usuario al crear una instancia de un objeto BitSet.

Aquí hay un programa que hace lo mismo que el programa anterior escrito utilizando el operador Bitwise, pero este programa funcionará para una Cadena con cualquier carácter del juego de caracteres ASCII.

public static boolean isUniqueStringUsingBitVectorClass(String s) { final int ASCII_CHARACTER_SET_SIZE = 256; final BitSet tracker = new BitSet(ASCII_CHARACTER_SET_SIZE); // if more than 256 ASCII characters then there can''t be unique characters if(s.length() > 256) { return false; } //this will be used to keep the location of each character in String final BitSet charBitLocation = new BitSet(ASCII_CHARACTER_SET_SIZE); for(int i = 0; i < s.length(); i++) { int charVal = s.charAt(i); charBitLocation.set(charVal); //set the char location in BitSet //check if tracker has already bit set with the bit present in charBitLocation if(tracker.intersects(charBitLocation)) { return false; } //set the tracker with new bit from charBitLocation tracker.or(charBitLocation); charBitLocation.clear(); //clear charBitLocation to store bit for character in the next iteration of the loop } return true; }

Leer la respuesta de Iván anterior realmente me ayudó, aunque lo expresaría de manera algo diferente.

El << in (1 << val) es un operador de cambio de bit. Toma 1 (que en binario se representa como 000000001 , con tantos ceros precedentes como desee / están asignados por la memoria) y lo desplaza a la izquierda por val espacios. Como solo asumimos az y restamos cada vez, cada letra tendrá un valor de 0 a 25, que será el índice de esa letra desde la derecha en la representación booleana del entero del checker , ya que moveremos el 1 a la izquierda en checker val veces.

Al final de cada control, vemos el operador |= . Esto combina dos números binarios, reemplazando todos los 0 con 1 si existe un 1 en cualquier operando en ese índice. Aquí, eso significa que siempre que exista un 1 en (1 << val) , ese 1 se copiará en el checker , mientras que se conservarán todos los 1 existentes del checker .

Como probablemente pueda adivinar, a 1 funciona aquí como una bandera booleana para verdadero. Cuando verificamos si un carácter ya está representado en la cadena, comparamos el checker , que en este punto es esencialmente una matriz de indicadores booleanos ( 1 valores) en los índices de los caracteres que ya han sido representados, con lo que es esencialmente un matriz de valores booleanos con una bandera 1 en el índice del carácter actual.

El operador & realiza esta comprobación. Similar al |= , el operador & copiará sobre un 1 solo si ambos operandos tienen un 1 en ese índice. Entonces, esencialmente, solo se copiarán las banderas ya presentes en el checker que también están representadas en (1 << val) . En este caso, eso significa que solo si el personaje actual ya ha sido representado habrá un 1 presente en cualquier parte del resultado de checker & (1 << val) . Y si hay un 1 en cualquier parte del resultado de esa operación, entonces el valor del booleano devuelto es > 0 y el método devuelve falso.

Esto es, supongo, por qué los vectores de bits también se llaman matrices de bits . Porque, aunque no son del tipo de datos de matriz, se pueden usar de forma similar a como se usan las matrices para almacenar indicadores booleanos.

Publicaciones anteriores explican bien lo que hace el bloque de código y quiero agregar mi solución simple usando la estructura de datos BitSet java:

private static String isUniqueCharsUsingBitSet(String string) { BitSet bitSet =new BitSet(); for (int i = 0; i < string.length(); ++i) { int val = string.charAt(i); if(bitSet.get(val)) return "NO"; bitSet.set(val); } return "YES"; }

También asumo que su ejemplo proviene del libro Cracking The Code Interview y mi respuesta está relacionada con este contexto.

Para usar este algoritmo para resolver el problema, debemos admitir que solo vamos a pasar caracteres de aa z (minúscula).

Como solo hay 26 letras y están correctamente ordenadas en la tabla de codificación que usamos, esto nos garantiza que todas las diferencias posibles str.charAt(i) - ''a'' serán inferiores a 32 (el tamaño del checker variables int) .

Como explicó Snowbear, estamos a punto de utilizar la variable de checker como una matriz de bits. Vamos a tener un enfoque con el ejemplo:

Digamos que str equals "test"

Primer pase (i = t)

corrector == 0 (00000000000000000000000000000000)

In ASCII, val = str.charAt(i) - ''a'' = 116 - 97 = 19 What about 1 << val ? 1 == 00000000000000000000000000000001 1 << 19 == 00000000000010000000000000000000 checker |= (1 << val) means checker = checker | (1 << val) so checker = 00000000000000000000000000000000 | 00000000000010000000000000000000 checker == 524288 (00000000000010000000000000000000)

Segundo pase (i = e)

corrector == 524288 (00000000000010000000000000000000)

val = 101 - 97 = 4 1 == 00000000000000000000000000000001 1 << 4 == 00000000000000000000000000010000 checker |= (1 << val) so checker = 00000000000010000000000000000000 | 00000000000000000000000000010000 checker == 524304 (00000000000010000000000000010000)

y así sucesivamente ... hasta que encontremos un bit ya establecido en el corrector para un personaje específico a través de la condición

(checker & (1 << val)) > 0

Espero eso ayude

Tengo la sospecha de que obtuviste este código del mismo libro que estoy leyendo ... El código aquí mismo no es tan críptico como los operadores- | =, & y << que normalmente no usan Nosotros, el profano, el autor no se molestó en tomarse el tiempo extra para explicar el proceso, ni en cuál es la mecánica real involucrada aquí. Estaba contento con la respuesta anterior sobre este hilo al principio, pero solo a nivel abstracto. Volví a hacerlo porque sentía que tenía que haber una explicación más concreta: la falta de uno siempre me deja con una sensación incómoda.

Este operador << es una palanca de desplazamiento izquierda que toma la representación binaria de ese número u operando y la desplaza en todos los lugares especificados por el operando o número a la derecha como en números decimales solo en binarios. Estamos multiplicando por la base 2, cuando nos movemos hacia arriba, sin embargo, hay muchos lugares que no son la base 10, por lo que el número de la derecha es el exponente y el número de la izquierda es un múltiplo base de 2.

Este operador | = toma el operando de la izquierda y o lo hace con el operando de la derecha, y este - ''&'' y los bits de ambos operandos a la izquierda y derecha de él.

Entonces, lo que tenemos aquí es una tabla hash que se almacena en un número binario de 32 bits cada vez que se obtiene or check ( checker |= (1 << val) ) con el valor binario designado de una letra, su bit correspondiente se está estableciendo en verdadero. El valor del carácter es and''d con el verificador ( checker & (1 << val)) > 0 ) - si es mayor que 0, sabemos que tenemos un dupe- porque dos bits idénticos configurados como true y''d juntos volverán verdadero o ''1'' ''.

Hay 26 lugares binarios, cada uno de los cuales corresponde a una letra minúscula; el autor dijo que asumía que la cadena solo contenía letras minúsculas, y esto se debe a que solo tenemos 6 lugares más (en el entero de 32 bits) que debemos consumir, y que obtener una colisión

00000000000000000000000000000001 a 2^0 00000000000000000000000000000010 b 2^1 00000000000000000000000000000100 c 2^2 00000000000000000000000000001000 d 2^3 00000000000000000000000000010000 e 2^4 00000000000000000000000000100000 f 2^5 00000000000000000000000001000000 g 2^6 00000000000000000000000010000000 h 2^7 00000000000000000000000100000000 i 2^8 00000000000000000000001000000000 j 2^9 00000000000000000000010000000000 k 2^10 00000000000000000000100000000000 l 2^11 00000000000000000001000000000000 m 2^12 00000000000000000010000000000000 n 2^13 00000000000000000100000000000000 o 2^14 00000000000000001000000000000000 p 2^15 00000000000000010000000000000000 q 2^16 00000000000000100000000000000000 r 2^17 00000000000001000000000000000000 s 2^18 00000000000010000000000000000000 t 2^19 00000000000100000000000000000000 u 2^20 00000000001000000000000000000000 v 2^21 00000000010000000000000000000000 w 2^22 00000000100000000000000000000000 x 2^23 00000001000000000000000000000000 y 2^24 00000010000000000000000000000000 z 2^25

Entonces, para una cadena de entrada ''azya'', a medida que avanzamos paso a paso

cadena ''a''

a =00000000000000000000000000000001 checker=00000000000000000000000000000000 checker=''a'' or checker; // checker now becomes = 00000000000000000000000000000001 checker=00000000000000000000000000000001 a and checker=0 no dupes condition

cadena ''az''

checker=00000000000000000000000000000001 z =00000010000000000000000000000000 z and checker=0 no dupes checker=z or checker; // checker now becomes 00000010000000000000000000000001

cadena ''azy''

checker= 00000010000000000000000000000001 y = 00000001000000000000000000000000 checker and y=0 no dupes condition checker= checker or y; // checker now becomes = 00000011000000000000000000000001

cadena ''azya''

checker= 00000011000000000000000000000001 a = 00000000000000000000000000000001 a and checker=1 we have a dupe

Ahora, declara un duplicado

Vamos a desglosar el código línea por línea.

int checker = 0; Estamos iniciando un corrector que nos ayudará a encontrar valores duplicados.

int val = str.charAt (i) - ''a''; Estamos obteniendo el valor ASCII del carácter en la posición ''i''th'' de la cadena y restando el valor ASCII de ''a''. Como la suposición es que la cadena tiene caracteres más bajos, el número de caracteres está limitado a 26. Hece, el valor de ''val'' siempre será> = 0.

if ((checker & (1 << val))> 0) return false;

verificador | = (1 << val);

Ahora esta es la parte difícil. Permite considerar un ejemplo con la cadena "abcda". Esto idealmente debería devolver falso.

Para iteración de bucle 1:

Comprobador: 00000000000000000000000000000000

val: 97-97 = 0

1 << 0: 00000000000000000000000000000001

corrector & (1 << val): 00000000000000000000000000000000 no es> 0

Por lo tanto, corrector: 00000000000000000000000000000001

Para iteración de bucle 2:

Comprobador: 00000000000000000000000000000001

val: 98-97 = 1

1 << 0: 00000000000000000000000000000010

corrector & (1 << val): 00000000000000000000000000000000 no es> 0

Por lo tanto, corrector: 00000000000000000000000000000011

Para iteración de bucle 3:

Comprobador: 00000000000000000000000000000011

val: 99-97 = 0

1 << 0: 00000000000000000000000000000100

corrector & (1 << val): 00000000000000000000000000000000 no es> 0

Por lo tanto, corrector: 00000000000000000000000000000111

Para iteración de bucle 4:

Comprobador: 00000000000000000000000000000111

val: 100-97 = 0

1 << 0: 00000000000000000000000000001000

corrector & (1 << val): 00000000000000000000000000000000 no es> 0

Por lo tanto, corrector: 00000000000000000000000000001111

Para iteración de bucle 5:

Comprobador: 00000000000000000000000000001111

val: 97-97 = 0

1 << 0: 00000000000000000000000000000001

corrector & (1 << val): 00000000000000000000000000000001 es> 0

Por lo tanto, devuelve falso.

int checker se utiliza aquí como un almacenamiento de bits. Cada bit en un valor entero puede tratarse como una bandera, por lo que eventualmente int es una matriz de bits (bandera). Cada bit en su código indica si el carácter con el índice del bit se encontró en cadena o no. Podría usar un vector de bits por la misma razón en lugar de int . Hay dos diferencias entre ellos:

Tamaño . int tiene un tamaño fijo, generalmente 4 bytes, lo que significa 8 * 4 = 32 bits (banderas). El vector de bits generalmente puede ser de diferente tamaño o debe especificar el tamaño en el constructor.
API . Con los vectores de bits, tendrá un código más fácil de leer, probablemente algo como esto:
vector.SetFlag(4, true); // set flag at index 4 as true

para int , tendrá un código de lógica de bit de nivel inferior:
checker |= (1 << 5); // set flag at index 5 to true

También probablemente, int puede ser un poco más rápido, porque las operaciones con bits son de muy bajo nivel y pueden ejecutarse tal cual por la CPU. BitVector permite escribir un código menos críptico en lugar de eso, además puede almacenar más banderas.

Para referencia futura: bit vector también se conoce como bitSet o bitArray. Aquí hay algunos enlaces a esta estructura de datos para diferentes idiomas / plataformas:

CPP: BitSet
Java: BitSet
C #: BitVector32 y BitArray

Explicación simple (con el código JS a continuación)

Una variable entera por código de máquina es una matriz de 32 bits
Todas las operaciones de bits son de 32-bit
Son independientes de la arquitectura del sistema operativo / CPU o del sistema numérico elegido del idioma, por ejemplo, DEC64 para JS.
Este enfoque de búsqueda de duplicación es similar al almacenamiento de caracteres en una matriz de tamaño 32 , donde establecemos el 0th índice si encontramos a en la cadena, 1st para b & y así sucesivamente.
Un personaje duplicado en la cadena tendrá su bit correspondiente ocupado, o, en este caso, establecido en 1.
Ivan ya explicó : cómo funciona este cálculo de índice en esta respuesta anterior .

Resumen de operaciones:

Realice la operación Y entre el checker y el index del personaje
Internamente, ambos son Int-32-Arrays
Es una operación poco sabia entre estos 2.
Verificar if el resultado de la operación fue 1
si output == 1
- La variable checker tiene ese bit index-th particular establecido en ambas matrices
- Por lo tanto, es un duplicado.
si output == 0
- Este personaje no se ha encontrado hasta el momento
- Realice una operación OR entre el checker y el index del personaje
- De este modo, actualizando el bit del índice-th a 1
- Asignar la salida al checker

Suposiciones

Suponemos que obtendremos todos los caracteres en minúsculas
Y, ese tamaño 32 es suficiente
Por lo tanto, comenzamos nuestro conteo de índice a partir de 96 como punto de referencia teniendo en cuenta que el código ascii para a es 97

A continuación se muestra el código fuente de JavaScript .

Tenga en cuenta que en JS, a pesar de que los enteros son de 64 bits, siempre se realiza una operación poco precisa en 32 bits.

Ejemplo: Si la cadena es aa entonces:

// checker is intialized to 32-bit-Int(0) // therefore, checker is checker= 00000000000000000000000000000000

i = 0

str[0] is ''a'' str[i].charCodeAt(0) - 96 = 1 checker ''AND'' 32-bit-Int(1) = 00000000000000000000000000000000 Boolean(0) == false // So, we go for the ''`OR`'' operation. checker = checker OR 32-bit-Int(1) checker = 00000000000000000000000000000001

i = 1

str[1] is ''a'' str[i].charCodeAt(0) - 96 = 1 checker= 00000000000000000000000000000001 a = 00000000000000000000000000000001 checker ''AND'' 32-bit-Int(1) = 00000000000000000000000000000001 Boolean(1) == true // We''ve our duplicate now

public static void main (String[] args) { //In order to understand this algorithm, it is necessary to understand the following: //int checker = 0; //Here we are using the primitive int almost like an array of size 32 where the only values can be 1 or 0 //Since in Java, we have 4 bytes per int, 8 bits per byte, we have a total of 4x8=32 bits to work with //int val = str.charAt(i) - ''a''; //In order to understand what is going on here, we must realize that all characters have a numeric value for (int i = 0; i < 256; i++) { char val = (char)i; System.out.print(val); } //The output is something like: // !"#$%&''()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[/]^_`abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ //There seems to be ~15 leading spaces that do not copy paste well, so I had to use real spaces instead //To only print the characters from ''a'' on forward: System.out.println(); System.out.println(); for (int i=0; i < 256; i++) { char val = (char)i; //char val2 = val + ''a''; //incompatible types. required: char found: int int val2 = val + ''a''; //shift to the ''a'', we must use an int here otherwise the compiler will complain char val3 = (char)val2; //convert back to char. there should be a more elegant way of doing this. System.out.print(val3); } //Notice how the following does not work: System.out.println(); System.out.println(); for (int i=0; i < 256; i++) { char val = (char)i; int val2 = val - ''a''; char val3 = (char)val2; System.out.print(val3); } //I''m not sure why this spills out into 2 lines: //EDIT I cant seem to copy this into ! System.out.println(); System.out.println(); //So back to our original algorithm: //int val = str.charAt(i) - ''a''; //We convert the i''th character of the String to a character, and shift it to the right, since adding shifts to the right and subtracting shifts to the left it seems //if ((checker & (1 << val)) > 0) return false; //This line is quite a mouthful, lets break it down: System.out.println(0<<0); //00000000000000000000000000000000 System.out.println(0<<1); //00000000000000000000000000000000 System.out.println(0<<2); //00000000000000000000000000000000 System.out.println(0<<3); //00000000000000000000000000000000 System.out.println(1<<0); //00000000000000000000000000000001 System.out.println(1<<1); //00000000000000000000000000000010 == 2 System.out.println(1<<2); //00000000000000000000000000000100 == 4 System.out.println(1<<3); //00000000000000000000000000001000 == 8 System.out.println(2<<0); //00000000000000000000000000000010 == 2 System.out.println(2<<1); //00000000000000000000000000000100 == 4 System.out.println(2<<2); // == 8 System.out.println(2<<3); // == 16 System.out.println("3<<0 == "+(3<<0)); // != 4 why 3??? System.out.println(3<<1); //00000000000000000000000000000011 == 3 //shift left by 1 //00000000000000000000000000000110 == 6 System.out.println(3<<2); //00000000000000000000000000000011 == 3 //shift left by 2 //00000000000000000000000000001100 == 12 System.out.println(3<<3); // 24 //It seems that the - ''a'' is not necessary //Back to if ((checker & (1 << val)) > 0) return false; //(1 << val means we simply shift 1 by the numeric representation of the current character //the bitwise & works as such: System.out.println(); System.out.println(); System.out.println(0&0); //0 System.out.println(0&1); //0 System.out.println(0&2); //0 System.out.println(); System.out.println(); System.out.println(1&0); //0 System.out.println(1&1); //1 System.out.println(1&2); //0 System.out.println(1&3); //1 System.out.println(); System.out.println(); System.out.println(2&0); //0 System.out.println(2&1); //0 0010 & 0001 == 0000 = 0 System.out.println(2&2); //2 0010 & 0010 == 2 System.out.println(2&3); //2 0010 & 0011 = 0010 == 2 System.out.println(); System.out.println(); System.out.println(3&0); //0 0011 & 0000 == 0 System.out.println(3&1); //1 0011 & 0001 == 0001 == 1 System.out.println(3&2); //2 0011 & 0010 == 0010 == 2, 0&1 = 0 1&1 = 1 System.out.println(3&3); //3 why?? 3 == 0011 & 0011 == 3??? System.out.println(9&11); // should be... 1001 & 1011 == 1001 == 8+1 == 9?? yay! //so when we do (1 << val), we take 0001 and shift it by say, 97 for ''a'', since any ''a'' is also 97 //why is it that the result of bitwise & is > 0 means its a dupe? //lets see.. //0011 & 0011 is 0011 means its a dupe //0000 & 0011 is 0000 means no dupe //0010 & 0001 is 0011 means its no dupe //hmm //only when it is all 0000 means its no dupe //so moving on: //checker |= (1 << val) //the |= needs exploring: int x = 0; int y = 1; int z = 2; int a = 3; int b = 4; System.out.println("x|=1 "+(x|=1)); //1 System.out.println(x|=1); //1 System.out.println(x|=1); //1 System.out.println(x|=1); //1 System.out.println(x|=1); //1 System.out.println(y|=1); // 0001 |= 0001 == ?? 1???? System.out.println(y|=2); // ??? == 3 why??? 0001 |= 0010 == 3... hmm System.out.println(y); //should be 3?? System.out.println(y|=1); //already 3 so... 0011 |= 0001... maybe 0011 again? 3? System.out.println(y|=2); //0011 |= 0010..... hmm maybe.. 0011??? still 3? yup! System.out.println(y|=3); //0011 |= 0011, still 3 System.out.println(y|=4); //0011 |= 0100.. should be... 0111? so... 11? no its 7 System.out.println(y|=5); //so we''re at 7 which is 0111, 0111 |= 0101 means 0111 still 7 System.out.println(b|=9); //so 0100 |= 1001 is... seems like xor?? or just or i think, just or... so its 1101 so its 13? YAY! //so the |= is just a bitwise OR! } public static boolean isUniqueChars(String str) { int checker = 0; for (int i = 0; i < str.length(); ++i) { int val = str.charAt(i) - ''a''; //the - ''a'' is just smoke and mirrors! not necessary! if ((checker & (1 << val)) > 0) return false; checker |= (1 << val); } return true; } public static boolean is_unique(String input) { int using_int_as_32_flags = 0; for (int i=0; i < input.length(); i++) { int numeric_representation_of_char_at_i = input.charAt(i); int using_0001_and_shifting_it_by_the_numeric_representation = 1 << numeric_representation_of_char_at_i; //here we shift the bitwise representation of 1 by the numeric val of the character int result_of_bitwise_and = using_int_as_32_flags & using_0001_and_shifting_it_by_the_numeric_representation; boolean already_bit_flagged = result_of_bitwise_and > 0; //needs clarification why is it that the result of bitwise & is > 0 means its a dupe? if (already_bit_flagged) return false; using_int_as_32_flags |= using_0001_and_shifting_it_by_the_numeric_representation; } return true; }