The Dark Side of JS (Part 2: “Data Types”)
In this series of posts, I want to tell you about the most mystifying parts of good old JavaScript. This part is dedicated to types in JavaScript.
Type System Overview
You may think that there are no types in JS, but there are. You don’t write any type definitions in vanilla JS, and you can write your code without worrying about data types at all:
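A minimal sketch of such code could look like this (the variable names and values are assumed purely for illustration):

```javascript
// No type annotations anywhere: each value carries its own type.
let answer = 42;
let greeting = "hello";
let user = { name: "Alice" };

console.log(typeof answer);   // "number"
console.log(typeof greeting); // "string"
console.log(typeof user);     // "object"
```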
It might seem that JS doesn’t care about data types at all, but it does. Every value has a type, and the engine keeps track of it under the hood. JS is a dynamically typed language, so the type of the value a variable holds can change during the script lifecycle. But more on that later.
So how does JS understand which type a variable has? It’s pretty simple as long as you don’t mix types. The type comes from the value itself: if the value is written in quotes it’s a string, if it is written in curly braces it’s an object, and so on.
To be more precise, JS is both dynamically typed and weakly typed. So what does that mean?
We said earlier that JS can change the types of variables during the script lifecycle. Here is an example:
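A sketch of that example, assuming 12 and "8" as the starting values (only the resulting "128" is given in the text):

```javascript
let someNumber = 12;            // assumed starting value
console.log(typeof someNumber); // "number"

let someString = "8";           // assumed value
someNumber = someNumber + someString;

console.log(someNumber);        // "128"
console.log(typeof someNumber); // "string"
```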
As you can see here, we create someNumber and check its type on the next line. As you might expect, it says "number". Then we create a someString variable and assign to someNumber its current value (of type number) plus someString's value (of type string). As a result, someNumber now holds the string "128", and that is dynamic (and weak) typing at work. So how does it work? JS has rules for each case. In this particular situation we have a sum of two values, one of which is a number and the other a string. When you add a number and a string, the JS runtime converts the number to a string and concatenates the two resulting strings. Pretty simple, right? But things are different with more exotic operations, like subtracting a string from a number or vice versa.
When you subtract a number from a string (or a string from a number), JS attempts to convert the string to a number using ToNumber (an abstract operation defined in the specification and implemented inside the JS runtime). If you take a look at the ECMAScript specification you’ll see the following rule:
ToNumber applied to Strings applies the following grammar to the input String interpreted as a sequence of UTF-16 encoded code points (6.1.4). If the grammar cannot interpret the String as an expansion of StringNumericLiteral, then the result of ToNumber is NaN.
That means that if your string is convertible to a number, it will be converted to a number; otherwise you’ll end up with NaN (a special value which we’ll discuss later).
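Both outcomes of that conversion can be seen in a short sketch (the sample strings are arbitrary):

```javascript
const numericString = "10" - 3;  // "10" is converted to the number 10
const reversed = 10 - "3";       // "3" is converted to the number 3
const notNumeric = "ten" - 3;    // "ten" cannot be converted, so it becomes NaN

console.log(numericString); // 7
console.log(reversed);      // 7
console.log(notNumeric);    // NaN
```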
I think the situation where the conversion to a number succeeds is pretty clear: we just subtract one number from another. But when the conversion fails, we end up subtracting a number from NaN or vice versa. For this case there is a simple rule in the standard:
If either operand is NaN, the result is NaN.
Actually, this rule also applies to addition, as do the rules below, since the subtraction a - b is the same as a + (-b) (just like in math!).
The sum of two infinities of opposite sign is NaN.
The sum of two infinities of the same sign is the infinity of that sign.
The sum of an infinity and a finite value is equal to the infinite operand.
The sum of two negative zeroes is -0. The sum of two positive zeroes, or of two zeroes of opposite sign, is +0.
The sum of a zero and a nonzero finite value is equal to the nonzero operand.
The sum of two nonzero finite values of the same magnitude and opposite sign is +0.
In the remaining cases, where neither an infinity, nor a zero, nor NaN is involved, and the operands have the same sign or have different magnitudes, the sum is computed and rounded to the nearest representable value using IEEE 754–2008 round to nearest, ties to even mode. If the magnitude is too large to represent, the operation overflows and the result is then an infinity of appropriate sign. The ECMAScript language requires support of gradual underflow as defined by IEEE 754–2008.
(from the ECMAScript Standard)
As you can see, JS does care about types and handles them in a smart way. You can read more about other operations in the spec.
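A few of the additive rules quoted above can be checked directly. This is only a small sketch; Object.is is used because === cannot tell -0 from +0:

```javascript
const oppositeInfinities = Infinity + -Infinity;
const sameInfinities = Infinity + Infinity;
const infinityPlusFinite = Infinity + 1;
const negativeZeroes = -0 + -0;
const mixedZeroes = 0 + -0;
const cancelled = 5 + -5;

console.log(oppositeInfinities);            // NaN
console.log(sameInfinities);                // Infinity
console.log(infinityPlusFinite);            // Infinity
console.log(Object.is(negativeZeroes, -0)); // true
console.log(Object.is(mixedZeroes, 0));     // true, the result is +0
console.log(Object.is(cancelled, 0));       // true, the result is +0
```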
JS Types
As you may know, JS has 7 types (newer editions of the spec also add BigInt, which is not covered here):
- Number: any numeric value, including NaN
- String: any string value
- Boolean: true or false
- Undefined: undefined
- Symbol: a primitive type for values created with the Symbol function; each such value is unique
- Object: everything else in JS
- Null*: according to the spec it is a separate type which has exactly one value, null; however, if you check the type of null with typeof, you’ll get "object". You can read more on that by following this link.
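The list above can be verified with typeof. A quick sketch with arbitrary sample values:

```javascript
console.log(typeof 42);           // "number"
console.log(typeof NaN);          // "number"
console.log(typeof "text");       // "string"
console.log(typeof true);         // "boolean"
console.log(typeof undefined);    // "undefined"
console.log(typeof Symbol("id")); // "symbol"
console.log(typeof {});           // "object"
console.log(typeof null);         // "object", even though Null is its own type
```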
All of the types listed above are primitives, except Object. So what does that mean? It means that values of these types have neither methods nor properties of their own. But you could argue with me and say “Hey, what about this code?”:
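The code in question could be as simple as this sketch (the sample sentence is assumed):

```javascript
const sentence = "one two three";  // a primitive string
const words = sentence.split(" "); // and yet it has a split method

console.log(typeof sentence); // "string"
console.log(words);           // [ 'one', 'two', 'three' ]
```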
Here you can see that we do have a split method on a primitive string value. How is that? It is the under-the-hood work of JavaScript. At runtime, whenever a method or property is accessed on a primitive value, the value is implicitly wrapped with the constructor of its corresponding object type. As a result, we temporarily have a non-primitive value with properties and methods. In other words, the interpreter treats the code above as something like this:
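Conceptually, the implicit wrapping corresponds to the following sketch. Real engines may optimise the temporary object away, so treat it as a mental model rather than a literal transformation:

```javascript
const sentence = "one two three";
const wrapper = new String(sentence); // temporary wrapper object
const words = wrapper.split(" ");     // the method lives on String.prototype

console.log(typeof wrapper); // "object"
console.log(words);          // [ 'one', 'two', 'three' ]
```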
We’ve mentioned one more interesting word above: “literal”. You can think of it as a value written out literally in the code. Here is an example of a few literals:
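For instance (a sketch; the names on the left are illustrative, the literals are on the right-hand side):

```javascript
const numberLiteral = 42;
const stringLiteral = "some text";
const booleanLiteral = true;
const nullLiteral = null;
const objectLiteral = { key: "value" };
const arrayLiteral = [1, 2, 3];
const regexLiteral = /ab+c/;

console.log(typeof numberLiteral, typeof stringLiteral, typeof booleanLiteral);
// number string boolean
```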
As you can see, literals are nothing but values of a certain type written literally, as they are.
Number
Number in JS is used to represent numerical values, e.g. 11, 3.14, 0x16, 124e10, 421e-5, Infinity, NaN, etc. As you may notice, those values are more than just numbers: there are some special values like Infinity and NaN. Unlike in many other languages, numbers in JavaScript are always stored as 64-bit floating-point values (IEEE 754 double precision): bits 0 to 51 store the fraction, bits 52 to 62 store the exponent, and bit 63 stores the sign.
As long as you work with integers in JS it’s all good, at least while they stay within the safe range up to Number.MAX_SAFE_INTEGER (2^53 - 1, a 16-digit number; this limitation comes from the way numbers are stored). But when it comes to numbers with a decimal point, the fun part begins:
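The classic demonstration is a sketch like the one below. It also shows the scaling workaround and the integer limit:

```javascript
const naiveSum = 0.1 + 0.2;
const scaledSum = (0.1 * 10 + 0.2 * 10) / 10; // scale to integers, add, scale back

console.log(naiveSum);          // 0.30000000000000004
console.log(naiveSum === 0.3);  // false
console.log(scaledSum);         // 0.3
console.log(scaledSum === 0.3); // true

// Beyond the safe range, neighbouring integers become indistinguishable.
console.log(Number.MAX_SAFE_INTEGER);               // 9007199254740991
console.log(9007199254740992 === 9007199254740993); // true
```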
This example shows how floating-point numbers work in JS. That strange behaviour has been explained in detail here, but long story short, it’s not a problem of JS itself; rather, it’s a consequence of how computers store numbers in binary. To work around it for values with one decimal place, as you can see, you can multiply each of the numbers you add by 10 and then divide the result of the addition by 10.
Strings
Strings are quite simple in JS. Their main purpose is to represent textual data. To create a string primitive you can use "here is your string", 'here is another string', `here is a template ${"string"}` or String("here is also a primitive string"). Notice that the expression String("some string") returns a primitive string, not a string object. To get a string object you should use the new operator, as with any other object: new String("now it's an object"). In everyday use you will rarely notice a difference between a string primitive and a string object, although typeof and strict equality do tell them apart. Either way, you should understand that there are two kinds of strings in JS.
You may be wondering why we have two different kinds of strings. Why can’t all of our strings just be objects? The answer is very simple: objects consume more memory. JS prefers to keep strings as primitives and converts them to objects on demand, e.g. when you call a method or access a property. The same concept applies to other primitives in JS.
Special types and values: NaN, null, undefined
Let’s start with NaN, because it’s not a separate type; it’s just a special value of the number type (like Infinity). It might be counterintuitive, because NaN literally means “Not-a-Number”, yet its type is number. Well, try to think about it this way: you typically get NaN when you try to perform a mathematical operation that is supposed to work on numbers with values that cannot be converted to numbers (subtracting a non-numeric string from a number, multiplying one non-numeric string by another, etc.), or when the result is mathematically undefined, such as 0 / 0.
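A short sketch of the ways to run into NaN mentioned above (the sample values are arbitrary):

```javascript
const fromSubtraction = 10 - "abc";  // "abc" cannot be converted to a number
const fromMultiplication = "a" * "b";
const fromDivision = 0 / 0;
const numericStrings = "2" * "3";    // both strings are numeric, so no NaN here

console.log(typeof NaN);                         // "number"
console.log(fromSubtraction);                    // NaN
console.log(fromMultiplication);                 // NaN
console.log(fromDivision);                       // NaN
console.log(numericStrings);                     // 6
console.log(Number.isNaN(fromSubtraction));      // true
```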
As you already know, null is a separate data type in JS; its single value represents the intentional absence of any value. null is frequently used to indicate that something is absent: for example, if you query the DOM for an element that doesn’t exist at the moment, the result of that query will be null.
The definition of undefined is similar to null in some ways, which is probably why people often don’t understand the difference between them. The main purpose of undefined is to say that a variable hasn’t been initialized yet. You can also see it when you try to access an object property that doesn’t exist:
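Both cases in a minimal sketch (the object and property names are made up for illustration):

```javascript
const user = { name: "Alice" };
let notInitialized;

console.log(user.age);       // undefined, the property does not exist
console.log(notInitialized); // undefined, declared but never assigned
```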
Another easy way to get undefined is to access a variable declared with var before the line where it is declared and initialized:
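A sketch of that hoisting behaviour (the variable names are illustrative):

```javascript
// No ReferenceError here: the var declaration is hoisted, only the assignment is not.
const valueBeforeInit = hoisted;
console.log(valueBeforeInit); // undefined

var hoisted = 5;
console.log(hoisted); // 5
```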
Object
Now it’s time to talk about what is (IMO) the main type in JS: Object. This is my personal opinion, based on the fact that all other non-primitive data types inherit from it. And thanks to Object, any other non-primitive data type has toString and valueOf methods. Using these methods you can do some magic tricks:
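One such trick could look like this sketch. The name magicObject comes from the text, while the returned string is assumed:

```javascript
const magicObject = {
  // Only toString is implemented; valueOf is inherited from Object.prototype.
  toString() {
    return "magic";
  },
};

const appended = magicObject + "!";
const prepended = 1 + magicObject;

console.log(appended);  // "magic!"
console.log(prepended); // "1magic"
```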
Cool, right? Actually, this is one of the things the interpreter does when you try to add a primitive to a non-primitive value. It calls the valueOf method of the object operand. By default valueOf returns a reference to the object itself. toString is the second candidate, called when valueOf does not return a primitive (for example, because it wasn’t redefined in the descendant).
As you can see above, we created an object that implements the toString method and doesn’t implement valueOf. So the interpreter first tries to get the primitive value of the object by calling its valueOf method, sees that it returns a reference to the object itself (which is not really useful when we are operating on primitives) and then calls toString, which in our case is implemented. BTW, by default toString returns the string "[object Object]", so it’s very unlikely that you end up with an error. But for the sake of experiment, let’s try to return this from toString without implementing the valueOf method.
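A sketch of that experiment, with the error caught so that the script keeps running:

```javascript
const magicObject = {
  // Returns the object itself, so no primitive can ever be produced.
  toString() {
    return this;
  },
};

let caughtError = null;
try {
  magicObject + 1;
} catch (error) {
  caughtError = error;
}

console.log(caughtError instanceof TypeError); // true
```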
In this case the interpreter throws a TypeError, because it cannot coerce magicObject to a primitive type.
Array
Arrays in JavaScript are objects. Exotic objects, to be exact.
Ordinary objects are the most common form of objects and have the default object semantics. An exotic object is any form of object whose property semantics differ in any way from the default semantics.
(from the ECMAScript Standard)
Since Arrays are just objects, you may be wondering why I am treating them as a separate type. I want to talk about them separately because we use them more than almost any other data type in JS, and knowing their characteristics will help you make your programs better.
Arrays are commonly used in JS to represent a collection of values which usually (but not necessarily) have the same type. Arrays have a lot of useful methods that make them a bit more than just a regular array in other programming languages. For example, the methods shift/unshift and pop/push aren’t methods of a classic array; they are the methods of a queue and a stack. Moreover, JS arrays are dynamic by default and you can change their length on the fly. If you have worked with built-in arrays in languages like Java or C/C++, you know that you cannot do such a thing there. To be precise, JavaScript’s Arrays behave like what computer science calls a “deque” (double-ended queue), which is much more powerful than a regular array in some other programming languages. But this power has its price. You should be careful when working with arrays and follow some unwritten rules, since JS engines perform a lot of optimisations under the hood to achieve the best possible performance with Arrays. For example, an engine tries to store an array’s data in one contiguous memory area, which is why you should avoid treating arrays like regular objects, e.g. adding non-numeric properties or filling an array with holes:
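The sketch below shows the deque-like methods first and then the two patterns to avoid. How much the latter cost depends on the engine, so take the comments as a rule of thumb:

```javascript
const deque = [2, 3];
deque.push(4);    // add to the end       -> [2, 3, 4]
deque.unshift(1); // add to the beginning -> [1, 2, 3, 4]
const last = deque.pop();    // 4
const first = deque.shift(); // 1
console.log(deque, first, last); // [ 2, 3 ] 1 4

const numbers = [1, 2, 3];
numbers.label = "numbers"; // allowed, but treats the array like a plain object
numbers[10] = 11;          // allowed, but leaves holes at indexes 3..9

console.log(numbers.length); // 11
console.log(5 in numbers);   // false, index 5 is a hole
```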
As you can see, Arrays in JS are not as simple as they may look at first glance, and you definitely should know their features to use them wisely.
Symbols
Symbols are, from my perspective, the most mysterious type in JS. Let’s try to figure out what they are for and how they can be applied.
The Symbol type is the set of all non-String values that may be used as the key of an Object property.
Each possible Symbol value is unique and immutable.
Each Symbol value immutably holds an associated value called [[Description]] that is either undefined or a String value.
(from the ECMAScript Standard)
So basically Symbols are unique values in your JS program that help you define unique properties on objects, protecting them from being accidentally overridden.
You can create a Symbol as follows:
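A minimal sketch (the description string is arbitrary):

```javascript
const firstSymbol = Symbol("description");
const secondSymbol = Symbol("description");

console.log(firstSymbol === secondSymbol); // false
console.log(firstSymbol.toString());       // "Symbol(description)"
console.log(typeof firstSymbol);           // "symbol"
```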
Here we are creating two symbols that have the same description, but they aren’t equal. The description just gives the Symbol a human-readable label; it doesn’t identify the Symbol, and we can’t retrieve the Symbol by its description.
Symbols are designed to be unique values across the whole app. This becomes useful when you are using a data structure from some library and you need to add a field to that data structure while being sure that the field won’t break anything.
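As a sketch, assume a hypothetical awesomeDataStructure object that stands in for something a library hands to you:

```javascript
// Hypothetical stand-in for a structure that comes from a library.
const awesomeDataStructure = { id: 1, name: "from the library" };

// Our own field, keyed by a Symbol so it cannot clash with the library's keys.
const customField = Symbol("customField");
awesomeDataStructure[customField] = "extra data";

console.log(awesomeDataStructure[customField]);    // "extra data"
console.log(Object.keys(awesomeDataStructure));    // [ 'id', 'name' ]
console.log(JSON.stringify(awesomeDataStructure)); // {"id":1,"name":"from the library"}
```

The Symbol-keyed field stays out of Object.keys and JSON.stringify, so code that enumerates or serialises the structure is not affected.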
JS provides us with predefined Symbols, called well-known Symbols. JS relies on them when it performs certain operations on data structures:
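A sketch of such an example, reconstructed from the description that follows (the property names and values of obj are assumed):

```javascript
function iter() {
  const keys = Object.keys(this); // string keys only, the Symbol key is skipped
  let index = 0;

  return {
    // Arrow function: `this` stays bound to the object being iterated.
    next: () => {
      if (index < keys.length) {
        return { value: this[keys[index++]], done: false };
      }
      return { value: undefined, done: true };
    },
  };
}

const obj = {
  first: 1,
  second: 2,
  third: 3,
  [Symbol.iterator]: iter,
};

for (var value of obj) {
  console.log(value); // 1, then 2, then 3
}
```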
The example above might look quite complex.
At the very top we have a function called iter. All it does is extract all keys from this into an array, define a counter variable and return an object. Let’s take a deeper look at the object that we return here. It’s an object literal with a property called next. next is a function that is bound to the outer context (spoiler: this will point to the obj object). The main goal of the next function is to return an object with the properties value and done.
After declaring the function, we create an obj object with three properties. After that comes the main step of the example: declaring the [Symbol.iterator] property. Symbol.iterator is one of the well-known Symbols.
The last step in the example is iterating through the obj object. I think that is pretty straightforward.
So why have we added that magic Symbol.iterator property? The thing is that when we iterate over any data structure, JS looks for a Symbol.iterator property on it, and this property should be a function that follows the convention for iterating over the data structure. The convention is pretty simple. The function has to return an object with a next property, which is a function that returns an object with two properties, value and done. value represents the value we are currently iterating over. done tells whether we have reached the end and there are no more values in the structure. If our iterator function follows the convention, then an iterating construct, e.g. for ... of, will handle all these internal calls for us. As you can see in the example above, we get each value of our obj object in the var value variable, one by one.
There are plenty of other well-known Symbols that you can use.
Let’s get back to the example with a custom field in a library’s data structure. What if we want to define the field in one module and be able to refer to it from another module:
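To keep the sketch runnable as a single file, the two modules are simulated here with a function. The helper name and the field name are hypothetical, and in a real project the function body would live in its own file and export the structure:

```javascript
// --- "module A" (simulated): creates the Symbol and the modified structure ---
function createAwesomeDataStructure() {
  const customField = Symbol("customField"); // local to this "module"
  const structure = { name: "awesome" };
  structure[customField] = "extra data";
  return structure; // stands in for `export`
}

// --- "module B" (simulated): receives only the structure ---
const AwesomeDataStructure = createAwesomeDataStructure();

// A Symbol with the same description is still a different Symbol.
const lookup = AwesomeDataStructure[Symbol("customField")];
console.log(lookup); // undefined
```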
Here we used the code from one of our previous examples and added an export of the modified AwesomeDataStructure.
We are not able to access the property that was defined using a Symbol. (Remember that Symbols are unique across the app and you cannot create the same Symbol twice.)
How could we handle this situation? JS has functionality for that. It’s called the Symbol Registry:
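The registry is accessed through Symbol.for and Symbol.keyFor. A minimal sketch, with a hypothetical key name:

```javascript
const registered = Symbol.for("app.customField"); // created in the registry on first use
const sameAgain = Symbol.for("app.customField");  // looked up by key afterwards

console.log(registered === sameAgain);                 // true
console.log(Symbol.keyFor(registered));                // "app.customField"
console.log(registered === Symbol("app.customField")); // false, a plain Symbol() never touches the registry
```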
Here we created our Symbol inside the Symbol Registry, a registry that is available across the whole app and keeps every Symbol created through it (with Symbol.for). So now we can do something like this:
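Again as a single-file sketch with simulated modules (the helper and key names are hypothetical):

```javascript
// --- "module A" (simulated): stores the field under a registry Symbol ---
function createAwesomeDataStructure() {
  const structure = { name: "awesome" };
  structure[Symbol.for("app.customField")] = "extra data";
  return structure; // stands in for `export`
}

// --- "module B" (simulated): asks the registry for the same Symbol ---
const AwesomeDataStructure = createAwesomeDataStructure();
const fieldValue = AwesomeDataStructure[Symbol.for("app.customField")];

console.log(fieldValue); // "extra data"
```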
In this case we are able to get access to our field.
In this article we’ve reviewed the JS types and taken a deeper look at how they work. Hope that was helpful. See you in the next articles. Thanks!