2020-01-30

So what is (digital) Identity?


What identity is has been a foundational philosophical question in many areas of study for centuries. There are quite literally thousands of books on the subject, with Amazon showing over 60,000 hits. Merriam-Webster defines identity as “the distinguishing character or personality of an individual”. Something “distinguishing” means there is a single attribute, or a grouping of attributes, that can be used to tell one object apart from another. Therefore, to be distinguishing, an attribute needs to be both measurable and known.

A measurable attribute can involve one or more inputs to your five senses of sight, hearing, smell, taste or touch, in incredibly complex combinations. It could also be calculated or known, for example “they always order vanilla” or “employed for over 1 year”. It can also be contextual or relational: the one you spoke with, or the one behind that one. Just as a visually impaired person may prefer to use tone of voice rather than the shape of a nose to distinguish someone's identity, digital identity generally places different values on distinguishing characteristics than the physical world does. For decades security folks have used something you are, have or know as the basis of authentication, but there are many contextual (e.g. in the office), calculated (e.g. PKI) and possibly other attributes that could be used.
 
The usual nature of identity is that the same attributes that make us unique can also be used to group individuals. Grouping users by attribute simplifies policy and assignment. Along with grouping comes the risk of assumed bias, either toward the group as a whole or in how the grouping itself is understood.
For example, I may group all people as either over 5 feet tall or under 5 feet tall (not very practical, perhaps). But is this measured sitting down or standing up? Does it cover only the fully grown, or include those still growing (and how is that measured)? I may also have a personal, unreasonable fear of anyone over 5 feet tall, resulting in this attribute being used as a general negative against everyone in that category. Treating 1,000,000 customers as 100% unique may be impractical, but grouping them too widely, by the wrong attribute, or with bias lessens the value of the grouping.
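
The height example can be sketched in a few lines of Python. This is only an illustration, not a recommended policy: the Person record, the height_group rule and the decision to measure standing height in centimetres are all assumptions made here to show that a grouping is only as good as the definition of the attribute behind it.

```python
from dataclasses import dataclass

FIVE_FEET_CM = 152.4  # 5 ft = 60 in x 2.54 cm


@dataclass(frozen=True)
class Person:
    # Hypothetical record; the measurement rule is stated explicitly
    # so the attribute means the same thing for everyone.
    name: str
    standing_height_cm: float  # assumption: measured standing up
    fully_grown: bool


def height_group(person: Person) -> str:
    """Assign a group label from explicitly defined attributes."""
    if not person.fully_grown:
        # Height is still changing, so do not force a permanent label.
        return "still_growing"
    if person.standing_height_cm > FIVE_FEET_CM:
        return "over_5ft"
    return "5ft_or_under"


def group_by(people, key):
    """Collect names under the label returned by key(person)."""
    groups = {}
    for person in people:
        groups.setdefault(key(person), []).append(person.name)
    return groups


if __name__ == "__main__":
    people = [
        Person("Ann", 170.0, True),
        Person("Ben", 150.0, True),
        Person("Cy", 140.0, False),
    ]
    print(group_by(people, height_group))
```

Note that the code only assigns a label. Any policy attached to that label (such as treating one group as a general negative) is a separate decision, and that is where bias tends to creep in.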

Identity is not only about humans; we use the same principle of distinguishing characteristics to identify any object: a dog, that black cat, the yellow bike, the house next to mine. Digitally, it can be extended to applications, servers, physical devices, buildings, virtual devices, fields, data, IP addresses and anything else you may need to reference. Identity is used for everything, either to include or exclude it, individually or as a group.

Even in the physical world, there are many opportunities to mistake someone's identity when the distinguishing attributes are too few or too inaccurate. You may see a shape in a dark living room and, due to the lack of available attributes and a dose of bias, assume it is your spouse. The human shape, the height, the smell, the location, the context (your spouse stood there last) and experience (it's usually your spouse in the living room) all lead you to assume incorrectly. You only discover the second the person speaks (a new attribute) that it is not your spouse but their parent. Whew, good thing you did not mention last Thanksgiving's turkey again! “Identity assurance” expresses the level of confidence that you have correctly identified the person by validating the known attributes that make up their identity.

While in the physical world we often have thousands of attributes available, in the digital world this is often reduced to a few fields of data (see the Identity Bottleneck). Companies often expend immense effort in knowing attributes about their customers, employees, market, and partners. Every day you hear of huge digital projects and hype around analytics, big data, AI, building supply chains or building a brand, only for a username and a single secret (e.g. pa$$w0rd) to be used as the only attributes before giving access. This is probably a relic of the first computer systems, which assumed attributes from context: if Bob is one of 20 users on the mainframe and uses terminal 5, we just need one attribute to know which of the 20 users this is. In today’s world, there are billions of connected people accessing systems scattered across the world from multiple devices, yet for the most part the industry still relies on the same approach to identify a user. It worked for 50 years after all, or at least kind of worked; few today have not wondered why technology battles with security incidents the way it does.

Identifying attributes change at different rates. The item you own tomorrow may be the one I lost today, but my name probably stays the same for some time. I am logged in from the office, then go home and continue to work. If location and IP address were used as attributes to identify the user, the digital system may not be aware that it is the same user, even though the user's identity did not change. Rapidly changing attributes can also be red flags: for example, Anne accessed one system at 4pm from Toronto and another at 5pm from Moscow. From these examples, we can see that a change in attributes can itself be used as an attribute of identity. Changes in attributes can also kick off events or tasks. A change in Anne's employment status attribute could trigger the removal of access. A change in IP address could trigger the need to re-authenticate.
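
The Toronto and Moscow example is often called an "impossible travel" check, and a rough version fits in a short Python sketch. Everything here is an assumption for illustration: the Login record, the city coordinates, the 1,000 km/h speed ceiling (roughly airliner speed), and the simplification that both timestamps are in the same time zone.

```python
import math
from dataclasses import dataclass
from datetime import datetime

EARTH_RADIUS_KM = 6371.0
MAX_PLAUSIBLE_SPEED_KMH = 1000.0  # assumption: roughly airliner speed


@dataclass(frozen=True)
class Login:
    # Hypothetical login event; timestamps are assumed to share a time zone.
    user: str
    when: datetime
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_impossible_travel(prev, curr, max_speed_kmh=MAX_PLAUSIBLE_SPEED_KMH):
    """True if the implied speed between two logins exceeds the ceiling."""
    hours = (curr.when - prev.when).total_seconds() / 3600
    distance = haversine_km(prev.lat, prev.lon, curr.lat, curr.lon)
    if hours <= 0:
        # Simultaneous or out-of-order logins from different places.
        return distance > 0
    return distance / hours > max_speed_kmh


if __name__ == "__main__":
    toronto = Login("anne", datetime(2020, 1, 30, 16, 0), 43.6532, -79.3832)
    moscow = Login("anne", datetime(2020, 1, 30, 17, 0), 55.7558, 37.6173)
    print(is_impossible_travel(toronto, moscow))  # True
```

A flag like this would normally trigger re-authentication or a review rather than an outright block, since VPNs and shared accounts can produce the same pattern.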

In our real lives, we require different levels of identity assurance for different tasks. Anti-money laundering policies require banks to “Know Your Customer” (KYC), and you may need an ID to get into a bar if you look near the legal age. You would probably balk, though, if someone needed to know more than that you are a paying customer when buying coffee (how many use a fake name at Starbucks?). The level of identity assurance required may vary with the risk of the access. NIST Special Publication 800-63 spends a considerable amount of effort discussing this idea of adapting identity assurance levels depending on what is being accessed.
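
One way to picture risk-based assurance is a lookup from action to required level, with a "step up" when the current level falls short. The sketch below is an assumption-laden illustration: the three-step Assurance scale, the action names and the decide function are hypothetical and are not the levels defined in NIST SP 800-63.

```python
from enum import IntEnum


class Assurance(IntEnum):
    # Hypothetical scale for illustration, not the NIST 800-63 levels.
    LOW = 1     # e.g. "a paying customer"
    MEDIUM = 2  # e.g. an ID checked at the door
    HIGH = 3    # e.g. full Know Your Customer verification


# Assumed mapping from action to the assurance it requires.
REQUIRED = {
    "buy_coffee": Assurance.LOW,
    "enter_bar": Assurance.MEDIUM,
    "open_bank_account": Assurance.HIGH,
}


def decide(action: str, current: Assurance) -> str:
    """Return 'allow' or 'step_up' for the requested action."""
    # Unknown actions default to the highest requirement.
    required = REQUIRED.get(action, Assurance.HIGH)
    return "allow" if current >= required else "step_up"


if __name__ == "__main__":
    print(decide("buy_coffee", Assurance.LOW))
    print(decide("open_bank_account", Assurance.MEDIUM))
```

Defaulting unknown actions to the highest level is a design choice made here so that a missing entry fails safe rather than open.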

Balancing privacy with identity assurance may seem like a conflict, but this is where the selection and storage of attributes, and the relationship with the user, become important. Whenever I do an assessment, I like to ask what the organization uses as an Identity Book of Record (IBoR) and what attributes are stored for each identity. So often the response is the HR system or Active Directory. In these cases the “HR system” is usually a basic report sent to security, strictly scrubbed for privacy, with a couple of fields, many of which are useless for establishing digital identity. Active Directory is usually outdated, partial, and holds credentials rather than identities. Anonymous dating profiles often store more identifying attributes than IBoRs do! With HR, marketing, sales, asset management, service catalogue and so on all operating in isolation from IT and security, how can anyone be surprised when malicious parties take advantage?

What are you doing to build a more capable IBoR, and what attributes are you collecting for your users to improve your identity assurance levels?
