Privacy and Ethics

You are starting to get an idea of how value is added from business data analytics. But recent history has taught us that pursuing the remarkable capabilities of machine learning and analytics without balancing them against a firm commitment to consumer privacy and ethical data management can be a costly mistake.

Privacy and ethics build consumer trust and enhance the reputation of a business. When consumers believe their data is handled responsibly, they are more likely to engage with the business. This is one reason that companies clearly communicate their data use policies in letters and emails.

Adhering to privacy laws and regulations is mandatory. Non-compliance can result in significant fines, legal actions, and loss of business. More than 4 billion USD in GDPR fines (GDPR is discussed next) have been levied so far (https://www.eqs.com/compliance-blog/biggest-gdpr-fines/), with Facebook (Meta) receiving the two largest.

Besides honoring the law, ethical data practices are about doing what is right. This includes obtaining explicit consent, minimizing data collection, and ensuring data security. Data collection has always carried unintended consequences, as will be discussed later. Data should be used only for the purpose for which it was collected, and companies must avoid misusing sensitive information.

Importantly, poor and unethical data practices are often exposed when data breaches inevitably happen. If hackers break into a company's database and the world sees its lax or unethical practices, the company loses customer trust and can suffer massive financial losses.

To hammer these principles home, let's review relevant privacy regulations and examples of (un)ethical data uses.

Privacy Regulation

In the United States, health data is protected by the Health Insurance Portability and Accountability Act (HIPAA). It applies to healthcare providers, insurers, and their business associates in the US. HIPAA protects sensitive patient health information from being disclosed without the patient's consent or knowledge.

The Children's Online Privacy Protection Act (COPPA) applies to websites and online services directed at children under the age of 13, and to operators of other websites or online services that knowingly collect personal information from children under 13. COPPA requires parental consent for the collection or use of personal information from children. It mandates privacy policies and practices to protect children’s privacy and safety online.

The Family Educational Rights and Privacy Act (FERPA) is a federal law in the United States that protects the privacy of student education records. The law applies to all schools that receive funds under an applicable program of the U.S. Department of Education.

The Gramm-Leach-Bliley Act (GLBA) applies to financial institutions, including banks, insurance companies, and investment firms. It requires financial institutions to explain their information-sharing practices to their customers and to safeguard sensitive data. GLBA includes the Financial Privacy Rule and the Safeguards Rule.

The Fair Credit Reporting Act (FCRA) applies to consumer reporting agencies, users of consumer reports, and furnishers of information to consumer reporting agencies. It regulates the collection, dissemination, and use of consumer information, including credit information, and ensures the accuracy, fairness, and privacy of information in consumer reporting.

The Electronic Communications Privacy Act (ECPA) applies to electronic communications service providers and users. It protects wire, oral, and electronic communications while they are being made, in transit, and when stored on computers. ECPA includes the Wiretap Act, the Stored Communications Act, and the Pen Register Act.

The Driver’s Privacy Protection Act (DPPA) applies to state Departments of Motor Vehicles (DMVs) and those who have access to DMV records. It restricts the disclosure and use of personal information obtained from motor vehicle records, with certain exceptions.

These are only a few of the US federal laws regarding data privacy. There are also many state-level laws governing the collection, management, and sharing of personal data that companies must be aware of, and adhere to, if they operate in those states.

If companies want to do business outside of the US, there are many more regulations to be aware of. Most significantly, the General Data Protection Regulation (GDPR) applies to all European Union (EU) member states and to organizations handling the data of EU residents. It is a comprehensive set of regulations covering data protection principles, rights of data subjects, and obligations of data controllers and processors. Similar regulations exist in Canada with its Personal Information Protection and Electronic Documents Act (PIPEDA), in Brazil with its General Data Protection Law (LGPD), and in Australia with the Privacy Act.

Ethics of Analytics

As stated above, there are ethical reasons to use data genuinely (i.e. the way it was intended) and carefully, to minimize or even eliminate the potential for unintended negative consequences. One of the best examples comes from the unexpected bias of machine learning algorithms. It will be worth your time to revisit this issue once you understand how machine learning works. But essentially, if the training data used in the ML pipeline reflects historical inequalities or biases, the ML models will learn and replicate those biases. For example, if past hiring practices were biased against certain groups, an ML system trained on that data might continue to disadvantage those same groups. Similarly, if minority groups are underrepresented in the training data, the ML models may not perform well for those groups, which can lead to poorer outcomes or unfair treatment for the people who are underrepresented. This mistake has actually occurred at companies like Amazon, LinkedIn, and many others. These companies took steps to eliminate the problem, but a certain amount of damage is inevitably done.
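To make this concrete, here is a minimal Python sketch (not a production audit) of how an analyst might check for both problems at once. It assumes a hypothetical pandas DataFrame of past hiring data with a demographic "group" column and a binary "hired" label; it reports each group's share of the data and a simple model's accuracy for each group separately.

```python
# A minimal sketch, assuming a hypothetical DataFrame of past hiring data with
# numeric feature columns, a demographic "group" column, and a binary "hired" label.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def audit_by_group(df: pd.DataFrame, feature_cols, label_col="hired", group_col="group"):
    # 1. Representation: what share of the data does each group make up?
    print("Group representation:\n", df[group_col].value_counts(normalize=True))

    # 2. Train a simple model on the features only (the group column is excluded).
    X_train, X_test, y_train, y_test, g_train, g_test = train_test_split(
        df[feature_cols], df[label_col], df[group_col],
        test_size=0.3, random_state=0
    )
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # 3. Per-group performance: underrepresented groups often score noticeably worse.
    for group in g_test.unique():
        mask = (g_test == group).values
        acc = accuracy_score(y_test[mask], model.predict(X_test[mask]))
        print(f"{group}: test accuracy = {acc:.2f} (n = {mask.sum()})")
```

If the per-group accuracies differ sharply, that is a signal to gather more representative data or adjust the model before it is ever deployed.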

Bias can also occur based on the algorithm used to make predictions. For example, Apple received criticism because its credit card approval process favored men over women. This was not intentional. It occurred because the firm Apple hired to build the credit approval process included gender as a data feature for predicting the likelihood of credit card repayment. The dataset used to train the predictive model happened to indicate that women were less likely to pay back their credit cards. However, this is not true of the entire population; it was a random occurrence in that particular dataset. The problem could have been avoided simply by not including gender as a data feature in the predictive model, regardless of the biased training dataset.
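One simple (if incomplete) safeguard is to drop protected attributes before training. The sketch below assumes hypothetical column names such as "gender" and "ethnicity"; keep in mind that proxy variables like ZIP code can still leak the same information, so removing columns alone does not guarantee fairness.

```python
# A minimal sketch, assuming a pandas DataFrame of hypothetical applicant data:
# drop protected attributes so the model cannot key on them directly.
import pandas as pd

SENSITIVE_COLS = ["gender", "ethnicity", "age"]  # assumed column names

def drop_sensitive_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the data without protected attributes."""
    return df.drop(columns=[c for c in SENSITIVE_COLS if c in df.columns])
```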

Bias can also occur due to feedback loops. ML systems that learn from user interactions can reinforce existing biases. For example, a recommendation system that suggests popular content might end up amplifying dominant cultural norms while sidelining minority voices. ML specialists must take steps to ensure that new, "minority" content is included in recommendations.
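One common mitigation, sketched below under assumed inputs (a hypothetical dictionary mapping items to popularity counts), is to reserve a fixed share of each recommendation list for randomly sampled long-tail items rather than filling every slot with the most popular content.

```python
# A minimal sketch of one way to soften a popularity feedback loop: reserve a
# share of each recommendation list for randomly sampled "long-tail" items.
# The item catalog and popularity counts here are hypothetical.
import random

def recommend(popularity: dict, n: int = 10, tail_share: float = 0.2):
    ranked = sorted(popularity, key=popularity.get, reverse=True)
    n_tail = int(n * tail_share)            # slots reserved for less popular items
    head = ranked[: n - n_tail]             # the usual popular picks
    tail_pool = ranked[n - n_tail:]         # everything outside the top slots
    tail = random.sample(tail_pool, min(n_tail, len(tail_pool)))
    return head + tail
```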

Finally, bias can occur simply because the ML development teams themselves lack diversity. If everyone working on a project comes from the same background, they are more likely to have similar ideas and less likely to produce creative outcomes that succeed in the market. This finding has been supported by a wide variety of academic research and is worth looking into further on your own.

Ask Copilot or ChatGPT

If you want to review the research on project team diversity, try asking some generative AI, "Can you cite research showing that development teams that are more diverse produce better outcomes?"

Examples of Bias

Here are a few real-world examples of unintentional bias from data-driven ML systems:

  1. Criminal Justice System (COMPAS): COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) is an algorithm used to assess the risk of recidivism.

    • Bias: A 2016 investigation by ProPublica found that COMPAS was biased against African-Americans. The algorithm was more likely to incorrectly label black defendants as high-risk compared to white defendants.
    • Impact: This bias can lead to unfair sentencing and parole decisions, disproportionately affecting African-American individuals.
  2. Hiring Algorithms: Various companies have used ML algorithms to screen job applicants.

    • Bias: Amazon’s recruiting tool, which was trained on resumes submitted over a 10-year period, exhibited bias against women. The tool downgraded resumes that included the word "women’s" and favored resumes similar to those of the predominantly male applicant pool.
    • Impact: Such biases can lead to fewer employment opportunities for women and other underrepresented groups.
  3. Facial Recognition Technology: Facial recognition systems are used in various applications, from law enforcement to smartphone unlocking.

    • Bias: Studies, such as one by the National Institute of Standards and Technology (NIST), have shown that facial recognition algorithms have higher error rates for people with darker skin tones. For example, the systems were found to be less accurate in identifying African-American and Asian faces compared to Caucasian faces.
    • Impact: This can lead to misidentification and potential discrimination in surveillance, policing, and other areas.
  4. Healthcare Algorithms: Predictive algorithms are used in healthcare to allocate resources and predict patient outcomes.

    • Bias: A study published in Science in 2019 found that an algorithm used to predict which patients would benefit from additional healthcare resources was biased against black patients. The algorithm relied on healthcare costs as a proxy for health needs, which resulted in underestimating the health needs of black patients who typically incurred lower healthcare costs due to systemic barriers.
    • Impact: This bias can lead to unequal access to healthcare services and poorer health outcomes for minority groups.
  5. Mortgage Lending: Financial institutions use algorithms to determine creditworthiness and approve mortgage applications.

    • Bias: A 2018 study by the National Bureau of Economic Research found that African-American and Hispanic borrowers were more likely to be charged higher interest rates and have their loan applications rejected compared to white borrowers, even when controlling for financial factors.
    • Impact: This leads to systemic financial inequalities, making it harder for minorities to own homes and build wealth.

Eliminating Bias

Okay, so we're starting to understand how unintended biases can occur and the need for proactive ethical thinking in our business data analytics. But how do we systematically reduce or eliminate potential biases when we don't always see them coming? Mistakes may happen, but let's summarize everything we have already learned to be as forward-thinking as possible:

  1. Diverse and Representative Data: Ensure that training datasets are diverse and representative of all possible groups that the information system is designed for. This involves actively seeking out and including data from minority groups.

  2. Bias Detection and Correction: Implement techniques to detect and correct biases during the model development process. This includes fairness metrics and bias correction algorithms (see the sketch after this list). It also means ignoring or intentionally leaving out data features that can lead to bias (e.g. gender, ethnicity, age, disabilities, sexual orientation, religion, political affiliation, and others) unless there is a justifiable reason to keep them.

  3. Inclusive Development Teams: Foster diversity within development teams to bring multiple perspectives to the design and testing phases, helping to identify and mitigate biases.

  4. Transparency and Accountability: Increase transparency around how ML models are built and how decisions are made. This can include documenting data sources, model parameters, and potential biases.

  5. Continuous Monitoring: Regularly monitor deployed models for bias and unfair outcomes, and update them as necessary to ensure they remain fair and equitable.
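As a concrete illustration of step 2, the sketch below computes one widely used fairness check, the disparate impact ratio (sometimes called the "80% rule"). It assumes you already have a pandas Series of model predictions and a parallel Series of group labels; in practice, libraries such as Fairlearn or AIF360 provide richer metrics, and this is only a starting point.

```python
# A minimal sketch of the disparate impact ratio, assuming hypothetical inputs:
# a Series of binary predictions (1 = favorable outcome) and a Series of group labels.
import pandas as pd

def disparate_impact(predictions: pd.Series, groups: pd.Series, reference_group) -> pd.Series:
    # Rate of favorable predictions for each group.
    positive_rates = predictions.groupby(groups).mean()
    # Each group's rate relative to the reference group;
    # values well below ~0.8 are commonly treated as a red flag worth investigating.
    return positive_rates / positive_rates[reference_group]
```

For example, disparate_impact(preds, groups, reference_group="group_a") returns each group's selection rate relative to group_a; a ratio far below 0.8 suggests the model's outcomes deserve a closer look before deployment.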