# 2.3 Extracting Information from Data

## Enduring Understanding

Programs can be used to process data, which allows users to discover information and create new knowledge.

## Learning Objective

Describe what information can be extracted from data. 

## Essential Knowledge

Information is the collection of facts and patterns extracted from data.

Data provide opportunities for identifying trends, making connections, and addressing problems.

Digitally processed data may show correlation between variables. A correlation found in data does not necessarily indicate that a causal relationship exists. Additional research is needed to understand the exact nature of the relationship.
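
As a minimal sketch of what "digitally processed data may show correlation" looks like, the Python below computes a Pearson correlation coefficient by hand. The sales and sunburn figures are hypothetical. Both series rise together, so the coefficient comes out close to 1. That does not mean ice cream causes sunburn: a third variable, such as hot weather, could plausibly drive both.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance (unscaled) divided by the product of the spreads.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical monthly figures: both tend to rise in hot weather.
ice_cream_sales = [20, 35, 50, 80, 95, 110]
sunburn_cases = [3, 5, 9, 14, 16, 20]

r = pearson(ice_cream_sales, sunburn_cases)
print(f"r = {r:.2f}")
```

A value near 1 signals a strong positive relationship. Finding out *why* the two move together still requires additional research.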

Often, a single source does not contain the data needed to draw a conclusion. It may be necessary to combine data from a variety of sources to formulate a conclusion.
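
Here is a minimal sketch of combining two sources. It assumes two hypothetical data sets that share a school ID: one holds enrollment counts and the other holds average test scores. Neither source alone can relate school size to scores. Joining them on the shared key makes that possible, although schools that appear in only one source drop out.

```python
# Hypothetical source A: student counts per school, from an enrollment office.
enrollment = {"S01": 420, "S02": 380, "S03": 510}

# Hypothetical source B: average scores per school, from a testing agency.
avg_scores = {"S01": 78.5, "S03": 82.1, "S04": 69.0}

def combine(a, b):
    """Join two dict-based data sets on the keys they share (an inner join)."""
    return {key: (a[key], b[key]) for key in a.keys() & b.keys()}

combined = combine(enrollment, avg_scores)
for school in sorted(combined):
    count, score = combined[school]
    print(school, count, score)
```

S02 and S04 are missing from the result because each appears in only one source. Losing records this way is a common side effect of merging data.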

## Learning Objective

Describe what information can be extracted from metadata.

## Essential Knowledge

Metadata are data about data. For example, the data may be an image, while its metadata may include the image's creation date or file size.

Changes and deletions made to metadata do not change the primary data.
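
The following is a simplified model, not a real image format, that shows why editing metadata leaves the primary data alone. The `ImageFile` class and its field values are hypothetical. The pixel grid and the metadata dictionary are stored separately, so editing or deleting a metadata field never touches the pixels.

```python
from dataclasses import dataclass, field

@dataclass
class ImageFile:
    pixels: list                                   # primary data: a tiny 2x2 grayscale grid
    metadata: dict = field(default_factory=dict)   # data about the data

photo = ImageFile(
    pixels=[[0, 255], [128, 64]],
    metadata={"created": "2024-05-01", "size_bytes": 2048, "location": "Chicago"},
)

# Keep a copy of the pixels so we can compare after editing metadata.
original_pixels = [row[:] for row in photo.pixels]

# Change one metadata field and delete another.
photo.metadata["created"] = "2024-06-15"
del photo.metadata["location"]

print(photo.pixels == original_pixels)  # True: the image content is unchanged
```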

Metadata are used for finding, organizing, and managing information.

Metadata can increase the effective use of data or data sets by providing additional information.

Metadata allow data to be structured and organized.
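
As a rough illustration of metadata being used to find and organize data, the sketch below uses a hypothetical photo library in which each file carries a year and a set of tags. The helper names `find_by_tag` and `group_by_year` are invented for this example. Neither function opens the images themselves; both work entirely from the metadata.

```python
from collections import defaultdict

# Hypothetical photo library: each entry pairs a file name with its metadata.
library = [
    {"file": "img001.jpg", "year": 2022, "tags": {"beach", "family"}},
    {"file": "img002.jpg", "year": 2023, "tags": {"dog"}},
    {"file": "img003.jpg", "year": 2023, "tags": {"beach"}},
]

def find_by_tag(items, tag):
    """Finding: return the names of files whose metadata includes the tag."""
    return [item["file"] for item in items if tag in item["tags"]]

def group_by_year(items):
    """Organizing: bucket file names by the year recorded in their metadata."""
    groups = defaultdict(list)
    for item in items:
        groups[item["year"]].append(item["file"])
    return dict(groups)

print(find_by_tag(library, "beach"))  # ['img001.jpg', 'img003.jpg']
print(group_by_year(library))         # {2022: ['img001.jpg'], 2023: ['img002.jpg', 'img003.jpg']}
```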

## Learning Objective

Identify the challenges associated with processing data.

## Essential Knowledge

The ability to process data depends on the capabilities of the users and their tools.

Data sets pose challenges regardless of size, such as:

* the need to clean data
* incomplete data
* invalid data
* the need to combine data sources
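
The sketch below separates incomplete data from invalid data in a set of hypothetical survey responses. A skipped field (`None`) counts as incomplete. An answer that cannot be a real age counts as invalid. The 0 to 120 range is an assumed validity rule chosen for illustration.

```python
# Hypothetical survey responses; None marks a field the respondent skipped.
responses = [
    {"name": "Ana", "age": "16"},
    {"name": "Ben", "age": None},       # incomplete: no answer
    {"name": "Cy", "age": "-4"},        # invalid: impossible age
    {"name": "Dee", "age": "sixteen"},  # invalid: not a number
]

def classify(record):
    """Label a record 'incomplete', 'invalid', or 'ok' based on its age field.

    The 0-120 range is an assumed rule for this example only.
    """
    age = record["age"]
    if age is None:
        return "incomplete"
    try:
        value = int(age)
    except ValueError:
        return "invalid"
    return "ok" if 0 <= value <= 120 else "invalid"

labels = {r["name"]: classify(r) for r in responses}
print(labels)
```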

Depending on how data were collected, they may not be uniform. For example, if users enter data into an open field, the way they choose to abbreviate, spell, or capitalize something may vary from user to user.

Cleaning data is a process that makes the data uniform without changing their meaning (e.g., replacing all equivalent abbreviations, spellings, and capitalizations with the same word).
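
The following is a minimal sketch of cleaning open-field input. It assumes hypothetical free-text answers to "Which state do you live in?" along with a hand-built lookup table (`CANONICAL`) of known variants. Each value is trimmed and lowercased, and then every equivalent spelling is mapped to one canonical form. The meaning of each answer stays the same; only its spelling becomes uniform.

```python
# Hypothetical free-text answers entered by different users.
raw = ["CA", "california", "Calif.", " California ", "NY", "new york", "N.Y."]

# Assumed lookup table mapping each normalized variant to one canonical value.
CANONICAL = {
    "ca": "California",
    "california": "California",
    "calif.": "California",
    "ny": "New York",
    "new york": "New York",
    "n.y.": "New York",
}

def clean(value):
    """Trim whitespace, lowercase, then map to the canonical spelling.

    Unrecognized values are returned trimmed but otherwise unchanged,
    so nothing is silently altered into a different meaning.
    """
    key = value.strip().lower()
    return CANONICAL.get(key, value.strip())

cleaned = [clean(v) for v in raw]
print(cleaned)
```

After cleaning, counting answers per state gives correct totals. The raw list would have split California into four separate categories.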

Problems of bias are often created by the type or source of data being collected. Bias is not eliminated by simply collecting more data.
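
A small simulation helps show why more data does not remove bias. The population below is hypothetical: screen-time hours for 10,000 people, where one subgroup of 3,000 averages about 7 hours and everyone else averages about 3. The survey source is assumed to reach only that subgroup. Collecting 100 times as many responses makes the estimate more stable, but it stays centered on the wrong value.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: daily screen-time hours for 10,000 people.
teens = [random.gauss(7, 1) for _ in range(3000)]   # subgroup averaging ~7 hours
adults = [random.gauss(3, 1) for _ in range(7000)]  # everyone else, ~3 hours
population = teens + adults
true_mean = sum(population) / len(population)

def biased_sample(n):
    """Draw n responses from a source that only reaches the teen subgroup."""
    return [random.choice(teens) for _ in range(n)]

estimates = {}
for n in (100, 1_000, 10_000):
    sample = biased_sample(n)
    estimates[n] = sum(sample) / n
    print(f"n={n:>6}: estimate {estimates[n]:.2f}, true mean {true_mean:.2f}")
```

Every estimate lands near 7 while the true mean is near 4.2, regardless of sample size. Fixing the bias requires changing how or from whom the data are collected.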

The size of a data set affects the amount of information that can be extracted from it.

Large data sets are difficult to process using a single computer and may require parallel systems.
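
The sketch below outlines how parallel systems typically divide the work. The data are split into chunks, each chunk is processed independently, and the partial results are merged (often called map and reduce). The input lines are hypothetical. Python threads stand in here for separate machines. In standard CPython, the global interpreter lock prevents threads from running pure-Python code simultaneously, so this shows the structure of parallel processing rather than a real speedup. A real system would send the chunks to separate processes or computers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def split(data, n_chunks):
    """Divide a list into at most n_chunks roughly equal pieces."""
    if not data:
        return []
    size = -(-len(data) // n_chunks)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def count_words(lines):
    """Map step: count words in one chunk, independently of other chunks."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts

def parallel_word_count(lines, workers=4):
    chunks = split(lines, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, chunks))
    # Reduce step: merge the partial counts into one total.
    total = Counter()
    for part in partials:
        total += part
    return total

# Hypothetical input standing in for a data set too large for one machine.
lines = ["the data set", "the cat", "data about data", "a data point"] * 1000
result = parallel_word_count(lines)
print(result["data"])
```

Because each chunk is counted without looking at the others, adding more workers (or machines) means dividing the work into more pieces. That property is what makes the approach scale.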

Scalability of systems is an important consideration when working with data sets, as the computational capacity of a system affects how data sets can be processed and stored.
