Introduction to Python for Data Science

Numbers, strings, lists

Numbers (both integers and fractional numbers), strings, and lists are the fundamental data types in Python. They appear in most code examples in guides and tutorials, so knowing them well is key to understanding the language and writing Python code effectively.

Fractional numbers in Python are written with a period (decimal point), unlike integers. For example, 23.0 is a fractional number, while 23 is an integer. Strings are sequences of letters, symbols, and digits placed in single or double quotes. Lists consist of multiple items separated by commas and enclosed in square brackets.

Let's start by running the command to print the string "Hello, Skillbox." Enter the following command in a cell in your Google Colab notebook:
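A minimal sketch of such a command, assuming a plain print() call with the string in double quotes:

```python
# Display a string below the cell
print("Hello, Skillbox")
```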

Then run the cell in one of two ways:

either click the "Run code" triangle to the left of the cell;
or press the key combination Ctrl+Enter.

In either case, the code in the cell will be executed. In this example, the code displays the string "Hello, Skillbox" directly below the cell.

There are two main types of cells in notebooks: "code" and "text". We just ran a code cell. Text cells are used to add explanations, headings, or tables of contents to your notebook. These cells use Markdown markup, which allows you to format text. For more information on Markdown markup, please see the dedicated cheat sheet.

Practice creating and deleting cells with code and text. Use code cells as a calculator: enter expressions such as 2 + 2, 3 * 5 (where the asterisk denotes multiplication), or 5 / 20, and execute the cell. Finally, rename your notebook from Untitled0.ipynb to The_Greatest_DS_Project_Ever.ipynb or choose a more modest name.
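As a small sketch of the calculator idea, the expressions above can be wrapped in print() so that all three results appear even when they share one cell:

```python
# Each expression is evaluated just like on a calculator
print(2 + 2)   # 4
print(3 * 5)   # 15
print(5 / 20)  # 0.25
```

Note that the / operator returns a fractional number even when both operands are integers.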

Variables in Python are named objects used to store values or data. They work like labeled boxes into which we place information. A value is assigned to a variable with the assignment operator (=), which makes the data easy to reuse and change later in the code.
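A minimal sketch of assignment, using the variable name a_number and the value 20:

```python
a_number = 20    # store the value 20 under the name a_number
print(a_number)  # display the stored value: 20
```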

Executed in a cell, the two commands a_number = 20 and print(a_number) first assign the variable a_number the value 20 and then display this value with the print() function. Storing a value and displaying it is the basis for all further data manipulation.

Use meaningful and descriptive variable names. This habit will save you a significant amount of time: days and weeks, if not months. Form names from lowercase Latin letters, digits, and underscores, and do not start a name with a digit. Well-chosen names make code easier to read and maintain.

Variables can also store string values. Strings are sequences of characters that hold text, so string variables let you manipulate text, compare it, and format output.
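A short sketch of a string variable. The name b_string appears later in this article; the value assigned here is only an illustrative assumption:

```python
# Illustrative value; any text in quotes would work
b_string = "Hello, Skillbox"

print(b_string)                        # display the text
print(len(b_string))                   # number of characters in the string
print(b_string == "hello, skillbox")   # comparison is case-sensitive: False
```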

You can pass several variables to the print() function at once by listing them inside the parentheses, separated by commas. Python prints the values on one line, separated by spaces.
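For instance, assuming the two variables below (the string value is illustrative), a single print() call can display both:

```python
a_number = 20
b_string = "Hello, Skillbox"  # illustrative value

# Values are printed on one line, separated by a space
print(a_number, b_string)  # 20 Hello, Skillbox
```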

Python accepts both single and double quotes around strings. The rule is that a string opened with a single quote must be closed with a single quote, and a string opened with a double quote must be closed with a double quote; mixing them causes a syntax error.
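A brief sketch of the quoting rule, with illustrative variable names:

```python
single = 'Data Science'   # opened and closed with single quotes
double = "Data Science"   # opened and closed with double quotes
print(single == double)   # True: the kind of quote does not change the string

# Double quotes outside let an apostrophe appear inside the text
with_apostrophe = "I'm studying Data Science"
print(with_apostrophe)
```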

A more modern and convenient way to build output in Python is the formatted string, known as an f-string. It lets you embed variables and expressions directly in a string inside curly braces, which makes the code shorter and easier to read than joining the pieces by hand.

To create an f-string, put the prefix f immediately before the opening quotation mark. Python then evaluates whatever is inside the curly braces and substitutes the result into the string.
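A minimal sketch of an f-string, assuming the variables a_number and b_string with illustrative values; c_fstring is a hypothetical name for the result:

```python
a_number = 20
b_string = "Hello, Skillbox"  # illustrative value

# Variables and expressions in curly braces are replaced by their values
c_fstring = f"{b_string}! The number is {a_number}, doubled it is {a_number * 2}."
print(c_fstring)  # Hello, Skillbox! The number is 20, doubled it is 40.
```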

Copying code from the text of this article and pasting it into a cell may cause Python errors. This happens because a web page may display typographic (curly) quotes and other special characters instead of the plain ones Python expects. To avoid such problems, retype the code manually.

F-strings also make it easy to change the output without rewriting the print() call: you simply update the value of a variable and run the code again. The template stays the same, which keeps the code readable and makes debugging simpler.

Let's change the value of the b_string variable and print the formatted string again to get a different result. Keep in mind that an f-string uses the value the variable has at the moment the string is created, so the new value must be assigned first.
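A sketch of this change, with illustrative values; before and after are hypothetical names used only to keep the two results:

```python
a_number = 20
b_string = "Hello, Skillbox"  # illustrative starting value
before = f"{b_string}! The number is {a_number}."
print(before)  # Hello, Skillbox! The number is 20.

# Only the variable changes; the f-string template stays the same
b_string = "Goodbye, Skillbox"
after = f"{b_string}! The number is {a_number}."
print(after)   # Goodbye, Skillbox! The number is 20.
```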

A list in Python is a data structure that stores arbitrary items separated by commas. The whole list is enclosed in square brackets. A list can contain items of different types, including numbers, strings, and other objects, which makes it a versatile tool for organizing data.
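A sketch of a mixed list named freddy_list. Only the nested list at the end is taken from this article; the first three items are assumptions made for illustration:

```python
# First three items are illustrative; the nested list is the last item
freddy_list = [1, 2, "Freddy is coming for you",
               [3, 4, "Lock the door to the apartment"]]

print(freddy_list)
print(len(freddy_list))  # 4: the nested list counts as a single item
```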

Notice that the last item in freddy_list is another list containing the items [3, 4, "Lock the door to the apartment"]. Lists can contain both numeric and string values, as well as other nested lists. This demonstrates the flexibility of the data structure and the ability to use different types of items in a single list.

Let's create a list called income_list that includes both income and expenses. Such a list makes it possible to track cash flow, which is the basis for controlling a budget and analyzing financial results.
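A minimal sketch of such a list. The amounts are hypothetical, with expenses written as negative numbers:

```python
# Hypothetical amounts: positive numbers are income, negative are expenses
income_list = [2500, -300, 1200, -450]
print(income_list)  # will print a list
```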

In the example, the phrase "will print a list" is an inline comment. Python ignores comments; they exist to explain the code. By convention, an inline comment is separated from the code by two spaces, followed by a pound sign (#) and a single space. Good comments make code easier to read, maintain, and modify, both for other developers and for you in the future.

The income_list variable stores a list of numbers representing income and expenses.

What can you do with it? Python's built-in functions work on the whole list at once: for example, sum() adds up the items, len() counts them, and min() and max() find the smallest and largest values.

Run each command in a separate cell: a notebook cell automatically displays only the result of its last expression, so several commands in one cell would hide the earlier results unless each is wrapped in print(). In addition, we can calculate the average by dividing the sum of the list items by their number.
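The average calculation can be sketched as follows, assuming a hypothetical income_list; total, count, and average are illustrative names:

```python
income_list = [2500, -300, 1200, -450]  # hypothetical amounts

total = sum(income_list)   # sum of all items: 2950
count = len(income_list)   # number of items: 4
average = total / count    # 737.5

print(total, count, average)
print(min(income_list), max(income_list))  # -450 2500
```

Each result is wrapped in print() here so that everything is displayed even if the code is run in a single cell.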

Loops, Indentation, and Slicing in Python

Loops make code more compact, and proper indentation gives it structure and makes it readable. Slicing extracts specific ranges of values, which is one of the most common operations when working with data.

Python uses loops to perform repetitive operations. This not only makes the code more concise but also adheres to good programming practices. For example, to print the phrase "I'm studying Data Science" five times, you could use the following code:
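A minimal sketch of such a loop, assuming the range(0, 5) form with an iterator named i:

```python
# Repeat the indented command once for each value of i
for i in range(0, 5):
    print("I'm studying Data Science")
```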


Loops let you perform repetitive tasks without duplicating code: instead of writing the same command several times, you write it once and tell Python how many times to repeat it. This is especially useful when processing lists, where a loop handles each item in turn. Python offers two kinds of loops, for and while (unlike some other languages, it has no do-while). The examples here use the for loop.

In the first line of the loop, we specify: for each value i (the iterator, or "enumerator") in the range from 0 to 5, do the following, and end the line with a colon. Python steps through the integer values of the iterator i, repeatedly executing the command after the colon. That command is the body of the loop, which prints the required phrase.

Note the indentation before the print() command in the loop example. Indentation is crucial: it tells Python that this command belongs to the loop and is not a standalone command. If the Python interpreter does not find indentation where it expects it, it raises an IndentationError.
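The error can be reproduced safely with the sketch below. It keeps the faulty loop in a string and uses the built-in compile() function with try/except, which is only a device for showing the error without stopping the cell; bad_code and error_name are hypothetical names:

```python
# The loop body on the second line is NOT indented
bad_code = "for i in range(5):\nprint(i)"

try:
    compile(bad_code, "<cell>", "exec")  # Python checks the syntax here
    error_name = None
except IndentationError as error:
    error_name = type(error).__name__
    print(error_name, "-", error.msg)
```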

Colab, like many modern code editors, indents the next line automatically when you press Enter after a colon, which makes loops easier to write and keeps the program's structure consistent. To start a new command that does not belong to the current loop, return to the beginning of the line with the Backspace key.

Let's run a basic loop to see what happens to the iterator on each pass. Instead of printing a fixed phrase, the loop body will print the iterator itself.
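A minimal sketch of this loop, assuming the short form range(5) and an iterator named i:

```python
for i in range(5):
    print(i)  # print the current value of the iterator
```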

In the first line, we define an iterator that will cycle through values from 0 to 5. Zero is not specified explicitly, since Python starts from it by default. In the second line, we print the current value of the iterator i. As a result, the numbers 0, 1, 2, 3, and 4 are printed, one per line.

Notice that the number 5 is missing, even though it is clearly stated in the parentheses!

In Python, when specifying the start and end values of a range, it's important to remember that the start value is included in the range, while the end value is excluded. This behavior is typical for functions like range(), where the first argument represents the start bound and the second represents the end bound. Understanding this rule will help you avoid mistakes when working with loops and lists, ensuring that ranges are used correctly.
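The rule can be checked with a short sketch that converts each range to a list so that its contents are visible:

```python
print(list(range(0, 5)))  # [0, 1, 2, 3, 4]: 0 is included, 5 is not
print(list(range(5)))     # [0, 1, 2, 3, 4]: the start defaults to 0
print(list(range(2, 6)))  # [2, 3, 4, 5]
```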

This is easiest to understand with a picture of where the range boundaries fall.

Picture each boundary (cut) as a black dotted line with its own index, followed by the element itself, a gray square with the same index. Because the cut falls before the element, the element at the end index is not included in the range. Keep this in mind when working with indices and slices to avoid confusion.

The element with index 5 is the sixth element in the sequence, because indexing in Python starts at zero. Keep this in mind when working with lists and strings to avoid off-by-one errors.

A similar rule applies to both lists and strings.
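A short sketch of zero-based indexing on a list and on a string; numbers and phrase are illustrative names and values:

```python
numbers = [10, 20, 30, 40, 50, 60, 70]  # illustrative values
print(numbers[0])  # 10: the first element has index 0
print(numbers[5])  # 60: index 5 is the sixth element

phrase = "Data Science"
print(phrase[0])   # D
print(phrase[5])   # S: the space at index 4 counts as a character
```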

Python has a construct called a slice, which is used to work with strings, lists, and other data types. The main difference between a slice and the range function is that the slice boundaries are specified using a colon and enclosed in square brackets. Slicing allows you to extract subsets of data, making it a powerful tool for manipulating sequences.
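A minimal sketch of slicing, using an illustrative list and string. As with range(), the start index is included and the end index is excluded:

```python
numbers = [10, 20, 30, 40, 50, 60, 70]  # illustrative values
print(numbers[0:5])  # [10, 20, 30, 40, 50]: indices 0 to 4
print(numbers[2:5])  # [30, 40, 50]

phrase = "Data Science"
print(phrase[0:4])   # Data
print(phrase[5:12])  # Science
```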

What's Next

This is just the beginning. In future articles, we'll demonstrate how to put this knowledge into practice. For those who can't wait, we recommend checking out our Python and Data Science intensive courses on YouTube. Building an AI-powered chatbot, predicting the dollar exchange rate, or recommending cities for tourists—it's all possible!

The "Data Scientist Careers" course includes several modules dedicated to the Python programming language. During the course, you'll gain an in-depth understanding of Python and its capabilities for data analysis. The course covers key aspects of working with data, allowing you to effectively use Python to solve Data Science problems.

