Controlling Symbols and Combining Submodules

Introduction

In this lab, you will learn important concepts related to Python package organization. First, you'll learn how to control exported symbols using __all__ in Python modules. This skill is crucial for managing what gets exposed from your modules.

Secondly, you'll understand how to combine submodules for simpler imports and master the technique of module splitting for better code organization. These practices will enhance the readability and maintainability of your Python code.

This is a Guided Lab, which provides step-by-step instructions to help you learn and practice. Follow the instructions carefully to complete each step and gain hands-on experience. Historical data shows that this is an intermediate-level lab with a 62% completion rate. It has received a 100% positive review rate from learners.

Understanding Package Import Complexity

When you start working with Python packages, you'll quickly realize that importing modules can get quite complicated and wordy. This complexity can make your code harder to read and write. In this lab, we'll take a close look at this issue and learn how to simplify the import process.

Current Import Structure

First, let's open the terminal. The terminal is a powerful tool that allows you to interact with your computer's operating system. Once the terminal is open, we need to navigate to the project directory. The project directory is where all the files related to our Python project are stored. To do this, we'll use the cd command, which stands for "change directory".

cd ~/project

Now that we're in the project directory, let's examine the current structure of the structly package. A package in Python is a way to organize related modules. We can use the ls -la command to list all the files and directories within the structly package, including hidden files.

ls -la structly

You'll notice that there are several Python modules inside the structly package. These modules contain functions and classes that we can use in our code. However, if we want to use the functionality from these modules, we currently need to use long import statements. For example:

from structly.structure import Structure
from structly.reader import read_csv_as_instances
from structly.tableformat import create_formatter, print_table

These long import paths can be a hassle to write, especially if you need to use them multiple times in your code. They also make your code less readable, which can be a problem when you're trying to understand or debug your code. In this lab, we'll learn how to organize the package in a way that makes these imports simpler.

Let's start by looking at the content of the package's __init__.py file. The __init__.py file is a special file in Python packages. It's executed when the package is imported, and it can be used to initialize the package and set up any necessary imports.

cat structly/__init__.py

You'll likely find that the __init__.py file is either empty or contains very little code. In the next steps, we'll modify this file to simplify our import statements.

The Goal

By the end of this lab, our goal is to be able to use much simpler import statements. Instead of the long import paths we saw earlier, we'll be able to use statements like:

from structly import Structure, read_csv_as_instances, create_formatter, print_table

Or even:

from structly import *

Using these simpler import statements will make our code cleaner and easier to work with. It will also save us time and effort when writing and maintaining our code.

Controlling Exported Symbols with __all__

The from module import * statement copies a module's symbols (functions, classes, variables) into the current namespace. Often you don't want every symbol to be copied, especially if there are many of them or if some are meant to be internal to the module. The __all__ variable lets you specify exactly which symbols are imported by this statement.

What is __all__?

The __all__ variable is a list of strings. Each string in this list represents a symbol (function, class, or variable) that a module exports when someone uses the from module import * statement. If the __all__ variable is not defined in a module, the import * statement will import all symbols that don't begin with an underscore. Symbols starting with an underscore are typically considered private or internal to the module and are not meant to be imported directly.
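The rule above can be checked with a small, self-contained experiment. The sketch below builds a throwaway module in memory (the name demo_mod and its functions are hypothetical, not part of structly) and runs a star import against it twice, once with __all__ defined and once without:

```python
import sys
import types

# Build a throwaway module in memory. "demo_mod" is a hypothetical name
# used only for this illustration.
demo = types.ModuleType("demo_mod")
exec(
    "__all__ = ['public_api']\n"
    "def public_api(): return 'ok'\n"
    "def helper(): return 'not exported'\n"
    "_private = 1\n",
    demo.__dict__,
)
sys.modules["demo_mod"] = demo

# Run a star import inside a fresh namespace and record which names arrive.
with_all = {}
exec("from demo_mod import *", with_all)
names_with_all = sorted(n for n in with_all if not n.startswith("__"))

# Remove __all__ and repeat: now every name without a leading
# underscore is copied.
del demo.__all__
without_all = {}
exec("from demo_mod import *", without_all)
names_without_all = sorted(n for n in without_all if not n.startswith("__"))

print(names_with_all)     # ['public_api']
print(names_without_all)  # ['helper', 'public_api']
```

With __all__ in place, helper stays out of the importing namespace even though its name has no leading underscore. It is still reachable through an explicit import such as from demo_mod import helper, because __all__ only affects star imports.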

Modifying Each Submodule

Now, let's add the __all__ variable to each submodule in the structly package. This will help us control which symbols are exported from each submodule when someone uses the from module import * statement.

  1. First, let's modify structure.py:
touch ~/project/structly/structure.py

The file already exists, so touch only updates its timestamp and leaves the contents unchanged (it would create an empty file only if none existed). Open structure.py in your editor and add the __all__ variable near the top of the file, right after the import statements:

__all__ = ['Structure']

This line tells Python that when someone uses from structly.structure import *, only the Structure symbol will be imported. Save the file and exit the editor.

  2. Next, let's modify reader.py:
touch ~/project/structly/reader.py

The file already exists, so touch leaves its contents unchanged. Open reader.py in your editor and look through it for all the functions whose names start with read_csv_as_. These are the functions we want to export. Then add an __all__ list containing those function names. It should look something like this:

__all__ = ['read_csv_as_instances', 'read_csv_as_dicts', 'read_csv_as_columns']

Note that the actual function names may vary depending on what you find in the file. Make sure to include all the read_csv_as_* functions you find. Save the file and exit the editor.

  3. Now, let's modify tableformat.py:
touch ~/project/structly/tableformat.py

The file already exists, so touch leaves its contents unchanged. Open tableformat.py in your editor and add this line near the top of the file:

__all__ = ['create_formatter', 'print_table']

This line specifies that when someone uses from structly.tableformat import *, only the create_formatter and print_table symbols will be imported. Save the file and exit the editor.

Unified Imports in __init__.py

Now that each module defines what it exports, we can update the __init__.py file to import all of these symbols. Because __init__.py is executed when the package is imported, any name it imports from a submodule becomes available directly on the package.

touch ~/project/structly/__init__.py

The file already exists, so this command leaves it in place. Open it in your editor and change its content to:

## structly/__init__.py

from .structure import *
from .reader import *
from .tableformat import *

These lines import all the exported symbols from the structure, reader, and tableformat submodules. The dot (.) before the module names indicates that these are relative imports, meaning they are imports from within the same package. Save the file and exit the editor.
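To see the re-export mechanism in isolation, the following sketch writes a tiny package to a temporary directory and imports it. The package name demo_pkg, its submodule shapes, and the classes inside are hypothetical stand-ins, not part of structly:

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Lay out a minimal package: demo_pkg/__init__.py and demo_pkg/shapes.py.
    pkg = Path(tmp) / "demo_pkg"
    pkg.mkdir()
    (pkg / "shapes.py").write_text(
        "__all__ = ['Circle']\n"
        "class Circle: pass\n"
        "class Helper: pass\n"
    )
    (pkg / "__init__.py").write_text("from .shapes import *\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        demo_pkg = importlib.import_module("demo_pkg")
        has_circle = hasattr(demo_pkg, "Circle")     # listed in __all__
        has_helper = hasattr(demo_pkg, "Helper")     # not listed in __all__
        has_submodule = hasattr(demo_pkg, "shapes")  # bound by the import itself
    finally:
        # Undo the changes made to the interpreter state.
        sys.path.remove(tmp)
        sys.modules.pop("demo_pkg", None)
        sys.modules.pop("demo_pkg.shapes", None)

print(has_circle, has_helper, has_submodule)  # True False True
```

Only Circle is copied onto the package because it is the only name in the submodule's __all__. The submodule object itself also becomes an attribute of the package as a side effect of importing it, which is why demo_pkg.shapes is available as well.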

Testing Our Changes

Let's create a simple test file to verify that our changes work. This test file will try to import the symbols we specified in the __all__ variables and print a success message if the imports are successful.

touch ~/project/test_structly.py

This command creates a new file named test_structly.py in the project directory. Add this content to the file:

## A simple test to verify our imports work correctly

from structly import Structure
from structly import read_csv_as_instances
from structly import create_formatter, print_table

print("Successfully imported all required symbols!")

These lines try to import the Structure class, the read_csv_as_instances function, and the create_formatter and print_table functions from the structly package. If the imports are successful, the program will print the message "Successfully imported all required symbols!". Save the file and exit the editor. Now let's run this test:

cd ~/project
python test_structly.py

The cd ~/project command changes the current working directory to the project directory. The python test_structly.py command runs the test_structly.py script. If everything is working correctly, you should see the message "Successfully imported all required symbols!" printed on the screen.

Exporting Everything from the Package

In Python, package organization is crucial for managing code effectively. Now, we're going to take our package organization a step further. We'll define which symbols should be exported at the package level. Exporting symbols means making certain functions, classes, or variables available to other parts of your code or to other developers who might use your package.

Adding __all__ to the Package

The from structly import * statement follows the same rule at the package level: if the package's __init__.py defines an __all__ list, only the names in that list are imported. Adding __all__ to __init__.py therefore gives you precise control over the package's public interface.

First, let's create or update the __init__.py file. We'll use the touch command to create the file if it doesn't exist.

touch ~/project/structly/__init__.py

Now, open the __init__.py file and add an __all__ list. This list should include all the symbols we want to export. The symbols are grouped based on where they come from, such as the structure, reader, and tableformat modules.

## structly/__init__.py

from .structure import *
from .reader import *
from .tableformat import *

## Define what symbols are exported when using "from structly import *"
__all__ = ['Structure',  ## from structure
           'read_csv_as_instances', 'read_csv_as_dicts', 'read_csv_as_columns',  ## from reader
           'create_formatter', 'print_table']  ## from tableformat

After adding the code, save the file and exit the editor.
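Typing every name a second time in the package-level __all__ can drift out of sync with the submodules. One possible alternative, shown here only as a sketch, is to build the list from the submodules' own __all__ lists. The helper combine_all is hypothetical, and the three module objects below are in-memory stand-ins for the real submodules so that the example runs on its own:

```python
import types

def combine_all(*modules):
    """Concatenate the __all__ lists of several modules, rejecting duplicates."""
    combined = []
    for mod in modules:
        for name in mod.__all__:
            if name in combined:
                raise ValueError(f"{name!r} is exported by more than one module")
            combined.append(name)
    return combined

# Stand-ins for structly.structure, structly.reader, and structly.tableformat.
structure = types.ModuleType("structure")
structure.__all__ = ["Structure"]
reader = types.ModuleType("reader")
reader.__all__ = ["read_csv_as_instances", "read_csv_as_dicts", "read_csv_as_columns"]
tableformat = types.ModuleType("tableformat")
tableformat.__all__ = ["create_formatter", "print_table"]

package_all = combine_all(structure, reader, tableformat)
print(package_all)
```

Inside a real __init__.py the same idea would need the submodules imported by name first (for example from . import structure, reader, tableformat) before their __all__ lists can be read. The explicit list used in this lab is easier to read at a glance, so which form to prefer is a judgment call.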

Understanding import *

The from module import * pattern is generally not recommended in most Python code. There are several reasons for this:

  1. It can pollute your namespace with unexpected symbols. This means that you might end up with variables or functions in your current namespace that you didn't expect, which can lead to naming conflicts.
  2. It makes it unclear where particular symbols come from. When you use import *, it's hard to tell which module a symbol is coming from, which can make your code harder to understand and maintain.
  3. It can lead to shadowing issues. A star import silently rebinds any name that already exists in your namespace, including built-ins and names from earlier imports, so the function you call may not be the one you intended.

However, there are specific cases where using import * is appropriate:

  • For packages designed to be used as a cohesive whole. If a package is meant to be used as a single unit, then using import * can make it easier to access all the necessary symbols.
  • When a package defines a clear interface via __all__. By using the __all__ list, you can control which symbols are exported, making it safer to use import *.
  • For interactive use, like in a Python REPL (Read-Eval-Print Loop). In an interactive environment, it can be convenient to import all symbols at once.
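The shadowing problem is easy to reproduce with two standard library modules that define the same function names. This sketch runs both star imports in a scratch namespace so the effect can be inspected afterwards:

```python
# Scratch namespace that plays the role of a user's module.
namespace = {}

exec("from math import *", namespace)
first_sqrt = namespace["sqrt"]

# The second star import silently replaces sqrt (and many other names).
exec("from cmath import *", namespace)
second_sqrt = namespace["sqrt"]

print(first_sqrt.__module__)   # math
print(second_sqrt.__module__)  # cmath
print(second_sqrt(-1))         # 1j
```

Nothing warns about the replacement. Code written against math.sqrt would start receiving complex numbers after the second import, which is the kind of surprise a well-chosen __all__ helps to limit.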

Testing with Import *

To verify that we can import all the symbols at once, let's create another test file. We'll use the touch command to create the file.

touch ~/project/test_import_all.py

Now, open the test_import_all.py file and add the following content. This code imports all the symbols from the structly package and then tests if some of the important symbols are available.

## Test importing everything at once

from structly import *

## Try using the imported symbols
print(f"Structure symbol is available: {Structure is not None}")
print(f"read_csv_as_instances symbol is available: {read_csv_as_instances is not None}")
print(f"create_formatter symbol is available: {create_formatter is not None}")
print(f"print_table symbol is available: {print_table is not None}")

print("All symbols successfully imported!")

Save the file and exit the editor. Now, let's run the test. First, navigate to the project directory using the cd command, and then run the Python script.

cd ~/project
python test_import_all.py

If everything is set up correctly, you should see confirmation that all symbols were successfully imported.
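A name listed in __all__ that does not actually exist in the module makes the star import fail with an AttributeError, so it can be useful to check the list before relying on it. The helper below, missing_exports, is a hypothetical utility written for this illustration; it is demonstrated on the standard library's json module so that it runs anywhere:

```python
import importlib

def missing_exports(module_name):
    """Return the names listed in a module's __all__ that the module lacks."""
    module = importlib.import_module(module_name)
    exported = getattr(module, "__all__", [])
    return [name for name in exported if not hasattr(module, name)]

print(missing_exports("json"))  # []
```

Run from the ~/project directory, calling missing_exports("structly") should likewise return an empty list if every name in the package's __all__ was imported correctly.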

Module Splitting for Better Code Organization

As your Python projects grow, you might find that a single module file becomes quite large and contains multiple related but distinct components. When this happens, it's a good practice to split the module into a package with submodules. This approach makes your code more organized, easier to maintain, and more scalable.

Understanding the Current Structure

The tableformat.py module is a good example of a large module. It contains several formatter classes, each responsible for formatting data in a different way:

  • TableFormatter (base class): This is the base class for all the other formatter classes. It defines the basic structure and methods that the other classes will inherit and implement.
  • TextTableFormatter: This class formats data in plain text.
  • CSVTableFormatter: This class formats data in CSV (Comma-Separated Values) format.
  • HTMLTableFormatter: This class formats data in HTML (Hypertext Markup Language) format.

We'll reorganize this module into a package structure with separate files for each formatter type. This will make the code more modular and easier to manage.

Step 1: Clean Up Cache Files

Before we start reorganizing the code, it's a good idea to clean up any Python cache files. These files are created by Python to speed up the execution of your code, but they can sometimes cause issues when you're making changes to your code.

cd ~/project/structly
rm -rf __pycache__

In the above commands, cd ~/project/structly changes the current directory to the structly directory in your project. rm -rf __pycache__ deletes the __pycache__ directory and all its contents. The -r option stands for recursive, which means it will delete all the files and subdirectories inside the __pycache__ directory. The -f option stands for force, which means it will delete the files without asking for confirmation.

Step 2: Create the New Package Structure

Now, let's create a new directory structure for our package. We'll create a directory named tableformat and a subdirectory named formats inside it.

mkdir -p tableformat/formats

The mkdir command is used to create directories. The -p option stands for parents, which means it will create all the necessary parent directories if they don't exist. So, if the tableformat directory doesn't exist, it will be created first, and then the formats directory will be created inside it.

Step 3: Move and Rename the Original File

Next, we'll move the original tableformat.py file into the new structure and rename it to formatter.py.

mv tableformat.py tableformat/formatter.py

The mv command is used to move or rename files. In this case, we're moving the tableformat.py file to the tableformat directory and renaming it to formatter.py.

Step 4: Split the Code into Separate Files

Now we need to create files for each formatter and move the relevant code into them.

1. Create the base formatter file

touch tableformat/formatter.py

Because we moved the original module to this path in Step 3, the file already exists and touch leaves its contents unchanged. Open it in your editor.

We'll keep the TableFormatter base class and the general utility functions print_table and create_formatter in this file, and move the three concrete formatter classes out into their own files. The file should look something like:

## Base TableFormatter class and utility functions

__all__ = ['TableFormatter', 'print_table', 'create_formatter']

class TableFormatter:
    def headings(self, headers):
        '''
        Emit table headings.
        '''
        raise NotImplementedError()

    def row(self, rowdata):
        '''
        Emit a single row of table data.
        '''
        raise NotImplementedError()

def print_table(objects, columns, formatter):
    '''
    Make a nicely formatted table from a list of objects and attribute names.
    '''
    formatter.headings(columns)
    for obj in objects:
        rowdata = [getattr(obj, name) for name in columns]
        formatter.row(rowdata)

def create_formatter(fmt):
    '''
    Create an appropriate formatter given an output format name.
    '''
    if fmt == 'text':
        from .formats.text import TextTableFormatter
        return TextTableFormatter()
    elif fmt == 'csv':
        from .formats.csv import CSVTableFormatter
        return CSVTableFormatter()
    elif fmt == 'html':
        from .formats.html import HTMLTableFormatter
        return HTMLTableFormatter()
    else:
        raise ValueError(f'Unknown format {fmt}')

The __all__ variable is used to specify which symbols should be imported when you use from module import *. In this case, we're specifying that only the TableFormatter, print_table, and create_formatter symbols should be imported.

The TableFormatter class is the base class for all the other formatter classes. It defines two methods, headings and row, which are meant to be implemented by the subclasses.
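The lab's base class signals unimplemented methods by raising NotImplementedError when they are called. A different technique, not used in the lab's code and shown here only for comparison, is to declare the methods abstract with the standard library's abc module, so that an incomplete subclass fails as soon as it is instantiated. IncompleteFormatter is a hypothetical subclass invented for the demonstration:

```python
from abc import ABC, abstractmethod

class TableFormatter(ABC):
    @abstractmethod
    def headings(self, headers):
        """Emit table headings."""

    @abstractmethod
    def row(self, rowdata):
        """Emit a single row of table data."""

# Hypothetical subclass that forgets to implement row().
class IncompleteFormatter(TableFormatter):
    def headings(self, headers):
        print(" ".join(headers))

try:
    IncompleteFormatter()
    instantiated = True
except TypeError:
    # Python refuses to create an instance while abstract methods remain.
    instantiated = False

print(instantiated)  # False
```

This is an optional variation. You do not need to change formatter.py to complete the lab.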

The print_table function is a utility function that takes a list of objects, a list of column names, and a formatter object, and prints the data in a formatted table.

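The create_formatter function is a factory function that takes a format name as an argument and returns an appropriate formatter object. Its imports are placed inside the function on purpose: each module in formats imports TableFormatter from this file, so importing those modules at the top of formatter.py would create a circular import.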

Save and exit the file after making these changes.

2. Create the text formatter

touch tableformat/formats/text.py

We'll add only the TextTableFormatter class to this file.

## Text formatter implementation

__all__ = ['TextTableFormatter']

from ..formatter import TableFormatter

class TextTableFormatter(TableFormatter):
    '''
    Emit a table in plain-text format
    '''
    def headings(self, headers):
        print(' '.join('%10s' % h for h in headers))
        print(('-'*10 + ' ')*len(headers))

    def row(self, rowdata):
        print(' '.join('%10s' % d for d in rowdata))

The __all__ variable specifies that only the TextTableFormatter symbol should be imported when you use from module import *.

The from ..formatter import TableFormatter statement imports the TableFormatter class from the formatter.py file in the parent directory.

The TextTableFormatter class inherits from the TableFormatter class and implements the headings and row methods to format the data in plain text.
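If the leading dots in a relative import are hard to read, the standard library function importlib.util.resolve_name can show which absolute module name a relative name refers to. The package names passed as the second argument below assume the layout built in this lab:

```python
from importlib.util import resolve_name

# One leading dot: a module inside the current package.
in_package = resolve_name(".structure", "structly")

# Two leading dots: a module inside the parent of the current package.
in_parent = resolve_name("..formatter", "structly.tableformat.formats")

# A dotted path after the leading dot walks down into subpackages.
in_child = resolve_name(".formats.text", "structly.tableformat")

print(in_package)  # structly.structure
print(in_parent)   # structly.tableformat.formatter
print(in_child)    # structly.tableformat.formats.text
```

The function only computes names and does not import anything, so it works even when the structly package is not on the import path.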

Save and exit the file after making these changes.

3. Create the CSV formatter

touch tableformat/formats/csv.py

We'll add only the CSVTableFormatter class to this file.

## CSV formatter implementation

__all__ = ['CSVTableFormatter']

from ..formatter import TableFormatter

class CSVTableFormatter(TableFormatter):
    '''
    Output data in CSV format.
    '''
    def headings(self, headers):
        print(','.join(headers))

    def row(self, rowdata):
        print(','.join(str(d) for d in rowdata))

Similar to the previous steps, we're specifying the __all__ variable, importing the TableFormatter class, and implementing the headings and row methods to format the data in CSV format.

Save and exit the file after making these changes.

4. Create the HTML formatter

touch tableformat/formats/html.py

We'll add only the HTMLTableFormatter class to this file.

## HTML formatter implementation

__all__ = ['HTMLTableFormatter']

from ..formatter import TableFormatter

class HTMLTableFormatter(TableFormatter):
    '''
    Output data in HTML format.
    '''
    def headings(self, headers):
        print('<tr>', end='')
        for h in headers:
            print(f'<th>{h}</th>', end='')
        print('</tr>')

    def row(self, rowdata):
        print('<tr>', end='')
        for d in rowdata:
            print(f'<td>{d}</td>', end='')
        print('</tr>')

Again, we're specifying the __all__ variable, importing the TableFormatter class, and implementing the headings and row methods to format the data in HTML format.

Save and exit the file after making these changes.

Step 5: Create Package Initialization Files

In Python, __init__.py files are used to mark directories as Python packages. We need to create __init__.py files in both the tableformat and formats directories.

1. Create one for the tableformat package

touch tableformat/__init__.py

Add this content to the file:

## Re-export the original symbols from tableformat.py
from .formatter import *

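This statement imports every symbol listed in the __all__ of formatter.py (TableFormatter, print_table, and create_formatter) and makes them available directly on the tableformat package, so code that imported these names from the old tableformat.py module keeps working.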

Save and exit the file after making these changes.

2. Create one for the formats package

touch tableformat/formats/__init__.py

You can leave this file empty or add a simple docstring:

'''
Format implementations for different output formats.
'''

The docstring provides a brief description of what the formats package does.

Save and exit the file after making these changes.

Step 6: Test the New Structure

Let's create a simple test to verify that our changes work correctly.

cd ~/project
touch test_tableformat.py

Add this content to the test_tableformat.py file:

## Test the tableformat package restructuring

from structly import *

## Create formatters of each type
text_fmt = create_formatter('text')
csv_fmt = create_formatter('csv')
html_fmt = create_formatter('html')

## Define some test data
class TestData:
    def __init__(self, name, value):
        self.name = name
        self.value = value

## Create a list of test objects
data = [
    TestData('apple', 10),
    TestData('banana', 20),
    TestData('cherry', 30)
]

## Test text formatter
print("\nText Format:")
print_table(data, ['name', 'value'], text_fmt)

## Test CSV formatter
print("\nCSV Format:")
print_table(data, ['name', 'value'], csv_fmt)

## Test HTML formatter
print("\nHTML Format:")
print_table(data, ['name', 'value'], html_fmt)

This test code imports the necessary functions and classes from the structly package, creates formatters of each type, defines some test data, and then tests each formatter by printing the data in the corresponding format.

Save and exit the file after making these changes. Now run the test:

python test_tableformat.py

You should see the same data formatted in three different ways (text, CSV, and HTML). If you see the expected output, it means that your code reorganization was successful.
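Checking the output by eye works for a lab, but the same check can be automated by capturing what the formatter prints. The sketch below repeats the CSV formatter and print_table definitions from this lab so that it runs on its own; in your project you would import them from structly instead. The Item class is a hypothetical stand-in for real data:

```python
import io
from contextlib import redirect_stdout

# Local copies of the lab's definitions, repeated so the sketch is self-contained.
class TableFormatter:
    def headings(self, headers):
        raise NotImplementedError()

    def row(self, rowdata):
        raise NotImplementedError()

class CSVTableFormatter(TableFormatter):
    def headings(self, headers):
        print(','.join(headers))

    def row(self, rowdata):
        print(','.join(str(d) for d in rowdata))

def print_table(objects, columns, formatter):
    formatter.headings(columns)
    for obj in objects:
        formatter.row([getattr(obj, name) for name in columns])

class Item:
    def __init__(self, name, value):
        self.name = name
        self.value = value

# Capture everything printed while the table is being produced.
buffer = io.StringIO()
with redirect_stdout(buffer):
    print_table([Item('apple', 10), Item('banana', 20)],
                ['name', 'value'], CSVTableFormatter())
csv_output = buffer.getvalue()

assert csv_output == "name,value\napple,10\nbanana,20\n"
print("CSV output matches")
```

The same pattern applies to the text and HTML formatters; only the expected string changes.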

Summary

In this lab, you have learned several Python package organization techniques. First, you used the __all__ variable to explicitly define the symbols exported by a module. Second, you created a more user-friendly package interface by re-exporting submodule symbols from the top-level package. Third, you split a large module into a package of submodules without changing how its symbols are imported.

These techniques are essential for crafting clean, maintainable, and user-friendly Python packages. They enable you to control what users see, simplify the import process, and organize code logically as your project grows.